Scale AI • Posted 1 day ago
$160,000 - $225,600/Yr
Full-time • Mid Level
San Francisco, CA

About the position

Scale is looking for an AI/ML Infrastructure Engineer to join our Machine Learning Infrastructure team and build out our Training Platform. You will partner closely with Machine Learning researchers to understand their requirements, applying your own domain expertise and our compute resources to accelerate experimentation throughput. The ideal candidate has strong fundamentals in machine learning and backend system design, along with prior ML infrastructure experience. You should also be comfortable with infrastructure and large-scale system design, as well as with diagnosing both model performance issues and system failures.

Responsibilities

  • Build highly available, observable, performant, and cost-effective APIs for model training.
  • Participate in our team’s on-call process to ensure the availability of our services.
  • Own projects end-to-end, from requirements, scoping, and design through implementation, in a highly collaborative and cross-functional environment.
  • Exercise good taste in building systems and tools and know when to make build vs. buy tradeoffs, with an eye for cost efficiency.

Requirements

  • 4+ years of experience building machine learning training pipelines or inference services in a production setting.
  • Experience with distributed training frameworks and techniques such as DeepSpeed and FSDP.
  • Experience building, deploying, and monitoring complex microservice architectures.
  • Experience with Python, Docker, Kubernetes, and infrastructure as code (e.g. Terraform).

Nice-to-haves

  • Experience with LLM inference latency optimization techniques, e.g. kernel fusion, quantization, and dynamic batching.
  • Experience working with a cloud technology stack (e.g. AWS or GCP).

Benefits

  • Comprehensive health, dental and vision coverage
  • Retirement benefits
  • Learning and development stipend
  • Generous PTO
  • Commuter stipend (subject to eligibility)