Staff Machine Learning Engineer, AI Platform

General Motors, Sunnyvale, CA
$165,000 - $253,400, Remote

About The Position

We are seeking an experienced, technically oriented, impact-driven expert in ML training infrastructure with a strong ability to execute hands-on technical work. In this role, you will design and build scalable, reliable, high-performance AI/ML platform infrastructure to support advanced AI research and model development initiatives. As a Staff ML System Engineer, you will collaborate closely with machine learning engineers, research scientists, and other partners to develop state-of-the-art AI solutions that enable the future of intelligent driving technologies across General Motors vehicles.

Requirements

  • Bachelor's or higher degree in Computer Science or a related field, or equivalent experience.
  • 7+ years of professional software engineering experience.
  • 3+ years of specialized experience in AI/ML infrastructure, e.g., enabling distributed training to scale large ML models.
  • Strong programming skills in Python, with proficiency in frameworks such as PyTorch (preferred), TensorFlow, or similar.
  • Experience with distributed computing, GPU computing, and cloud environments (AWS, GCP, Azure).
  • Willingness to travel to Sunnyvale, CA as needed.
  • Comfortable working in highly ambiguous and dynamic environments.

Nice To Haves

  • Self-motivated, with strong execution and a focus on delivering impact.
  • Extensive knowledge of and experience with PyTorch 2.x and distributed training frameworks.
  • Experience designing and developing training frameworks that support FSDP, pipeline parallelism, and other scalable approaches to training large foundation models.
  • Experience profiling, analyzing, debugging, and optimizing training and data-loading performance.
  • Excellent communication skills to resolve disagreements, build consensus, communicate risks, and give constructive feedback.

Responsibilities

  • Lead the design and development of a scalable, reliable, high-performance ML framework to support model training at scale.
  • Lead model training performance analysis and optimization to scale distributed training workflows, maximize resource utilization across heterogeneous hardware environments, and reduce cost.
  • Raise the bar on system observability, debuggability, operational excellence, and user experience.
  • Collaborate with cross-functional teams to integrate new features and technologies into the platform.

Benefits

  • Medical, dental, vision insurance.
  • Health Savings Account.
  • Flexible Spending Accounts.
  • Retirement savings plan.
  • Sickness and accident benefits.
  • Life insurance.
  • Paid vacation & holidays.
  • Tuition assistance programs.
  • Employee assistance program.
  • GM vehicle discounts.

What This Job Offers

Job Type

Full-time

Career Level

Senior

Industry

Transportation Equipment Manufacturing

Education Level

Bachelor's degree

Number of Employees

5,001-10,000 employees

© 2024 Teal Labs, Inc