ML Compiler and Performance Engineer

Meta · Bellevue, WA
$74 - $217,000

About The Position

Meta's Training and Inference Accelerators (MTIA) team is developing novel hardware to enable efficient execution of AI training and inference workloads. In this role, you will have end-to-end responsibility for the performance of in-production AI models as they transition from stock hardware to MTIA chips, with a focus on models that require multi-node compute. To learn more about MTIA, explore the links below:

  • https://ai.meta.com/blog/meta-training-inference-accelerator-AI-MTIA/
  • https://ai.meta.com/blog/next-generation-meta-training-inference-accelerator-AI-MTIA/
  • https://dl.acm.org/doi/full/10.1145/3695053.3731409

Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Experience developing and deploying optimizations at the level of PyTorch/Aten or comparable stacks

Nice To Haves

  • A Master's degree and 4+ years of in-domain experience
  • A PhD and 2+ years of in-domain experience
  • Experience optimizing multi-node distributed compute
  • Experience optimizing runtimes and/or kernels for accelerator platforms

Responsibilities

  • Identifying bottlenecks and quantifying opportunities for improving performance
  • In-depth, end-to-end performance analysis and reporting
  • Developing optimizations to address identified bottlenecks
  • Optimizing compute/communication overlap
  • Working closely with other compiler teams as well as client teams (Recommendation Systems, Generative AI, etc.)