About The Position

XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its products, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robots. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.

The Mission: The challenge of Vision-Language-Action (VLA) models and Foundation Models isn't just their intelligence; it's their real-time execution at the edge. We are seeking a high-caliber Staff Machine Learning Engineer to bridge the gap between massive research models and production-ready L4 autonomous driving systems. You will lead the effort to optimize and deploy our VLA models onto vehicle-grade compute platforms for our global fleet.

Requirements

  • Proven Track Record: 5-8 years of experience in model deployment, quantization, or high-performance computing (HPC).
  • Core Technical Skills: Mastery of Modern C++ and deep experience with CUDA or other hardware acceleration libraries.
  • Deep Learning Expertise: Strong familiarity with PyTorch and deep knowledge of inference engines like TensorRT, ONNX Runtime, or TVM.
  • Quantization Depth: Hands-on experience with INT8/FP8/INT4 quantization and knowledge of the unique challenges in quantizing Large Language Models (LLMs) or Transformers.
  • Platform Knowledge: Solid understanding of computer architecture (Cache, Memory Bandwidth, SIMD) and experience with embedded/edge compute constraints.
  • Systems Thinking: Ability to debug complex performance bottlenecks across the entire software stack.

Nice To Haves

  • Experience with VLA/VLM or other Foundation Model deployment.
  • Background in autonomous driving, robotics, or real-time safety-critical systems.
  • Contributions to open-source inference or compiler projects.

Responsibilities

  • Lead Optimization Strategy: Own the end-to-end quantization and optimization roadmap for large-scale multimodal models (Transformers, VLMs).
  • Model Compression: Apply and innovate in PTQ (Post-Training Quantization), QAT (Quantization-Aware Training), and pruning techniques to fit VLA models into strict memory and power envelopes.
  • Hardware-Software Co-design: Collaborate directly with model researchers to ensure architectures are "deployment-friendly" and with platform teams to influence future hardware requirements.
  • Production Excellence: Develop and maintain robust, safety-critical deployment stacks in Modern C++, ensuring 24/7 stability and deterministic performance on the road.

Benefits

  • A fun, supportive, and engaging environment
  • Infrastructure and computational resources to support your ML model development and research
  • Opportunity to work on cutting-edge technologies with top talent in the field
  • Opportunity to make a significant impact on the transportation revolution by advancing autonomous driving
  • Competitive compensation package
  • Snacks, lunches, dinners, and fun activities