Staff Software Developer - AI Frameworks, GPU Optimization

Advanced Micro Devices, Inc. • San Jose, CA

About The Position

AMD is looking for a world-class AI frameworks engineer who can provide technical leadership in the development of AI frameworks across the AMD ecosystem. You will play a pivotal role in developing and optimizing deep learning frameworks for AMD GPUs. You will engage with both internal GPU library teams and open-source maintainers to ensure seamless integration of optimizations, using cutting-edge compiler technologies and advanced engineering principles to drive continuous improvement. If you are passionate about AI/ML frameworks, software architecture, and/or compilers, this is your opportunity.

You will work in one of several core areas, such as AI/ML frameworks (e.g., PyTorch, vLLM, SGLang), AI runtime components, or optimization tooling, to accelerate AI/ML workloads on AMD GPUs. You will collaborate closely with AI researchers to drive the development of framework components that efficiently map AI models onto the latest AMD GPUs.

You should be comfortable working in a dynamic development environment and have excellent leadership and collaboration skills. You will work with multiple geographically dispersed engineering teams on next-generation framework software, guiding other senior developers and domain experts.

Requirements

BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related technical field.
This role is not eligible for visa sponsorship.

Nice To Haves

GPU Kernel Development & Optimization: Experience designing and optimizing GPU kernels for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM). Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming to maximize performance for AI operations, leveraging tools such as Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance.
Experience with AI software frameworks such as PyTorch, vLLM, and SGLang, including benchmarking and profiling.
Experience using profiling and benchmark tooling for large models.
Experience with model optimization techniques such as low-precision quantization (MXFP4, FP8, INT4) and sparsity.
Solid understanding of model architectures, including LLMs, mixture-of-experts (MoE), and diffusion models.
Proficient in C++ programming.
Experience developing and debugging in Python.
Team player, ready to work with a geographically distributed team.

Responsibilities

Optimize Deep Learning Frameworks: Enhance and optimize frameworks such as PyTorch, vLLM, and SGLang for AMD GPUs in their open-source repositories.
Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations.
Develop & Optimize Models: Design and optimize deep learning models, using techniques such as quantization, specifically for performance on AMD GPUs.
Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs.
Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream.
Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions.