We're enhancing the shopping experience on Amazon through the conversational capabilities of large language models, and we're looking for innovative professionals who are passionate about technology and customer experience. You'll have the opportunity to drive breakthrough innovations in LLM inference and post-training efficiency while working alongside talented scientists, engineers, and technical program managers (TPMs) to create solutions that serve our customers. If you're excited about optimizing the computational heart of AI systems, collaborating with a dynamic team, and contributing to this evolving field, we'd love to have you join our mission to unlock unprecedented LLM performance!

Key job responsibilities

We're looking for an experienced Software Development Engineer with deep expertise in GPU/custom-chip kernel optimization and ML acceleration to lead projects in architecting, designing, developing, and optimizing high-performance kernel implementations for large language models. You'll guide your team in creating and optimizing innovative kernels, custom operators, and low-level optimizations that maximize hardware utilization and minimize computational overhead.

In this role, you'll establish best practices for kernel development, memory management, and parallel computing that dramatically reduce inference latency and boost throughput for transformer-based models. You'll work with your team to develop kernel fusion techniques, attention mechanism optimizations, and matrix multiplication accelerations at scale, partnering with engineers and scientists in a fast-paced environment to deliver measurable performance gains. You'll also drive the technical roadmap, performance benchmarking, and optimizations focused on kernel-level improvements.
Job Type: Full-time
Career Level: Mid Level