About The Position

We're enhancing the shopping experience on Amazon through the conversational capabilities of large language models (LLMs), and we're looking for innovative professionals who are passionate about technology and customer experience. You'll have the opportunity to drive breakthrough innovations in LLM inference and post-training efficiency while working alongside talented scientists, engineers, and technical program managers (TPMs) to create solutions that serve our customers. If you're excited about optimizing the computational heart of AI systems, collaborating with a dynamic team, and contributing to this evolving field, we'd love to have you join our mission to unlock unprecedented LLM performance!

Key Job Responsibilities

We're looking for an experienced Software Development Engineer with deep expertise in GPU/custom-chip kernel optimization and ML acceleration to lead projects in architecting, designing, developing, and optimizing high-performance kernel implementations for large language models. You'll guide your team in creating and optimizing innovative kernels, custom operators, and low-level optimizations that maximize hardware utilization and minimize computational overhead. In this role, you'll establish best practices for kernel development, memory management, and parallel computing that dramatically reduce inference latency and boost throughput for transformer-based models. You'll work with your team to develop kernel fusion techniques, attention mechanism optimizations, and matrix multiplication accelerations at scale, partnering with engineers and scientists in a fast-paced environment to deliver measurable performance gains. You'll also drive the technical roadmap, performance benchmarking, and optimizations focused on kernel-level improvements.

Requirements

  • 5+ years of non-internship professional software development experience
  • 5+ years of experience programming in at least one software language
  • 5+ years of experience leading the design or architecture (design patterns, reliability, scaling) of new and existing systems
  • Experience as a mentor, tech lead, or leader of an engineering team
  • Experience with vLLM, SGLang, TensorRT or similar platforms in production environments
  • Experience with CUDA kernels or ML/low-level kernels

Nice To Haves

  • 5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Bachelor's degree in computer science or equivalent
  • Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution


Benefits

  • Health insurance (medical, dental, vision, and prescription coverage; Basic Life & AD&D insurance with optional supplemental life plans; EAP; mental health support; a medical advice line; Flexible Spending Accounts; and adoption and surrogacy reimbursement coverage)
  • 401(k) matching
  • Paid time off
  • Parental leave