We are now looking for a Senior System Software Engineer to work on Dynamo.
NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. Academic and commercial groups around the world are using GPUs to power a revolution in AI, enabling breakthroughs in problems from image classification to speech recognition to natural language processing. We are a fast-paced team building a generative AI inference platform that makes designing and deploying new AI models easier and accessible to all users.
What you'll be doing:
In this role, you will develop open source software to serve inference of trained AI models running on GPUs. You will:
Contribute to the development of disaggregated serving for Dynamo-supported inference engines (vLLM, SGLang, TRT-LLM) and expand it to support multi-modal models for embedding disaggregation.
Innovate in the management and transfer of large KV caches across heterogeneous memory and storage hierarchies, using the NVIDIA Optimized Transfer Library (NIXL) for low-latency, cost-effective data movement.
Build new features for the Dynamo Rust Runtime Core Library, and design, implement, and optimize distributed inference components in Rust and Python.
Balance a variety of objectives: build robust, scalable, high-performance software components to support our distributed inference workloads; work with team leads to prioritize features and capabilities; load-balance asynchronous requests across available resources; optimize prediction throughput under latency constraints; and integrate the latest open source technology.
Job Type
Full-time
Career Level
Mid Level