Senior SWQA Test Development Engineer

NVIDIA • Santa Clara, CA

About The Position

NVIDIA is the world leader in accelerated computing and AI. Our technologies power the most advanced AI platforms, including NeMo microservices and NVIDIA Inference Microservices (NIM), enabling scalable, production‑grade AI deployment across cloud and enterprise environments. We are looking for a senior, technically strong test development engineer to drive quality, automation, and technical leadership in this rapidly evolving space.

Requirements

BS or higher degree in CS, EE, or CE (or equivalent experience)
8+ years of experience in software development, test development, or quality engineering roles
Strong proficiency in Python and test automation frameworks
Experience testing distributed systems, microservices, or cloud‑native platforms
Solid understanding of Linux, Docker, Kubernetes, and CI/CD pipelines
Proven ability to lead technically, review designs, and mentor other engineers
Strong debugging skills and ability to reason about complex, system‑level failures
Excellent communication skills and experience working across geographically distributed teams

Nice To Haves

Experience testing AI/ML platforms, LLM pipelines, or inference services
Hands‑on exposure to NeMo, NIM, or model‑as‑a‑service platforms
Experience with performance, scale, and reliability testing in production‑like environments
Experience applying AI tools to enhance test development, automation, and diagnostics
Prior ownership of quality for customer‑facing or production‑critical services

Responsibilities

Own and drive end‑to‑end quality from design through release and production readiness
Lead test strategy, planning, and execution across functional, integration, system, performance, and reliability testing
Design, build, and maintain test frameworks and automation for microservice‑based, containerized AI systems
Provide technical leadership and mentorship to less senior engineers, including guidance on test design, automation practices, and quality standards
Partner closely with cross‑functional teams to influence architecture and improve testability
Validate LLM and AI inference workflows, including model lifecycle, APIs, CLIs, deployment configurations, and scaling scenarios
Drive defect triage, root‑cause analysis, and quality metrics, ensuring issues are addressed systematically and efficiently
Leverage AI‑assisted testing techniques to improve coverage, efficiency, and signal‑to‑noise in test results