Machine Learning Infra Engineer

Nuance Labs · Seattle, WA · Onsite

About The Position

Nuance Labs is building the next generation of emotionally expressive, real-time AI. In this critical role, you will build the infrastructure that powers our AI platform: you will own the systems that serve models at scale, orchestrate complex data workflows, and ensure our real-time video AI runs reliably and with low latency for users worldwide.

Requirements

  • Infrastructure Expertise: Strong practical experience with Kubernetes, Terraform, and cloud platforms. You can design secure, scalable systems and debug complex distributed issues.
  • Systems Programming: Proficiency in Python and experience with systems languages (Rust or Go). Comfortable profiling workloads and resolving compute, memory, or network bottlenecks.
  • Orchestration & Pipelines: Experience managing large-scale offline workflows using tools like Dagster, Ray, Airflow, or similar frameworks.
  • Production Operations: Deep understanding of production reliability, monitoring, incident response, and capacity planning for high-traffic services.

Nice To Haves

  • Experience with WebRTC or real-time media pipelines in production
  • Experience running GPU-backed inference services at scale (vLLM, Triton Inference Server, TensorRT)
  • Knowledge of performance optimization and low-level systems debugging
  • Familiarity with video/audio processing and storage systems

Responsibilities

  • Own Inference Infrastructure: Build and maintain the serving stack for multimodal AI workloads. Optimize for latency, throughput, and cost using batching strategies, autoscaling, and intelligent resource allocation.
  • Real-Time Video Streaming: Architect systems to handle long-lived WebRTC connections with unpredictable client behavior, ensuring smooth video and audio delivery at scale.
  • Orchestrate Data Workflows: Build robust pipelines for offline processing, evaluation, and training using orchestration frameworks like Dagster or Ray. Manage petabyte-scale video storage and network requirements.
  • GPU Cluster Management: Configure and maintain GPU clusters using Kubernetes and Terraform. Implement monitoring, autoscaling based on custom metrics, and cost optimization strategies.
  • Developer Tooling: Build CI/CD, evaluation, and versioning systems that enable safe, zero-downtime model deployments and rapid iteration cycles.