Systems Engineer: Real-Time Engine

Nuance Labs•Seattle, WA

Onsite

About The Position

We're building the engine that powers our AI avatar: a real-time interactive loop that continuously senses the user (audio and video), orchestrates inference across multiple models, manages state, and renders a coherent audio-visual response within tight latency budgets. Traditional real-time systems are hard because the timing requirements are strict. This system is harder: its components are neural networks with variable latency and non-deterministic outputs, and there is no way to pause the user while the models think. You're building a system that has to feel instantaneous while running inference that isn't. This is the runtime that makes a human-AI conversation feel alive. You’ll own this runtime and collaborate closely with our research team on how models are invoked, how conversational context is assembled, and how response quality is balanced against latency. You’ll have direct influence over architecture decisions as an early engineer on a small, well-funded team.

Requirements

Real-time streaming systems experience. You’ve built systems that operate on a continuous real-time loop with hard per-tick latency budgets, where output must never stall.
Strong Python and async programming. You need to be productive immediately in Python — asyncio should be second nature. The key skill is writing prototype code with clean enough architecture that it survives a language port.
Systems programming background. The production system will be written in Rust. You don’t need to know Rust today, but you should have experience in at least one systems language (Rust, C++, Go) and be motivated to adopt Rust.
Concurrency and state machine design. Experience designing concurrent systems: async runtimes, thread models, lock contention, schedulers. Specifically, managing multiple in-flight async processes with cancellation, priority switching, and preemption.
Strong intuition for latency. Profiling, tail-latency behavior, and tradeoffs between throughput and responsiveness. Ability to reason about end-to-end pipelines across CPU and GPU boundaries.
Comfort building from scratch under time pressure. This is a “design the architecture and ship it” role, not a “maintain existing infrastructure” role. You’re comfortable with ambiguity and rapid iteration.

Nice To Haves

Experience with real-time media systems: WebRTC, RTP/RTCP, jitter buffers, A/V sync
Experience with real-time tick-loop architectures (e.g., game engines, simulation runtimes, audio DSP pipelines, robotics)
Experience with GPU inference serving and optimization: Triton, TensorRT, vLLM, CUDA profiling
Building LLM agent orchestration systems
Familiarity with streaming generation systems: incremental decoding and mid-stream control, lock-free data structure design

Responsibilities

Build and own the server-side real-time engine: session lifecycle, state management, and the architecture of the interaction loop, including the timing and scheduling layer that keeps the loop coherent
Integrate GPU-backed model inference into the real-time loop, wiring model outputs into the engine's state and render pipeline
Develop performance tooling for latency breakdowns (TTFO, steady-state), tracing, profiling, and regression detection
Collaborate with product and research to define how the system behaves at its boundaries — APIs, event streams, and the invariants the engine guarantees to the rest of the stack
