Inference is now the defining cost and latency bottleneck for frontier AI. Fluidstack’s Inference Platform team owns the serving layer between our global GPU supply and the production workloads our customers run on it: LLM serving frameworks, KV cache infrastructure, disaggregated prefill/decode pipelines, and Kubernetes-based orchestration across multi-datacenter footprints.

This is a hands-on IC role at the intersection of distributed systems, model optimization, and GPU infrastructure. You’ll own end-to-end inference deployments for frontier AI labs and for our own inference product, drive measurable improvements in throughput, cost-per-token, and time-to-first-token, and shape the platform architecture choices that determine how Fluidstack deploys across tens of thousands of GPUs.
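The posting names throughput and time-to-first-token (TTFT) as the headline serving metrics. For context, a minimal sketch of how these are typically measured from the client side, assuming a hypothetical OpenAI-compatible streaming endpoint (the URL, model id, prompt, and payload below are illustrative placeholders, not details from the posting):

```python
"""Sketch: measure TTFT and end-to-end token throughput against a
streaming LLM endpoint. Assumes an OpenAI-compatible /v1/completions
route, which many serving frameworks expose; all names are placeholders."""
import json
import time

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # hypothetical endpoint
payload = {
    "model": "example-model",  # placeholder model id
    "prompt": "Explain KV caching in one sentence.",
    "max_tokens": 128,
    "stream": True,  # server-sent events: one chunk per generated piece
}

start = time.perf_counter()
first_token_at = None
tokens = 0

with requests.post(ENDPOINT, json=payload, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue  # skip SSE keep-alives and blank separators
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first streamed chunk = TTFT
        chunk = json.loads(data)
        # Whitespace split is only a rough proxy for token count.
        tokens += len(chunk["choices"][0].get("text", "").split())

elapsed = time.perf_counter() - start
if first_token_at is not None:
    print(f"TTFT: {(first_token_at - start) * 1e3:.1f} ms")
print(f"~{tokens / elapsed:.1f} tokens/s end-to-end")
```

TTFT is dominated by queueing plus prefill, while steady-state tokens/s reflects decode throughput, which is why disaggregated prefill/decode pipelines like those mentioned above target the two phases separately.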
Job Type: Full-time
Career Level: Mid Level