Staff Software Engineer - Computer Vision Deployment

Claryo · San Francisco, CA · Hybrid

About The Position

We're looking for a Staff Software Engineer – Computer Vision Deployment to build and scale the infrastructure that powers our AI-driven warehouse intelligence platform. You'll own the end-to-end lifecycle of computer vision models — from training pipelines through optimized cloud deployment — ensuring our computer vision and multi-modal AI systems run reliably and efficiently in production. Your work will directly enable the real-time perception and autonomous decision-making capabilities at the core of our platform.

This is a deeply technical role at the intersection of machine learning, distributed systems, and cloud infrastructure. You'll design scalable GPU compute clusters, build robust orchestration pipelines, and optimize model serving for low-latency inference at scale. You'll work closely with our research scientists, computer vision engineers, and product teams to bridge the gap between experimental models and production-ready systems that operate across diverse warehouse environments.

We've found tremendous value in collaborative problem-solving, so our team works from our SF office three days a week.

Requirements

  • B.S. / M.S. in Computer Science, Robotics, or similar technical field, or equivalent practical experience.
  • 7+ years of professional software engineering experience, including at least 3 years in machine learning infrastructure — developing, training, deploying, and optimizing large-scale ML systems end to end, from data to model.
  • Track record of deploying computer vision models in production environments with real-world constraints.
  • Experience with distributed messaging and compute systems (Kafka, gRPC, ROS2, or similar).
  • Strong programming skills in Python with solid software engineering practices.

Nice To Haves

  • Experience developing, running, and managing orchestration systems (Flyte, Temporal, Airflow, or similar) for ML and data pipelines.
  • Proficiency with ML frameworks (PyTorch, TensorFlow, DeepSpeed) and model serving platforms (TorchServe, TensorFlow Serving, NVIDIA Triton Inference Server, or similar).
  • Deep understanding of state-of-the-art machine learning models such as auto-regressive transformers and familiarity with inference optimization techniques (TensorRT, quantization, custom kernels).
  • Experience with C++ or CUDA programming for GPU acceleration.
  • Prior experience at autonomous-vehicle or robotics companies.

Responsibilities

  • Develop and maintain distributed cloud GPU infrastructure for large-scale world model training and low-latency inference.
  • Build end-to-end computer vision pipelines — from data ingestion and preprocessing through model training, evaluation, and deployment — and integrate them into core product workflows.
  • Deploy and optimize state-of-the-art machine learning models — including vision-language models (VLMs) and vision-language-action models (VLAs) — in the cloud using model serving platforms and inference optimization techniques.
  • Design and operate orchestration systems that enable both engineers and non-engineers to build and manage data and ML pipelines.
  • Establish monitoring, benchmarking, and evaluation frameworks to ensure model performance and reliability in production environments.

Benefits

  • Top-tier medical, dental, and vision coverage
  • 401(k) with employer matching
  • Parental leave
  • Unlimited vacation