Senior Machine Learning Engineer - Computer Vision

Warner Bros. Discovery • San Francisco, CA

About The Position

At Warner Bros. Discovery, we are reimagining how machine learning transforms storytelling. As part of the AI/ML organization and focused on supporting applications of AI to video, the Machine Learning Engineer – Services group powers the infrastructure and backend services behind production workflows. We're looking for an experienced ML Engineer with strong fundamentals and infrastructure experience to help build reusable components and services for video understanding, video summarization, and video classification. You will be part of a team focused on model retraining, model hosting, cost optimization, and managing production workflows at scale.

Requirements

5+ years of experience in machine learning engineering, with end-to-end ML workflow expertise
Strong background in model retraining, fine-tuning, and evaluation techniques
Experience deploying and managing open-source model servers (e.g., Triton, TorchServe, Ray Serve)
Proficient in managing cost-effective distributed computing environments (e.g., Kubernetes, Ray, SageMaker)
Familiar with experiment tracking tools (e.g., MLflow, Weights & Biases) and model versioning strategies
Deep understanding of ML domains including NLP, RecSys, and reinforcement learning
Familiarity with labeling tools, HITL workflows, and offline data curation strategies
Comfort working in Agile development environments and collaborating across global teams

Nice To Haves

Experience with real-time inference systems and streaming data pipelines

Responsibilities

Build and maintain pipelines for model fine-tuning and retraining, including LoRA-based workflows and Large Language Models
Integrate and maintain vector search services and semantic similarity infrastructure
Design scalable model serving solutions for open-source and foundation models
Develop systems for experiment tracking, model versioning, and evaluation
Monitor production models for drift and performance degradation
Manage compute cost and resource optimization across distributed training jobs
Integrate Human-in-the-Loop (HITL) workflows and offline labeling into training pipelines
Support model deployment for varied model architectures, including vision-language models, convolutional neural networks, and embedding generation models
Stand up and maintain feature store and data versioning infrastructure
Architect and implement RAG pipelines for video metadata, summarization, and Q&A
Build evaluation frameworks to assess LLM performance, hallucination frequency, and structured response accuracy