Research Engineer, Response Quality/Hermes

DeepMind•Mountain View, CA


About The Position

As a Research Engineer/Scientist on the Gemini Post-Training team, you will bridge the gap between core model alignment and the next generation of interactive AI. You will re-architect how Gemini interacts, reasons, and collaborates with users over complex, multi-turn exchanges. Your work will involve building high-scale distillation pipelines, developing metrics and evaluation techniques, taking advantage of the latest developments in agentic frameworks and autonomous user agents, designing custom system instructions (SIs) that redefine model behavior, and finding the right training recipes to impact Gemini. This is a high-impact role where your contributions to SFT and Reinforcement Learning will directly shape the multimodal, widget-rich future of the world’s most capable AI.

Requirements

PhD or MSc in Computer Science, Machine Learning, or a related technical field.
For the Research Scientist (RS) track: PhD.
Experience in model alignment or post-training techniques, such as SFT, Reinforcement Learning (RL), or Reward Modeling.
Experience building and maintaining large-scale data processing or distillation pipelines (e.g., using Python, Spark, or similar frameworks).

Nice To Haves

Experience with distributed training frameworks and optimizing model performance for complex, open-ended tasks.
Experience working with agentic frameworks to evaluate or steer LLM behavior in interactive environments.
Demonstrated ability to translate research breakthroughs into production-level features or products.
A passion for building collaborative AI that excels at complex information synthesis.
A proven track record of developing real-world systems, research, or publications in areas such as multimodal generation, reasoning, multi-turn dialogue, and agentic evaluation.

Responsibilities

Models: Improve model quality for understanding and generation by applying and developing cutting-edge modeling interventions, multi-turn reasoning, and rich-format multimodal synthesis.
Data: Unlock new information-seeking and collaborative capabilities through large-scale data distillation and processing, bridging pre-training and post-training.
Evals: Build on top of agentic frameworks to develop better evaluation methods (human, auto-raters, and automated metrics) that measure the quality of open-ended tasks with rich formats.
Research Execution: Drive projects by defining key research questions and designing experiments that provide clear, actionable answers.
Product Impact: Collaborate with cross-functional teams to land research breakthroughs directly into Google products and services.
