Technical Program Manager, Agent Quality and Evaluation, DeepMind

Google•Mountain View, CA

1d•$240,000 - $334,000

About The Position

A problem isn’t truly solved until it’s solved for all. That’s why Googlers build products that help create opportunities for everyone, whether down the street or across the globe. As a Program Manager at Google, you’ll lead complex, multi-disciplinary projects from start to finish, working with stakeholders to plan requirements, manage project schedules, identify risks, and communicate clearly with cross-functional partners across the company. Your projects will often span offices, time zones, and hemispheres. It's your job to coordinate the players and keep them up to date on progress and deadlines.

As the Technical Program Manager for AI Agent Quality and Evaluation, you will be the strategic owner of the evaluation infrastructure that ensures our AI agents deliver reliable, high-quality outcomes at scale. You will scale evaluation efforts across agent quality (e.g., capability-based evaluations, user feedback pipelines, quality dashboards) and product evaluations (e.g., workflow validation, real-world task completion metrics). This role is critical to establishing the quality bar for self-sustaining agent execution across software development, operations, and enterprise workflows. You will own the evaluation strategy for our AI agent programs and work at the intersection of research, engineering, and product to ensure our AI agents meet the highest quality standards before deployment.

Artificial intelligence will be one of humanity’s most transformative inventions. At Google DeepMind, we are a pioneering AI lab with exceptional interdisciplinary teams focused on advancing AI development to solve complex global challenges and accelerate high-quality product innovation for billions of users. We use our technologies for widespread public benefit and scientific discovery, ensuring safety and ethics are always our highest priority. We are pushing the boundaries across multiple domains. Our global teams offer diverse learning opportunities and varied career pathways for those driven to achieve exceptional results through collective effort.

Requirements

Bachelor's degree or equivalent practical experience.
10 years of experience in program or project management.
10 years of experience managing cross-functional or cross-team projects.

Nice To Haves

Experience building and scaling evaluation infrastructure for AI/ML systems, including benchmark design, metrics definition, and quality tracking.
Experience partnering with research and engineering teams in fast-paced environments to guide program delivery from concept to completion.
Understanding of the unique challenges in evaluating agentic behavior, and a passion for AI agents and self-sustaining systems.
Ability to prioritize, adapt to change, and provide flexible thought partnership in an evolving landscape.
Excellent communication skills with the ability to develop meaningful relationships with key partners and influence action and outcomes.

Responsibilities

Build and scale capability-based evaluation frameworks for AI agents.
Establish quality dashboards and leaderboards for tracking agent performance and latency.
Guide user feedback pipelines to collect and curate high-quality evaluation examples.
Coordinate benchmark evaluations comparing agent capabilities against baselines.
Partner with evaluation teams to validate agent capabilities across various use cases.
