Novel Testing is a team within Trust & Safety specializing in complex testing, defining protocols and methodologies for assessing risk where best practices do not yet exist. We pioneer and scale innovative testing programs, streamlining the launch of novel, trustworthy, and responsible AI (RAI) products. Our work spans from designing first-of-their-kind evaluations for Google’s most ambitious product bets (autonomous agents, personalization, and the latest hardware) to developing new methodologies for assessing novel foundation model capabilities as they emerge. Advancing the state of the art in AI evaluation is central to this mission. To scale these methods, we partner closely with engineering teams to build the infrastructure and tools required for automated, rigorous evaluation.

You will lead the development of novel testing methodologies for emergent AI, which demands the methodological precision to design evaluation frameworks where established standards do not yet exist. You’ll tackle complex data science questions through creative experimentation, designing sophisticated prompt strategies and quantitative analyses to identify systemic risks and edge cases in GenAI products. Bridging the gap between theory and execution, you will build and prototype testing solutions grounded in data science best practices, then partner directly with engineering teams to inform the development of automated infrastructure, ensuring your insights scale across Google’s ecosystem. You will bring a researcher’s mindset, capable of deep qualitative and quantitative inquiry, paired with the technical agility to translate those findings into scalable, high-impact engineering prototypes.
Job Type: Full-time
Career Level: Mid Level