The Productivity and Machine Learning Evaluation team ensures the quality of AI-powered features across a suite of productivity and creative applications - including Creator Studio - used by hundreds of millions of people. The team serves as the primary evaluation function, providing critical quality signals that directly influence model development decisions and product launches.

This role focuses on building and scaling automated evaluation systems and on designing adversarial and stress-testing methodologies across multiple AI features. The work requires a deep understanding of how AI systems fail and of how to measure quality rigorously. It is an opportunity to shape the evaluation infrastructure that determines whether AI features meet the bar for hundreds of millions of users.

DESCRIPTION

Day-to-day work involves designing, building, and maintaining automated evaluation systems that assess AI feature quality at scale. This includes creating adversarial test suites that probe model weaknesses and running stress tests to ensure features perform under demanding conditions. The role requires close collaboration with cross-functional partners to ensure evaluation methods are well-calibrated and integrated into development workflows.

Typical deliverables include: evaluation frameworks and rubrics, quality assessment reports, adversarial test case libraries, and recommendations on model readiness.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
5,001-10,000 employees