Technical Program Manager, AI Evaluation Specialist

Jobgether

23d • $103,680 - $144,000 • Remote

About The Position

This role is ideal for a detail-oriented, analytical professional who is passionate about ensuring the safety, reliability, and quality of AI systems in operational settings. The Technical Program Manager, AI Evaluation Specialist, will lead the human-in-the-loop evaluation process: measuring AI model performance, identifying errors, and recommending improvements that increase accuracy and trust. You will collaborate closely with data, operations, and model teams to standardize evaluation protocols, analyze error patterns, and ensure AI outputs meet organizational standards. The position combines technical rigor with program management and offers the chance to influence model governance at scale. You will track metrics, maintain documentation, and help integrate evaluation insights into operational workflows. The role requires strong problem-solving, communication, and organizational skills in a collaborative, fast-paced environment.

Requirements

3–5+ years of experience in QA, evaluation, operational analytics, human-in-the-loop programs, or model monitoring.
Experience reviewing unstructured text and applying rubrics or scorecards for qualitative and quantitative assessment.
Understanding of AI applications in operations, including classification, summarization, categorization, and automation.
Strong analytical skills with the ability to identify patterns, edge cases, and failure modes.
Familiarity with QA frameworks or content-review workflows.
Exceptional attention to detail and consistency in work.
Clear communication and documentation skills.
Passion for ensuring AI systems are safe, fair, and reliable.

Nice To Haves

Experience with SQL, Looker, or Snowflake is a plus.
COPC or Lean Six Sigma experience is a plus.

Responsibilities

Own the human-in-the-loop evaluation process for AI models supporting operations, ensuring consistent and accurate assessments.
Conduct recurring sampling and detailed reviews to assess model accuracy, consistency, and failure modes.
Score, tag, and document instances where AI systems misclassify, hallucinate, or generate incomplete outputs.
Maintain rubrics, guidelines, and documentation to ensure evaluator alignment and scoring consistency.
Investigate error patterns and root causes, translating insights into actionable recommendations for model owners and partner teams.
Track and report evaluation metrics, such as accuracy, recall, coverage, and error types, and integrate findings into dashboards and workflows.
Support scaling of governance processes and strengthen model-health standards across operations.

Benefits

Competitive base salary of $103,680–$144,000 annually, plus potential bonus and equity opportunities.
Comprehensive medical, dental, vision, life, and disability insurance.
401(k) retirement plan with company match.
Flexible vacation and paid time off policies.
Paid parental leave for birthing and non-birthing parents.
Wellness stipends and support for family planning services.
Opportunities for both in-person and virtual team engagement and professional development activities.
Remote-first work environment with occasional on-site collaboration as needed.