BMO’s Applied AI team is responsible for building high‑performing, safe, and reliable AI systems that power real banking experiences. The Evaluations group within Applied AI develops the methods, datasets, and tooling that measure quality, safety, and performance across the full AI lifecycle. Working closely with product, engineering, and research partners, the team ensures evaluation signals are deeply embedded into training loops, deployment workflows, and continuous monitoring processes. This group operates at the intersection of data science, machine learning, and responsible AI, enabling scalable, repeatable, and trustworthy evaluation of advanced AI systems.

The AI Evaluation Scientist is an individual contributor role focused on delivering the data science stream of AI evaluations. This includes designing, implementing, and productionizing evaluation methods, metrics, and datasets that directly influence modeling decisions, product quality, and the safety posture of AI systems across the bank. You will work hands‑on with complex models, particularly LLMs and deep learning systems, developing rigorous empirical analyses that surface model weaknesses, performance trends, and risk signals.

In this role, you will translate evaluation standards into robust, maintainable evaluation code and workflows. You will collaborate with engineers to integrate evaluation signals into CI/CD and training pipelines, and work with product and research partners to ensure evaluation insights meaningfully shape model improvements. This position is highly technical, experimental, and delivery‑oriented, with a strong emphasis on applied data science, reproducible experimentation, and responsible AI practices.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
5,001-10,000 employees