Staff AI Engineer - Cortex Code Quality

Snowflake • Menlo Park, CA

About The Position

The Cortex Code team is building the future of coding agents for working with data. See our flagship product in action: Cortex Code in Action: Live Demos + AMA. As a Staff AI Engineer on Cortex Code Quality, you will help define and architect agent behavior at enterprise scale by building the agentic systems and methodology that let our users build cutting-edge agentic systems that are efficient, repeatable, auditable, and shippable. You’ll partner with modeling, platform, and product leadership to turn customer pain into golden scenarios, metrics, and experiment loops that the whole team can trust.

Requirements

Bachelor’s degree in Computer Science, Engineering, Statistics, or a related field. A Master’s degree or higher is preferred but not required.
8+ years of experience shipping AI/ML-backed software in production, including Staff-level ownership of technical direction, cross-team delivery, and mentoring.
Strong track record building and operating eval harnesses, measurement, and/or experimentation loops for LLM/agent systems—not only one-off benchmarks.
Proficiency in programming languages such as Python, TypeScript, and Go (strong in at least two).
Exceptional communication skills: crisp writeups, constructive debate, and ability to influence without authority across engineering and product.
(Optional) Experience with data engineering pipelines (dbt, Airflow), data modeling, data analysis, retrieval systems, and semantic layers is a plus.

Nice To Haves

Deep experience with agentic coding tools (IDE agents, CLI agents) and intuition for model strengths, failure modes, and prompting limits.
Background in data engineering (dbt, Airflow), analytics, retrieval / RAG, or semantic layers—highly relevant for data-centric coding agents.
Prior work on LLM observability, safety/guardrails, or quality systems used as release gates in production.

Responsibilities

Agent strategy & systems: Own major pillars of the quality stack: tuning agent behavior to handle next-generation agentic coding tasks.
Hill-climb infrastructure: Design and evolve pipelines and tooling that support large-scale experimentation, error mining, and iteration on prompts/tools/workflows with clear before/after signals.
Deep analysis & prioritization: Lead postmortems on quality regressions; cluster failure modes; translate findings into a prioritized roadmap for engineering and modeling partners.
Cross-functional leadership: Align product, infra, and applied AI on what “good” means for critical customer workflows; mentor engineers and uplevel eval craft across the team.
Production-minded rigor: Ensure quality systems are dependable in practice—reproducible runs, stable datasets, versioning, and operational clarity when things drift.
