Sr. Lead AI Engineer

Klaviyo, Palo Alto, CA
Onsite

About The Position

At Klaviyo, we believe the future of software isn’t just better productivity tools for humans; it’s systems that can run, optimize, and adapt themselves based on outcome and reward signals. We’ve already built the infrastructure and applications that sit between businesses and consumers, serving 183,000+ customers, billions of consumer profiles, and hundreds of billions of messages and conversion events. Now we’re investing heavily in Marketing AI and Service AI: the shared services and agentic capabilities that power AI‑native experiences across Klaviyo.

As a Sr. Lead AI Engineer in the AI & Analytics organization, you’ll be the technical leader for a set of high‑impact AI services and agentic systems. You’ll own architecture and execution for complex, distributed backend systems; guide other engineers through design and implementation; and partner closely with product, machine learning, and data science to turn AI ideas into reliable, scalable production capabilities. This is a hands‑on tech lead role, not a people‑manager role.

You’ll be based in our new Palo Alto office. This hub is a center of gravity for AI at Klaviyo: everyone here brings machine learning and AI knowledge and experience, and the hub is tightly connected to our other U.S. and international R&D hubs. You’ll help shape how Palo Alto collaborates with Boston and global teams, and the role opens clear paths for growth into broader technical leadership, staff/principal‑level scope, or future people leadership if that’s the direction you want to go.

Requirements

  • Seasoned backend engineer & tech lead. 8+ years of professional software engineering experience with a strong focus on backend and distributed systems; you’ve led complex projects or areas end‑to‑end and acted as the go‑to technical owner for key services.
  • Hands‑on with generative & agentic AI in production. You’ve built and shipped generative or agentic AI applications (e.g., LLM‑backed flows, tool‑using agents, retrieval‑augmented systems) and are comfortable with prompt design, few‑shot approaches, fine‑tuning, and evaluation.
  • Deep experience with large‑scale distributed systems. You’ve architected and operated reliable services, async processing pipelines, and distributed task queues and messaging systems (e.g., Celery, Kafka, SQS, RabbitMQ, Redis) supporting high‑throughput workloads.
  • Strong Python and data tooling fluency. Proficient in Python and modern backend frameworks (FastAPI, Django, or similar), with experience using big data tools such as Spark/Hadoop and ORM and schema‑migration tooling such as SQLAlchemy and Alembic.
  • Production‑grade cloud experience. Comfortable with AWS and Kubernetes, CI/CD pipelines, observability, and operational best practices; you treat infrastructure choices as part of the system design, not an afterthought.
  • Evaluation and quality‑obsessed. You’ve designed human and automated evals for AI systems, know how to instrument for quality, and understand how to balance latency, cost, and response quality in real‑world usage.
  • Technical leader, not just individual contributor. You influence architecture beyond your own code, facilitate design reviews, rally cross‑functional partners, and help teams converge on pragmatic decisions in ambiguous spaces.
  • Collaborative and customer‑first. You’re comfortable collaborating directly with PMs, ML engineers, and customers; you can translate between customer needs, product goals, and technical constraints, and you care deeply about the customer experience.
  • Agentic coding practitioner. You already practice agentic coding effectively in your daily work, and you’re hungry to responsibly explore new AI tools and workflows that make your work smarter and more efficient.

Nice To Haves

  • Prior experience training and deploying ML models (including RL or RLHF) into production systems that drove measurable business impact.
  • Experience building shared AI/ML platforms or services used by multiple product teams.
  • Background in marketing tech, ecommerce, or other domains where customer data, personalization, and experimentation are central.

Responsibilities

  • Define and improve the engineering architecture for Service and Marketing AI capabilities. Lead the design of scalable, low‑latency backend systems and APIs that power our AI products and agents for 183K+ customers, handling billions of events and interactions.
  • Build and harden AI serving systems. Lead the development of services that host and orchestrate AI models (LLMs, tools, evaluators, retrieval systems), with clear contracts, SLOs, and observability so other teams can depend on them.
  • Evolve our agentic architecture. Drive the technical roadmap for Service and Marketing AI agents—improving tool use, autonomy, and reliability so agents can safely take actions on behalf of customers and internal users with minimal human intervention.
  • Set engineering standards for AI services. Establish best practices for evaluation, safety/guardrails, prompt and model versioning, offline and online tests, and incident response for AI‑backed systems; hold the bar for design reviews and code quality.
  • Lead cross‑team technical work. Act as the primary technical interface between Service / Marketing AI and other product/platform teams—clarifying ownership boundaries, aligning on interfaces, and unblocking dependencies across hubs and time zones.
  • Mentor and uplevel engineers. Coach senior and mid‑level engineers through design, implementation, and operational best practices, helping them grow into stronger system owners and future tech leads.
  • Help build the Palo Alto hub. Shape local engineering rituals, help with hiring and onboarding, and create healthy patterns for cross‑hub collaboration so work and career opportunities are not constrained by geography.
  • Measure what matters. Define and track key metrics for your systems—availability, latency, cost‑to‑serve, agent success rates, eval scores, and customer adoption—and use those insights to drive roadmap and technical decisions.
  • Put AI at the center of how we work. Transform engineering and product development workflows by building and using agentic coding tools and adopting AI‑first ways of working.