Director, Site Reliability Engineering

Vertafore
Denver, CO
$175,000 - $220,000

About The Position

Vertafore is a leading technology company whose innovative software solutions are advancing the insurance industry. Our suite of products helps our customers better manage their business, boost their productivity and efficiency, and lower costs while strengthening relationships. Our mission is to move InsurTech forward by putting people at the heart of the industry. We are leading the way with product innovation, technology partnerships, and a focus on customer success. Our fast-paced and collaborative environment inspires us to create, think, and challenge each other in ways that make our solutions and our teams better. We are headquartered in Denver, Colorado, with offices across the U.S., Canada, and India.

The Director, Site Reliability Engineering (SRE) will lead reliability, performance, and observability initiatives for a portfolio of Vertafore products. This role owns SLIs/SLOs, incident response, automation, and CI/CD practices for assigned product families. Directors will manage multiple teams and collaborate with Product Development, Architecture, Cloud Operations, Information Security, and other SRE leaders to ensure operational excellence.

This role is responsible for bridging the gap between development and operations by applying a software engineering mindset to system administration. You will own the lifecycle of services, from inception and design through deployment, operation, and refinement.

Requirements

  • Bachelor’s degree in Computer Science, Information Systems, or related field.
  • 15+ years in Software Engineering, SRE, DevOps, or reliability roles; 8+ years in leadership.
  • Proven ability to leverage software engineering principles and practices to solve reliability and operational challenges.
  • Expertise in CI/CD, observability, and incident response.
  • Strong AWS knowledge and experience with container orchestration.
  • Proven ability to lead reliability programs across multiple SaaS products.
  • Experience architecting applications or infrastructure for high-growth cloud platforms.
  • Experience in B2B SaaS environments involving large-scale distributed systems.
  • Proven leadership communicating and influencing at team, peer, and leadership levels.
  • Demonstrated experience driving operational excellence through metrics and KPIs.

Nice To Haves

  • (Preferred) Background supporting financial services, healthcare, or regulated industries.

Responsibilities

  • Product Reliability Leadership
      o Define and enforce SLIs/SLOs for a subset of Vertafore flagship products.
      o Drive observability strategy across application and infrastructure layers.
  • Release Engineering & Toil Reduction
      o Oversee CI/CD pipelines for product deployments using tools like GitLab, Jenkins, Ansible, LaunchDarkly.
      o Monitor and cap "Toil" (manual, repetitive operational work) at 50% using Automation and AI tools, ensuring the team spends the remaining time on project work that scales the system.
  • Error Budget Management
      o Manage "Error Budgets" to balance the velocity of feature releases with the stability of the platform, ensuring clear consequences when budgets are exhausted.
  • Incident Management
      o Define and participate in 24x7 on-call rotations for assigned products; ensure rapid resolution and blameless postmortems.
  • Cross-Functional Collaboration
      o Partner with Cloud Ops on capacity planning, OS patching (app tier), and load balancing (ALB, F5).
      o Align reliability goals with product roadmaps and customer SLAs.
  • Team Leadership
      o Manage a group of Managers and Engineers; mentor teams on automation, observability, and reliability best practices.