About The Position

At Yugabyte, we are on a mission to become the default transactional database for enterprises building cloud-native applications. YugabyteDB is our PostgreSQL-compatible distributed database for cloud-native apps. Resilient, scalable, and flexible, it runs on any cloud and enables developers to become instantly productive using well-known APIs.

We are looking for talented and driven people to join us on our ambitious mission and help us build a lasting and impactful company.

The transactional database market is estimated to grow to $64B by 2025. YugabyteDB is cloud-native by design, has on-demand horizontal scalability, and supports geographical distribution of data using built-in replication. This means that we are well-positioned to meet market demand for geo-distributed, high-scale, high-performance workloads.

Join the Database Revolution at Yugabyte. Modern applications need a cloud-native database that eliminates tradeoffs and silos. YugabyteDB retains the power and familiarity of PostgreSQL by pairing its trusted API with a precision-engineered, distributed, cloud-native architecture. Even better, it’s 100% open source. Many of the world's leading enterprises are migrating from legacy RDBMSs (like Oracle, SQL Server, and DB2) to YugabyteDB to meet their mission-critical app demands.

Director of Site Reliability Engineering (SRE)

Overview

We are seeking an experienced Director of Site Reliability Engineering (SRE) to lead and scale a high-performing reliability organization supporting both internal infrastructure and customer-facing reliability initiatives. This role is critical to ensuring the availability, scalability, and performance of our database platforms, including fully managed services and customer-deployed cloud environments. The ideal candidate brings deep technical expertise, a strong background in systems and infrastructure, and proven leadership experience managing SRE or reliability teams at scale.

Requirements

  • Proven experience leading SRE, Infrastructure, or Customer Reliability Engineering teams
  • Strong background in systems administration with a demonstrated progression into people management and leadership roles
  • Experience managing both internal infrastructure operations and customer-facing reliability initiatives
  • Hands-on knowledge of modern infrastructure tooling, automation, and cloud-native technologies
  • Ability to operate effectively in a fast-growing, highly technical environment
  • 10+ years of industry experience in SRE, infrastructure, systems engineering, or related fields

Nice To Haves

  • Prior experience at established technology companies managing infrastructure and reliability at scale
  • Database experience is a strong plus, particularly with MySQL or similar relational databases
  • Comfortable balancing strategic leadership with technical depth
  • Strong communicator with the ability to collaborate across technical and non-technical teams

Responsibilities

  • Lead, mentor, and grow an SRE organization, initially managing a team of 8 engineers with plans to scale to 10–11 engineers
  • Own reliability strategy across both:
      • Inbound SRE responsibilities: infrastructure as code, patching, observability, monitoring, incident response, and automation
      • Outbound reliability responsibilities: customer-facing reliability engineering, escalations, and proactive support for production environments
  • Drive operational excellence, availability, and performance across cloud-native database platforms
  • Partner closely with Engineering, Product, and Customer-facing teams to ensure reliability is built into the product lifecycle
  • Establish and refine best practices for incident management, SLAs/SLOs, capacity planning, and operational readiness
  • Provide technical leadership and architectural guidance for systems operating at scale in cloud environments
  • Contribute to hiring, team structure, and long-term organizational planning for the SRE function