Principal Site Reliability Engineer

Jobgether
2d | $200,000 - $260,000 | Remote

About The Position

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Principal Site Reliability Engineer in the United States. This role provides leadership in designing, scaling, and maintaining highly reliable and performant infrastructure in a fast-paced, high-growth environment. You will work across teams to improve system reliability, optimize deployments, and drive operational excellence using modern cloud-native tools and practices. The role demands a hands-on, “doer” mentality, applying deep technical expertise in SRE, backend engineering, and distributed systems to solve complex challenges. You will influence the technical roadmap, mentor engineers, and implement best practices for observability, CI/CD, and automation. This is a high-impact position offering autonomy, visibility, and the opportunity to shape the engineering culture while leveraging AI tools to scale productivity and infrastructure reliability.

Requirements

  • 12+ years of professional software engineering or infrastructure engineering experience, with significant SRE and backend experience
  • Hands-on experience deploying production changes to large-scale applications or infrastructure
  • Strong proficiency in Golang and experience building and maintaining RESTful APIs
  • Expertise with SQL-based databases (MySQL, PostgreSQL) and optimizing schemas/queries for performance
  • Experience with observability tools (Prometheus, Grafana, Datadog, New Relic) and monitoring best practices
  • Solid understanding of distributed systems design patterns (event-driven architecture, stream processing, transactional outbox, queues)
  • Proven ability to lead complex technical initiatives, influence decisions, and introduce new ideas
  • Bachelor’s degree in Computer Science, Computer Engineering, or equivalent practical experience

Nice To Haves

  • Experience with AWS cloud infrastructure, Kubernetes, CI/CD pipelines, AI developer tooling, and data pipelines
  • Track record of shipping commercial APIs and high-traffic applications in scaling environments

Responsibilities

  • Architect, design, and implement scalable, reliable infrastructure using Kubernetes, AWS, RDS (MySQL/PostgreSQL), and distributed system patterns
  • Drive infrastructure roadmap initiatives to enhance system reliability, recoverability, and performance
  • Lead capacity planning, stress testing, and benchmarking to identify bottlenecks and prepare systems for growth
  • Define, maintain, and enforce SLAs, alerts, and observability standards across infrastructure
  • Implement AI and automation solutions to improve operational efficiency and developer productivity
  • Build consistency and scalability in a distributed microservices architecture while ensuring high performance
  • Establish and evolve engineering best practices for CI/CD, security, and monitoring
  • Mentor engineers and foster a culture of learning, innovation, and operational excellence
  • Collaborate cross-functionally to translate business objectives into actionable technical roadmaps

Benefits

  • Competitive salary range: $200,000 – $260,000 per year
  • Unlimited PTO, volunteer hours, and sabbatical opportunities
  • Comprehensive medical, dental, vision, life, and short- and long-term disability (STD/LTD) insurance
  • 401(k) retirement plan with company match
  • Flexible, collaborative, and inclusive work environment
  • Opportunities for mentorship, career growth, and technical leadership
  • Support for work-life balance and personal development initiatives
  • Remote-first position with occasional cross-functional collaboration opportunities