Principal Site Reliability Engineer

Jobgether

2d • $200,000 - $260,000 • Remote

About The Position

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Principal Site Reliability Engineer in the United States. This role provides leadership in designing, scaling, and maintaining highly reliable and performant infrastructure in a fast-paced, high-growth environment. You will work across teams to improve system reliability, optimize deployments, and drive operational excellence using modern cloud-native tools and practices. The role demands a hands-on, "doer" mentality, applying deep technical expertise in SRE, backend engineering, and distributed systems to solve complex challenges. You will influence the technical roadmap, mentor engineers, and implement best practices for observability, CI/CD, and automation. This is a high-impact position offering autonomy, visibility, and the opportunity to shape the engineering culture while leveraging AI tools to scale productivity and infrastructure reliability.

Requirements

12+ years of professional software engineering or infrastructure engineering experience, with significant SRE and backend experience
Hands-on experience deploying production changes to large-scale applications or infrastructure
Strong proficiency in Golang and experience building and maintaining RESTful APIs
Expertise with SQL-based databases (MySQL, PostgreSQL) and optimizing schemas/queries for performance
Experience with observability tools (Prometheus, Grafana, Datadog, New Relic) and monitoring best practices
Solid understanding of distributed systems design patterns (event-driven architecture, stream processing, transactional outbox, queues)
Proven ability to lead complex technical initiatives, influence decisions, and introduce new ideas
Bachelor's degree in Computer Science, Computer Engineering, or equivalent practical experience

Nice To Haves

Experience with AWS cloud infrastructure, Kubernetes, CI/CD pipelines, AI developer tooling, and data pipelines
Track record of shipping commercial APIs and high-traffic applications in scaling environments

Responsibilities

Architect, design, and implement scalable, reliable infrastructure using Kubernetes, AWS, RDS (MySQL/PostgreSQL), and distributed system patterns
Drive infrastructure roadmap initiatives to enhance system reliability, recoverability, and performance
Lead capacity planning, stress testing, and benchmarking to identify bottlenecks and prepare systems for growth
Define, maintain, and enforce SLAs, alerts, and observability standards across infrastructure
Implement AI and automation solutions to improve operational efficiency and developer productivity
Build consistency and scalability in a distributed microservices architecture while ensuring high performance
Establish and evolve engineering best practices for CI/CD, security, and monitoring
Mentor engineers and foster a culture of learning, innovation, and operational excellence
Collaborate cross-functionally to translate business objectives into actionable technical roadmaps

Benefits

Competitive salary range: $200,000 - $260,000 per year
Unlimited PTO, volunteer hours, and sabbatical opportunities
Comprehensive medical, dental, vision, life, STD/LTD insurance
401(k) retirement plan with company match
Flexible, collaborative, and inclusive work environment
Opportunities for mentorship, career growth, and technical leadership
Support for work-life balance and personal development initiatives
Remote-first position with occasional cross-functional collaboration opportunities
