As a Site Reliability Engineer (SRE) at GitLab, you’ll help keep all user-facing services and production systems reliable, scalable, and efficient. Our SREs combine a pragmatic operations mindset with strong software engineering practices to drive automation, reduce toil, and improve resilience across our platform.

In the Environment Automation specialization, your focus is on operating and automating hundreds of GitLab environments, from initial provisioning to day-to-day maintenance tasks. Unlike other SRE roles, this position centers on automating the lifecycle of many tenant environments, ensuring they remain secure, consistent, and reliable at scale.

Some examples of the projects you could work on:

- Designing infrastructure automation that provisions and operates GitLab environments using Terraform, Ansible, and Kubernetes
- Creating and maintaining deployment packages for GitLab, such as Helm Charts and omnibus-gitlab
- Building and operating Dedicated GitLab instances integrated with cloud-native services (e.g., GCP, AWS)
- Developing tools to orchestrate infrastructure-as-code workflows across multiple tenants
- Deploying and managing microservices on Kubernetes clusters at scale
- Enhancing GitLab’s observability stack (e.g., Prometheus, ELK) to support proactive monitoring and incident response
- Integrating with and operating infrastructure in cloud provider ecosystems (e.g., IAM, networking, storage)
- Championing and implementing cloud security best practices across automated infrastructure
Job Type: Full-time
Career Level: Senior
Education Level: Not listed