Senior DevOps Engineer, Cloud Infrastructure

Sirius XM • Atlanta, GA

About The Position

The Senior DevOps Engineer, Cloud Infrastructure, leads a team dedicated to developing, deploying, and scaling cloud infrastructure that’s secure, reliable, and optimized for high performance. This hands-on role combines strategic oversight with technical leadership, supporting both project initiatives and operational excellence across cloud environments.

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related field, or an equivalent combination of education and experience.
5+ years of experience in DevOps, Site Reliability Engineering, or Cloud Infrastructure roles.
3+ years of experience with AWS services (EC2, S3, ELB, VPC, IAM) or equivalent cloud environments, with a strong understanding of AWS best practices.
3+ years of experience running Linux-based production systems, with in-depth knowledge of the Linux operating system.
Kubernetes: Hands-on experience managing, deploying, and troubleshooting Kubernetes clusters.
Scripting Languages: Proficiency in Bash, Python, or other scripting languages for automation and infrastructure management.
Infrastructure Automation: Expertise with tools like Ansible, Terraform, or CloudFormation to deploy and manage infrastructure at scale.
Monitoring and Observability: Experience with monitoring technologies such as Grafana, Prometheus, and Alertmanager to maintain visibility into system health.
Version Control: Proficiency in Git and experience with platforms like GitLab or GitHub for collaborative code management.
Curiosity and Initiative: You’re curious, unafraid to ask “why,” and proactive in exploring solutions and innovative ideas.
High Availability Mindset: You prioritize resilience and reliability in everything you design and deploy.
Must have legal right to work in the U.S.

Responsibilities

Infrastructure as Code: Design and implement infrastructure as code to build and deploy cloud solutions effectively.
Full-Service Lifecycle Management: Improve the full service lifecycle, from design through deployment, operation, and refinement, with a focus on reliability and scalability.
Monitor and Maintain Services: Ensure live services run smoothly by measuring and monitoring availability, latency, and overall system health, proactively identifying areas for improvement.
Scale with Automation: Scale systems sustainably through automation and push for enhancements that improve reliability, performance, and operational efficiency.
Optimize Infrastructure Costs: Drive initiatives to optimize infrastructure for cost-effectiveness without compromising performance or security.
Incident Response and Postmortems: Lead sustainable incident response efforts and conduct blameless postmortems to ensure continuous improvement and resilience.
Tool Selection and Evaluation: Bring informed opinions on, and hands-on experience with, CI/CD and deployment orchestration tools such as GitLab and Argo CD, and guide the team's best practices.
AWS Expertise: Leverage and enhance Amazon Web Services (AWS) environments to support current and future infrastructure needs, staying informed on new services and practices.