Your Day to Day:
System Reliability & Architecture: Design, build, and maintain highly available and scalable distributed systems. Define Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to measure and ensure system health.
Release Engineering & CI/CD Pipelines: Architect and optimize end-to-end CI/CD pipelines to ensure rapid, safe, and repeatable software delivery.
Automation & Engineering: Reduce manual, repetitive work by writing code and automation scripts in languages such as Python, PowerShell, or Java to improve efficiency and system reliability.
Advanced Problem Solving: Apply exceptional analytical skills to correlate and troubleshoot complex issues across diverse platforms. Serve as the final point of escalation for high-impact technical challenges, using command-line mastery and interactive shells to resolve deep-system anomalies.
Infrastructure & Platform Engineering: Serve as the technical lead for hybrid infrastructure, managing the full lifecycle of on-premises and cloud-based resources.
Monitoring & Observability: Develop the strategic roadmap for monitoring and observability across the hybrid environment.
Capacity Planning & Performance: Analyze system resource usage to forecast and manage capacity, ensuring systems can handle traffic growth.
Mentorship & Leadership: Mentor junior team members, conduct code reviews, and promote SRE best practices across the organization.
Security & Compliance: Maintain a high-security baseline for all platform services, ensuring compliance with SOX, SOC 2, PCI-DSS, or CIS Benchmarks where applicable. Conduct regular security audits, manage encryption protocols, and ensure all infrastructure follows the principle of least privilege.
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed