Engineer, Systems Reliability

CNX•Atlanta, GA

1d•$92,250 - $128,000

About The Position

We’re Concentrix, the intelligent transformation partner: the global technology and services leader that powers the world’s best brands, today and into the future. We’re solution-focused, tech-powered, and intelligence-fueled. With unique data and insights, deep industry expertise, and advanced technology solutions, we power a world that works, helping companies become refreshingly simple to work, interact, and transact with. We shape new game-changing careers in over 70 countries, attracting the best talent.

The Concentrix Technical Products and Services team is the driving force behind Concentrix’s transformation, data, and technology services. We integrate world-class digital engineering, creativity, and a deep understanding of human behavior to find and unlock value through tech-powered and intelligence-fueled experiences. We combine human-centered design, powerful data, and strong tech to accelerate transformation at scale. You will be surrounded by the best in the world, providing market-leading technology and insights to modernize and simplify the customer experience. Within our professional services team, you will deliver strategic consulting, design, advisory services, market research, and contact center analytics that yield insights to improve outcomes and value for our clients, helping us achieve our vision.

Our game-changers around the world have devoted their careers to ensuring every relationship is exceptional. And we’re proud to be recognized with awards such as “World’s Best Workplaces,” “Best Companies for Career Growth,” and “Best Company Culture,” year after year. Join us and be part of this journey towards greater opportunities and brighter futures.

System Reliability Engineer (SRE)

The System Reliability Engineer (SRE) is responsible for ensuring the availability, performance, scalability, and reliability of the client’s customer‑facing digital web platforms. This role partners closely with Digital Web, Platform, and Product teams to support high‑traffic web experiences, proactively prevent incidents, and continuously improve operational excellence. The SRE applies engineering practices to operations, focusing on automation, monitoring, incident management, and resiliency across modern cloud‑native environments.

Requirements

Experience supporting web‑scale, customer‑facing digital platforms
Strong knowledge of Linux, networking fundamentals, and distributed systems
Hands‑on experience with monitoring and alerting tools (Grafana, Splunk, AppDynamics)
Experience with Kubernetes and containerized applications
Familiarity with CI/CD pipelines and deployment automation
3+ years of experience in Site Reliability Engineering, DevOps, or Production Support
Experience supporting mission‑critical systems with strict SLAs
Clear and calm communication during incidents
Strong problem‑solving and troubleshooting skills
Comfortable working in a fast‑paced, always‑on digital environment
Ability to collaborate across engineering, product, and operations teams

Nice To Haves

Experience supporting telecom or large‑scale consumer digital platforms
Exposure to AEM or modern web frameworks
Experience with infrastructure as code (Terraform or similar)
Prior experience supporting digital platforms

Responsibilities

Ensure 24x7 availability and performance of the client’s Digital Web applications
Monitor system health using tools such as Grafana, Splunk, and AppDynamics
Proactively identify and remediate reliability risks before customer impact
Support high‑volume traffic events, releases, and promotions
Own the end‑to‑end incident lifecycle: detection, triage, mitigation, and resolution
Lead or participate in major incident calls with clear communication and accountability
Perform root cause analysis (RCA) and drive corrective and preventive actions
Document post‑incident reviews and track follow‑up actions to closure
Build and maintain automation to reduce manual operational work
Support CI/CD pipelines and safe deployment practices
Partner with development teams to improve resiliency, scalability, and fault tolerance
Apply SRE principles such as error budgets, SLIs, and SLOs where applicable
Support cloud‑native platforms including Kubernetes (TKE), AWS, and PCF‑based services
Assist with platform migrations, upgrades, and performance tuning
Validate deployments across non‑prod and production environments
Work closely with Digital Web, API, Platform, and Infrastructure teams
Participate in design reviews to ensure operational readiness
Provide reliability guidance during feature development and releases