Engineer, Systems Reliability

CNX | Atlanta, GA
1d | $92,250 - $128,000

About The Position

We’re Concentrix, the intelligent transformation partner: solution-focused, tech-powered, and intelligence-fueled. We are the global technology and services leader that powers the world’s best brands, today and into the future. With unique data and insights, deep industry expertise, and advanced technology solutions, we power a world that works, helping companies become refreshingly simple to work, interact, and transact with. We shape new game-changing careers in over 70 countries, attracting the best talent.

The Concentrix Technical Products and Services team is the driving force behind Concentrix’s transformation, data, and technology services. We integrate world-class digital engineering, creativity, and a deep understanding of human behavior to find and unlock value through tech-powered and intelligence-fueled experiences. We combine human-centered design, powerful data, and strong tech to accelerate transformation at scale. You will be surrounded by the best in the world, providing market-leading technology and insights to modernize and simplify the customer experience. Within our professional services team, you will deliver strategic consulting, design, advisory services, market research, and contact center analytics that provide insights to improve outcomes and value for our clients, helping us achieve our vision.

Our game-changers around the world have devoted their careers to ensuring every relationship is exceptional. We’re proud to be recognized year after year with awards such as “World’s Best Workplaces,” “Best Companies for Career Growth,” and “Best Company Culture.” Join us and be part of this journey towards greater opportunities and brighter futures.

System Reliability Engineer (SRE)

The System Reliability Engineer (SRE) is responsible for ensuring the availability, performance, scalability, and reliability of the client's customer‑facing digital web platforms. This role partners closely with Digital Web, Platform, and Product teams to support high‑traffic web experiences, proactively prevent incidents, and continuously improve operational excellence. The SRE applies engineering practices to operations, focusing on automation, monitoring, incident management, and resiliency across modern cloud‑native environments.

Requirements

  • Experience supporting web‑scale, customer‑facing digital platforms
  • Strong knowledge of Linux, networking fundamentals, and distributed systems
  • Hands‑on experience with monitoring and alerting tools (Grafana, Splunk, AppDynamics)
  • Experience with Kubernetes and containerized applications
  • Familiarity with CI/CD pipelines and deployment automation
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Production Support
  • Experience supporting mission‑critical systems with strict SLAs
  • Clear and calm communication during incidents
  • Strong problem‑solving and troubleshooting skills
  • Comfortable working in a fast‑paced, always‑on digital environment
  • Ability to collaborate across engineering, product, and operations teams

Nice To Haves

  • Experience supporting telecom or large‑scale consumer digital platforms
  • Exposure to AEM or modern web frameworks
  • Experience with infrastructure as code (Terraform or similar)
  • Prior experience supporting digital platforms

Responsibilities

  • Ensure 24x7 availability and performance of the client's Digital Web applications
  • Monitor system health using tools such as Grafana, Splunk, and AppDynamics
  • Proactively identify and remediate reliability risks before customer impact
  • Support high‑volume traffic events, releases, and promotions
  • Own the end‑to‑end incident lifecycle: detection, triage, mitigation, and resolution
  • Lead or participate in major incident calls with clear communication and accountability
  • Perform root cause analysis (RCA) and drive corrective and preventive actions
  • Document post‑incident reviews and track follow‑up actions to closure
  • Build and maintain automation to reduce manual operational work
  • Support CI/CD pipelines and safe deployment practices
  • Partner with development teams to improve resiliency, scalability, and fault tolerance
  • Apply SRE principles such as error budgets, SLIs, and SLOs where applicable
  • Support cloud‑native platforms including Kubernetes (TKE), AWS, and PCF‑based services
  • Assist with platform migrations, upgrades, and performance tuning
  • Validate deployments across non‑prod and production environments
  • Work closely with Digital Web, API, Platform, and Infrastructure teams
  • Participate in design reviews to ensure operational readiness
  • Provide reliability guidance during feature development and releases

Benefits

  • Medical, dental, and vision insurance
  • Comprehensive employee assistance program
  • 401(k) retirement plan
  • Paid time off and holidays
  • Paid learning days