Senior Site Reliability Engineer (Ruby + DevOps)

Exadel Inc

Hybrid

About The Position

We’re an AI-first global tech company with 25+ years of engineering leadership, 2,000+ team members, and 500+ active projects powering Fortune 500 clients, including HBO, Microsoft, Google, and Starbucks. From AI platforms to digital transformation, we partner with enterprise leaders to build what’s next. What powers it all? Our people are ambitious, collaborative, and constantly evolving.

About the Client

The company has been building solutions for mobile apps, effortless payment, business travel, and advertising since 1992. The customer is developing a mobility platform that allows operators to manage their vehicles and drivers efficiently, regulators to be informed and establish guidelines, service providers to deliver sustainable solutions, and riders to have an effortless transit experience.

Requirements

7+ years of experience, specializing in Kubernetes and AWS
Strong programming skills in at least one major language (Ruby, Java, Go, Python, .NET, or similar)
Solid understanding of concurrency, runtime behavior, and performance optimization
Hands-on experience with Docker and containerized workloads
Strong Kubernetes expertise (Deployments, StatefulSets, Services, Ingress, Helm, troubleshooting, autoscaling)
Strong AWS experience (EC2, EKS, RDS, S3, IAM, VPC, Load Balancers, CloudWatch)
Experience designing infrastructure for high availability and disaster recovery
Experience with CI/CD pipelines and Infrastructure as Code (Terraform, CloudFormation, Pulumi, or similar)
Experience with RabbitMQ or similar messaging systems (Kafka, SQS, Pulsar, etc.)
Strong understanding of relational databases (MySQL/PostgreSQL), including query optimization, replication, and failover strategies
Familiarity with NoSQL and in-memory databases (Redis, DynamoDB, MongoDB)
Experience with distributed systems, microservices, capacity planning, and fault tolerance
Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, ELK/OpenSearch, OpenTelemetry)
Strong understanding of Linux systems and networking fundamentals (TCP/IP, DNS, HTTP/HTTPS, TLS, load balancing)
Experience with SRE practices, including SLOs/SLIs/SLAs, load testing, resilience testing, and incident management
Strong communication skills and ability to collaborate across engineering teams
Calm and effective during incidents with an ownership mindset

Nice To Haves

Experience operating production systems written in Ruby, Java, or other major platforms
Framework experience such as Ruby on Rails, Spring Boot, or similar
Experience operating high-traffic SaaS platforms
Cost optimization in cloud environments
Chaos engineering practices
Experience mentoring junior engineers

Responsibilities

Design, build, and operate reliable, scalable distributed systems
Improve system availability, performance, and resilience
Automate infrastructure, deployments, and operational processes
Diagnose and resolve production issues
Lead upgrades and migrations with minimal or zero downtime
Participate in on-call rotations and incident response
Collaborate closely with development teams to improve operability
Drive best practices around monitoring, alerting, and capacity planning
Reduce operational toil through automation
Contribute to incident management, post-mortems, disaster recovery strategies, and continuous reliability improvements