Senior SRE DevOps Engineer (Remote from United States)

Jobgether

About The Position

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior SRE DevOps Engineer in the United States. This is a high-impact role at the intersection of software engineering and cloud operations, focused on building and maintaining resilient, large-scale infrastructure for real-time communication systems. You will design, automate, and optimize cloud-native environments that support mission-critical connectivity under strict latency and reliability constraints. The position combines hands-on coding with deep operational ownership, empowering you to shape infrastructure strategy while improving developer productivity. Working in a remote-first, highly technical environment, you'll collaborate across engineering teams to ensure scalability, security, and performance. If you thrive on solving distributed systems challenges and building production-grade reliability tooling, this role offers both ownership and influence.

Requirements

7+ years of experience in SRE, DevOps, or Platform Engineering roles with daily hands-on coding responsibilities.
Proficiency in at least one backend language (TypeScript/Node.js, Python, or Go) for developing automation tools, internal services, and reliability frameworks.
Deep expertise in AWS services (ECS, EKS, RDS, ElastiCache, SQS, VPC, IAM, CloudWatch).
Strong experience with Infrastructure as Code tools (Terraform, CloudFormation, or Pulumi), including modular design and state management.
Proven experience designing and maintaining CI/CD pipelines in both cloud and on-prem environments.
Solid understanding of container orchestration (Docker, Kubernetes, Helm) and distributed systems patterns such as circuit breakers, retries, and graceful degradation.
Experience operating production databases (PostgreSQL, Redis) and message queues.
Strong security knowledge covering network segmentation, encryption, secrets management, and incident response.

Nice To Haves

Experience with real-time communication infrastructure (SIP, RTP, WebRTC), telecom systems, IoT pipelines, or optimization for satellite and other low-bandwidth environments.

Responsibilities

Designing and implementing SLI/SLO frameworks with error budgets to guide reliability and performance decisions.
Building and maintaining AWS-based production infrastructure using Infrastructure as Code (Terraform, CloudFormation), including ECS, EKS/Kubernetes, and microservices orchestration.
Developing internal tools, automation frameworks, and reliability services in TypeScript, Python, or similar languages to enhance operational efficiency.
Leading incident response processes, conducting root cause analyses, and creating automated runbooks to reduce MTTR.
Architecting and maintaining CI/CD pipelines for backend services, mobile applications, and IoT firmware across cloud and on-prem environments.
Implementing comprehensive observability using OpenTelemetry, distributed tracing, metrics exporters, and alerting systems.
Managing data services such as PostgreSQL (RDS), Redis/ElastiCache, SQS, and networking components (ALB/NLB, VPC, IAM).
Enforcing strong security standards, including IAM policies, encryption, secrets management, vulnerability management, and compliance auditing.

Benefits

Competitive compensation package
Flexible remote work environment with autonomy and ownership
Opportunity to build and scale critical communication infrastructure
Exposure to cutting-edge technologies across cloud, IoT, telecom, and distributed systems
High-impact role with direct influence on reliability and platform architecture
Collaborative, technically advanced engineering culture