About The Position

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior SRE DevOps Engineer in the United States. This is a high-impact role at the intersection of software engineering and cloud operations, focused on building and maintaining resilient, large-scale infrastructure for real-time communication systems. You will design, automate, and optimize cloud-native environments that support mission-critical connectivity under strict latency and reliability constraints. The position combines hands-on coding with deep operational ownership, empowering you to shape infrastructure strategy while improving developer productivity. Working in a remote-first, highly technical environment, you’ll collaborate across engineering teams to ensure scalability, security, and performance. If you thrive on solving distributed systems challenges and building production-grade reliability tooling, this role offers both ownership and influence.

Requirements

  • 7+ years of experience in SRE, DevOps, or Platform Engineering roles with daily hands-on coding responsibilities.
  • Proficiency in at least one backend language (TypeScript/Node.js, Python, or Go) for developing automation tools, internal services, and reliability frameworks.
  • Deep expertise in AWS services (ECS, EKS, RDS, ElastiCache, SQS, VPC, IAM, CloudWatch).
  • Strong experience with Infrastructure as Code tools (Terraform, CloudFormation, or Pulumi), including modular design and state management.
  • Proven experience designing and maintaining CI/CD pipelines in both cloud and on-prem environments.
  • Solid understanding of container orchestration (Docker, Kubernetes, Helm) and distributed systems patterns such as circuit breakers, retries, and graceful degradation.
  • Experience operating production databases (PostgreSQL, Redis) and message queues.
  • Strong security knowledge covering network segmentation, encryption, secrets management, and incident response.

Nice To Haves

  • Experience with real-time communication infrastructure (SIP, RTP, WebRTC), telecom systems, IoT pipelines, or satellite/low-bandwidth optimization environments.

Responsibilities

  • Designing and implementing SLI/SLO frameworks with error budgets to guide reliability and performance decisions.
  • Building and maintaining AWS-based production infrastructure using Infrastructure as Code (Terraform, CloudFormation), including ECS, EKS/Kubernetes, and microservices orchestration.
  • Developing internal tools, automation frameworks, and reliability services in TypeScript, Python, or similar languages to enhance operational efficiency.
  • Leading incident response processes, conducting root cause analyses, and creating automated runbooks to reduce MTTR.
  • Architecting and maintaining CI/CD pipelines for backend services, mobile applications, and IoT firmware across cloud and on-prem environments.
  • Implementing comprehensive observability using OpenTelemetry, distributed tracing, metrics exporters, and alerting systems.
  • Managing data services such as PostgreSQL (RDS), Redis/ElastiCache, SQS, and networking components (ALB/NLB, VPC, IAM).
  • Enforcing strong security standards, including IAM policies, encryption, secrets management, vulnerability management, and compliance auditing.

Benefits

  • Competitive compensation package
  • Flexible remote work environment with autonomy and ownership
  • Opportunity to build and scale critical communication infrastructure
  • Exposure to cutting-edge technologies across cloud, IoT, telecom, and distributed systems
  • High-impact role with direct influence on reliability and platform architecture
  • Collaborative, technically advanced engineering culture