Staff DevOps Engineer

Witness AI · Mountain View, CA
$234,000 - $257,000 · Hybrid

About The Position

High-impact individual contributor role at the intersection of cloud infrastructure, developer experience, AI-augmented tooling, and platform security. You will own the architecture and operations of a large-scale multi-tenant SaaS platform across AWS and GCP, partner directly with product engineering teams on new service design, and work autonomously, taking ambiguous problems through to production solutions.

Requirements

  • 10+ years in DevOps, Platform Engineering, or SRE in cloud-native SaaS environments.
  • Expert-level AWS knowledge (EKS, RDS/Aurora, IAM, VPC, Cost Management) with solid GCP experience.
  • Production Terraform at scale: modules, state, drift detection, multi-account patterns.
  • Advanced Kubernetes: RBAC, network policy, GitOps (ArgoCD/Flux), operators, and resource management.
  • Strong Go and/or Python — able to build and ship production-grade internal tooling.
  • Hands-on experience building tools with AI/LLM APIs integrated into engineering workflows.
  • Production SQL proficiency and NoSQL platform operations experience.
  • Demonstrated design and operation of large-scale, multi-cluster observability solutions.
  • DevSecOps: vulnerability management, supply chain security, compliance frameworks (SOC 2 / ISO 27001).
  • Self-directed: scopes ambiguous problems, drives to solution, and delivers independently.

Nice To Haves

  • Regulated-industry background (SaaS-based fintech, healthtech, AI infrastructure).
  • GPU infrastructure or AI model-serving experience (vLLM, SageMaker).
  • Open source projects you have developed that are used by others.

Responsibilities

  • Design and operate 50+ EKS/GKE clusters across AWS and GCP, including multi-tenant compute, autoscaling, and cluster lifecycle management.
  • Own Infrastructure-as-Code (Terraform) for multi-account, multi-region environments spanning 200+ repos and services.
  • Architect and run end-to-end CI/CD pipelines (Harness, GitHub Actions, ArgoCD) with supply chain security, SBOM, and progressive delivery.
  • Build and operate large-scale observability stacks — metrics, logs, distributed traces — with OpenTelemetry across all clusters.
  • Embed DevSecOps controls: secret management, image signing (Cosign/Chainguard), OPA/Kyverno policy-as-code, and compliance automation.
  • Create AI/LLM-powered internal tools for platform operations, incident triage, drift detection, and CI/CD automation.
  • Operate SQL (Aurora/PostgreSQL) and NoSQL (DynamoDB, ClickHouse, Elasticsearch) platforms at scale, including DR and schema lifecycle.
  • Partner with dev teams from architecture review through production launch; produce ADRs, runbooks, and engineering standards.

Benefits

  • Hybrid work environment
  • Competitive salary and equity
  • Health, dental, and vision insurance
  • 401(k) plan
  • Opportunities for professional development and growth
  • Generous vacation policy
© 2024 Teal Labs, Inc