OpenAI · Posted 17 days ago
San Francisco, CA

About the position

At OpenAI, we’re building safe and beneficial artificial general intelligence. We deploy our models through ChatGPT, our APIs, and other cutting-edge products. Behind the scenes, making these systems fast, reliable, and cost-efficient requires world-class infrastructure. The Caching Infrastructure team is responsible for building a caching layer that powers many critical use cases at OpenAI. We aim to provide a high-availability, multi-tenant cache platform that scales automatically with workload, minimizes tail latency, and supports a diverse range of use cases. We’re looking for an experienced engineer to help design and scale this critical infrastructure. The ideal candidate has deep experience in distributed caching systems (e.g., Redis, Memcached), networking fundamentals, and Kubernetes-based service orchestration.

Responsibilities

  • Design, build, and operate OpenAI’s multi-tenant caching platform used across inference, identity, quota, and product experiences.
  • Define the long-term vision and roadmap for caching as a core infra capability, balancing performance, durability, and cost.
  • Collaborate with other infrastructure teams (e.g., networking, observability, databases) and product teams to ensure the caching platform meets their needs.

Requirements

  • 5+ years of experience building and scaling distributed systems, with a strong focus on caching, load balancing, or storage systems.
  • Deep expertise with Redis, Memcached, or similar solutions, including clustering, durability configurations, client-side connection patterns, and performance tuning.
  • Production experience with Kubernetes, service meshes (e.g., Envoy), and autoscaling systems.
  • A rigorous approach to latency, reliability, throughput, and cost when designing platform capabilities.
  • Comfort in a fast-paced environment and an interest in balancing pragmatic engineering with long-term technical excellence.