About The Position

We’re not just building better tech. We’re rewriting how data moves and what the world can do with it. With Confluent, data doesn’t sit still. Our platform puts information in motion, streaming in near real-time so companies can react faster, build smarter, and deliver experiences as dynamic as the world around them.

It takes a certain kind of person to join this team. Those who ask hard questions, give honest feedback, and show up for each other. No egos, no solo acts. Just smart, curious humans pushing toward something bigger, together.

One Confluent. One Team. One Data Streaming Platform.

About the Role

As a Software Engineer on the Compute Platform team, you will be a key technical leader in building and evolving our next-generation, multi-tenant, cloud-native compute substrate that powers all of Confluent Cloud's diverse workloads. Our platform orchestrates workloads on thousands of Kubernetes clusters worldwide, across all cloud service providers, providing a unified abstraction layer for scheduling, lifecycle management, and operational excellence.

You'll work on critical systems including:

  • Multi-Cluster Workload Orchestration: Build the control plane that manages workload placement, lifecycle, and state across multiple Kubernetes clusters per region
  • Platform APIs & Abstractions: Design and evolve APIs that provide clean abstractions for polyglot workload management across diverse compute needs
  • Cloud Platform Integration: Build and optimize deep integrations with the broader Confluent Cloud platform for seamless end-to-end operations
  • Multi-Tenancy & Security: Implement and enhance workload isolation, network policies, and secure execution environments
  • Observability & Operations: Drive operational excellence through monitoring integration, automated health checks, and self-healing capabilities

As a senior technical leader, you think strategically and help drive end-to-end technical delivery from customer experience to scaling internal operations. You leverage your expertise in cloud-native distributed systems to take our platform to the next level while ensuring high availability, reliability, and security for our largest enterprise customers.

Requirements

  • 8+ years of experience delivering scalable software solutions
  • Proven track record of leading the delivery of large-scale, highly available, low-latency systems
  • Deep expertise in Kubernetes including controller development, operator patterns, and multi-cluster architectures
  • Strong proficiency in Go with experience building production-grade distributed systems
  • Experience with multi-tenant platform architectures and security isolation patterns
  • Familiarity with gRPC, Protobuf, and API design for internal platform services
  • Experience with observability tools and operational excellence practices
  • Experience with multi-cloud environments (AWS, GCP, Azure) and cloud-provider integrations
  • Track record of providing technical leadership and mentorship
  • Track record of working collaboratively across teams including product management, SRE, and other engineering teams
  • A smart, humble, and empathetic attitude with a strong sense of teamwork
  • Drive and excitement about the challenges of a fast-paced, innovative software environment

Responsibilities

  • Drive the overall technical charter for the Compute Platform, including multi-cluster orchestration, workload placement, and security architecture
  • Design and implement platform APIs and Kubernetes operators using Go to support evolving workload requirements
  • Work closely with product management and engineering leadership to build and drive the roadmap for Confluent's Compute Platform, enabling new business opportunities across the company
  • Deliver high-impact initiatives in areas such as workload scheduling, disruption management, network isolation, rolling update strategies, and cross-cluster resource management
  • Lead technical design reviews and drive architectural decisions across organizational boundaries
  • Mentor and grow other engineers on the team through code reviews, pairing, and technical guidance
  • Own operational aspects including availability, reliability, performance monitoring, emergency response, and disaster recovery for our global compute infrastructure