Senior Software Engineer, Big Data

Zillow
4d · $152,900 - $257,100 · Remote

About The Position

About the team

At Zillow, data is one of our most valuable assets in helping customers unlock life’s next chapter. The Data Platform organization enables teams across Zillow to use that data efficiently, securely, and at scale. The Streaming team operates within Data Platform and serves as a technical backbone for Zillow’s real‑time systems. As a Senior Software Engineer, Big Data on this team, you will help drive complex initiatives for Zillow’s core real‑time streaming infrastructure and control planes, and you will influence architecture and operational standards across the broader Data Platform organization.

This team owns the foundational runtime and developer abstractions for Kafka‑ and Flink‑based workloads that power tier‑0 and tier‑1 use cases across the company. We are modernizing Zillow’s streaming platform to deliver a reliable, scalable, and developer‑friendly ecosystem that supports event‑driven systems, real‑time analytics, and emerging AI‑driven applications.

About the role

This is an opportunity to shape the future of Zillow’s real‑time data platform and the experiences we deliver to customers. As a Senior Software Engineer, Big Data on the Streaming team, you will design and evolve the streaming infrastructure that hundreds of internal services rely on every day. You will independently lead the design and evolution of distributed streaming systems, control planes, and developer tooling that serve hundreds of internal services, with a focus on scalability, reliability, and long‑term platform sustainability.

Requirements

  • 5+ years of experience building and operating large‑scale distributed systems, including independently owning critical production systems end to end.
  • Significant production experience with Kafka and/or Flink, including performance tuning, state management, scaling strategies, and operational incident resolution.
  • Proficiency in at least one programming language such as Python, Java, or Scala.
  • Experience operating services in cloud environments (for example, AWS) and working with container orchestration platforms like Kubernetes.
  • Experience designing scalable, multi‑tenant systems with reliability, cost efficiency, and observability in mind.
  • Experience defining and operating against SLOs, participating in on‑call rotations, and leading incident response efforts.
  • Familiarity with infrastructure‑as‑code tooling such as Terraform and CI/CD systems.
  • Strong systems design skills, including the ability to reason about consistency, state management, fault tolerance, and throughput.
  • Experience collaborating across platform and product teams to define boundaries, contracts, and integration patterns.

Nice To Haves

  • Experience working with streaming vendors (for example, Confluent, MSK, Redpanda) or modernizing legacy Kafka/Flink infrastructure.
  • Demonstrated experience leading system design efforts for complex, multi-team platform initiatives.
  • Experience integrating streaming systems with analytics platforms such as Databricks or building real-time context engineering capabilities for AI systems.
  • Background in reliability engineering or platform engineering.

Responsibilities

  • Design, build, and operate large‑scale Kafka and Flink infrastructure supporting tier‑0 and tier‑1 workloads.
  • Lead critical initiatives in our streaming platform modernization, including platform architecture evolution.
  • Develop and enhance streaming control planes, APIs, CLIs, and provisioning systems that standardize how teams create and operate streaming resources across Zillow.
  • Improve platform reliability through SLO definition, monitoring, alerting, incident response, and automation.
  • Enable simplified stream processing patterns for product and engineering teams, reducing the need for bespoke infrastructure or specialized expertise.
  • Evaluate and integrate modern streaming ecosystem capabilities, including managed Kafka offerings, serverless stream processing, and real‑time AI integration patterns.
  • Make high‑quality architectural decisions under ambiguity, balancing reliability, cost, performance, and developer experience across competing priorities.
  • Mentor engineers and contribute to raising the bar on distributed systems design, operational excellence, and long‑term platform strategy.