Principal Data Engineer

Sanas · Palo Alto, CA
4d

About The Position

We're looking for an experienced and forward-thinking Principal Data Engineer to lead the design and implementation of our end-to-end data infrastructure for industry-leading Voice AI products. This is a high-impact role where you will shape the technical vision, own strategic architecture decisions, and mentor a growing team of data engineers focused on delivering reliable and scalable data systems for machine learning. You'll work cross-functionally with AI research scientists, infrastructure and product teams to ensure that data, from raw audio to training-ready features, is consistently accessible, compliant and optimized for speed and scale. You'll help push the boundaries of real-time Voice AI!

Requirements

  • 10+ years of experience in Data Engineering, Infrastructure, or ML Systems, with at least 2 years in a technical leadership capacity.
  • Expertise in building distributed batch and real-time data systems
  • Expertise in Databases (like Postgres) and Data Lakes (like Snowflake, Databricks and ClickHouse)
  • Experience using Data Processing frameworks like Spark, Flink and Ray
  • Deep experience with cloud platforms (AWS/GCP), object storage (e.g., S3), and orchestrators like Airflow and Dagster
  • Strong knowledge of data lifecycle management, including privacy, security, compliance and reproducibility
  • Comfortable working in a fast-paced startup environment
  • Strategic mindset and proven ability to collaborate across engineering, ML and product teams to deliver infrastructure that scales with the business.

Nice To Haves

  • Familiarity with audio data and its unique challenges, like large file sizes, time-series features, and metadata handling, is a strong plus
  • Experience with Voice AI models like ASR, TTS and speaker verification.
  • Familiarity with real-time data processing frameworks like Kafka, Flink, Druid and Pinot
  • Familiarity with ML workflows including MLOps, feature engineering, model training and inference.
  • Experience with labeling tools, audio annotation platforms, or human-in-the-loop annotation pipelines.

Responsibilities

  • Architect and lead the development of large-scale data pipelines and data lakes to ingest, transform and serve high-quality data for AI model training, product telemetry and analytics.
  • Drive long-term data infrastructure strategy across streaming and batch, feature store extensions, Iceberg/Delta Lake choices, metadata management, and lakehouse evolution.
  • Drive platform and infrastructure decisions, optimizing compute fleets (e.g., Ray, Spark clusters), orchestration tooling (Airflow, Dagster), and streaming stacks (Kafka, Flink)
  • Collaborate with AI research scientists, engineering leads, product, finance, marketing, and legal to align data architecture with business and regulatory requirements.
  • Advocate best practices in data governance, lineage, observability, testing, tooling, and disaster recovery across pipelines and data stores.
  • Act as a mentor and technical leader: review designs and code, share patterns, elevate team capability, and support recruitment and hiring.
  • Drive build-vs-buy decisions for data quality and observability tooling.