Senior Data Engineer, Platform & Pipelines

Natera
$125,000 - $155,000 · Remote

About The Position

We are seeking a Senior Data Engineer to join Natera’s Therapeutics & Innovations group, which focuses on leveraging Natera’s multimodal data assets to enable therapeutic development and scientific innovation. The group works with large-scale biomedical datasets to support therapeutic development, biomarker discovery, and translational research, and is building shared data foundations to unify and scale these efforts. This role is part of a broader initiative to develop a shared, platform-level data system spanning multimodal data ingestion, backend services, AI-enabled data access, and web interfaces. The initial focus of the role is designing and implementing robust data ingestion and transformation pipelines, with scope expanding over time into backend APIs, data-access layers, and LLM-driven analysis tools as the platform matures.

Requirements

  • BS in Computer Science, Bioinformatics, Computational Biology, or a related field; MS preferred
  • 4+ years of experience in production data engineering or software engineering
  • Ability to independently drive technical solutions from high-level goals, exercising judgment in system design, implementation, and tradeoff evaluation
  • Strong proficiency in Python, with experience writing maintainable, production-quality code across data and backend contexts
  • Extensive experience with software engineering fundamentals, design patterns, version control, CI/CD, Docker, and automated testing
  • Experience designing and operating workflow orchestration systems (Dagster preferred; Airflow, Prefect, or similar acceptable)
  • Experience building or contributing to backend services (e.g., FastAPI or similar frameworks)
  • Hands-on experience with AWS services commonly used in data and backend systems (e.g., S3, ECS, Batch, Lambda)
  • Experience deploying and operating large-scale data or bioinformatics pipelines in AWS, including managing throughput, cost, and operational reliability
  • Experience with relational databases (Postgres, MySQL) and/or graph databases (Neo4j), including schema and query design
  • Experience contributing to system-level architecture, including data modeling, service boundaries, and operational robustness
  • Ability to work effectively with scientists, bioinformaticians, and ML practitioners in an R&D environment

Nice To Haves

  • Experience integrating machine-learning inference outputs into data pipelines
  • Familiarity with LLM-based agents and associated frameworks such as LangChain
  • Familiarity with bioinformatics data formats and pipelines (e.g., FASTQ, BAM/CRAM, VCF, RNAseq, WES/WGS)
  • Experience with infrastructure as code (Terraform)
  • Experience with DNAnexus
  • Understanding of genomics, proteomics, or other omics data types and their downstream analytical use cases
  • Ability to evaluate build-vs-buy tradeoffs in fast-paced environments

Responsibilities

  • Architect, implement, and maintain data ingestion and transformation pipelines using modern workflow orchestration tools (e.g., Dagster)
  • Identify, catalog, and integrate internal and external data sources used across research efforts
  • Operationalize bioinformatics pipelines that support large-scale batch processing, incremental updates, and backfills within AWS
  • Normalize and structure heterogeneous data into consistent, reusable representations that support downstream analysis, modeling, and querying
  • Populate and maintain patient-centric data models in shared storage systems (e.g., graph and relational databases)
  • Collaborate with backend and AI engineers to design data-access patterns that support analytics applications and AI-driven interactions
  • Contribute to backend services and APIs that expose integrated data to internal tools and applications
  • Participate in the evolution of AI-enabled analysis workflows, including tooling that supports LLM- or agent-based interactions with data
  • Contribute to system-level design decisions around data flow, service boundaries, reliability, and scalability
  • Write clean, tested, and well-documented Python code that meets production software engineering standards
  • Debug and resolve complex data quality, pipeline, backend, and infrastructure issues in a distributed environment

Benefits

  • Employee benefits include comprehensive medical, dental, vision, life and disability plans for eligible employees and their dependents.
  • Natera employees and their immediate families receive free testing in addition to fertility care benefits.
  • Other benefits include pregnancy and baby bonding leave, 401k benefits, commuter benefits and much more.
  • We also offer a generous employee referral program!