Senior Data Engineer, Platform & Pipelines

Natera
$125,000 - $155,000 · Remote

About The Position

We are seeking a Senior Data Engineer to join Natera’s Therapeutics & Innovations group, which focuses on leveraging Natera’s multimodal data assets to enable therapeutic development and scientific innovation. The group works with large-scale biomedical datasets to support therapeutic development, biomarker discovery, and translational research, and is building shared data foundations to unify and scale these efforts. This role is part of a broader initiative to develop a shared, platform-level data system spanning multimodal data ingestion, backend services, AI-enabled data access, and web interfaces. The initial focus of the role is designing and implementing robust data ingestion and transformation pipelines, with scope expanding over time into backend APIs, data-access layers, and LLM-driven analysis tools as the platform matures.

Requirements

  • BS in Computer Science, Bioinformatics, Computational Biology, or a related field; MS preferred
  • 4+ years of experience in production data engineering or software engineering
  • Ability to independently drive technical solutions from high-level goals, exercising judgment in system design, implementation, and tradeoff evaluation
  • Strong proficiency in Python, with experience writing maintainable, production-quality code across data and backend contexts
  • Extensive experience with software engineering fundamentals, design patterns, version control, CI/CD, Docker, and automated testing
  • Experience designing and operating workflow orchestration systems (Dagster preferred; Airflow, Prefect, or similar acceptable)
  • Experience building or contributing to backend services (e.g., FastAPI or similar frameworks)
  • Hands-on experience with AWS services commonly used in data and backend systems (e.g., S3, ECS, Batch, Lambda)
  • Experience deploying and operating large-scale data or bioinformatics pipelines in AWS, including managing throughput, cost, and operational reliability
  • Experience with relational databases (Postgres, MySQL) and/or graph databases (Neo4j), including schema and query design
  • Experience contributing to system-level architecture, including data modeling, service boundaries, and operational robustness
  • Ability to work effectively with scientists, bioinformaticians, and ML practitioners in an R&D environment

Nice To Haves

  • Experience integrating machine-learning inference outputs into data pipelines
  • Familiarity with LLM-based agents and associated frameworks such as LangChain
  • Familiarity with bioinformatics data formats and pipelines (e.g., FASTQ, BAM/CRAM, VCF, RNAseq, WES/WGS)
  • Experience with infrastructure as code (Terraform)
  • Experience with DNAnexus
  • Understanding of genomics, proteomics, or other omics data types and their downstream analytical use cases
  • Ability to evaluate build-vs-buy tradeoffs in fast-paced environments

Responsibilities

  • Architect, implement, and maintain data ingestion and transformation pipelines using modern workflow orchestration tools (e.g., Dagster)
  • Identify, catalog, and integrate internal and external data sources used across research efforts
  • Operationalize bioinformatics pipelines that support large-scale batch processing, incremental updates, and backfills within AWS
  • Normalize and structure heterogeneous data into consistent, reusable representations that support downstream analysis, modeling, and querying
  • Populate and maintain patient-centric data models in shared storage systems (e.g., graph and relational databases)
  • Collaborate with backend and AI engineers to design data-access patterns that support analytics applications and AI-driven interactions
  • Contribute to backend services and APIs that expose integrated data to internal tools and applications
  • Participate in the evolution of AI-enabled analysis workflows, including tooling that supports LLM- or agent-based interactions with data
  • Contribute to system-level design decisions around data flow, service boundaries, reliability, and scalability
  • Write clean, tested, and well-documented Python code that meets production software engineering standards
  • Debug and resolve complex data quality, pipeline, backend, and infrastructure issues in a distributed environment

Benefits

  • Employee benefits include comprehensive medical, dental, vision, life and disability plans for eligible employees and their dependents.
  • Natera employees and their immediate families receive free testing in addition to fertility care benefits.
  • Other benefits include pregnancy and baby bonding leave, 401k benefits, commuter benefits and much more.
  • We also offer a generous employee referral program!