Senior Data Engineer, Platform & Pipelines

Jobgether
Remote · $125,000 – $155,000

About The Position

This role is a high-impact opportunity for a Senior Data Engineer to lead the design and development of scalable data pipelines and platform services. You will work with multi-modal biomedical datasets to enable analytics, AI-driven insights, and translational research. The role focuses on building robust ingestion, transformation, and backend services, while collaborating with AI, bioinformatics, and software engineering teams to create unified and reusable data models. You will contribute to both hands-on engineering and system-level design decisions, ensuring reliable, maintainable, and scalable data solutions. This position offers a fast-paced, collaborative environment where your work directly influences the effectiveness of scientific and therapeutic programs. The role is fully remote within the U.S.

Requirements

  • BS in Computer Science, Bioinformatics, Computational Biology, or a related field; MS preferred.
  • 4+ years of production data engineering or software engineering experience.
  • Strong proficiency in Python and experience producing maintainable, production-quality code.
  • Hands-on experience with workflow orchestration systems (Dagster preferred; Airflow, Prefect, or similar acceptable).
  • Experience with backend services (e.g., FastAPI) and cloud infrastructure (AWS S3, ECS, Batch, Lambda).
  • Experience deploying and operating large-scale data or bioinformatics pipelines with attention to throughput, cost, and reliability.
  • Knowledge of relational (Postgres, MySQL) and graph databases (Neo4j), including schema and query design.
  • Ability to collaborate effectively with scientists, bioinformaticians, and ML practitioners in R&D environments.
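The relational schema-and-query design mentioned above can be sketched in miniature with SQLite standing in for Postgres/MySQL (the tables and columns here are purely illustrative, not this team's actual data model):

```python
import sqlite3

# In-memory SQLite database as a lightweight stand-in for Postgres/MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (
        patient_id  INTEGER PRIMARY KEY,
        external_id TEXT UNIQUE NOT NULL
    );
    CREATE TABLE sample (
        sample_id  INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
        assay      TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO patient VALUES (1, 'PT-001')")
conn.executemany(
    "INSERT INTO sample (patient_id, assay) VALUES (?, ?)",
    [(1, "RNA-seq"), (1, "WGS")],
)

# Patient-centric query: count assays per patient.
rows = conn.execute("""
    SELECT p.external_id, COUNT(s.sample_id) AS n_samples
    FROM patient p
    JOIN sample s ON s.patient_id = p.patient_id
    GROUP BY p.external_id
""").fetchall()
print(rows)  # [('PT-001', 2)]
```

The same patient-centric shape maps naturally onto a graph model (e.g., `(:Patient)-[:HAS_SAMPLE]->(:Sample)` in Neo4j), which is what makes the dual relational/graph requirement coherent.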

Nice To Haves

  • Experience with LLM-based agents.
  • Familiarity with bioinformatics data formats (FASTQ, BAM/CRAM, VCF).
  • Experience with infrastructure-as-code tools (e.g., Terraform).
  • Exposure to genomics or proteomics data.

Responsibilities

  • Architect, implement, and maintain data ingestion and transformation pipelines using modern orchestration tools (e.g., Dagster).
  • Integrate and catalog internal and external data sources for use in analytics, AI workflows, and research applications.
  • Operationalize bioinformatics pipelines supporting large-scale batch processing, incremental updates, and backfills in cloud environments.
  • Normalize heterogeneous data into consistent, reusable formats for downstream analysis and querying.
  • Populate and maintain patient-centric data models in relational and graph databases.
  • Collaborate with AI and backend engineers to design APIs and data-access layers supporting applications and LLM-enabled workflows.
  • Contribute to system-level design decisions including data flow, service boundaries, reliability, and scalability.
  • Debug, resolve, and optimize complex data quality, pipeline, and infrastructure issues in a distributed environment.
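As a rough sketch of the normalization responsibility above — mapping heterogeneous source records onto one consistent schema — the core pattern might look like this (source names and field mappings are hypothetical):

```python
from typing import Any

# Canonical schema every source is normalized into (hypothetical fields).
CANONICAL_FIELDS = ("patient_id", "assay", "value")


def normalize(record: dict[str, Any], source: str) -> dict[str, Any]:
    """Map a source-specific record onto the canonical schema."""
    if source == "lab_a":
        mapped = {
            "patient_id": record["pt"],
            "assay": record["test"],
            "value": float(record["result"]),
        }
    elif source == "lab_b":
        mapped = {
            "patient_id": record["patientId"],
            "assay": record["assayName"],
            "value": float(record["measurement"]),
        }
    else:
        raise ValueError(f"unknown source: {source}")
    # Guard: every normalized record exposes exactly the canonical fields.
    assert set(mapped) == set(CANONICAL_FIELDS)
    return mapped


row = normalize({"pt": "PT-001", "test": "LDL", "result": "101.5"}, "lab_a")
print(row)  # {'patient_id': 'PT-001', 'assay': 'LDL', 'value': 101.5}
```

In production this mapping would typically live inside an orchestrated pipeline step (e.g., a Dagster asset) so that backfills and incremental updates reuse the same normalization logic.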

Benefits

  • Competitive salary range: $125,000–$155,000 USD, dependent on experience and location.
  • Fully remote U.S.-based role with flexible work environment.
  • Comprehensive medical, dental, vision, life, and disability plans for employees and dependents.
  • Fertility and genetic testing benefits for employees and immediate family.
  • Paid parental leave and baby bonding leave.
  • 401(k) retirement plan with company match.
  • Commuter benefits and employee referral programs.
  • Professional development and wellness opportunities.