About The Position

As a Data Ingestion Specialist on our Data Ingestion & Care Enablement (DICE) team, you’ll be part of the horizontal layer that keeps partner and vendor data flowing reliably into Thyme Care. In this position, you will collaborate closely with our Product Manager and other Data teammates focused on data ingestion and analytics engineering. You’ll work across many deals and data sources, supporting Data Scientist deal owners by making ingestion consistent, debugging failures, and raising the reliability bar through better tests, monitoring, and data contracts. The Responsibilities section below describes the day-to-day work in detail.

Requirements

  • Strong SQL skills
  • Familiarity with dbt, including experience working in larger or more complex projects, and an interest in ramping up your expertise
  • Working knowledge of Python for data investigation in notebooks
  • Experience operating data pipelines: debugging failures, tracing issues across systems, and communicating clearly about root cause and mitigation
  • Experience with testing and data quality: writing and maintaining tests and using failures/alerts to drive durable fixes
  • Responsiveness and the ability to stay calm and organized when triaging failing ingestion runs or pipelines
  • Willingness to learn new domains and tools quickly (new partner file formats, evolving standards, Databricks), and apply feedback without ego
  • The ability to engage technical and non-technical stakeholders to explain what’s happening in our pipelines and identify opportunities to improve transparency and alerting

Nice To Haves

  • Healthcare data exposure (claims/eligibility/ADT/etc.)

Responsibilities

  • Gain a deep understanding of our data platform and contribute to improving our data models and pipelines using SQL, dbt, and Python (generally data-focused packages, e.g., pandas, polars)
  • Support ingestion of a wide range of healthcare-related sources (claims, eligibility, prior auth, ADT, etc.) by:
      • Configuring net-new ingestions (parsing file specs, validating assumptions, communicating inconsistencies)
      • Debugging issues in ongoing ingestions
      • Helping standardize our processes and pipelines
  • Collaborate with data scientist deal owners and internal stakeholders to turn messy, ambiguous requirements into concrete mapping/validation logic and durable data contracts
  • Use Dagster and GitHub Actions to orchestrate and automate the early stages of our data pipelines, improving run reliability and reducing manual intervention (see the first sketch after this list)
  • Work hands-on with raw data using Jupyter Notebooks in Databricks to investigate data issues, validate assumptions, and unblock processing
  • Design and support incremental data loads (append/merge/upsert patterns) and safe reprocessing (idempotent runs, late-arriving data, backfills); see the second sketch after this list
  • Learn to use Datadog and PagerDuty to monitor pipelines, triage incidents during business hours, communicate impact clearly, and drive root-cause fixes to prevent recurrences
  • Contribute to a complex, self-hosted dbt monorepo: implement transformations, incremental models, tests, documentation, and conventions that scale across deals
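
To give a flavor of the orchestration work, here is a minimal Dagster sketch of a two-step ingestion: parse a raw partner file, then validate and stage it for downstream models. The file path, column names, and validation rule are hypothetical, and the retry policy stands in for the kind of automation that reduces manual intervention; this illustrates the pattern, not our actual pipeline code.

    # Minimal Dagster sketch of a two-step ingestion.
    # The file path and column names below are hypothetical.
    import pandas as pd
    from dagster import Definitions, RetryPolicy, asset

    @asset(retry_policy=RetryPolicy(max_retries=3, delay=30))
    def raw_eligibility() -> pd.DataFrame:
        # Hypothetical drop location; a real ingestion would read from a partner feed.
        return pd.read_csv("/data/partner_x/eligibility.csv", dtype=str)

    @asset
    def staged_eligibility(raw_eligibility: pd.DataFrame) -> pd.DataFrame:
        # Validate an assumption from the file spec before anything downstream runs.
        if raw_eligibility["member_id"].isna().any():
            raise ValueError("member_id must be populated for every row")
        return raw_eligibility.drop_duplicates()

    defs = Definitions(assets=[raw_eligibility, staged_eligibility])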
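
And here is a minimal pandas sketch of the idempotent upsert pattern mentioned above: re-running the same batch (or replaying a late-arriving file) converges to the same table. The member_id and effective_date columns are hypothetical; in practice this logic would live in dbt incremental models or Databricks merges.

    # Minimal pandas sketch of an idempotent merge/upsert.
    # Column names (member_id, effective_date) are hypothetical.
    import pandas as pd

    def upsert(existing: pd.DataFrame, incoming: pd.DataFrame) -> pd.DataFrame:
        combined = pd.concat([existing, incoming], ignore_index=True)
        # Keep the latest version of each member; ties resolve to the incoming row.
        combined = combined.sort_values("effective_date", kind="stable")
        return combined.drop_duplicates(subset="member_id", keep="last").reset_index(drop=True)

    table = pd.DataFrame({"member_id": ["a", "b"], "effective_date": ["2024-01-01", "2024-01-01"]})
    batch = pd.DataFrame({"member_id": ["b", "c"], "effective_date": ["2024-02-01", "2024-02-01"]})

    once = upsert(table, batch)
    twice = upsert(once, batch)  # idempotent: replaying the batch changes nothing
    assert once.equals(twice)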