Data Platform Engineer

Treeswift · New York, NY
Hybrid

About The Position

You are a skilled and motivated Data Platform Engineer. In this role, you will:

Design, build, and maintain data pipelines at scale. We run Apache Airflow 3 on Astronomer, with pipelines that process terabytes of real-world physical data across many file types: imagery, audio, point clouds, and more. You will develop and evolve DAGs that orchestrate complex, multi-step workflows: dozens of tasks, fan-out/fan-in patterns, Python and Kubernetes operators split across generalized and specialized node pools, and dynamic DAG generation. You will work closely with our in-house ML team (feature pipelines and model deployment live in these DAGs) and coordinate with our hardware team on ingestion and formats. The scope is a mix of pipeline development and platform ownership, and we are happy to adjust the balance of responsibilities based on your interests and strengths.

Help us scale and harden our data platform. We have one dedicated data engineer today; you will be the second. The broader engineering team is highly collaborative, and you will work with members of the full-stack and machine learning teams. We are looking for someone to improve DAG design and execution, resource and cost tuning, and reliability and observability, and to contribute to how we run Airflow and Kubernetes in the cloud. If you enjoy writing pipelines and improving the platform that runs them, this role has room for both.

Stay curious, collaborative, and cross-functional. We are a small team where many people wear multiple hats. You will work alongside ML engineers, hardware engineers, and software engineers. Turning a technically complex set of requirements from a critical industry into a rich data set is at the center of what we do. We take pride in managing complexity and providing high-fidelity data that our customers can use to make better-informed decisions.

Be an owner at Treeswift; make the company better in whatever form that takes. We value the full picture you bring, whether that's deep expertise in orchestration, a knack for debugging at scale, or hidden talents outside work. We launched our platform last fall and have only scratched the surface of what's possible in finding new ways to add value for our customers. You will partner closely with some of the largest utilities in the country and contribute to new workflows in work planning, construction, and disaster response.

This is a full-time, hybrid role based out of our Lower Manhattan, NYC office (2 days per week in person, currently pinned to Tuesdays and Wednesdays).
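To give a concrete flavor of the DAG patterns named above (fan-out/fan-in via dynamic task mapping, plus mixed Python and Kubernetes operators on separate node pools), here is a minimal illustrative sketch. The task names, image, and node-pool label are hypothetical, not Treeswift's actual pipeline code.

```python
# Minimal sketch: fan-out/fan-in with dynamic task mapping, plus a
# KubernetesPodOperator pinned to a specialized node pool.
# All names, images, and labels below are hypothetical.
import pendulum
from airflow.sdk import dag, task  # Airflow 3; in Airflow 2, airflow.decorators
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

@dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def scene_ingest():
    @task
    def list_scenes() -> list[str]:
        return ["scenes/a.laz", "scenes/b.laz"]  # placeholder object keys

    @task
    def preprocess(key: str) -> str:
        # Fan-out: one mapped task instance per scene key; lightweight
        # Python work stays on the generalized node pool.
        return f"processed/{key}"

    @task
    def summarize(keys: list[str]) -> None:
        # Fan-in: runs once, after every mapped preprocess task finishes.
        print(f"processed {len(keys)} scenes")

    # Heavy, specialized work runs in its own pod on a dedicated node pool.
    featurize = KubernetesPodOperator(
        task_id="featurize",
        image="example.com/featurizer:latest",  # hypothetical image
        node_selector={"pool": "specialized"},  # hypothetical pool label
    )

    summarize(preprocess.expand(key=list_scenes())) >> featurize

scene_ingest()
```

With dynamic task mapping, the fan-out width is decided at run time by the upstream task's return value, which is what makes the pattern a fit for variable batches of sensor data.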

Requirements

  • Bachelor’s degree in Computer Science, Computer Engineering, Math, or a related field (or equivalent experience).
  • 4+ years of data engineering or backend engineering experience with a focus on pipelines, orchestration, or platform.
  • Hands-on experience building and maintaining production data pipelines (e.g., Airflow, Prefect, Luigi, or similar). We use Python for our pipelines, machine learning, and developer tooling; we don't require Python expertise and are happy for you to learn on the job.
  • Experience with cloud object storage and data-at-scale (we use AWS and S3; cloud experience is required, but prior AWS experience is not).
  • Comfort with Kubernetes and container-based deployments in practice: running workloads on K8s, resource and volume configuration, and debugging pod/worker issues (a representative configuration sketch follows this list).
  • Ability to own work end-to-end: design, implement, test, and operate pipelines and related tooling. You are comfortable picking up new parts of the stack when needed.
  • Strong collaboration and communication; you work well with ML, hardware, and product stakeholders and can explain tradeoffs clearly.
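As referenced in the Kubernetes bullet above, resource and volume configuration in an Airflow-on-Kubernetes setup typically looks something like the following sketch. The image name, sizes, and mount path are illustrative assumptions, not Treeswift specifics.

```python
# Illustrative KubernetesPodOperator configuration: CPU/memory requests
# and limits plus a scratch volume. All values here are hypothetical.
from kubernetes.client import models as k8s
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

scratch = k8s.V1Volume(
    name="scratch",
    empty_dir=k8s.V1EmptyDirVolumeSource(size_limit="50Gi"),
)

extract_features = KubernetesPodOperator(
    task_id="extract_features",
    image="example.com/extractor:latest",  # hypothetical image
    container_resources=k8s.V1ResourceRequirements(
        requests={"cpu": "2", "memory": "8Gi"},
        limits={"cpu": "4", "memory": "16Gi"},
    ),
    volumes=[scratch],
    volume_mounts=[k8s.V1VolumeMount(name="scratch", mount_path="/scratch")],
    get_logs=True,  # stream pod logs into Airflow so failures are debuggable
)
```

Debugging pod/worker issues then usually starts from these streamed logs, together with inspecting the failing task's pod directly in the cluster.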

Nice To Haves

  • Experience in early-stage or fast-moving environments where scope and ownership evolve.
  • Experience with Apache Airflow (especially 3.x) and/or Astronomer.
  • Experience with geospatial data, imagery, lidar, or point clouds.
  • Interest in utilities, forestry, or field operations and how data pipelines support those domains.

Responsibilities

  • Design, build, and maintain data pipelines at scale.
  • Develop and evolve DAGs that orchestrate complex, multi-step workflows.
  • Work closely with our in-house ML team and coordinate with our hardware team on ingestion and formats.
  • Improve DAG design and execution, resource and cost tuning, reliability and observability, and contribute to how we run Airflow and Kubernetes in the cloud.
  • Stay curious, collaborative, and cross-functional.
  • Be an owner at Treeswift; make the company better in whatever form that takes.
  • Partner closely with some of the largest utilities in the country and contribute to new workflows in work planning, construction, and disaster response.