Machine Learning Data Engineer

Allen Control Systems · Austin, TX

About The Position

Allen Control Systems (ACS) is a cutting-edge defense startup founded by two ex-Navy electrical engineers with a proven track record in robotics and software. We are developing a small, autonomous gun turret that uses advanced computer vision and control systems to precisely target and neutralize small drones and loitering munitions. This work requires overcoming significant technical challenges, making ACS an exciting and dynamic environment for experienced engineers. With an engineering-first culture, ACS values technical excellence and continuous learning. Backed by our founders' successful exits from two previous ventures acquired for a combined $180M in 2022, we are committed to ensuring that the groundbreaking technologies we develop have a real-world impact.

Position Overview

We are seeking a Machine Learning Data Engineer who combines expert-level infrastructure skills with a strong knowledge of AI and machine learning principles. In this role, you will go beyond simple data validation scripts: you will apply your understanding of model training dynamics to design and implement both existing and novel approaches to optimizing our datasets. You will build and maintain large-scale image and video pipelines with a focus on data curation strategies such as coreset selection, embedding-based filtering, and automated complexity scoring. You’ll partner closely with our ML engineers to orchestrate ingestion, synthetic data generation, and versioned releases, ensuring that every dataset is not only high-integrity and available but also optimized to maximize model performance.

Requirements

  • 3+ years of experience in data engineering or an equivalent field.
  • Proficient in using AWS for data management and processing.
  • Proficient in Python for scripting and data processing; proficient with SQL and Linux.
  • Bachelor’s or Master’s degree in Computer Science or a related field.
  • Solid understanding of data structures and systems design for orchestrating data-related workflows.
  • Proven ability to communicate well across engineering teams and to write and maintain effective documentation.

Nice To Haves

  • 5+ years of industry experience.
  • Experience in image/video data engineering for computer vision projects.
  • Experience with PyTorch DeepCore.
  • Experience with Unreal Engine.

Responsibilities

  • Design and own end-to-end image+video pipelines for computer vision model training: multi-source ingestion, QA and visualization, standardization, and organization.
  • Design and implement existing and novel approaches to optimize datasets for model training (e.g., hard example mining, class balancing, de-duplication, embedding-based filtering).
  • Develop and use synthetic data generation workflows to create realistic synthetic training data for computer vision models.
  • Coordinate collection of real-world data; coordinate label creation and QA with labelers.
  • Develop and use data quality tooling: metrics for balance, drift, and annotation error; active-learning sampling to target gaps; feedback loops from production back to curation.
  • Implement and own dataset versioning, release management, and lineage+metadata cataloging.

Benefits

  • Competitive salary
  • ACS Equity Package
  • Health, Dental, Vision Insurance
  • Paid Time Off