Senior AI Data Engineer

General Dynamics Information Technology
$165,750 - $224,250
Onsite

About The Position

The Principal AI Data Engineer will support our AI team in Crystal City, VA. In this role, you will design, build, and operate data pipelines that ingest, store, and process high-volume, multi-source data primarily for modern AI/ML processes. You will partner with software, analytics, and product teams to create model-ready datasets (including features, embeddings, and prompts), implement scalable storage layers (such as a data lakehouse and vector stores), and enable low-latency retrieval for query, inference, and retrieval-augmented generation (RAG). As a Principal AI Data Engineer, you will turn raw, multi-source data into reliable, high-performance inputs that directly power AI models and advanced analytics. Your work will make it faster and easier for teams across engineering, analytics, and product to develop, deploy, and improve AI capabilities by ensuring datasets are ready and accessible.

Requirements

  • Experience with Apache Airflow for workflow orchestration.
  • Strong programming skills in Python.
  • Experience with ElasticSearch/OpenSearch for data indexing and search functionalities.
  • Understanding of vector databases, embedding models, and vector search for AI applications.
  • Expertise in event-driven architecture and microservices development.
  • Hands-on experience with cloud services (e.g., MinIO), including data storage and compute resources.
  • Strong understanding of data pipeline orchestration and workflow automation.
  • Working knowledge of Linux environments and database optimization techniques.
  • Strong understanding of version control with Git.
  • Current TS/SCI clearance is required.
  • 8+ years of related experience.
  • Bachelor’s degree in Computer Science, Software Engineering, or a related field (or equivalent experience).
  • Work is onsite in Crystal City, VA, with optional CONUS travel.
  • Due to US Government contract requirements, only US citizens are eligible for this role.

Responsibilities

  • Design, develop, and implement scalable data pipelines and ETL processes using Apache Airflow, with a focus on data for AI.
  • Build and tune search and retrieval capabilities using ElasticSearch/OpenSearch, including indexing strategies, schema mappings, and relevance/performance optimization.
  • Enable low-latency retrieval for AI inference and RAG applications by optimizing data access patterns, caching approaches, and index refresh strategies.
  • Collaborate with analytic teams to define requirements, schemas, and interfaces for downstream consumption.
  • Use Git for version control, peer code reviews, CI/CD workflows, and reproducible pipeline deployments across environments.
  • Operate within Linux environments and perform performance tuning across pipeline components, storage layers, and compute resources.

Benefits

  • Comprehensive benefits and wellness packages
  • 401(k) with company match
  • Competitive pay and paid time off
  • Full flex work weeks where possible
  • A variety of paid time off plans, including vacation, sick and personal time, holidays, and paid parental, military, bereavement, and jury duty leave
  • Short- and long-term disability benefits, plus life, accidental death and dismemberment, personal accident, critical illness, and business travel and accident insurance