Senior AI Data Engineer

General Dynamics Information Technology

$165,750 - $224,250 • Onsite

About The Position

The Principal AI Data Engineer will support our AI team in Crystal City, VA. In this role, you will design, build, and operate data pipelines that ingest, store, and process high-volume, multi-source data, primarily for modern AI/ML workloads. You will partner with software, analytics, and product teams to create model-ready datasets (including features, embeddings, and prompts), implement scalable storage layers (such as a data lakehouse and vector stores), and enable low-latency retrieval for query, inference, and retrieval-augmented generation (RAG). As a Principal AI Data Engineer, you will turn raw, multi-source data into reliable, high-performance inputs that directly power AI models and advanced analytics. By ensuring datasets are ready and accessible, your work will make it faster and easier for teams across engineering, analytics, and product to develop, deploy, and improve AI capabilities.

Requirements

Experience with Apache Airflow for workflow orchestration.
Strong programming skills in Python.
Experience with Elasticsearch/OpenSearch for data indexing and search.
Understanding of vector databases, embedding models, and vector search for AI applications.
Expertise in event-driven architecture and microservices development.
Hands-on experience with cloud and object storage services (e.g., MinIO), including data storage and compute resources.
Strong understanding of data pipeline orchestration and workflow automation.
Working knowledge of Linux environments and database optimization techniques.
Strong understanding of version control with Git.
Current TS/SCI clearance is required.
8+ years of related experience.
Bachelor’s degree in Computer Science, Software Engineering, or a related field (or equivalent experience).
Work is onsite in Crystal City, VA, with optional CONUS travel.
Due to U.S. Government contract requirements, only U.S. citizens are eligible for this role.

Responsibilities

Design, develop, and implement scalable data pipelines and ETL processes using Apache Airflow, with a focus on data for AI/ML workloads.
Build and tune search and retrieval capabilities using Elasticsearch/OpenSearch, including indexing strategies, schema mappings, and relevance/performance optimization.
Enable low-latency retrieval for AI inference and RAG applications by optimizing data access patterns, caching approaches, and index refresh strategies.
Collaborate with analytics teams to define requirements, schemas, and interfaces for downstream consumption.
Use Git for version control, peer code reviews, CI/CD workflows, and reproducible pipeline deployments across environments.
Operate within Linux environments and perform performance tuning across pipeline components, storage layers, and compute resources.

Benefits

Comprehensive benefits and wellness packages
401K with company match
Competitive pay and paid time off
Full flex work weeks where possible
A variety of paid time off plans, including vacation, sick and personal time, holidays, paid parental, military, bereavement and jury duty leave.
Short- and long-term disability benefits; life, accidental death and dismemberment, personal accident, critical illness, and business travel and accident insurance
