Sr. Data Infrastructure Engineer

Evolv Technologies Inc. · Waltham, MA
$129,000 - $209,000 · Onsite

About The Position

Join Evolv as a Senior Data Infrastructure Engineer in the Machine Learning & Sensors organization, responsible for building and operating the scalable, secure, and reliable data pipelines that power our AI/ML research and production systems. In this role, you will own the end‑to‑end data lifecycle—from collection on thousands to millions of edge devices, through cloud ingestion and processing, into a centralized data factory enabling model training, evaluation, and continuous improvement. Data is the backbone of our mission to deliver best‑in‑class AI‑based weapon detection systems. You will ensure that data flows seamlessly across geographies, devices, and cloud systems while meeting strict requirements for quality, privacy, security, and scale. This role is ideal for someone who thrives at the intersection of distributed systems, cloud pipelines, and ML‑driven data needs.

Success in the Role: What performance outcomes will you work toward in the first 6–12 months?

In the first 30 days:

  • Develop a deep understanding of existing edge‑to‑cloud data pipelines and deployment environments.
  • Review current data ingestion flows, governance policies, and cloud infrastructure.
  • Assess pain points in data reliability, quality, and operational scalability.
  • Build relationships with AI/ML, data science, field operations, and cloud engineering teams.
  • Design and prototype data processing pipelines (both cloud and edge).

Within the first three months:

  • Design and implement improvements to core ingestion, validation, and processing pipelines.
  • Deploy scalable data pipelines built on AWS components (S3, EC2, Lambda, Glue, Step Functions, SageMaker integrations).
  • Introduce automated validation workflows to detect corruption, missing metadata, and malformed data.
  • Design and implement automated model evaluation, training, and improvement pipelines to accelerate experimentation.
  • Partner with field operations to improve data reliability, observability, and coverage across deployments.
By the end of the first year:

  • Own the entire lifecycle of mission‑critical data pipelines supporting AI/ML research and production.
  • Architect next‑generation edge‑to‑cloud data systems that scale across millions of devices.
  • Define and enforce data governance frameworks, including retention, access control, privacy, and lineage.
  • Enable ML teams to experiment rapidly through high‑quality, discoverable, versioned datasets.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, Software Engineering, or related field.
  • 2-3+ years of experience building production data pipelines and data platforms that support AI/ML models.
  • Strong proficiency in Python, C++, and distributed data processing frameworks.
  • Hands‑on experience with AWS services including S3, EC2, SageMaker, and Glue.
  • Experience designing data systems that support large‑scale ML training and experimentation.
  • Knowledge of data governance, access control, and lifecycle management.
  • Experience collaborating with ML, data science, operations, and cloud teams.

Nice To Haves

  • Experience building pipelines spanning edge devices and cloud systems.
  • Background working with large‑scale sensor, image or IoT data.
  • Familiarity with data labeling tools and annotation workflows.
  • Experience implementing dataset versioning, lineage, and reproducibility systems.
  • Understanding of privacy, compliance, or regulated data environments.
  • Experience supporting global, multi‑region data platforms.

Responsibilities

  • Design, build, and maintain both research and production data pipelines spanning edge devices, cloud services, and centralized data platforms.
  • Own the full data lifecycle: collection, ingestion, processing, obfuscation, versioning, access, retention, and retirement.
  • Develop resilient ingestion pipelines capable of handling variable connectivity and device heterogeneity.
  • Support secure data transfer from the field to cloud storage systems.
  • Collaborate with field ops to enhance data coverage, observability, and operational robustness.
  • Implement privacy‑preserving transformations and obfuscation pipelines.
  • Build automated cleaning/validation steps to remove duplicates, detect corruption, and validate metadata.
  • Establish data lineage, retention policies, and access controls ensuring compliance and traceability.
  • Provide scalable data services for model training, evaluation, and research experimentation.
  • Support continuous data refresh and retraining workflows.
  • Integrate with data labeling services and annotation workflows.
  • Enable efficient access patterns for large‑scale ML workloads.
  • Build and optimize pipelines using AWS services (S3, EC2, SageMaker, Lambda, Glue, Step Functions).
  • Design for cost‑efficiency, performance, and reliability at scale.
  • Partner with AI/ML engineers, scientists, and data scientists to understand data requirements.
  • Translate feedback into automated improvements in data collection, labeling, and consumption.
  • Support cross‑functional teams in exploratory analysis and debugging data issues.
  • Design and manage data schemas, data versioning, and data factory updates.
  • Architect systems that scale globally across millions of devices.
  • Ensure the data platform remains flexible for research and reliable for production operations.

Benefits

  • Equity as part of your total compensation package
  • Medical, dental, and vision insurance
  • Health Savings Account (HSA)
  • A 401(k) plan with a 2% company match
  • Flexible Paid Time Off (PTO): take the time you need to recharge, with manager approval and business needs in mind
  • Quarterly stipend for perks and benefits that matter most to you
  • Tuition reimbursement to support your ongoing learning and development
  • Subscription to Calm