Data Engineer Intern

Plymouth Rock Assurance · Boston, MA
7d · $25 - $30

About The Position

At Plymouth Rock, a fast-growing, analytics-driven insurer, we believe data science can redefine how insurance operates. As a Data Engineer Intern on our Enterprise Advanced Analytics team, you will work alongside top-tier data scientists to generate breakthrough insights that drive profitable growth, operational excellence, and competitive advantage. You will help architect and manage the data pipelines that connect internal systems, third-party sources, and advanced analytical platforms, directly enabling our most strategic research initiatives.

Requirements

  • Currently pursuing, or have completed, a Master’s degree in Computer Science, Information Systems, Data Engineering, or a related field.
  • Experience (projects, internships, or coursework) building Python + SQL data pipelines or data-intensive workflows; a brief sketch of this kind of work appears after this list.
  • Familiarity with AWS services commonly used in data platforms, including S3, Glue, and Step Functions (hands-on experience preferred).
  • Experience working with SQL Server (or similar relational databases) and writing performant SQL.
  • Comfort with Linux and bash; able to debug jobs and inspect logs.
  • Familiarity with CI/CD, Git, and basic testing practices.
  • Exposure to hybrid connectivity patterns (e.g., VPN/Direct Connect, hosted agents, secure credential management) or tools for extracting from on-prem sources.
  • Familiarity with data lake formats and practices (Parquet, partitioning, schema evolution).
  • Exposure to monitoring/alerting (CloudWatch, Datadog) and data quality frameworks/patterns.
  • Interest in ML data prep / feature pipelines for SageMaker.
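
To give a sense of the Python + SQL pipeline work referenced above, here is a minimal sketch of an incremental extract from SQL Server into a partitioned Parquet dataset on S3. The connection string, table, columns, watermark value, and bucket path are hypothetical placeholders, and the pandas/pyarrow write is one possible approach rather than the team's actual stack.

    # Sketch only: incremental extract from SQL Server into partitioned Parquet on S3.
    # All names below (host, database, table, bucket, watermark) are hypothetical placeholders;
    # requires pandas, sqlalchemy, pyodbc, pyarrow, and s3fs.
    import pandas as pd
    from sqlalchemy import create_engine, text

    # Hypothetical SQL Server connection; real credentials would come from a secrets manager.
    engine = create_engine(
        "mssql+pyodbc://user:password@sql-host/policy_db?driver=ODBC+Driver+17+for+SQL+Server"
    )

    # Incremental load: pull only rows changed since the last successful run (placeholder watermark).
    query = text(
        "SELECT claim_id, policy_id, state_code, loss_amount, updated_at "
        "FROM dbo.claims WHERE updated_at > :watermark"
    )
    df = pd.read_sql(query, engine, params={"watermark": "2024-01-01"})

    # Land the batch in the S3 data lake as Parquet, partitioned by state_code so
    # downstream queries can prune partitions instead of scanning the full dataset.
    df.to_parquet(
        "s3://example-analytics-lake/curated/claims/",
        engine="pyarrow",
        partition_cols=["state_code"],
        index=False,
    )

In practice the watermark would be tracked in a control table or job state store rather than hard-coded, which is part of what makes these pipelines repeatable.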

Responsibilities

  • Assist in modernizing legacy data pipelines into Python-based, cloud-native workflows using AWS Step Functions, AWS Glue, and S3.
  • Build and improve ingestion pipelines that move data from SQL Server and other on-prem + cloud sources into AWS (e.g., S3-backed data lake).
  • Improve pipeline performance, efficiency, and scalability (e.g., partitioning strategies, incremental loads, reduced runtime/cost).
  • Implement data quality checks and operational safeguards to support reliable, repeatable (idempotent) runs (e.g., validation rules, anomaly checks, retry-safe processing); a small sketch follows this list.
  • Contribute to reusable Python modules for ingestion, transformations, quality checks, logging, and alerting.
  • Partner with data scientists/ML engineers to prepare curated datasets for SageMaker training/inference workflows.
  • Document code, workflows, and runbooks; keep documentation current as pipelines evolve.
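
As a purely illustrative example of the data quality and idempotency responsibilities above, the sketch below validates a batch and then overwrites a single run-date partition, so retries for the same run date never duplicate rows. Column names, rules, and paths are assumptions made for the example, not the team's actual conventions.

    # Sketch only: basic validation checks plus a retry-safe (idempotent) partition overwrite.
    # Column names, rules, and paths are illustrative assumptions; requires pandas and pyarrow.
    from pathlib import Path

    import pandas as pd


    def validate(df: pd.DataFrame) -> None:
        """Fail fast if the batch violates basic data quality rules."""
        if df.empty:
            raise ValueError("Batch is empty; refusing to overwrite existing data.")
        if df["claim_id"].duplicated().any():
            raise ValueError("Duplicate claim_id values found in batch.")
        if (df["loss_amount"] < 0).any():
            raise ValueError("Negative loss_amount values found in batch.")


    def load_partition(df: pd.DataFrame, run_date: str, base_path: str) -> None:
        """Write one run date's data to its own partition path.

        Re-running the job for the same run_date replaces only that partition,
        so retries are safe (idempotent by partition) and never append duplicates.
        """
        validate(df)
        partition_dir = f"{base_path}/load_date={run_date}"
        if not base_path.startswith("s3://"):
            Path(partition_dir).mkdir(parents=True, exist_ok=True)
        df.to_parquet(f"{partition_dir}/part-0.parquet", engine="pyarrow", index=False)


    if __name__ == "__main__":
        batch = pd.DataFrame({"claim_id": [101, 102], "loss_amount": [1200.0, 350.5]})
        # Local path for demonstration; in the pipeline this would be the S3 curated zone.
        load_partition(batch, run_date="2024-06-01", base_path="/tmp/claims_demo")

In production, failed checks like these would typically be surfaced through the team's logging and alerting (e.g., CloudWatch) rather than bare exceptions.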