Senior Data Engineer, Data Science Infrastructure

Liberty Mutual Insurance
Boston, MA

About The Position

Modeling Data Solutions is seeking experienced data engineers to join our Data Solutions job family. This is an exciting opportunity to join the US Data Science Infrastructure department, helping to build cutting-edge pricing programs. You will play a critical role in designing and developing the data solutions needed for research and development, as well as providing front-line data support for launching new products to market. This is a range posting; the level of the position offered will be based on skills and experience at the manager's discretion. This role may have in-office requirements based on candidate location. The responsibilities you can expect in this role are listed below.

Requirements

  • Have strong technical aptitude with data extraction and data engineering platforms (Python, Spark/PySpark, SQL, and cloud platforms such as AWS and Snowflake).
  • Demonstrate 5+ years of experience building production data pipelines and platforms, including both batch and streaming workloads.
  • Possess hands-on experience with data quality testing frameworks and practices, including implementing automated data tests and validation.
  • Have practical experience collaborating in Agile teams and applying Agile best practices (sprint planning, refinement, retrospectives).
  • Have excellent written and oral communication skills and the ability to work with technical and non-technical stakeholders.
  • Have 3-5 years of experience coding for data management, data warehousing, or other data environments, including, but not limited to, Python, SQL, ETL, Spark, and Snowflake.

Nice To Haves

  • Familiarity with ML pipelines and supporting feature stores or feature engineering workflows (Databricks).
  • Prior mentoring or technical leadership experience.

Responsibilities

  • Lead the design, development, and maintenance of scalable and efficient data pipelines and ETL/ELT processes to support analytics and pricing models.
  • Architect and implement high-performance data integration solutions using modern tools and cloud services (AWS preferred), with attention to latency, throughput, and cost.
  • Implement and maintain automated data quality and testing frameworks (unit, integration, regression, anomaly detection) to ensure data correctness and trust for downstream analytics.
  • Collaborate with cross-functional teams, including data scientists, product analysts, and software engineers, to gather requirements, design solutions, and provide technical guidance for production launches.
  • Optimize and tune data infrastructure for performance, scalability, reliability, and cost-efficiency (query tuning, partitioning, resource configuration, storage formats such as Parquet/Delta).
  • Mentor and guide junior data engineers, promoting engineering best practices, code review and thoughtful design.
  • Stay up to date with advancements in data engineering, evaluate new tools, and recommend adoption where appropriate.
  • Identify opportunities to apply Gen AI for data discovery, lineage summarization, automated documentation, query generation, and developer productivity (prompt engineering and code generation).
  • Incorporate Gen AI capabilities into data workflows and developer tooling, and help operationalize safe, cost-effective Gen AI integrations.