Data Platform Engineer/Lakehouse Architecture

Infotree Global Solutions | Webster, MA

About The Position

We are seeking an experienced Data Platform Engineer to design and implement a cloud-based data lakehouse platform that ingests engineering and security tool data, transforms it through multiple layers, and serves it to both analytics dashboards and AI agents.

Requirements

  • 8+ years in data engineering roles, with at least 2 years building lakehouse architectures (Bronze/Silver/Gold or equivalent medallion patterns)
  • Proven track record delivering production-grade data platforms
  • Experience with graph databases (Neo4j, Amazon Neptune, TigerGraph) for relationship modeling
  • Hands-on experience with stream processing (Kafka, Flink, Spark Streaming, Kinesis)
  • Cloud Platforms: Deep expertise in AWS (S3/Blob, RDS/SQL Database, managed Kafka, serverless compute)
  • SQL & Data Modeling: Expert-level SQL, dimensional modeling, SCD2, normalization vs. denormalization trade-offs
  • Transformation Tools: dbt, Databricks SQL, Dataform, or custom SQL/Python frameworks
  • Programming: Python or Scala for data processing, scripting, and automation
  • Orchestration: Airflow, Prefect, Dagster, Step Functions, or Azure Data Factory
  • IaC: Terraform, CloudFormation, Pulumi, or ARM templates
  • Communication: Explain technical trade-offs (cost, performance, complexity) to non-technical stakeholders
  • Problem-Solving: Debug data quality issues, optimize slow queries, resolve schema conflicts
  • Collaboration: Work with data scientists, DevOps engineers, and compliance teams
  • Autonomy: Manage ambiguity; propose solutions when requirements are incomplete

Nice To Haves

  • Search: OpenSearch, Elasticsearch, or Solr for text indexing and retrieval
  • Graph: Neo4j Cypher, SPARQL, or Gremlin for graph queries; experience with graph ETL
  • Data Quality: Great Expectations, dbt tests, or custom validation frameworks
  • Real-time: Flink, Spark Streaming, or serverless event processing (Lambda, Cloud Functions)
  • Monitoring: Grafana, Datadog, or CloudWatch for data pipeline observability