Databricks / PySpark Data Engineer

Doran Jones, Dallas, TX

About The Position

We are seeking a hands-on Data Engineer with strong Databricks and PySpark experience to build scalable data pipelines and analytics applications within a modern data platform. This role focuses on modernizing legacy ETL and reporting systems (Teradata, Informatica, Tableau), replacing them with Databricks-native pipelines, dashboards, and Python-based data applications. The ideal candidate operates across both data engineering and lightweight application development.

Requirements

  • 5+ years of hands-on experience building data pipelines using PySpark in production environments
  • Strong experience with the Databricks platform (workspaces, clusters, Jobs & Workflows, Unity Catalog)
  • Experience building analytics dashboards within Databricks (Databricks SQL)
  • Proven experience designing and building scalable ETL/ELT data pipelines
  • Strong Python development skills, including building REST APIs or data services
  • Experience building or supporting data-driven applications (not just traditional ETL pipelines)
  • Solid understanding of data modeling, including dimensional modeling and transformation patterns
  • Experience using AI-assisted development tools (e.g., Copilot, ChatGPT) in engineering workflows
  • Exposure to LLM integration or an AI-powered data application
  • Familiarity with cloud platforms (AWS, Azure, or GCP)

Nice To Haves

  • Experience migrating from legacy platforms: Informatica → Databricks / Spark; Teradata → cloud-native data platforms; Tableau (or similar) → Databricks-native dashboards
  • Experience with FastAPI or similar Python frameworks for data applications
  • Exposure to CI/CD pipelines for data engineering workflows
  • Understanding of microservices architecture and scalable application design
  • Experience in Healthcare Payor domain (e.g., claims processing, member data, provider data, eligibility, or billing systems)

Responsibilities

  • Design, develop, and maintain scalable data pipelines using PySpark on Databricks
  • Build analytics data models and transformation workflows for enterprise reporting and analytics
  • Migrate legacy ETL workloads from platforms such as Informatica and Teradata to Databricks
  • Develop Databricks-native dashboards and analytics applications to replace traditional BI tools
  • Build lightweight Python-based data applications (e.g., FastAPI) to expose and interact with data
  • Integrate Databricks pipelines with APIs and application services
  • Implement Slowly Changing Dimensions (SCD) and dimensional data modeling techniques
  • Develop reusable data engineering frameworks and standardized pipelines
  • Optimize Spark workloads for performance, scalability, and cost efficiency
  • Collaborate with analytics and business teams to deliver user-facing data solutions
  • Leverage AI-assisted coding tools (e.g., Copilot, ChatGPT) to improve development productivity
  • Contribute to best practices for modern data engineering and analytics application development
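For readers unfamiliar with the Slowly Changing Dimension work this role calls for, here is a minimal, platform-agnostic sketch of SCD Type 2 logic in plain Python. The record layout (`customer_id`, `is_current`, `valid_from`, `valid_to`) is a hypothetical example, not part of the posting; on Databricks this pattern would typically be implemented as a Delta Lake `MERGE INTO` in PySpark rather than in-memory Python.

```python
from datetime import date

def scd2_apply(dimension, incoming, key="customer_id", today=None):
    """Apply SCD Type 2 changes to a dimension: when a tracked attribute
    changes, expire the current row and append a new current version.

    dimension: list of dicts carrying 'is_current', 'valid_from', 'valid_to'
    incoming:  list of dicts with the natural key plus tracked attributes
    """
    today = today or date.today()
    # Index the currently active row for each natural key.
    current = {r[key]: r for r in dimension if r["is_current"]}
    for row in incoming:
        existing = current.get(row[key])
        tracked = {k: v for k, v in row.items() if k != key}
        if existing is None:
            # Brand-new key: insert as the first current version.
            dimension.append({**row, "is_current": True,
                              "valid_from": today, "valid_to": None})
        elif any(existing.get(k) != v for k, v in tracked.items()):
            # Attribute changed: close out the old version, add a new one.
            existing["is_current"] = False
            existing["valid_to"] = today
            dimension.append({**row, "is_current": True,
                              "valid_from": today, "valid_to": None})
    return dimension
```

The same effect in a Databricks pipeline is usually achieved with a Delta Lake merge that matches on the natural key and the `is_current` flag, so history is preserved for point-in-time reporting.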

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No Education Listed
  • Number of Employees: 1-10 employees
