Data Engineer II

onX | Bozeman, MT
$103,000 - $115,000 | Remote

About The Position

The Data Engineer II plays a key role in building and evolving the onX Lakehouse. This role focuses on creating reliable, well-modeled data products on Google Cloud while improving how data is described, governed, discovered, and trusted across the organization. You will balance hands-on pipeline development with thoughtful design around schemas, metadata, and operational health. This is a role for engineers who think beyond moving data from A to B and care deeply about how data is understood, reused, and scaled in a modern analytics and AI-driven ecosystem. As an onX Data Engineer, your day-to-day responsibilities span five areas: Lakehouse Development, Data Modeling & Quality, Metadata & Data Discovery, Cross-Functional Partnership, and Operational Excellence. The Responsibilities section below breaks each of these down.
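For a flavor of the Lakehouse work described above, here is a minimal, illustrative PySpark sketch (not onX's actual code) that lands data as a day-partitioned Apache Iceberg table on Google Cloud Storage. It assumes a Spark runtime with the Iceberg runtime jar on the classpath; the catalog name, bucket paths, and table names are all hypothetical.

    # Minimal sketch: land raw data as a day-partitioned Apache Iceberg table
    # on Google Cloud Storage. Assumes the Iceberg Spark runtime jar is on the
    # classpath; catalog, bucket, and table names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, days

    spark = (
        SparkSession.builder.appName("land-events")
        # Register an Iceberg catalog named "lake" backed by a GCS warehouse path.
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.lake.type", "hadoop")
        .config("spark.sql.catalog.lake.warehouse", "gs://example-lakehouse/warehouse")
        .getOrCreate()
    )

    # Read from a (hypothetical) raw landing zone, then write an Iceberg table
    # partitioned by day of event_ts so downstream queries can prune partitions.
    events = spark.read.parquet("gs://example-landing/events/")
    events.writeTo("lake.analytics.events").partitionedBy(days(col("event_ts"))).createOrReplace()

Partitioning by a day transform on the event timestamp is one common choice for the "partitioning and table design" practices the role calls out; the right transform depends on query patterns and data volume.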

Requirements

  • Strong foundations in data engineering with 2+ years of experience building and operating data pipelines in production.
  • Proficiency in Python and SQL, with the ability to write clear, maintainable, and testable code (a minimal sketch follows this list).
  • Hands-on experience with Google Cloud data services (e.g., BigQuery, Composer/Airflow, Dataflow, or equivalent).
  • Experience working with modern table formats or lakehouse concepts, such as Apache Iceberg or similar.
  • Comfort with software engineering best practices, including Git-based workflows, peer code reviews, and CI/CD.
  • A systems-level mindset that considers data modeling, metadata, cost, performance, and downstream consumers—not just pipeline execution.
  • Conceptual understanding of metadata, lineage, and governance, and why they matter in scalable analytics and AI-driven environments.
  • Curiosity about where data platforms are headed, including how AI systems rely on well-described, well-governed data to understand enterprise context.
  • Strong collaboration skills, with the ability to translate between technical implementation and business meaning.
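
On the "clear, maintainable, and testable code" point above, one common pattern is to keep transformation logic in small pure functions that can be unit-tested (for example, with pytest) without a cluster or cloud credentials. The sketch below is illustrative only; the function and field names are hypothetical.

    # Illustrative sketch of testable pipeline code: a pure transformation
    # function plus a unit test. All names and fields are hypothetical.
    from datetime import datetime, timezone

    def normalize_event(raw: dict) -> dict:
        """Normalize one raw event: trim/lowercase the type, parse the timestamp to UTC."""
        return {
            "event_type": raw["type"].strip().lower(),
            "event_ts": datetime.fromisoformat(raw["ts"]).astimezone(timezone.utc),
            "user_id": int(raw["user_id"]),
        }

    def test_normalize_event():
        raw = {"type": " Click ", "ts": "2024-05-01T12:00:00+00:00", "user_id": "42"}
        assert normalize_event(raw) == {
            "event_type": "click",
            "event_ts": datetime(2024, 5, 1, 12, tzinfo=timezone.utc),
            "user_id": 42,
        }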

Responsibilities

Lakehouse Development
  • Build and maintain production-grade data pipelines that land and organize data into Apache Iceberg tables on Google Cloud.
  • Apply best practices for data modeling, partitioning, and table design to support analytics, BI, and emerging AI use cases.
  • Evolve existing pipelines while maintaining platform stability and reliability.

Data Modeling & Quality
  • Design clear, consistent schemas with an emphasis on usability, longevity, and downstream consumption.
  • Implement data quality checks and validation patterns to ensure datasets are accurate, timely, and trustworthy (see the freshness-check sketch after this list).
  • Make intentional tradeoffs between flexibility and structure as data products mature.

Metadata & Data Discovery
  • Enrich datasets with meaningful metadata, ownership, lineage, and freshness signals.
  • Partner in building a shared data catalog experience that helps users understand what data exists, what it means, and how it should be used.
  • Treat metadata as a first-class asset alongside the data itself.

Cross-Functional Partnership
  • Work closely with Business Intelligence and analytics partners to build curated domain datasets aligned with shared definitions and business logic.
  • Collaborate with platform, analytics, and governance stakeholders to improve data consistency across domains.

Operational Excellence
  • Monitor pipeline performance, cost, and data freshness.
  • Participate in incident response, root cause analysis, and continuous improvement of reliability (KTLO, i.e., keep-the-lights-on work).
  • Contribute to standards, patterns, and documentation that scale beyond individual pipelines.
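
As an illustration of the data quality and freshness bullets above (not onX's actual tooling), here is a minimal sketch of a freshness check against a BigQuery table using the google-cloud-bigquery client library. The project, dataset, table, column, and six-hour SLA are all hypothetical.

    # Illustrative freshness check: fail loudly if a table's newest row is
    # older than an agreed SLA. Table name, column, and SLA are hypothetical.
    from datetime import datetime, timedelta, timezone

    from google.cloud import bigquery

    FRESHNESS_SLA = timedelta(hours=6)  # hypothetical service-level objective

    def check_freshness(client: bigquery.Client, table: str) -> None:
        """Raise if the newest event_ts in `table` is older than the SLA."""
        row = next(iter(client.query(
            f"SELECT MAX(event_ts) AS latest FROM `{table}`"
        ).result()))
        lag = datetime.now(timezone.utc) - row.latest
        if lag > FRESHNESS_SLA:
            raise RuntimeError(f"{table} is stale: newest row is {lag} old")

    check_freshness(bigquery.Client(), "example-project.analytics.events")

A check like this might run as a final task in an orchestrator such as Composer/Airflow so that stale data pages the on-call engineer instead of silently reaching dashboards.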

Benefits

  • Competitive salaries, annual bonuses, equity, and opportunities for growth
  • Comprehensive health benefits including a no-monthly-cost medical plan
  • Parental leave plan of 5 or 13 weeks fully paid
  • 401k matching at 100% for the first 3% you save and 50% from 3-5%
  • Company-wide outdoor adventures and amazing outdoor industry perks
  • Annual “Get Out, Get Active” funds to fuel your active lifestyle in and outside of the gym
  • Flexible time away package that includes PTO, STO, VTO, and 7 paid holidays annually