Principal Data Engineer

QXO · Greenwich, CT

About The Position

We’re looking for bold, entrepreneurial talent ready to help build something extraordinary and reshape the future of building products distribution. QXO is a publicly traded company founded by Brad Jacobs with the goal of building the market-leading company in the building products distribution industry. On April 30, 2025, QXO completed its first acquisition: Beacon Building Products, a leading distributor in the sector. We are building a customer-focused, tech-enabled, and innovation-driven business that will scale rapidly through accretive M&A, organic growth, and greenfield expansion. Our strategy is rooted in delivering exceptional customer experiences, improving operational efficiency, and leveraging data, digital tools, and AI to modernize a historically under-digitized industry.

We are seeking a highly experienced Principal Data Engineer to design and implement a modern data engineering stack that enables scalable, efficient, and high-performance data processing. The ideal candidate will create and optimize scalable data structures for both structured and unstructured data derived from our acquired businesses. This role is critical in supporting machine learning and AI applications, ensuring a robust, well-architected, and optimized data infrastructure that empowers data science and AI teams to develop high-performing models.

As a technical leader, the Principal Data Engineer will architect and build large-scale data systems that efficiently process massive datasets for AI-driven applications. The role requires deep expertise in big data technologies, distributed systems, cloud architecture, and data pipeline optimization, and will drive the design, development, and implementation of next-generation data solutions that enable seamless AI model training, inference, and real-time data processing at scale.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 10+ years of experience in data engineering, data architecture, or system design roles.
  • Proficiency in programming languages such as Python, Java, Scala, or Go.
  • Strong knowledge of SQL and NoSQL databases (PostgreSQL, MongoDB, Cassandra, etc.).
  • Hands-on experience with big data processing frameworks like Apache Spark, Flink, or Kafka (an illustrative sketch follows this list).
  • Experience with cloud platforms (AWS, Azure, GCP) and data warehouses (Snowflake, Redshift, BigQuery).
  • Familiarity with containerization and orchestration tools (Docker, Kubernetes, Airflow, Prefect).
  • Experience integrating data from multiple acquired businesses.
  • Knowledge of data privacy and security best practices.
  • Familiarity with data lakehouse architecture and hybrid storage solutions.
  • Hands-on experience with Infrastructure-as-Code (IaC) tools like Terraform or CloudFormation.
  • Strong analytical and problem-solving skills with a keen eye for system scalability and performance tuning.
  • Excellent communication skills with the ability to mentor junior engineers and work effectively with cross-functional teams.
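
For illustration only, not part of the posting’s requirements: a minimal PySpark batch job sketching the kind of hands-on big data work described above. The dataset path, column names, and aggregation are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical example: aggregate daily branch-level sales from raw
    # orders landed in cloud object storage. Paths and columns are invented.
    spark = SparkSession.builder.appName("daily_branch_sales").getOrCreate()

    orders = spark.read.parquet("s3://example-bucket/raw/orders/")

    daily_sales = (
        orders
        .filter(F.col("status") == "COMPLETED")
        .groupBy("branch_id", F.to_date("order_ts").alias("order_date"))
        .agg(
            F.sum("net_amount").alias("net_sales"),
            F.countDistinct("order_id").alias("order_count"),
        )
    )

    # Partitioned output can feed warehouse loads or ML feature pipelines.
    (daily_sales.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-bucket/curated/daily_branch_sales/"))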

Nice To Haves

  • Experience working with AI/ML pipelines and model deployment strategies.

Responsibilities

  • Design and implement a modern, scalable data infrastructure to support structured and unstructured data ingestion, storage, and processing needed to build forecast, optimization, and simulation models for various business areas, including supply chain, network design, inventory, S&OP, labor, assortment, and pricing.
  • Partner with scientists to create self-service data science tooling and leverage agentic AI to integrate multiple models into a unified platform for business managers and analysts. Coach and train business analysts to use AI models and data science tooling effectively.
  • Design and implement data engineering solutions hands-on, using AI, while leading a high-caliber internal team of employees and contractors.
  • Define work for external partners/vendors and act as a product manager as needed.
  • Engage closely with scientists to understand the data needs of models, determine the necessary data inputs, ensure their seamless integration, and iterate rapidly on data requirements.
  • Determine how downstream systems will consume AI models and outputs while collaborating with scientists (or vendors) to deploy models in production at scale.
  • Partner with application engineers to integrate AI model outputs into engineering systems efficiently.
  • Prioritize the value of data and AI solutions by working closely with business operations teams.
  • Build scalable, high-performance data systems and architect enterprise-wide solutions.
  • Design and implement machine learning and AI pipelines, ensuring optimal data acquisition, wrangling, and structuring using AI-driven techniques.
  • Use AI to develop highly scalable and automatable ETL (Extract, Transform, Load) pipelines that efficiently handle large volumes of data from diverse sources (see the orchestration sketch after this list).
  • Leverage modern cloud-based solutions (AWS, Azure, GCP) and big data technologies (Spark, Hadoop, Snowflake, Databricks) to enhance data processing and storage.
  • Develop efficient data models while ensuring proper governance, data quality, security, and compliance with industry standards to support business forecasting, optimization, and simulations.
  • Architect and implement real-time streaming solutions and batch processing frameworks for AI model development (see the streaming sketch after this list).
  • Work closely with data scientists, AI researchers, software engineers, and business stakeholders to translate requirements into scalable data solutions.
  • Continuously monitor and optimize data infrastructure for cost-efficiency, scalability, and performance improvements.
  • Implement automation and CI/CD pipelines for data engineering workflows to improve deployment efficiency and reliability.
  • Establish best practices for data engineering, including testing methodologies, deployment procedures, and documentation standards.
  • Create and maintain data catalogs and metadata management systems to ensure data discoverability and understanding across the organization.
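
As a rough sketch of the ETL orchestration work in the responsibilities above, the skeleton below uses Apache Airflow, one of the orchestrators named in the requirements. The DAG id, schedule, and task bodies are placeholders, and the schedule argument assumes Airflow 2.4+.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Hypothetical ETL skeleton: extract from a source system, transform,
    # and load into the warehouse. Task bodies are placeholders.
    def extract():
        ...  # pull raw records from a source system

    def transform():
        ...  # clean, conform, and enrich the extracted data

    def load():
        ...  # write curated data to the warehouse

    with DAG(
        dag_id="example_daily_etl",
        start_date=datetime(2025, 1, 1),
        schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        t_extract >> t_transform >> t_load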

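For the real-time streaming responsibility, a minimal Spark Structured Streaming sketch that consumes a Kafka topic and appends raw events to object storage. The broker address, topic, and paths are invented, and the job assumes the spark-sql-kafka connector is on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical streaming skeleton: consume events from Kafka and append
    # them to object storage for near-real-time features. Names are invented.
    spark = SparkSession.builder.appName("inventory_events_stream").getOrCreate()

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "inventory-events")
        .load()
    )

    # Kafka delivers key/value as binary; cast to strings for downstream use.
    parsed = events.select(
        F.col("key").cast("string").alias("key"),
        F.col("value").cast("string").alias("payload"),
        "timestamp",
    )

    query = (
        parsed.writeStream
        .format("parquet")
        .option("path", "s3://example-bucket/streams/inventory_events/")
        .option("checkpointLocation", "s3://example-bucket/checkpoints/inventory_events/")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()
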
Benefits

  • 401(k) with employer match
  • Annual performance bonus and long-term incentive (equity)
  • Medical, dental, and vision insurance
  • PTO, company holidays, and parental leave
  • Paid Time Off/Paid Sick Leave: Applicants can expect to accrue 15 days of paid time off during their first year (4.62 hours for every 80 hours worked) and increased accruals after five years of service.
  • Paid training and certifications
  • Legal assistance and identity protection
  • Pet insurance
  • Employee assistance program (EAP)