Staff Machine Learning Platform Engineer

Faire
San Francisco, CA
Hybrid

About The Position

Faire is an online wholesale marketplace built on the belief that the future is local: independent retailers around the globe are doing more revenue than Walmart and Amazon combined, but individually, they are small compared to these massive entities. At Faire, we're using the power of tech, data, and machine learning to connect this thriving community of entrepreneurs across the globe. Picture your favorite boutique in town: we help them discover the best products from around the world to sell in their stores. With the right tools and insights, we believe that we can level the playing field so that small businesses everywhere can compete with these big box and e-commerce giants. By supporting the growth of independent businesses, Faire is driving positive economic impact in local communities, globally. We’re looking for smart, resourceful and passionate people to join us as we power the shop local movement. If you believe in community, come join ours.

As a Staff Machine Learning Platform Engineer, you will help design, improve, and operate a scalable ML platform to accelerate model training, deployment, and governance. You are the technical bridge between data science and production engineering. You’ll be joining a small but deeply critical team that scales Faire’s ability to support tens of thousands of local businesses in a constantly narrowing retail landscape.

Requirements

  • 8+ years of experience building production ML or data platforms
  • A degree (preferably graduate level) in Computer Science, Engineering, Statistics, or a related technical field
  • Strong hands-on expertise with Databricks, Spark, Delta Lake, and MLflow
  • Proficiency in Python, SQL, and distributed systems concepts
  • Experience with cloud platforms and infrastructure-as-code
  • Solid understanding of MLOps best practices: CI/CD, monitoring, reproducibility, and security
  • Experience supporting multiple ML teams in a shared platform environment
  • Active ownership of orphaned problems and a willingness to learn whatever you’re missing to get the job done

Responsibilities

  • Design and operate ML infrastructure, including workspaces, clusters, jobs, and workflows
  • Productionize ML workloads using Spark, Delta Lake, MLflow, and Databricks Workflows
  • Teach data scientists how to use our ML platform to move our most critical models from notebook to production
  • Implement Unity Catalog for data governance, lineage, access control, and secure multi-tenant usage
  • Build CI/CD pipelines for ML using Terraform and Git-based workflows (e.g., GitHub Actions)
  • Optimize performance, reliability, and cost across training and inference workloads
  • Configure Identity and Access Management (IAM) and Role-Based Access Control (RBAC) for sensitive data sets
  • Establish observability for data quality, model performance, and platform health
  • Build and maintain ML Platform technical documentation