Infrastructure Engineer (AI Platforms)

Galaxy
New York, NY
$220,000 - $250,000

About The Position

We are building a large-scale, enterprise-grade AI platform that supports advanced machine learning systems end-to-end—from model development and training to secure, reliable, production deployment. We are looking for a senior infrastructure leader who has designed and operated modern ML platforms at scale and understands the operational, security, and reliability challenges of real-world AI systems. This role sits at the intersection of infrastructure engineering, MLOps, DevOps, and security, with a mandate to build systems that are fast for developers, safe for the business, and resilient in production.

Requirements

  • 10+ years of experience in infrastructure, platform, or systems engineering.
  • Deep, hands-on experience with Kubernetes in production.
  • Strong background in MLOps and ML platform operations.
  • Experience across DevOps, CI/CD, and cloud infrastructure (AWS, GCP, or Azure).
  • Proven ability to design secure, enterprise-grade systems.
  • Proficiency in Go, Python, Java, or similar.

Nice To Haves

  • Experience supporting GPU-accelerated workloads.
  • Familiarity with security and compliance frameworks.
  • Experience building internal platforms for multiple teams.
  • Prior work on enterprise-facing AI platforms.

Responsibilities

  • Architect and own the core infrastructure for AI and ML workloads, including training, inference, experimentation, and evaluation.
  • Design and operate Kubernetes-based platforms for scalable, reliable, and cost-efficient AI workloads.
  • Build and evolve MLOps pipelines: model training, versioning, deployment, monitoring, rollback, and lifecycle management.
  • Establish best practices for DevOps and CI/CD across data, ML, and application layers.
  • Lead security and compliance for AI systems: secrets management, access controls, isolation, auditability, and supply-chain security.
  • Ensure high availability, observability, and incident response for production AI services.
  • Partner closely with ML engineers, researchers, and application teams.
  • Set technical direction, review designs, and mentor engineers.

Benefits

  • Own and shape the foundation of a next-generation AI platform.
  • High-impact, high-autonomy role with architectural authority.
  • Competitive compensation and long-term growth.