ML Ops Developer III

Ouro
Austin, TX

About The Position

The MLOps Developer will join a centralized MLOps Engineering team responsible for productionizing machine learning and generative AI workloads at enterprise scale. In this role, you will drive the design, automation, deployment, observability, and governance of ML and LLM platforms built on AWS SageMaker and Amazon Bedrock, collaborating closely with Data Science (DS) teams to support model development, training, validation, and deployment into production. You will also evolve and optimize ML workflows, continuously improving automation, reliability, and security to meet emerging business and platform requirements.

Requirements

  • Advanced proficiency in Python
  • Strong experience in Terraform (IaC) for AWS infrastructure automation
  • Hands-on knowledge of CI/CD, DevOps, and deployment governance
  • Experience with the AWS ML/AI ecosystem (SageMaker, Bedrock, IAM, VPC, EKS, Lambda, CloudWatch), including cloud security and monitoring
  • Practical experience with ML model deployment, endpoints, and production support
  • Solid understanding of cloud security, networking, logging, and observability
  • Knowledge of MLOps best practices and ML system design

Nice To Haves

  • 7+ years of experience in AI/ML engineering or platform roles
  • Experience with AWS SageMaker endpoints, pipelines, and model hosting
  • Experience integrating, orchestrating, or governing LLM workloads using Amazon Bedrock
  • Prior experience with ML deployments in production environments
  • Familiarity with Terraform modules and EKS-based deployments
  • Knowledge of ML observability, monitoring, and failure detection
  • Experience in FinTech or enterprise data platforms

Responsibilities

  • Architect, deploy, and operate development and production MLOps platforms on AWS (SageMaker, Bedrock)
  • Build and maintain CI/CD pipelines for ML model training and deployment
  • Implement Infrastructure as Code (IaC) using Terraform
  • Manage AWS cloud components, including IAM, VPC, EKS, Lambda, security, networking, monitoring, and compliance
  • Automate the end-to-end ML model lifecycle (training, deployment, endpoints, monitoring, and failure detection)
  • Configure and manage cloud observability (logging, alerting, and dashboards) using CloudWatch and related monitoring tools
  • Enable secure LLM onboarding, prompt orchestration, and governance using Amazon Bedrock
  • Ensure platform reliability, scalability, security, and regulatory compliance
  • Partner with DS and Engineering to support ML model productionization and release governance