ML Ops Developer III

Ouro
Austin, TX

About The Position

The MLOps Developer will join a centralized MLOps Engineering team responsible for productionizing machine learning and generative AI workloads at enterprise scale. In this role, you will drive the design, automation, deployment, observability, and governance of ML and LLM platforms built on AWS SageMaker and Amazon Bedrock, collaborating closely with Data Science (DS) teams to support model development, training, validation, and deployment into production. You will also evolve and optimize ML workflows, continuously improving automation, reliability, and security to meet emerging business and platform requirements.

Requirements

  • Advanced proficiency in Python
  • Strong experience in Terraform (IaC) for AWS infrastructure automation
  • Hands-on knowledge of CI/CD, DevOps, and deployment governance
  • Experience with the AWS ML/AI ecosystem (SageMaker, Bedrock, IAM, VPC, EKS, Lambda, CloudWatch), including cloud security and monitoring
  • Practical experience with ML model deployment, endpoints, and production support
  • Solid understanding of cloud security, networking, logging, and observability
  • Knowledge of MLOps best practices and ML system design

Nice To Haves

  • 7+ years of experience in AI/ML engineering or platform roles
  • Experience with AWS SageMaker endpoints, pipelines, and model hosting
  • Experience integrating, orchestrating, or governing LLM workloads using Amazon Bedrock
  • Prior experience with ML deployments in production environments
  • Familiarity with Terraform modules and EKS-based deployments
  • Knowledge of ML observability, monitoring, and failure detection
  • Experience in FinTech or enterprise data platforms

Responsibilities

  • Architect, deploy, and operate development and production MLOps platforms on AWS (SageMaker, Bedrock)
  • Build and maintain CI/CD pipelines for ML model training and deployment
  • Implement Infrastructure as Code (IaC) using Terraform
  • Manage AWS cloud components, including IAM, VPC, EKS, Lambda, security, networking, monitoring, and compliance
  • Automate the end-to-end ML model lifecycle (training, deployment, endpoints, monitoring, and failure detection)
  • Configure and manage cloud observability (logging, alerting, and dashboards) using CloudWatch and related monitoring tools
  • Enable secure LLM onboarding, prompt orchestration, and governance using Amazon Bedrock
  • Ensure platform reliability, scalability, security, and regulatory compliance
  • Partner with DS and Engineering to support ML model productionization and release governance