Cloud DevOps Engineer

Liminal•Salt Lake City, UT

$180,000 - $210,000•Hybrid

About The Position

We’re looking for a Cloud DevOps & AI Ops Engineer who fully owns the infrastructure and operational lifecycle for our platform, from code deployment to production AI systems. You take end-to-end responsibility for how systems are built, deployed, scaled, and maintained in production. This is not a maintenance-only role.

You will:

Diagnose issues across cloud infrastructure, data pipelines, and AI systems
Design and operate CI/CD pipelines for fast, reliable releases
Build and manage scalable infrastructure on GCP using Terraform
Implement and support AI-powered workflows, including LLMs and agent-based systems
Monitor, debug, and optimize production systems across infrastructure and AI workloads

You are both the infrastructure architect and the hands-on engineer, ensuring our systems, including AI, run reliably in production.

This is a high-impact hire. You’ll define how infrastructure and AI systems operate at scale: establishing best practices, building automation, and shaping how engineering teams leverage AI in production. This role is based in Salt Lake City and reports to the VP of Engineering.

What Success Looks Like

In Your First 30 Days

Audit existing infrastructure, CI/CD pipelines, and deployment workflows
Understand current data pipelines, ML/LLM usage, and AI workflows
Identify reliability risks, bottlenecks, and gaps in automation and observability
Document system architecture and operational standards
Propose improvements to increase stability, speed, and AI system reliability

In Your First 90 Days

Improve CI/CD pipelines to enable faster, safer deployments
Deploy and manage infrastructure using Terraform and GCP best practices
Implement monitoring and alerting across infrastructure and AI systems
Support and productionize AI workflows and LLM-powered features
Reduce manual work through automation and reusable tooling

In Your First Year

Build a scalable, repeatable infrastructure and AI operations framework
Improve uptime, deployment frequency, and system reliability
Establish DevOps and AI Ops best practices across engineering
Enable reliable deployment of AI systems and agent workflows
Serve as the go-to expert for infrastructure, performance, and AI system operations

Requirements

8+ years of experience in DevOps, cloud infrastructure, platform engineering, or AI Ops within SaaS or cloud-based environments
Strong hands-on experience with: GCP (Cloud Run, BigQuery, etc.); Terraform or similar IaC tools; CI/CD systems (GitHub Actions, GitLab CI/CD); Docker and Kubernetes; data pipelines and distributed systems
Experience working with AI systems, including: deploying or supporting ML/LLM systems in production; AI-assisted engineering tools (Claude Code, Cursor, Codex, etc.); understanding of agent workflows or AI tooling ecosystems
Experience with monitoring, logging, and alerting systems (e.g., Datadog)
Strong scripting skills (Python, Bash, or similar)
Understanding of IAM, security, and cloud best practices
Ability to troubleshoot complex production issues across systems
Clear communication and collaboration skills
A bias toward ownership — you solve problems end-to-end

Nice To Haves

Experience with agent frameworks (LangChain, LangGraph, CrewAI, etc.)
Experience building AI-powered automation or internal tooling
Experience with serverless and cloud-native architectures
Experience with ML/LLM lifecycle management (evaluation, monitoring, versioning)
Experience scaling infrastructure in a high-growth environment

Responsibilities

Own the full lifecycle of infrastructure, deployment systems, and AI operations
Design, build, and maintain CI/CD pipelines (GitHub Actions, GitLab CI/CD)
Deploy and manage cloud infrastructure on GCP using Terraform
Build and maintain data pipelines supporting ML and AI workflows
Design and operate AI-powered workflows, including LLM integrations and agents
Support tool orchestration, prompt/context management, and AI-enabled systems
Build internal automation to improve engineering productivity using AI
Implement containerized systems using Docker and Kubernetes
Monitor and optimize systems using tools like Datadog
Troubleshoot production issues across cloud infrastructure, CI/CD pipelines, data pipelines, and AI systems (latency, failures, reliability)
Partner with engineering, data, and product teams to productionize AI capabilities
Drive adoption of DevOps, AI Ops, and automation best practices