Lead, AI Production Services

AECOM | Houston, TX
$220,000 - $250,000 | Hybrid

About The Position

Own and build the enterprise AI Operations practice, ensuring that all production AI, agentic, and automation solutions are reliable, observable, well-governed, and continuously improving. This is a hands-on leadership role responsible for building the AI Operations function from the ground up, including implementing frameworks, observability, and operational playbooks.

This leader defines and operationalizes AI Ops standards, frameworks, and capabilities across the organization, establishing clear accountability, visibility, and control over AI systems at scale. Serve as the functional leader for AI Operations, partnering closely with Product, Engineering, AI Transformation, and Delivery teams to ensure AI solutions are production-ready, resilient, and aligned to enterprise standards. Drive the implementation of the governance, observability, and service management practices required to scale AI safely across the business.

This role establishes the operational foundation required to scale AI across the enterprise with confidence. By ensuring the reliability, visibility, and control of AI and agentic systems in production, this leader enables widespread adoption while minimizing operational risk, controlling costs, and driving continuous improvement toward a more autonomous and efficient organization.

This position offers a flexible hybrid schedule that combines in-office presence with telecommute/virtual work, and will be based in either Houston or Dallas, TX.

Requirements

  • Bachelor's degree plus extensive experience in enterprise IT operations, service management, reliability engineering, or production support, including 6+ years of overall leadership experience that includes leading operations for AI/ML/agentic production systems in large-scale environments, or demonstrated equivalent experience and education.
  • Proven experience defining and governing operations frameworks, standards, and operating models across teams.
  • Deep knowledge of AI/agentic production challenges (LLM observability, agent behavior governance, RAG/prompt drift, orchestration risks, cost management).
  • Expertise in ITIL practices, observability (e.g., Prometheus, Grafana, OpenTelemetry), incident/change management, SLAs, and supplier governance.
  • Strong background in risk management, FinOps/Cloud cost optimization, and executive-level reporting.

Nice To Haves

  • Hands-on experience with AI platforms like Azure AI Foundry, AWS Bedrock, LangChain, or UiPath in production.
  • Knowledge of Responsible AI operations, agentic risk governance, and emerging AI standards.
  • Background in site reliability engineering (SRE), DevOps, or enterprise architecture.
  • Experience with hybrid/multi-cloud environments and supplier management.
  • Advanced degree in Computer Science, Engineering, or related field.

Responsibilities

  • Own the Enterprise AI Operations Practice End-to-End
  • Drive Production Reliability, Support, and Governance
  • Lead Observability, Optimization, and Continuous Improvement
  • Establish Enterprise Reporting and Operational Reviews
  • Partner with Product, Delivery, and Technical Teams
  • Mature AI Operations as Capabilities Evolve
  • Build AI Operations Capability Across the Organization

Benefits

  • AECOM benefits may include medical, dental, vision, life, AD&D, and disability benefits, paid time off, leaves of absence, voluntary benefits, perks, flexible work options, well-being resources, an employee assistance program, business travel insurance, service recognition awards, a retirement savings plan, and an employee stock purchase plan.