About The Position

About the Organization: Our Technology Organization is a global community of problem solvers and innovators shaping the future of digital platforms and intelligent systems. We operate highly scalable, secure, and resilient cloud ecosystems that support mission-critical services used by millions of users and partners worldwide. As part of our organization, you will work on large-scale distributed systems, cloud-native platforms, and AI-enabled services, solving complex challenges in reliability, automation, security, and developer productivity. Your work will directly influence how modern cloud and GenAI-powered platforms are designed, deployed, and operated at enterprise scale.

The Opportunity: We are seeking a seasoned DevOps Leader who is strategic, hands-on, and forward-thinking, with deep expertise in cloud technologies, platform engineering, and Generative AI enablement. In this role, you will lead the evolution of our DevOps and platform engineering capabilities, driving cloud modernization, CI/CD excellence, operational resilience, and AI-assisted automation. You will partner closely with Engineering, Product, Security, and Data teams to enable faster, safer, and smarter delivery of software at scale. This is a high-impact leadership role for someone who thrives at the intersection of technology strategy, people leadership, and engineering execution.

The Work Itself:

  • Define and execute the DevOps and Platform Engineering strategy to support cloud-native, microservices, and AI-driven workloads at enterprise scale.
  • Design and govern highly available, secure, and scalable cloud architectures that meet stringent reliability, performance, and compliance requirements.
  • Lead the adoption of Infrastructure as Code (IaC), CI/CD pipelines, and automated release management to accelerate delivery while improving quality and stability.
  • Champion Site Reliability Engineering (SRE) principles, including observability, error budgets, incident management, and continuous reliability improvement.
  • Enable Generative AI and ML workloads by building and operating platforms that support model training, deployment, inference, and experimentation.
  • Drive the integration of GenAI capabilities into DevOps workflows (e.g., AI-assisted monitoring, incident response, pipeline optimization, and developer productivity).
  • Partner cross-functionally with Engineering, Security, Product, and Data teams to ensure DevSecOps best practices are embedded across the software lifecycle.
  • Establish and evolve platform standards, reference architectures, and reusable frameworks to improve consistency and reduce operational friction.
  • Lead, mentor, and grow high-performing DevOps and platform engineering teams, fostering a culture of ownership, learning, and innovation.
  • Influence organizational change by promoting cloud-first, automation-first, and AI-enabled operating models.

Requirements

  • 8+ years of relevant work experience with a Bachelor’s Degree, or at least 5 years of work experience with an Advanced Degree (e.g., Master’s, MBA, JD, MD), or 2 years of work experience with a PhD, or 11+ years of relevant work experience.
  • Strong background in CI/CD-driven DevOps practices, supporting n-tier and microservices-based architectures across development, staging, and production environments.
  • Hands-on expertise with major cloud platforms such as AWS, Azure, or Google Cloud, including compute, storage, networking, IAM, and managed PaaS services.
  • Advanced experience with Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, ARM, or Pulumi for automated, repeatable deployments.
  • Proven proficiency in containerization and orchestration technologies, including Docker, Kubernetes, Helm, and Kubernetes Operators, with experience operating clusters at scale.
  • Strong experience implementing and maintaining CI/CD pipelines using tools such as Jenkins, GitHub Actions, GitLab CI/CD, Azure DevOps, or similar, integrated with artifact repositories and security scanning.
  • Solid understanding of application runtime environments, including experience supporting Java/J2EE, Spring Boot, or other modern frameworks in cloud and containerized ecosystems.
  • Expertise in API-based and event-driven architectures, including RESTful services, JSON/XML, and messaging/streaming platforms such as Kafka, with experience in installation, configuration, monitoring, and troubleshooting.
  • Experience with observability and reliability engineering, including logging, monitoring, alerting, and tracing using tools such as Prometheus, Grafana, ELK/EFK, CloudWatch, or Azure Monitor.
  • Strong knowledge of databases and data platforms, including RDBMS (Oracle, SQL Server, PostgreSQL) and NoSQL technologies (MongoDB, DynamoDB, Cassandra).
  • Working knowledge of security best practices in cloud and DevOps, including IAM, secrets management, encryption, vulnerability scanning, and DevSecOps pipelines.
  • Experience supporting AI/ML or GenAI workloads, including deploying and operating model-serving pipelines, MLOps platforms, or LLM-based applications in cloud environments.
  • Proficiency with GenAI technologies, including experience integrating LLMs, AI APIs, vector databases, or AI-enabled automation into DevOps workflows.
  • Strong foundation in cloud computing concepts, with a deep understanding of IaaS, PaaS, SaaS, and public, private, and hybrid cloud deployment models.
  • Demonstrated ability to deliver highly reliable, scalable, and secure systems, with a strong commitment to quality, automation, and operational excellence.
  • Ability to manage multiple priorities and deliver across concurrent initiatives in a fast-paced, evolving environment.
  • Excellent communication, collaboration, and stakeholder engagement skills, including the ability to present technical concepts to both technical and non-technical audiences.
  • Proven experience working in Agile/Scrum environments, actively participating in sprint planning, reviews, retrospectives, and continuous improvement efforts.
  • Proven experience leading DevOps, SRE, or Platform Engineering teams in complex, enterprise environments.
  • Deep hands-on and architectural experience with AWS, Azure, or Google Cloud, including IaaS, PaaS, and managed services.
  • Strong background in CI/CD, Infrastructure as Code, containerization, and orchestration (Docker, Kubernetes, Helm).
  • Experience supporting AI/ML or Generative AI workloads, including model deployment, inference platforms, or AI APIs.
  • Demonstrated success building scalable platforms, automation frameworks, and developer enablement tools.
  • Strong understanding of DevSecOps, IAM, secrets management, compliance, and secure cloud architectures.
  • Comfortable challenging the status quo and driving cultural transformation toward automation and cloud-native practices.
  • Continuous curiosity to explore emerging technologies, especially in GenAI, platform engineering, and cloud optimization.
  • Strong partnership skills, working effectively with Product, Engineering, Security, Data, and Agile/Scrum teams.
  • Excellent ability to articulate complex technical concepts to executive, technical, and non-technical stakeholders.

Nice To Haves

  • 9 or more years of relevant work experience with a Bachelor’s Degree, or 7 or more years of relevant experience with an Advanced Degree (e.g., Master’s, MBA, JD, MD), or 3 or more years of experience with a PhD.
  • Experience leading DevOps initiatives, mentoring engineers, or owning platform-level components is highly desirable.

Responsibilities

  • Provides strategic technical leadership for cloud infrastructure, DevOps tooling, CI/CD platforms, and operational practices.
  • Establishes and governs DevOps, SRE, and cloud standards, ensuring alignment with security, compliance, and business objectives.
  • Leads discussions with engineering and product leaders to define platform roadmaps and recommend scalable, cost-effective solutions.
  • Evaluates, selects, and drives adoption of cloud services, DevOps tools, and GenAI platforms aligned to organizational needs.
  • Designs and oversees end-to-end CI/CD pipelines, including build, test, security scanning, deployment, and rollback strategies.
  • Implements observability and reliability frameworks, ensuring proactive detection, rapid response, and continuous improvement.
  • Champions technology innovation, including proofs of concept (PoCs) for emerging tools and Generative AI use cases.
  • Guides teams in building self-service platforms, reusable modules, and automation frameworks that improve developer experience.
  • Ensures operational excellence through capacity planning, cost optimization (FinOps), incident postmortems, and resilience testing.
  • Keeps abreast of emerging cloud, DevOps, and AI trends and translates them into actionable platform improvements.