About The Position

Are you a creative person who loves a challenge? Solve the complex puzzles you’ve been dreaming of as our Engineer. If you have a passion for innovation in tech, we want you on our team! Thrive in this crucial automation role.

OCI is a technology leader that’s changing how we build, deliver, and operate compute and AI infrastructure for our customers. We’re looking for an experienced and self-motivated person. We appreciate you taking the time to review the list of qualifications and to apply for the position. Come and join us in building on our Cloud momentum in OCI Compute!

This team is central to business success in building, scaling, and operating some of the largest CPU and GPU infrastructure in the world. This role is an essential part of operating at scale with utmost excellence and a relentless focus on automation and efficiency. It is a critical role that is expected to be a force multiplier for a large, geographically distributed Cloud Operations organization.

As a Senior Principal Site Reliability Engineer, you will be responsible for defining and deploying key services with a deep focus on architecture, production operations, capacity planning, performance management, deployment, and release engineering. You will work with multiple cross-functional teams, helping deliver new and outstanding experiences to our collaborators while ensuring reliability and performance.

Requirements

  • Developing/operating large scale distributed services / applications
  • Container administration and development applying Kubernetes, Docker, Mesos, or similar
  • Infrastructure automation through Terraform, Chef, Ansible, Puppet, Packer or similar
  • Prior experience with or in-depth knowledge of AIOps to improve operational efficiency
  • Experience with CI/CD pipelines, including VCS (Git, SVN, etc.), GitLab Runners, Jenkins, and Rundeck
  • Working with or supporting production, test, and development environments for medium to large user environments
  • Experience in developing scripts to automate software deployments and installations using PowerShell or Bash
  • Knowledge of cloud compute technologies, network monitoring, data processing and analytics
  • Experience with a modern programming language such as Java, Python, C++, or equivalent
  • Experience working with fault tolerant, highly available, high throughput, distributed, scalable systems
  • Experience operating services in one of the major Clouds, such as AWS, OCI, or Azure

Responsibilities

  • Defining and deploying key services with a deep focus on architecture, production operations, capacity planning, performance management, deployment, and release engineering
  • Working with multiple cross-functional teams, helping deliver new and outstanding experiences to our collaborators while ensuring reliability and performance

Benefits

  • flexible medical
  • life insurance
  • retirement options
  • volunteer programs
© 2024 Teal Labs, Inc