Staff Site Reliability Engineer

Zscaler•San Jose, CA

8h•$119,000 - $170,000•Hybrid

About The Position

Zscaler is a pioneer and global leader in zero trust security. The world’s largest businesses, critical infrastructure organizations, and government agencies rely on Zscaler to secure users, branches, applications, data & devices, and to accelerate digital transformation initiatives. Distributed across more than 160 data centers globally, the Zscaler Zero Trust Exchange platform combined with advanced AI combats billions of cyber threats and policy violations every day and unlocks productivity gains for modern enterprises by reducing costs and complexity.

Here, impact in your role matters more than title and trust is built on results. We believe in transparency and value constructive, honest debate; we’re focused on getting to the best ideas, faster. We build high-performing teams that can make an impact quickly and with high quality. To do this, we are building a culture of execution centered on customer obsession, collaboration, ownership and accountability. We champion an “AI Forward, People First” philosophy to help us accelerate and innovate, empowering our people to embrace their potential. If you’re driven by purpose, thrive on solving complex challenges and want to make a positive difference on a global scale, we invite you to bring your talents to Zscaler to help shape the future of cybersecurity.

Role

We are looking for a Staff Site Reliability Engineer (Automation) to join our Engineering team. This is a hybrid role based in San Jose, CA (3 days in office), reporting to the Director, Site Reliability Engineer. You will be a key driver in provisioning and deploying new infrastructure, focusing heavily on infrastructure automation. Your expertise will help manage how customer traffic is routed within the cloud and ensure seamless troubleshooting across hardware and automated systems.

Requirements

5+ years of relevant experience in site reliability or systems engineering
Proficiency in Python or Ansible for automation tasks, and in working with external APIs
Demonstrated experience building and maintaining automation solutions
Strong background in systems administration, specifically with Linux or other major operating systems
Bachelor’s degree in Computer Science, a related field, or equivalent practical experience

Nice To Haves

Hands-on experience with Systems Kickstart using PXE, and with monitoring and observability tools such as Prometheus, Grafana, or Nagios

Responsibilities

Manage and maintain large-scale distributed systems using an infrastructure-as-code approach
Develop and enhance tools to automate the deployment and management of large-scale services, focusing on reliable system architecture and maintaining high code quality
Diagnose and resolve issues by editing code, adjusting infrastructure configurations, conducting performance and network analysis, and creating reusable tools
Develop automation solutions and manage services efficiently using version-controlled infrastructure-as-code
Support mission-critical services and participate in on-call rotations as needed