Cybersecurity Platform SRE – Senior Manager

Wells Fargo•Easton, OH

Hybrid

About The Position

Wells Fargo is back in the office collaborating for fabulous outcomes! This is a hybrid position with three days a week in the office. No visa sponsorship or visa transfers are available for this role.

Wells Fargo is seeking a Cybersecurity Senior Manager, Executive Director to lead the Site Reliability Engineering (SRE) function for our Identity & Access Management (IAM) Platform Team, supporting a 24×7 global enterprise. You will own the reliability, availability, performance, and security posture of mission‑critical IAM services (e.g., access administration, privileged access, authentication/authorization, directory services) that protect hundreds of thousands of users and millions of access decisions daily.

This role blends platform engineering, reliability operations, and cyber risk management. You'll lead a global team of SRE managers/engineers, partner with Cybersecurity, Cloud, and Line‑of‑Business technology teams, and drive measurable outcomes for resiliency, service level objectives (SLOs), operational excellence, and regulatory compliance.

Requirements

7+ years of Information Security Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
3+ years of management or leadership experience
Deep experience operating cybersecurity platforms at scale, or similar experience (e.g., identity lifecycle, authentication/federation, directory/PKI, privileged access, secrets, cyber defense).
Proven track record of establishing SLOs/SLIs and error budgets and of driving reliability improvements through automation and engineering.
Experience managing cloud (AWS/Azure/GCP) and/or containerized workloads (Kubernetes), infrastructure as code (Terraform/CloudFormation), and CI/CD.
Demonstrated leadership in Major Incident response, post‑incident reviews, and production change management/governance in regulated environments.
Strong understanding of security controls, identity governance, least privilege, and regulatory/audit expectations.
Excellent communication skills with the ability to influence senior executives and partner across Cybersecurity, Risk, and Engineering.

Nice To Haves

Experience with one or more enterprise IAM stacks (e.g., SailPoint/IGA, Okta/Azure AD/Entra ID, CyberArk/BeyondTrust, HashiCorp Vault, enterprise PKI, LDAP/AD).
Expertise with observability platforms (e.g., Prometheus/Grafana, Splunk, Elastic, OpenTelemetry) and SRE practices (golden signals, chaos engineering).
Proficiency in at least one programming/scripting language (Python, Go, PowerShell, Bash) and GitOps workflows.
Background with zero trust architectures, strong authentication (FIDO2, MFA), and modern authorization (OIDC/OAuth2, ABAC/RBAC).
Experience in financial services or other highly regulated industries.
Relevant certifications (e.g., CISSP, CCSP, CISM, SRE/DevOps, cloud provider certs).

Responsibilities

Own IAM platform reliability end‑to‑end: define SLOs/SLIs, error budgets, capacity plans, and resiliency roadmaps for core services (e.g., access administration, identity lifecycle, authentication, federation, PAM, directories).
Lead a 24×7 follow‑the‑sun operation: build and mature global on‑call rotations, incident response, and Major Incident Management (MIM) practices with clear RACI and runbooks.
Engineer for resiliency: champion chaos testing, failure mode analysis, multi‑region/high‑availability patterns, DR/BCP validation, and automated health checks.
Automate everything: drive “operations as code,” CI/CD for platform changes, immutable infrastructure, policy‑as‑code, automated compliance checks, and self‑service tooling for developers and operators.
Manage risk & compliance: ensure controls alignment (e.g., SOX, FFIEC, GLBA, PCI where applicable), identity governance, separation of duties, and auditable change management.
Elevate observability: standardize logging, metrics, tracing, and actionable alerting; implement service catalogs, golden signals, and error‑budget policies.
Optimize cost & performance: track usage, right‑size capacity, and tune platform configurations while maintaining security and reliability objectives.
Lead people & culture: attract, develop, and retain diverse SRE talent; set objectives, coach managers/ICs, and foster a blameless post‑incident culture with continuous learning.
Partner & influence: work with product owners, security architects, enterprise architecture, and risk partners to align roadmaps and deliver high‑impact outcomes.
Stakeholder communication: deliver clear status, metrics, and executive‑ready updates on risk, reliability, and remediation programs.