Site Reliability Engineer Lead

U.S. Bank National Association, Hopkins, MN

About The Position

The Site Reliability Engineering (SRE) Lead is accountable for the stability, resilience, and technology currency of a defined CIO portfolio. This role is the gatekeeper for production reliability and technology obsolescence, owning how incidents, risks, and lifecycle decisions are identified, prioritized, and resolved across product and platform teams. Splitting time between program leadership (70%) and hands-on technical engagement (30%), the SRE Lead drives reliability outcomes through clear governance, disciplined execution, and targeted technical intervention.

The role partners closely with engineering teams, product leaders, risk and change governance functions, and executive stakeholders to ensure systems are reliable, well understood, and aligned with enterprise standards. The SRE Lead applies SRE, DevOps, SDLC, and ITSM best practices, using platforms such as ServiceNow for incident, change, problem, risk, and lifecycle management, to reduce incidents, improve change quality, and mitigate obsolescence risk. Transparency is maintained through metrics, dashboards, and regular leadership reporting that clearly communicate dependencies, risks, progress, and outcomes.

Requirements

  • Bachelor's degree, or equivalent work experience
  • Eight or more years of relevant work experience in business and risk analysis, IT Service Management, production support, product/project management, or application development
  • Strong understanding of organizational processes, customer needs, and business operations
  • Demonstrated ability to influence outcomes without direct authority and foster strong partnerships across business and technology teams
  • Ability to solve complex problems with minimal guidance and drive decisions through influence

Nice To Haves

  • Experience managing or influencing technology lifecycle, obsolescence, or modernization initiatives using ITSM platforms such as ServiceNow
  • Ability to coach teams on best practices in planning, prioritization, and value measurement
  • Familiarity with strategic planning processes, including OKR formulation and roadmap alignment
  • Strong analytical skills with the ability to turn data into actionable insights
  • Excellent written and verbal communication skills, including executive‑level storytelling
  • Experience in developing portfolio reporting, dashboards, and financial or operational artifacts
  • Proficiency with portfolio or product management tools (e.g., Jira, ADO, Smartsheet, or similar) and ITSM platforms such as ServiceNow

Responsibilities

  • Own the CIO area’s End of Life (EOL)/End of Service (EOS) risk register, ensuring each item has a remediation path (upgrade, replatform, or decommission) with dates, owners, and dependencies; manage exception requests, including the required remediation plans, per enterprise standards.
  • Drive the accuracy of Technology Roadmaps (TRM) and coordinate lifecycle updates and requests to keep product versions in approved phases and avoid tech debt.
  • Define and steward service SLOs/error budgets; partner with product/platform teams to improve availability, latency, change failure rate, and MTTR. Align change practices with Enterprise ITSM Change Management and SDLC procedures.
  • Ensure critical services have complete, high-quality runbooks with current diagnostic and rollback steps; socialize standards using reference runbooks.
  • Instrument and continuously improve health dashboards and alerting to enable proactive detection, meaningful SLO alerts, and executive health views tied to MBRs.
  • Partner with Technology Risk Management/ITAM and other transformation programs and teams to reduce obsolescence and improve reliability through modernization.
  • Lead cross-team incident and postmortem debriefs; coach engineers on reliability patterns (SLOs, error budgets, autoscaling, chaos testing, etc.).
  • Shape CIO-wide reliability and obsolescence OKRs; publish progress and insights in monthly tech health reviews.

Benefits

  • Healthcare (medical, dental, vision)
  • Basic term and optional term life insurance
  • Short-term and long-term disability
  • Pregnancy disability and parental leave
  • 401(k) and employer-funded retirement plan
  • Paid vacation (from two to five weeks depending on salary grade and tenure)
  • Up to 11 paid holiday opportunities
  • Adoption assistance
  • Sick and Safe Leave accruals of one hour for every 30 hours worked, up to 80 hours per calendar year unless otherwise provided by law