About this role:

Wells Fargo is seeking a Payments Modernization Platform Engineer / SRE to join Payments & Liquidity Technology (GPLT), supporting the transformation of the high-value payments landscape as we modernize legacy wire platforms onto a new, event-driven, cloud-native architecture.

This role sits at the intersection of engineering, service management, and reliability. You will be embedded early in the Payments Modernization lifecycle to ensure platforms are operable, resilient, observable, and supportable by design, not retrofitted after go-live. You will work closely with Application Engineering, Architecture and Service Operations teams to shape how new payment capabilities are built, tested, released, and stabilized at scale.

Learn more about the career areas and lines of business at wellsfargojobs.com.

In This Role, You Will

Platform & Reliability Engineering
- Embed SRE and production engineering principles into Payments Modernization from design through early life support
- Define and validate non-functional requirements (NFRs) covering resilience, scalability, observability, recovery, and operability
- Drive replay, retry, and exception-handling validation for event-driven payment flows
- Lead capacity and performance testing, including volume growth and peak event scenarios (e.g. FedNow, CHIPS, SWIFT)

Service Transition & Operational Readiness
- Own Permit-to-Operate readiness across environments (NFR Testing)
- Define cutover, shadow support, and early life support models
- Ensure runbooks, support procedures, on-call readiness, and escalation paths are production-grade before go-live
- Partner with Change Assurance to apply risk-based release controls, canary/blue-green strategies, and rollback automation

Observability & Stability
- Implement end-to-end observability across Kafka, MongoDB, API layers, and downstream payment components
- Define and monitor SLOs, error budgets, and golden signals
- Reduce alert noise through signal design, correlation, and automation
- Analyze early defects and exception patterns (ACK/NACKs, business errors) to drive stabilization

Chaos Engineering & Continuous Improvement
- Design and execute controlled failure testing (chaos engineering) to validate recovery patterns and blast radius
- Lead blameless RCAs, ensuring corrective actions are owned and recurrence is prevented
- Drive continuous service improvement (CSI) initiatives, including automation, resilience uplift, and technical debt reduction
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed