Lead Site Reliability Engineer [Multiple Positions Available]

JPMorgan Chase & Co. • Plano, TX

About The Position

Duties: Troubleshoot, maintain, identify, escalate, and resolve application issues. Enable telemetry and alerts for complex enterprise applications for proactive monitoring. Develop tools and accelerators to reduce toil and improve processes. Ensure that production changes are made in light of best practices, lifecycle methodology, and overall risk. Partner with multiple teams on application performance or functional issues, troubleshooting, infrastructure service support, and change management. Collaborate with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt. Apply site reliability principles and practices every day and adopt them across multiple applications. Lead a team of Site Reliability Engineers (SREs), overseeing the product portfolio focused on reliability, observability, production support, and application maintenance.

Requirements

Bachelor's degree in Information Technology, Computer Science, or related field of study plus 5 years of experience in the job offered or as Lead Site Reliability Engineer, Infrastructure Engineer, IT Project Manager, Test Lead, IT Consultant, or related occupation
Troubleshooting application issues with distributed IT infrastructure
Implementing end-to-end monitoring using AppDynamics and Dynatrace
Implementing automated solutions using Python scripts
Analyzing non-functional requirements to identify target response times as Service Level Objectives (SLOs)
Log Analysis and Visualization using Splunk
Performance Testing using Performance Center, JMeter, and BlazeMeter
Performance Testing results analysis and reporting
Defect tracking and analysis using Quality Center
Amazon Web Services (AWS) platform

Responsibilities

Troubleshoot, maintain, identify, escalate, and resolve application issues
Enable telemetry and alerts for complex enterprise applications for proactive monitoring
Develop tools and accelerators to reduce toil and improve processes
Ensure that production changes are made in light of best practices, lifecycle methodology, and overall risk
Partner with multiple teams on application performance or functional issues, troubleshooting, infrastructure service support, and change management
Collaborate with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt
Apply site reliability principles and practices every day and adopt them across multiple applications
Lead a team of Site Reliability Engineers (SREs), overseeing the product portfolio focused on reliability, observability, production support, and application maintenance

Benefits

competitive total rewards package, including base salary determined by role, experience, skill set, and location
Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions
comprehensive health care coverage
on-site health and wellness centers
a retirement savings plan
backup childcare
tuition reimbursement
mental health support
financial coaching
