Senior Site Reliability Engineer - Remote

UnitedHealth Group•Eden Prairie, MN

14h•Remote

About The Position

Optum Tech is a global leader in health care innovation. Our teams develop cutting-edge solutions that help people live healthier lives and help make the health system work better for everyone. From advanced data analytics and AI to cybersecurity, we use innovative approaches to solve some of health care’s most complex challenges. Your contributions here have the potential to change lives. Ready to build the next breakthrough? Join us to start Caring. Connecting. Growing together.

The Senior Site Reliability Engineer plays a significant role in implementing and maintaining highly available and scalable systems. This engineer will drive the seamless performance of consumer-facing applications and their backends across a matrixed organization. The role also collaborates with cross-functional teams to detect and communicate performance opportunities, develop service level indicators, participate in war rooms, and establish an error trending review process.

This individual will support Optum Rx’s consumer-facing digital platforms. The team’s focus is on maintaining stable, high-performing applications while helping the organization scale and take on additional systems.

You’ll enjoy the flexibility to work remotely from anywhere within the U.S. as you take on some tough challenges. Hires in the Minneapolis or Washington, D.C. area are required to work in the office a minimum of four days per week.

You’ll be rewarded and recognized for your performance in an environment that will challenge you, give you clear direction on what it takes to succeed in your role, and provide development for other roles you may be interested in.

Requirements

3+ years of experience in Site Reliability Engineering
3+ years of experience with monitoring tools (Dynatrace, Splunk, Glassbox, Azure Application Insights, Datadog)
3+ years of experience analyzing and interpreting code and communicating its impacts
3+ years of experience troubleshooting complex application issues
3+ years of experience participating in and leading multi-application war rooms
2+ years of experience working with Azure cloud architecture

Nice To Haves

Good understanding of networking concepts (e.g. routing, firewalls, proxies) and technologies (e.g. HTTP, TLS, Nginx)
Ability to troubleshoot infrastructure issues on cloud platforms
Automation of manual data collection and Excel report generation
Skilled in communicating with stakeholders across all organizational levels
Excellent at debugging complex systems

Responsibilities

Reduce UI and customer abrasion errors on CE ORx sites
Improve Net Promoter Score
Improve service uptime and resiliency
Enhance metrics reporting and clarity on service health and customer expectations
Monitor and troubleshoot applications, detect anomalies, and resolve performance issues
Review trends and page performance, and provide solutions to bottlenecks
Look for alerting patterns, set Service Level Indicators and Service Level Objectives with the dev team, participate in war rooms, and enhance observability
Handle issues proactively, improve alert thresholds, reduce the Adjusted Down Time Metric and Raw Error Rate, and establish an error trending review process