Platform Engineer (Reliability)

AdvancedMD · South Jordan, UT
1d · Hybrid

About The Position

AdvancedMD is a unified cloud suite of medical office software hosted on Amazon Web Services (AWS). The suite includes practice management, electronic health records, and patient engagement, and AdvancedMD also offers managed medical billing services for independent practices. AdvancedMD serves an expansive national footprint of 65,000 practitioners across 14,000 practices and 900 independent medical billing companies. 8.8M insurance claims are processed every month on the AdvancedMD billing platform!

Role Summary

Are you passionate about building reliable, scalable systems that power mission-critical applications? AdvancedMD is seeking a skilled and motivated Platform Engineer (Reliability), also known as a Site Reliability Engineer (SRE), to join our growing ITSecOps organization. In this role, you’ll help bridge the gap between development and operations by applying software engineering principles to infrastructure and operations to improve reliability, performance, and efficiency across our cloud-based SaaS platform.

As a Site Reliability Engineer, you’ll play a key role in ensuring system uptime, performance, and resilience through automation, observability, and proactive capacity planning. You’ll collaborate closely with Product, Engineering, and IT Operations teams to build and maintain reliable cloud-native systems using AWS, Kubernetes, Terraform, and modern monitoring tools.

This is an exciting opportunity to join a healthcare technology leader where your technical expertise, problem-solving skills, and passion for automation will directly enhance the availability and performance of applications used by healthcare professionals nationwide. If you love tackling complex reliability challenges in a fast-paced DevSecOps culture, we’d love to have you on our team.

Requirements

  • Bachelor’s degree in Computer Science or related field, or equivalent professional experience
  • 3+ years of experience in a technical or operations engineering role in a highly regulated environment
  • Hands-on experience with cloud platforms, primarily AWS (EC2, RDS, Route53, S3, ECS, Lambda, IAM, VPC, CloudFront); Azure/GCP experience is a plus
  • Proficiency in one or more scripting or programming languages: PowerShell, Python, Bash, C#, Golang, or TypeScript
  • Experience managing Windows Server and SQL Server environments; familiarity with Linux administration (Ubuntu)
  • Experience with Infrastructure as Code (IaC) tools like Terraform, Ansible, or CloudFormation
  • Knowledge of containerization and orchestration technologies, such as Kubernetes and ArgoCD
  • Familiarity with source control (Azure DevOps) and work management tools (Jira, Confluence)
  • Experience with monitoring, APM, and log aggregation tools such as Splunk, Prometheus, Grafana, Nagios, CloudWatch
  • Familiarity with distributed tracing concepts and experience using OpenTelemetry to instrument, collect, and analyze telemetry data
  • Understanding of networking fundamentals, automation frameworks, and DevOps principles
  • Familiarity with AI tooling and its application in modern development environments to streamline coding and problem solving

Nice To Haves

  • You approach reliability like an engineer — automating your way out of repetitive tasks and designing systems that heal themselves.
  • You’re calm under pressure — when incidents occur, you bring structure, communication, and resolution without chaos.
  • You think in systems — spotting weak points, planning capacity ahead, and improving processes before issues arise.
  • You thrive in collaboration — partnering with developers, DBAs, and platform teams to deliver measurable improvements in performance and uptime.
  • You’re a lifelong learner — constantly exploring new tools, AWS services, and reliability practices to keep our systems modern, secure, and efficient.
  • You’re proactive — you don’t wait for alerts; you anticipate them.

Responsibilities

  • Ensure proper monitoring, alerting, and observability across production and development environments
  • Collaborate with Product, Engineering, and IT Operations teams to identify and resolve issues affecting application performance and stability
  • Design and build self-service tools and automation to reduce manual operational work and improve response times
  • Participate in Change Management and Incident Review processes, contributing to root cause analysis and long-term fixes
  • Develop and enhance operational SLOs, SLIs, and SLAs in partnership with engineering teams
  • Automate scaling and recovery processes to improve system resilience
  • Support services before they go live through design reviews, capacity planning, and operational readiness assessments
  • Participate in a shared on-call rotation to ensure 24x7 production system reliability
  • Continuously evaluate and adopt emerging technologies to optimize performance, cost efficiency, and automation
  • Contribute to a healthy and collaborative engineering culture through documentation, mentorship, and teamwork

Benefits

  • Competitive compensation and total rewards benefits
  • Comprehensive health, dental, and vision insurance
  • 401(k) with generous company match
  • Paid time off and holidays
  • Hybrid and remote work opportunities
  • Career growth and development support
  • Collaborative, team-oriented culture