As a Principal Member of Technical Staff (DevOps), you will play a pivotal role in building and operating the next-generation, AI-first Electronic Health Record platform. This role blends strong software engineering fundamentals with Site Reliability Engineering (SRE) and production engineering practices to deliver highly scalable, resilient, secure, and observable cloud-native services.
You will design, develop, and own complex distributed systems end-to-end, from architecture and implementation to production operations, reliability, and continuous improvement. Working closely with technical leads and cross-functional teams, you will ensure services are built using modern engineering principles with a strong focus on availability, scalability, performance, operability, and cost-awareness.
You will embed SRE practices such as SLI/SLO definition, error budgets, observability, incident response, and automated remediation into the development lifecycle. You will proactively improve system reliability through automation, data-driven insights, structured operational workflows, and production engineering excellence (including safe experimentation and resilience testing where appropriate). You will also leverage AI-assisted development tools to accelerate delivery, improve troubleshooting, and enhance engineering productivity while maintaining rigorous standards for code quality, security, and reliability.
Job Type: Full-time
Career Level: Mid Level