Director, Software Engineering, Container Orchestration & Scheduling Services

Amazon • Seattle, WA

About The Position

Imagine leading the engineering teams behind infrastructure that quietly powers some of the world's most demanding workloads: processing billions of container launches every week, orchestrating compute resources across the globe, and enabling customers to focus on innovation rather than infrastructure complexity. That's the opportunity in front of you.

We're looking for a Director of Engineering who thrives in the demanding world of massive scale, operational rigor, and customer obsession. You'll lead the teams building and operating container orchestration platforms that our customers depend on for their most critical applications, the kind where every millisecond of latency matters and where customers expect always-on reliability, and we deliver. You'll drive the technical evolution of container technologies, workload scheduling systems, and platform infrastructure that needs to stay ahead of exponentially growing customer demands.

Your teams will tackle challenges like supporting diverse customer workloads while balancing security with speed, optimizing container performance at unprecedented scale, and building scheduling algorithms that efficiently pack workloads while maintaining strict isolation guarantees. You'll have the autonomy to shape both business and technical strategy, the resources to build world-class teams, and the direct customer impact that comes from operating infrastructure at massive scale. Your decisions will influence how billions of containers are launched, scheduled, and managed every week.

We need someone who gets energized by operational excellence, who sees on-call rotations not as burdens but as opportunities to build more resilient systems, who treats every incident as a learning moment, and who believes that the best way to serve customers is to obsess over the details that make systems reliable, fast, and delightful to use.

If you're the kind of leader who can translate complex distributed systems challenges into clear technical roadmaps, who mentors engineers to think like owners, and who won't rest until your services achieve operational excellence that sets industry benchmarks, we should talk.

Requirements

Experience designing, building, operating, and managing large-scale distributed systems or web services
15+ years of software engineering experience with 5+ years in engineering leadership roles
Deep expertise in container technologies (Docker, containerd, runc) and orchestration systems
Strong understanding of Linux internals, virtualization technologies, and infrastructure automation

Nice To Haves

Experience with Kubernetes, Docker, or the broader container ecosystem
Background in kernel development, systems programming, or low-level infrastructure
Knowledge of security best practices for multi-tenant container environments
Experience with cost optimization and resource efficiency at scale

Responsibilities

Lead engineering teams responsible for building and operating massive-scale application lifecycle platforms serving thousands of enterprise customers
Drive innovation in workload scheduling, resource allocation, and multi-tenant isolation technologies
Drive technical vision and architecture for next-generation container runtime environments, image management systems, and workload scheduling infrastructure
Establish operational excellence standards for systems processing billions of container launches weekly with industry-leading reliability metrics
Own the operational health of large-scale distributed systems with stringent SLA requirements (99.95%+ availability)
Build and scale teams focused on system reliability, performance optimization, and customer experience
Implement comprehensive monitoring, alerting, and automated remediation strategies for complex distributed workloads
Drive continuous improvement in operational metrics including latency, throughput, and resource efficiency
Partner with enterprise customers to understand their container workload requirements and pain points
Translate customer needs into technical roadmaps and feature priorities
Ensure world-class customer experience through proactive monitoring, rapid incident response, and continuous service improvements
Build mechanisms for gathering and acting on customer feedback at scale
Build, mentor, and grow high-performing engineering teams across multiple locations
Foster a culture of ownership, innovation, and operational excellence
Establish engineering best practices, code quality standards, and technical review processes
Develop a talent pipeline and succession plans for critical technical roles

Benefits

Health insurance (medical, dental, vision, prescription, basic life and AD&D insurance with optional supplemental life plans, EAP, mental health support, medical advice line, flexible spending accounts, adoption and surrogacy reimbursement coverage)
401(k) matching
Paid time off
Parental leave
