Staff Cloud Hypervisor R&D

Crusoe, Sunnyvale, CA
9d, $204,000 - $247,000, Onsite

About The Position

At Crusoe, we are building the "engine room" of the AI revolution. We are seeking a Staff Cloud Hypervisor R&D Engineer to serve as the lead architect for our next-generation virtualization stack. In this role, you will move beyond standard virtualization to design a greenfield hypervisor environment where GPUs, DPUs, and high-speed interconnects are first-class citizens. You will be responsible for eliminating the "virtualization tax" so that our cloud infrastructure delivers bare-metal performance for the world’s most demanding AI/ML workloads.

Requirements

  • 7+ Years of Experience: Proven track record in hypervisor internals, kernel development, or low-level systems programming.
  • Deep Virtualization Expertise: Expert-level knowledge of CPU virtualization (Intel VT-x, AMD-V) and memory virtualization (EPT/NPT, HugePages). You should be comfortable discussing the nuances of VM exit overhead.
  • Hardware-Software Integration: Experience working with specialized AI hardware, including GPUs, InfiniBand/RoCE NICs, and SmartNICs/DPUs.
  • Programming & Tooling: Mastery of C and C++ is required; proficiency in Rust for modern systems programming is highly preferred. Experience with QEMU, KVM, and Linux kernel debugging tools (perf, ftrace, eBPF) is expected.
  • I/O Mastery: Deep understanding of VirtIO, vhost-user, and hardware-accelerated I/O paths.
  • Technical Leadership: Experience leading complex, cross-functional projects that bridge the gap between hardware engineering and cloud control planes.

Responsibilities

  • Next-Gen Hypervisor Architecture: Lead the R&D and implementation of core hypervisor components (KVM, QEMU, or custom Rust-based solutions) specifically optimized for massive-scale GPU fleets.
  • AI Hardware Virtualization: Develop and refine advanced hardware pass-through and abstraction techniques (SR-IOV, VFIO, mdev) to ensure NVIDIA GPUs and BlueField DPUs operate with near-zero latency in a multi-tenant environment.
  • The "Holy Grail" Challenges: Solve high-stakes technical hurdles such as live migration for AI workloads with 80GB+ VRAM and optimizing PCIe peer-to-peer communication between virtualized accelerators.
  • Performance Research & Profiling: Conduct deep-dive bottleneck analysis across the entire stack—from CPU microarchitecture and MMU virtualization to guest OS scheduling—to minimize jitter and maximize throughput.
  • Open Source Leadership: Actively contribute to and maintain upstream open-source virtualization projects, positioning Crusoe as a thought leader in the Linux kernel and virtualization communities.
  • Security & Isolation: Architect robust security boundaries for AI-native cloud infrastructure, balancing high-performance hardware access with strict multi-tenant isolation and hardening.

Benefits

  • Industry-competitive pay
  • Restricted Stock Units in a fast-growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Subscription to the Calm app
  • MetLife Legal
  • Company-paid Commuter FSA benefit of $300 per month

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No Education Listed
  • Number of Employees: 251-500 employees
