Senior Systems Performance Engineer

Crusoe•San Francisco, CA

2d•$172,500 - $210,000•Onsite

About The Position

At Crusoe, we are pioneering the future of sustainable computing. We are seeking a Senior Systems Performance Engineer to serve as a technical lead for the end-to-end hardware evaluation, reliability, and scaling of our AI infrastructure. You will define the performance roadmap of our next-generation cloud, ensuring that state-of-the-art (SOTA) AI models run with peak efficiency across diverse hardware architectures.

Requirements

Experience: 5+ years of experience in end-to-end hardware evaluation, reliability, and scaling of AI infrastructure.
Large-Scale Systems: Proven experience in building and optimizing AI application systems for large-scale GPU infrastructure.
Architecture & Microarchitecture: Deep knowledge of x86 and ARM architectures, including competitive analysis of microarchitecture and performance-based validation.
Programming & Tooling: Expert-level proficiency in Python and C++. Experience with cycle-accurate simulators and hardware debuggers like Lauterbach Trace32 or ARM DS-5 is essential.
Low-Level Systems: Ability to write and debug ARMv8 assembly, implement cache coherence protocols (MESI/MOESI), and analyze RTL via simulation waveforms.
Security & HPC: Experience with performance modeling for secure environments (e.g., Intel SGX, TDX, VM Encryption) and high-performance computing benchmarks.

Responsibilities

Architectural Strategy: Lead the evaluation and establishment of New Product Introduction (NPI) processes across varied hardware architectures, focusing on bare-metal and VM environments.
Full-Stack Optimization: Conduct deep-dive performance evaluations and workload characterizations across compute, memory, storage, and networking.
Performance Modeling: Develop sophisticated multi-variable projection models and frameworks to analyze system design options through KPI tradeoffs, such as Power and TCO (Total Cost of Ownership).
Hardware-Software Co-Design: Collaborate with external vendors to drive platform customization and optimize server/AI architectures for maximum performance-per-TCO.
Infrastructure Scaling: Design and implement 0-to-1 performance methodologies that allow the team to scale evaluation processes for large-scale GPU/AI data centers.
Industry Leadership: Actively engage in industry research and contribute technical insights to consortiums and standards committees to influence future hardware roadmaps.
