About The Position

We are seeking an experienced AI HW Systems Validation Architect to serve as the technical authority for validation of next-generation AI server and rack-scale platforms. This role defines and drives the end-to-end validation architecture across blade-level and rack-level systems. The successful candidate will ensure comprehensive validation coverage across functional, electrical, networking, stress, and thermal domains to enable reliable hyperscale AI infrastructure deployments.

Requirements

  • Bachelor’s or Master’s degree in Electrical Engineering, Computer Engineering, or a related discipline.
  • 15+ years of experience in server hardware validation or system engineering.
  • Proven experience validating board, blade, and rack-level server hardware platforms.
  • Strong knowledge of high-speed interfaces such as PCIe, CXL, DDR, NVLink, and Ethernet.
  • Experience developing validation methodologies and large-scale validation test plans.
  • Experience leading debug and failure analysis across complex systems.
  • Experience managing ODM validation programs including test planning and issue tracking.
  • Familiarity with liquid cooling validation and system-level thermal reliability.

Nice To Haves

  • Experience with ARM-based or x86 server architectures.
  • Background in rack integration testing and hyperscale deployment readiness.
  • Experience with automated validation frameworks and test data analytics.
  • Strong program leadership and cross-functional collaboration skills.

Responsibilities

  • Own the end-to-end validation methodology and technical strategy for AI hardware platforms across blade-level and rack-level systems.
  • Drive validation of rack-scale platforms covering functionality, power, cooling, networking fabric, and system reliability.
  • Collaborate with rack validation teams to verify full rack configurations, including power distribution and cooling loop integration.
  • Define and lead execution of comprehensive validation test plans for internal teams and ODM validation partners.
  • Ensure validation coverage aligns with architectural, electrical, and mechanical specifications across CPU, GPU, DDR, PCIe, storage, and networking interfaces.
  • Oversee liquid cooling validation including performance, leak detection, and long-term reliability of cooling hardware.
  • Lead debug and issue management across cross-functional engineering teams and external partners.
  • Establish validation dashboards, coverage metrics, and quality indicators to monitor execution progress.
  • Partner with architecture, silicon enablement, firmware, and operations teams to ensure robust system bring-up and production readiness.