Sr. Principal Engineer, Systems Reliability

Ayar Labs, San Jose, CA
$206,000 - $300,000 | Onsite

About The Position

Ayar Labs is shattering AI data bottlenecks by moving data at the speed of light. As pioneers of co-packaged optics (CPO), we use light instead of electricity to move data faster and farther, using a fraction of the energy, to fuel the explosive growth of AI models. Backed by industry giants such as NVIDIA, AMD, MediaTek, and Intel, and manufactured in partnership with the world’s leading semiconductor ecosystem, Ayar Labs’ co-packaged optics solution is key to unleashing next-generation AI scale-up architectures.

The Sr. Principal Engineer, Systems Reliability will be a high-impact leader focused on one of the industry's greatest challenges: taking CPO from cutting-edge innovation to mission-critical, high-volume deployment. You will be the bridge between silicon photonics physics and the massive operational scale of global AI infrastructure.

Requirements

  • Industry Tenure: 15+ years of experience in Systems Engineering, Reliability Physics, or Semiconductor Architecture, with a focus on Data Center or HPC environments.
  • Technical Mastery: Deep expertise in RAS frameworks and high-speed interconnect protocols such as PCIe, CXL, UCIe, NVLink, or UALink.
  • Strategic Leadership: A proven track record of influencing Tier-1 Hyperscale partners and driving cross-functional technical strategies at a global scale.

Nice To Haves

  • Advanced Packaging: Familiarity with 2.5D/3D packaging technologies (e.g., CoWoS, SoIC) and their associated reliability challenges.
  • Domain Knowledge: Exposure to silicon photonics, laser physics, or optical networking.
  • Foundry Relations: Experience working with major foundries (e.g., TSMC) on advanced silicon integration.

Responsibilities

Hyperscale Integration & Strategy
  • Lead the Technical Roadmap: Define how optical chiplets integrate into massive AI scale-up fabrics.
  • Tier-1 Collaboration: Partner directly with hyperscalers to ensure Ayar solutions thrive in multi-rack GPU clusters.
  • Systems Engineering: Address the intersection of thermal management, power delivery, and mission-profile alignment to ensure seamless integration.

RAS (Reliability, Availability, Serviceability) Framework
  • Architect Standards: Translate Tier-1 customer requirements into a robust RAS architecture for silicon photonics systems.
  • Fault Management: Establish protocols for forward error correction (FEC), fault isolation, and link recovery.
  • Redundancy Logic: Design the redundancy strategy for our remote light source to ensure "always-on" performance in high-stakes environments.

Ecosystem & Foundry Collaboration
  • Technical Liaison: Work closely with ASIC partners (e.g., Alchip, GUC) and foundries (e.g., TSMC) to optimize advanced packaging and interconnect reliability.
  • Industry Influence: Represent Ayar Labs at industry standardization forums to shape the future of optical interconnects.

Operational Telemetry & Predictive Analytics
  • Real-Time Monitoring: Develop system-level monitoring and predictive analytics to track the health of optical links.
  • TCO Optimization: Use telemetry data to reduce downtime and optimize the Total Cost of Ownership (TCO) for data center operators.