Technical Program Manager

Cerebras Systems, Sunnyvale, CA
Onsite

About The Position

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications without the hassle of managing hundreds of GPUs or TPUs.

Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference.

Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

Requirements

  • Master’s degree or foreign equivalent degree in Mechanical Engineering, Electrical Engineering, Computer Science, or a related field and 4 years of experience as a Junior Mechanical Design Engineer, Mechanical Design Engineer, Mechanical Engineer, Technical Program Manager, Engineering Program Manager, or in a related occupation required.
  • The required work experience must include 3 years of experience with the following:
  • Demonstrated knowledge of materials selection for thermal interfaces and structural components, and use of Computer-Aided Design (CAD) to review mechanical drawings for accuracy and completeness;
  • Project management, risk management, and scheduling for engineering projects;
  • Bills of Materials (BOMs) management;
  • Demonstrated knowledge of manufacturing processes for precision components, specifically knowledge of changes that can be made to part design to improve Design for Manufacturing and Design for Assembly; and
  • Demonstrated knowledge of FMEA (Failure Modes and Effects Analysis) for engineering processes.

Responsibilities

  • Create comprehensive program schedules for the network hardware design, Bill of Materials (BOM) creation, and deployment of Cerebras’s hardware systems for AI. Drive networking hardware designs and BOMs towards closure.
  • Architect and manage all project documentation and configuration control, including detailed BOMs and 3D CAD assemblies. Author and review Engineering Change Orders (ECOs) within Product Lifecycle Management (PLM) tools. Apply advanced knowledge of mechanical systems, design for manufacturing, and thermal analysis to validate BOM content.
  • Oversee the engineering aspects of deploying hardware networking and front-end infrastructure racks, ensuring integration with Cerebras’s computing systems. Collaborate with software and hardware engineering teams to optimize system performance and reliability. Drive engineering efforts related to the deployment process, track technical progress, and address engineering risks by coordinating with leadership.
  • Define product architecture and system-level strategy, partnering with executive leadership, hardware architects, and product managers. Evaluate how intended use cases, such as AI training or inference, affect mechanical system requirements including heat load profiles, mechanical stress under load, and expansion interfaces.
  • Ensure specialized mechanical components with long manufacturing lead times (such as custom cooling hoses, precision machined system interfaces, and structural rack elements) are ordered early enough to maintain program schedule. Work with manufacturers to design parts for cost, reliability, and lead time.
  • Track daily activities during deployment of mechanical parts and designs in Cerebras’s hardware systems by monitoring mechanical installation metrics such as cooling performance, structural stability, and thermal expansion within the specialized hardware system enclosures.