Principal GPU/NPU AI System Architect

Advanced Micro Devices, Inc.
Austin, TX

About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

The AI Architect will define and drive end‑to‑end AI system architecture for embedded and edge platforms, with deep expertise in GPU/NPU micro‑architecture, AI software stacks, and model behavior. This role bridges silicon capabilities, system software, and AI models, enabling performant, power‑efficient, and safe AI deployments across robotics, automotive, and industrial markets. The architect will own technical solutioning from model selection through deployment, working closely with silicon, compiler, software, and product teams, and will represent the AI architecture vision with customers and partners.

Location: Austin or San Jose

THE PERSON:

We are seeking a senior AI systems architect with deep expertise across GPU/NPU architecture, AI software stacks, and model behavior. This individual operates at the intersection of silicon, system software, and applied AI — translating real-world robotics, automotive, and industrial workloads into scalable, production-ready AI platform architectures. The ideal candidate combines hardware-aware AI model understanding with embedded deployment experience, and can drive full-stack architectural trade-offs across performance, power, memory, safety, and lifecycle constraints. They are technically hands-on when needed, yet comfortable influencing silicon roadmaps, guiding cross-functional teams, and representing architectural strategy with customers and ecosystem partners. This is a high-impact technical leadership role requiring strong architectural judgment, cross-functional influence without direct authority, and the ability to bridge research, productization, and long-term platform evolution.

Requirements

  • Deep expertise in GPU and/or NPU architecture and execution models.
  • Strong hands‑on experience with AI models and inference pipelines, not just framework usage.
  • Proven background in embedded / edge AI systems.
  • Strong understanding of hardware‑aware model optimization techniques.
  • Experience in robotics, automotive, or industrial AI domains.
  • Ability to translate customer problems into scalable architectural solutions.
  • Motivating leader with strong interpersonal skills and proven cross‑functional and external leadership.

Responsibilities

  • GPU / NPU Architecture & HW–SW Co‑Design
      • Develop deep architectural understanding of GPU, NPU, and heterogeneous SoC designs, including memory hierarchies, interconnects, scheduling, and power/performance trade‑offs.
      • Guide HW–SW co‑optimization strategies for AI workloads across vision, perception, planning, and control.
      • Influence silicon and platform roadmaps using model‑driven architectural insights from robotics, automotive, and industrial workloads.
      • Collaborate across silicon, system engineering, software, thermal/mechanical, security, and product teams.
      • Technically lead internal AI engineers and work closely with partners, ISVs, and customers.
      • Act as a technical authority and mentor, influencing architecture decisions without direct reporting authority.
  • Model‑Aware AI System Architecture
      • Architect AI solutions with a strong understanding of model internals (CNNs, Transformers, multi‑modal models, sensor fusion, perception stacks).
      • Evaluate and map model characteristics (latency, memory bandwidth, precision, sparsity) onto GPU/NPU execution.
      • Drive model optimization strategies (quantization, pruning, distillation, compilation flows) aligned with embedded constraints.
  • Software Stack & Deployment Solutioning
      • Define and optimize AI software stacks spanning:
          • Frameworks (PyTorch, ONNX, TensorRT‑like runtimes)
          • Compilers, graph optimizers, and runtime schedulers
          • Drivers, firmware, and OS integration
      • Lead solutioning for edge and embedded deployment, including OTA updates, lifecycle management, and production‑grade robustness.
      • Ensure scalability from prototype → production → long‑term maintenance.
  • Domain‑Focused Architecture Leadership
      • Robotics: perception, localization, SLAM, manipulation, real‑time decision pipelines.
      • Automotive: ADAS, autonomous perception, sensor fusion, safety‑critical AI execution.
      • Industrial: vision inspection, predictive maintenance, autonomous systems, real‑time analytics.
      • Translate domain use‑cases into architectural requirements and reusable platform capabilities.