About The Position

Senior GenAI & High Performance Computing (HPC) Delivery Engineer Dell Technologies has delivered HPC solutions for 25+ years, including support for Bright Cluster Manager (now NVIDIA BCM) since 2011. Today, Dell is NVIDIA’s preferred partner for GenAI Factory systems, using Dell GenAI PowerEdge XE servers and NVIDIA NVAIE to help customers build and scale end to end GenAI and High-Performance Computing environments. Join us to do the best work of your career and make a profound social impact as a Senior GenAI & High Performance Computing (HPC) Delivery Engineer on our Service Delivery Team in Austin, Texas. What you’ll achieve We’re seeking a Senior GenAI & HPC Engineer with deep experience in GPU accelerated systems, Linux performance tuning, and benchmarking. This role is highly hands on and customer facing, supporting onsite deployments across the U.S. for advanced HPC and GenAI solutions. You will work as a part of a team to help build, integrate, and test some of the world’s largest multi GPU systems, benchmark them using industry standard tools, and deliver the next generations of AI and HPC infrastructure.

Requirements

  • 7+ years with HPC or GenAI clusters, GPU based systems, AI infrastructure, or related fields
  • Deep hands on experience with GPU deployment, configuration, and multi-node testing using NVIDIA Base Command Manager
  • Proficiency with benchmarking tools: HPL, STREAM, NCCL, RCCL, MxP, OSU Microbenchmarks
  • Red Hat certification (RHCSA/RHCE) or 7+ years of relevant RH distros experience
  • Experience with GenAI/HPC networking (InfiniBand and/or RoCE)
  • Experience working in Linux based parallel computing environments at scale
  • Experience with containers/orchestration (Docker, Singularity/Apptainer, Kubernetes, Slurm)
  • Ability to travel up to 70% of the time across the U.S. as needed for projects
  • Strong customer facing and communication skills

Nice To Haves

  • Bachelor’s degree
  • NVIDIA certifications (NCA, NCE, DGX)
  • Experience with NVIDIA UFM, Infiniband, and SpectrumX fabrics
  • Exposure to hybrid cloud or GPU cloud environments
  • Experience with GPU observability/performance profiling tools

Responsibilities

  • Deploy, configure, and validate GPU accelerated compute clusters for AI, ML, and HPC with NVIDIA Base Command Manager (Warewulf and OpenHPC knowledge are a plus)
  • Perform benchmarking with HPL GPU, HPL MxP, STREAM, NCCL, RCCL, OSU Microbenchmarks, and related tools
  • Produce as-built documentation, performance reports, and share best practices amongst the team.
  • Configure and secure RHEL, Ubuntu, Rocky for GenAI or HPC workloads
  • Work directly with customers onsite (travel both regionally and across the U.S.)

Benefits

  • Dell is committed to fair and equitable compensation practices.
  • The salary range for this position is $153,850 to $199,100.
  • Your life. Your health. Supported by your benefits.
  • You can explore the overall benefits experience that awaits you as a Dell Technologies team member — right now at MyWellatDell.com
© 2024 Teal Labs, Inc
Privacy PolicyTerms of Service