This project focuses on evaluating the performance, reliability, and scalability of inference services deployed within the Argonne Leadership Computing Facility (ALCF). The intern will design and implement testing workflows to validate inference pipelines, develop benchmarking methodologies to measure latency, throughput, and resource utilization, and analyze system performance under varying workloads. The work will help ensure that ALCF’s AI inference infrastructure can efficiently support large-scale scientific and machine learning applications. Results will include reproducible benchmarking tools, performance reports, and recommendations for improving inference service reliability and efficiency.
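The benchmarking work described above could start from a sketch like the following: a minimal latency/throughput harness in Python. Everything here is hypothetical, not part of ALCF's actual tooling; `fake_infer` is a stand-in for a real inference client, and the percentile/throughput arithmetic is one common way to summarize results.

```python
# Minimal latency/throughput benchmark sketch (all names hypothetical).
import statistics
import time

def fake_infer(payload):
    """Stand-in for a real inference call; swap in an HTTP/gRPC client."""
    time.sleep(0.001)  # simulate ~1 ms of model latency
    return {"result": payload}

def benchmark(infer, n_requests=200):
    """Issue n_requests sequentially and report latency and throughput."""
    latencies = []
    start = time.perf_counter()
    for i in range(n_requests):
        t0 = time.perf_counter()
        infer(i)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies) * 1e3,
        "p95_ms": latencies[int(0.95 * len(latencies))] * 1e3,
        "throughput_rps": n_requests / elapsed,
    }

print(benchmark(fake_infer))
```

A real harness would add concurrent load (e.g. a thread pool or asyncio) to probe behavior under varying workloads, and record resource utilization alongside the latency percentiles.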
Job Type
Full-time
Career Level
Intern
Education Level
No Education Listed
Number of Employees
1,001-5,000 employees