This project introduces SZ-Gen, a benchmarking framework for evaluating how large language model (LLM)-based coding agents translate and optimize SZ-family lossy scientific compressors across heterogeneous HPC architectures using distinct parallel programming models. The study targets CUDA GPUs (CUDA C/C++), MPI-based distributed-memory clusters (C/MPI), and wafer-scale accelerator DSLs, where architecture-specific constraints on concurrency, memory hierarchy, synchronization, and communication shape performance. Lossy compression is a strong stress test for LLM-driven optimization in two key ways: (i) its kernels exhibit irregular, branch-heavy behavior that complicates parallelization and performance tuning; and (ii) improvements must respect rate-distortion constraints, increasing throughput without sacrificing compression ratio or bounded reconstruction error.

SZ-Gen structures evaluation along three core dimensions. (i) Architecture and programming model: we analyze how agent performance varies across architectures and their associated languages and abstractions, and whether agents can identify hardware bottlenecks and apply appropriate parallelization strategies. (ii) User prompting expertise: we quantify the impact of prompt quality by comparing (a) regular users with minimal domain knowledge, (b) knowledgeable users who understand architecture constraints and parallel programming techniques, and (c) expert users who provide guidance aligned with manually optimized kernel deployments. (iii) LLM/agent workflow diversity: we compare representative systems spanning single-shot translation and multi-agent optimization, including Codex (OpenAI) single-agent single-shot, Gemini (Google) single-agent single-shot, OpenEvolve multi-agent, and our proposed multi-agent workflow.
Across these dimensions, SZ-Gen uses a hierarchical SZ benchmark suite covering both kernel-level primitives (prediction, quantization, histogramming, bitstream packing) and end-to-end compression pipelines. Evaluation emphasizes functional correctness and architecture-aware performance behaviors, such as memory coalescing, loop restructuring, tiling, kernel fusion, and communication/computation overlap, while maintaining strict error-bound and reconstruction-quality guarantees.
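A round-trip check of the kind this evaluation implies can be sketched as follows: after compress/decompress, every reconstructed value must stay within the absolute error bound, and the compression ratio is reported alongside so throughput gains cannot silently trade away quality. The struct and function names here are illustrative assumptions, not SZ-Gen's actual harness.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical post-round-trip report: worst pointwise error,
 * compression ratio, and a pass/fail flag against the bound `eb`. */
typedef struct {
    double max_abs_err;   /* worst pointwise reconstruction error */
    double ratio;         /* original bytes / compressed bytes    */
    int    ok;            /* 1 iff max_abs_err <= eb              */
} sz_report;

static sz_report sz_check(const double *orig, const double *recon,
                          int n, double eb, size_t compressed_bytes)
{
    sz_report r = {0.0, 0.0, 1};
    for (int i = 0; i < n; i++) {
        double e = fabs(orig[i] - recon[i]);
        if (e > r.max_abs_err) r.max_abs_err = e;
    }
    r.ok = (r.max_abs_err <= eb);
    r.ratio = (double)((size_t)n * sizeof(double)) / (double)compressed_bytes;
    return r;
}
```

Running this check after every agent-produced variant is what makes the error-bound guarantee enforceable rather than assumed.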