This project introduces SZ-Gen, a benchmarking framework for evaluating how large language model (LLM)-based coding agents translate and optimize SZ-family lossy scientific compressors across heterogeneous HPC architectures using distinct parallel programming models. The study targets CUDA GPUs (CUDA C/C++), MPI-based distributed-memory clusters (C/MPI), and wafer-scale accelerator DSLs, where architecture-specific constraints on concurrency, memory hierarchy, synchronization, and communication shape performance. Lossy compression is a strong stress test for LLM-driven optimization in two key ways: (i) its kernels exhibit irregular, branch-heavy behavior that complicates parallelization and performance tuning; and (ii) improvements must respect rate-distortion constraints, increasing throughput without sacrificing compression ratio or bounded reconstruction error.

SZ-Gen structures evaluation along three core dimensions. (i) Architecture and programming model: we analyze how agent performance varies across architectures and their associated languages and abstractions, and whether agents can identify hardware bottlenecks and apply appropriate parallelization strategies. (ii) User prompting expertise: we quantify the impact of prompt quality by comparing (a) regular users with minimal domain knowledge, (b) knowledgeable users who understand architecture constraints and parallel programming techniques, and (c) expert users who provide guidance aligned with manually optimized kernel deployments. (iii) LLM/agent workflow diversity: we compare representative systems spanning single-shot translation and multi-agent optimization, including Codex (OpenAI) single-agent single-shot, Gemini (Google) single-agent single-shot, OpenEvolve multi-agent, and our proposed multi-agent workflow.
Across these dimensions, SZ-Gen uses a hierarchical SZ benchmark suite covering both kernel-level primitives (prediction, quantization, histogramming, bitstream packing) and end-to-end compression pipelines. Evaluation emphasizes functional correctness and architecture-aware performance behaviors, such as memory coalescing, loop restructuring, tiling, kernel fusion, and communication/computation overlap, while maintaining strict error-bound and reconstruction-quality guarantees.
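A round-trip check of the kind this evaluation implies can be sketched as follows: after compress/decompress, every reconstructed value must stay within the absolute error bound, and the compression ratio is reported alongside so throughput gains cannot silently trade away quality. The struct and function names here are illustrative assumptions, not SZ-Gen's actual harness.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical post-round-trip report: worst pointwise error,
 * compression ratio, and a pass/fail flag against the bound `eb`. */
typedef struct {
    double max_abs_err;   /* worst pointwise reconstruction error */
    double ratio;         /* original bytes / compressed bytes    */
    int    ok;            /* 1 iff max_abs_err <= eb              */
} sz_report;

static sz_report sz_check(const double *orig, const double *recon,
                          int n, double eb, size_t compressed_bytes)
{
    sz_report r = {0.0, 0.0, 1};
    for (int i = 0; i < n; i++) {
        double e = fabs(orig[i] - recon[i]);
        if (e > r.max_abs_err) r.max_abs_err = e;
    }
    r.ok = (r.max_abs_err <= eb);
    r.ratio = (double)((size_t)n * sizeof(double)) / (double)compressed_bytes;
    return r;
}
```

Running this check after every agent-produced variant is what makes the error-bound guarantee enforceable rather than assumed.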