This project studies how reliable modern GPUs are when small transient errors occur during application execution. It uses a unified fault-injection approach to understand how errors propagate through GPU-accelerated scientific and heterogeneous workloads across different hardware platforms. By comparing program outputs after controlled bit-level perturbations, the work identifies which parts of applications are most vulnerable to silent errors or failures. The broader goal is to improve understanding of GPU resilience and guide the design of more reliable future computing systems.
Job Type: Full-time
Career Level: Intern
Education Level: No Education Listed
Number of Employees: 1,001-5,000 employees