This project focuses on optimizing a memory-bound scientific code for multi-GPU clusters. The work was initially carried out during a 3-month internship at Technische Universität Wien for the VSC Research Center, and it is now being continued and improved as a personal hobby project.
Video Presentation: https://www.youtube.com/watch?v=LHAgrdNKcDM&t=186s
- Initial Duration: July 2020 to September 2020 (3-month internship)
- Department: VSC Research Center (remote internship)
- CUDA
- OpenACC
- C
- Bash
- NVIDIA GPUs
- Optimize memory consumption of the Jacobi iterative solver.
- Reduce communication time in multi-GPU environments.
- Port the existing code to run efficiently on a multi-GPU cluster.
- Successfully ported the code to a multi-GPU cluster.
- Explored and implemented various optimization techniques.
- Improved both memory usage and inter-GPU communication efficiency.
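The Jacobi method updates every interior grid point from its neighbours' values in the previous iterate, so it needs two grids and is limited by memory bandwidth. The following is a minimal sketch of one memory-conscious way to write it in C with OpenACC directives. It assumes a 2-D Laplace problem on a uniform n x n grid with fixed boundary values (the project description does not say which system is solved). The function names `jacobi_sweep` and `jacobi_solve` are hypothetical, and so is the choice to alternate two buffers instead of copying the new grid back; none of this is the project's actual code. Without an OpenACC compiler the pragmas are ignored and the code runs serially on the CPU.

```c
#include <assert.h>
#include <stdlib.h>

/* One Jacobi sweep for the 2-D Laplace equation on an n x n row-major grid.
   Writes the interior of dst from src and returns the largest point update. */
static double jacobi_sweep(const double *restrict src, double *restrict dst, int n)
{
    double err = 0.0;
    /* OpenACC: run on the GPU using arrays already resident there (no host traffic). */
#pragma acc parallel loop collapse(2) reduction(max:err) present(src[0:n*n], dst[0:n*n])
    for (int i = 1; i < n - 1; ++i) {
        for (int j = 1; j < n - 1; ++j) {
            double v = 0.25 * (src[(i - 1) * n + j] + src[(i + 1) * n + j] +
                               src[i * n + j - 1] + src[i * n + j + 1]);
            double d = v - src[i * n + j];
            if (d < 0.0)
                d = -d;
            if (d > err)
                err = d;
            dst[i * n + j] = v;
        }
    }
    return err;
}

/* Iterate until the largest update is <= tol or max_sweeps would be exceeded.
   Sweeps alternate a->b and b->a, so no copy-back loop is needed and the
   result always ends in `a`. Returns the number of sweeps, or -1 on OOM. */
int jacobi_solve(double *a, int n, double tol, int max_sweeps)
{
    assert(n >= 3 && max_sweeps >= 2);
    size_t count = (size_t)n * n;
    double *b = malloc(count * sizeof *b);
    if (b == NULL)
        return -1;
    for (size_t k = 0; k < count; ++k)  /* b needs the same fixed boundary */
        b[k] = a[k];

    int sweeps = 0;
    double err = tol + 1.0;
    /* Keep both grids on the device for the whole solve; copy `a` back once. */
#pragma acc data copy(a[0:count]) copyin(b[0:count])
    {
        while (err > tol && sweeps + 2 <= max_sweeps) {
            jacobi_sweep(a, b, n);        /* a -> b */
            err = jacobi_sweep(b, a, n);  /* b -> a */
            sweeps += 2;
        }
    }
    free(b);
    return sweeps;
}
```

A common variant computes into a scratch grid and then runs a second loop that copies it back. Alternating the roles of the two buffers instead saves one full read and one full write of the grid per iteration, which matters for a memory-bound kernel.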
The project was initially developed and tested on a supercomputer with multi-GPU nodes equipped with NVIDIA GPUs.
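On multi-GPU nodes like these, a common way to keep inter-GPU traffic small is to split the grid into row blocks, one per GPU, and exchange only the boundary ("halo") rows between neighbours before each sweep. The sketch below simulates that pattern on the CPU under stated assumptions. Each hypothetical `Block` stands in for one GPU's memory, and `memcpy` stands in for the device-to-device transfer, which on real hardware might be, for example, `cudaMemcpyPeer` or a CUDA-aware `MPI_Sendrecv`. It illustrates the general technique only and is not the project's own implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one GPU's share of the grid: `rows` owned rows
   plus one ghost (halo) row above and one below, each `n` values wide. */
typedef struct {
    int rows;
    int n;
    double *cur;   /* (rows + 2) * n values: ghost, owned rows..., ghost */
    double *next;
} Block;

static void *xmalloc(size_t bytes)
{
    void *p = malloc(bytes);
    if (p == NULL)
        abort();  /* sketch: no graceful out-of-memory handling */
    return p;
}

/* Split the interior rows 1..N-2 of an N x n grid into p contiguous blocks. */
static Block *scatter(const double *g, int N, int n, int p)
{
    Block *b = xmalloc((size_t)p * sizeof *b);
    int interior = N - 2, start = 1;
    for (int r = 0; r < p; ++r) {
        b[r].rows = interior / p + (r < interior % p);
        b[r].n = n;
        size_t bytes = (size_t)(b[r].rows + 2) * n * sizeof(double);
        b[r].cur = xmalloc(bytes);
        b[r].next = xmalloc(bytes);
        /* owned rows plus the rows directly above and below them */
        memcpy(b[r].cur, g + (size_t)(start - 1) * n, bytes);
        memcpy(b[r].next, b[r].cur, bytes);  /* same fixed boundaries in both */
        start += b[r].rows;
    }
    return b;
}

/* Halo exchange: each neighbouring pair trades one row in each direction. */
static void exchange_halos(Block *b, int p)
{
    for (int r = 0; r + 1 < p; ++r) {
        Block *up = &b[r], *down = &b[r + 1];
        size_t row = (size_t)up->n * sizeof(double);
        /* last owned row of `up` -> top ghost row of `down` */
        memcpy(down->cur, up->cur + (size_t)up->rows * up->n, row);
        /* first owned row of `down` -> bottom ghost row of `up` */
        memcpy(up->cur + (size_t)(up->rows + 1) * up->n, down->cur + down->n, row);
    }
}

/* One Jacobi sweep over a block's owned rows, then swap buffers (no copy-back). */
static void sweep_block(Block *b)
{
    int n = b->n;
    for (int i = 1; i <= b->rows; ++i)
        for (int j = 1; j < n - 1; ++j)
            b->next[i * n + j] = 0.25 * (b->cur[(i - 1) * n + j] + b->cur[(i + 1) * n + j] +
                                         b->cur[i * n + j - 1] + b->cur[i * n + j + 1]);
    double *tmp = b->cur;
    b->cur = b->next;
    b->next = tmp;
}

/* Copy owned rows back into the global grid and release the blocks. */
static void gather(Block *b, int p, double *g)
{
    size_t row = 1;
    for (int r = 0; r < p; ++r) {
        memcpy(g + row * b[r].n, b[r].cur + b[r].n,
               (size_t)b[r].rows * b[r].n * sizeof(double));
        row += (size_t)b[r].rows;
        free(b[r].cur);
        free(b[r].next);
    }
    free(b);
}

/* Run `sweeps` Jacobi sweeps on an N x n grid split across p simulated devices. */
void jacobi_decomposed(double *g, int N, int n, int p, int sweeps)
{
    assert(N >= 3 && n >= 3 && p >= 1 && p <= N - 2);
    Block *b = scatter(g, N, n, p);
    for (int s = 0; s < sweeps; ++s) {
        exchange_halos(b, p);  /* moves 2 * (p - 1) rows, not whole grids */
        for (int r = 0; r < p; ++r)
            sweep_block(&b[r]);
    }
    gather(b, p, g);
}
```

Per sweep, only 2(p - 1) rows of n values cross block boundaries, so the communicated volume grows with the grid width rather than the grid size, and the decomposed result matches a single-block run exactly.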
Development is ongoing. Current areas of focus include:
- Further optimization of GPU memory usage
- Implementing more advanced, modern parallelization techniques
- Exploring new applications for the optimized code
- Improving documentation and adding examples