Publications
The following publications are related to CALDGEMM and HPL-GPU:
[1] D. Rohr, V. Lindenstruth: “A load-distributed Linpack Implementation for Heterogeneous Clusters”, in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, HPCC 2015, New York. IEEE, [2015].
[2] D. Rohr, M. Bach, G. Nešković, V. Lindenstruth, C. Pinke, O. Philipsen: “Lattice-CSC: Optimizing and Building an Efficient Supercomputer for Lattice-QCD and to Achieve First Place in Green500”, in High Performance Computing, vol. 9137 of Lecture Notes in Computer Science, pp. 179–196 [Springer International Publishing, 2015], ISBN 978-3-319-20118-4.
[3] D. Rohr, G. Nešković, M. Radtke, V. Lindenstruth: “The L-CSC cluster: greenest supercomputer in the world in Green500 list of November 2014”, in Proceedings of Supercomputing Frontiers Conference, Singapore, [2015].
[4] D. Rohr, V. Lindenstruth: “A flexible and portable large-scale DGEMM library for Linpack on next-generation multi-GPU systems”, in 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing, [2015].
[5] D. Rohr, S. Kalcher, M. Bach, A. Alaqeeli, H. Alzaid, D. Eschweiler, V. Lindenstruth, A. Sakhar, A. Alharthi, et al.: “An Energy-Efficient Multi-GPU Supercomputer”, in Proceedings of the 16th IEEE International Conference on High Performance Computing and Communications, HPCC 2014, Paris, France. IEEE, [2014].
[6] D. Rohr: “On Development, Feasibility, and Limits of Highly Efficient CPU and GPU Programs in Several Fields”, Dissertation thesis, Goethe University of Frankfurt [2013].
[7] M. Bach, J. De Cuveland, H. Ebermann, D. Eschweiler, M. Kretz, M. Pollok, D. Rohr, H. J. Lüdde, V. Lindenstruth: “The LOEWE-CSC: A Comprehensive Approach for a Power Efficient General Purpose Supercomputer”, in 21st Euromicro International Conference on Parallel, Distributed and Network-Based Processing, pp. 1–17 [2013].
[8] D. Rohr, M. Bach, M. Kretz, V. Lindenstruth: “Multi-GPU DGEMM and HPL on Highly Energy Efficient Clusters”, IEEE Micro, Special Issue, CPU, GPU, and Hybrid Computing [2011].
[9] M. Bach, M. Kretz, V. Lindenstruth, D. Rohr: “Optimized HPL for AMD GPU and Multi-Core CPU Usage”, Computer Science - Research and Development, vol. 26, no. 3-4, pp. 153–164 [2011].
[10] D. Rohr, M. Kretz, M. Bach: “Technical Report, CALDGEMM and HPL”, Technical Report, Goethe University Frankfurt [2010].