Inquiry About Resources for Simulating Matrix Multiply on gem5-GPU #710
-
Hello authors, I am currently exploring GPU simulation for various computations. Recently, I experimented with DNNMark. However, I encountered a bug in DNNMark's fully connected (fc) benchmark, and although I raised an issue, it remains unresolved. Because of this, I am now shifting my focus toward directly simulating matrix multiplication.

Are there any pre-built resources or examples in the gem5-GPU framework that focus specifically on matrix multiplication? My project doesn't restrict which GPU is simulated, so I am open to resources for any GPU type that gem5-GPU supports. My main goal is to understand the performance characteristics of matrix multiplication on GPUs and the optimizations that are possible.

If such resources exist, could you point me to them? Alternatively, if there are similar examples that could be modified for matrix multiplication, that would also be very helpful. Any guidance, advice, or references to documentation would be greatly appreciated. Thank you for your time and assistance.
Replies: 1 comment
-
Sorry for the delayed response over the holiday break. Others and I have previously run the DeepBench SGEMM and DGEMM benchmarks (https://github.com/baidu-research/DeepBench/blob/master/code/amd/gemm_bench.cpp) in gem5. Integrating them into the gem5-resources repo is on my to-do list, but it has not reached the top of that list yet. I don't think anything major is needed to run them from scratch, though: the main change we made was to have the benchmark run a single GEMM size specified on the command line, instead of running hundreds of GEMMs back-to-back as Baidu originally set it up to do. So you are welcome to try this.

Beyond this, I believe you could also write your own benchmark that calls AMD's rocBLAS library (AMD's equivalent of NVIDIA's cuBLAS library) directly for a GEMM of a given size. This should also be supported, since DeepBench's benchmarks call rocBLAS themselves.
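To make the "one GEMM size from the command line" change concrete, here is a minimal, CPU-only sketch of that driver structure. It is not DeepBench's code, and every name in it (`GemmSize`, `parse_gemm_size`, `sgemm_ref`, `run_one_gemm`, `gemm_flops`) is hypothetical. The reference SGEMM uses the standard BLAS column-major argument order (m, n, k, lda, ldb, ldc); in an actual GPU benchmark, the assumption is that you would replace the `sgemm_ref` call with rocBLAS's `rocblas_sgemm` on device buffers, with handle creation and HIP memory copies omitted here.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical: one GEMM problem, C (m x n) = A (m x k) * B (k x n).
struct GemmSize {
    int m;
    int n;
    int k;
};

// Parse exactly three positive integers (M N K) from the command line.
// Returns false on a wrong argument count or a non-numeric/non-positive value,
// so the caller can print a usage message instead of running a sweep.
bool parse_gemm_size(int argc, const char* const argv[], GemmSize& out) {
    if (argc != 4) {
        return false;
    }
    int vals[3];
    for (int i = 0; i < 3; ++i) {
        char* end = nullptr;
        long v = std::strtol(argv[i + 1], &end, 10);
        if (end == argv[i + 1] || *end != '\0' || v <= 0 || v > 1000000) {
            return false;
        }
        vals[i] = static_cast<int>(v);
    }
    out = GemmSize{vals[0], vals[1], vals[2]};
    return true;
}

// CPU reference SGEMM, column-major, no transposes:
//   C = alpha * A * B + beta * C
// As in BLAS, C is not read when beta == 0.
void sgemm_ref(int m, int n, int k, float alpha,
               const float* A, int lda, const float* B, int ldb,
               float beta, float* C, int ldc) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[i + p * lda] * B[p + j * ldb];
            }
            float& c = C[i + j * ldc];
            c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * c;
        }
    }
}

// Run exactly one GEMM of the requested size (instead of many sizes back-to-back).
// Inputs are filled with constants (A = 1, B = 2), so every entry of C equals 2 * k.
std::vector<float> run_one_gemm(const GemmSize& s) {
    std::vector<float> A(static_cast<std::size_t>(s.m) * s.k, 1.0f);
    std::vector<float> B(static_cast<std::size_t>(s.k) * s.n, 2.0f);
    std::vector<float> C(static_cast<std::size_t>(s.m) * s.n, 0.0f);
    sgemm_ref(s.m, s.n, s.k, 1.0f, A.data(), s.m, B.data(), s.k,
              0.0f, C.data(), s.m);
    return C;
}

// A GEMM performs m*n*k multiply-adds, conventionally counted as 2*m*n*k FLOPs.
double gemm_flops(const GemmSize& s) {
    return 2.0 * s.m * s.n * s.k;
}
```

A `main` for this sketch would call `parse_gemm_size(argc, argv, s)`, print usage on failure, and otherwise call `run_one_gemm(s)` once, so each simulated run covers a single problem size. Keeping one size per invocation also keeps gem5 run times manageable, and the CPU result doubles as a correctness reference for GPU output.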