
Added wrapped C CUDA code and runnable examples #1

Merged
merged 3 commits into main on Jul 5, 2023
Conversation

ArrogantGao
Collaborator

No description provided.

@ArrogantGao
Collaborator Author

I compared the speed of our CUDA code against the tropical operator from GemmKernels in Julia 1.7; the result is shown below.
[benchmark plot]
GemmKernels performs much better in Julia 1.7 than in Julia 1.6, but still reaches only about half of the maximum performance.

I wrapped our CUDA code as a .so library and added a Julia interface; related tests have also been created.
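For context, wrapping a kernel from a shared library usually comes down to a `ccall` that passes raw device pointers. This is only a sketch: the library path and the exported symbol `tropical_sgemm` are assumptions for illustration, and the actual names in this PR may differ.

```julia
using CUDA

const LIB = "libtropicalgemm.so"  # path to the compiled .so (assumption)

# Hypothetical wrapper: the C function launches the kernel internally,
# outside of CUDA.jl's knowledge, so we only hand over device pointers
# and the matrix dimensions.
function tropical_sgemm!(C::CuMatrix{Float32}, A::CuMatrix{Float32}, B::CuMatrix{Float32})
    M, K = size(A)
    K2, N = size(B)
    @assert K == K2 && size(C) == (M, N)
    ccall((:tropical_sgemm, LIB), Cvoid,
          (CuPtr{Float32}, CuPtr{Float32}, CuPtr{Float32}, Cint, Cint, Cint),
          pointer(A), pointer(B), pointer(C), M, N, K)
    return C
end
```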

src/SGEMM/CuTropicalSGEMM_example.jl
src/SGEMM/TropicalSGemm.cu
src/SGEMM/benchmark_GemmKerenls_Tropical.jl
@ArrogantGao
Collaborator Author

ArrogantGao commented Jul 5, 2023

Sorry for the previous chaos; I thought these parts would not be published as part of the package.

The following changes have been made:

  • The .so file is uploaded to a gist as an artifact, so there are no more binaries in the repo.
  • I relocated all the files into the src, test and benchmark folders.
  • Scripts used for the benchmarks are provided, including the fallback implementation in CUDA.jl. However, I found something strange: CUDA.@sync does not seem to work when calling a function from a .so library, so I failed to benchmark our code in Julia.
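On the CUDA.@sync point: CUDA.@sync only waits for work that CUDA.jl itself submitted on its task-local stream, so a kernel launched inside an external .so is invisible to it. One possible workaround, sketched here under the assumption that the external kernel runs on the default stream, is to synchronize the whole device around the timed region; `run_gemm` stands for any hypothetical wrapper that calls into the shared library.

```julia
using CUDA

# CUDA.@sync cannot see kernels launched inside an external .so, so for
# timing we fall back to a full device synchronization, which waits for
# *all* pending GPU work regardless of who launched it.
function time_external_gemm(run_gemm)
    CUDA.device_synchronize()       # drain previously queued work
    t0 = time_ns()
    run_gemm()                      # launches the kernel inside the .so
    CUDA.device_synchronize()       # wait until that kernel has finished
    return (time_ns() - t0) / 1e9   # elapsed seconds
end
```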

The new benchmark result is shown here:
[benchmark plot]

@GiggleLiu
Member

Hi @maleadt, I am mentoring an Open Source Promotion Plan student to implement Tropical GEMM on GPUs. Regarding the recent update in GemmKernels.jl (JuliaGPU/GemmKernels.jl#101), I suggested that he try GemmKernels.jl to make the implementation compatible with the Julia CUDA ecosystem. However, the benchmark above shows that its performance is not as good as the 600-line C code.

We might need your help to decide which way forward is technically more feasible:

  1. We can either implement the TropicalGEMM in C and port it to CUDA ecosystem, or
  2. try polishing the GemmKernels implementation.

Also, @ArrogantGao found that CUDA.@sync does not work when calling a function from a .so library. I could not find any issue discussing how CUDA.@sync interacts with .so libraries, so if you could suggest a direction to investigate, that would be very helpful. If there is any existing sample project for reference, that would be perfect.

NOTE: All the benchmarks and implementations are included in this repo.
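For reference, Tropical GEMM here means matrix multiplication over the max-plus semiring, i.e. C[i,j] = max_k (A[i,k] + B[k,j]). A plain-Julia CPU version is useful as ground truth when validating either GPU implementation; the function name below is illustrative, not an API of this repo.

```julia
# CPU reference for max-plus (tropical) matrix multiplication:
#   C[i,j] = max_k (A[i,k] + B[k,j])
# The accumulator starts at typemin(T), the additive identity of max.
function tropical_matmul(A::AbstractMatrix{T}, B::AbstractMatrix{T}) where {T<:Real}
    M, K = size(A)
    K2, N = size(B)
    @assert K == K2
    C = fill(typemin(T), M, N)
    @inbounds for j in 1:N, k in 1:K, i in 1:M
        C[i, j] = max(C[i, j], A[i, k] + B[k, j])
    end
    return C
end

A = Float32[1 2; 3 4]
B = Float32[0 1; 1 0]
tropical_matmul(A, B)  # → Float32[3 2; 5 4]
```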

@GiggleLiu (Member) left a comment

I think the changes look great, well done!

@@ -0,0 +1 @@
{}

vscode configuration files should not be committed.

Artifacts.toml Show resolved Hide resolved
@@ -0,0 +1,627 @@
// This CUDA code is modified based on the GitHub repo https://github.com/Yinghan-Li/YHs_Sample, which is under the GPL 3.0 License

Holy, the GPL3 license, that is sexy. If we decide to keep this version in our code base, we have to include the GPL3 license.

To “propagate” a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well.

@maleadt

maleadt commented Jul 5, 2023

try polishing the GemmKernels implementation.

I would recommend doing so. An all-Julia implementation is always preferable, for so many reasons: support for different datatypes, easier to tune using metaprogramming instead of the hard-coded 128x128x8 here, easier for other people to contribute to, etc. The code generated by GemmKernels.jl is generally pretty good, so it should be possible to compare the generated PTX code of both implementations, and/or use NSight Compute to compare executions. Maybe it's something simple, like GemmKernels.jl not using ldg. It's possible that it's more serious, like how we use 64-bit integers for pointer arithmetic (JuliaGPU/CUDA.jl#1895), but I wouldn't expect so with a memory-bound kernel.
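Comparing the generated PTX as suggested can be done from the Julia side with CUDA.jl's reflection macros. A minimal sketch, in which the toy kernel is only a placeholder and not the actual tropical kernel:

```julia
using CUDA

# Placeholder elementwise kernel with a tropical-style max operation,
# just to demonstrate PTX inspection.
function maxcopy!(y, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)
        @inbounds y[i] = max(y[i], x[i])
    end
    return
end

x = CUDA.rand(Float32, 1024)
y = CUDA.rand(Float32, 1024)

# Prints the PTX generated for this launch; the output can then be diffed
# against the PTX that nvcc emits for TropicalSGemm.cu (`nvcc -ptx ...`).
CUDA.@device_code_ptx @cuda threads=256 blocks=4 maxcopy!(y, x)
```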

@ArrogantGao
Collaborator Author

Removed the .vscode file and changed the license to GPL 3.0 (indeed, I also like that better).

@GiggleLiu
Member

GiggleLiu commented Jul 5, 2023

@maleadt Thank you for your prompt reply. @ArrogantGao Let us do some profiling and get some understanding of the performance issues.
This page explains how to profile GPU code with CUDA.jl: https://cuda.juliagpu.org/stable/development/profiling/
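A minimal profiling sketch following that guide. This assumes a recent CUDA.jl whose integrated `CUDA.@profile` is available; otherwise the same script can be run under an external profiler such as `nsys profile julia script.jl`. The exact report format depends on the CUDA.jl version.

```julia
using CUDA

A = CUDA.rand(Float32, 2048, 2048)
B = CUDA.rand(Float32, 2048, 2048)

C = A * B                  # warm up: compile the kernels first
CUDA.@profile begin
    CUDA.@sync A * B       # profiled region: kernel timings are reported
end
```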

Let me merge the PR first and move the discussion to #2, where we can update the profiling results and the generated PTX code.

@GiggleLiu GiggleLiu merged commit 895cfa5 into main Jul 5, 2023