
SPMM #14

Open
a1941409241 opened this issue Jan 16, 2024 · 8 comments

Comments

@a1941409241

When initializing the sparse_tile_loader, shouldn't threadIdx.x be threadIdx.x % kBlockWidth? Is my understanding correct?

@tgale96
Contributor

tgale96 commented Jan 16, 2024

Hi! We pass threadIdx.x directly (code).

@a1941409241
Author

But if subwarp tiling is used, shouldn't different subwarps correspond to different rows?

@tgale96
Contributor

tgale96 commented Jan 16, 2024

I believe that is handled in the block configuration passed in for kernel launch.
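To make that concrete: if the kernel is launched with a two-dimensional thread block such as dim3(kBlockWidth, kBlockItemsY), then threadIdx.x already ranges over [0, kBlockWidth), so the modulo is a no-op, and threadIdx.y distinguishes the subwarps. A minimal host-side sketch of that index math (the tile parameters here are illustrative, not the library's actual configs):

```cpp
#include <cassert>

// Hypothetical tile parameters; illustrative, not the library's actual configs.
constexpr int kBlockWidth = 8;   // threads that cooperate on one row (threadIdx.x)
constexpr int kBlockItemsY = 4;  // subwarps (rows) per thread block (threadIdx.y)

// Host-side emulation of a dim3(kBlockWidth, kBlockItemsY) launch:
// threadIdx.x already ranges over [0, kBlockWidth), so "% kBlockWidth" is a
// no-op, and threadIdx.y selects which row (subwarp) a thread works on.
bool LaneIndexingIsModuloFree() {
  for (int ty = 0; ty < kBlockItemsY; ++ty) {    // threadIdx.y
    for (int tx = 0; tx < kBlockWidth; ++tx) {   // threadIdx.x
      if (tx != tx % kBlockWidth) return false;  // never fires: tx < kBlockWidth
    }
  }
  return true;
}
```

Under this launch shape, no modulo on threadIdx.x is needed, which would explain passing it through directly.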

@a1941409241
Author

Should I call CudaSpmm directly to use the library after I run make install? (code)

@a1941409241
Author

And if I don't use a bias, do I just pass nullptr for that argument (or rely on the default value)? Is my understanding correct?

@tgale96
Contributor

tgale96 commented Jan 18, 2024

Yes, CudaSpmm is the right API. If you don't need a fused bias + ReLU you can call this API; if you want to fuse those operations we have CudaSpmmBiasRelu.
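For reference while integrating: CudaSpmm computes a sparse (CSR) × dense product. A host-side sketch of the same computation can be handy for validating GPU output; the argument names follow CSR convention and the bias handling here is illustrative, not copied from the library's headers:

```cpp
#include <cassert>
#include <vector>

// Host-side CSR SpMM reference: out = A (m x k, sparse CSR) * B (k x n, dense).
// A sketch for sanity-checking GPU results; names follow CSR convention and are
// not taken from the library's headers.
std::vector<float> SpmmReference(int m, int k, int n,
                                 const std::vector<int>& row_offsets,
                                 const std::vector<int>& column_indices,
                                 const std::vector<float>& values,
                                 const std::vector<float>& dense,
                                 const float* bias /* nullptr => no bias */) {
  assert(static_cast<int>(dense.size()) == k * n);
  std::vector<float> out(m * n, 0.0f);
  for (int i = 0; i < m; ++i) {
    // Accumulate each nonzero of row i against the matching dense row.
    for (int idx = row_offsets[i]; idx < row_offsets[i + 1]; ++idx) {
      int col = column_indices[idx];
      float v = values[idx];
      for (int j = 0; j < n; ++j) out[i * n + j] += v * dense[col * n + j];
    }
    // With bias == nullptr the bias term is simply skipped.
    if (bias != nullptr)
      for (int j = 0; j < n; ++j) out[i * n + j] += bias[i];
  }
  return out;
}
```

In this sketch, passing nullptr for the bias matches the "no bias" reading above: the bias loop never runs.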

@a1941409241
Author

Hmm, I have a question. This project chooses the SpmmConfig based on the size of the input dense matrix, but that selection adds CPU-side runtime. When I call CudaSpmm directly, it is much slower than cuSPARSE; but if I refactor the project and set the SpmmConfig myself, it is faster than cuSPARSE. Doesn't that mean the generality is not good enough? Or should I suggest creating some instances of CudaSpmmEx specialized to particular SpmmConfigs in the library? Would that be feasible?

@tgale96
Contributor

tgale96 commented Feb 12, 2024

Interesting! I would think your problem must be quite small for that to be the case. The tuning heuristics in this library are by no means expected to be good across all problems; if you know which config is best for your problem, you should pass it explicitly.
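One hypothetical middle ground between full generality and hand-picked configs is to memoize the selected config by problem shape, so any CPU-side heuristic search runs once per unique (m, k, n) rather than on every call. SpmmConfig and SelectConfig below are illustrative stand-ins, not the library's actual types or heuristics:

```cpp
#include <map>
#include <tuple>

// Illustrative stand-ins; not the library's actual types or heuristics.
struct SpmmConfig { int kBlockWidth; int kBlockItemsY; };

SpmmConfig SelectConfig(int m, int k, int n) {
  // Placeholder heuristic: stands in for an expensive CPU-side search.
  return n >= 64 ? SpmmConfig{32, 4} : SpmmConfig{8, 4};
}

// Memoize by problem shape so the heuristic runs once per unique (m, k, n);
// repeated calls with the same shape hit the cache.
const SpmmConfig& CachedConfig(int m, int k, int n) {
  static std::map<std::tuple<int, int, int>, SpmmConfig> cache;
  auto key = std::make_tuple(m, k, n);
  auto it = cache.find(key);
  if (it == cache.end()) it = cache.emplace(key, SelectConfig(m, k, n)).first;
  return it->second;
}
```

For workloads that repeatedly run the same problem shape, this would amortize the selection cost without specializing the library's public API.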
