Adding GPU Support #128
I'll try adding a provision for 1). I will be testing it on Colab. Here's a notebook to get started (it installs CuPy; Colab already has numpy, scipy, and numba). Please check that you have turned on a GPU runtime.
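For anyone following along, a minimal sanity check for that setup might look like this (a sketch; the `cupy-cuda11x` wheel name is an assumption, pick the one matching the runtime's CUDA version):

```python
# In a Colab cell with a GPU runtime (Runtime > Change runtime type > GPU):
# !pip install cupy-cuda11x   # assumed wheel; match it to Colab's CUDA version

import cupy as cp

# Confirm a CUDA device is visible before running any benchmarks
print(cp.cuda.runtime.getDeviceCount())   # should print >= 1

x = cp.arange(10)
print(cp.asnumpy(x * 2))                  # simple device compute + host copy
```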
Just wanted to say I'd be interested in trying the GPU support for cupy.matmul with 32-bit GF field arrays to see how they compare speed-wise with regular cupy.matmul on uint32 data types. I was thinking of maybe using this library to matmul some large arrays in real time, but it's too slow to do that as it currently stands. I expect that even with cupy acceleration it will still be an order of magnitude slower than operating on native data types like uint32.
@peter64 thanks for the feedback. I'm not too surprised that galois is slower than native integer arithmetic. Some clarifying questions:

What is your current slowdown as compared to normal integer matrix multiplication? Which field GF(p^m) are you using?
I'm using GF(p^m), I think. It's 2**8 (256). Honestly, I just need this to do binary matrix multiplication modulo 2 for an entropy extractor I'm trying to reproduce from a paper. Here's some output from the tests I just ran using GF(256), as GF(2^32) wasn't completing in any reasonable period of time and the docs said using a smaller p^m value might mean it could use lookup tables, so I gave it a try. In reality I would prefer to use GF(2^32), I guess.
So GF is about 50x-60x slower than numpy: 6 minutes vs ~7 seconds. cupy is about 5x faster than numpy on its first run (with a GTX 1050 and an i7 quad core), but then cupy ends up being about 700x faster than numpy on subsequent runs.
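For anyone wanting to reproduce numbers like these, a timing harness along these lines should work (a sketch; shapes are illustrative and timings depend on hardware):

```python
import time

import numpy as np
import galois

GF = galois.GF(2**8)  # small field, so lookup tables are used

A = GF.Random((1000, 1000))
B = GF.Random((1000, 1000))

t0 = time.time()
C = A @ B  # matrix multiplication in GF(2^8)
print(f"galois: {time.time() - t0:.2f} s")

# Native-integer baseline (different arithmetic, purely a speed reference)
a = np.random.randint(0, 256, (1000, 1000), dtype=np.uint32)
b = np.random.randint(0, 256, (1000, 1000), dtype=np.uint32)
t0 = time.time()
c = a @ b
print(f"numpy:  {time.time() - t0:.2f} s")
```

As an aside, for the mod-2 extractor use case specifically, a plain-NumPy fallback gives identical results while GPU support is pending, since GF(2) arithmetic is just integer arithmetic reduced mod 2:

```python
import numpy as np
import galois

GF2 = galois.GF(2)
A = GF2.Random((8, 8))
B = GF2.Random((8, 8))
C = A @ B  # matrix multiplication in GF(2)

# Same result from ordinary integer matmul reduced mod 2
a = np.array(A, dtype=np.uint64)
b = np.array(B, dtype=np.uint64)
assert np.array_equal(np.array(C), (a @ b) % 2)
```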
@peter64 thanks for the example. Yes, that is slower than I would expect (which is ~10x slower than NumPy). Let me run some speed tests later today and maybe test a few potential speedups. I'll report back.
@peter64 can you confirm which version of galois you're using?
Perhaps it's because my arrays are large enough that they don't fit in the CPU cache or something...
@varun19299 and @peter64, I now have a GPU to test against. I'm considering starting work on GPU support. Do you have any updated thoughts on a desired API interface regarding transfer to/from the GPU, etc.? If not, I'll use my best judgment. Just wondering if you have given it any thought. Thanks.
Hey @mhostetter, thanks so much for asking, but I have no thoughts regarding a desired API. I can't promise I will end up using the library in the end, but I am very curious to see how it will perform and will be happy to test its performance! Thanks again for writing this library and for being willing to add GPU support!
Hi Matt, any news on this one? GPU support would be great for high-order calculations!
No update as of yet. It's going to be a big change, and just one I haven't embarked on yet. Perhaps soon. Just curious, @geostergiop, what are you looking to speed up? I doubt "large" finite fields (those using Python object dtypes rather than lookup tables) will see much benefit from a GPU.
Well, I am currently calculating about 2,000 * 19,993 * 18,431 field_traces() and respective norms over 2^256 and 2^233 elements, so it takes some time, to say the least :-) I hoped to speed up the np.arange and exponent calculations.
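For reference, a reduced-size sketch of that kind of computation with galois's field_trace()/field_norm() methods (GF(2^8) standing in for GF(2^233)/GF(2^256), which exceed 64-bit dtypes and run far slower):

```python
import galois

GF = galois.GF(2**8)      # stand-in; galois.GF(2**233) works but is much slower
x = GF.Random(1000)

tr = x.field_trace()      # trace of each element down to the prime field GF(2)
nm = x.field_norm()       # norm of each element down to GF(2)
print(tr[:5], nm[:5])
```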
Early ideas:

- Accept a `device` flag for `GF` instances. If `CUDA`, use a cupy array.
- Use `cupy.get_array_module` for device-agnostic code where possible (sketched after this list).
- PyTorch-like `.to(device)`: allow transferring between host and device(s). Internally this would just be a `numpy{cupy}.asarray` or `Array.view(np/cp.ndarray)` call.
- Most numpy functions in `galois/field/linalg.py` have corresponding cupy ones with identical syntax.
- Numba `jit` functions and `ufuncs` may require separate GPU implementations, especially if thread and block indices need to be accessed (see the kernel sketch after this list).
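A sketch of the device-agnostic dispatch idea (second bullet), using the real `cupy.get_array_module`; the function itself is illustrative:

```python
import numpy as np
import cupy as cp

def row_norms(x):
    # get_array_module returns cupy for device arrays and numpy for host
    # arrays, so one code path serves both
    xp = cp.get_array_module(x)
    return xp.linalg.norm(x, axis=1)

print(row_norms(np.ones((3, 4))))               # computed on the host
print(cp.asnumpy(row_norms(cp.ones((3, 4)))))   # computed on the GPU
```

And for the last bullet, a minimal Numba CUDA kernel with explicit thread indexing (the XOR really is GF(2^m) addition; the kernel name and launch are a hypothetical sketch, not the library's implementation):

```python
import numpy as np
from numba import cuda

@cuda.jit
def gf2m_add(x, y, out):
    i = cuda.grid(1)              # absolute thread index across the grid
    if i < out.size:
        out[i] = x[i] ^ y[i]      # addition in GF(2^m) is carry-less XOR

x = np.arange(256, dtype=np.uint32)
y = np.arange(256, dtype=np.uint32)
out = np.empty_like(x)
gf2m_add.forall(out.size)(x, y, out)  # forall chooses a launch configuration
print(out[:8])
```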