
Prototype cupy backend #4952

Open
wants to merge 20 commits into master

Conversation

spxiwh (Contributor) commented Nov 22, 2024

This adds a prototype CuPy backend to PyCBC.

Our current CUDA GPU backend is not working, and there are far more tools for interacting with CUDA now than there were in 2011. CuPy is really nice, and I think it will considerably reduce the complexity of our CUDA backend while still letting us use the custom CUDA kernels that already exist (as demonstrated in this PR).

This backend will:

  • Run the premerger likelihood through PyCBC Inference (with MPI over multiple cores, but not with OpenMP).
  • Mostly run pycbc_inspiral. There's some issue in the chisq module, but I've run out of time to debug it.

I'm posting this now, although I would have liked to have pycbc_inspiral running before proposing a merge ... but I did promise on Wednesday that I would post this.

Others have suggested moving to torch instead. I would like to see a demonstration of that before we decide between the two routes.

ACTIONS

  • I need to make sure that argument types are consistent when calling RawKernel (if not expensive, an explicit check before each call would avoid potentially strange errors!)
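The type check in the action item above could be a small helper run before each RawKernel launch. This is only a sketch: `check_kernel_args` and the expected-dtype list are hypothetical names, and NumPy stands in for CuPy (their dtype machinery is interchangeable here).

```python
import numpy as np

def check_kernel_args(expected_dtypes, args):
    """Raise TypeError if any array argument's dtype does not match the
    dtype the kernel was compiled against. Without this, a mismatched
    argument would make the kernel silently reinterpret the buffer's bytes.
    (Hypothetical helper, sketched with NumPy in place of CuPy.)"""
    for i, (expected, arg) in enumerate(zip(expected_dtypes, args)):
        if np.dtype(arg.dtype) != np.dtype(expected):
            raise TypeError(
                f"argument {i}: expected dtype {np.dtype(expected)}, "
                f"got {np.dtype(arg.dtype)}"
            )
```

The comparison itself is cheap (a dtype equality test per argument), so it should be negligible next to a kernel launch.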

spxiwh (Contributor, Author) commented Nov 25, 2024

>> [Mon 25 Nov 07:45:27 CST 2024] Running pycbc inspiral cupy:openmp with 1 threads


>> [Mon 25 Nov 07:45:48 CST 2024] test for GW150914
Pass: 2 GW150914-like triggers

This is now running the pycbc_inspiral unit test (examples/inspiral) ... It's probably still missing lots of things and isn't well optimized (for inspiral), but I'm happy to get feedback (and potentially merge this) at this point.

GarethCabournDavies (Contributor) left a comment

All this looks sensible to me, though I don't feel I can approve it yet.

The parts I had got to look the same as what I'd implemented (though I was much slower and hadn't reached certain parts).

The main points I wanted to raise:

  • Put your own name down where you have done work (even when adding to others').
  • I've looked at where pieces have been adapted from and noticed minor discrepancies I wasn't sure about, so I'm asking questions.

_backend_dict = {'cupy': 'cupyfft'}
_backend_list = ['cupy']

_alist, _adict = _list_available(_backend_list, _backend_dict)

The backend_cuda version of this has an `if pycbc.HAVE_CUDA` guard and this doesn't. This makes me wonder: should this backend work when not on a GPU?
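A guard playing the role of `pycbc.HAVE_CUDA` for this backend could be as simple as the sketch below. The helper name `cupy_usable` is hypothetical; `cupy.cuda.runtime.getDeviceCount()` is the real CuPy call for counting visible CUDA devices.

```python
import importlib.util

def cupy_usable():
    """Return True only if cupy imports and at least one CUDA device is
    visible. (Hypothetical helper, analogous to pycbc.HAVE_CUDA.)"""
    if importlib.util.find_spec("cupy") is None:
        return False
    try:
        import cupy
        # getDeviceCount raises or returns 0 when no GPU is available.
        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        return False
```

Checking this once at module import, as backend_cuda does, would let the scheme decide cleanly whether the cupy backend should be listed as available on a CPU-only node.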

@@ -0,0 +1,37 @@
# Copyright (C) 2014 Josh Willis

you wrote this

@@ -0,0 +1,88 @@
# Copyright (C) 2012 Josh Willis

you wrote this

else:
raise ValueError(_INV_FFT_MSG.format("IFFT", itype, otype))



It would be good to have something similar to the numpy warning, i.e. "The cupy backend is a prototype, and performance may not be as expected".


class FFT(_BaseFFT):
"""
Class for performing FFTs via the numpy interface.

numpy --> cupy

if self.dtype == _xp.float32 or self.dtype == _xp.float64:
return _xp.argmax(abs(self.data))
else:
return abs_arg_max_complex(self._data)

I don't see where this is defined?
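For reference, the missing helper presumably returns the index of the complex sample with the largest magnitude. A NumPy stand-in (the same expression works on CuPy arrays) might look like the following; comparing squared magnitudes avoids a square root per sample.

```python
import numpy as np

def abs_arg_max_complex(data):
    """Index of the element with the largest |z|, computed from the
    squared magnitude so no square root is needed per sample.
    (Illustrative sketch, not the missing PyCBC kernel.)"""
    return int(np.argmax(data.real * data.real + data.imag * data.imag))
```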

if cdtype.kind == 'c':
return _xp.sum(self.data.conj() * other, dtype=complex128)
else:
return inner_real(self.data, other)

Same here; I don't see where this is defined.
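Judging from the complex branch above, `inner_real` would be the real-typed inner product, presumably accumulated in double precision. A NumPy stand-in under that assumption:

```python
import numpy as np

def inner_real(a, b):
    """Real inner product, accumulated in float64 to limit rounding
    error when the inputs are float32.
    (Illustrative sketch, not the missing PyCBC kernel.)"""
    return float(np.sum(np.multiply(a, b, dtype=np.float64)))
```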

Comment on lines +108 to +110
#def numpy(self):
# return self._data


Suggested change
#def numpy(self):
# return self._data

__shared__ unsigned int s;
__shared__ unsigned int e;
__shared__ float2 chisq[${NT} * ${NP}];
float twopi = 6.283185307179586f;

I don't really like that this is defined here. Is there any single source of truth (SSOT) we can get this from automatically?
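The `${NT}`/`${NP}` placeholders suggest the kernel source already passes through a template engine before compilation, so the constant could be injected from one Python-side definition the same way. The sketch below uses the stdlib `string.Template` (which shares the `${name}` syntax) purely to illustrate; whether the real code uses Mako or something else, the idea is identical.

```python
import math
from string import Template

TWOPI = 2.0 * math.pi  # single Python-side definition of the constant

# Toy kernel source with the constant substituted at template time,
# instead of a literal "6.283185307179586f" hardcoded in the CUDA text.
kernel_source = Template("""
extern "C" __global__ void demo(float* out) {
    const float twopi = ${TWOPI}f;
    out[threadIdx.x] = twopi;
}
""").substitute(TWOPI=repr(TWOPI))
```

Any kernel that needs `twopi` would then render it from the same `TWOPI` value, removing the duplicated literal.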

Comment on lines +25 to +26
"""This module contains the CPU-specific code for
convenience utilities for manipulating waveforms

Suggested change
"""This module contains the CPU-specific code for
convenience utilities for manipulating waveforms
"""This module contains the CuPy-specific code for
convenience utilities for manipulating waveforms
