-
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Plan for establishment of xsf library for special function scalar kernels #1
Comments
I think this is the right thing to do and I am very happy that you folks are doing it since you also have all the domain knowledge.
I think it is high time that we consider this detail seriously to remove extra bloat. I noticed this while I was wrapping bunch of C extensions that Cython indeed introduces quite some ballast. Unfortunately, I can't offer much in terms of special functions stuff but happy to offer a hand when we are reintegrating Footnotes
|
Big +1!! Maybe one day SciPy will really be a Python(-only) library :)
@asmeurer may be able to advise after converting
I'd be happy to help out here and there with triage rights. |
Yes, completely agree. Testing function correctness in
Agreed, it would be much better to avoid testing
Thanks @ilayn, that would still be really helpful. I think right now the benefit
Thanks @lucascolley |
Even though I'm not a Scipy maintainer I'm more than willing to assist with this if there's things you need help with. I agree it may be hard to completely adapt your code to Boost, and I don't believe your intent is to support arbitrary precision types either. Please correct me if I'm wrong. On the CI front it's easy to get running with CMake. Something to be aware of is the runtime required for NVRTC testing. It takes about 3-5x longer to run the same tests on NVRTC vs NVCC. I think we're paying 7 cents a minute for CUDA runners so this quickly becomes a non-trivial cost (That linked run would be about $6.58). One point of advice would be since this is a new project push your language standard as far forward as possible. Working around broken compiler support for C++11 is a huge pain. C++17 would help get rid of lots of the template spaghetti that exists in Boost.Math (We are currently C++14). That would also help NVRTC out a lot because I think it's really hard to parse all the templates given our policy based design. C++20 you can basically make anything happen at compile time but I don't think that matters a ton here. Concepts are pretty nice though. Developers on Reddit routinely complain about Boosts support for old language standards and I'm sure that's driven people away from wanting to contribute. |
Not to derail the conversation but cc @aterrel @nabobalis @MridulS in case there is anything to say from a NumFOCUS infra perspective.
SciPy is currently at C++17. |
Thanks @mborland for the offer to help. If I remember correctly, @izaid and I did briefly discuss supporting arbitrary precision at some point (but in the context of, Boost does this, maybe we should too), but it's not a priority. Thanks for sharing your CUDA workflow; I think I might look into setting up caching so that a CUDA kernel is only recompiled with NVRTC in xsf's CI if one of the headers the kernel depends on (possibly transitively) has been modified. Agreed on the language standard. Like @lucascolley said, SciPy is currently at C++17 and so is CuPy, so that's what we're going with. |
I used git-filter-repo. Here's the command I used. https://data-apis.org/array-api-strict/#relationship-to-numpy-array-api. It took several tries to get it all right so I recommend doing extensive testing. |
Sounds good. If you don't need arbitrary precision support I would skip it, because everything will be significantly simpler. For example in the Boost.Math Bessel i0 implementation all the calculations for builtin types are pre-computed arrays. This approach is quite clean and compiles quickly even with NVRTC. Since you have C++17 you can simply this even further with a chain of
C++17 is a good place to be. |
This is really great! I like every bit of the plan (maybe no need for arbitrary precision for now). I think it is highly valuable that the routines are easily usable by other projects. Looking forward to it! |
I, of course, am a big fan of this plan. There is something I've been wondering about though. The machinery for creating NumPy ufuncs and gufuncs lives in Does this mean we are going to ship a single Python C extension with all the NumPy ufuncs and gufuncs, something equivalent to what is now https://github.com/scipy/scipy/blob/main/scipy/special/_special_ufuncs.cpp? And Likewise for CUDA testing. To do that, we need to rely on CuPy, hence we'll need to make the CuPy ufuncs and gufuncs as well. The alternative option to this is C++ testing. Which is fine, Google Test is actually extremely good. But I'd rather keep things simple with tests remaining in Python. |
I am hoping to be able to call xsf functions both from Python and from C++, and I have a use case for exactly this. scipy/scipy#20786 wants to add a How to achieve this seems not obvious in the current set-up. With the spin-off of xsf into a header-only library, plus the generation of Python bindings, I guess it will become rather straightforward once xsf lands. |
@izaid, in the short term I think we can just have duplicate ufuncs in xsf and SciPy. Then we defer deciding on whether to move to C++ only testing or whether we should split off SciPy’s ufuncs into a separate Python package. For the time being, testing in Python is most convenient since we can copy over existing tests, and it should be easier to find contributors to work on test code. |
I think you will get hurt if you copy the existing test suite. As we discussed previously, the scipy tests are very unergonomic and very hard to decipher due to extensive dependency injection. I do think by paying some up front investment you would benefit a lot in the future to design your own tests. But that's what I meant above that C++ testing is "acquired taste", I don't know about google test but it is quite hard to compete with pytest ergonomics outside python. So you might want to keep the python bindings but that's up to you. I have no comments either way. For the ufuncs, we can keep them over the scipy side if need be or get them from xsf. The seeming complexity is because now the translation units and ufunc code are intertwined but if you rip |
Yeah, totally agreed that the existing test suite has a lot of problems. My current plan is to just copy over the existing tests this week so we can get something working as soon as possible, and then to make developing a proper test suite a priority over the next month (while getting rid of the original tests). That way we can get the separation out of the way quickly without having to pause special function development while the new test suite is being built. |
Maybe if you could open a separate issue later to elaborate a bit the problem encountered with I guess |
I've submitted a pull request here, #2, to migrate the header files to
@fancidev, sorry, I haven't looked into why |
In the interest of discoverability, would you consider renaming this repo to something other than |
Some background and history
In August of 2023 Irwin Zaid (@izaid) made the following comment on the pull request scipy/scipy#19023,
For context: the array API, short for the Python array API standard, was proposed to reduce fragmentation in the Python array computing world by offering a common API for an important subset of array computing functionality. The goal is that code written to the array API standard should work with any conforming array backend (e.g. NumPy, PyTorch, CuPy, Jax, etc.). Although the array API standard is restricted to a core set of important features, there can be optionalextensions, standardizing coherent sets of functionality. There are currently extensions for Linear Algebra and Fourier transform functions. It seems natural that there should be a special function extension1 codifying consistent naming and argument conventions for a set of the most widely used special functions, but due to inconsistent support for special functions across array libraries, an extension restricted only to functions which are currently uniformly available among the most widely used array libraries would be missing many useful functions. It is much easier for these array libraries to add a new special function when there is an existing implementation that can be re-used with at most small modifications.
At the time, SciPy could not provide header files for many of its special functions that other projects can easily include because SciPy special was a polyglot submodule with special function scalar kernels written in each of C, C++, Cython, and Fortran 77. Scalar kernels written in Cython and Fortran 77 would require translation to run on the C/C++ based CUDA platform for use in GPU aware array libraries, and we observed that it was precisely the functions with such scalar kernels that were most likely to be missing from other array libraries.
Replacing the Fortran 77 codebase in
scipy.special
had already been under active discussion because the Fortran scalar kernels, particularly those from Zhang and Jin's2specfun
codebase, were a source of a sizable number of bug reports, which were challenging to address due to the need to work with old, unstructured, and often low quality Fortran 77 code. My first major contributions toscipy.special
had begun in June of 2021, when I began rewriting the Fortran scalar kernel forhyp2f1
into Cython2, which was the prefered language at the time for new scalar kernels inscipy.special
. Through this work, I'd already begun to think that C++ might be a more suitable choice due to features like overloading on input type, template metaprogramming, better cross-platform support for complex numbers, its extensive standard library, because it can lead to significantly smaller binaries than Cython, and because of the challenges of calling scalar kernels written in Cython from compiled code not written in Cython.In an October 2023 meeting between Irwin, myself, and fellow SciPy maintainer Matt Haberland (@mdhaber) to discuss the draft proposal for a special function array API extension Matt was working on, Irwin explained his vision and I suggested we start by translating special function scalar kernels that are written in Cython into C++. I posted an RFC, scipy/scipy#19404, asking for thoughts on rewriting these Cython scalar kernels in C++, which was well received. I followed up by translating
lambertw
from Cython into C++, scipy/scipy#19435, and then began working on translating other scalar kernels. Irwin in turn followed up with scipy/scipy#19601, adding CUDA support for thislambertw
implementation. In January 2024 I submitted cupy/cupy#8140 to addlambertw
to CuPy. Starting in November 2023, I was able to work on this project 18 hours per week with support from the 2020 NASA ROSES, Reinforcing the Foundations of Scientific Python 3 through my employer Quansight Labs, aiming to bring GPU support to as much ofscipy.special
as feasible. Funding for travel, hardware, cloud, and other miscellaneous costs in service of this project has been provided by the 2023 NumFOCUS small development grant Streamlined Special Function Development in SciPy.Irwin has devoted a great amount time to seeing this project through on a volunteer basis when his work allows.
We are now in a position where it is within reach to add GPU support for all special functions in SciPy except those where the need to manage nontrivial amounts of state makes this unworkable. It was a great help that SciPy maintainer Ilhan Polat (@ilayn) translated all of the Fortran code in
scipy.special
into C as part of his general effort to free SciPy from outdated and difficult to maintain Fortran 77 code 4. Boost Math has recently begun to offer GPU support for special functions and internal tools used within special function code 5, which means we can build upon their work.Over the past year, @izaid and I have frequently made comments on GitHub stating our goal to split the C++ special function code we've been working on into a separate library which could be a shared dependency (as a submodule) of SciPy, CuPy, and other array libraries, and there is support for this from the CuPy developers 6. This RFC is the first step towards really making this happen. The name xsf was chosen by @izaid, based on the name sf for the C++ standard library extension for mathematical special functions.
Concept
We propose splitting SciPy special C++ codebase @izaid and I have been working on into a header only C++ library of inline special function scalar kernels which could be included and used in either C++ or CUDA projects (either with NVCC or the NVRTC jit. The identification macro
__CUDACC__
can be used to detect if a CUDA source file is being compiled, and__CUDACC_RTC__
can be used to detect if the NVRTC jit is being used. Conditional compilation can then be used to adapt the headers for the platform on which they are running.The simplest adaption is to define a macro
XSF_HOST_DEVICE
which expands to__host__ __device__
when a CUDA source file is being compiled, and otherwise an empty string.Host refers to the CPU side of a system and device to the CUDA capable GPU side. Defining a function with both the
__host__
and__device__
qualifiers tells the compiler the compile the function for both the CPU and GPU (having both is needed for some array libraries). Function signatures in the xsf library then look like:Note that the
inline
keyword is used here not because we expect the compiler to inline the long and somewhat complex implementation oflambertw
, but because this is what's needed in a header only C++ library to allow the same function to be used in multiple translation units under the One Definition Rule.We rely heavily on NVIDIA's libcudacxx 7, an implementation of much of the C++ standard library that is part of the CCCL (CUDA core compute libraries) suite. Since we only implement scalar kernels, no explicit management of parallelism or other GPU specific programming techniques are needed; it's sufficient to write scalar kernels which use the subset of the standard library that is supported on NVRTC 8, and which do not make nontrivial memory allocations, use recursion, or use other features which are not compatible with the highly parallel but resource constrained nature of GPU computation 9.
Currently a header config.h is used whose essential purpose is to include the headers xsf uses from the standard library when compiling for CPU, but for CUDA, to instead inject the standard library functions, templates, and types xsf uses from the
libcudacxx
'scuda::std
namespace directly into thestd
namespace. This was a convenient because it let us write our code as if it was just regular C++ code that uses the standard library, but it is generally not considered good practice to modify thestd
namespace, and doing so leads to undefined behavior according the C++ standard 10. Since these headers will no longer just be part of SciPy's (and CuPy's through copy and paste) internal codebase, we should adopt a more principled approach to aliasing to the correct standard library utilities for a given platform.Scope
We hope to include scalar kernels for all
scipy.special
ufuncs and gufuncs in xsf, and to enable GPU support for as much of them as feasible. Since @mborland and @jzmaddock have also begun adding GPU support to mathematical functions in Boost Math, a question that could arise is why there needs to be a separate project doing this. I think it's important that we coordinate with them to avoid duplication of effort. I envision Boost Math being a dependency of xsf, and for us to wrap Boost Math when it provides a quality GPU capable implementation of a function. Currentlyscipy.special
provides many functions which are not available in Boost Math and provides complex-valued versions of others for which Boost Math only provides real valued ones. There may be other functions for which the Boost implementation cannot be easily made GPU capable, but for which we would like to maintain a GPU capable implementation. Over time, I'd hope to see work done in xsf make it's way into Boost in some form when that makes sense, butSo for the time being, I think it makes more sense for the work on xsf to proceed independendly of Boost. We will be able to move more quickly this way. In the future I anticipate a natural delineation will form over what makes more sense to be implemented in Boost vs xsf based on differences in project goals and standards.
Migration Plan
The initial steps would be:
scipy/special/xsf
in the SciPy repo toinclude/xsf
in the xsf repo, taking care to preserve the commit history ofscipy/special/xsf
, perhaps using steps from an answer to this relevant StackOverflow question.scipy.special
tests, making the minor changes to them that would be needed to get them to work. In the future we would build out a more thorough and consistent test suite.xsf
as a submodule of SciPy and CuPy.There are headers in
scipy/special/xsf
which are not GPU capable, or which are in principle, but use things from the standard library which are not yet available in CuPy (config.h as it currently exists on SciPy's main branch is one of them). Compilation with NVRTC will fail at runtime if one tries to call a CuPy ufunc with scalar kernel including one of these headers. This is something which will need to be solved beforexsf
can be included as a submodule of CuPy. Currently what we're doing is taking stuff like this out of the config.h that we copy into CuPy when submitting PRs to add functions there. Instead we should either avoid things that are not available in CuPy (or perhaps spend some time figuring out how to get them to work in CuPy).Priorities for the rest of the year.
Very roughly, the current status of the scipy special ufunc codebase is:
kolmogorov
distribution which depend ondouble double
arithmetic, and the real valued implementation ofhyp2f1
, which uses recursion), but we haven't tested all of it.gdtrib
has been made GPU capable 13. I've translated much of cdflib's internal code (for functions that get inverted with the root finding tools), but have found that cdflib's implementations tend to be worse than the implementations in Boost and cephes, so none of this work translating cdflib has found it's way into SciPy.lambertw
,wright_bessel
,digamma
,loggamma
,hyp2f1
,sici
,sinpi
, andcospi
have been translated from Cython to C++, but a handful remain, the most complicated beingorthogonal_eval.pxd
for orthongal polynomial evaluation.ellint_carlson
are written in C++, but we have not moved them into headers in xsf or made them GPU capable.Through the end of the year I have funding for 287.89 hours of work from the 2020 NASA ROSES grant. There are 14 weeks remaining in the year, which means I am funded to spend 20.5 hours per week for the rest of the year. All of the work above should
For the remainder of the year, I'd like to prioritize the following things 14:
special
, but weren't able to due to the burden of working with, and maintainers needing to review updates to the messy, at the time mostly Fortran,special
codebase. I think there's a lot of potential to find people who'd like to help out, as long as it's straightforward enough for them to get started.xsf
, which could help a prevent a review bottleneck from slowing our development pace and discouraging contribution.Request for Comment
Does anyone have any objections to splitting the
scipy.special
scalar kernel codebase into a separate repo, xsf, which is managed under the SciPy organization? Does anyone anticipate any technical or non-technical difficulties that I may be unaware of? One thing I think is really important to get right is testing and CI, so I plan to make a separate RFC this week just to discuss these issues. The main thing is that requiring a full SciPy build for xsf CI for each platform would be very time consuming, but some things likecython_special
can only be tested in SciPy. Another thing that needs discussion is what the xsf release cycle would look like, and the process for updating xsf in SciPy.@izaid has agreed to be a maintainer of xsf, and is willing to spend time to review and merge my pull requests. This should allow development of xsf to move faster than special function development in SciPy, where lack of bandwidth from qualified reviewers has lead to a significant bottleneck. Any current SciPy maintainers who are interested in getting involved and want commit rights or triage rights should let me know.
Footnotes
An RFC for such an extension was proposed by SciPy maintainer Matt Haberland (@mdhaber) in December 2023,
https://github.com/data-apis/array-api/issues/725, written with input from Irwin and I. ↩
Shanjie Zhang, Jianming Jin, Computation of Special Functions, Wiley, 1996, ISBN: 0-471-11963-6, LC: QA351.C45. ↩ ↩2
#80NSSC22K0405 - Reinforcing the Foundations of Scientific Python. Awarded jointly to NumPy, pandas, SciPy, and scikit-learn. https://numfocus.medium.com/numfocus-projects-receive-nasa-grants-deee374e7a57 ↩
https://github.com/scipy/scipy/issues/18566 ↩
https://github.com/boostorg/math/pull/1191 ↩
https://github.com/cupy/cupy/issues/8303 ↩
Just days ago, AMD published its own libhipcxx implementation of the C++ standard library, which should allow us to also support AMD GPUs with minimal effort on our end. ↩
NVRTC is more restrictive and supports a smaller subset of the standard library. ↩
There are some special functions for which managing dynamic state is needed for more accurate or efficient computation in some or all parameter regions (e.g. Spheroidal Wave Functions). In these cases, conditional compilation may be needed to swap out implementations based on the platform for which the code is compiled. We may also decide and document that certain functions or parameter regions won't be supported for all platforms. ↩
https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL58-CPP.+Do+not+modify+the+standard+namespaces ↩
https://github.com/scikit-learn/scikit-learn/issues/24491 ↩
https://github.com/scipy/scipy/pull/20390 ↩
https://github.com/scipy/scipy/pull/21454 ↩
It will be difficult, but developing a GPU capabable implementation of Miller's recurrence algorithm to perform argument reduction with a recurrence in cases where the recurrence is only stable in one direction (and not the one we want), is needed for us to implement accurate Bessel and hypergeometric functions which work on the GPU. This is something I've been hoping to work on for a while now, but I don't think it should be priortized over the four items below. ↩
mdhaber began work on the mparray library, which could be used as a home for arbitrary precision reference implementations. ↩
The text was updated successfully, but these errors were encountered: