[QST] more installation methods #6979
@jangorecki the change of terms to conda's service only applies to the
This blog post from conda-forge should clear things up regarding the change in the ToS and conda-forge package hosting: I've also tested using RAPIDS without the `defaults` channel and things worked without issue.
This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.
@kkraus14 are there any actionables on this issue?
If there is indeed no plan for a PPA package and if we don't foresee any future changes w.r.t. PyPI support for RAPIDS, then I'd vote to close this issue.
@vyasr there's active work going on to make handling our dependencies much simpler via CMake refactoring and, hopefully, moving to scikit-build to benefit from the same. Ideally someone cloning our repo, going into the Python directory, and then running That being said, pip packages or a PPA are still out of scope for now. Our first step is making building from source and controlling the dependencies as smooth as possible, and then we'll re-evaluate our options as far as packaging goes.
Even if pip or PPA is not planned for the near future, improving the experience of building from source would be great! Simply cloning the repo and running `python setup.py install` would be a great step in that direction!
This is exactly what we're striving for. We basically have this for the C++ libcudf build, but it isn't quite there yet for cuDF Python. The two main challenges are the automated building of libcudf when running the Python build, and handling the PyArrow CUDA pieces, which aren't currently packaged as pip packages.
Please see https://iscinumpy.gitlab.io/post/scikit-build-proposal/ - would cuDF developers be interested in being listed in that proposal? I could help with the move as part of that work.
@henryiii I'm absolutely interested in this. I've previously integrated scikit-build into another project that I maintain and uncovered some of the issues that were previously blockers for adoption in cudf. @kkraus14 was certainly interested in scikit-build at the time and aware of some of those blockers. @quasiben @shwina would you be interested in our pursuing this further? I'm happy to at least take point on starting the conversation here, since I've built up a moderately complex Python/Cython/C++ scikit-build project from scratch before.
I never attempted to build cudf with scikit-build, but I did attempt to build RMM. The PR eventually stalled here: rapidsai/rmm#637. IIRC the main blocker was not being able to ship docstrings in a release build of the Cython, which has since been fixed, but more generally, the level of effort and time it took to get a relatively small fix that @bdice contributed merged into scikit-build left me feeling discouraged about using scikit-build going forward.
If the proposal were accepted, I should be able to work on the project, rather than just squeezing in small amounts of time in between all my other projects (scikit-hep, pybind11, cibuildwheel, build, etc.). We'd also have some paid time from Kitware. Renovating it with better testing, avoiding distutils internals, etc. would hopefully smooth things out going forward as well, after the three-year period. The package adoption phase would ideally happen after some of the infrastructure was improved (it's a three-part proposal: the first part is working on the infrastructure, the second is package transition assistance, and the third is tutorials and training).
@kkraus14 I agree; I've had the same concerns about adopting scikit-build after watching that PR stall (big shoutout to @bdice for following up on that while I was swamped with thesis work; he took my Slack messages and converted them into a workable issue/PR and followed up for a whole year!). I think what @henryiii is proposing for now is very low investment, basically just our expressing interest as a way to raise the profile of his proposal and perhaps providing some feedback as he gets to work. We would be able to track this effort over a sufficiently long period of time before we committed to any switch. We could revisit the discussion of actually switching sometime in that 1-3 year time frame, depending on how this progresses. IIRC there were other issues that I identified while implementing skbuild for our project; for instance, coverage metrics also don't work because of the way that skbuild sets up build paths, and we could be involved at the level of identifying such issues during early development. Thoughts?
Sounds good to me. I've also sent the scikit-build proposal to some of my colleagues who help maintain PyArrow to see if there's interest on that front as well.
cc @pitrou who expressed some interest from the PyArrow side
Just to reiterate basically what @vyasr has already said, mostly for @pitrou's sake: I'm preparing a proposal that will enable scikit-build's development (which is currently minimal due to lack of funding) to move dramatically forward over the next three years. I am working on a set of science drivers that cover current libraries that are interested in adopting this (if it is successful, obviously); the funding agency is interested in funding projects with an expressed need from the scientific community, not "if I build it they will come" type projects. I think a rewritten, modern scikit-build would be huge for the scientific community, and it fills a gaping hole that's just going to get worse when Python 3.12 removes distutils. (And scikit-build, like any 2014-era project, needs this rewrite to get off of distutils internals itself!) I need a small science driver description (what this would bring to or enable in your project), which I can write, but input is helpful, and a letter of collaboration that states you'd be interested in the project if the proposal was accepted.
@henryiii Thanks. It's not obvious to me who the proposal is aimed at. Are you looking for funding and/or institutional support? Regardless, here is a quick summary about PyArrow:
For us, I think the main interest would be to largely simplify our specialized code in
@pitrou Did I not respond? Sorry, I was traveling for Thanksgiving and thought I had responded before. I'm only looking for letters of collaboration to support the proposal; the funding would come from the NSF if accepted. I am looking for three things: a science driver statement explaining the impact on science that this collaboration will have, a description of how we will collaborate for the Facilities, Equipment, and Other Resources document, and a letter (template here) just saying you'd be happy to work with the PI (me) if the project was accepted. For most projects, I'd expect the second point will basically be that you will work with me to develop a scikit-build based build for your project (I would avoid promising you will use it, just that you will try it; that way you can decide to adopt it only if it's superior, which Kitware and I will try to make sure is true). I can write it and then show it to you, or you can supply it. It's usually 1-2 sentences. @vyasr could you provide a letter for cuDF, then? @pitrou for PyArrow? Also, any suggestions on things to include in the science drivers or pointers to useful publications to cite would be helpful. @pitrou has provided a fantastic start for PyArrow. I'll get back to you when I have that section polished off.
@henryiii sure, I can probably write that up pending approval from @quasiben. We'll probably want to touch base on the contents a little bit, since packages like Arrow and RAPIDS are sufficiently removed from direct impacts on science that I'm not sure what the appropriate level of discussion is in our letter.
Also cc @jorisvandenbossche for PyArrow
@dantegd @BradReesWork would this be of interest to cuML or cuGraph?
I'd be happy to get a couple more letters if cuML or cuGraph is interested. It's just this template and then 1-3 sentences like:

* Letter from **Henry Schreiner, boost-histogram (Princeton University)**:
  Dr. Schreiner will work with the PI to integrate scikit-build into boost-histogram.
  This will remove custom code and enable a new CMake configuration for downstream
  users to integrate boost-histogram into CMake and scikit-build code.

(I'm not actually writing a letter for myself, just making up an example.) I'll also include a bit in the Science drivers, so suggestions there are welcome, such as references, maybe an image, etc.
Can I get an idea of which projects (cuDF, PyArrow, cuML, cuGraph) and who will be able to supply a letter? I need to finalize the project description very soon. The letters themselves can come early next week, though the sooner the better for my sanity. :) FYI, I've collected about half of my expected letters at this point, and several collaborators will be supplying letters early next week too. But it still makes me nervous, especially when the topic is part of the science drivers and would have to be removed if no letter was supplied. The "Facilities, Equipment, or Other Resources" document is easy to modify even next week if needed.
I can write up a letter for cuDF. I may not get it to you today, but I should be able to have it done by Monday. I could also provide one for RMM if @harrism approves, since that transition was already attempted. I'll ping him to ask.
That's okay, I'd just like to know what I can count on, and letters can be provided early next week. Thanks! That leaves @pitrou or @jorisvandenbossche for PyArrow; are you planning on one too?
Sorry, we were not aware that you had such a tight deadline :-(. Arrow is a community project under Apache governance, and I doubt that we can sign a letter on behalf of the project without at least some discussion on our side (and perhaps a vote). Realistically, it would take at least three weeks (no idea if more, but unlikely to be less). So feel free to proceed without a letter from us if you have enough letters already.
Well, the original post was from October and we started discussing PyArrow two weeks ago, so it was originally not that tight, though it was slightly less than three weeks from when PyArrow was first involved.
I've been told that the best chance of funding is if I can prove this is not an "if we build it, they will come" type project, but rather one addressing a real and pressing need in the community (which I strongly believe it is), and the key way to show that is to provide lots of letters of collaboration. Of all the 50+ pages in the proposal, the collaboration letters show that best.
This is fine, and absolutely up to you, but just to be clear (for everyone else as well), I can't get a letter from a collaboration anyway. It's a letter from an individual that says "If the proposal submitted by Dr. Henry Schreiner entitled “Elements: Simplifying Compiled Python Packaging in the Sciences” is selected for funding by NSF, it is my intent to collaborate and/or commit resources as detailed in the Project Description or the Facilities, Equipment or Other Resources section of the proposal." No other text is even allowed. The Facilities, Equipment or Other Resources section can have 1-3 sentences, such as "Dr. So-and-so will work with PI Schreiner to evaluate using scikit-build as the build system for such-and-such project. This will enable blah-blah that would enable us to blah blah", something like that, but really anything you want. It's not a promise that a project will use scikit-build, but rather that a person would be willing to work with me on it, for example to try it out and compare it. I can even get letters not tied to a project at all (such as for getting scikit-build links on websites).
PS, unrelated, but Kitware will be helping us add a mechanism to CMake to discover configuration files stored in Python packages, and pybind11, NumPy, and SciPy will be using that to distribute configuration. This might be useful for the 1-3 sentence part: it would make "extendable" (at the compiled level) Python packages much easier.
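The mechanism mentioned above was still being designed at the time of this comment, so the following is only a minimal sketch of the general idea, not Kitware's or scikit-build's actual design. It assumes a hypothetical convention in which a Python package bundles its CMake config files under `<package_dir>/share/cmake/<Name>/`; the helper names `cmake_prefixes` and `prefix_path_arg` are likewise hypothetical. The sketch locates such packages with `importlib.util.find_spec` and formats their directories as a `CMAKE_PREFIX_PATH` definition.

```python
import importlib.util
from pathlib import Path


def cmake_prefixes(package_names, marker="share/cmake"):
    """Return directories of installed packages that ship CMake config files.

    Assumes a hypothetical layout where a package bundles its config as
    <package_dir>/share/cmake/<Name>/<Name>Config.cmake.
    """
    prefixes = []
    for name in package_names:
        spec = importlib.util.find_spec(name)
        # Skip names that are not installed or are single-file modules.
        if spec is None or spec.submodule_search_locations is None:
            continue
        for location in spec.submodule_search_locations:
            pkg_dir = Path(location)
            if (pkg_dir / marker).is_dir():
                prefixes.append(pkg_dir)
    return prefixes


def prefix_path_arg(prefixes):
    """Format a CMake command-line definition; CMake lists are ';'-separated."""
    return "-DCMAKE_PREFIX_PATH=" + ";".join(str(p) for p in prefixes)
```

Treating each package directory as a prefix works because CMake's `find_package` searches `<prefix>/share/cmake/<name>*/` among its standard config-file locations, so a downstream build could find the bundled config without any package-specific path handling.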
@henryiii Ok, it certainly works if I sign it personally and/or on behalf of my employer. Would you have a ready-to-use template? (Really, I'm not accustomed at all to such boilerplate :-))
Yes, this one. You can send it to henryfs @ princeton.edu. The letter must contain exactly the statement it already does, and nothing else (the guide is from https://www.nsf.gov/pubs/2021/nsf21617/nsf21617.htm, under Letters of Collaboration). And then there are 1-3 sentences in the facilities document that can say anything you want; I'd put: Antoine Pitrou will work with us on PyArrow to try a scikit-build build system. This would (Feel free to edit or replace! Assuming https://arrow.apache.org/docs/python/extending.html would be really helped by the configuration plan)
Does it need a handwritten signature?
This looks good to me.
It can be a digitally added handwritten signature. It doesn't need to be printed.
And we are finalizing this in the next 2-3 hours. ;)
Ok, sent it.
Just to update here, #10919 makes installation from source much easier now. No need to build C++ separately, just
We now have experimental pip wheels available. Between that and the relative ease of installation from source after the scikit-build conversion (although still slow, installation of the Python package from source is now a single, fairly robust command), I'm going to close this issue as completed.
I would like to request support for more installation methods.
I don't use Docker, so according to https://rapids.ai/start.html I have two other methods left: conda and building from source.
I would prefer to avoid conda due to September's change to their terms of use.
So I am left with building from source, which is no fun.
It would be great if there were other methods, for example:
- virtualenv (virtualenv works pretty well for me so far)
- pip: in the past there was pip support, but it was dropped, so it is probably still not an option.
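For readers unfamiliar with the virtualenv workflow requested here, the sketch below shows the environment-creation step using Python's standard-library `venv` module (standing in for the third-party `virtualenv` package; the helper name `make_env` and the environment name `rapids-env` are hypothetical). It only creates an isolated interpreter; whether RAPIDS packages could then be pip-installed into it is exactly what this issue asks about.

```python
import os
import tempfile
import venv
from pathlib import Path


def make_env(env_dir: Path) -> Path:
    """Create an isolated environment and return its Python interpreter path."""
    # with_pip=False keeps this fast and offline; pass True to bootstrap pip.
    # Symlinking the interpreter is the usual choice outside Windows.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        python = make_env(Path(tmp) / "rapids-env")
        print(python, python.exists())
```

The returned interpreter would then run `python -m pip install ...` inside the environment, keeping those packages separate from the system Python and from any conda installation, which is the isolation the request relies on.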