[QST] more installation methods #6979

Closed
jangorecki opened this issue Dec 11, 2020 · 36 comments
Labels
doc Documentation question Further information is requested

Comments

@jangorecki

jangorecki commented Dec 11, 2020

I would like to request support for more installation methods.
I don't use docker, so according to https://rapids.ai/start.html I have two other methods left: conda and build from source.
I would prefer to avoid conda due to September's change to their terms of use.
So I am left with building from source, which is no fun.

It would be great if there were other methods, for example:

  • installing into virtualenv (virtualenv works pretty well for me so far)
  • installing from PPA apt packages

In the past there was pip but support for it was dropped, so probably it is still not an option.

@jangorecki jangorecki added Needs Triage Need team to review and classify question Further information is requested labels Dec 11, 2020
@kkraus14 kkraus14 added conda doc Documentation and removed Needs Triage Need team to review and classify labels Dec 11, 2020
@kkraus14
Collaborator

@jangorecki the change to conda's terms of service only applies to the defaults channel in conda; it does not apply to the conda-forge or rapidsai channels. We'll be confirming and removing the defaults channel from our installation instructions in the near future as we don't depend on anything in defaults directly.

@quasiben
Member

This blogpost from conda-forge should clear things up regarding the change in the ToS and conda-forge package hosting:
https://conda-forge.org/blog/posts/2020-11-20-anaconda-tos/

I've also tested using rapids without defaults and things worked without issue

@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@github-actions github-actions bot added the stale label Feb 16, 2021
@vyasr
Contributor

vyasr commented Mar 24, 2021

@kkraus14 are there any actionables on this issue?

  • Installing into virtualenv or venv is nothing special in Python. Once you create the environment, installation proceeds through all the normal paths (pip, conda if the env is created inside a conda environment, source).
  • A package published via an unofficial PPA would be nice, but I can't imagine that's anywhere on the radar.
  • Installation via pip would be fantastic, but I assume it was removed because it's a huge headache to maintain working wheels. I've had plenty of trouble getting wheels for projects with extension modules. We could look into cibuildwheel, which seems really nice. We're currently testing it out for some other projects that I work on, and I can also report back on how successful that process is if there's interest, but I don't know if there is any.

If there is indeed no plan for a PPA package and if we don't foresee any future changes w.r.t. PyPI support for RAPIDS, then I'd vote to close this issue.
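
As a rough illustration of the virtualenv point above, here is a minimal sketch using only the standard-library `venv` module. The helper names `env_python` and `pip_install_command` and the `demo-env` directory are hypothetical, chosen for this example. The sketch creates the environment and only builds the pip command bound to that environment's interpreter; it does not run it.

```python
import os
import tempfile
import venv
from pathlib import Path


def env_python(env_dir):
    """Return the interpreter path inside a venv (the layout differs on Windows)."""
    if os.name == "nt":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def pip_install_command(env_dir, target):
    """Build (but do not run) a pip install command that targets the venv."""
    return [str(env_python(env_dir)), "-m", "pip", "install", target]


# Create a throwaway environment. with_pip=False keeps this sketch offline;
# pass with_pip=True to bootstrap pip for a real installation.
env_dir = Path(tempfile.mkdtemp()) / "demo-env"
venv.create(env_dir, with_pip=False)

# "." stands for a source checkout; a package name would work the same way.
cmd = pip_install_command(env_dir, ".")
print(cmd)
```

Running the resulting command (for example with `subprocess.run(cmd, check=True)`) installs into the environment, not into the system Python, which is all that "installing into a virtualenv" amounts to.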

@kkraus14
Collaborator

@vyasr there's active work going on to make handling our dependencies much simpler via CMake refactoring, and hopefully moving to scikit-build to benefit from the same. Ideally someone cloning our repo, going into the Python directory, and then running pip install . or python setup.py install, should "just work".

That being said, pip packages or a PPA are still out of scope for now. Our first step is making building from source and controlling the dependencies as smooth as possible, and then we'll re-evaluate our options as far as packaging goes.

@Atharex

Atharex commented Apr 20, 2021

Even if pip or PPA is not planned for the near future, improving the experience of building from source would be great!
Also when building from source, it would be nice if we could reuse any existing CUDA installations and not duplicate libraries with separate CUDA installations over conda.

Simply cloning the repo and running python setup.py install would be a great approach in that direction!

@kkraus14
Collaborator

Simply cloning the repo and running python setup.py install would be a great approach in that direction!

This is exactly what we're striving for. We basically have this for the C++ libcudf build, but it isn't quite there yet for cuDF Python. The two main challenges are related to automated building of libcudf when running the Python build, and handling the PyArrow CUDA pieces, which aren't packaged as pip packages currently.

@henryiii

there's active work going on in making handling our dependency much simpler via CMake refactoring and hopefully moving to scikit-build to benefit from the same

Please see https://iscinumpy.gitlab.io/post/scikit-build-proposal/ - would cuDF developers be interested in being listed in that proposal? I could help with the move as part of that work.

@vyasr
Contributor

vyasr commented Nov 19, 2021

@henryiii I'm absolutely interested in this. I've integrated scikit-build into another project that I maintain before and uncovered some of the issues that were previously blockers for adoption in cudf. @kkraus14 was certainly interested in scikit-build at the time and aware of some of those blockers. @quasiben @shwina would you be interested in our pursuing this further? I'm happy to at least take point on starting the conversation here since I've built up a moderately complex Python/Cython/C++ scikit-build project from scratch before.

@shwina
Contributor

shwina commented Nov 19, 2021

@henryiii Absolutely we would be interested! @vyasr thanks for offering to take point on this :-)

@kkraus14
Collaborator

I never attempted to build cudf with scikit-build, but I did attempt to build RMM. The PR eventually stalled here: rapidsai/rmm#637.

IIRC the main blocker was not being able to ship docstrings in a release build of the Cython, which has since been fixed. More generally, though, the level of effort and time it took to get a relatively small fix that @bdice contributed merged into scikit-build left me feeling discouraged about using scikit-build moving forward.

@henryiii

If the proposal was accepted, I should be able to work on the project, rather than just squeezing in small amounts of time in between all my other projects (scikit-hep, pybind11, cibuildwheel, build, etc). We'd also have some paid time from KitWare too.

Renovating it with better testing, avoiding distutils internals, etc. would hopefully smooth things out going forward as well, after the three-year period. The package adoption phase would ideally happen after some of the infrastructure was improved (it's a three-part proposal: the first part is working on the infrastructure, the second is package transition assistance, and the third is tutorials and training).

@vyasr
Contributor

vyasr commented Nov 19, 2021

@kkraus14 I agree, I've had the same concerns about adopting scikit-build after watching that PR stall (big shoutout to @bdice for following up on that while I was swamped with thesis work, he took my Slack messages and converted them into a workable issue/PR and followed up for a whole year!). I think what @henryiii is proposing for now is very low investment, basically just our expressing interest as a way to raise the profile of his proposal and perhaps providing some feedback as he gets to work. We would be able to track this effort over a sufficiently long period of time before we committed to any switch. We could revisit the discussion of actually switching sometime in that 1-3 year time frame depending on how this progresses. IIRC there were other issues that I identified while implementing skbuild for our project, for instance coverage metrics also don't work because of the way that skbuild sets up build paths, and we could be involved at the level of identifying such issues during early development. Thoughts?

@kkraus14
Collaborator

Thoughts?

Sounds good to me. I've also sent the scikit-build proposal to some of my colleagues who help to maintain PyArrow to see if there's interest on that front as well.

@kkraus14
Collaborator

cc @pitrou who expressed some interest from the PyArrow side

@henryiii

Just to reiterate basically what @vyasr has already said, mostly for @pitrou's sake:

I'm preparing a proposal that will enable scikit-build's development (which is currently minimal due to lack of funding) to move dramatically forward over the next three years. I am working on a set of science drivers that cover current libraries that are interested in adopting this (if it is successful, obviously); funders are interested in projects with an expressed need from the scientific community, not "if I build it they will come" type projects. I think a rewritten modern scikit-build would be huge for the scientific community, and would fill a gaping hole that's just going to get worse when Python 3.12 removes distutils. (And scikit-build, like any 2014-era project, needs this rewrite to get off of distutils internals itself!)

I need a small science driver description (what would this bring to or enable in your project) which I can write, but input is helpful, and a letter of collaboration that states you'd be interested in the project if the proposal was accepted.

@pitrou

pitrou commented Nov 21, 2021

@henryiii Thanks. It's not obvious to me who the proposal is aimed at. Are you looking for funding and/or institutional support?

Regardless, here is a quick summary about PyArrow:

  • Apache Arrow is a platform- and runtime-agnostic in-memory format for columnar data analytics; it has implementations in C++, Rust, Java, Python, R, C#, Javascript... and is getting traction as a zero-copy interchange format
  • PyArrow is the Python bindings to the library Arrow C++; it is a mix of custom C++ code (not using pybind11, FWIW), Cython code and pure-Python code, tied together using an unwieldy setup.py
  • PyArrow is a dependency of a non-trivial number of Python packages
  • PyArrow is an optional dependency of Pandas, for example for Parquet IO or an optimized string data type; tighter integration may happen in the future (cc @jorisvandenbossche in case he wants to elaborate)

For us, I think the main interest would be to largely simplify our specialized code in setup.py and perhaps CMakeLists.txt. There are some small complications that I can elaborate on later, but I don't think they should be a big deal.

@henryiii

henryiii commented Nov 30, 2021

@pitrou Did I not respond? Sorry, I was traveling for Thanksgiving and thought I responded before.

I'm only looking for letters of collaboration to support the proposal; the funding would come from the NSF if accepted. I am looking for three things: A science driver statement explaining the impact on science that this collaboration will have, a description of how we will collaborate for the Facilities, Equipment, and Other Resources document, and a letter (template here) just saying you'd be happy to work with the PI (me) if the project was accepted. For most projects, I'd expect the second point will basically be that you will work with me to develop a scikit-build based build for your project (I would avoid promising you will use it, just that you will try it, that way you can decide to adopt it only if it's superior, which KitWare and I will try to make sure is true). I can write it then show it to you, or you can supply it. It's usually 1-2 sentences.

@vyasr could you provide a letter for cuDF, then? @pitrou for PyArrow? Also any suggestions on things to include in the science drivers or pointers to useful publications to cite would be helpful. @pitrou has provided a fantastic start for PyArrow. I'll get back to you when I have that section polished off.

@vyasr
Contributor

vyasr commented Nov 30, 2021

@henryiii sure I can probably write that up pending approval from @quasiben. We'll probably want to touch base on the contents a little bit since packages like Arrow and RAPIDS are sufficiently removed from direct impacts on science that I'm not sure what the appropriate level of discussion is in our letter.

@pitrou

pitrou commented Dec 1, 2021

Also cc @jorisvandenbossche for PyArrow

@beckernick
Member

@dantegd @BradReesWork would this be of interest to cuML or cuGraph?

@henryiii

henryiii commented Dec 1, 2021

I'd be happy to get a couple more letters if cuML or cuGraph is interested. It's just this template and then 1-3 sentences like:

* Letter from **Henry Schreiner, boost-histogram (Princeton University)**:

  Dr. Schreiner will work with the PI to integrate scikit-build into boost-histogram. 
  This will remove custom code and enable a new CMake configuration for downstream
  users to integrate boost-histogram into CMake and scikit-build code.

(I'm not actually writing a letter for myself, just making up an example.)

I'll also include a bit in the Science drivers, so suggestions there are welcome, such as references, maybe an image, etc.

@henryiii

henryiii commented Dec 3, 2021

Can I get an idea of which projects (cuDF, PyArrow, cuML, cuGraph) and who will be able to supply a letter? I need to finalize the project description very soon. The letters themselves can come early next week, though the sooner the better for my sanity. :) FYI, I've collected about half of my expected letters at this point, and several collaborators will be supplying letters early next week too. But it still makes me nervous, especially when the topic is part of the science drivers and would have to be removed if no letter was supplied. The "Facilities, Equipment, or Other Resources" document is easy to modify even next week if needed.

@vyasr
Contributor

vyasr commented Dec 3, 2021

I can write up a letter for cuDF. I may not get it to you today, but I should be able to have it done by Monday. I could also provide one for RMM if @harrism approves since that transition was already attempted. I'll ping him to ask.

@henryiii

henryiii commented Dec 3, 2021

That's okay, I'd just like to know what I can count on, and letters can be provided early next week. Thanks!

That leaves @pitrou or @jorisvandenbossche for PyArrow, are you planning on one too?

@pitrou

pitrou commented Dec 5, 2021

Sorry, we were not aware that you had such a tight deadline :-(. Arrow is a community project under Apache governance and I doubt that we can sign a letter on behalf of the project without at least some discussion on our side (and perhaps a vote). Realistically, it would be at least three weeks (no idea if more, but unlikely to be less). So feel free to proceed without a letter from us if you have enough letters already.

@henryiii

henryiii commented Dec 5, 2021

such a tight deadline

Well, the original post was from October and we started discussing PyArrow two weeks ago, so it was originally not that tight, though slightly less than three weeks from when PyArrow was first involved.

you have enough letters already

I've been told that the best chance of funding is if I can prove this is not a "if we build it, they will come" type project, but rather one addressing a real and pressing need in the community (which I strongly believe it is), and the key way to show that is to provide lots of letters of collaboration. Of all the 50+ pages in the proposal, the collaboration letters show that the best.

sign a letter on behalf of the project

This is fine, and absolutely up to you, but just to be clear (for everyone else, as well), I can't get a letter from a collaboration anyway. It's a letter from an individual that says "If the proposal submitted by Dr. Henry Schreiner entitled “Elements: Simplifying Compiled Python Packaging in the Sciences” is selected for funding by NSF, it is my intent to collaborate and/or commit resources as detailed in the Project Description or the Facilities, Equipment or Other Resources section of the proposal.". No other text is even allowed. The Facilities, Equipment or Other Resources section can have 1-3 sentences, such as "Dr. So-and-so will work with PI Schreiner to evaluate using scikit-build as the build system for such-and-such project. This will enable blah-blah that would enable us to blah blah", something like that, but really anything you want. It's not a promise that a project will use scikit-build, but rather that a person would be willing to work with me on it, for example to be attempted and compared. I can even get letters not tied to a project at all (such as for getting scikit-build links on websites).

@henryiii

henryiii commented Dec 5, 2021

PS, unrelated, but Kitware will be helping us add a mechanism to CMake to discover configuration files stored in Python packages, and pybind11, NumPy, and SciPy will be using that to distribute configuration. This might be useful for the 1-3 sentence part - this would allow "extendable" (at the compiled level) Python packages to be much easier.

@pitrou

pitrou commented Dec 7, 2021

@henryiii Ok, it certainly works if I sign it personally and/or on behalf of my employer, would you have a ready-to-use template? (really I'm not accustomed at all to such boilerplate :-))

@henryiii

henryiii commented Dec 7, 2021

Yes, this one. You can send it to henryfs @ princeton.edu. The letter must contain exactly the statement it already does, and nothing else (the guide is from https://www.nsf.gov/pubs/2021/nsf21617/nsf21617.htm, under Letters of Collaboration).

And then there's 1-3 sentences in the facilities document that says anything you want, I'd put:

Antoine Pitrou will work with us on PyArrow to try a scikit-build build system. This would
simplify the developer experience and encourage new contributions. We will also work to provide a CMake configuration to allow external compiled modules to be much easier to create.

(Feel free to edit or replace! Assuming https://arrow.apache.org/docs/python/extending.html would be really helped by the configuration plan)

@pitrou

pitrou commented Dec 8, 2021

Yes, this one. You can send it to henryfs @ princeton.edu.

Does it need a hand-written signature?

Antoine Pitrou will work with us on PyArrow to try a scikit-build build system. This would simplify the developer experience and encourage new contributions. We will also work to provide a CMake configuration to allow external compiled modules to be much easier to create.

This looks good to me.

@henryiii

henryiii commented Dec 8, 2021

It can be a digitally added hand written signature. It doesn't need to be printed.

@henryiii

henryiii commented Dec 8, 2021

And we are finalizing this in the next 2-3 hours. ;)

@pitrou

pitrou commented Dec 8, 2021

Ok, sent it.

@vyasr
Contributor

vyasr commented Jul 12, 2022

Just to update here, #10919 makes installation from source much easier now. No need to build C++ separately, just pip install . should do the trick (it will take a very long time, though, since our C++ libraries are huge and can take >1 hour to build on a typical laptop).
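
For readers who want to script the source install described above, here is a hedged sketch. The function name `source_install`, the checkout path `cudf/python/cudf`, and the `jobs` parameter are assumptions made for illustration and are not taken from the project's documentation. `CMAKE_BUILD_PARALLEL_LEVEL` is a real CMake environment variable, but whether it speeds up this particular build is also an assumption. By default the function is a dry run and only returns what it would execute.

```python
import os
import subprocess
import sys
from pathlib import Path


def source_install(package_dir, jobs=None, dry_run=True):
    """Run (or, by default, only describe) `pip install .` in a source checkout.

    package_dir is a hypothetical path to the Python package directory.
    """
    cmd = [sys.executable, "-m", "pip", "install", "-v", "."]
    env = dict(os.environ)
    if jobs is not None:
        # CMake reads this variable for `cmake --build`; assumed here to reach
        # the underlying C++ build.
        env["CMAKE_BUILD_PARALLEL_LEVEL"] = str(jobs)
    if not dry_run:
        # Expect this to take a long time: the C++ libraries are built too.
        subprocess.run(cmd, cwd=str(package_dir), env=env, check=True)
    return cmd, env.get("CMAKE_BUILD_PARALLEL_LEVEL")


# Dry run: nothing is installed, the command is only assembled.
cmd, parallel = source_install(Path("cudf") / "python" / "cudf", jobs=4)
print(" ".join(cmd[1:]), "| parallel level:", parallel)
```

Calling `source_install(..., dry_run=False)` from inside an activated environment would perform the actual build and install.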

@vyasr
Contributor

vyasr commented Oct 20, 2022

We now have experimental pip wheels available. Between that and the relative ease of installation from source after the scikit-build conversion (although still slow, installation of the Python package from source is now a single, fairly robust command), I'm going to close this issue as completed.

@vyasr vyasr closed this as completed Oct 20, 2022

9 participants