[QST] more installation methods #6979

jangorecki · 2020-12-11T09:58:10Z

I would like to request to support more installation methods.
I don't use Docker, so according to https://rapids.ai/start.html I have two other methods left: conda and building from source.
I would prefer to avoid conda due to September's change to Anaconda's terms of use.
So I am left with building from source, which is no fun.

It would be great if there were other methods, for example:

- installing into a virtualenv (virtualenv works pretty well for me so far)
- installing from PPA apt packages

In the past there were pip packages, but support for them was dropped, so that is probably still not an option.


kkraus14 · 2020-12-11T16:18:48Z

@jangorecki the change to Anaconda's terms of service only applies to the defaults channel in conda; it does not apply to the conda-forge or rapidsai channels. We'll be confirming and removing the defaults channel from our installation instructions in the near future, as we don't depend on anything in defaults directly.

quasiben · 2020-12-13T03:28:25Z

This blogpost from conda-forge should clear things up regarding the change in the ToS and conda-forge package hosting:
https://conda-forge.org/blog/posts/2020-11-20-anaconda-tos/

I've also tested using RAPIDS without the defaults channel, and things worked without issue.

github-actions · 2021-02-16T20:20:06Z

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

vyasr · 2021-03-24T17:31:13Z

@kkraus14 are there any actionables on this issue?

Installing into a virtualenv or venv is nothing special in Python: once you create the environment, installation proceeds through all the normal paths (pip, conda if the env is created inside a conda environment, or source).
A package published via an unofficial PPA would be nice, but I can't imagine that's anywhere on the radar.
Installation via pip would be fantastic, but I assume it was removed because it's a huge headache to maintain working wheels. I've had plenty of trouble getting wheels for projects with extension modules. We could look into cibuildwheel, which seems really nice. We're currently testing it out for some other projects that I work on, and I can also report back on how successful that process is if there's interest, but I don't know if there is any.
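To illustrate the venv point: a minimal sketch using only the standard-library `venv` module, which creates an isolated environment and then runs that environment's own pip. The `.venv` directory name and the install target `.` (a local source checkout) are illustrative assumptions; this is not an official RAPIDS installation procedure.

```python
import subprocess
import sys
import venv
from pathlib import Path
from typing import List


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a venv (the layout differs on Windows)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create (or recreate) a virtual environment and return its interpreter."""
    builder = venv.EnvBuilder(
        with_pip=with_pip,                   # bootstrap pip via ensurepip
        clear=True,                          # wipe any existing env at env_dir
        symlinks=(sys.platform != "win32"),  # symlink the base interpreter on POSIX
    )
    builder.create(env_dir)
    return venv_python(env_dir)


def pip_install_cmd(python: Path, *targets: str) -> List[str]:
    """Build a pip command that runs inside the environment owning `python`."""
    # "python -m pip" guarantees the env's own pip is used, not one found on PATH.
    return [str(python), "-m", "pip", "install", *targets]
```

For example, `subprocess.run(pip_install_cmd(create_env(Path(".venv")), "."), check=True)` would create `.venv` and install the package in the current directory into it. Note that `with_pip=True` relies on `ensurepip`, which some Linux distributions package separately.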

If there is indeed no plan for a PPA package and if we don't foresee any future changes w.r.t. PyPI support for RAPIDS, then I'd vote to close this issue.

kkraus14 · 2021-03-24T17:37:11Z

@vyasr there's active work going on to make handling our dependencies much simpler via CMake refactoring, and hopefully moving to scikit-build to benefit from the same. Ideally, someone cloning our repo, going into the Python directory, and then running pip install . or python setup.py install should "just work".

That being said, pip packages or a PPA is still out of scope for now. Our first step is making building from source and controlling the dependencies as smooth as possible and then we'll re-evaluate our options as far as packaging goes.

Atharex · 2021-04-20T15:43:25Z

Even if pip or PPA is not planned for the near future, improving the experience of building from source would be great!
Also when building from source, it would be nice if we could reuse any existing CUDA installations and not duplicate libraries with separate CUDA installations over conda.

Simply cloning the repo and running python setup.py install would be a great approach in that direction!

kkraus14 · 2021-04-20T21:20:09Z

> Simply cloning the repo and running python setup.py install would be a great approach in that direction!

This is exactly what we're striving for. We basically have this for the C++ libcudf build, but it isn't quite there yet for cuDF Python. The two main challenges are automatically building libcudf when running the Python build, and handling the PyArrow CUDA pieces, which aren't currently available as pip packages.

henryiii · 2021-11-19T18:59:01Z

> there's active work going on in making handling our dependency much simpler via CMake refactoring and hopefully moving to scikit-build to benefit from the same

Please see https://iscinumpy.gitlab.io/post/scikit-build-proposal/ - would cuDF developers be interested in being listed in that proposal? I could help with the move as part of that work.

vyasr · 2021-11-19T20:02:19Z

@henryiii I'm absolutely interested in this. I've previously integrated scikit-build into another project that I maintain and uncovered some of the issues that were blockers for adoption in cudf. @kkraus14 was certainly interested in scikit-build at the time and aware of some of those blockers. @quasiben @shwina would you be interested in our pursuing this further? I'm happy to at least take point on starting the conversation here, since I've built up a moderately complex Python/Cython/C++ scikit-build project from scratch before.

shwina · 2021-11-19T21:00:31Z

@henryiii Absolutely we would be interested! @vyasr thanks for offering to take point on this :-)

kkraus14 · 2021-11-19T21:47:20Z

I never attempted to build cudf with scikit-build, but I did attempt to build RMM. The PR eventually stalled here: rapidsai/rmm#637.

IIRC the main blocker was not being able to ship docstrings in a release build of the Cython code, which has since been fixed, but more generally the level of effort and time it took to get a relatively small fix that @bdice contributed merged into scikit-build left me feeling discouraged about using scikit-build moving forward.

henryiii · 2021-11-19T21:55:06Z

If the proposal was accepted, I should be able to work on the project, rather than just squeezing in small amounts of time in between all my other projects (scikit-hep, pybind11, cibuildwheel, build, etc). We'd also have some paid time from Kitware.

Renovating it with better testing, avoiding distutils internals, etc. would hopefully smooth things out going forward as well, after the three-year period. The package adoption phase would ideally happen after some of the infrastructure was improved (it's a three-part proposal: first, working on the infrastructure; second, package transition assistance; third, tutorials and training).

vyasr · 2021-11-19T22:23:53Z

@kkraus14 I agree, I've had the same concerns about adopting scikit-build after watching that PR stall (big shoutout to @bdice for following up on that while I was swamped with thesis work; he took my Slack messages, converted them into a workable issue/PR, and followed up for a whole year!). I think what @henryiii is proposing for now is very low investment: basically just expressing our interest as a way to raise the profile of his proposal, and perhaps providing some feedback as he gets to work. We would be able to track this effort over a sufficiently long period of time before we committed to any switch, and could revisit the discussion of actually switching sometime in that 1-3 year time frame depending on how this progresses. IIRC there were other issues that I identified while implementing skbuild for our project (for instance, coverage metrics don't work because of the way that skbuild sets up build paths), and we could be involved at the level of identifying such issues during early development. Thoughts?

kkraus14 · 2021-11-19T22:53:38Z

> Thoughts?

Sounds good to me. I've also sent the scikit-build proposal to some of my colleagues who help to maintain PyArrow to see if there's interest on that front as well.

kkraus14 · 2021-11-20T20:44:42Z

cc @pitrou who expressed some interest from the PyArrow side

henryiii · 2021-11-20T21:57:10Z

Just to reiterate basically what @vyasr has already said, mostly for @pitrou's sake:

I'm preparing a proposal that will enable scikit-build's development (which is currently minimal due to lack of funding) to move dramatically forward over the next three years. I am working on a set of science drivers that cover current libraries that are interested in adopting this (if it is successful, obviously); the funding agency is interested in funding projects with an expressed need from the scientific community, not "if I build it they will come" type projects. I think a rewritten, modern scikit-build would be huge for the scientific community and would fill a gaping hole that's just going to get worse when Python 3.12 removes distutils. (And scikit-build, like any 2014-era project, needs this rewrite to get off of distutils internals itself!)

I need a short science-driver description (what this would bring to or enable in your project), which I can write, though input is helpful, and a letter of collaboration stating that you'd be interested in the project if the proposal were accepted.

pitrou · 2021-11-21T16:11:24Z

@henryiii Thanks. It's not obvious to me who the proposal is aimed at. Are you looking for funding and/or institutional support?

Regardless, here is a quick summary about PyArrow:

- Apache Arrow is a platform- and runtime-agnostic in-memory format for columnar data analytics; it has implementations in C++, Rust, Java, Python, R, C#, JavaScript... and is gaining traction as a zero-copy interchange format
- PyArrow is the set of Python bindings to the Arrow C++ library; it is a mix of custom C++ code (not using pybind11, FWIW), Cython code and pure-Python code, tied together using an unwieldy setup.py
- PyArrow is a dependency of a non-trivial number of Python packages
- PyArrow is an optional dependency of Pandas, for example for Parquet IO or an optimized string data type; tighter integration may happen in the future (cc @jorisvandenbossche in case he wants to elaborate)

For us, I think the main interest would be to largely simplify our specialized code in setup.py and perhaps CMakeLists.txt. There are some small complications that I can elaborate on later, but I don't think they should be a big deal.

henryiii · 2021-11-30T03:39:42Z

@pitrou Did I not respond? Sorry, I was traveling for Thanksgiving and thought I responded before.

I'm only looking for letters of collaboration to support the proposal; the funding would come from the NSF if accepted. I am looking for three things: a science driver statement explaining the impact on science that this collaboration will have, a description of how we will collaborate for the Facilities, Equipment, and Other Resources document, and a letter (template here) just saying you'd be happy to work with the PI (me) if the project was accepted. For most projects, I'd expect the second point will basically be that you will work with me to develop a scikit-build based build for your project (I would avoid promising you will use it, just that you will try it; that way you can decide to adopt it only if it's superior, which Kitware and I will try to make sure is true). I can write it and then show it to you, or you can supply it. It's usually 1-2 sentences.

@vyasr could you provide a letter for cuDF, then? @pitrou for PyArrow? Also any suggestions on things to include in the science drivers or pointers to useful publications to cite would be helpful. @pitrou has provided a fantastic start for PyArrow. I'll get back to you when I have that section polished off.

vyasr · 2021-11-30T20:34:17Z

@henryiii sure I can probably write that up pending approval from @quasiben. We'll probably want to touch base on the contents a little bit since packages like Arrow and RAPIDS are sufficiently removed from direct impacts on science that I'm not sure what the appropriate level of discussion is in our letter.

pitrou · 2021-12-01T11:42:47Z

Also cc @jorisvandenbossche for PyArrow

beckernick · 2021-12-01T15:47:59Z

@dantegd @BradReesWork would this be of interest to cuML or cuGraph?

henryiii · 2021-12-01T18:55:06Z

I'd be happy to get a couple more letters if cuML or cuGraph is interested. It's just this template and then 1-3 sentences like:

* Letter from **Henry Schreiner, boost-histogram (Princeton University)**:

  Dr. Schreiner will work with the PI to integrate scikit-build into boost-histogram. 
  This will remove custom code and enable a new CMake configuration for downstream
  users to integrate boost-histogram into CMake and scikit-build code.

(I'm not actually writing a letter for myself, just making up an example.)

I'll also include a bit in the Science drivers, so suggestions there are welcome, such as references, maybe an image, etc.

henryiii · 2021-12-03T20:19:38Z

Can I get an idea of which projects (cuDF, PyArrow, cuML, cuGraph) and who will be able to supply a letter? I need to finalize the project description very soon. The letters themselves can come early next week, though the sooner the better for my sanity. :) FYI, I've collected about half of my expected letters at this point, and several collaborators will be supplying letters early next week too. But it still makes me nervous, especially when a topic is part of the science drivers and would have to be removed if no letter was supplied. The "Facilities, Equipment, or Other Resources" document is easy to modify even next week if needed.

vyasr · 2021-12-03T21:53:40Z

I can write up a letter for cuDF. I may not get it to you today, but I should be able to have it done by Monday. I could also provide one for RMM if @harrism approves since that transition was already attempted. I'll ping him to ask.

henryiii · 2021-12-03T22:04:10Z

That's okay, I'd just like to know what I can count on, and letters can be provided early next week. Thanks!

That leaves @pitrou or @jorisvandenbossche for PyArrow, are you planning on one too?

pitrou · 2021-12-05T16:21:49Z

Sorry, we were not aware that you had such a tight deadline :-(. Arrow is a community project under Apache governance, and I doubt that we can sign a letter on behalf of the project without at least some discussion on our side (and perhaps a vote). Realistically, it would take at least three weeks (no idea if more, but unlikely to be less). So feel free to proceed without a letter from us if you have enough letters already.

henryiii · 2021-12-05T17:32:29Z

> such a tight deadline

Well, the original post was from October and we started discussing PyArrow two weeks ago, so it was originally not that tight, though slightly less than three weeks from when PyArrow was first involved.

> you have enough letters already

I've been told that the best chance of funding is if I can prove this is not an "if we build it, they will come" type project, but rather one addressing a real and pressing need in the community (which I strongly believe it is), and the key way to show that is to provide lots of letters of collaboration. Of all the 50+ pages in the proposal, the collaboration letters show that best.

> sign a letter on behalf of the project

This is fine, and absolutely up to you, but just to be clear (for everyone else as well), I can't get a letter from a collaboration anyway. It's a letter from an individual that says "If the proposal submitted by Dr. Henry Schreiner entitled “Elements: Simplifying Compiled Python Packaging in the Sciences” is selected for funding by NSF, it is my intent to collaborate and/or commit resources as detailed in the Project Description or the Facilities, Equipment or Other Resources section of the proposal." No other text is even allowed. The Facilities, Equipment or Other Resources section can have 1-3 sentences, such as "Dr. So-and-so will work with PI Schreiner to evaluate using scikit-build as the build system for such-and-such project. This will enable blah-blah that would enable us to blah blah", something like that, but really anything you want. It's not a promise that a project will use scikit-build, but rather that a person would be willing to work with me on it, for example by trying it out and comparing it. I can even get letters not tied to a project at all (such as for getting scikit-build links on websites).

henryiii · 2021-12-05T17:41:12Z

PS, unrelated, but Kitware will be helping us add a mechanism to CMake to discover configuration files stored in Python packages, and pybind11, NumPy, and SciPy will be using that to distribute configuration. This might be useful for the 1-3 sentence part: it would make "extendable" (at the compiled level) Python packages much easier to create.

pitrou · 2021-12-07T11:08:56Z

@henryiii Ok, it certainly works if I sign it personally and/or on behalf of my employer, would you have a ready-to-use template? (really I'm not accustomed at all to such boilerplate :-))

henryiii · 2021-12-07T14:29:30Z

Yes, this one. You can send it to henryfs @ princeton.edu. The letter must contain exactly the statement it already does, and nothing else (the guide is from https://www.nsf.gov/pubs/2021/nsf21617/nsf21617.htm, under Letters of Collaboration).

And then there's 1-3 sentences in the facilities document that says anything you want, I'd put:

Antoine Pitrou will work with us on PyArrow to try a scikit-build build system. This would simplify the developer experience and encourage new contributions. We will also work to provide a CMake configuration to allow external compiled modules to be much easier to create.

(Feel free to edit or replace! Assuming https://arrow.apache.org/docs/python/extending.html would be really helped by the configuration plan)

pitrou · 2021-12-08T17:32:11Z

> Yes, this one. You can send it to henryfs @ princeton.edu.

Does it need a hand-written signature?

> Antoine Pitrou will work with us on PyArrow to try a scikit-build build system. This would simplify the developer experience and encourage new contributions. We will also work to provide a CMake configuration to allow external compiled modules to be much easier to create.

This looks good to me.

henryiii · 2021-12-08T17:33:51Z

It can be a digitally added handwritten signature. It doesn't need to be printed.

henryiii · 2021-12-08T17:34:35Z

And we are finalizing this in the next 2-3 hours. ;)

pitrou · 2021-12-08T17:42:18Z

Ok, sent it.

vyasr · 2022-07-12T20:41:39Z

Just to update here, #10919 makes installation from source much easier now. No need to build C++ separately, just pip install . should do the trick (it will take a very long time, though, since our C++ libraries are huge and can take >1 hour to build on a typical laptop).
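A small sketch of scripting that single-command source build from Python. It assumes `pip install .` is run against the directory holding the Python package's build configuration (the `python/cudf` path in the usage note is a hypothetical example) and sets `CMAKE_BUILD_PARALLEL_LEVEL`, a standard CMake (3.12+) environment variable read by `cmake --build`. Whether the scikit-build backend passes it through is an assumption worth checking, but build parallelism is the main lever on the long C++ build time.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Optional


def source_install_cmd(src_dir: Path, verbose: bool = True) -> List[str]:
    """Equivalent of `pip install .` run inside src_dir, using this interpreter's pip."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if verbose:
        cmd.append("-v")  # surface CMake/compiler output during the long build
    cmd.append(str(src_dir))
    return cmd


def build_env(jobs: Optional[int] = None) -> Dict[str, str]:
    """Copy of the current environment with a parallel-build hint for CMake."""
    env = dict(os.environ)
    n = jobs if jobs is not None else (os.cpu_count() or 1)
    # Standard CMake variable honored by `cmake --build` (assumed to reach the backend).
    env["CMAKE_BUILD_PARALLEL_LEVEL"] = str(n)
    return env


def install_from_source(src_dir: Path, jobs: Optional[int] = None) -> None:
    """Run the source install, raising CalledProcessError if the build fails."""
    subprocess.run(source_install_cmd(src_dir), env=build_env(jobs), check=True)
```

Usage: `install_from_source(Path("python/cudf"), jobs=8)` would be the scripted counterpart of running `pip install .` in that (hypothetical) directory with 8 parallel build jobs.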

vyasr · 2022-10-20T17:02:11Z

We now have experimental pip wheels available. Between that and the relative ease of installation from source after the scikit-build conversion (although still slow, installation of the Python package from source is now a single, fairly robust command), I'm going to close this issue as completed.

jangorecki added the Needs Triage and question labels Dec 11, 2020

kkraus14 added the conda and doc labels and removed the Needs Triage label Dec 11, 2020

github-actions bot added the stale label Feb 16, 2021

github-actions bot removed the inactive-30d label Mar 24, 2021

shwina mentioned this issue Mar 7, 2022: [FEA] PyPi release #10382 (Closed)

jrhemstad mentioned this issue Apr 13, 2022: Revise CONTRIBUTING.md #10644 (Merged)

vyasr closed this as completed Oct 20, 2022
