Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make netCDF4 optional. #2029

Closed
wants to merge 1 commit into from
Closed

Conversation

twiecki
Copy link
Contributor

@twiecki twiecki commented May 13, 2022

Closes #2028.

@codecov
Copy link

codecov bot commented May 13, 2022

Codecov Report

Merging #2029 (27720b8) into main (c78deef) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2029   +/-   ##
=======================================
  Coverage   90.64%   90.64%           
=======================================
  Files         117      117           
  Lines       12553    12554    +1     
=======================================
+ Hits        11379    11380    +1     
  Misses       1174     1174           
Impacted Files Coverage Δ
arviz/data/inference_data.py 83.04% <100.00%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c78deef...27720b8. Read the comment docs.

@OriolAbril
Copy link
Member

This will break a lot of code and environments, we need to be careful about how to do this, I don't think removing the dependency straight away is a good idea.

The whole of ArviZ assumes netcdf files can be stored and read. For example, the doc failures are due to that, using arviz.load_arviz_data reads netcdf files (either packaged with the library or after downloading from figshare) and most documentation examples are based on this. Changing that will also mean that after doing pip install arviz people won't be able to save inferencedata as netcdf as they are now

@twiecki
Copy link
Contributor Author

twiecki commented May 14, 2022

OK, maybe it should stay a dependency then?

@OriolAbril
Copy link
Member

I think it should be possible to install ArviZ without installing libraries that are not required throughout the library (netcdf, but also matplotlib for example like I said before), but I am also worried about making pip install arviz default to an installation that can't save to file nor plot (in the extreme) unless some of the possible optional dependencies are installed manually as well as backward compatibility issues.

Let's see what the rest of the team thinks. One thing we discussed was splitting the library in data/stats/plots but releasing in sync for simplicity and keeping arviz as the bundle of the three (plus maybe netcdf and/or matplotlib). But it might be better to rip the band aid and have arviz make a minimal install and recommend installing arviz[things]

@ColCarroll
Copy link
Member

I don't recall if it is possible to do like, pip install arviz[minimal] -- would that help with this? I'm pretty happy to allow bespoke [special_install_requires] if it allows usage.

@canyon289
Copy link
Member

canyon289 commented May 16, 2022

Catching up. Agree with Oriol, We definitely can't do it this way this "hardline" way its going to break a lot of things. I think we have do it the way Colin recommends, which is netcdf by default, but it can be turned off with a flag.

We should do this to enable support for pyscript, and I think jupyterlite as well since it runs on pyodide. Those would both be awesome tools to allow users to test out arviz or learn statistical computing without needing to install a local env.

Also note for the pyscript usecase we'll also need to compile both wheels everytime we release as well so its fetchable by the pyscript users. Outside of the scope of this PR but something to note

@OriolAbril
Copy link
Member

OriolAbril commented May 18, 2022

I don't recall if it is possible to do like, pip install arviz[minimal] -- would that help with this?

That would be amazing but I don't think it is possible. afaik the [name] can only be used to add dependencies to the set of requirements. Does someone know how to use this [] syntax or some kind of alternative to allow this?

I'm pretty happy to allow bespoke [special_install_requires] if it allows usage.

Just to make sure we are all on the same page, we already provide arviz[all] to install all requirements and optional dependencies at once. See https://python.arviz.org/en/latest/getting_started/Installation.html

Also note for the pyscript usecase we'll also need to compile both wheels everytime we release as well so its fetchable by the pyscript users. Outside of the scope of this PR but something to note

Whenever we release, the Azure job already builds a wheel and uploads it to pypi, see https://pypi.org/project/arviz/#files. Only one wheel is generated though and you mentioned "both wheels" @canyon289 can you create an issue with what is missing so we remember to update the release Azure job?

@canyon289
Copy link
Member

Whenever we release, the Azure job already builds a wheel and uploads it to pypi, see https://pypi.org/project/arviz/#files. Only one wheel is generated though and you mentioned "both wheels" @canyon289 can you create an issue with what is missing so we remember to update the release Azure job?

Sure but we only need to do this if we go through with the proposal above which its uncertain if we will

@varchasgopalaswamy
Copy link
Contributor

What about using h5netcdf? That might make it a lot less heavy since the netCDF4 C binaries won't be necessary.

@twiecki
Copy link
Contributor Author

twiecki commented Sep 27, 2022

Closing in favor of #2122.

@twiecki twiecki closed this Sep 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make netcdf4 dependency optional
5 participants