Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swapped the netCDF4 dependency to h5netcdf #2122

Merged
merged 12 commits into from
Dec 28, 2022

Conversation

varchasgopalaswamy
Copy link
Contributor

@varchasgopalaswamy varchasgopalaswamy commented Sep 26, 2022

Description

Addresses #2029 and #2028 by swapping out the netCDF4 dependency to h5netcdf, which should be a lighter dependency, since it avoids needing the netCDF4 C binaries.

Checklist

  • Follows official PR format
  • Includes new or updated tests to cover the new feature
  • Code style correct (follows pylint and black guidelines)
  • Changes are listed in changelog

📚 Documentation preview 📚: https://arviz--2122.org.readthedocs.build/en/2122/

@twiecki twiecki mentioned this pull request Sep 27, 2022
@ahartikainen
Copy link
Contributor

I think you need to define engine

https://docs.xarray.dev/en/stable/user-guide/io.html

We probably should define engine in rcParams so that could be changed if needed?

cc @OriolAbril

Copy link
Contributor

@ahartikainen ahartikainen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of comments

arviz/data/inference_data.py Outdated Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
arviz/data/inference_data.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
arviz/data/inference_data.py Outdated Show resolved Hide resolved
Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the late review. I don't know much about the differences between the two libraries. Can you develop a bit on the differences and advantages of h5netcdf over netcdf4 package? Seeing .legacy in the import makes me a bit nervous.

data.to_netcdf(filename, mode=mode, group=group, **kwargs)
data.close()
mode = "a"
else: # creates a netcdf file for an empty InferenceData object.
empty_netcdf_file = nc.Dataset(filename, mode="w", format="NETCDF4")
empty_netcdf_file = nc.Dataset(filename, mode="w")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the format always netcdf4 with h4netcdf?

Copy link
Contributor Author

@varchasgopalaswamy varchasgopalaswamy Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @OriolAbril,

The basic reason for this is to remove the dependency on the netCDF4 C library. Since netCDF4 is a layer on top of HDF5, you could imagine using a basic HDF5 library in python to implement the netCDF4 standard, instead of calling netCDF4 C routines. h5netcdf does this using the h5py library.

I find installing the netCDF4 C binaries to be quite painful, especially when I need to support linux, windows, intel OSX, Silicon OSX, etc. On Windows, there are some times issues with binary incompatibilities between the HDF5 libraries installed by h5py and netCDF4 on conda Switching to using h5py as the backend would make the install significantly more straightforward. I think most scientific python users probably already have h5py installed, so practically I would imagine this wouldn't actually be a new dependency in those cases.

I see where you're coming from with something called legacy showing up in an import - but I believe that the legacy API you saw allows h5netcdf to be used as a 'drop-in' replacement for netCDF4 while the 'new' API looks more like h5py, rather than this being some kind of deprecated support issue.

Is the format always netcdf4 with h4netcdf?

Not sure, but I would guess yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, FWIW xarray seems to fully support h5netcdf as a backend. That's an appeal to authority, but it does seem to be ok for them!

@theorashid
Copy link

theorashid commented Oct 26, 2022

Hey, just to let you know the netcdf4 import is currently failing on the latest python image. So on that particular image, I can't install arviz.

netcdf4 is always a pain to install, so having a more reliable dependency would be great (and a v0.13.1 release)

@OriolAbril
Copy link
Member

I have rebased to fix the git conflicts and done some changes to allow either library as io engine. I think that makes sense as imo long term we shouldn't have either as a dependency and allow users to install whichever they one (or none if they don't use that subset of features).

Thanks for your patience, I'll merge in the following days.

@codecov
Copy link

codecov bot commented Dec 22, 2022

Codecov Report

Merging #2122 (f782561) into main (0f8c9f7) will decrease coverage by 0.03%.
The diff coverage is 76.00%.

@@            Coverage Diff             @@
##             main    #2122      +/-   ##
==========================================
- Coverage   89.96%   89.92%   -0.04%     
==========================================
  Files         119      119              
  Lines       12408    12419      +11     
==========================================
+ Hits        11163    11168       +5     
- Misses       1245     1251       +6     
Impacted Files Coverage Δ
arviz/data/inference_data.py 82.51% <66.66%> (-0.54%) ⬇️
arviz/data/datasets.py 98.48% <100.00%> (ø)
arviz/data/io_netcdf.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@ahartikainen ahartikainen merged commit a2bdfda into arviz-devs:main Dec 28, 2022
bmwiedemann pushed a commit to bmwiedemann/openSUSE that referenced this pull request Mar 22, 2023
https://build.opensuse.org/request/show/1073712
by user dgarcia + dimstar_suse
- skip python 3.11 and python 3.8 because some dependencies doesn't
  support that python versions
- Update to 0.15.1
  - Fix memory usage and improve efficiency in `from_emcee`
    ([2215](arviz-devs/arviz#2215))
  - Lower pandas version needed
    ([2217](arviz-devs/arviz#2217))
- 0.15.0
  - Adds Savage-Dickey density ratio plot for Bayes factor
    approximation.
    ([2037](arviz-devs/arviz#2037),
    [2152](arviz-devs/arviz#2152))
  - Add `CmdStanPySamplingWrapper` and `PyMCSamplingWrapper` classes
    ([2158](arviz-devs/arviz#2158))
  - Changed dependency on netcdf4-python to h5netcdf
    ([2122](arviz-devs/arviz#2122))
  - Fix `reloo` outdate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants