
This issue was moved to a discussion.


Add compression encoding attributes to xarray variables by default? #441

Closed
deeplycloudy opened this issue Jul 31, 2024 · 0 comments
Labels: discussion (A discussion between contributors is needed to choose the best course of action); enhancement (Addition of new features, or improved functionality of existing features)

Comments

@deeplycloudy
Contributor

Some data structures in tobac, such as the segmentation masks that indicate the spatial extent of features, are very sparse and compress fabulously well: about three orders of magnitude, from 2.6 GB to 2.9 MB in some examples @wx4stg pointed out. Without compression, users can therefore easily create very large files when they choose to save those data.
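To illustrate why masks like these shrink so much, here is a minimal sketch using only Python's standard-library zlib module, which implements DEFLATE, the same algorithm behind NetCDF4's zlib compression. The 1000 x 1000 grid, the single 20 x 20 feature, and the helper names are hypothetical stand-ins rather than tobac code, and the ratio it prints is not meant to reproduce the figures above.

```python
import zlib


def make_sparse_mask(ny=1000, nx=1000, size=20):
    """Hypothetical stand-in for one segmentation-mask time step: a flattened
    ny x nx grid of uint8 labels, zero except for one size x size feature."""
    mask = bytearray(ny * nx)  # bytearray(n) is n zero bytes
    y0, x0 = ny // 2, nx // 2
    for y in range(y0, y0 + size):
        start = y * nx + x0
        mask[start:start + size] = b"\x01" * size  # label 1 marks the feature
    return bytes(mask)


def compression_ratio(data, level=4):
    """Uncompressed size divided by DEFLATE-compressed size."""
    return len(data) / len(zlib.compress(data, level))


mask = make_sparse_mask()
print(f"{len(mask)} bytes raw, compression ratio ~{compression_ratio(mask):.0f}x")
```

Long runs of zeros collapse into a handful of back-references, so the ratio is governed by how little of the grid is occupied by features rather than by the grid size itself.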

In my experience, many (perhaps most) users don't know to turn on compression when they save NetCDF data, so tobac could help those users out by adding compression encoding attributes to any multidimensional (2D or higher) xarray variables it creates.
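A minimal sketch of what this could look like, assuming a hypothetical helper (add_compression_encoding is not an existing tobac function) that marks every data variable with two or more dimensions for zlib compression. The "zlib" and "complevel" keys are the ones xarray passes to the netCDF4 and h5netcdf backends; Zarr and other output formats expect different encoding parameters, so this only covers the NetCDF case.

```python
import numpy as np
import xarray as xr


def add_compression_encoding(ds, min_ndim=2, complevel=4):
    """Hypothetical helper (not part of tobac): return a shallow copy of ds
    in which every data variable with at least min_ndim dimensions asks the
    NetCDF4 backends for zlib compression when written."""
    out = ds.copy()  # shallow copy: data is shared, encoding dicts are copied
    for name, da in out.data_vars.items():
        if da.ndim >= min_ndim:
            # "zlib"/"complevel" are NetCDF4 (netCDF4/h5netcdf engine) keys;
            # Zarr and other formats expect different encoding parameters.
            da.encoding.update({"zlib": True, "complevel": complevel})
    return out


# Hypothetical example dataset: a sparse 2D mask plus a 1D per-time count.
ds = xr.Dataset(
    {
        "segmentation_mask": (("y", "x"), np.zeros((200, 300), dtype="int32")),
        "feature_count": (("time",), np.array([3, 5, 2])),
    }
)
compressed = add_compression_encoding(ds)
print(compressed["segmentation_mask"].encoding)
```

Because the settings live in each variable's .encoding, a later compressed.to_netcdf("output.nc") call (with the netCDF4 or h5netcdf engine installed) should apply them without the user passing an encoding= argument. The catch is that .encoding may not survive subsequent xarray operations, so where and when to attach it still matters.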

There are certainly some design challenges here:

  1. detecting where and when to compress
  2. preserving attributes as data flows through the library
  3. the basic fact that writing xarray data structures to disk is really handled by xarray, not tobac, and xarray supports multiple output formats (e.g., NetCDF and Zarr) with different encoding parameters for compression

@kelcyno had a function in #136 that did this, but it was deferred for later discussion as part of 2.0. I wanted to raise the idea again now that the xarray work is well underway.

@deeplycloudy added the enhancement and discussion labels Jul 31, 2024
@tobac-project locked and limited conversation to collaborators Oct 1, 2024
@w-k-jones converted this issue into discussion #456 Oct 1, 2024
