Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.7updates #43

Closed
wants to merge 3 commits into from
Closed

0.7updates #43

wants to merge 3 commits into from

Conversation

dmbates
Copy link
Contributor

@dmbates dmbates commented Jul 25, 2018

No description provided.

@alyst
Copy link
Collaborator

alyst commented Jul 25, 2018

Thanks! Please feel free to pick up my fixes from this branch. It should pass the tests.

I think we should separate the necessary updates to Julia 0.7 and FileIO 1.0.0 from adding the support to .xz/.bzip2, which should go into another PR.

I'm not so sure about contextify() (in my branch I renamed it to RDAContext, since it's essentially a high-level RDAContext ctor):

  • it moves part of the format detection logic to context.jl making it harder to maintain
  • contextify() may optionally create a new decompressor IO stream, but it's never closed. Whereas the underlying original io is explicitly closed by open() do io ..end. The proper way is to only close the decompressor stream. In the previous version we made sure that the decompressed stream is always closed (even if there's exception).

Previously the support of multiple compression formats was discussed in #31. RData is used by e.g. RDatasets, which, in turn, is used in the unit tests of several packages.
So if RData will require all the decompression codecs, it will bring a lot of unnecessary dependencies (+binary ones) to multiple downstream packages.
I think the better way would be to make compression codecs optional.
It looks like now the use of Requires.jl is approved by the core Julia team (JuliaLang/julia#2025), so we should be able to use it to require specific codec depending on the RData file.
Actually, compression format detection is quite a common problem, maybe it's possible to provide some generic support for it in FileIO? (cc @RalphAS)

@dmbates
Copy link
Contributor Author

dmbates commented Jul 27, 2018

Regarding the different compression schemes, would it be possible to build the detection of the compression scheme into a detection function that would be in FileIO/src/registry.jl and have that always return a stream? It may be possible to have FileIO load the compression package such as CodecBzip2 on demand. That way the RData package may be able to avoid loading any of the Codec*packages.

I appreciate that the vast majority of .rda or .RData files are compressed with Gzip because that is the default. However, xz compression can make a big difference in file sizes when working with large data sets.

@andreasnoack andreasnoack mentioned this pull request Aug 10, 2018
@alyst
Copy link
Collaborator

alyst commented Aug 11, 2018

Thanks for working on this! Closing as #44 was merged.

@alyst alyst closed this Aug 11, 2018
@alyst alyst deleted the 0.7updates branch November 6, 2018 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants