Storing Chains Retrievably in Stable Data Formats #299
Comments
I agree, the current state is a bit unsatisfying.
The docs mention JLSO, which also saves additional metadata such as the Julia version and the package versions, so that the state of the serialization can be reproduced later. Maybe that could solve some of your issues? Additionally, the Tables interface is implemented for Chains, so every output format that supports this interface (such as DataFrames) is supported automatically (although in this case only the samples are available, not the metadata). Personally, I think this package is already quite heavy, and hence I think one should not add a dependency on HDF5 (or on other storage formats some other users might be interested in). However, I guess it could be helpful to define it in a separate package (or at least mention it in the docs, but I guess a package would be good). |
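For readers who want to try the Tables route mentioned above, here is a minimal sketch. It assumes that MCMCChains and DataFrames are installed and that `Chains` accepts an array of draws plus a vector of parameter names; the toy chain and the names `a` and `b` are made up for illustration.

```julia
using MCMCChains
using DataFrames

# Toy chain (illustrative only): 100 iterations, 2 parameters, 3 chains.
chn = Chains(rand(100, 2, 3), [:a, :b])

# Chains implements the Tables.jl interface, so any Tables sink can consume it.
df = DataFrame(chn)

# Only the samples survive the conversion; sampler metadata is not carried over.
println(names(df))
println(size(df, 1))
```

As the comment says, this keeps the draws but drops the chain's metadata, so it is a convenient export rather than a full archive of the chain.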
I would love to see a separate package that handles saving to HDF5. I absolutely hate that we serialize chains to save them. How big is the HDF5 dep? |
I don't really know. (It does pull in a C library for HDF5, which is mature but may not be small; HDF5 definitely offers a lot of infrastructure around parallel writes/reads.) On my system,
(base) wfarr@C02WW0Q2HV2V O3aPISN % du -hs ~/.julia/packages/HDF5/cDXRT
600K /Users/wfarr/.julia/packages/HDF5/cDXRT
(base) wfarr@C02WW0Q2HV2V O3aPISN % du -hs ~/.julia/packages/HDF5_jll/BGk9m
44K /Users/wfarr/.julia/packages/HDF5_jll/BGk9m
which doesn't seem like much. |
But it would be pretty easy for me to spin this off into a separate package if you would prefer that. (I think I'm still not totally in the "Julia frame of mind" with respect to lots of very lightweight packages.) |
I am particularly afraid of the binary dependency which is quite challenging to build apparently: JuliaPackaging/Yggdrasil#567 |
But a separate package would be really cool (it's the perfect example of a glue package, hopefully at some point this can be handled better: JuliaLang/Pkg.jl#1285). |
Oof---had no idea HDF5 was so hard to build. OK. I'll try to spin it out into a separate package that depends on MCMCChains and HDF5. Will let you all know if/when it's ready, so you can mention it in the docs / link it from here. |
Great, I am looking forward to it! |
Me too, I'm very excited. |
OK, here it is (I'll re-open this issue when it's accepted in the general package repository); for now it's only installable from git: |
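A rough sketch of how the package might be used once installed. This is an assumption based on the package extending `Base.write` and `Base.read` for HDF5 files and groups; check the package's README for the authoritative interface. The file name and toy chain are hypothetical.

```julia
using HDF5
using MCMCChains
using MCMCChainsStorage

# Toy chain (illustrative only): 100 iterations, 2 parameters, 3 chains.
chn = Chains(rand(100, 2, 3), [:a, :b])

path = joinpath(mktempdir(), "chain.h5")  # hypothetical file name

# Assumed interface: write(file_or_group, chain) stores the chain in HDF5.
h5open(path, "w") do f
    write(f, chn)
end

# Assumed interface: read(file_or_group, Chains) reconstructs the chain.
chn2 = h5open(path, "r") do f
    read(f, Chains)
end
```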
MCMCChainsStorage is accepted into the general package repository. Could we add something like
As the final sentence of the first paragraph here? |
Sure, I opened a PR at #304 for this. |
Excellent---I see that the change in #304 has been merged into the docs, so I'm going to close this issue, too (again). Thanks so much! |
Currently the only way to store a chain is to use serialization (which is not an archival solution, since deserialization depends on machine details and the Julia version, or else requires external translation libraries) or to translate it to a different data format entirely (e.g. a DataFrame, which entails losing information about internal sampling parameters, or else awkward conversions). I have written the following code for serializing chains to HDF5, which I use all the time. It is quite straightforward about the way it stores the parameters and internal parameters of the chain, and it can handle arbitrary sections. The chain can be stored as the root element of an HDF5 file or inside any group of a larger HDF5 file, provided the group is isolated: multiple chains cannot share a single group.
Do people think code like this, or similar, could make it into the MCMCChains.jl library?
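To make the idea concrete, here is a minimal sketch of one possible layout. It is not the code referred to above. It uses only HDF5.jl and plain arrays, and the section names, parameter names, and the `chain` group are hypothetical stand-ins for a real chain.

```julia
using HDF5

# Hypothetical stand-in for a chain: each section maps parameter names to
# an (iterations x chains) matrix of draws.
sections = Dict(
    "parameters" => Dict("a" => rand(100, 3), "b" => rand(100, 3)),
    "internals"  => Dict("lp" => rand(100, 3)),
)

path = joinpath(mktempdir(), "chain.h5")

# Write: one HDF5 group per section, one dataset per parameter.
h5open(path, "w") do f
    root = create_group(f, "chain")  # the file root itself would also work
    for (section, params) in sections
        g = create_group(root, section)
        for (name, draws) in params
            g[name] = draws
        end
    end
end

# Read: rebuild the same nested structure from the group hierarchy.
restored = h5open(path, "r") do f
    root = f["chain"]
    Dict(
        section => Dict(name => read(root[section][name]) for name in keys(root[section]))
        for section in keys(root)
    )
end
```

Because the section names are stored as group names, a reader needs no knowledge of Julia's serialization format, which is the archival property asked for here. It also shows why a group can hold only one chain in such a scheme: every child of the group is interpreted as a section.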