Checkpointer cannot serialize functions to disk with JLD. #141

Closed
ali-ramadhan opened this issue Mar 19, 2019 · 7 comments
Labels
bug 🐞 Even a perfect program still has bugs help wanted 🦮 plz halp (guide dog provided)

@ali-ramadhan
Member

ali-ramadhan commented Mar 19, 2019

I cannot get the checkpointing test in PR #140 to run because JLD cannot serialize the model to disk when it contains forcing functions. We could go back to forcing arrays, but I think that's a bad idea since we should avoid increasing GPU memory usage.

I believe that JLD2.jl might be able to serialize functions to disk but it's not actively maintained anymore and their README says "If your tolerance for data loss is low, JLD may be a better choice at this time."

If we can fix this and figure out how to serialize functions to disk, then we may also be able to serialize the FFTW and CuFFT plans to disk (although we might still want to reconstruct them in case the model is restored on a different computer with a different architecture).

Stacktrace:

Deserializing model from disk: test_model_checkpoint_5.jld
error parsing type string Oceananigans.Forcing{Oceananigans.#zero_func,Oceananigans.#zero_func,Oceananigans.#zero_func,Oceananigans.#zero_func,Oceananigans.#zero_func}
Checkpointing: Error During Test at D:\Home\Git\Oceananigans.jl\test\runtests.jl:246
  Got exception outside of a @test
  syntax: incomplete: premature end of input
  Stacktrace:
   [1] eval at .\boot.jl:328 [inlined]
   [2] eval at C:\Users\Ali\.julia\packages\JLD\1BoSz\src\JLD.jl:3 [inlined]
   [3] _julia_type(::String) at C:\Users\Ali\.julia\packages\JLD\1BoSz\src\JLD.jl:983
   [4] julia_type(::String) at C:\Users\Ali\.julia\packages\JLD\1BoSz\src\JLD.jl:30
   [5] jldatatype(::JLD.JldFile, ::HDF5.HDF5Datatype) at C:\Users\Ali\.julia\packages\JLD\1BoSz\src\jld_types.jl:701
   [6] read(::JLD.JldDataset) at C:\Users\Ali\.julia\packages\JLD\1BoSz\src\JLD.jl:370
   [7] read_ref(::JLD.JldFile, ::HDF5.HDF5ReferenceObj) at C:\Users\Ali\.julia\packages\JLD\1BoSz\src\JLD.jl:502
   [8] jlconvert(::Type{Model}, ::JLD.JldFile, ::Ptr{UInt8}) at C:\Users\Ali\.julia\packages\JLD\1BoSz\src\jld_types.jl:387
   [9] read_scalar(::JLD.JldDataset, ::HDF5.HDF5Datatype, ::Type) at C:\Users\Ali\.julia\packages\JLD\1BoSz\src\JLD.jl:398
   [10] read(::JLD.JldDataset) at C:\Users\Ali\.julia\packages\JLD\1BoSz\src\JLD.jl:370
   [11] read(::JLD.JldFile, ::String) at C:\Users\Ali\.julia\packages\JLD\1BoSz\src\JLD.jl:346
   [12] restore_from_checkpoint(::String) at D:\Home\Git\Oceananigans.jl\src\output_writers.jl:77
   [13] run_basic_checkpointer_tests() at D:\Home\Git\Oceananigans.jl\test\test_output_writers.jl:34
   [14] top-level scope at D:\Home\Git\Oceananigans.jl\test\runtests.jl:247
   [15] top-level scope at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Test\src\Test.jl:1083
   [16] top-level scope at D:\Home\Git\Oceananigans.jl\test\runtests.jl:247
   [17] top-level scope at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Test\src\Test.jl:1083
   [18] top-level scope at D:\Home\Git\Oceananigans.jl\test\runtests.jl:244
   [19] top-level scope at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Test\src\Test.jl:1083
   [20] top-level scope at D:\Home\Git\Oceananigans.jl\test\runtests.jl:243
   [21] include at .\boot.jl:326 [inlined]
   [22] include_relative(::Module, ::String) at .\loading.jl:1038
   [23] include(::Module, ::String) at .\sysimg.jl:29
   [24] include(::String) at .\client.jl:403
   [25] top-level scope at none:0
   [26] eval(::Module, ::Any) at .\boot.jl:328
   [27] exec_options(::Base.JLOptions) at .\client.jl:243
   [28] _start() at .\client.jl:436
@ali-ramadhan ali-ramadhan added bug 🐞 Even a perfect program still has bugs help wanted 🦮 plz halp (guide dog provided) labels Mar 19, 2019
@ali-ramadhan ali-ramadhan changed the title Checkpointer cannot serialize forcing functions (or functions in general). Checkpointer cannot serialize functions to disk with JLD. Mar 19, 2019
@ali-ramadhan
Member Author

ali-ramadhan commented Mar 19, 2019

If we cannot serialize functions (or structures of functions) to disk at all, then we might have to force users to reinsert all functions when restoring from a checkpoint. We can rebuild the FFT plans automatically, but we have no way of knowing what, e.g., the forcing functions might be.

But this is ugly and it'll be easy to make a mistake so I'd rather be able to serialize functions to disk.
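
The fallback described here, where users pass their functions again on restore, might look roughly like the following. This is only a minimal sketch using Julia's Serialization standard library rather than JLD; `ToyModel`, `save_checkpoint`, `load_checkpoint`, and `zero_forcing` are hypothetical names for illustration, not part of the Oceananigans API.

```julia
using Serialization

# Hypothetical stand-in for a model: plain data plus a user-supplied function.
struct ToyModel{F}
    u::Vector{Float64}
    iteration::Int
    forcing::F
end

# Write only the plain-data fields; the function is deliberately left out.
function save_checkpoint(path::AbstractString, model::ToyModel)
    open(path, "w") do io
        serialize(io, (u = model.u, iteration = model.iteration))
    end
    return path
end

# Rebuild the model from disk; the caller has to supply the forcing again.
function load_checkpoint(path::AbstractString; forcing)
    data = open(deserialize, path)
    return ToyModel(data.u, data.iteration, forcing)
end

zero_forcing(x) = 0.0

model = ToyModel([1.0, 2.0, 3.0], 5, zero_forcing)
path = save_checkpoint(tempname(), model)
restored = load_checkpoint(path; forcing = zero_forcing)
```

Making `forcing` a required keyword at least turns a forgotten function into an immediate error, although nothing checks that the function passed on restore matches the one used before the checkpoint.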

ali-ramadhan added a commit that referenced this issue Mar 19, 2019
Helps avoid #141 when forcing functions aren't used.
Member Author

Looks like Python's standard library has marshal (https://docs.python.org/3/library/marshal.html) and there's a third-party extension of pickle called dill (https://pypi.org/project/dill/), both of which seem able to serialize functions. But maybe they just serialize bytecode?

Member Author

To follow up on the serialization of functions: it looks as though this isn't really possible. There are certain hacks involving eval, but these look to be rather unreliable.
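
For illustration, one such eval-based hack might be sketched as below: keep the function's source text instead of the function itself, then parse and evaluate it on restore. The string `forcing_src` and the example forcing are assumptions made up for this sketch, and it is not claimed to be the specific hack referred to above.

```julia
# Store the source text of a function rather than the function itself.
forcing_src = "(x, t) -> -0.1 * x"

# On restore, parse the text and evaluate it to get a callable back.
forcing = eval(Meta.parse(forcing_src))

# invokelatest avoids world-age errors if this code runs inside a function.
value = Base.invokelatest(forcing, 2.0, 0.0)
```

This only works when the source text is self-contained. A function that captures local variables, or that refers to globals not defined at restore time, will fail or behave differently, which is one reason to consider the approach unreliable.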

@ali-ramadhan ali-ramadhan added this to the v1.0 milestone Apr 3, 2019
@ali-ramadhan
Member Author

Closing as it seems that serializing functions to disk in Julia is impossible (well, probably just difficult/unreliable).
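
As a small illustration of the "difficult/unreliable" part, Julia's Serialization standard library can round-trip an anonymous function within a single session, as in the sketch below. The names `double`, `buf`, and `restored` are made up for this example. The Serialization documentation warns that this generally does not work when reading and writing are done by different versions of Julia or with a different system image, so it is not a safe checkpoint format.

```julia
using Serialization

# An anonymous function standing in for a user-defined forcing.
double = x -> 2x

# Serialize to an in-memory buffer, then read it back in the same session.
buf = IOBuffer()
serialize(buf, double)
seekstart(buf)
restored = deserialize(buf)

# invokelatest guards against world-age errors after deserialization.
result = Base.invokelatest(restored, 21)
```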

@glwagner
Member

I agree that this issue should have been closed... but just to throw a wrench into things:

https://github.com/MikeInnes/BSON.jl

@ali-ramadhan
Member Author

Ah nice. Probably worth trying if we stick with serializing the full Model to disk. Hard to tell when it might break though.

@glwagner
Member

Also of note are

JuliaIO/BSON.jl#37

JuliaLang/julia#32028

2 participants