
Making checkpoint a macro #20

Open
oxinabox opened this issue May 12, 2021 · 3 comments

Comments

@oxinabox
Member

We might want to make checkpoint into a macro.

Reasons for macro:

  • The main reason: we can set things up so that, if a checkpoint isn’t enabled, the functions it calls to compute the values it stores are never called, since they could be expensive, e.g. @checkpoint("RegressionSummary", value=expensive_summary_function(foo)). (This is what the Base logging macros do.)
  • We can get rid of the need to register them in __init__ by having the macro register the checkpoint at parse time. (I'm not 100% sure this will work, since it mutates a global variable at parse time; I think it does. If it doesn’t, we shouldn’t do this.)
  • We can also automatically record the names of all the things it is saving, and the user can then query them with a function like checkpoint_info that prints a list, as a kind of documentation.
  • We can store the filename and line number (if we are really clever we can store the exact git commit and then we will be able to generate a link to that file and line; I had a proof of concept for this ages ago), so we can look up afterwards where a checkpoint comes from.
  • We can do like Base's logging macros and have just writing a be the same as :a=>a (though we also get this if we change to storing data in the kwarg position, as proposed in "Change checkpoint to be checkpoint(name, tags...; data...)" #16).
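A minimal sketch of the first and last points, assuming a hypothetical @lazy_checkpoint macro together with a global ENABLED_CHECKPOINTS set and a save_checkpoint stand-in (none of these are the Checkpoints.jl API). Like Base's logging macros, it only evaluates the value expressions inside an "is this checkpoint enabled?" branch, and it expands a bare name a into :a => a:

```julia
# Hypothetical stand-ins, not the Checkpoints.jl API: a set of enabled
# checkpoint names and an in-memory "store" in place of writing to disk.
const ENABLED_CHECKPOINTS = Set{String}()
const SAVED = Dict{String,Any}()

save_checkpoint(name::AbstractString, data::Dict{Symbol,Any}) = (SAVED[name] = data; nothing)

"""
    @lazy_checkpoint(name, key=expr, ...)

Evaluate the `expr`s only if checkpoint `name` is enabled. A bare name `a`
is shorthand for `a=a`, as in Base's logging macros.
"""
macro lazy_checkpoint(name, args...)
    kv_exprs = map(args) do arg
        if arg isa Symbol
            # Bare `a` becomes `:a => a`.
            :($(QuoteNode(arg)) => $(esc(arg)))
        elseif arg isa Expr && arg.head in (:(=), :kw)
            # `key=expr` becomes `:key => expr`; expr is not evaluated here.
            key, val = arg.args
            :($(QuoteNode(key)) => $(esc(val)))
        else
            error("@lazy_checkpoint: expected `key=value` or a bare name, got $arg")
        end
    end
    return quote
        let checkpoint_name = $(esc(name))
            if checkpoint_name in ENABLED_CHECKPOINTS
                # The value expressions are only evaluated inside this branch.
                save_checkpoint(checkpoint_name, Dict{Symbol,Any}($(kv_exprs...)))
            end
        end
        nothing
    end
end
```

With the checkpoint disabled, @lazy_checkpoint("RegressionSummary", value=expensive_summary_function(foo)) never calls expensive_summary_function; after push!(ENABLED_CHECKPOINTS, "RegressionSummary") the same line calls it once and stores the result under :value.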

On the other hand, macros are harder to reason about, so the gains might not be worth it.
I think this is low priority.

@rofinn
Member

rofinn commented May 12, 2021

We can also automatically record the names of all the things it is saving

Can you give an example? How's this different from your last point?

if we are really clever we can store the exact git commit and then we will be able to generate a link to that file and line

Hmm, this feels like it's bordering on logging functionality. We could make the same argument for a Memento.jl macro which does the same thing.

we can do like Base's logging macros and have just writing a be the same as :a=>a

Personally, this point doesn't seem worth it, but getting filename and line number might be.

On the other hand, macros are harder to reason about, so the gains might not be worth it. I think this is low priority.

Yeah, we had a similar issue in Memento, and the conclusion was that the limitations and maintenance overhead weren't really worth it. That issue even had a performance argument: invenia/Memento.jl#15

@oxinabox
Member Author

oxinabox commented May 13, 2021

We can also automatically record the names of all the things it is saving

Can you give an example? How's this different from your last point?

Ah sorry, yeah, I was unclear. I mean a record of the names of the things each checkpoint stores.
For example, we could get the following output:

julia> checkpoint_info()
module       |  name         | stores
-------------|---------------|------------------------------------------------
Forecasters  | forecasts     | targets, nodes, distributions
NodeSelection| nodes         | selection_settings, all_nodes, selected_nodes
FooBar       | qux           | foobar, barfoo, quxbar, fooqux

We'd probably want to be able to filter by module.
Or maybe even by fuzzy matching on the names of the things it stores?
We might also want to include the filename/line number in such a table.
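One way such a table could be populated is sketched below, assuming a hypothetical @registered_checkpoint macro and CheckpointInfo record (not the Checkpoints.jl API). It pushes to a global registry when the macro is expanded, using __module__ and __source__ for the module, file and line. Whether that expansion-time registration survives precompilation is exactly the open question from the original post, and this sketch does not address it. For brevity it only accepts bare names and returns the Dict instead of writing it:

```julia
# Hypothetical metadata record; field names are illustrative only.
struct CheckpointInfo
    mod::Module
    name::String
    stores::Vector{Symbol}
    file::String
    line::Int
end

const CHECKPOINT_INFO = CheckpointInfo[]

macro registered_checkpoint(name::String, keys::Symbol...)
    # This push! runs when the macro is *expanded* (once per call site),
    # not each time the surrounding code executes.
    push!(CHECKPOINT_INFO, CheckpointInfo(__module__, name, Symbol[keys...],
                                          string(__source__.file), __source__.line))
    kv_exprs = [:($(QuoteNode(k)) => $(esc(k))) for k in keys]
    # Stand-in for saving: just build the Dict of recorded values.
    return :(Dict{Symbol,Any}($(kv_exprs...)))
end

# Print one row per registered call site, optionally filtered by module.
function checkpoint_info(io::IO=stdout; mod::Union{Module,Nothing}=nothing)
    println(io, rpad("module", 14), "| ", rpad("name", 12), "| stores")
    for info in CHECKPOINT_INFO
        (mod === nothing || info.mod === mod) || continue
        println(io, rpad(string(info.mod), 14), "| ", rpad(info.name, 12), "| ",
                join(info.stores, ", "), "  (", info.file, ":", info.line, ")")
    end
end
```

Calling checkpoint_info(mod=SomeModule) would restrict the rows to call sites expanded in that module; fuzzy matching on the stored names could be layered on top of the same registry.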

Hmm, this feels like it's bordering on logging functionality.

Yes.
I have long considered Checkpoints.jl to be a kind of logging.
And apparently researchers do too: a section in a design document they wrote was titled "Logging" and was actually just about Checkpoints.jl.

In some ways you could replace Checkpoints.jl with something like TensorBoardLogger.
But I think that would be strictly worse.

The main difference from logging (other than spitting out binary artifacts) is that this is structured hierarchically, rather than sequentially.

@tpgillam

tpgillam commented Oct 1, 2021

we can use this to set things up such that if a checkpoint isn’t enabled then functions that it calls to store their values are not called, which could be expensive

I ran into this just now and came across this issue. A (potentially less invasive) way to approach this would be to allow the "value" to be a zero-argument callable. So, to take @oxinabox's example:

checkpoint(
    "RegressionSummary", 
    :value => () -> expensive_summary_function(foo),
)

We'd then need to add a special case for values that are callables, calling them before writing.
This could break user code if we currently permit serialising functions... but I'm not sure we do?
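A minimal sketch of that special case, assuming a hypothetical maybe_checkpoint function and ENABLED_NAMES set rather than the real checkpoint signature. Any value that is a Function is called just before the data would be written, and not at all if the checkpoint is disabled:

```julia
# Hypothetical stand-in for the set of enabled checkpoints (not Checkpoints.jl).
const ENABLED_NAMES = Set{String}()

# A value that is a Function is treated as a zero-argument thunk and called;
# any other value is stored as-is.
resolve_value(v) = v
resolve_value(f::Function) = f()

function maybe_checkpoint(name::AbstractString, kvs::Pair{Symbol}...)
    name in ENABLED_NAMES || return nothing  # disabled: no thunk is ever called
    data = Dict{Symbol,Any}(k => resolve_value(v) for (k, v) in kvs)
    return data  # stand-in for serialising `data` to disk
end
```

Because every Function value is called, a user who genuinely wanted to store a function could not do so under this rule, which is the breakage the comment above worries about; wrapping thunks in a dedicated type instead of dispatching on Function would avoid that ambiguity.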

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants