Description
This issue contains my thoughts and comments from working with `VarInfo` a lot more during the course of TuringLang/Turing.jl#793. My experience is that `VarInfo` is somewhat easy to use once you get over the very steep learning curve, but that learning curve can be a powerful deterrent to development from outside folks.

I make some strong statements here to encourage discussion. I'm not trying to bash anyone's superb contributions (particularly @mohamed82008's great work on `VarInfo`), I just want to see if I can provoke some high-level thinking about `VarInfo` without thinking too much about what it is right now. Try to keep the context of this discussion about what `VarInfo` could be and not what it is now.
## I don't want to see `VarInfo` anywhere
I think we see `VarInfo` too much. If I'm a non-Turing person and I'm building some kind of inference tool, I don't want to learn about our arcane system for managing variables. I just want to manipulate parameters, draw from priors, etc. Many of our functions should probably never have a `do_something(vi, spl)` signature -- we should find ways to handle everything on the back end without anyone worrying about how to use `VarInfo`. A better way would be to have the `VarInfo` stored somewhere in a shared state or tied to a specific model.
I can imagine a case where `VarInfo` is stored in some environment variable or state variable or something, and the sampler or model might just have a location to go look at where the `VarInfo` is. Then you could just call `logp(model)` and by default it would calculate the log probability using whatever the current state is. If you really wanted to, you could pass in a `VarInfo` and work with a specific one if you're doing a lot of numerical work and such, but I think for almost all cases `VarInfo` could sit far away and never be thought about.
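To make that concrete, here's a minimal sketch of the "default state" idea. Everything here is hypothetical (`Model`, `logp`, and the stored state are stand-ins, not Turing's actual API): the model carries its current values, `logp(model)` uses them by default, and an explicit override is still possible.

```julia
# Hypothetical sketch: the model owns its current variable state, so
# callers never have to construct or thread a VarInfo themselves.
struct Model
    logdensity::Function       # maps a NamedTuple of parameters to a log density
    state::Base.RefValue{Any}  # current parameter values, hidden from the user
end

# Default: compute the log probability at whatever the current state is.
logp(m::Model) = m.logdensity(m.state[])

# Opt-in: pass explicit values when doing a lot of numerical work.
logp(m::Model, vals) = m.logdensity(vals)

m = Model(nt -> -0.5 * nt.x^2, Ref{Any}((x = 1.0,)))
logp(m)              # uses the stored state
logp(m, (x = 0.0,))  # uses an explicit override
```

The point of the sketch is the dispatch pattern: the one-argument method is the everyday path, and the two-argument method is the escape hatch for power users.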
An alternative fix would be to just have a very small handful of functions that are dead-simple to use and understand. See TuringLang/Turing.jl#886 for a better discussion.

- `update!(vi, new_vals)` should update the parameters.
- `parameters(vi)` should get the current parameterization in a `NamedTuple` or `Dict` format.
- `logp(vi, model)` should give you a log probability, no questions asked and no hassle.
- `priors(vi)` should give you a `NamedTuple` or `Dict` of prior distributions to draw from.
- If I want to change my priors or something, we should have a way to do that too. `priors!(vi, new_priors)` should set my priors to whatever the new distributions are.
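As a sketch of what that small surface area could look like, here is a toy implementation. `SimpleVarInfo` and every function body below are hypothetical illustrations of the wishlist above, not the real `VarInfo`:

```julia
# Toy stand-in for VarInfo: just values and priors keyed by symbol.
mutable struct SimpleVarInfo
    vals::Dict{Symbol,Float64}
    priors::Dict{Symbol,Any}   # e.g. Distributions.jl distributions
end

# The handful of dead-simple entry points from the wishlist:
update!(vi::SimpleVarInfo, new_vals) = (merge!(vi.vals, Dict(pairs(new_vals))); vi)
parameters(vi::SimpleVarInfo) = NamedTuple(vi.vals)
priors(vi::SimpleVarInfo) = NamedTuple(vi.priors)
priors!(vi::SimpleVarInfo, new_priors) = (merge!(vi.priors, Dict(pairs(new_priors))); vi)

# Here `model` is anything callable on a NamedTuple of parameters.
logp(vi::SimpleVarInfo, model) = model(parameters(vi))

vi = SimpleVarInfo(Dict(:x => 1.0), Dict{Symbol,Any}())
update!(vi, (x = 2.0,))
parameters(vi)           # (x = 2.0,)
logp(vi, nt -> -nt.x^2)  # -4.0
```

An inference designer working against this surface never learns what sits behind it; whether the backing store is a `Dict`, a typed `VarInfo`, or something else entirely becomes an implementation detail.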
`VarInfo` is ultimately my biggest issue with Turing's internals. I understand why we need it and it is a masterful work of engineering, but from a usability side it is a disaster, particularly if our goal is to have a high degree of ease-of-use for inference designers. If you asked me a question on how to do something with a `VarInfo` right now, chances are very good it would take me more than an hour to think about what it is that `VarInfo` is, what it does, and where in the source code I might find an answer. Add another half hour because whatever it was I thought `VarInfo` was is not true.
## Where should `VarInfo` live?
I'm not sure where the `VarInfo` should go. I don't think it should be a free-floating entity like it has been in Turing's past, and I'm also not convinced that its attachment to the sampler state as in TuringLang/Turing.jl#793 is correct either.

Is `VarInfo` more a function of the model, or of the sampler? If it's more specific to the model, shouldn't we store it there? I don't really know. If it's in the model, then it's quite nice to use for non-MCMC methods, since nobody would have to add `VarInfo` to their method -- they can just call the model's version. Ultimately the `VarInfo` is constructed from the model, and the samplers just reference it. Right now I'm leaning towards moving `VarInfo` over to the model, but I'm open to discussion on that.
A downside to putting it on the model side is that it becomes harder to build new modeling tools on top of Turing, but easier to build inference methods. I think it's a trade-off that's worth considering.
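A hedged sketch of the model-side arrangement (all names hypothetical): the model constructs and owns the variable container, and a sampler holds no container of its own, only a reference through the model.

```julia
# Hypothetical: the model owns the variable container.
struct ToyModel
    varinfo::Dict{Symbol,Float64}  # stand-in for a model-owned VarInfo
end
ToyModel() = ToyModel(Dict{Symbol,Float64}())

# The sampler carries no VarInfo; it only works through the model.
struct ToySampler end

function step!(::ToySampler, m::ToyModel)
    # An inference method manipulates parameters via the model's
    # container instead of constructing a VarInfo itself.
    m.varinfo[:x] = get(m.varinfo, :x, 0.0) + 1.0
    return m
end

m = ToyModel()
step!(ToySampler(), m)
m.varinfo[:x]  # 1.0
```

This is where the trade-off shows up: a new inference method is a few lines because the container comes for free with the model, while a new modeling tool has to reproduce the container to be a valid model.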
## Removing the `Sampler.info` field
Build a `VarInfoFlags` struct that handles all the various switches and gizmos and whatever that `VarInfo` uses. Currently, all the `Sampler`s have a dictionary called `info` in them which will no longer be used on the inference side after TuringLang/Turing.jl#793. It'd be nice if we could remove the field entirely and separate the `VarInfo` flags from the `Sampler`, either by storing the flags in the `VarInfo` itself, or at least removing the dictionary by just storing the flags with the sampler.
This is really more mechanical than goal-oriented, and it's just something I or someone else might need to apply some elbow grease to.