-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attempt at implementation of VarNamedVector
(Metadata
alternative)
#555
Conversation
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Pull Request Test Coverage Report for Build 11218828080Details
💛 - Coveralls |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #555 +/- ##
==========================================
+ Coverage 77.69% 79.02% +1.33%
==========================================
Files 29 30 +1
Lines 3591 4201 +610
==========================================
+ Hits 2790 3320 +530
- Misses 801 881 +80 ☔ View full report in Codecov by Sentry. |
Replied to the comments @mhauru , only very minor changes suggested, aaaand then I think we're good to maybe hit that big green button 👀 |
@torfjelde, I think I addressed all of them. Note that the procedure for merging should be
|
I unfortuantely can't approve this because I'm still technically the "owner" of the PR, but this is looking good to me 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the default metadata backend to Metadata
again. VarNamedVector
will be in master
, but not, by default, used by anything.
I think this is all ready to go. If anyone wants to put in any last minute comments, now is the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad to see this finally gone through!
Thanks @torfjelde for doing the bulk of the work for this, and everyone, but especially @willtebbutt and @yebai, for great feedback! |
This is an attempt at a implementation similar to what was discussed in #528 (comment).
The idea is to introduce a simpler alternative to
Metadata
, which contains a fair bit of not-so-useful-anymore functionality and isn't particularly accessible.With the current state of this PR, it's possible to evaluate a model with a
VarNameVector
:But it's not possible to sample.
What are we trying to achieve here?
The idea behind
VarNameVector
is to enable both an efficient and convenient representation of avarinfo
, i.e. an efficient representation of a realization from a@model
with additional information necessary for both a sampler developer and end-user to extract what they need.In that vein, it seems like we need something that sort of acts like a
Vector
, and sort of acts like anOrderedDict
, but is neither (at least, I don't think it fits in either).Vector
because we need:Vector
representation).for
loop over variablesx[i]
in a model, we want the values forx
to be stored in a contiguous chunk of memory).OrderedDict
because we need:VarName
, e.g.trace[vn]
should result in the realization in it's "constrained" space (i.e. in the space that the distribution of which it came from commonly works in).Moreover, the above should be achieved while ensuring that we allow both mutable and immutable implementations, in addition to maintaining type-stability whenever possible and falling back to type unstable when not.
Current implementation:
VarInfo
withMetadata
The current implementation of
VarInfo
withMetadata
as it's underlying storage achieves some of these properties through a few different means:getindex(varinfo, varname)
, is achieved by effectively grouping thevarnames
contained in aVarInfo
by their symbol (which is part of the type), and putting each group in a separateMetadata
in aNamedTuple
. Basically, instead of representing pairs like(@varname(x) => 1.0, @varname(y) => 2.0)
in a flattened way, we represent them as(x = (@varname(x) => 1.0,), y = (@varname(y) => 2.0,))
, such that in the scenario where they different in types, e.g.y isa Int
, we can dispatch on thesym
inVarName{sym}
to determine which entry of theNamedTuple
to extract.x
is continuous whiley
is discrete. It also provides very straight-forward guidelines on how to speed up models, i.e. "group variables into a single higher-variate random variable, e.g.x[1] ~ Normal(); x[2] ~ Normal()
intox ~ fill(Normal(), 2)
".varinfo
to get type-information, and then use this for the subsequent runs to obtain type-stability (and thus much improved performance).VarInfo
is somewhat lacking, e.g. it does not support changing support even if the types stay the same,varinfo[vn]
, but is currently a) limited, and b) very confusing.VarInfo
implements a somewhat confusing subset of bothAbstractDict
andAbstractVector
interfaces, but neither of which are "complete" in any sense.VarInfo
andMetadata
, even thoughVarInfo
is effectively just a wrapper aroundMetadata
/NamedTuple
containingMetadata
, which adds further confusion.Metadata
comes with a lot of additional fields and information that we have slowly been moving away from using, as the resulting codebase ends up being overly complicated and difficult to work with. This also significantly reduces the utility ofVarInfo
for the end-user.Replacing
Metadata
withVarNameVector
VarNameVector
is an attempt at replacingMetadata
while preserving some of the good ideas fromVarInfo
withMetadata
(and in the process, establish a bare-minimum of functionality required to implement a underlying storage forvarinfo
):VarNameVector
(should) implement (almost) all operations forAbstractDict
+ all non-linear-algebra operations for aAbstractVector{<:Real}
, to the point where a user will find it easy to useVarNameVector
(and thus aVarInfo
wrapping aVarNameVector
).AbstractDict
interface is implemented as if keys are of typeVarName
and values are of type corresponding tovarinfo[varname]
.AbstractVector
interface is implemented wrt. underlying "raw" storage, e.g. if we're working with a unconstrained representation, then the vector will be in unconstrained space (unlikevarinfo[varname]
which will be in constrained space).VarNameVector
uses a contiguous chunk of memory to store the values in a flattened manner, both in the type stable and unstable scenarios, which overall should lead to better performance in both scenarios.VarNameVector
reduces the number of fields + replaces some now-less-useful fields such asdist
with fields requiring less information such astransformss
(holding the transformations used to map from "unconstrained" to "constrained" representation).Updates
2023-11-13
Okay, so I've made some new additions:
VarNameVector
.push!
andupdate!
aVarNameVector
where:push!
means what it usually means, but the varname / key must be unique.update!
works with either existing keys or new keys.push!
&update!
The idea behind the implementation is as follows:
varname
,push!
(andupdate!
, which is equivalent in this scenario), is straight-forward: we just callpush!
andsetindex!
on all the necessary underlying containers, e.g.ranges
.varname
, things become a bit more complicated for a few fields, namelyranges
andvalues
(everthing else is just call tosetindex!
):values
already allocated tovarname
.values
tovarname
and then shift all thevalues
occuring after this part; doing this on everyupdate!
will be expensive! Instead, we just move the new value to the end ofvalues
and mark the old location as "inactive". This leads to much more efficientupdate!
, and then we can just perform a "sweep" to re-contiguify the underlying storage every now and then.To make things a bit more concrete, consider the following example:
Then we update the entry for
@varname(x)
to a differently sized value:Notice that the order is still preserved, even though the underlying
ranges
is no longer ordered.But, if we inspect the underlying values, this contains now-inactive entries:
But in the scenario where we care about performance, we can easily fix this:
Type-stable sampling for dynamic suppport
The result of this is that we can even perform type-stable sampling for models with changing support:
The really nice thing here is that, unlike with
TypedVarInfo
, we don't need to mess around with boolean flags to indicate whether something should be resampled, etc. Instead we just callsimilar
on theVarNameVector
andpush!
onto this.We can also make this work nicely with
TypedVarInfo
:)2024-01-26T15:00:38
Okay, so now all tests should be passing and we should have, effectively, feature parity with
VarInfo
usingMetadata
.But this required quite a lot of code, and there are a few annoyances that are worth pointing out / discussing:
Transformations are (still) a pain
My original idea was that we really did not want to attach the entire
Distribution
from which the random variable came from to the metadata for a couple of reasons:VarInfo
fromModelA
inModelB
, even though they only differ by the RHS of a single~
statement).Distribution
to determine the transformation from the vectorized / flattened representation to the original one we want and linking.In fact, the above is only partially true:
getindex
inside amodel
evaluation actually usesgetindex(vi, vn, dist)
and uses the "tilde-local"dist
for the reshaping and linking, not thedist
present invi
.Hence it seemed to me that one immediate improvement of
Metadata
is to remove thedist
completely, and instead just store the transformation from the vectorized / flattened representation to whatever desired form we want (be that linked or invlinked).And this is what we do currently with
VarNameVector
.This then "resolves" (1) since now all we need is a map
f
that takes a vector and outputs something we can work with.However, (2) is still an issue: we still need the "tilde-local"
dist
to determine the necessary transformation for the particular realization we're interested in, while simultaenously wantinggetindex(vi, vn)
to also function as intended outside of amodel
, i.e. I should still be able to doSooo I don't think we can get around having to keep transformations in the metadata object that might not actually get used within a
model
evaluation if we want allow the indexing behavior above 😕2024-01-31: Update on transformations
See #575