Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For VarInfo, fix merge and allow push!!ing new Symbols #690

Merged
merged 3 commits into from
Oct 17, 2024

Conversation

mhauru
Copy link
Member

@mhauru mhauru commented Oct 14, 2024

This introduces two unrelated changes to VarInfo, that came up in the context of TuringLang/Turing.jl#2328:

  1. Fix how gidset is handled in merge
  2. Allow pushing new values to TypedVarInfo that introduce a new symbol.

The first I think should be uncontroversial. The second is saying that if we have a TypedVarInfo that for instance so far only has vi.metadata = (:x => some_metadata), you should still be able to do push!!(vi, @varname(y), val, dist, gidset), and it should introduce a new entry in the NamedTuple. This is different from how TypedVarInfo is usually created, and might in some cases create Metadata objects with quite narrow element types. However, something like this would be needed for cases where new variables may be appear between samples (see the aforementioned PR), and I don't really see a downside to allowing this.

@coveralls
Copy link

Pull Request Test Coverage Report for Build 11331660892

Details

  • 16 of 16 (100.0%) changed or added relevant lines in 1 file are covered.
  • 25 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-2.0%) to 77.424%

Files with Coverage Reduction New Missed Lines %
src/model.jl 1 93.68%
src/varinfo.jl 6 80.24%
src/simple_varinfo.jl 6 85.57%
src/threadsafe.jl 12 55.17%
Totals Coverage Status
Change from base Build 11327761860: -2.0%
Covered Lines: 3258
Relevant Lines: 4208

💛 - Coveralls

Copy link

codecov bot commented Oct 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.12%. Comparing base (1d10278) to head (78f12c5).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #690      +/-   ##
==========================================
+ Coverage   79.02%   79.12%   +0.10%     
==========================================
  Files          30       30              
  Lines        4200     4211      +11     
==========================================
+ Hits         3319     3332      +13     
+ Misses        881      879       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@coveralls
Copy link

coveralls commented Oct 14, 2024

Pull Request Test Coverage Report for Build 11347620918

Details

  • 16 of 16 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 79.542%

Totals Coverage Status
Change from base Build 11327761860: 0.1%
Covered Lines: 3332
Relevant Lines: 4189

💛 - Coveralls

@sunxd3
Copy link
Collaborator

sunxd3 commented Oct 15, 2024

This looks very sensible to me, but I am not familiar with gids, couldn't really give a good review.

Maybe @torfjelde can also have a quick look?

@mhauru
Copy link
Member Author

mhauru commented Oct 15, 2024

I was thinking of this after work, and should have clarified that this still fails:

        untyped_vi = VarInfo()
        untyped_vi = push!!(untyped_vi, @varname(x), 1.0, Normal(), Selector())
        typed_vi = TypedVarInfo(untyped_vi)
        typed_vi = push!!(typed_vi, @varname(y[1]), 2.0, Normal(), Selector())
        typed_vi = push!!(typed_vi, @varname(y[2]), 2.0, MvNormal(0, 1), Selector())

with

ERROR: MethodError: Cannot `convert` an object of type MvNormal{Int64, PDMats.ScalMat{Int64}, FillArrays.Zeros{Int64, 1, Tuple{Base.OneTo{Int64}}}} to an object of type Normal{Float64}

This is what I meant by having very narrow eltypes. I have some ideas (and code in place in VarNamedVector) for how to support the above too, but I think that's a separate issue that needs a different solution, since we do want to keep the eltypes of a TypedVarInfo concrete whenever possible. Also, this is much less important to support.

Copy link
Member

@penelopeysm penelopeysm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like there's a lot of code repetition which could potentially be cleaned up, too.

Would something like this be better? Also unsure about conventions around f! and f!!, return types and mutation. There's probably also a better name for the inner function.

function merge_metadata(metadata_left::Metadata, metadata_right::Metadata)
    # ... (not touching these lines)

    # Range offset.
    offset = 0

    # Initialize metadata struct
    merged_metadata = Metadata(idcs, vns, ranges, vals, dists, gids, orders, flags)

    function update_metadata_from!(dest, source, vn, offset)
        vals_right = getindex_internal(source, vn)
        append!(dest.vals, vals_right)
        r = (offset + 1):(offset + length(vals_right))
        push!(dest.ranges, r)
        new_offset = r[end]
        dist_right = getdist(source, vn)
        push!(dest.dists, dist_right)
        gid = source.gids[getidx(source, vn)]
        push!(dest.gids, gid)
        push!(dest.orders, getorder(source, vn))
        for k in keys(flags)
            push!(dest.flags[k], is_flagged(source, vn, k))
        end
        return new_offset
    end

    for (idx, vn) in enumerate(vns_both)
        # `idcs`
        idcs[vn] = idx
        # `vns`
        push!(vns, vn)
        if vn in vns_left && vn in vns_right
            # `vals`: only valid if they're the same length.
            vals_left = getindex_internal(metadata_left, vn)
            vals_right = getindex_internal(metadata_right, vn)
            @assert length(vals_left) == length(vals_right)
            offset = update_metadata_from!(merged_metadata, metadata_right, vn, offset)
        elseif vn in vns_left
            offset = update_metadata_from!(merged_metadata, metadata_left, vn, offset)
        else
            offset = update_metadata_from!(merged_metadata, metadata_right, vn, offset)
        end
    end

    return merged_metadata
end

I ran the test/varinfo.jl tests on this and they passed, although I don't know what other downstream breakage this could cause.

@penelopeysm
Copy link
Member

Definitely a matter for another PR, but I also feel a bit uncomfortable having "del" and "trans" everywhere: I would feel better if flags was defined as

struct MetadataFlags
    del::BitVector
    trans::BitVector
end

@torfjelde
Copy link
Member

This is different from how TypedVarInfo is usually created, and might in some cases create Metadata objects with quite narrow element types. However, something like this would be needed for cases where new variables may be appear between samples (see the aforementioned PR), and I don't really see a downside to allowing this.

I guess I'm somewhat surprised this is needed to achieve feature parity with the current way we're doing things 😕 As in, the issue we've encountered in the mentioned PR doesn't require push to allow adding new named tuple entries (when using TypedVarInfo).

@torfjelde
Copy link
Member

Seems like there's a lot of code repetition which could potentially be cleaned up, too.

Very much like this @penelopeysm ! But maybe easiest to just do a seperate PR to keep things + commits nice and simple.

Copy link
Member

@torfjelde torfjelde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Don't have any objections except for a question regarding the push!! implementation. Don't have a strong opinion as to what's correct here, so approving.

Maybe worth bumping patch version?

end

sym = getsym(vn)
if vi isa TypedVarInfo && ~haskey(vi.metadata, sym)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess one downside of this ordeal is that it introduces a discrepancy between push!! and push!, the latter which cannot handle new named tuple entries.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm at peace with discrepancies in how ! and !! work, if the !! allows you to do something that can not be done with mutation, but is in line with the semantics of the ! version as well. Said differently, if the only reason ! doesn't do something is because it can not (since it must rely on mutation), then I think it's fine for !! to handle it.

@mhauru
Copy link
Member Author

mhauru commented Oct 15, 2024

Seems like there's a lot of code repetition which could potentially be cleaned up, too.

Would something like this be better? [...]

Yeah, I think that would be an improvement. Also agree with @torfjelde that might be worth a separate PR. Note also that all this stuff is on its way out once the Gibbs sampler stuff gets sorted (the whole Metadata type is due for removal). Not opposed to doing the refactor though, especially if its low effort or you already have it worked out.

Definitely a matter for another PR, but I also feel a bit uncomfortable having "del" and "trans" everywhere: I would feel better if flags was defined as

Agree on this too, though maybe not worth fixing since Metadata is on its way out anyway?

I guess I'm somewhat surprised this is needed to achieve feature parity with the current way we're doing things 😕 As in, the issue we've encountered in the mentioned PR doesn't require push to allow adding new named tuple entries (when using TypedVarInfo).

I'm not sure if it will be used by the final solution to the Gibbs PR, but I would used it in my current attempt at a fix, where it would allow a sub-VarInfo for one sampler to capture new variables that are introduced but that are not under the domain of that sampler. The sub-VarInfo would then pass the new variables back to the global VarInfo, and at the next iteration the correct sampler would pick them up.

I would usually avoid introducing new features to DynamicPPL when I'm still unsure if the upstream Turing.jl code will actually need them, but to me this sounds like generally an improvement to the semantics of VarInfo, even if it ends up going unused in Gibbs.

Maybe worth bumping patch version?

Good point, done.

@torfjelde
Copy link
Member

I'm not sure if it will be used by the final solution to the Gibbs PR, but I would used it in my current attempt at a fix, where it would allow a sub-VarInfo for one sampler to capture new variables that are introduced but that are not under the domain of that sampler. The sub-VarInfo would then pass the new variables back to the global VarInfo, and at the next iteration the correct sampler would pick them up.

I guess we spoke briefly about this yesterday, but I was more thinking that this wasn't explicitly used currently since the tests would be failiing in that case (as the current Gibbs sampler doesn't support this). But yeah, as we concluded yesterday, seems like a strict improvement to add this 👍

@mhauru mhauru enabled auto-merge October 17, 2024 08:06
@mhauru mhauru added this pull request to the merge queue Oct 17, 2024
Merged via the queue into master with commit 27ba772 Oct 17, 2024
14 checks passed
@mhauru mhauru deleted the mhauru/gid-merge-fix branch October 17, 2024 08:38
mhauru added a commit that referenced this pull request Oct 17, 2024
* Fix treatment of gid in merge(::Metadata)

* Allowing pushing new symbols to TypedVarInfo

* Bump patch version to 0.30.1
@mhauru mhauru mentioned this pull request Oct 17, 2024
penelopeysm pushed a commit that referenced this pull request Oct 17, 2024
* Allow empty subsets of VarInfos (#692)

* Allow empty subsets of VarInfos

* Run JuliaFormatter

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* For VarInfo, fix merge and allow push!!ing new Symbols (#690)

* Fix treatment of gid in merge(::Metadata)

* Allowing pushing new symbols to TypedVarInfo

* Bump patch version to 0.30.1

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants