Proposal: Separate package for abstract GFI #306

femtomc · 2020-09-09T09:33:00Z

This is part of a larger conversation on the modularity of the Gen project. I’m not in a position to judge whether this is possible, but it seems like a good idea based on my limited implementation experience.

I’m currently working on porting my implementation of the GFI as interception execution contexts into a DSL version for use with Gen inference. The goal here is to see if some of the experiments I’m performing on optimizing dynamic programs can immediately benefit the Gen user base.

In this case, I would like to target as a dependency just the abstract GFI - I don’t necessarily require the whole infrastructure (e.g. the existing modeling DSLs, the inference library, etc).

As a first step, it seems (barring large scale organizational decisions which I am not aware of) plausible to create an abstract base package which DSL implementations can depend on.

The full idea would be to separate out all of the modeling and inference DSL components into separate packages. E.g. there would be the abstract base package, then an implementation package for the dynamic DSL, one for the static DSL, one for vector-shaped combinators (which I’m counting as a DSL, although that is not strictly accurate), and one for inference algorithms (this work is already partially ongoing with the particle filtering package).

Choice maps and traces obviously live with their respective DSLs.

This would immediately solve my dependency problem, because my DSL could then target the abstract base package and (if modularity is taken to the extreme) use native Gen inference DSLs and algorithms on an as-needed basis.

Tagging @marcoct (although I know you’re busy) and @georgematheos.

alex-lew · 2020-09-09T10:10:17Z

Thanks for opening this issue, McCoy!

I am in favor of this direction. (A couple small points are that the choicemap and trace interfaces are part of the abstract GFI, and that we have a non-"vector-shaped" combinator, recurse.)

I can imagine creating a repo that contains the abstract GFI, as well as new documentation designed for implementors of the GFI. This might include a README.md that gives people a high-level overview of when they might want to implement the GFI, and what they could expect to gain. The gist would be: if you are writing a package that exposes structured probabilistic or differentiable computations (neural networks, probabilistic programs, Bayes nets, stochastic differential equations, etc.), you might consider implementing the Generative Function Interface. If you do: (1) all Gen-compatible inference algorithms will just work with the models your package provides, including user-written inference algorithms with custom proposals or variational families; and (2) any Gen dynamic/static DSL programs can call your models as subroutines. (It could be worth unpacking what that means.)
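
To illustrate point (1), here is a minimal, self-contained sketch of inference code written only against an abstract interface. It does not use Gen itself: the names mirror Gen's, but the simplified `generate` signature (returning a choice dictionary and a log weight), the toy `UnitNormal` model, and `log_ml_estimate` are assumptions made for illustration.

```julia
# Hypothetical miniature of the interface; not Gen's actual code.
abstract type GenerativeFunction end

# Method every implementor must provide (simplified signature, an assumption):
#   generate(gen_fn, args::Tuple, constraints::Dict) -> (choices::Dict, log_weight)
function generate end

# Toy implementor: x ~ Normal(mu, 1), recorded at address :x.
struct UnitNormal <: GenerativeFunction end

normal_logpdf(x, mu) = -0.5 * (x - mu)^2 - 0.5 * log(2pi)

function generate(::UnitNormal, args::Tuple, constraints::Dict)
    mu = args[1]
    if haskey(constraints, :x)
        x = constraints[:x]
        # Constrained choice: the weight is the log density of the constraint.
        return Dict(:x => x), normal_logpdf(x, mu)
    else
        # Unconstrained choice: sample from the prior, log weight 0.
        return Dict(:x => mu + randn()), 0.0
    end
end

# Generic inference code: an importance-sampling estimate of the log marginal
# likelihood. It only calls `generate`, so it works for any implementor.
function log_ml_estimate(gen_fn::GenerativeFunction, args::Tuple,
                         observations::Dict, n::Int)
    log_weights = [generate(gen_fn, args, observations)[2] for _ in 1:n]
    m = maximum(log_weights)
    return m + log(sum(exp.(log_weights .- m)) / n)
end
```

`log_ml_estimate` never mentions `UnitNormal`, so a package that supplies its own `GenerativeFunction` subtype and `generate` method could reuse it unchanged.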

There would still presumably be a "Gen.jl" package that bundles the core Gen packages together for an easy onboarding experience for people just looking to write a probabilistic program / get their feet wet.

femtomc · 2020-09-09T12:20:34Z

Absolutely - and thanks for keeping track of those bits I lost. The Selection interface is also part of the GFI, right?

So concretely, the full abstract GFI lives in:

choice_map.jl
address.jl
gen_fn_interface.jl

alex-lew · 2020-09-09T21:46:15Z

Also optimization.jl and arguably diff.jl :-)

femtomc · 2020-09-09T23:27:22Z

Ah yes! I actually had a different proposal for diff.jl - instead of inclusion in the base package, I'm hoping that we might develop a separate diff system which codifies a number of "compile time" propagation rules (for e.g. Mjolnir) as well as a robust runtime system (which is currently in diff.jl).

When writing your own GFI - do you need to extend any diff interfaces?

marcoct · 2020-09-10T16:50:58Z

@femtomc It's so awesome to hear you are porting your work to inter-operate with Gen!

I've had a few conversations with others about breaking up Gen into packages and using different packages for each modeling language. Thanks for bringing this up. I agree that by breaking Gen.jl up into a few packages we will take advantage of the potential for modular development of modeling languages offered by the GFI. Here's a plan I think is reasonable:

New packages

GenCore.jl - contains abstract types for generative functions and primitive distributions, documentation of the GFI, and default implementations of some GFI methods in terms of other GFI methods.

(technically other modeling languages do not need to use Gen's primitive distributions, so the interfaces for distributions could conceivably be split up from the GFI, but I think keeping an interface for distributions together with the GFI makes sense to provision for potential future merge of these interfaces as proposed in #259, and because they are similar in their level of abstraction and purpose)
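
As a rough illustration of "default implementations of some GFI methods in terms of other GFI methods", the sketch below derives `propose` from `simulate` and the trace accessors. It is a self-contained miniature with a hypothetical toy `Coin` model, and it is not a copy of Gen's source or of any existing GenCore.jl.

```julia
# Hypothetical miniature of a base package; names mirror Gen's GFI.
abstract type GenerativeFunction end
abstract type Trace end

# Methods every implementor must provide (no defaults):
function simulate end     # simulate(gen_fn, args::Tuple) -> Trace
function get_choices end  # get_choices(trace) -> choices
function get_score end    # get_score(trace) -> log probability of the choices
function get_retval end   # get_retval(trace) -> return value

# Default method, written only in terms of the required ones:
# sample a trace, then report its choices, its score as the weight,
# and its return value.
function propose(gen_fn::GenerativeFunction, args::Tuple)
    trace = simulate(gen_fn, args)
    return get_choices(trace), get_score(trace), get_retval(trace)
end

# Toy implementor (an assumption for illustration): a fair coin at address :flip.
struct CoinTrace <: Trace
    flip::Bool
end
struct Coin <: GenerativeFunction end

simulate(::Coin, args::Tuple) = CoinTrace(rand(Bool))
get_choices(t::CoinTrace) = Dict(:flip => t.flip)
get_score(t::CoinTrace) = log(0.5)
get_retval(t::CoinTrace) = t.flip
```

With this layout, `propose(Coin(), ())` works even though `Coin` never defines `propose`; an implementor could still override the default with a more efficient method.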

GenDML.jl - dynamic modeling language

GenSML.jl - I think this should probably contain for starters both the SML and the existing control-flow combinators (map, unfold, recurse), since they are currently mostly useful when used together. It would also include choice_at and call_at combinators.

(There is currently some code shared between DML and SML e.g. builtin_optimization.jl; either the shared code could be moved to another package, or the code could just be split and duplicated to allow for more flexible independent development -- the two languages are generally very different in their implementation, don't actually share that much code, and should be allowed to develop independently).

@ztangent @georgematheos thoughts re GenSML.jl and GenDML.jl?

GenCoreInference.jl - This could start as the existing inference library, and may be factored more moving forward. It would also contain the new trace transform DSL.

GenDistributions.jl - Primitive probability distributions used by GenDML.jl and GenSML.jl and GenCoreInference.jl.

(The @dist DSL could become its own package or be combined with GenDistributions.jl. I favor keeping it as part of GenDistributions.jl, since it is general-purpose, useful for all projects, and extensible. @alex-lew opinions?)

Gen.jl - @alex-lew I agree that Gen.jl should combine GenCore.jl, GenDML.jl, GenSML.jl, GenCoreInference.jl, GenDistributions.jl. I think that moving forward it can potentially contain other components that are maintained, general-purpose, and don't introduce complex dependencies like Python.

Re optimization.jl

I think this probably belongs in GenCoreInference.jl

Re diff.jl:

I think this will require some discussion and possibly iteration to get right. The Gen GFI does not prescribe any specific types for argdiff and retdiff values (which I call change hints in my thesis). However, for generative functions to be able to make use of one another's change hints, they need to use common types for these values. So I think we need a set of common generic change hint types including UnknownChange, NoChange, and change hint types for common collection types (e.g. VectorDiff, SetDiff, DictDiff). The semantics of these change hint types is separate from differencing of Julia code (transforming Julia code to compute change hint alongside the new value). Generative functions that are not implemented using Julia-based modeling languages should be able to consume and produce these types of change hints. Also, @femtomc as you pointed out, even for modeling languages based on executable Julia code, there are multiple implementation strategies.

So, I think it makes sense to (i) maintain a set of common change hint types as part of GenCore.jl, and to (ii) factor out the code that computes values of these types for Julia code (dynamically or statically) into separate package(s).
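
A small sketch of what shared change hint types might look like, and of how a consumer could use them without knowing how they were computed. The type names come from the discussion above, but the field layout of `VectorDiff` and the `indices_to_revisit` helper are simplified assumptions, not Gen's definitions.

```julia
# Hypothetical, simplified change hint types.
abstract type ChangeHint end
struct UnknownChange <: ChangeHint end   # anything may have changed
struct NoChange <: ChangeHint end        # the value is known to be unchanged
struct VectorDiff <: ChangeHint          # element-wise description of a vector change
    new_length::Int
    prev_length::Int
    updated::Set{Int}   # retained indices whose elements may have changed
end

# A consumer (e.g. a map-like combinator) decides which elements to revisit.
indices_to_revisit(::NoChange, new_length::Int) = Int[]
indices_to_revisit(::UnknownChange, new_length::Int) = collect(1:new_length)
function indices_to_revisit(d::VectorDiff, new_length::Int)
    kept = [i for i in d.updated if i <= d.new_length]  # drop indices that were removed
    added = collect((d.prev_length + 1):d.new_length)   # empty if the vector shrank
    return sort!(unique!(vcat(kept, added)))
end
```

For example, `indices_to_revisit(VectorDiff(5, 3, Set([2])), 5)` selects index 2 plus the newly added indices 4 and 5. The producer of the hint could be anything, Julia-based or not, as long as it emits these shared types.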

@femtomc I think that developing a more robust and complete package for computing change hints from Julia code is a nicely separable piece that could leverage a lot of the Julia-specific features you've been using and can live in a separate package. If we can develop a more robust diff system for Julia code than currently exists in diff.jl then I think it would be great for GenSML.jl and GenDML.jl (and Gen.jl) to depend on it, and the logic in diff.jl for computing change hints could be moved there or replaced.

I know that @georgematheos has worked with diff values as well. Curious to hear his thoughts here as well.

Documentation

One thing I discussed with @ztangent was how to best structure the documentation for Gen as it gets more split up across multiple repositories. I like the way Gen.jl has a single docs site, because it is easy to jump around and see the big picture using the left-hand-side menu, and you can use @ref links. I think we might want to maintain a single docs site/build for the content that is currently inside Gen.jl even if this content is split up into separate packages, but I'm open to suggestions.

femtomc · 2020-09-11T16:30:14Z

@marcoct Yes, I'd like to share the diff work I've been doing. I've been working with Mjolnir.jl to perform a diff dataflow/type inference style analysis at generated function expansion time.

The primitives I've defined there are definitely relevant to the diff system - but they are defined on an abstract lattice (just Change and NoChange) because that's what is available at that stage.
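
For readers unfamiliar with the idea, here is a toy sketch of propagating a two-point Change/NoChange lattice through a straight-line program. It is only an illustration of the general technique: it does not use Mjolnir.jl, and `DiffTag`, `join_tags`, and `propagate` are hypothetical names.

```julia
# Two-point abstract lattice: NoChangeTag is the bottom, ChangeTag is the top.
@enum DiffTag NoChangeTag ChangeTag

# Join: an output may change if any of its inputs may change.
# With no inputs (a constant), the result is NoChangeTag.
join_tags(tags::DiffTag...) = any(==(ChangeTag), tags) ? ChangeTag : NoChangeTag

# A straight-line program is a list of `output => [inputs...]` statements.
# Starting from tags on the arguments, compute a tag for every variable.
function propagate(statements, arg_tags::Dict{Symbol,DiffTag})
    tags = copy(arg_tags)
    for (out, inputs) in statements
        tags[out] = join_tags((tags[i] for i in inputs)...)
    end
    return tags
end

# Example program: a = f(x); b = g(y); c = h(a, b)
program = [:a => [:x], :b => [:y], :c => [:a, :b]]
tags = propagate(program, Dict(:x => ChangeTag, :y => NoChangeTag))
```

Here `tags[:b]` is `NoChangeTag`, so anything depending only on `b` could be skipped during an update, while `a` and `c` are tagged `ChangeTag` and would need to be revisited.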

Ideally, we could put all this stuff in a package and have runtime incremental computing and compile-time abstract interpretation for packages or languages that so desire it.

My only comment here is that this part of Julia is unstable right now. I don't know what's going to happen with Keno's compiler work - if that will enable an easier avenue to express these ideas. I know that generated functions are stable. But we might find that there's a much easier access point. Regardless, I've started this work now - hopefully it won't be too much of a lift to change as more stages become available.

femtomc · 2020-09-18T14:28:54Z

@marcoct Two updates on this front:

I've cut out a lot of "unnecessary" code on a branch of Jaynes - this slims it down to just implement a dynamic language. There is no native inference - it relies completely on Gen inference. Here: https://github.com/femtomc/Jaynes.jl/tree/microJaynes

The goal is to present the system as a generative function DSL without any extra baggage - just language, structures, and compile time optimizations.

On the "change hints" front (which, by the way, I recently learned is consistent with the literature! https://arxiv.org/abs/1503.07792) - I've started trying to construct a more general "incremental computing" package with some of the compiler optimizations I've been working on: https://github.com/femtomc/DiffRules.jl

My goal here would be to have a small re-usable piece which 1. can be extended by the user with new rules for incremental computing and 2. combines "compile time" abstract rules with runtime diff propagation. Additionally, I'd like to explore the implementation of https://arxiv.org/abs/1503.07792 here as part of what is offered for use. Advanced users can rely on the core infrastructure to do whatever they want, but you might just use this package for incremental computing in general functions (unrelated to PP).

yebai mentioned this issue Oct 18, 2020

Notes from the September probprog community call TuringLang/Turing.jl#1410

Closed
