Implementation of DCC inference algorithm #1715

treigerm · 2024-01-08T08:38:16Z

This is an initial bare-bones implementation of the Divide, Conquer, Combine (DCC) inference algorithm for programs with stochastic support, as discussed in #1697. I have also included a simple example to show how to use the interface. This is only a very basic implementation of the algorithm, to keep the size of the PR reasonable. In its current form the algorithm assumes that branching inside the program depends only on the outcomes of discrete sampling sites that are annotated with infer={"branching": True}.

The algorithm then proceeds as follows:

1. Discover different branches, a.k.a. straight-line programs (SLPs), by sampling from the prior.
2. Run inference on each discovered branch separately.
3. Combine the inference results by weighting each branch in proportion to its marginal likelihood estimate.
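
The three steps can be illustrated with a small plain-Python toy. This is only a sketch under stated assumptions: the toy program, the function names (`toy_prior_sample`, `discover_slps`, `combine_weights`) and the log marginal likelihood values are all hypothetical and are not taken from this PR or from the NumPyro API. Step 2 (local inference per branch) is replaced by hard-coded estimates.

```python
import math
import random


def toy_prior_sample(rng):
    """Sample the discrete branching choices of a hypothetical toy program."""
    first = rng.random() < 0.5
    if first:
        return ("first=1",)
    # This site only exists when `first` is 0, so the set of sampled
    # sites depends on the branch taken (stochastic support).
    second = rng.random() < 0.5
    return ("first=0", f"second={int(second)}")


def discover_slps(num_prior_samples, seed=0):
    """Step 1: collect the distinct branches (SLPs) seen in prior samples."""
    rng = random.Random(seed)
    return sorted({toy_prior_sample(rng) for _ in range(num_prior_samples)})


def combine_weights(log_evidence):
    """Step 3: weights proportional to exp(log evidence), normalized stably."""
    max_log = max(log_evidence.values())
    unnormalized = {slp: math.exp(v - max_log) for slp, v in log_evidence.items()}
    total = sum(unnormalized.values())
    return {slp: w / total for slp, w in unnormalized.items()}


slps = discover_slps(num_prior_samples=200)

# Step 2 stand-in: made-up log marginal likelihood estimates, one per branch.
made_up_log_evidence = dict(zip(slps, [-3.0, -1.0, -2.0]))
weights = combine_weights(made_up_log_evidence)

for slp in slps:
    print(slp, round(weights[slp], 3))
```

Subtracting the maximum before exponentiating keeps the normalization stable when the log marginal likelihood estimates are very negative.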

This is just a draft PR for now to see whether the general approach for the implementation makes sense. If the approach is sensible then I will go and add a more detailed example, some tests and documentation. So the main questions that need to be answered at the moment are:

Does the general interface for the algorithm seem sensible?
Did I place the new implementation in the correct location (I located it in the contrib folder for now)?
Am I using the NumPyro primitives correctly?

fehiepsi · 2024-01-12T16:28:54Z

Thanks for the contribution, @treigerm! The draft looks great.

Re location: yeah, it makes sense to put the algorithm in contrib. We can move it to infer in the future when the api is solid.
Re primitives: yes, the usage looks correct to me.

treigerm · 2024-01-17T10:03:55Z

I have now added tests and documentation. Please let me know if you think anything is missing!

After this PR is merged my plan would be to add the SDVI algorithm and after that is added I can write a tutorial about how to use these two inference algorithms and their respective trade-offs.

fehiepsi · 2024-01-26T15:35:25Z

numpyro/contrib/stochastic_support/dcc.py

+        """
+        Weight each SLP proportional to its estimated normalization constant.
+        The normalization constants are estimated using importance sampling with
+        the proposal centered on the MCMC samples.


Is this standard Gaussian a good practical choice for the proposal? Looking at the paper, it seems that the authors used a metropolis-within-gibbs sampler.

treigerm · 2024-01-29

Thank you for spending the time reviewing the PR!

Note that the Gaussian is centered around the MCMC samples (more precisely, each MCMC sample gives rise to a single proposal distribution). As long as the MCMC chain(s) are well-mixed, this generally leads to good proposals. This is also what the paper describes, and what the authors' implementation does (which I received upon request).
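
To make the idea of one proposal per MCMC sample concrete, here is a minimal plain-Python sketch. It is not the code from this PR: the toy target, the helper names and the choice to weight each draw against its own proposal density are assumptions made for illustration, and exact draws from the target stand in for MCMC output. The proposal scale is set wider than the target so that the toy weights stay well behaved.

```python
import math
import random


def normal_logpdf(x, loc, scale):
    """Log density of a univariate Normal(loc, scale)."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def log_unnormalized_density(x):
    """Toy target: 3 * N(x; 0, 1), so the true normalization constant is 3."""
    return math.log(3.0) + normal_logpdf(x, 0.0, 1.0)


def estimate_log_z(samples, proposal_scale, rng):
    """Importance sampling with one Gaussian proposal centered on each sample."""
    log_weights = []
    for center in samples:
        proposed = rng.gauss(center, proposal_scale)
        log_weights.append(
            log_unnormalized_density(proposed)
            - normal_logpdf(proposed, center, proposal_scale)
        )
    # Log of the mean importance weight, computed stably.
    max_lw = max(log_weights)
    mean_weight = sum(math.exp(lw - max_lw) for lw in log_weights) / len(log_weights)
    return max_lw + math.log(mean_weight)


rng = random.Random(0)
# Stand-in for MCMC output: exact draws from the normalized target N(0, 1).
stand_in_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
z_hat = math.exp(estimate_log_z(stand_in_samples, proposal_scale=2.0, rng=rng))
print(z_hat)
```

Each weight is an unbiased estimate of the normalization constant for a fixed proposal center, so averaging over the centers gives an estimate that should be close to 3 in this toy.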

> Looking at the paper, it seems that the authors used a metropolis-within-gibbs sampler.

Actually, the Metropolis-within-Gibbs sampler is only used for the local inference tasks. For many models it isn't a very efficient inference algorithm because it only updates one variable at a time. Because the implementation here assumes that branching depends only on the outcomes of discrete sampling statements, it can use more efficient algorithms for local inference (like HMC or NUTS).

fehiepsi · 2024-01-30

I see, thanks. How about using the sample variance instead of unit variance?

treigerm · 2024-02-01

I have just updated the PR to make the scale in the proposal a parameter that can be set by the user. I agree that unit variance is probably not always desirable. However, I'm not sure whether sample variance is desirable either. The main idea behind the algorithm for estimating the normalization constant (which is described in more detail in another paper, Layered Adaptive Importance Sampling) is that placing a proposal on top of each sample leads to fairly local proposals. If there are multiple modes in the posterior, then using the sample variance could result in lots of proposed samples in the low-density regions between the modes.

I'd still be open to adding the option to use the sample variance to the implementation, but this is currently complicated by the way the AutoNormal guide is implemented. You would want to compute the sample variance for each individual variable in the program, but as far as I can tell there is no way to set variable-specific variances in the AutoNormal guide (it's possible to set variable-specific means though). So this might be a feature that would be reasonable to add at a later time?
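
The concern about multimodal posteriors can be checked numerically with a toy example. The sketch below is an illustration only: the bimodal samples, the mode locations and the helper `fraction_between_modes` are hypothetical and not part of this PR. It compares how many per-sample Gaussian proposals land between the modes when the proposal scale is the overall sample standard deviation versus a small local scale.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical bimodal "posterior" samples: two narrow modes at -5 and +5.
mode_scale = 0.5
samples = [rng.gauss(rng.choice([-5.0, 5.0]), mode_scale) for _ in range(10000)]


def fraction_between_modes(samples, proposal_scale, rng, half_width=2.5):
    """Fraction of proposals (one per sample) that land in |x| < half_width."""
    hits = 0
    for center in samples:
        if abs(rng.gauss(center, proposal_scale)) < half_width:
            hits += 1
    return hits / len(samples)


# The overall standard deviation is dominated by the distance between the modes.
sample_std = statistics.pstdev(samples)
fraction_sample_std = fraction_between_modes(samples, sample_std, rng)
fraction_local = fraction_between_modes(samples, mode_scale, rng)

print(sample_std, fraction_sample_std, fraction_local)
```

In this toy, the proposals scaled by the sample standard deviation put a sizeable share of their draws into the region between the modes, while the local proposals almost never do.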

fehiepsi · 2024-02-15

Sorry for the late response! Thank you for the insights. I don't have a strong opinion on whether it's helpful to expose init_scale in AutoNormal. There is another way to substitute the sample variance, like substitute(guide, data={"auto_foo_scale": ...}), but we need to be careful about the domain of such a foo variable (it needs to be unconstrained).

treigerm · 2024-02-16

Ah I see, I guess this would require knowledge about all the variable names in the program though (but that could be extracted automatically)? For now I would lean towards leaving the implementation as is to keep it simple, if that is okay.

fehiepsi

LGTM pending linting issues. Thanks for the great contribution, @treigerm!

treigerm · 2024-02-16T20:30:53Z

Thanks, @fehiepsi! I'm currently unsure why the tests are failing. It seems to be an import error: the new numpyro.contrib.stochastic_support.dcc module cannot be found. Locally, the tests pass for me if I run XLA_FLAGS="--xla_force_host_platform_device_count=2" pytest -vs test/contrib/stochastic_support/test_dcc.py, so I'm not quite sure where the issue is coming from.

fehiepsi · 2024-02-16T20:41:36Z

How about adding an __init__ file and exposing DCC etc. there? Then you can import:

from numpyro.contrib.stochastic_support import DCC

treigerm · 2024-02-19T20:41:35Z

Ah yes of course! I have added an __init__ file now. I also added the XLA_FLAGS="--xla_force_host_platform_device_count=2" flag to the contrib module tests to make the parallel chain sampling method pass. Tests are passing locally for me.

fehiepsi · 2024-02-20T14:14:29Z

.github/workflows/ci.yml

@@ -102,7 +102,7 @@ jobs:
      run: |
        pytest -vs --durations=20 test/infer/test_mcmc.py
        pytest -vs --durations=20 test/infer --ignore=test/infer/test_mcmc.py
-        pytest -vs --durations=20 test/contrib
+        XLA_FLAGS="--xla_force_host_platform_device_count=2" pytest -vs --durations=20 test/contrib


Could you ignore the DCC test here and move it to the "test chains" group below instead? Thanks.

* Initial bare bones implementation of DCC
* Add tests and documentation
* Make scale in Normal proposal configurable
* Run linter
* Add __init__.py file and allow parallel inference in tests
* Move DCC tests to 'test chains' group

Initial bare bones implementation of DCC

2ceb88d

treigerm marked this pull request as draft January 8, 2024 08:38

Add tests and documentation

0450340

treigerm marked this pull request as ready for review January 17, 2024 10:01

treigerm changed the title ~~[WIP] Implementation of DCC inference algorithm~~ Implementation of DCC inference algorithm Jan 17, 2024

fehiepsi reviewed Jan 26, 2024

Make scale in Normal proposal configurable

d1a435b

fehiepsi approved these changes Feb 15, 2024

treigerm added 2 commits February 16, 2024 19:45

Merge branch 'master' into initial_dcc_implementation

9a57c4a

Run linter

e60215f

fehiepsi added the awaiting review label Feb 16, 2024

fehiepsi added this to the 0.14 milestone Feb 16, 2024

Add __init__.py file and allow parallel inference in tests

b09a7f7

fehiepsi approved these changes Feb 20, 2024

Move DCC tests to 'test chains' group

00068d0

fehiepsi merged commit c4ca3d8 into pyro-ppl:master Feb 22, 2024
4 checks passed

treigerm mentioned this pull request Mar 14, 2024

Add Initial SDVI Implementation #1758

Merged


treigerm commented Jan 8, 2024

fehiepsi commented Jan 12, 2024

treigerm commented Jan 17, 2024

fehiepsi Jan 26, 2024

treigerm Jan 29, 2024

fehiepsi Jan 30, 2024

treigerm Feb 1, 2024

fehiepsi Feb 15, 2024

treigerm Feb 16, 2024

fehiepsi left a comment

treigerm commented Feb 16, 2024

fehiepsi commented Feb 16, 2024

treigerm commented Feb 19, 2024

fehiepsi Feb 20, 2024

treigerm Feb 21, 2024
