This repository has been archived by the owner on Dec 7, 2018. It is now read-only.

Rollup: New sampler API -> support for emcee #68

Merged
merged 47 commits into from
Aug 22, 2018

Conversation

cdcapano
Collaborator

@cdcapano cdcapano commented Jul 30, 2018

This introduces the new sampler API, and includes all changes such that emcee works with it. The other samplers are not supported yet, so this will break using them. A summary of the changes:

  • BaseSampler has been turned into an abstract base class, with only methods required of all samplers defined. All future samplers (including any nested samplers) should inherit from this.
  • There is now a BaseMCMC class that adds methods unique to all MCMC samplers. It fulfills some of the methods required by BaseSampler, but also introduces some abstract methods that all MCMC samplers need to implement.
  • Support for calculating ACFs/ACLs has been moved to a stand-alone class.
  • The individual samplers are constructed by inheriting from all of these classes, then adding whatever methods are unique to the sampler and any required abstract methods not yet fulfilled. For example, EmceeEnsembleSampler inherits from MCMCAutocorrSupport, BaseMCMC, BaseSampler (in that order).
  • Samplers are now initialized using a [sampler] section in the config file, instead of by options on the command line.
  • Previously, some functions expected FieldArrays, others dictionaries, and yet others structured arrays. Now, all arrays of samples are passed between sampler methods as dictionaries of arrays. Only when samples are read from a file using read_samples are the results wrapped with a FieldArray.
  • The IO has been changed quite a bit. Like the samplers, there is now an abstract base class called BaseInferenceFile that implements common methods for reading/writing for all samplers. Things specific to MCMCs have been moved to MCMCIO. Finally, there is an IO class for each sampler that inherits from these base classes. For example, for emcee there is EmceeFile which inherits from MCMCIO, BaseInferenceFile.
  • Instead of having read/write functions be methods of the sampler class, the sampler class now has a (required abstract property) io, which is the IO class that that sampler uses. For example, EmceeEnsembleSampler.io = EmceeFile. This is used for checkpointing, and is the type of file that gets written by gwin. In gwin.io there is a convenience function, loadfile, that will check what kind of file a file is, then load it with the appropriate IO class.
  • Eventually (not in this PR) there will be a PosteriorFile which inherits from BaseInferenceFile that will be used for storing 1D arrays of posterior samples. All IO classes will have a method that allows them to write their contents to such a file.
  • The burn-in module has also been changed. Now there is an MCMCBurnInTests class in which each of the burn-in tests is an instance method. It is initialized from the config file. An instance of it is saved to the sampler's burn_in attribute. This is called at each checkpoint by the BaseMCMC to test for burn-in.
  • Multiple burn-in methods may be combined with boolean logic. Basically, in the [sampler-burn_in] section you must provide a burn-in-test = <stuff> string. The <stuff> gets read by the class. For example, you could have just burn-in-test = max_posterior, in which case only the max_posterior test will be applied. Or you could have burn-in-test = max_posterior & nacl, in which case the sampler is not considered burned-in until both the max_posterior test and the nacl test are satisfied. If you had max_posterior | nacl, then the sampler is considered burned-in if either test is satisfied. More complicated chains are supported, such as (max_posterior | nacl) & ks_test, etc.
  • Metadata from the burn-in tests are saved to sampler_info/burn_in.
  • The BaseInferenceFile has thin_start, thin_interval, and thin_end attributes. These define what samples should be loaded by default from the file. If they are not specified, then they default to 0, 1, None, which loads all samples. This separates what samples should be loaded from the burn-in iteration and ACL, which are unique to MCMC samplers. (Granted, nested samplers will probably never use these attributes.) This will allow the posterior file to still have metadata about when the sampler burned in.
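
The inheritance pattern described in the list above can be sketched roughly as follows. This is only a minimal illustration, not the code in this PR: the class names are taken from the description, but the method names (`run`, `run_mcmc`, `compute_acl`), the `niterations` constructor argument, and the bodies are assumptions made up for the example.

```python
from abc import ABC, abstractmethod


class BaseSampler(ABC):
    """Methods required of every sampler (illustrative subset only)."""

    @property
    @abstractmethod
    def io(self):
        """The IO class this sampler uses for checkpoint/output files."""

    @abstractmethod
    def run(self):
        """Run the sampler (hypothetical method name)."""


class BaseMCMC(ABC):
    """Adds behaviour shared by MCMC samplers."""

    def run(self):
        # Fulfils BaseSampler.run by delegating to a sampler-specific hook.
        return self.run_mcmc(self.niterations)

    @abstractmethod
    def run_mcmc(self, niterations):
        """Advance the chains; each MCMC sampler must implement this."""


class MCMCAutocorrSupport:
    """Stand-alone ACF/ACL support (placeholder, nothing is computed)."""

    def compute_acl(self, chain):
        raise NotImplementedError("placeholder for the example")


class EmceeFile:
    """Stand-in for the emcee IO class."""


class EmceeEnsembleSampler(MCMCAutocorrSupport, BaseMCMC, BaseSampler):
    # A plain class attribute is enough to satisfy the abstract property.
    io = EmceeFile

    def __init__(self, niterations):
        self.niterations = niterations

    def run_mcmc(self, niterations):
        return "ran {} iterations".format(niterations)


sampler = EmceeEnsembleSampler(niterations=12)
result = sampler.run()
mro_names = [cls.__name__ for cls in EmceeEnsembleSampler.__mro__]
```

Because BaseMCMC comes before BaseSampler in the method resolution order, its concrete `run` is the one found first, so the abstract `run` on BaseSampler counts as fulfilled. A sampler that forgot to define `io` or `run_mcmc` would fail at instantiation with a TypeError.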

@cdcapano cdcapano requested a review from cmbiwer as a code owner July 30, 2018 17:16
@cdcapano cdcapano removed the request for review from cmbiwer July 30, 2018 17:16
@pep8speaks

pep8speaks commented Jul 30, 2018

Hello @cdcapano! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 04, 2018 at 19:11 Hours UTC

@cdcapano cdcapano added pr:backwards-incompatible Backwards-incompatible change (major version bump) gwin.sampler Change to gwin.sampler sub-package work in progress labels Jul 30, 2018
@cdcapano cdcapano added this to the 0.1.0 milestone Jul 31, 2018
@cdcapano cdcapano requested a review from vivienr July 31, 2018 09:02
@cdcapano
Collaborator Author

cdcapano commented Aug 4, 2018

I think this is ready to be merged onto the new_sampler_api project branch. The diff is quite large, so you're probably not going to be able to check the code. I guess just check if the direction things are going in sounds good?

I tested this with emcee on the normal2D analytic model, gw150914, and an injection. I haven't updated the plotting code yet, so I haven't checked the distributions. I've just checked that the executable runs ok and produces an output file that looks ok.

For the analytic test, running for a set number of iterations, the ini file looks like:

[model]
name = test_normal

[sampler]
name = emcee
nwalkers = 5000
niterations = 12
checkpoint-interval = 4

[variable_params]
x =
y =

[prior-x]
name = uniform
min-x = -10
max-x = 10

[prior-y]
name = uniform
min-y = -10
max-y = 10

To run until a desired number of effective samples is achieved, the ini file looks like:

[sampler]
name = emcee
nwalkers = 5000
effective-nsamples = 5000
checkpoint-interval = 16

[sampler-burn_in]
burn-in-test = max_posterior & posterior_step
ndim = 2

(prior and model sections are the same). Both of these run with:

gwin --verbose --config-files normal2d.ini --output-file normal2d.hdf --nprocesses 4
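
For reference, the boolean logic in a burn-in-test string such as max_posterior & posterior_step could be evaluated along these lines. This is a sketch under assumptions, not the MCMCBurnInTests implementation: `evaluate_burn_in` is a hypothetical function, the expression is parsed with Python's `ast` module, and the per-test results are supplied as a plain dictionary of booleans.

```python
import ast


def evaluate_burn_in(expression, results):
    """Evaluate a burn-in-test expression against per-test outcomes.

    `expression` may contain test names, `&`, `|`, and parentheses.
    `results` maps each test name to True (passed) or False (not yet).
    """
    tree = ast.parse(expression, mode="eval")

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Name):
            # A bare name is looked up in the dictionary of test results.
            return bool(results[node.id])
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitAnd):
            return _eval(node.left) and _eval(node.right)
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
            return _eval(node.left) or _eval(node.right)
        # Anything else (calls, attributes, literals) is rejected.
        raise ValueError("unsupported syntax in burn-in-test expression")

    return _eval(tree)


# Made-up outcomes for illustration only.
results = {"max_posterior": True, "nacl": False, "ks_test": True}

only_one = evaluate_burn_in("max_posterior", results)
both = evaluate_burn_in("max_posterior & nacl", results)
either = evaluate_burn_in("max_posterior | nacl", results)
chained = evaluate_burn_in("(max_posterior | nacl) & ks_test", results)
```

Since the parsing follows Python's grammar here, `&` binds more tightly than `|`, so an expression without parentheses such as a | b & c is read as a | (b & c).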

If you approve, I'll squash and merge this onto the project branch, then fill out a new PR to track the project branch to master (apparently you can't start a PR until there is at least one difference).

After that, I'll work on updating the plotting codes, and then add back support for emcee_pt and kombine. @vivienr If you have time, it'd be good if you could update the mcmc sampler, to see if the changes make sense, or if you think there should be other additional changes.

@cdcapano
Collaborator Author

@vivienr poke

@cdcapano
Collaborator Author

@vivienr pokity poke

@vivienr
Contributor

vivienr commented Aug 15, 2018

the travis failure is unrelated to this MR, correct?

@cdcapano
Collaborator Author

No, the travis failures are due to this MR, but that's because I haven't updated the unit tests yet. That's why this is a pull request on to the project branch, not master. I figure we can fix up all the unit tests on a future PR to the project branch (which would be needed before the project could be merged on to master). The main point of this PR is to provide a checkpoint, so subsequent changes to the project aren't a single massive diff.

@cdcapano
Collaborator Author

@vivienr poke

Contributor

@vivienr vivienr left a comment


looks good !

@cdcapano cdcapano merged commit 8e1eff6 into gwastro:new_sampler_api Aug 22, 2018
@cdcapano cdcapano mentioned this pull request Aug 22, 2018
7 tasks
Labels
gwin.sampler Change to gwin.sampler sub-package pr:backwards-incompatible Backwards-incompatible change (major version bump)

3 participants