Rollup: New sampler API -> support for emcee #68

cdcapano · 2018-07-30T17:16:17Z

This introduces the new sampler API and includes all changes needed for emcee to work with it. The other samplers are not supported yet, so this will break them for now. A summary of the changes:

BaseSampler has been turned into an abstract base class, with only methods required of all samplers defined. All future samplers (including any nested samplers) should inherit from this.
There is now a BaseMCMC class that adds methods common to all MCMC samplers. It fulfills some of the methods required by BaseSampler, but also introduces some abstract methods that all MCMC samplers must implement.
Support for calculating ACFs/ACLs has been moved to a stand-alone class.
The individual samplers are constructed by inheriting from all of these classes and then adding whatever methods are unique to the sampler, along with any required abstract methods not yet fulfilled. For example, EmceeEnsembleSampler inherits from MCMCAutocorrSupport, BaseMCMC, BaseSampler (in that order).
Samplers are now initialized using a [sampler] section in the config file, instead of by options on the command line.
Previously, some functions expected FieldArrays, others dictionaries, and yet others structured arrays. Now, all arrays of samples are passed between sampler methods as dictionaries of arrays. Only when samples are read from a file using read_samples are the results wrapped in a FieldArray.
The IO has been changed quite a bit. Like the samplers, there is now an abstract base class called BaseInferenceFile that implements common methods for reading/writing for all samplers. Things specific to MCMCs have been moved to MCMCIO. Finally, there is an IO class for each sampler that inherits from these base classes. For example, for emcee there is EmceeFile which inherits from MCMCIO, BaseInferenceFile.
Instead of read/write functions being methods of the sampler class, the sampler class now has a (required, abstract) io property, which is the IO class that the sampler uses. For example, EmceeEnsembleSampler.io = EmceeFile. This is used for checkpointing and is the type of file written by gwin. gwin.io provides a convenience function, loadfile, that checks what kind of file a given file is, then loads it with the appropriate IO class.
Eventually (not in this PR) there will be a PosteriorFile which inherits from BaseInferenceFile that will be used for storing 1D arrays of posterior samples. All IO classes will have a method that allows them to write their contents to such a file.
The burn-in module has also been changed. There is now an MCMCBurnInTests class in which each burn-in test is an instance method. It is initialized from the config file, and an instance of it is saved to the sampler's burn_in attribute. BaseMCMC calls it at each checkpoint to test for burn-in.
Multiple burn-in tests may be combined with boolean logic. In the [sampler-burn_in] section you must provide a burn-in-test = <stuff> string, where <stuff> is an expression that the class parses. For example, with just burn-in-test = max_posterior, only the max_posterior test is applied. With burn-in-test = max_posterior & nacl, the sampler is not considered burned in until both the max_posterior test and the nacl test are satisfied. With max_posterior | nacl, the sampler is considered burned in if either test is satisfied. More complicated expressions are supported, such as (max_posterior | nacl) & ks_test.
Metadata from the burn-in tests are saved to sampler_info/burn_in.
The BaseInferenceFile has thin_start, thin_interval, and thin_end attributes. These define which samples are loaded by default from the file. If they are not specified, they default to 0, 1, and None, which loads all samples. This separates which samples are loaded from the burn-in iteration and ACL, which are specific to MCMC samplers. (Granted, nested samplers will probably never use these attributes.) It also allows the posterior file to keep metadata about when the sampler burned in.
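The class layout described in the list above can be sketched in plain Python. Every class here is a simplified, hypothetical stand-in named after the classes in this PR, not gwin's actual code; the `name` filetype tag, the `IO_CLASSES` registry (standing in for what loadfile does), `default_slice`, `run_mcmc`, `iterations_run`, and the crude ACL estimator in `compute_acl` are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


# --- IO side (hypothetical, simplified stand-ins) ---
class BaseInferenceFile:
    """Common reading/writing; here only the default thinning settings."""
    name = None  # filetype tag an IO class would write to its files (assumed)

    def __init__(self, thin_start=0, thin_interval=1, thin_end=None):
        # Defaults 0, 1, None select every stored sample.
        self.thin_start = thin_start
        self.thin_interval = thin_interval
        self.thin_end = thin_end

    def default_slice(self):
        return slice(self.thin_start, self.thin_end, self.thin_interval)


class MCMCIO:
    """Placeholder for MCMC-specific IO methods."""


class EmceeFile(MCMCIO, BaseInferenceFile):
    name = "emcee_file"  # hypothetical filetype string


# Maps a stored filetype tag to the IO class that can read it (loadfile-like).
IO_CLASSES = {cls.name: cls for cls in (EmceeFile,)}


# --- Sampler side ---
class BaseSampler(ABC):
    """Only what every sampler (MCMC or nested) must provide."""

    @property
    @abstractmethod
    def io(self):
        """The IO class this sampler checkpoints to."""

    @abstractmethod
    def run(self):
        """Run the sampler."""


class BaseMCMC(ABC):
    """Fulfils BaseSampler.run for MCMCs, but needs a per-sampler step."""

    def run(self):
        self.run_mcmc(self.niterations)

    @abstractmethod
    def run_mcmc(self, niterations):
        """Advance every chain by ``niterations`` iterations."""


class MCMCAutocorrSupport:
    """Stand-alone ACL support, mixed into any MCMC sampler."""

    @staticmethod
    def compute_acl(chain):
        # Crude estimator: 1 + 2 * sum of the normalized autocorrelation,
        # truncated at the first negative lag.
        n = len(chain)
        mean = sum(chain) / n
        var = sum((x - mean) ** 2 for x in chain) / n
        if var == 0:
            return 1.0
        acl = 1.0
        for lag in range(1, n):
            rho = sum((chain[i] - mean) * (chain[i + lag] - mean)
                      for i in range(n - lag)) / (n * var)
            if rho < 0:
                break
            acl += 2 * rho
        return acl


class EmceeEnsembleSampler(MCMCAutocorrSupport, BaseMCMC, BaseSampler):
    io = EmceeFile  # a plain class attribute satisfies the abstract property

    def __init__(self, niterations):
        self.niterations = niterations
        self.iterations_run = 0

    def run_mcmc(self, niterations):
        self.iterations_run += niterations  # a real sampler would call emcee here
```

Because BaseMCMC comes before BaseSampler in the method resolution order, its concrete run() satisfies BaseSampler's abstract run(). Leaving run_mcmc out of a concrete sampler makes instantiation fail with TypeError, which is how the abstract base classes enforce the required methods.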

pep8speaks · 2018-07-30T17:16:55Z

Hello @cdcapano! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 04, 2018 at 19:11 Hours UTC


cdcapano · 2018-08-04T19:36:00Z

I think this is ready to be merged onto the new_sampler_api project branch. The diff is quite large, so you probably won't be able to check all the code. I guess just check whether the direction things are going in sounds good?

I tested this with emcee and on the normal2D analytic model, gw150914, and an injection. I haven't updated the plotting code yet, so I haven't checked the distributions. I've just checked that the executable runs ok and produces an output file that looks ok.

For the analytic test with running for a set number of iterations, the ini file looks like:

[model]
name = test_normal

[sampler]
name = emcee
nwalkers = 5000
niterations = 12
checkpoint-interval = 4

[variable_params]
x =
y =

[prior-x]
name = uniform
min-x = -10
max-x = 10

[prior-y]
name = uniform
min-y = -10
max-y = 10
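With these settings the sampler runs 12 iterations and writes a checkpoint every 4. The loop below is a minimal sketch of that schedule; `run_with_checkpoints`, `advance`, and `checkpoint` are hypothetical names, not gwin's API, and shortening the final chunk when niterations is not a multiple of the interval is an assumption.

```python
def run_with_checkpoints(niterations, checkpoint_interval, advance, checkpoint):
    """Advance the sampler in chunks of checkpoint_interval, checkpointing after each."""
    if checkpoint_interval < 1:
        raise ValueError("checkpoint_interval must be >= 1")
    done = 0
    while done < niterations:
        # The last chunk is shortened if niterations is not a multiple of the interval.
        nsteps = min(checkpoint_interval, niterations - done)
        advance(nsteps)
        done += nsteps
        checkpoint(done)
    return done


# With niterations = 12 and checkpoint-interval = 4 from the ini file above:
checkpoints = []
run_with_checkpoints(12, 4, advance=lambda n: None, checkpoint=checkpoints.append)
print(checkpoints)  # [4, 8, 12]
```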

To run until a desired number of effective samples is achieved, the ini file looks like:

[sampler]
name = emcee
nwalkers = 5000
effective-nsamples = 5000
checkpoint-interval = 16

[sampler-burn_in]
burn-in-test = max_posterior & posterior_step
ndim = 2

(The prior and model sections are the same.) Both of these run with:

gwin --verbose --config-files normal2d.ini --output-file normal2d.hdf --nprocesses 4
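With the second configuration, the sampler keeps going until it is burned in and enough effective samples have been collected. The sketch below shows one way that stopping rule could work. It assumes `&` binds tighter than `|` (as in Python) and that the effective sample count is nwalkers times the post-burn-in iterations divided by the ACL; `evaluate_burn_in_test` and `effective_nsamples` are hypothetical helpers, not MCMCBurnInTests methods.

```python
import re


def evaluate_burn_in_test(expr, results):
    """Evaluate a burn-in-test expression such as "(max_posterior | nacl) & ks_test".

    ``results`` maps each test name to whether that test passed.
    """
    tokens = [name or op for name, op in re.findall(r"\s*(?:(\w+)|(\S))", expr)]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"unexpected token {tok!r} in {expr!r}")
        pos += 1
        return tok

    def parse_or():  # lowest precedence: a | b
        value = parse_and()
        while peek() == "|":
            take("|")
            rhs = parse_and()  # always parse, so the tokens are consumed
            value = value or rhs
        return value

    def parse_and():  # binds tighter than |: a & b
        value = parse_atom()
        while peek() == "&":
            take("&")
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():  # a test name or a parenthesized sub-expression
        tok = take()
        if tok == "(":
            value = parse_or()
            take(")")
            return value
        if not re.fullmatch(r"\w+", tok):
            raise ValueError(f"unexpected token {tok!r} in {expr!r}")
        if tok not in results:
            raise KeyError(f"no result for burn-in test {tok!r}")
        return bool(results[tok])

    value = parse_or()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens in {expr!r}")
    return value


def effective_nsamples(nwalkers, niterations, burn_in_iteration, acl):
    """Rough count of independent samples: walkers x (post-burn-in iterations / ACL)."""
    if burn_in_iteration is None:  # not burned in yet
        return 0
    return nwalkers * int(max(0, niterations - burn_in_iteration) // acl)


results = {"max_posterior": True, "posterior_step": False}
print(evaluate_burn_in_test("max_posterior & posterior_step", results))  # False
print(effective_nsamples(5000, niterations=64, burn_in_iteration=32, acl=8))  # 20000
```

In this sketch the sampler would stop once the expression evaluates to True and effective_nsamples reaches the configured effective-nsamples (5000 here). Parsing the expression with a small recursive-descent parser, rather than passing it to eval, means a config file cannot execute arbitrary code.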

If you approve, I'll squash and merge this onto the project branch, then open a new PR to track the project branch to master (apparently you can't start a PR until there is at least one difference).

After that, I'll work on updating the plotting codes, and then add back support for emcee_pt and kombine. @vivienr If you have time, it'd be good if you could update the mcmc sampler, to see if the changes make sense, or if you think there should be other additional changes.

cdcapano · 2018-08-13T14:27:47Z

@vivienr poke

cdcapano · 2018-08-15T10:10:51Z

@vivienr pokity poke

vivienr · 2018-08-15T12:53:38Z

the travis failure is unrelated to this MR, correct?

cdcapano · 2018-08-15T12:56:31Z

No, the travis failures are due to this MR, but that's because I haven't updated the unit tests yet. That's why this is a pull request onto the project branch, not master. I figure we can fix up all the unit tests in a future PR to the project branch (which would be needed before the project could be merged onto master). The main point of this PR is to provide a checkpoint, so that subsequent changes to the project aren't a single massive diff.

cdcapano · 2018-08-22T08:40:56Z

@vivienr poke

vivienr

looks good !

cdcapano requested a review from cmbiwer as a code owner July 30, 2018 17:16

cdcapano removed the request for review from cmbiwer July 30, 2018 17:16

cdcapano added pr:backwards-incompatible gwin.sampler work in progress labels Jul 30, 2018

cdcapano added this to the 0.1.0 milestone Jul 31, 2018

cdcapano mentioned this pull request Jul 31, 2018

Implement new sampler API #36

Closed

cdcapano requested a review from vivienr July 31, 2018 09:02

cdcapano assigned vivienr Jul 31, 2018

Collin Capano added 13 commits August 3, 2018 13:53

start changing the base sampler api

6eaa748

start InferenceFile -> BaseInferenceFile

d41964d

rename hdf.py base_hdf.py

cef9e8c

add parse_parameters function

6972102

add module for base mcmc io

7c7e615

make _read_samples_data the abstract method

214609a

added read_samples_data to base_mcmc

9e10e08

add emcee file handling

af6e7b9

replace read/write functions with io in BaseSampler

b089dca

add checkpoint requirement; rename samples raw_samples

137dc14

start updating emcee

be9b8de

move emcee_pt to it's own module

f2b04f3

add base_mcmc (needs work)

5f9c091

cdcapano force-pushed the new_sampler_api branch from 6038ba5 to e871582 August 3, 2018 11:53

Collin Capano added 6 commits August 3, 2018 13:54

add write_metadata to models

3d75cab

move setting up checkpoint and run interval to sampler methods

f81edab

rearrange read/write functions; add checkpoint and finalize methods; add run method to base_mcmc

2f9a2b2

fix whitespace

866f39a

add acl support

5b90d77

update executable

764c741

Collin Capano added 18 commits August 3, 2018 13:54

fix typos, whitespace in burn_in

eead8a8

fix whitespace, typos in base_hdf

e765c12

rename EnsembleMCMCIO to MCMCIO; fix whitespace

ab40ad0

fix typo

ac6d514

fix whitespace

23366e3

write filetype to inference hdf files; provide a loadfile function

60d0e75

fix some import errors

704d417

remove sampler_class from io to avoid circular imports

adee9c3

fix bugs

36a5e75

fix bugs, move niterations/nsamples into config file

e871582

add halfchain, posterior_step, min_iterations back to burn_in

9046567

fix bugs to get acl working post burn in

7254c84

fix bugs in nacl burn in test

7f0952e

write more information to the logging messages

67e188c

fix bugs in min_iterations burn-in test

a73008b

fix more bugs

f6e1d5b

fix pep8 issues

a257aed

fix bugs for running with data

0a6f82d

cdcapano removed the work in progress label Aug 4, 2018

whitespace

370613e

vivienr approved these changes Aug 22, 2018


cdcapano merged commit 8e1eff6 into gwastro:new_sampler_api Aug 22, 2018

cdcapano mentioned this pull request Aug 22, 2018

New sampler API #70

Open

7 tasks
