Test some benchmark configurations against fixtures #256
Conversation
Force-pushed from 67b7808 to 8b81129.
Are all inference algorithms covered by such a test now, and with sufficiently complex models?
Force-pushed from 8b81129 to 7853ad8.
To make sure I understand (I'm probably confused): is the idea to ensure that the code always produces the same samples given the same random seed? If so, is this a property we always want? For example, what if we change the RNG? Presumably the correctness of monad-bayes does not hinge on using the Mersenne Twister RNG (and indeed, Dominic and I made monad-bayes parametric in the choice of RNG), but would this cause the tests to fail? On this front, note that to be really safe, we might also want to check that the version of monad-bayes from before Spring 2022 (which we can be reasonably confident was in sync with the proven-correct implementation of the paper) agrees with the current version on these tests. I think I avoided making semantically impactful changes, but it's always subtle.
Yes, for changes that are not supposed to change the semantics.
Then we should isolate this change in a single commit and update the fixtures there.
Yes, it might. As would changing the random seed hardcoded in
OK, I can try that, if it's possible to backport the tests there.
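For concreteness, a minimal sketch of the kind of fixed-seed fixture test under discussion. The fixture path and the toy model are assumptions for illustration; `sampleIOfixed` is monad-bayes's fixed-seed sampler, with module and class names as in recent releases (older versions export it from `Control.Monad.Bayes.Sampler`, with the class named `MonadSample`):

```haskell
-- A sketch only: the fixture path and the toy model are hypothetical.
import Control.Monad (replicateM)
import Control.Monad.Bayes.Class (MonadDistribution, normal)
import Control.Monad.Bayes.Sampler.Strict (sampleIOfixed)
import Test.Hspec

-- Hypothetical stand-in for the benchmark models.
model :: MonadDistribution m => m Double
model = normal 0 1

spec :: Spec
spec = describe "fixtures" $
  it "reproduces the stored samples under the fixed seed" $ do
    -- Draw samples with the library's fixed random seed.
    samples <- sampleIOfixed (replicateM 10 model)
    -- The fixture file holds the samples recorded on a previous run.
    fixture <- read <$> readFile "test/fixtures/normal-samples.txt"
    samples `shouldBe` (fixture :: [Double])
```

The exact `shouldBe` on `Double`s is deliberate: the point is bit-for-bit reproducibility, which is also why platform-level floating-point differences (discussed below) break it.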
The tests are failing on Darwin, apparently because the precision of the floating-point numbers used there is ever so slightly different. That's really annoying. Some options:
My feeling would be that for now it's worth just ignoring Darwin and merging, especially since #191 risks changing the fixtures.
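One conceivable option (my own assumption, not necessarily among those the author listed) would be to compare against the fixture up to a tolerance instead of exactly. A sketch:

```haskell
-- Hypothetical helpers for approximate fixture comparison.
approxEq :: Double -> Double -> Double -> Bool
approxEq eps x y = abs (x - y) <= eps

-- True if both sample lists have the same length and agree
-- pointwise within eps.
matchesFixture :: Double -> [Double] -> [Double] -> Bool
matchesFixture eps xs ys =
  length xs == length ys && and (zipWith (approxEq eps) xs ys)
```

The trade-off is that a tolerance masks exactly the kind of subtle semantic drift these fixtures are meant to catch.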
Force-pushed from 7853ad8 to 667d794.
See tweag#256: there are precision issues on Macs.
Force-pushed from 76d6e9d to 778c738.
Force-pushed from 778c738 to 3415071.
OK, I've tried to disable the tests on macOS; let's see whether that works.
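A minimal sketch of how the tests might be disabled on macOS, assuming hspec and an OS check via `System.Info.os` (which reports "darwin" on macOS); the PR's actual mechanism may differ:

```haskell
import System.Info (os)
import Test.Hspec

-- Skip the fixture comparison on macOS; run it everywhere else.
fixtureSpec :: Spec
fixtureSpec = describe "fixtures" $
  if os == "darwin"
    then it "is skipped on macOS (floating-point differences)" pending
    else it "matches the stored samples" $
           -- Placeholder for the real fixture comparison.
           (2 + 2 :: Int) `shouldBe` 4
```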
Force-pushed from 3415071 to 87c7098.
Force-pushed from 87c7098 to 13986e7.
Force-pushed from 13986e7 to f2844de.
Force-pushed from fc32136 to 26fb44a.
Force-pushed from 26fb44a to b7b46b6.
I'll interpret this as approval ;)
@reubenharry can you review?
Force-pushed from 577647c to 9fee117.
Force-pushed from 9fee117 to e2a0a50.
Force-pushed from e2a0a50 to 3307f9b.
@reubenharry @idontgetoutmuch can you review this PR?
Again, seems good.
@ohad suggested testing inference algorithms on the samples they produce in #245 (comment). I think this is a good idea independently of the issue discussed there. So I implemented a few fixture tests covering the three algorithms and the three models implemented in the benchmarks.
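As a rough illustration of what such a three-algorithms-by-three-models fixture matrix could look like (the model and algorithm names, fixture paths, and runner below are assumptions for the sketch, not taken from this PR):

```haskell
import Test.Hspec

-- Hypothetical names standing in for the benchmark configurations.
data Model = LR | HMM | LDA deriving (Show, Enum, Bounded)
data Alg = MH | SMC | RMSMC deriving (Show, Enum, Bounded)

-- One fixture file per (model, algorithm) pair.
fixturePath :: Model -> Alg -> FilePath
fixturePath m a = "test/fixtures/" ++ show m ++ "-" ++ show a ++ ".txt"

-- Placeholder for a fixed-seed run of algorithm `a` on model `m`,
-- rendered to text exactly as stored in the fixture file.
runFixture :: Model -> Alg -> IO String
runFixture m a = pure (show (m, a))

spec :: Spec
spec = describe "benchmark fixtures" $
  sequence_
    [ it (show m ++ " / " ++ show a) $ do
        expected <- readFile (fixturePath m a)
        actual <- runFixture m a
        actual `shouldBe` expected
    | m <- [minBound .. maxBound]
    , a <- [minBound .. maxBound]
    ]
```

Storing one fixture file per (model, algorithm) pair keeps failures localized: a semantic change shows up as a diff in exactly the configurations it affects.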