Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test some benchmark configurations against fixtures #256

Merged
merged 5 commits into from
Jul 17, 2023

Conversation

turion
Copy link
Collaborator

@turion turion commented Feb 8, 2023

@ohad suggested to test inference algorithms for the samples they produce in #245 (comment). I think this is a good idea independently of the issue discussed there. So I implemented a few fixture tests for the three algorithms and the three models implemented in the benchmarks.

@turion turion requested a review from reubenharry February 8, 2023 13:59
@turion turion force-pushed the dev_fixture_test_benchmarks branch 5 times, most recently from 67b7808 to 8b81129 Compare February 8, 2023 16:12
@turion turion self-assigned this Feb 8, 2023
@turion
Copy link
Collaborator Author

turion commented Feb 8, 2023

Are all inference algorithms covered by such a test now, and with sufficiently complex models?

@turion turion force-pushed the dev_fixture_test_benchmarks branch from 8b81129 to 7853ad8 Compare February 8, 2023 16:26
@reubenharry
Copy link
Contributor

To make sure I understand (I'm probably confused): is the idea to ensure that the code always produces the same samples given the same random seed? If so, is this a property we always want? For example, what if we change the rng? Presumably the correctness of monad-bayes does not hinge on using the Mersenne twister rng (and indeed, Dominic and I made monad-bayes parametric on the choice of rng) but would this cause the tests to fail?

On this front, note that to be really safe, we might also want to check that the version of monad-bayes from before Spring 2022 (which we can be reasonably confident was in-sync with the proven correct implementation of the paper) agrees on these tests with the current version. I think I avoided making semantically impactful changes, but it's always subtle.

@turion
Copy link
Collaborator Author

turion commented Feb 9, 2023

To make sure I understand (I'm probably confused): is the idea to ensure that the code always produces the same samples given the same random seed? If so, is this a property we always want?

Yes, for changes that are not supposed to change the semantics.

For example, what if we change the rng?

Then we should isolate this change in a single commit and update the fixtures there.

Presumably the correctness of monad-bayes does not hinge on using the Mersenne twister rng (and indeed, Dominic and I made monad-bayes parametric on the choice of rng) but would this cause the tests to fail?

Yes it might. As would changing the random seed hardcoded in sampleIOFixed. But that's ok, running the tests updates the fixtures.

On this front, note that to be really safe, we might also want to check that the version of monad-bayes from before Spring 2022 (which we can be reasonably confident was in-sync with the proven correct implementation of the paper) agrees on these tests with the current version. I think I avoided making semantically impactful changes, but it's always subtle.

Ok, I can try that if it's possible to backport the tests to there.

@turion
Copy link
Collaborator Author

turion commented Feb 9, 2023

The tests are failing on darwin apparently because the precision of the floating point numbers used there is ever so slightly different. That's really annoying. Some options:

  • Don't fixture test on darwin
  • Find out what is different, and use a type with system-independent precision. Might need to solve Generalize from Double #191 first.

@reubenharry
Copy link
Contributor

My feeling would be that for now it's worth just ignoring Darwin and merging, especially since 191 risks changing the fixtures.

@turion turion force-pushed the dev_fixture_test_benchmarks branch from 7853ad8 to 667d794 Compare March 10, 2023 13:50
turion added a commit to turion/monad-bayes that referenced this pull request Mar 10, 2023
See tweag#256,
there are precision issues on macs.
turion added a commit to turion/monad-bayes that referenced this pull request Mar 10, 2023
See tweag#256,
there are precision issues on macs.
@turion turion force-pushed the dev_fixture_test_benchmarks branch from 76d6e9d to 778c738 Compare March 10, 2023 14:50
turion added a commit to turion/monad-bayes that referenced this pull request Mar 10, 2023
See tweag#256,
there are precision issues on macs.
@turion turion force-pushed the dev_fixture_test_benchmarks branch from 778c738 to 3415071 Compare March 10, 2023 15:15
@turion
Copy link
Collaborator Author

turion commented Mar 10, 2023

Ok, I've tried to disable tests on MacOS, let's see whether that works.

turion added a commit to turion/monad-bayes that referenced this pull request May 2, 2023
See tweag#256,
there are precision issues on macs.
turion added a commit to turion/monad-bayes that referenced this pull request May 10, 2023
See tweag#256,
there are precision issues on macs.
@turion turion force-pushed the dev_fixture_test_benchmarks branch from 3415071 to 87c7098 Compare May 10, 2023 10:20
turion added a commit to turion/monad-bayes that referenced this pull request May 10, 2023
See tweag#256,
there are precision issues on macs.
@turion turion force-pushed the dev_fixture_test_benchmarks branch from 87c7098 to 13986e7 Compare May 10, 2023 11:19
turion added a commit to turion/monad-bayes that referenced this pull request May 10, 2023
See tweag#256,
there are precision issues on macs.
@turion turion force-pushed the dev_fixture_test_benchmarks branch from 13986e7 to f2844de Compare May 10, 2023 12:54
turion added a commit to turion/monad-bayes that referenced this pull request May 10, 2023
See tweag#256,
there are precision issues on macs.
@turion turion force-pushed the dev_fixture_test_benchmarks branch from fc32136 to 26fb44a Compare May 10, 2023 14:17
turion added a commit to turion/monad-bayes that referenced this pull request May 11, 2023
See tweag#256,
there are precision issues on macs.
@turion turion force-pushed the dev_fixture_test_benchmarks branch from 26fb44a to b7b46b6 Compare May 11, 2023 08:09
@turion
Copy link
Collaborator Author

turion commented May 11, 2023

My feeling would be that for now it's worth just ignoring Darwin and merging,

I'll interpret this as approval ;)

@turion turion enabled auto-merge May 11, 2023 08:47
@turion turion requested a review from idontgetoutmuch May 11, 2023 08:48
test/TestSSMFixtures.hs Outdated Show resolved Hide resolved
@turion
Copy link
Collaborator Author

turion commented May 11, 2023

@reubenharry can you review?

turion added a commit to turion/monad-bayes that referenced this pull request May 11, 2023
See tweag#256,
there are precision issues on macs.
@turion turion force-pushed the dev_fixture_test_benchmarks branch 2 times, most recently from 577647c to 9fee117 Compare May 11, 2023 15:40
turion added a commit to turion/monad-bayes that referenced this pull request May 17, 2023
See tweag#256,
there are precision issues on macs.
@turion turion force-pushed the dev_fixture_test_benchmarks branch from 9fee117 to e2a0a50 Compare May 17, 2023 08:29
@turion turion force-pushed the dev_fixture_test_benchmarks branch from e2a0a50 to 3307f9b Compare July 17, 2023 09:29
@turion
Copy link
Collaborator Author

turion commented Jul 17, 2023

@reubenharry @idontgetoutmuch can you review this PR?

Copy link
Contributor

@reubenharry reubenharry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, seems good.

@turion turion merged commit f2df3a9 into tweag:master Jul 17, 2023
@turion turion deleted the dev_fixture_test_benchmarks branch July 20, 2023 09:48
@turion turion mentioned this pull request Nov 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants