Test some benchmark configurations against fixtures #256
Conversation
Force-pushed from 67b7808 to 8b81129.
Are all inference algorithms covered by such a test now, and with sufficiently complex models?
Force-pushed from 8b81129 to 7853ad8.
To make sure I understand (I'm probably confused): is the idea to ensure that the code always produces the same samples given the same random seed? If so, is this a property we always want? For example, what if we change the RNG? Presumably the correctness of monad-bayes does not hinge on using the Mersenne Twister RNG (and indeed, Dominic and I made monad-bayes parametric in the choice of RNG), but would this cause the tests to fail? On this front, note that to be really safe, we might also want to check that the version of monad-bayes from before Spring 2022 (which we can be reasonably confident was in sync with the proven-correct implementation of the paper) agrees with the current version on these tests. I think I avoided making semantically impactful changes, but it's always subtle.
Yes, for changes that are not supposed to change the semantics.
Then we should isolate this change in a single commit and update the fixtures there.
Yes, it might. As would changing the random seed hardcoded in
OK, I can try that, if it's possible to backport the tests there.
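For concreteness, a minimal sketch of the kind of fixed-seed fixture test under discussion. The fixture path and the toy model are assumptions for illustration; `sampleIOfixed` is monad-bayes's fixed-seed sampler, with module and class names as in recent releases (older versions export it from `Control.Monad.Bayes.Sampler`, with the class named `MonadSample`):

```haskell
-- A sketch only: the fixture path and the toy model are hypothetical.
import Control.Monad (replicateM)
import Control.Monad.Bayes.Class (MonadDistribution, normal)
import Control.Monad.Bayes.Sampler.Strict (sampleIOfixed)
import Test.Hspec

-- Hypothetical stand-in for the benchmark models.
model :: MonadDistribution m => m Double
model = normal 0 1

spec :: Spec
spec = describe "fixtures" $
  it "reproduces the stored samples under the fixed seed" $ do
    -- Draw samples with the library's fixed random seed.
    samples <- sampleIOfixed (replicateM 10 model)
    -- The fixture file holds the samples recorded on a previous run.
    fixture <- read <$> readFile "test/fixtures/normal-samples.txt"
    samples `shouldBe` (fixture :: [Double])
```

The exact `shouldBe` on `Double`s is deliberate: the point is bit-for-bit reproducibility, which is also why platform-level floating-point differences (discussed below) break it.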
The tests are failing on Darwin, apparently because the precision of the floating-point numbers used there is ever so slightly different. That's really annoying. Some options:
My feeling would be that for now it's worth just ignoring Darwin and merging, especially since #191 risks changing the fixtures.
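One conceivable option (my own assumption, not necessarily among those the author listed) would be to compare against the fixture up to a tolerance instead of exactly. A sketch:

```haskell
-- Hypothetical helpers for approximate fixture comparison.
approxEq :: Double -> Double -> Double -> Bool
approxEq eps x y = abs (x - y) <= eps

-- True if both sample lists have the same length and agree
-- pointwise within eps.
matchesFixture :: Double -> [Double] -> [Double] -> Bool
matchesFixture eps xs ys =
  length xs == length ys && and (zipWith (approxEq eps) xs ys)
```

The trade-off is that a tolerance masks exactly the kind of subtle semantic drift these fixtures are meant to catch.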
Force-pushed from 7853ad8 to 667d794.
See tweag#256: there are precision issues on Macs.
Force-pushed from 76d6e9d to 778c738.
Force-pushed from 778c738 to 3415071.
OK, I've tried to disable the tests on macOS; let's see whether that works.
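A minimal sketch of how the tests might be disabled on macOS, assuming hspec and an OS check via `System.Info.os` (which reports "darwin" on macOS); the PR's actual mechanism may differ:

```haskell
import System.Info (os)
import Test.Hspec

-- Skip the fixture comparison on macOS; run it everywhere else.
fixtureSpec :: Spec
fixtureSpec = describe "fixtures" $
  if os == "darwin"
    then it "is skipped on macOS (floating-point differences)" pending
    else it "matches the stored samples" $
           -- Placeholder for the real fixture comparison.
           (2 + 2 :: Int) `shouldBe` 4
```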
Force-pushed from 3415071 to 87c7098.
Force-pushed from 87c7098 to 13986e7.
Force-pushed from 13986e7 to f2844de.
Force-pushed from fc32136 to 26fb44a.
Force-pushed from 26fb44a to b7b46b6.
I'll interpret this as approval ;)
@reubenharry can you review?
Force-pushed from 577647c to 9fee117.
Force-pushed from 9fee117 to e2a0a50.
Force-pushed from e2a0a50 to 3307f9b.
@reubenharry @idontgetoutmuch can you review this PR?
Again, seems good.
@ohad suggested testing inference algorithms on the samples they produce in #245 (comment). I think this is a good idea independently of the issue discussed there. So I implemented a few fixture tests covering the three algorithms and the three models implemented in the benchmarks.
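As a rough illustration of what such a three-algorithms-by-three-models fixture matrix could look like (the model and algorithm names, fixture paths, and runner below are assumptions for the sketch, not taken from this PR):

```haskell
import Test.Hspec

-- Hypothetical names standing in for the benchmark configurations.
data Model = LR | HMM | LDA deriving (Show, Enum, Bounded)
data Alg = MH | SMC | RMSMC deriving (Show, Enum, Bounded)

-- One fixture file per (model, algorithm) pair.
fixturePath :: Model -> Alg -> FilePath
fixturePath m a = "test/fixtures/" ++ show m ++ "-" ++ show a ++ ".txt"

-- Placeholder for a fixed-seed run of algorithm `a` on model `m`,
-- rendered to text exactly as stored in the fixture file.
runFixture :: Model -> Alg -> IO String
runFixture m a = pure (show (m, a))

spec :: Spec
spec = describe "benchmark fixtures" $
  sequence_
    [ it (show m ++ " / " ++ show a) $ do
        expected <- readFile (fixturePath m a)
        actual <- runFixture m a
        actual `shouldBe` expected
    | m <- [minBound .. maxBound]
    , a <- [minBound .. maxBound]
    ]
```

Storing one fixture file per (model, algorithm) pair keeps failures localized: a semantic change shows up as a diff in exactly the configurations it affects.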