Intermittent failures in test_continuous_and_interrupted_simulations_equal
#1560
tests/test_simulation.py::test_continuous_and_interrupted_simulations_equal
test_continuous_and_interrupted_simulations_equal
Out of 12 runs of test locally on current For this specific failure, the difference is in the logs for key
with the precision there not sufficient to see the difference. Looking at the full values, one is 0.1328333333333333 and the other 0.13283333333333333, so they differ only in the last decimal place! It's not clear whether all of the failures are due to such tiny differences, as the difference does not always appear to occur for the same log key as in the earlier failures above. This level of difference could arise, for example, from the non-associativity of floating-point addition, that is, from accumulating values in different orders. I'm not 100% sure that Pandas (or NumPy under the hood) doesn't, for example, use multithreading to parallelise reductions, which might lead to such differences. Will continue investigating. |
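To illustrate the summation-order hypothesis above, here is a minimal sketch in plain Python. It is not TLOmodel code, and the three values are arbitrary examples rather than numbers from the logs. It only shows that adding the same floats in a different order can change the result in the last bits:

```python
import math

# Arbitrary illustrative values, not taken from the simulation logs.
values = [0.1, 0.2, 0.3]

# Same three numbers, accumulated in two different orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(repr(left_to_right))
print(repr(right_to_left))

# Exact comparison fails even though the sums are mathematically equal...
print(left_to_right == right_to_left)
# ...while a comparison with a small relative tolerance succeeds.
print(math.isclose(left_to_right, right_to_left, rel_tol=1e-12))
```

If the failures do turn out to be of this kind, an exact equality check on logged floats will be sensitive to any change in accumulation order, whereas a tolerance-based check would not be.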
I think you mean
|
Nope I did mean
I think we probably want to switch to I'm looping running the test with |
Yeah, it's difficult to imagine this is floating-point error. I think the float value is consistent ( |
Not sure if it's all that helpful, but I seem to consistently (at least over three tries so far) get a failure with a fixed hash seed of 44, that is, running
I've had a look through the schisto module and I can see one instance of a set which isn't sorted in TLOmodel/src/tlo/methods/schisto.py Lines 1004 to 1006 in 05d3679
but changing list to sorted there doesn't seem to resolve the issue, as I still get the same test failure with this change.
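For context on why the hash seed can matter here: in CPython, string hashes are randomised per process unless `PYTHONHASHSEED` is set, so the iteration order of a set of strings can change between runs, while `sorted(...)` gives a stable order. The sketch below is a generic illustration, not code from the schisto module. The helper `run_with_seed` and the element names are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical set of string labels, written as source text for the child process.
ITEMS = "{'alpha', 'beta', 'gamma', 'delta', 'epsilon', 'zeta'}"


def run_with_seed(seed, expression):
    """Evaluate `expression` in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    completed = subprocess.run(
        [sys.executable, "-c", f"print({expression})"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


for seed in (0, 44):
    # Set iteration order may differ between seeds...
    print(seed, run_with_seed(seed, f"list({ITEMS})"))
    # ...but the sorted order does not depend on the seed.
    print(seed, run_with_seed(seed, f"sorted({ITEMS})"))
```

Fixing the seed makes the set order reproducible within one configuration, which would be consistent with the failure becoming deterministic for a particular seed.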
|
Checked the bytes representation of the two values and it does look like they are actually different, not just an artefact of rounding when printing:
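A minimal sketch of how such a check can be done with the standard library, using the two values quoted earlier in the thread. The helper name `float_bytes_hex` and the choice of little-endian byte order are assumptions made for illustration:

```python
import struct

# The two logged values quoted earlier in the thread.
a = 0.1328333333333333
b = 0.13283333333333333


def float_bytes_hex(x):
    """Return the 8 bytes of an IEEE 754 double (little-endian) as a hex string."""
    return struct.pack("<d", x).hex()


# Raw bytes alongside Python's exact hexadecimal float notation.
print(float_bytes_hex(a), a.hex())
print(float_bytes_hex(b), b.hex())

# The two literals parse to distinct doubles, so this prints False.
print(a == b)
```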
|
Oh wait I possibly misinterpreted
as I was forgetting this is a value parsed from the log file, where it will have been printed out. So I guess I should check whether the dataframe the log records are created from shows the same discrepancy before being written to file. |
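As a side note on what printing can and cannot hide, here is a small sketch using the two values quoted earlier in the thread. It assumes nothing about how the TLOmodel logger formats floats, which has not been checked here. It only shows that a short fixed-precision format collapses the two values, while Python's `repr` round-trips each double exactly:

```python
# The two logged values quoted earlier in the thread.
a = 0.1328333333333333
b = 0.13283333333333333

# With limited precision the two values print identically.
short_a = f"{a:.6f}"
short_b = f"{b:.6f}"
print(short_a, short_b, short_a == short_b)

# repr() gives the shortest string that parses back to the same double,
# so a value written with repr() and read back is bit-for-bit unchanged.
print(float(repr(a)) == a, float(repr(b)) == b)

# The full representations differ, so the difference survives a round trip.
print(repr(a) == repr(b))
```

So if the log is written with a round-tripping representation, a difference seen in the parsed values reflects a real difference in the underlying data, which is what checking the dataframe directly would confirm.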
Okay have checked and the dataframes returned by TLOmodel/src/tlo/methods/healthburden.py Lines 522 to 530 in 05d3679
do differ in the same value ( |
Great hunting. I flipped my suspicion after you said setting hash seed makes the test fail consistently. |
Tracked this back a bit further to the difference specifically arising in
with values 0.03333333333333333/111111111111a13f (continuous) and 0.033333333333333326/101111111111a13f (suspend/resume). The logic however for how |
Numbers would be coming from schisto module's |
As noticed and discussed in #1527 and previously in #1507 (comment) the test for consistency of simulations run continuously and with suspend/resume in
tests/test_simulation.py::test_continuous_and_interrupted_simulations_equal
is intermittently and non-reproducibly failing.
Failing tests workflow run on #1507
tests.test_simulation.test_continuous_and_interrupted_simulations_equal[83563095832589325021]
Failing tests workflow run on #1527
tests.test_simulation.test_continuous_and_interrupted_simulations_equal[83563095832589325021]
Both failures appear to be due specifically to differences in the log output across the simulations, but the log key in which the difference is seen appears to differ between them. Unfortunately, the values don't appear to differ in the summary output shown, so it's not clear whether this can help narrow down where the difference might be arising.