Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test in rem/tests/test_inverse_confusion_matrix.py #2431

Closed
purva-thakre opened this issue Jul 1, 2024 · 7 comments · Fixed by #2464
Closed

Flaky test in rem/tests/test_inverse_confusion_matrix.py #2431

purva-thakre opened this issue Jul 1, 2024 · 7 comments · Fixed by #2464
Assignees
Labels
bug Something isn't working rem Readout Error Mitigation technique

Comments

@purva-thakre
Copy link
Collaborator

purva-thakre commented Jul 1, 2024

Noticed this failure twice in a CI run for Windows. If I rerun the test, it passes.

https://github.com/unitaryfund/mitiq/actions/runs/9748119578/job/26902405324?pr=2347#step:6:4326

image

@purva-thakre purva-thakre added bug Something isn't working rem Readout Error Mitigation technique labels Jul 1, 2024
@purva-thakre
Copy link
Collaborator Author

@purva-thakre
Copy link
Collaborator Author

@purva-thakre purva-thakre self-assigned this Jul 3, 2024
@purva-thakre purva-thakre added this to the 0.39.0 milestone Jul 3, 2024
@purva-thakre purva-thakre removed their assignment Jul 10, 2024
@purva-thakre
Copy link
Collaborator Author

I am unassigning this for myself because I cannot replicate this locally. My laptop does not have enough memory for local docker containers.

@natestemen
Copy link
Member

@natestemen
Copy link
Member

Took a look at this in the mitiq coding call today and made some progress.

Problem

bitstrings = sample_probability_vector(np.array([0.5, 0.5]), 1000)
assert isclose(sum([b[0] for b in bitstrings]), 500, rel_tol=0.1)

This test (in essence)

  1. generates a random sequence of 1000 0s and 1s
  2. sums that sequence (as ints)
  3. Asserts that the resulting sum is between 450 and 550

Some stats tell us the sum should fall in this range 99.94% of the time, meaning that on average we should see a value outside this range every 1,667 runs.

Solutions

We considered the following solutions

  1. Removing the test. Since this is non-deterministic behavior and the test is mostly testing to np.random.choice, it is not totally needed.
  2. Increasing the valid range of values (say it must fall between 400 and 600).
  3. Setting a seed to make the sum deterministic.

Solution 1 was determined to be unnecessary since we still want this code's functionality to be tested with cases other than the rudimentary ones tested in this same test. Solution 2 was determined to not be a complete fix, as it is still possible for the test to fail, albeit much less likely. Solution 3 was deemed the best solution.

Diagnosis

We ran the failing test on my machine (macos) on repeat 2000 times using the approach outlined here. We found it failed roughly at that cadence (1/2000 runs) which aligns with the math in the section above.

@natestemen
Copy link
Member

And just so no one else clashes here, I have a PR incoming.

@nathanshammah
Copy link
Member

Great investigation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working rem Readout Error Mitigation technique
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants