Random exception in python when using multiprocessing due to rdseed failure #26

Closed
alexisshaw opened this issue Apr 7, 2021 · 6 comments · Fixed by #29

Comments

@alexisshaw

When I run a large number of sampling simulations simultaneously using stim in a massively multi-process environment (Python does not support true multi-threading, so I use multi-process computation via Python's multiprocessing starmap), an exception is raised semi-randomly. See the stack trace below:

[image: screenshot of the stack trace (image001)]

This seems to be related to a multi-threading bug in libstdc++, so perhaps the approach used by the Google Cloud C++ API would work here. See:

googleapis/google-cloud-cpp-common#208

googleapis/google-cloud-cpp-common#272

Since the call to random_device is made only once per process in PYBIND_SHARED_RNG(), it is probably safe performance-wise to use the second fix; any performance regression should be very minor.

This seems to be related to the shared rdseed buffer in certain Intel CPUs.

@Strilanc
Collaborator

Strilanc commented Apr 8, 2021

The fix you're suggesting is

#if defined(__linux) && defined(__GLIBCXX__) && __GLIBCXX__ >= 20200128
    std::random_device rd("/dev/urandom");
#else
    std::random_device rd;
#endif

instead of just std::random_device rd;?
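
For context, here is a minimal sketch of how that guarded construction could sit inside a once-per-process seeding function. This is not stim's actual PYBIND_SHARED_RNG() code; the name `shared_rng` and the choice of `std::mt19937_64` with a four-word `std::seed_seq` are assumptions made purely for illustration.

```cpp
#include <random>

// Hypothetical helper (not stim's real implementation): returns a generator
// that is seeded exactly once per process, the first time it is requested.
inline std::mt19937_64 &shared_rng() {
    // Function-local statics are initialized once, thread-safely, since C++11.
    static std::mt19937_64 rng = [] {
#if defined(__linux) && defined(__GLIBCXX__) && __GLIBCXX__ >= 20200128
        // On recent libstdc++ on Linux, ask for /dev/urandom explicitly
        // instead of the default hardware-backed source.
        std::random_device rd("/dev/urandom");
#else
        std::random_device rd;
#endif
        // Draw a few words of entropy and expand them into the engine state.
        std::seed_seq seq{rd(), rd(), rd(), rd()};
        return std::mt19937_64(seq);
    }();
    return rng;
}
```

Because the random_device is only touched during that one-time initialization, any extra cost from opening /dev/urandom would be paid once per process rather than once per sample.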

@Strilanc
Collaborator

Strilanc commented Apr 8, 2021

Urgh, I've been looking over some of the threads about this around the internet. What a gigantic mess. There's something deeply wrong when you have to work around standard library functionality not doing what it's supposed to do.

@alexisshaw
Author

Yeah, basically, for the immediate fix.

And don't I know it. This really should be handled by the standard library; the fact that it isn't is super annoying.

@alexisshaw
Author

I mean, technically the standard allows it to throw an exception, but the failure mode really should be more graceful here.
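
To illustrate what "more graceful" could look like on the caller's side, here is a rough sketch that catches the exception from std::random_device and falls back to reading /dev/urandom directly. All of the function names here are hypothetical, the fallback assumes a POSIX system where /dev/urandom exists, and none of this is taken from stim or libstdc++.

```cpp
#include <cstdint>
#include <exception>
#include <fstream>
#include <random>
#include <stdexcept>

// Hypothetical helper: try `primary`; if it throws, use `fallback` instead.
template <typename Primary, typename Fallback>
std::uint64_t seed_or_fallback(Primary primary, Fallback fallback) {
    try {
        return primary();
    } catch (const std::exception &) {
        return fallback();
    }
}

// Primary source: the default std::random_device, which is allowed to throw
// if a random number could not be obtained.
inline std::uint64_t seed_from_random_device() {
    std::random_device rd;
    std::uint64_t high = rd();
    std::uint64_t low = rd();
    return (high << 32) | low;
}

// Fallback source: read 8 bytes straight from /dev/urandom (POSIX only).
inline std::uint64_t seed_from_urandom_file() {
    std::ifstream in("/dev/urandom", std::ios::binary);
    std::uint64_t seed = 0;
    if (!in.read(reinterpret_cast<char *>(&seed), sizeof(seed))) {
        throw std::runtime_error("could not read /dev/urandom");
    }
    return seed;
}

// Combines the two sources: random_device first, /dev/urandom on failure.
inline std::uint64_t robust_seed() {
    return seed_or_fallback(seed_from_random_device, seed_from_urandom_file);
}
```

A caller would use `robust_seed()` where it previously constructed a random_device directly. If the fallback also fails, its exception still propagates, so the failure is not hidden.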

@Strilanc
Collaborator

Strilanc commented Apr 12, 2021

@alexisshaw Could you try pip installing this file into the relevant environment, and confirming that it makes the problem go away?

stim-1.3.dev0.tar.gz

@Strilanc
Collaborator

I'm going to move forward with publishing a release with this fix to avoid holding up other bug fixes.
