Testing in parallel can make GAP crash #3089
I think what happens here is that the installation of Browse fails (maybe due to issues with the system ncurses installation; I have seen that happen as well). If this happens in the main process, it usually fails silently and Oscar continues loading. |
Alright, installing |
So there are multiple issues here (all of which should be fixed)
Points 1 and 2 are quick to do. Point 3: the first part (installing |
@fingolfin
Concerning the occasional errors, I think the point is not the failure of the installation but the fact that several instances try to install the package in the same place, at the same time. Concerning the dependency on |
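To illustrate the race described above (several processes checking "not installed yet" and then installing into the same directory at the same time), here is a minimal Python sketch of one way to serialize such an installation. It is not GAP.jl's or Oscar's actual code: the function `install_once`, the `.installed` marker file and the `.lock` file next to the target are hypothetical. The lock is taken with an exclusive create (`O_CREAT | O_EXCL`), so only one process can hold it; everyone else waits and then sees the completion marker.

```python
import os
import time
from pathlib import Path
from typing import Callable


def install_once(target: Path, install: Callable[[Path], None],
                 timeout: float = 60.0, poll: float = 0.05) -> bool:
    """Run install(target) at most once across concurrent processes.

    Returns True if this call performed the installation and False if
    another process had already completed it. If install() raises, the
    lock is released and the exception propagates, so a later caller
    may retry.
    """
    marker = target / ".installed"                      # hypothetical completion marker
    lock = target.with_name(target.name + ".lock")      # hypothetical lock file
    target.parent.mkdir(parents=True, exist_ok=True)
    deadline = time.monotonic() + timeout
    while True:
        if marker.exists():
            return False                                # someone else already installed it
        try:
            # Atomic: fails with FileExistsError if another process holds the lock.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"timed out waiting for {lock}")
            time.sleep(poll)
            continue
        try:
            os.close(fd)
            if marker.exists():                         # finished while we were waiting
                return False
            install(target)
            target.mkdir(parents=True, exist_ok=True)
            marker.touch()                              # written before the lock is released
            return True
        finally:
            os.unlink(lock)
```

One limitation of this design: if a process is killed while holding the lock, the stale `.lock` file blocks the others until the timeout, so a real implementation would need some stale-lock detection.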
I think we shouldn't terminate the session with an error in these cases, for the reasons you already stated. But we could signal the situation to the user with, e.g., a |
Maybe we should decide that child processes must never try to install GAP packages? (Either the main process has already installed them successfully, so there is no need to install them again, or it has already failed to install them, in which case we know in advance that the child processes will fail too.) |
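The proposed policy can be sketched in a few lines of Python. This is only an illustration under assumptions, not GAP.jl's loading code: the function `load_optional_package`, the `is_main_process` flag and the `.installed` marker file are hypothetical. The main process may attempt the install and soft-fails with a warning; a child process only checks whether the main process already succeeded and never retries.

```python
from pathlib import Path
from typing import Callable


def load_optional_package(pkgdir: Path, install: Callable[[Path], None], *,
                          is_main_process: bool) -> bool:
    """Return True if the package is available after this call."""
    marker = pkgdir / ".installed"          # hypothetical completion marker
    if marker.exists():
        return True
    if not is_main_process:
        # The main process either already installed it or already failed;
        # retrying here would only race with sibling workers.
        return False
    try:
        install(pkgdir)
    except Exception as err:                # soft-fail: warn and keep loading
        print(f"warning: could not install {pkgdir.name}: {err}")
        return False
    pkgdir.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True
```

With this split, a failed install is reported once by the main process instead of being retried concurrently by every worker.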
This sounds like a good idea. |
Making sure child processes don't install GAP packages seems reasonable but is not really a solution. I think the quickest solution would be to use Regarding ncurses, we do have Ncurses_jll in our dependencies so we could avoid the system package if we could pass the correct build flags for the compilation. PS: I noticed that there is already another ticket about the same issue: oscar-system/GAP.jl#561 |
O.k., I am going to change |
I have recently seen a lot of the following when running the Oscar tests in parallel:
Is this error included in the list above? |
The original error should not appear anymore with GAP 0.10.1 (oscar-system/GAP.jl#956). So I think this can be closed. I haven't seen that Regarding the GAP Browse / ncurses stuff, there is oscar-system/GAP.jl#614. |
OK, closing this for now. |
Multithreading in OSCAR is unfortunately still not really supported. That said, I still wouldn't expect it to fail like that. But this is the wrong place to discuss it: please open a new issue at the GAP.jl repository with precise steps for reproducing it (as in: how did you start Julia, what did you enter, and on which OS / setup). If you can repro it with just Thanks! |
When running tests in parallel, GAP sometimes fails to load on one of the workers, which throws an error that stops all the workers. The issue does not appear to manifest in regular (serial) testing.
Note that this does not always happen, but it happens often enough to be a problem. Using more workers makes it more likely to appear.
To Reproduce
Gets the error:
Expected behavior
Oscar should be able to load on all workers, and testing should continue.
System (please complete the following information):