More GC regressions in nightly and 1.10 resulting in OOM crashes and weird stats, possibly related to gcext #50705
Comments
This looks like the same thing reported in #40644 (comment)
#50682 unfortunately did not help.
We would like to fix this and will continue to work on it with you, but I believe it is not a release blocker. @gbaraldi is still looking into it.
Note that this also happens with https://github.com/Nemocas/AbstractAlgebra.jl on 1.10-beta2. This is a pure Julia package, with no GC shenanigans or C libraries. GitHub Actions CI worked flawlessly until now, but with 1.10-beta2 the job is often killed because of memory consumption. Here is an example: https://github.com/Nemocas/AbstractAlgebra.jl/actions/runs/5867762282/job/16017400384?pr=1405.
Is this only a 1.10 thing, or have you seen any nightly failures for AbstractAlgebra.jl as well?
Only on 1.10 so far, but CI has not run very often in the meantime. I only noticed it recently.
Just to say, currently the AbstractAlgebra CI tests consistently fail with Julia nightly (but pass with 1.10 and older versions), and it really looks like GC is involved. Let me stress again that this is a pure Julia package. I've written more detailed observations at Nemocas/AbstractAlgebra.jl#1432, but in a nutshell it looks as if the GC grows the heap target exponentially and never shrinks it. My conjecture is that the crash happens when it tries to grow the heap from 8 to 16 GB, which is too much for those little GitHub CI runners. But since it crashes without a stack trace, and obviously before GC stats can be printed, I am not sure (if someone has a hint on how to figure that out, I am all ears). Of course, none of this changes the fact that for Oscar we have similar crashes consistently with both Julia nightly and 1.10.
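To make the conjecture above concrete, here is a toy model in Python. It is only a sketch of the behaviour described in the comment (a heap target that doubles when live memory exceeds it and never shrinks). It is not Julia's actual GC heuristic, and it treats exceeding the RAM limit as the failure point, which is itself part of the conjecture. The function name, the 1 GiB starting target, the live-memory sequence, and the 14 GiB limit are all made up for illustration.

```python
GIB = 1024 ** 3


def simulate_heap_target(initial_target, live_sizes, ram_limit):
    """Toy model of a grow-only heap target (hypothetical, not Julia's GC).

    The target doubles until it covers the current live memory and is never
    reduced afterwards. Returns the target after each step and the index of
    the first step at which the target exceeds ram_limit (None if it never does).
    """
    target = initial_target
    history = []
    for step, live in enumerate(live_sizes):
        while live > target:
            target *= 2  # exponential growth, as conjectured
        history.append(target)
        if target > ram_limit:
            return history, step
    return history, None


# Live memory creeps up and briefly exceeds 8 GiB by a single byte.
live = [1 * GIB, 3 * GIB, 5 * GIB, 8 * GIB + 1, 1 * GIB]
history, failed_at = simulate_heap_target(1 * GIB, live, ram_limit=14 * GIB)
print([t // GIB for t in history], failed_at)  # [1, 4, 8, 16] 3
```

In this model, one byte over 8 GiB is enough to push the target to 16 GiB, past the assumed limit, even though live memory would have dropped back to 1 GiB on the next step.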
What's the status of this?
The CI for Oscar.jl on 1.10 was, afaict, fixed by the GC revert; I haven't noticed any OOM crashes for quite a while (I think since even before GitHub changed the Linux runners to have 14 GB of RAM). Regarding nightly: we couldn't test on nightly for quite a while because this requires a working CxxWrap and various related binaries. But right now nightly seems to be mostly stable. There are a few crashes from time to time that we are still investigating, but these are definitely not OOM crashes. We also worked on improving the test suite on our side by splitting it into two CI jobs, which also helps in avoiding OOM issues. So in summary this seems resolved, thanks to some combination of improved Julia code, larger runners, and a split-up test suite. Regarding AbstractAlgebra, I think the CI also looks good on 1.10 and nightly; maybe @thofma can say more.
I have not encountered any GC problems for a while now. Both 1.10 and nightly are looking good.
Our test suite has started to crash more and more frequently, and now almost constantly, with the latest Julia nightly and 1.10 updates.
It seems we get OOM crashes, but it's hard to say because there are no stack traces, just messages like this (if there is a way to get a backtrace here, that would be super helpful):
We have collected a lot more data in oscar-system/Oscar.jl#2441, but we have no MWE, as it is difficult to trigger this locally; it "helps" that the CI runners on GitHub have only a few GB of RAM.
There is also something weird going on with some of the statistics; note the crazy heap_target