Use GC.enable_logging(true) at start of test suite #1432

Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master    #1432      +/-   ##
==========================================
- Coverage   84.69%   82.46%    -2.24%
==========================================
  Files         110      110
  Lines       29361    29316      -45
==========================================
- Hits        24868    24174     -694
- Misses       4493     5142     +649
```

See 57 files with indirect coverage changes.
Looking at the log of the failed Julia nightly run, we note that the last reported GC runs occur here:

This is up to line 2305 of the logs; then it proceeds to run a lot of tests without any GC, and then in line 2439ff:

So what we see is that the GC repeatedly doubles the heap target. In contrast, in the passing 1.10 job log the heap grows more slowly. E.g. the section corresponding to the one above does:

I.e. the heap target is just ~2.5 GB instead of ~8 GB. There are multiple further GC runs afterwards, before and after the place where the Julia nightly log reports a crash. The heap size fluctuates a bit afterwards -- notably it even decreases sometimes. See e.g. this excerpt:

which is from before the point where the Julia nightly run crashed. Now we can speculate about what is causing this. I am guessing some difference in the GC heuristics? But of course there could also be non-GC related compounding factors, e.g. perhaps something takes up more heap space on nightly than on 1.10/1.9 (but what)?
I would disagree that this is a pure nightly thing. In the last few runs I found two cases where nightly succeeded but 1.10 failed: https://github.com/Nemocas/AbstractAlgebra.jl/actions/runs/6165888371/job/16734460315 and https://github.com/Nemocas/AbstractAlgebra.jl/actions/runs/6165878185/job/16734433308
I see. In that case I'll restart the 1.10 job here a couple of times to see if it will fail eventually...
(No worries, the logs for older runs will still be available.)
Could you set the heap size hint to see if that helps?
@gbaraldi hmmm, is there a way to tell the GC about a heap size limit from within a running Julia session? Is there a way to get Julia to print what it thinks are the heap limits?
You can get and set the limits with ccalls instead:

```julia
jlmax     = @ccall jl_gc_get_max_memory()::UInt64
totalmem  = @ccall uv_get_total_memory()::UInt64
constrmem = @ccall uv_get_constrained_memory()::UInt64
println("Memory: max: ", Base.format_bytes(jlmax))
println("      total: ", Base.format_bytes(totalmem))
println("     constr: ", Base.format_bytes(constrmem))
memenv = parse(Int, get(ENV, "OSCARCI_MAX_MEM_GB", "5")) * 2^30
println("Setting heap size hint to ", Base.format_bytes(memenv))
@ccall jl_gc_set_max_memory(memenv::UInt64)::Cvoid
```

And then use the default of 5 GB or pass a different value via the OSCARCI_MAX_MEM_GB environment variable. The defaults are set like this:
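For reuse in a test suite, the same ccalls could be wrapped in small helpers, for instance (just a sketch; the function names below are made up and not part of any existing API):

```julia
# Sketch: wrappers around the ccalls shown above. Function names are
# hypothetical, not an existing API.
function print_memory_info()
    jlmax     = @ccall jl_gc_get_max_memory()::UInt64
    totalmem  = @ccall uv_get_total_memory()::UInt64
    constrmem = @ccall uv_get_constrained_memory()::UInt64
    println("GC max memory:      ", Base.format_bytes(jlmax))
    println("total memory:       ", Base.format_bytes(totalmem))
    println("constrained memory: ", Base.format_bytes(constrmem))
end

function set_ci_memory_limit(default_gb::Int = 5)
    # Read the limit (in GB) from OSCARCI_MAX_MEM_GB, falling back to default_gb.
    memenv = parse(Int, get(ENV, "OSCARCI_MAX_MEM_GB", string(default_gb))) * 2^30
    println("Setting max heap size to ", Base.format_bytes(memenv))
    @ccall jl_gc_set_max_memory(memenv::UInt64)::Cvoid
end
```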
OK, thanks @benlorenz, I've now added this to the PR; let's see.
With the 5 GB limit 1.10 passes just fine, but nightly is stuck with a full heap, collecting tiny amounts of memory in never-ending collections:
That suggests memory usage has grown considerably from Julia 1.10 to nightly/master. It would be good to find out why; perhaps we should file a separate JuliaLang issue for that?
Yeah, this doesn't look like a GC issue but a GC symptom ;). If you can minimize something that reproduces the regression, or at least bisect it, that would be awesome!
I tried to look at the memory issue, but right now this is rather difficult because of the random nature of many testsets. This would easily explain why it sometimes fails, but it doesn't really explain why it happens more often with 1.10 and 1.11. Edit: For a similar crash with Julia 1.9 see for example: https://github.com/Nemocas/AbstractAlgebra.jl/actions/runs/6275271908/job/17042406344#step:6:1289
@gbaraldi I am not so sure I agree that "this doesn't look like a GC issue but a GC symptom", at least not entirely. After all, we also saw/see crashes in 1.10 without the 5 GB heap limit. We can of course manually set such a limit, but it seems like a GC bug that it tries to exceed the available RAM... Case in point, here is the output of the debug code given to me by @benlorenz, which I inserted. In Julia 1.9:
In Julia 1.10 and nightly:
Note how in 1.9 it uses ~4 GB as "max", while in 1.10 and nightly it uses 2 petabytes. Is that intentional?
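If CI jobs really end up with such an effectively unbounded limit, a cheap sanity check in the test setup could flag it (a sketch built from the ccalls above; the check is simply "more than the machine has"):

```julia
# Sketch: warn if the GC's max-memory setting exceeds the RAM that is
# actually available to the job.
jlmax    = @ccall jl_gc_get_max_memory()::UInt64
totalmem = @ccall uv_get_total_memory()::UInt64
if jlmax > totalmem
    @warn "GC max memory ($(Base.format_bytes(jlmax))) exceeds total RAM ($(Base.format_bytes(totalmem)))"
end
```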
Perhaps we should just merge this (possibly with the max heap adjustment disabled): getting this GC output would possibly be helpful in debugging further CI failures. Thoughts?
I think it would be good to have this by default, as it produces more runs with the debug output to compare against. (And usually it isn't that much output, unless the GC is thrashing and about to die anyway)
And I would change the jl_gc_set_max_memory call to be off for now; I think this needs more testing.
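One way to keep the code around but have it off by default could look like this (a sketch; the OSCARCI_SET_MAX_MEM flag name is invented for illustration):

```julia
# Sketch: only touch the GC max-memory setting when explicitly requested.
# The OSCARCI_SET_MAX_MEM flag is hypothetical.
if get(ENV, "OSCARCI_SET_MAX_MEM", "false") == "true"
    memenv = parse(Int, get(ENV, "OSCARCI_MAX_MEM_GB", "5")) * 2^30
    println("Setting max heap size to ", Base.format_bytes(memenv))
    @ccall jl_gc_set_max_memory(memenv::UInt64)::Cvoid
end
```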
- call `GC.enable_logging(true)` at start of test suite
- print information about what the GC thinks is the available memory

We've been seeing crashes with Julia 1.10 and nightly, which may be GC related. Enabling this logging may reveal whether abnormal GC behavior is involved.
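For concreteness, the change described above might look roughly like this at the top of the test driver (a sketch; the file, e.g. test/runtests.jl, and the exact placement are assumptions):

```julia
# Sketch: enable per-collection GC logging and print the GC's view of
# available memory before running the tests.
GC.enable_logging(true)

jlmax     = @ccall jl_gc_get_max_memory()::UInt64
totalmem  = @ccall uv_get_total_memory()::UInt64
constrmem = @ccall uv_get_constrained_memory()::UInt64
println("GC max memory: ", Base.format_bytes(jlmax),
        ", total: ", Base.format_bytes(totalmem),
        ", constrained: ", Base.format_bytes(constrmem))
```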
I've disabled the call to jl_gc_set_max_memory for now. I'll wait a bit to see if @thofma has concerns.
Looks good. I agree that we should not mess with the jl_gc_set_max_memory setting for now.
CC @thofma @benlorenz