-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
High (default) gcthreads count causing slowdown on Julia 1.10.0-beta2 #51044
Comments
Can't reproduce:
|
The results in the post above were measured using a smaller number of threads than the number of cores reported in versioninfo. Setting the number of GC threads to something higher than the number of virtual cores seems to cause a regression:
|
This problem persists on 1.10.0-rc1. This is a Linux box with 2 x AMD EPYC 7371.
does not print anything as the the Julia process spins at 3200 %CPU. Without
|
@KristofferC I think this was removed from milestone 1.10 prematurely. |
Re-adding to the milestone because this regressing is visible with Perhaps adjusting the default number of gc threads would be a reasonable hotfix for the 1.10 regression? |
Just limit the default number of gc threads to four? Or to 8? |
FWIW, I run https://github.com/JuliaCI/GCBenchmarks on Julia 1.10.0-rc1
|
One more benchmark run with 64 threads
|
What is the way forward? Just limit the default number of gcthreads to 4 or 8, or something else? |
We're capping the number of GC threads to number of cores. X-ref: #52192. |
On my machine |
This is not sufficient. At least not for the DEFAULT number of threads. If a user explicitly specifies a higher number than 4 or 8 that might be OK, but not as default. |
Solving this issue would only require a few lines of code, for example like this:
If we would agree that this is the solution I could write a pull request. |
|
Capping it to 8 might be the "quick fix" for 1.10. I don't think we have seen any CPU where that would be problematic. |
Well, on this MacBook it probably does not make any sense to use multi-threaded garbage collection, so zero is correct... |
So current suggestion:
|
https://github.com/JuliaLang/julia/pull/52192/files would be slightly tweaked yes. |
Well, you cannot tweak a commit, you can only create a new commit/ pull request, correct? |
@KristofferC Can you review my pull request? |
As mentioned above, I was able to reproduce a slowdown (on both master and 1.10) when oversubscribing my machine. For this scenario, #52294 seems to help:
Unfortunately I don't have easy access to an AMD EPYC at the moment, so not sure if it fixes the issue above. We've also backported #52294 at RAI and run our CI and a few benchmarks internally: it seems fine from a correctness and performance point of view. Backporting the GC scheduler PR above to 1.10 could be an option, but obviously it has its own share of risks. |
Isn't it best to merge that then, and backport to 1.10, to not have it different from 1.11? And NOT cap at 8, was that a hack that should then not be needed? We want aggressive good defaults for 1.10 and could add a cap in a point release if needed. Anyone could also do it manually (e.g. those on unusual hardware; two-socket), with CLI option. It seems worse for me to cap too much (and otherwise have different from 1.11) and then later to relax in a point-release. [You can allocate in all your threads, and it seems to me same amount of GC threads (or more?!) would be useful to not fall behind many allocating threads. I'm skeptical of 1/2 let alone 1/4th the amount of GC threads. What do other languages do?] |
For me Julia (sometimes) crashes when I use PyPlot together with too many GC threads... So I prefer a lower GC thread count... I prefer safe defaults. |
This is derailing the discussion of this issue. If Julia is crashing with GC threads, then that's a serious correctness bug and we should have a separate issue for that. I'd encourage you to open an issue and share a reproducer so that we can take a look. |
I tried to find the artifact of #52294, but I could not figure out the path. I could test the PR preferably in binary format. This is not it |
@d-netto Can we merge #52294 ASAP for 1.11/master, is it likely to make the "crash" go away? I'm not sure why but I see it has "This pull request may be merged without approvals."
I also prefer safe, non-crashing, if you meant that as in segfault (as opposed to OOM, which is also annoying, not a "crash", not what you're experiencing?). I want safe with or without -t auto, but at least it's not the default, and then no GC threads, so you will be safe by default. It's just that I think capping GC threads at 8 or any value would hide a bug, i.e. make it less likely, but still non-zero probability. Then it might actually be better to have it more frequent... @ufechner7 I don't rule out merging your capping PR but I think we should first try his, you and others test 1.11, to see if it gives confidence that it can be backported to 1.10, then your PR could be the alternative. Since this is the last open issue for 1.10, then I think we should move quickly on this, maybe even merge to 1.10 for testing? FYI: This crash #52256 was removed from 1.10 milestone, 3 days ago. I'm not sure what to read into that, since it's till open, is it just not considered a blocker, or was the GC incorrectly implicated (even if the GC not the reason, should it still be a blocker)? And one hour ago this (1.10) crash issue #52363 opened, not yet on the milestone. Also FYI, I DID suggest -t auto be made the default, but despite many using it, e.g. I believe Pluto by default and VS Code, it was suggested unsafe, not thread-safe, and I realized it never will be/made default. So maybe it's just not supported option in some sense, though I would like it not to get worse between versions... |
Even though this crash manifests as a GC corruption/segfault, it seems to be stemming from a bug in the LLVM compiler. |
The latest nightlies at https://julialang.org/downloads/nightlies/ now include #52294, which merged today |
It works for me!
|
Can this be closed now, or only after #52294 is back-ported? |
Rationale for closing seems fair. Someone can reopen if not. |
Just showing
versioninfo()
takes 55 s (varies from run to run)but then setting
--threads auto
takes more than I could waitSetting
--threads auto --gcthreads 1
seems to be fine.The text was updated successfully, but these errors were encountered: