-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix synchronization issues on the GC scheduler. #53355
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this would be quite correct.
I'm afraid this might also be vulnerable to the issue described in #53350.
In particular, note that's possible for a thread to be context switched out in this line, and whenever it continues execution, a large enough number of cycles may have been passed and the master thread may have already set gc_all_tls_states = NULL
here, so that when the straggler tries to access gc_all_tls_states
in gc_count_work_in_queue
it will just read NULL
.
If the master thread has exited, we can never reach that part of the code, the value has to be 0 in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm. Note for the commit message we think we fixed an extra bug in sweep also
Segfault here seems related to the PR. |
The segfault in question happened because |
We maintain a side byte-map that holds information about the state of the page (GC_PAGE_ALLOCATED , GC_PAGE_LAZILY_FREED, etc). If page_metadata is returning NULL, it likely means that the byte in the byte-map for that given page is set to something other than GC_PAGE_ALLOCATED. I.e. we may never have allocated this page or may have freed it prematurely. |
The part which removes the fast path in which we check We should backport it. |
@@ -1676,7 +1677,7 @@ void gc_sweep_wake_all(jl_ptls_t ptls, jl_gc_padded_page_stack_t *new_gc_allocd_ | |||
void gc_sweep_wait_for_all(void) | |||
{ | |||
jl_atomic_store(&gc_allocd_scratch, NULL); | |||
while (jl_atomic_load_relaxed(&gc_n_threads_sweeping) != 0) { | |||
while (jl_atomic_load_acquire(&gc_n_threads_sweeping) != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@d-netto We also changed this just because of reading the code but not from an observed failure. This acquires the writes on the mutator threads, so that the next stage of using gc_allocd_scratch
here will acquire all of previous writes from the threads that did a write-release of gc_n_threads_sweeping. I don't think this would likely effect TSO systems, but we theorized that on a weaker ordering system, those last remaining writes might not have become visible yet on this thread, leading to a slightly stale free-page list pointer, resulting in some dropped pages from future sweeping.
Merge? |
Is this, or can it be made into a separate commit? |
Shouldn't all changes to |
This aims to slightly simplify the synchronization by making `n_threads_marking` the sole memory location of relevance for it, it also removes the fast path, because being protected by the lock is quite important so that the observed gc state arrays are valid. Fixes: #53350 Fixes: #52757 Maybe fixes: #53026 Co-authored-by: Jameson Nash <vtjnash@gmail.com> (cherry picked from commit a96726b)
Backported PRs: - [x] #53361 <!-- 🤖 [master] Bump the SparseArrays stdlib from c9f7293 to cb602d7 --> - [x] #53300 <!-- allow external absint to hold custom data in `codeinst.inferred` --> - [x] #53342 <!-- Add `Base.wrap` to docs --> - [x] #53372 <!-- Silence warnings in `test/file.jl` --> - [x] #53357 <!-- 🤖 [master] Bump the Pkg stdlib from 6dd0e7c9e to 76070d295 --> - [x] #53373 <!-- fix sysimage-native-code=no option with pkgimages --> - [x] #53333 <!-- More consistent return value for annotations, take 2 --> - [x] #53354 <!-- fix code coverage bug in tail position and `else` --> - [x] #53407 <!-- fix sysimage-native-code=yes option --> - [x] #53388 <!-- Fix documentation: thread pool of main thread --> - [x] #53355 <!-- Fix synchronization issues on the GC scheduler. --> - [x] #53429 <!-- Subtype: skip slow-path in `local_∀_∃_subtype` if inputs contain no ∃ typevar. --> - [x] #53437 <!-- Add debug variant of loader_trampolines.o --> - [x] #53284 <!-- Add annotate! method for AnnotatedIOBuffer --> - [x] #53466 <!-- [MozillaCACerts_jll] Update to v2023-12-12 --> - [x] #53467 <!-- [LibGit2_jll] Update to v1.7.2 --> - [x] #53326 <!-- RFC: when loading code for internal purposes, load stdlib files directly, bypassing DEPOT_PATH, LOAD_PATH, and stale checks --> - [x] #53332 - [x] #53320 <!-- Add `Sys.isreadable, Sys.iswriteable`, update `ispath` --> - [x] #53476 Contains multiple commits, manual intervention needed: - [ ] #53285 <!-- Add update mechanism for Terminfo, and common user-alias data --> Non-merged PRs with backport label: - [ ] #53424 <!-- yet more atomics & cache-line fixes on work-stealing queue --> - [ ] #53408 <!-- task splitting: change additive accumulation to multiplicative --> - [ ] #53403 <!-- Move parallel precompilation to Base --> - [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` --> - [ ] #53391 <!-- Default to the medium code model in x86 linux --> - [ ] #53125 <!-- coverage: count coverage where explicitly requested by inference only --> - [ ] #52694 <!-- Reinstate similar for AbstractQ for backward compatibility -->
This aims to slightly simplify the synchronization by making `n_threads_marking` the sole memory location of relevance for it, it also removes the fast path, because being protected by the lock is quite important so that the observed gc state arrays are valid. Fixes: JuliaLang#53350 Fixes: JuliaLang#52757 Maybe fixes: JuliaLang#53026 Co-authored-by: Jameson Nash <vtjnash@gmail.com>
This aims to slightly simplify the synchronization by making `n_threads_marking` the sole memory location of relevance for it, it also removes the fast path, because being protected by the lock is quite important so that the observed gc state arrays are valid. Fixes: JuliaLang#53350 Fixes: JuliaLang#52757 Maybe fixes: JuliaLang#53026 Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Cherry-pick the parts of #53355 which are relevant to the 1.10 GC.
Cherry-pick the parts of JuliaLang#53355 which are relevant to the 1.10 GC.
Cherry-pick the parts of JuliaLang#53355 which are relevant to the 1.10 GC.
Cherry-pick the parts of JuliaLang#53355 which are relevant to the 1.10 GC.
Cherry-pick the parts of JuliaLang#53355 which are relevant to the 1.10 GC.
Cherry-pick the parts of JuliaLang#53355 which are relevant to the 1.10 GC.
Me and @vtjnash stared at this code for a while and it seems that making
n_threads_marking
the sole synchronization is the way forward, and to remove the fast path because being protected by the lock is quite important