Deterministic query cycles for parallel front-end #149849
zetanumbers wants to merge 6 commits into rust-lang:main
These commits modify the If this was unintentional then you should revert the changes before this PR is merged.
Hi, first of all, thank you for your work on this. I can see a lot has been done to address the issue!
I can't speak to the entire implementation since I don't know much about this part, but one note on this particular file: the logic seems a bit unclear to me due to the magic numbers and bit manipulations. Even though the file isn't large, adding a few explanatory comments would be a big help for clarity.
Also, we'll need a regression test to check it doesn't ICE with these changes
How about now? I also moved it to tree_node_index.rs.
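As an illustration of the kind of commented bit manipulation the review asks for, here is a minimal sketch of one way to pack a path through a binary tree of `join` tasks into a `u64`. This is an assumption for illustration only, not the PR's actual `TreeNodeIndex`: the name `PathIndex`, its constants, and the encoding (branch bits from the top bit downward, followed by a single end-marker bit) are all hypothetical.

```rust
/// Hypothetical path index into a binary tree of `join` tasks.
/// Layout: branch bits (0 = left, 1 = right) start at the top bit,
/// followed by one end-marker bit set to 1; all lower bits are 0.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct PathIndex(u64);

impl PathIndex {
    /// One bit is reserved for the end marker, so 63 branch bits remain.
    const MAX_DEPTH: u32 = u64::BITS - 1;

    /// The root has an empty path: only the end marker, in the top bit.
    const ROOT: PathIndex = PathIndex(1 << Self::MAX_DEPTH);

    /// Number of branch decisions stored above the end marker.
    fn depth(self) -> u32 {
        Self::MAX_DEPTH - self.0.trailing_zeros()
    }

    /// Index of the left (`right == false`) or right child,
    /// or `None` once all 63 branch bits are in use.
    fn child(self, right: bool) -> Option<PathIndex> {
        // The end marker is the lowest set bit.
        let marker = self.0 & self.0.wrapping_neg();
        if marker == 1 {
            return None; // no room left to move the marker down
        }
        // Remove the marker, write the branch bit in its place,
        // then put the marker one position lower.
        let path = self.0 ^ marker;
        let branch = if right { marker } else { 0 };
        Some(PathIndex(path | branch | (marker >> 1)))
    }
}
```

With this encoding the derived `Ord` sorts nodes by in-order position: everything in a left subtree compares less than its parent, and everything in a right subtree compares greater, so the minimum of a set of indices is its "leftmost" member.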
@bors try @rust-timer queue
Ok this is surprising
Finished benchmarking commit (306a768): comparison URL.
Overall result: ❌ regressions - please read the text below
Benchmarking this pull request means it may be perf-sensitive - we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never
Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (secondary -1.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (primary 2.2%, secondary 2.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 471.42s -> 474.532s (0.66%)
It's a bit unclear to me what source of non-determinism this tries to fix. Does this alter parallel execution in any way other than picking which query in the cycle to use for resumption? It looks to me like you're trying to pick the point in the cycle to break which corresponds to a single-threaded execution, but I don't think that point is guaranteed to be resumable. Currently it looks like we're not deterministically picking which query in a cycle to resume when multiple are present. That's fairly simple to improve, though I think the set of queries available for resumption is non-deterministic to start with.
I make an assumption that when we get a query cycle there is the "point in the cycle to break which corresponds to a single threaded execution" which currently
That's what I am trying to make deterministic.
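To make the distinction in this exchange concrete, here is a minimal sketch of choosing a resumption candidate by a stable key rather than by the order in which threads happened to block. It only addresses selection among a given set of candidates; it does not address the concern that the set itself may differ between runs. The `Waiter` type, its fields, and `pick_resumable` are hypothetical names, not rustc APIs.

```rust
/// Hypothetical description of a blocked query inside a cycle.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Waiter {
    /// Assumed stable key, e.g. the query's position in a
    /// sequential (single-threaded) execution order.
    stable_key: u64,
    /// Thread that happens to be blocked; may vary from run to run.
    thread_id: usize,
    /// Query name, for display only.
    query: &'static str,
}

/// Picks the waiter with the smallest stable key, so the result does not
/// depend on the order in which waiters were collected.
/// Assumes stable keys are unique among the candidates.
fn pick_resumable(waiters: &[Waiter]) -> Option<&Waiter> {
    waiters.iter().min_by_key(|w| w.stable_key)
}
```

Calling `pick_resumable` on the same candidates in two different collection orders returns the same waiter, which is the property the selection step needs.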
Force-pushed from 0a5000f to ba45f0b
Force-pushed from ba45f0b to 206fec5
Some changes occurred in compiler/rustc_codegen_cranelift cc @bjorn3
Force-pushed from 0485e5b to 7a227c8
Already? Are you freaking serious?
Force-pushed from c09bb0b to cf1bd6f
cc @Zoxc
Say we have a query cycle
And a program:
On a single thread we take the left task and end up resuming
Now consider a parallel case: Thread 2 runs the right side first so we get the stack:
Then Thread 1 runs the left side and gets the following stack:
All threads are waiting so we enter the deadlock handler, and to match the single-threaded case we need to resume
So we do the next most deterministic thing and resume thread 1, calling the cycle handler for
We can guarantee some thread waits on
We can do that because any query call in a cycle outputs a value that is ill-defined from the compiler's standpoint and is not depended on until the compiler aborts or exits, except for representability, layout_of and type_of_opaque, for which I assume the output is consistent. At the time of writing this code I think I ran the query cycle tests with multi-threading enabled using a hack, but after you mentioned it I have to add
Representability passes through
type_of_opaque seems to boil down to a call from either
The layout_of query seems to propagate the cycle error too, although new cycles could also be introduced by the multi-threaded front-end. If those cycles are disjoint, then I assume they are independent and are valid cycles to report. Otherwise, as in your example, the right thread waiting on
This PR wouldn't solve #142063's family of bugs related to diagnostic inconsistencies, but in this instance it is solved due to
I should add all this explanation to the new cycle breaking code and to the query cycle handler code too.
Either way, I don't see what this code could possibly do worse when picking and resuming queries, although I might not know whether the old query cycle breaking code does anything special aside from... breaking those cycles with no preference.
Force-pushed from cf1bd6f to cbfbeca
Force-pushed from c1b5dff to f9c6527
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.
I've sliced these commits as granularly as I ever have.
The mechanism is similar to cycle detection in single-threaded mode. We traverse the deadlocked query graph from the top active query downwards to subqueries until we visit some query a second time, thus finding a cycle. With the multi-threaded front-end enabled, one query may now have more than one active subquery, i.e. we used one of the parallel interfaces parallel!, join, par_for_each, etc. As such, we have to traverse the "leftmost" active subquery to recover the sequential behavior of these parallel interfaces in single-threaded mode. The new TreeNodeIndex saves implicit context information about which join (or scope) task we entered while executing a query, which we then use in break_query_cycle.
However, we then have to guarantee that the query stack from single-threaded mode is included in the active query graph. This is true for the join function, as its first task will be completed on the same thread, and the same will be tried for the second task unless it is stolen, which is fine for us. scope places tasks in a local queue and pops them in LIFO manner, while other worker threads can only steal from that queue in FIFO manner, thus we can guarantee the next task is either stolen or available for execution.
Fixes #142064
Fixes #142063
Fixes #127971
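The traversal described in the PR description can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in break_query_cycle: queries are plain integer IDs, each query's active subqueries are assumed to be already ordered left to right by the parallel task they came from, and `find_leftmost_cycle` is a hypothetical helper name.

```rust
use std::collections::HashMap;

/// Hypothetical query identifier for this sketch.
type QueryId = u32;

/// Walks from `root`, always following the leftmost active subquery,
/// until some query is visited a second time. Returns the queries that
/// form the cycle in visiting order, or `None` if the walk reaches a
/// query with no active subquery.
fn find_leftmost_cycle(
    active: &HashMap<QueryId, Vec<QueryId>>,
    root: QueryId,
) -> Option<Vec<QueryId>> {
    let mut stack: Vec<QueryId> = Vec::new();
    let mut current = root;
    loop {
        // Seeing `current` again means the tail of the stack is a cycle.
        if let Some(pos) = stack.iter().position(|&q| q == current) {
            return Some(stack[pos..].to_vec());
        }
        stack.push(current);
        // Subqueries are assumed sorted left to right; take the first one.
        current = *active.get(&current)?.first()?;
    }
}
```

If query 1 has active subqueries [2, 5], with 2 -> 3 -> 2 on the left and 5 -> 1 on the right, the walk reports the cycle [2, 3] regardless of which thread reached its subquery first, which mirrors what a single-threaded run would encounter.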
UPDATE: commits are sliced to the finest detail
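The LIFO/FIFO queue discipline mentioned in the description can be modelled with a small single-threaded sketch. It assumes nothing about the actual thread pool implementation; `LocalQueue` and its methods are hypothetical names, and a real work-stealing deque is a concurrent data structure rather than a `VecDeque`.

```rust
use std::collections::VecDeque;

/// Single-threaded model of a worker's local task queue.
struct LocalQueue<T> {
    tasks: VecDeque<T>,
}

impl<T> LocalQueue<T> {
    fn new() -> Self {
        Self { tasks: VecDeque::new() }
    }

    /// The owning worker pushes new tasks at the back.
    fn push(&mut self, task: T) {
        self.tasks.push_back(task);
    }

    /// The owning worker pops from the back: last in, first out.
    fn pop(&mut self) -> Option<T> {
        self.tasks.pop_back()
    }

    /// Other workers steal from the front: first in, first out.
    fn steal(&mut self) -> Option<T> {
        self.tasks.pop_front()
    }
}
```

After pushing "a", "b", "c", a thief takes "a" while the owner pops "c" and then "b". In this model a task only ever leaves the queue by being stolen from the front or popped by the owner from the back.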