Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abort in deadlock handler if we fail to get a query map #138581

Merged
merged 3 commits into from
Mar 25, 2025

Conversation

Zoxc
Copy link
Contributor

@Zoxc Zoxc commented Mar 16, 2025

Resolving query cycles requires the complete active query map, or it may miss query cycles. We did not check that the map is completely constructed before. If there is some error collecting the map, something has gone wrong already. This adds a check to abort/panic if we fail to construct the complete map.

This can help differentiate errors from the deadlock detected case if constructing query map has errors in practice.

An Option is not used for collect_active_jobs as the panic handler can still make use of a partial map.

@rustbot
Copy link
Collaborator

rustbot commented Mar 16, 2025

r? @petrochenkov

rustbot has assigned @petrochenkov.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 16, 2025
@petrochenkov
Copy link
Contributor

Is anybody else familiar with this logic?
The changes look reasonable but I do not understand the details.

@Zoxc
Copy link
Contributor Author

Zoxc commented Mar 17, 2025

@oli-obk looked at my last PR in this area at least.

if !complete {
eprintln!("internal compiler error: failed to get query map in deadlock handler, aborting process");
// We need to abort here as we failed to resolve the deadlock,
// otherwise the compiler could just hang,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does failing to collect_active_jobs mean that we failed to resolve the deadlock?
Why the compiler can hang in this case?

Copy link
Member

@SparrowLii SparrowLii Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deadlock detection depends on obtaining all the unfinished query tasks in the state and collecting them into one QueryMap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deadlock detection depends on obtaining all the unfinished query tasks in the state and collecting them into one QueryMap.

@Zoxc Could you add this as a comment to code (preferably with a brief explanation of why it depends on obtaining all the tasks).

let error =
try_execute.find_cycle_in_stack(qcx.collect_active_jobs(), &qcx.current_query_job(), span);
let (query_map, complete) = qcx.collect_active_jobs();
assert!(complete, "failed to collect active queries");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is incomplete map insufficient here?
What can happen if we try to find_cycle_in_stack with an incomplete map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will just panic if the relevant queries are missing. It's useful to differentiate the errors though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add this as a comment to code?

@petrochenkov
Copy link
Contributor

cc @nnethercote @zetanumbers

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 17, 2025
@nnethercote
Copy link
Contributor

@SparrowLii might be the best person to review?

I think a better description of the motivation and change would help. The PR description says the following:

Resolving query cycles requires the complete active query map. If for some reason it's incomplete, we abort instead.

An Option is not used as the panic handler can still make use of a partial map.

I can't tell if that's describing the current state, or the change being made. It needs a clear description of the problem with the current code, and the change being made, and how that change improves things.

if !complete {
eprintln!("internal compiler error: failed to get query map in deadlock handler, aborting process");
// We need to abort here as we failed to resolve the deadlock,
// otherwise the compiler could just hang,
Copy link
Member

@SparrowLii SparrowLii Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deadlock detection depends on obtaining all the unfinished query tasks in the state and collecting them into one QueryMap.

@@ -693,7 +695,7 @@ macro_rules! define_queries {
}
}

pub(crate) fn try_collect_active_jobs<'tcx>(tcx: TyCtxt<'tcx>, qmap: &mut QueryMap) {
pub(crate) fn try_collect_active_jobs<'tcx>(tcx: TyCtxt<'tcx>, qmap: &mut QueryMap, complete: &mut bool) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just return bool here? I'd like to avoid having a function rely on extra mutable references.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That adds more complexity though.

@SparrowLii
Copy link
Member

SparrowLii commented Mar 18, 2025

@nnethercote AFAIK, in the previous design, try_collect_active_jobs would have returned an Option<QueryMap> for each query, and then merged all QueryMaps. The current design avoids the cost of merging, but loses the ability to check whether the collection is complete by Option, so the motivation for this PR looks reasonable.

@Zoxc Can you add a description to the PR so that people without the knowledge about try_collect_active_jobs can understand the history?

@petrochenkov
Copy link
Contributor

r? @SparrowLii

fn collect_active_jobs(self) -> QueryMap {
/// Returns a query map representing active query jobs and a bool being false
/// if there was an error constructing the map.
fn collect_active_jobs(self) -> (QueryMap, bool) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn collect_active_jobs(self) -> (QueryMap, bool) {
fn collect_active_jobs(self) -> Result<QueryMap, QueryMap> {

Might be more explicit. When not caring about whether the map is complete or not one can do:

let query_map = match qcx.collect_active_jobs() {
  Ok(complete_map) => complete_map,
  Err(incomplete_map) => incomplete_map,
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does feel a bit odd to me since QueryMap is not an error.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using QueryMap as an error does force callers of collect_active_jobs to check if it is incomplete or not, the bool can still easily be ignored by accident. Pretty sure i've seen this pattern of Result<T, T> in rustc and elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use Result<QueryMap, QueryMap>, we should at least do something about the incomplete QueryMap, but we doesn't. I've not had to deal with an incomplete QueryMap in practice.

@Zoxc Zoxc force-pushed the abort-handler-if-locked branch from e8ad42d to 34244c1 Compare March 21, 2025 07:26
@Zoxc
Copy link
Contributor Author

Zoxc commented Mar 22, 2025

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 22, 2025
Copy link
Member

@SparrowLii SparrowLii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@SparrowLii
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Mar 24, 2025

📌 Commit 34244c1 has been approved by SparrowLii

It is now in the queue for this repository.

@bors
Copy link
Contributor

bors commented Mar 24, 2025

🌲 The tree is currently closed for pull requests below priority 100. This pull request will be tested once the tree is reopened.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 24, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 25, 2025
…iaskrgr

Rollup of 8 pull requests

Successful merges:

 - rust-lang#135745 (Recognise new IPv6 non-global range from IETF RFC 9602)
 - rust-lang#137247 (cg_llvm: Reduce the visibility of types, modules and using declarations in `rustc_codegen_llvm`.)
 - rust-lang#138317 (privacy: Visit types and traits in impls in type privacy lints)
 - rust-lang#138581 (Abort in deadlock handler if we fail to get a query map)
 - rust-lang#138776 (coverage: Separate span-extraction from unexpansion)
 - rust-lang#138886 (Fix autofix for `self` and `self as …` in `unused_imports` lint)
 - rust-lang#138924 (Reduce `kw::Empty` usage, part 3)
 - rust-lang#138929 (Visitors track whether an assoc item is in a trait impl or an inherent impl)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 43297ff into rust-lang:master Mar 25, 2025
6 checks passed
@rustbot rustbot added this to the 1.87.0 milestone Mar 25, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Mar 25, 2025
Rollup merge of rust-lang#138581 - Zoxc:abort-handler-if-locked, r=SparrowLii

Abort in deadlock handler if we fail to get a query map

Resolving query cycles requires the complete active query map, or it may miss query cycles. We did not check that the map is completely constructed before. If there is some error collecting the map, something has gone wrong already. This adds a check to abort/panic if we fail to construct the complete map.

This can help differentiate errors from the `deadlock detected` case if constructing query map has errors in practice.

An `Option` is not used for `collect_active_jobs` as the panic handler can still make use of a partial map.
@Zoxc Zoxc deleted the abort-handler-if-locked branch March 26, 2025 01:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants