Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mpp hang error if some error happens during compile of mpp plan in TiFlash #1533

Merged
merged 12 commits into from
Mar 11, 2021

Conversation

windtalker
Copy link
Contributor

What problem does this PR solve?

Issue Number: close #1529

Problem Summary:

As the issue described.

What is changed and how it works?

Proposal: xxx

What's Changed:

How it Works:
The root cause is if a mpp task failed in DispatchMPPTask stage, TiFlash does not handle the error properly, which makes the query hangs. In this pr, error handle logical is added.

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM
  • Breaking backward compatibility

Release note

  • Fix mpp hang error if some error happens during compile of mpp plan in TiFlash

@windtalker windtalker added the type/bugfix This PR fixes a bug. label Mar 10, 2021
@windtalker
Copy link
Contributor Author

/run-all-tests

1 similar comment
@windtalker
Copy link
Contributor Author

/run-all-tests

Copy link
Contributor

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave inline comments. Generally a good exception handling.

}
catch (...)
{
LOG_WARNING(log, "Failed to finish all tunnels");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could leverage the follow method for logging the exception either. Otherwise we know only it failed but lost the reason.

https://github.com/pingcap/tics/blob/6e89bad624e4a9bd4255ee86684dceea6eeed153/dbms/src/Common/Exception.cpp#L59

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

}
catch (...)
{
LOG_ERROR(log, "Fail to handle error in while trying to handle error and clean task");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

@leiysky leiysky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

/// root task. Because for root task, if DispatchMPPTask fails, TiDB does
/// not sending establish MPP connection request at all, it is meaningless
/// to check the connect status in this case, just finish the tunnel.
void finish()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about use a more critical word like destroy?

Copy link
Contributor Author

@windtalker windtalker Mar 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will use close

@@ -279,6 +291,20 @@ struct MPPTask : std::enable_shared_from_this<MPPTask>, private boost::noncopyab

void cancel();

void finishAllTunnel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Mar 11, 2021
@windtalker
Copy link
Contributor Author

/run-all-tests

2 similar comments
@windtalker
Copy link
Contributor Author

/run-all-tests

@windtalker
Copy link
Contributor Author

/run-all-tests

Copy link
Member

@hanfei1991 hanfei1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Mar 11, 2021
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Mar 11, 2021
@windtalker windtalker added the status/can-merge Indicates a PR has been approved by a committer. label Mar 11, 2021
@ti-srebot
Copy link
Collaborator

/run-all-tests

@ti-srebot ti-srebot merged commit 4538877 into pingcap:master Mar 11, 2021
@windtalker windtalker deleted the mpp_hangs_for_complex_sql branch March 12, 2021 01:19
@leiysky leiysky linked an issue Mar 19, 2021 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/bugfix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The aggregation query hangs when mpp is on mpp query hangs if TiFlash meet some error for complex sql
5 participants