Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

invalidate stale regions for Mpp query. #1859

Merged
merged 12 commits into from
May 8, 2021
Merged

Conversation

hanfei1991
Copy link
Member

@hanfei1991 hanfei1991 commented May 3, 2021

What problem does this PR solve?

Problem Summary:
issue: #1873
Right now, we won't update the stale regions during executing Mpp queries. Then we added the try regions field in MppDispatchTask reponse and try to invalidate the stale regions on TiDB side.

What is changed and how it works?

Related changes

pingcap/kvproto#751

Check List

Tests

  • Manual test (add detailed scripts or steps below)
    • Insert data continously
    • observe that tiflash is retrying regions. In my case, the log is like ["DAGQueryBlockInterpreter: Start to retry 1 regions (10936,)"]
    • observe log in tidb, find ["invalid region because tiflash detected stale region"] ["region id"="{ region id: 10936, ver: 116, confVer: 2 }"] proving this function works.

Release note

  • invalidate stale regions for Mpp query.

@leiysky
Copy link
Contributor

leiysky commented May 5, 2021

Should we add this for batch coprocessor either?

@hanfei1991
Copy link
Member Author

Should we add this for batch coprocessor either?

yeah, I plan to add it later

@@ -239,6 +239,9 @@ void DAGQueryBlockInterpreter::executeTS(const tipb::TableScan & ts, Pipeline &
// For those regions which are not presented in this tiflash node, we will try to fetch streams by key ranges from other tiflash nodes, only happens in batch cop mode.
if (!region_retry.empty())
{
for (auto it: region_retry) {
retry_regions.push_back(it.second);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just put the retry regions into DAGContext.retry_regions directly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The region_retry include two kinds of regions:

  1. region with epoch error
  2. region with lock error
    do we need to return all of them or just return the ones with epoch error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we needn't. Is there an easy way to pick all the stale ones?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid there is no easy way since region_retry is only a set and does not keep the retry reason, you can refine it to record the retry reason alongside with the region id if the cost of invalidate a region everytime when it meet lock exception is unacceptable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the cost is too high. How about keeping it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

@hanfei1991
Copy link
Member Author

./build

@@ -239,6 +239,9 @@ void DAGQueryBlockInterpreter::executeTS(const tipb::TableScan & ts, Pipeline &
// For those regions which are not presented in this tiflash node, we will try to fetch streams by key ranges from other tiflash nodes, only happens in batch cop mode.
if (!region_retry.empty())
{
for (auto it: region_retry) {
retry_regions.push_back(it.second);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The region_retry include two kinds of regions:

  1. region with epoch error
  2. region with lock error
    do we need to return all of them or just return the ones with epoch error?

if (dag_inter != nullptr)
{
context.getQueryContext().getDAGContext()->retry_regions = dag_inter->retry_regions;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the retry regions are already put into DAGContext.retry_regions, why this piece of code is needed?

@@ -248,6 +248,10 @@ void DAGQueryBlockInterpreter::executeTS(const tipb::TableScan & ts, Pipeline &
// For those regions which are not presented in this tiflash node, we will try to fetch streams by key ranges from other tiflash nodes, only happens in batch cop mode.
if (!region_retry.empty())
{
for (auto it: region_retry) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should format the code

Copy link
Contributor

@windtalker windtalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label May 7, 2021
@hanfei1991
Copy link
Member Author

/run-all-tests

@hanfei1991 hanfei1991 added the needs-cherry-pick-release-5.0 PR which needs to be cherry-picked to release-5.0 label May 7, 2021
@hanfei1991
Copy link
Member Author

/run-all-tests

@windtalker
Copy link
Contributor

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label May 7, 2021
@ti-srebot
Copy link
Collaborator

/run-all-tests

1 similar comment
@hanfei1991
Copy link
Member Author

/run-all-tests

@hanfei1991 hanfei1991 merged commit 2956eaf into master May 8, 2021
@hanfei1991 hanfei1991 deleted the hanfei/relay-region-err branch May 8, 2021 03:42
@ti-srebot
Copy link
Collaborator

cherry pick to release-5.0 in PR #1877

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-cherry-pick-release-5.0 PR which needs to be cherry-picked to release-5.0 status/can-merge Indicates a PR has been approved by a committer. status/LGT1 Indicates that a PR has LGTM 1.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants