backupccl: RESTORE slowing at end #23509

Closed · maddyblue opened this issue Mar 6, 2018 · 11 comments

Labels: A-disaster-recovery, C-performance (Perf of queries or internals. Solution not expected to change functional behavior.), T-disaster-recovery

maddyblue (Contributor) commented Mar 6, 2018

During a TPCC restore, we noticed that the speed of the restore of a table slows way down near the end. Looking at a goroutine dump, a single node (out of a 24 node cluster) had hundreds of goroutines in beginLimitedRequest, which throttles import requests to one at a time.
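For illustration, here is a minimal stand-alone sketch of this kind of one-at-a-time limiter (names and structure are made up; this is not the actual beginLimitedRequest code). With a capacity of 1, every concurrent Import request on a node queues behind the single in-flight request, which is exactly the pile-up the goroutine dump showed.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// limiter is a counting semaphore that admits at most `capacity` requests
// at once. Illustrative sketch only, not the CockroachDB implementation.
type limiter struct {
	sem chan struct{}
}

func newLimiter(capacity int) *limiter {
	return &limiter{sem: make(chan struct{}, capacity)}
}

// begin blocks until a slot is free or the context is canceled.
func (l *limiter) begin(ctx context.Context) error {
	select {
	case l.sem <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// end releases the slot acquired by begin.
func (l *limiter) end() { <-l.sem }

func main() {
	// With capacity 1, concurrent import requests on one node serialize
	// behind each other; hundreds of callers means hundreds of blocked
	// goroutines, as seen in the goroutine dump.
	lim := newLimiter(1)
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if err := lim.begin(context.Background()); err != nil {
				return
			}
			defer lim.end()
			fmt.Println("processing import request", i)
		}(i)
	}
	wg.Wait()
}
```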

The next question was why that node (node 9) had so many requests after a scatter had run. From a range dump of the new table, we built a histogram of the number of leader leases per node:

[image: histogram of leader leases per node for the new table]

The histogram indeed shows node 9 holding a much higher number of leader leases than the other nodes.

Our guess is that scatter is indeed scattering, but since various restores have already happened, it may have no choice but to put the new ranges onto a few underfull nodes, which eventually makes the restore take a long time on those nodes. We don't think there's anything specific to do about this scatter problem at the moment.

However, we do think it's worth removing or raising the limit in beginLimitedRequest, since we have other limits during AddSSTable that may make the beginLimitedRequest throttling unnecessary.

Epic CRDB-6406

maddyblue added this to the 2.1 milestone Mar 6, 2018
maddyblue (Contributor, Author) commented

var importRequestLimit = 1
should probably be changed to NumCPU during the test.
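Sketched out, the proposed change might look like this (an illustrative fragment; the package name and surrounding details are assumptions, not the actual backupccl/storageccl source):

```go
package storageccl // assumed package name, for illustration only

import "runtime"

// Size the Import request limiter by CPU count rather than hard-coding it
// to 1, so a node that receives a disproportionate share of ranges is not
// forced to process its requests strictly one at a time.
var importRequestLimit = runtime.NumCPU()
```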

petermattis (Collaborator) commented

I think the problem here is that scattering a small number of ranges (hundreds) in a cluster containing tens of thousands of ranges will often be a no-op. See #23358 (comment) for a suggestion about plumbing a new flag down to SCATTER so that we can force replicas to be spread out across a cluster. This would be helpful for manually ensuring a table is spread across a cluster and very helpful for RESTORE.

danhhz added a commit to danhhz/cockroach that referenced this issue Mar 7, 2018
This should speed up `./workload fixture load`, especially when we're
seeing the long-tail behavior described in cockroachdb#23509.

Release note: None
maddyblue added the C-performance label Apr 26, 2018
petermattis removed this from the 2.1 milestone Oct 5, 2018
awoods187 (Contributor) commented

I observe this on all of my 10k TPC-C restores. The last 0.2 TB takes ~40m while the first 2 TB takes ~120m.

tbg (Member) commented Oct 29, 2018

Saw this in the CPU usage graph in roachtest/restore2TB. Not sure if this is what was talked about before or if it's some replication hiccup, but here we go.

[image: CPU usage graph from roachtest/restore2TB]

awoods187 (Contributor) commented Nov 14, 2018

I can see similar things on master. For example:
[image]

The job has been at 99% for at least the last 2 hours:
[image: job progress at 99%]

But it is, ever so glacially, making progress.

tbg (Member) commented Nov 14, 2018 via email

tbg (Member) commented Nov 14, 2018

@awoods187 just sent me a cluster of his that had the same symptom. I'm not sure it's the root cause of the behavior he's seeing, but the cluster had all the signs of #31875 (comment), which is hopefully fixed once that PR lands.

danhhz (Contributor) commented Jan 25, 2019

The time-bound iterator workaround (#32909) eliminates at least one class of these tail issues for incremental backup. We're also seeing this behavior in restore, import, and non-incremental backup, so that was clearly not the only cause. Just FYI that it was a partial fix.

mwang1026 commented

cc @pbardea thoughts on how relevant this still is with the work you did in 20.2?

pbardea self-assigned this Mar 11, 2021
pbardea (Contributor) commented May 26, 2021

There are a few things going on here:

  • AdminScatter was augmented with a flag that forces the leases of the target ranges to be scattered (this was added a while ago, and I don't know whether it was introduced after this bug was reported).
  • The 20.2 refactor should enable every node to process the ranges scattered to it as soon as they're available (i.e., nodes aren't bottlenecked on each other).
  • A recent restore performance investigation suggested that we could do a better job of sizing the chunks we hand out to worker nodes based on cluster size; backupccl: create more chunks on larger clusters #64067 improves that (see the sketch after this list).
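As a rough illustration of that last point, here is a hypothetical sizing heuristic (not the actual logic from #64067; the constants are made up):

```go
package main

import "fmt"

// numRestoreChunks hands out more, smaller chunks as the cluster grows, so
// that no single node is left with a long tail of work at the end.
func numRestoreChunks(numSpans, numNodes int) int {
	const chunksPerNode = 4 // illustrative fan-out per node
	chunks := numNodes * chunksPerNode
	if chunks > numSpans {
		chunks = numSpans // never create more chunks than there are spans
	}
	if chunks < 1 {
		chunks = 1
	}
	return chunks
}

func main() {
	for _, nodes := range []int{3, 24, 96} {
		fmt.Printf("%d nodes -> %d chunks for 1000 spans\n",
			nodes, numRestoreChunks(1000, nodes))
	}
}
```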

A lot of this issue is also covered in #63925. One thing that hasn't been benchmarked is the performance of restoring a table into a cluster that already holds a significant amount of data. My hypothesis is that the RandomizeLeases flag on the AdminScatter request should help ensure that the leases are properly distributed, but I can run a benchmark to confirm. If we see fairly even AddSSTable traffic across the nodes in that benchmark, I think we can close out this issue.
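To make that hypothesis concrete, here is a small stand-alone simulation (an illustrative sketch, not CockroachDB code; the node and range counts are assumptions): if each new range's lease lands on a node chosen uniformly at random, the per-node lease histogram, and with it AddSSTable traffic, should come out roughly even rather than piling onto one node as in the original report.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

func main() {
	const (
		numNodes  = 24   // cluster size from the original report
		numRanges = 2000 // new ranges created by the restore (made-up figure)
	)
	// Assign each range's lease to a uniformly random node; this is the
	// intended effect of scattering with randomized leases.
	leases := make([]int, numNodes)
	for i := 0; i < numRanges; i++ {
		leases[rand.Intn(numNodes)]++
	}
	// Print a per-node histogram. With random placement, no node should
	// hold a disproportionate share the way node 9 did above.
	for n, count := range leases {
		fmt.Printf("n%-2d %s (%d)\n", n+1, strings.Repeat("#", count/10), count)
	}
}
```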

shermanCRL (Contributor) commented

We think we've made substantial improvements here, so we'll close this for now. There is probably more to be done, but not urgently, and we can re-open if need be.
