backupccl: RESTORE slowing at end #23509

Closed · maddyblue opened this issue Mar 6, 2018 · 11 comments

Labels: A-disaster-recovery, C-performance (Perf of queries or internals. Solution not expected to change functional behavior.), T-disaster-recovery

maddyblue (Contributor) commented Mar 6, 2018

During a TPCC restore, we noticed that the speed of the restore of a table slows way down near the end. Looking at a goroutine dump, a single node (out of a 24 node cluster) had hundreds of goroutines in beginLimitedRequest, which throttles import requests to one at a time.
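For illustration, here is a minimal stand-alone sketch of this kind of one-at-a-time limiter (names and structure are made up; this is not the actual beginLimitedRequest code). With a capacity of 1, every concurrent Import request on a node queues behind the single in-flight request, which is exactly the pile-up the goroutine dump showed.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// limiter is a counting semaphore that admits at most `capacity` requests
// at once. Illustrative sketch only, not the CockroachDB implementation.
type limiter struct {
	sem chan struct{}
}

func newLimiter(capacity int) *limiter {
	return &limiter{sem: make(chan struct{}, capacity)}
}

// begin blocks until a slot is free or the context is canceled.
func (l *limiter) begin(ctx context.Context) error {
	select {
	case l.sem <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// end releases the slot acquired by begin.
func (l *limiter) end() { <-l.sem }

func main() {
	// With capacity 1, concurrent import requests on one node serialize
	// behind each other; hundreds of callers means hundreds of blocked
	// goroutines, as seen in the goroutine dump.
	lim := newLimiter(1)
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if err := lim.begin(context.Background()); err != nil {
				return
			}
			defer lim.end()
			fmt.Println("processing import request", i)
		}(i)
	}
	wg.Wait()
}
```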

The next question was why that node (node 9) had so many requests after a scatter had run. From a range dump of the new table, we built a histogram of the number of leader leases per node:

[image: histogram of leader leases per node for the new table]

The histogram indeed shows node 9 holding a much higher number of leader leases than the other nodes.

Our guess is that scatter is indeed scattering, but since various restores have already happened, it may have no choice but to put the new ranges onto a few underfull nodes, which eventually makes the restore take a long time on those nodes. We don't think there's anything specific to do about this scatter problem at the moment.

However, we do think it's worth removing or raising the limit in beginLimitedRequest, since we have other limits during AddSSTable that may make the beginLimitedRequest throttling unnecessary.

Epic CRDB-6406

maddyblue added this to the 2.1 milestone Mar 6, 2018
maddyblue (Contributor, Author) commented

var importRequestLimit = 1
should probably be changed to NumCPU during the test.
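Sketched out, the proposed change might look like this (an illustrative fragment; the package name and surrounding details are assumptions, not the actual backupccl/storageccl source):

```go
package storageccl // assumed package name, for illustration only

import "runtime"

// Size the Import request limiter by CPU count rather than hard-coding it
// to 1, so a node that receives a disproportionate share of ranges is not
// forced to process its requests strictly one at a time.
var importRequestLimit = runtime.NumCPU()
```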

petermattis (Collaborator) commented

I think the problem here is that scattering a small number of ranges (hundreds) in a cluster containing tens of thousands of ranges will often be a no-op. See #23358 (comment) for a suggestion about plumbing a new flag down to SCATTER so that we can force replicas to be spread out across a cluster. This would be helpful for manually ensuring a table is spread across a cluster and very helpful for RESTORE.

danhhz added a commit to danhhz/cockroach that referenced this issue Mar 7, 2018
This should speed up `./workload fixture load`, especially when we're
seeing the long-tail behavior described in cockroachdb#23509.

Release note: None
maddyblue added the C-performance label Apr 26, 2018
petermattis removed this from the 2.1 milestone Oct 5, 2018
awoods187 (Contributor) commented

I observe this on all of my 10k TPC-C restores. The last 0.2 TB takes ~40m while the first 2 TB takes ~120m.

tbg (Member) commented Oct 29, 2018

Saw this in the CPU usage graph in roachtest/restore2TB. Not sure if this is what was talked about before or if it's some replication hiccup, but here we go.

[image: CPU usage graph from roachtest/restore2TB]

awoods187 (Contributor) commented Nov 14, 2018

I can see similar things on master. For example:
[image]

The job has been at 99% for at least the last 2 hours:
[image: job progress at 99%]

But it is, ever so glacially, making progress.

tbg (Member) commented Nov 14, 2018 via email

tbg (Member) commented Nov 14, 2018

@awoods187 just sent me a cluster of his that had the same symptom. I'm not sure it's the root cause of the behavior he's seeing, but the cluster had all the signs of #31875 (comment), which is hopefully fixed once that PR lands.

danhhz (Contributor) commented Jan 25, 2019

The time-bound iterator workaround (#32909) eliminates at least one class of these tail issues for incremental backup. We're also seeing this behavior in restore, import, and non-incremental backup, so that was clearly not the only cause. Just FYI that it was a partial fix.

mwang1026 commented

cc @pbardea thoughts on how relevant this still is with the work you did in 20.2?

pbardea self-assigned this Mar 11, 2021
pbardea (Contributor) commented May 26, 2021

There are a few things going on here:

  • AdminScatter was augmented with a flag that forces the leases of the target ranges to be scattered (this was added a while ago, and I don't know whether it was introduced after this bug was reported).
  • The 20.2 refactor should enable every node to process the ranges scattered to it as soon as they're available (i.e., nodes aren't bottlenecked on each other).
  • A recent restore performance investigation suggested that we could do a better job of sizing the chunks we hand out to worker nodes based on cluster size; backupccl: create more chunks on larger clusters #64067 improves that (see the sketch after this list).
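As a rough illustration of that last point, here is a hypothetical sizing heuristic (not the actual logic from #64067; the constants are made up):

```go
package main

import "fmt"

// numRestoreChunks hands out more, smaller chunks as the cluster grows, so
// that no single node is left with a long tail of work at the end.
func numRestoreChunks(numSpans, numNodes int) int {
	const chunksPerNode = 4 // illustrative fan-out per node
	chunks := numNodes * chunksPerNode
	if chunks > numSpans {
		chunks = numSpans // never create more chunks than there are spans
	}
	if chunks < 1 {
		chunks = 1
	}
	return chunks
}

func main() {
	for _, nodes := range []int{3, 24, 96} {
		fmt.Printf("%d nodes -> %d chunks for 1000 spans\n",
			nodes, numRestoreChunks(1000, nodes))
	}
}
```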

A lot of this issue is also covered in #63925. One thing that hasn't been benchmarked is the performance of restoring a table into a cluster that already holds a significant amount of data. My hypothesis is that the RandomizeLeases flag on the AdminScatter request should help ensure that the leases are properly distributed, but I can run a benchmark to confirm. If we see fairly even AddSSTable traffic across the nodes in that benchmark, I think we can close out this issue.
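To make that hypothesis concrete, here is a small stand-alone simulation (an illustrative sketch, not CockroachDB code; the node and range counts are assumptions): if each new range's lease lands on a node chosen uniformly at random, the per-node lease histogram, and with it AddSSTable traffic, should come out roughly even rather than piling onto one node as in the original report.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

func main() {
	const (
		numNodes  = 24   // cluster size from the original report
		numRanges = 2000 // new ranges created by the restore (made-up figure)
	)
	// Assign each range's lease to a uniformly random node; this is the
	// intended effect of scattering with randomized leases.
	leases := make([]int, numNodes)
	for i := 0; i < numRanges; i++ {
		leases[rand.Intn(numNodes)]++
	}
	// Print a per-node histogram. With random placement, no node should
	// hold a disproportionate share the way node 9 did above.
	for n, count := range leases {
		fmt.Printf("n%-2d %s (%d)\n", n+1, strings.Repeat("#", count/10), count)
	}
}
```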

shermanCRL (Contributor) commented

We think we've made substantial improvements here, so we'll close this for now. There is probably more to be done, but not urgently, and we can re-open if need be.
