Skip to content

Conversation

@dt
Copy link
Contributor

@dt dt commented Mar 31, 2019

Nodes appear to getting swamped with AddSSTable requests when they are highly overlapping and thus very expensive.
In particular, splits are getting stuck waiting for the AddSSTable calls since they were not marked as OK to backpressure, e.g.

storage/replica_command.go:246 [n4,split,s4,r202/2:/{Table/53/2/4…-Max}] initiating a split of this range at key /Table/53/2/4/82724/45 [r306] (1.0 GiB above threshold size 64 MiB)

h/t to @nvanbenschoten for the suggestion.

Release note (performance improvement): allow oversized ranges to split sooner.

Nodes appear to getting swamped with AddSSTable requests when they are highly overlapping and thus very expensive.
In particular, splits are getting stuck waiting for the AddSSTable calls since they were not marked as OK to backpressure.

Release note (performance improvement): allow oversized ranges to split sooner.
@dt dt requested review from a team, bdarnell and nvb March 31, 2019 17:27
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Have you run any tests with this yet?

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider making this a flag on the methods in roachpb/api.go. We've tried to centralize things like this instead of ad-hoc set scattered throughout the codebase.

@nvb
Copy link
Contributor

nvb commented Mar 31, 2019

+1 to specifying this using a flag. I'm happy to do that after this PR goes in if you don't want to make the change here.

@dt
Copy link
Contributor Author

dt commented Mar 31, 2019

I'd rather do broader refacotring/cleanup in a follow-up as I intend to backport this so I'd like to keep it minimal.

Without this change, a highly overlapping index creation appeared to lock up ~completely: After a couple hours it didn't really appear to be making progress, with one node busy and the others more or less idle, and OLTP traffic to the table was stuck at 0qps the whole time. Looking at the one node, it was spending >95% of its CPU time in EvalAddSSTable's MVCC stats. I think as the range kept growing, those just kept getting more expensive as the SSTs being added got wider.

With this change, the same index was created in 30min -- it was bad on one node for a bit, but did eventually split and started making progress.

I think we will still want to do more here to reduce impact on forground traffic -- both back pressuring expensive evals and ingestions that could cause rocksdb slowdowns -- but at least with this it seems like the job eventually completes and OLTP traffic mostly continues, so that's progress.

@dt
Copy link
Contributor Author

dt commented Mar 31, 2019

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 31, 2019

Build failed

@dt
Copy link
Contributor Author

dt commented Mar 31, 2019

bors r+

craig bot pushed a commit that referenced this pull request Mar 31, 2019
36363: storage: allow backpressuring AddSSTable for splits r=dt a=dt

Nodes appear to getting swamped with AddSSTable requests when they are highly overlapping and thus very expensive.
In particular, splits are getting stuck waiting for the AddSSTable calls since they were not marked as OK to backpressure, e.g. 
```
storage/replica_command.go:246 [n4,split,s4,r202/2:/{Table/53/2/4…-Max}] initiating a split of this range at key /Table/53/2/4/82724/45 [r306] (1.0 GiB above threshold size 64 MiB)
```
h/t to @nvanbenschoten for the suggestion.

Release note (performance improvement): allow oversized ranges to split sooner.

Co-authored-by: David Taylor <tinystatemachine@gmail.com>
@craig
Copy link
Contributor

craig bot commented Apr 1, 2019

Build succeeded

@craig craig bot merged commit c02c03c into cockroachdb:master Apr 1, 2019
@dt dt deleted the split-backpressure-sst branch April 1, 2019 01:56
nvb added a commit to nvb/cockroach that referenced this pull request Apr 1, 2019
This commit adds a new request flag that was discussed in cockroachdb#36363.
Doing so allows us to remove the `backpressurableReqMethods` set.

Release note: None
craig bot pushed a commit that referenced this pull request Apr 2, 2019
36380: roachpb: add canBackpressure request flag r=nvanbenschoten a=nvanbenschoten

This commit adds a new request flag that was discussed in #36363.
Doing so allows us to remove the `backpressurableReqMethods` set.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
ajwerner pushed a commit to ajwerner/cockroach that referenced this pull request Apr 2, 2019
This commit adds a new request flag that was discussed in cockroachdb#36363.
Doing so allows us to remove the `backpressurableReqMethods` set.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants