Number of SQL queries drops repeatedly #31546

Closed

ridwanmsharif opened this issue Oct 17, 2018 · 7 comments

ridwanmsharif commented Oct 17, 2018

Describe the problem
I noticed that the number of SQL queries drops repeatedly when running a simple
workload. It happens reliably and is easy to reproduce. One possible cause: if
followers are far behind in processing their Raft logs, there is an awkward delay
between when the leader processes a split and when the followers do, and during
that time the followers cannot acknowledge new commands from the leader, stalling
new writes. cc @a-robinson

Here's a reproduction using master at 2cbfb51;
the admin UI shows the following:
[screenshot: admin UI SQL Queries graph dropping repeatedly]

To Reproduce

  1. Set up a 3-node CockroachDB cluster using roachprod (us-central1-b).
  2. Run the kv workload with the --sequential flag, --min-block-bytes=1024,
     --max-block-bytes=2048, and --read-percent=0 (see the sketch below).
  3. The admin UI then shows the behavior. (I also notice that the number of
     splits caused by the increased block sizes is roughly equal to the number
     of drops.)
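
For concreteness, the steps above look roughly like this. This is a sketch, not verbatim commands: the cluster name is a placeholder, and the exact roachprod flags and the location of the workload binary vary by version.

```sh
# Sketch of the repro; cluster name and zone flag are assumptions.
export CLUSTER="${USER}-kv-drops"

# 1. Three-node cluster in us-central1-b via roachprod.
roachprod create "$CLUSTER" -n 3 --gce-zones us-central1-b
roachprod put "$CLUSTER" ./cockroach
roachprod start "$CLUSTER"

# 2. Write-only kv workload with sequential keys and 1-2 KiB blocks
#    (flags taken from the report above).
./workload run kv \
  --sequential \
  --min-block-bytes=1024 \
  --max-block-bytes=2048 \
  --read-percent=0 \
  $(roachprod pgurl "$CLUSTER":1)

# 3. Watch the SQL Queries graph in the admin UI on node 1.
```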

cc @nvanbenschoten @petermattis

@a-robinson (Contributor)

@nvanbenschoten can you share your patch for the slow raft follower problem with @ridwanmsharif for him to try out?

@nvanbenschoten (Member)

The patch is #31330 (comment). I suspect that it will fix the issue. We only see this when running with the --sequential flag, right @ridwanmsharif?

@ridwanmsharif (Author) commented Oct 17, 2018

That's right, @nvanbenschoten, and your patch seems to fix it (left is before, right is with the patch):
[screenshot: admin UI SQL Queries graphs, before vs. with the patch]

Can you put that patch up in its own PR with a test? Or is something blocking that?

@nvanbenschoten (Member)

Great! Thanks for testing. We can close this in favor of #31330 then.

@a-robinson (Contributor)

@bdarnell how sure are we that we don't want to include a minimal version of this fix in 2.1.0? This sort of performance on a simple sequential workload is pretty terrible.

@bdarnell (Contributor)

We've gotta get a fix on master first. Then we can see if there's time to backport to 2.1.0 or if it should be 2.1.1. The issue is also present in 2.0.5 and 2.0.6, right? (I think it was introduced with the large raft log firefighting). So it'll need to be fixed in a patch release for 2.0 as well.

@nvanbenschoten (Member)

I'm trying to get a PR out for this today, so hopefully it will land on master with enough time to confidently get it into 2.1.
