Number of SQL queries drops repeatedly #31546

Closed

ridwanmsharif opened this issue Oct 17, 2018 · 7 comments

ridwanmsharif commented Oct 17, 2018

Describe the problem
I noticed that the number of SQL queries drops repeatedly when running a simple
workload. It happens reliably and is easy to reproduce. One possible cause: if
followers are far behind in processing their Raft logs, there is an awkward delay
between when the leader processes a split and when the followers do, and during
that time the followers cannot acknowledge new commands from the leader, stalling
new writes. cc @a-robinson

Here's a reproduction using master at 2cbfb51;
the admin UI shows the following:
[screenshot: admin UI SQL Queries graph dropping repeatedly]

To Reproduce

  1. Set up a 3-node CockroachDB cluster using roachprod (us-central1-b).
  2. Run the kv workload with the --sequential flag, --min-block-bytes=1024,
     --max-block-bytes=2048, and --read-percent=0 (see the sketch below).
  3. The admin UI then shows the behavior. (I also notice that the number of
     splits caused by the increased block sizes is roughly equal to the number
     of drops.)
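
For concreteness, the steps above look roughly like this. This is a sketch, not verbatim commands: the cluster name is a placeholder, and the exact roachprod flags and the location of the workload binary vary by version.

```sh
# Sketch of the repro; cluster name and zone flag are assumptions.
export CLUSTER="${USER}-kv-drops"

# 1. Three-node cluster in us-central1-b via roachprod.
roachprod create "$CLUSTER" -n 3 --gce-zones us-central1-b
roachprod put "$CLUSTER" ./cockroach
roachprod start "$CLUSTER"

# 2. Write-only kv workload with sequential keys and 1-2 KiB blocks
#    (flags taken from the report above).
./workload run kv \
  --sequential \
  --min-block-bytes=1024 \
  --max-block-bytes=2048 \
  --read-percent=0 \
  $(roachprod pgurl "$CLUSTER":1)

# 3. Watch the SQL Queries graph in the admin UI on node 1.
```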

cc @nvanbenschoten @petermattis

@a-robinson (Contributor)

@nvanbenschoten can you share your patch for the slow raft follower problem with @ridwanmsharif for him to try out?

@nvanbenschoten (Member)

The patch is #31330 (comment). I suspect that it will fix the issue. We only see this when running with the --sequential flag, right @ridwanmsharif?

@ridwanmsharif (Author) commented Oct 17, 2018

That's right, @nvanbenschoten, and your patch seems to fix it (left is before, right is with the patch):
[screenshot: admin UI SQL Queries graphs, before vs. with the patch]

Can you put that patch up in its own PR with a test? Or is something blocking that?

@nvanbenschoten (Member)

Great! Thanks for testing. We can close this in favor of #31330 then.

@a-robinson (Contributor)

@bdarnell how sure are we that we don't want to include a minimal version of this fix in 2.1.0? This sort of performance on a simple sequential workload is pretty terrible.

@bdarnell (Contributor)

We've gotta get a fix on master first. Then we can see if there's time to backport to 2.1.0 or if it should be 2.1.1. The issue is also present in 2.0.5 and 2.0.6, right? (I think it was introduced with the large raft log firefighting). So it'll need to be fixed in a patch release for 2.0 as well.

@nvanbenschoten (Member)

I'm trying to get a PR out for this today, so hopefully it will land on master with enough time to confidently get it into 2.1.
