Allow raft TrailingLogs to be configured. #6186

Merged: 3 commits merged into master from trailing-logs on Jul 23, 2019
Conversation

@banks (Member) commented on Jul 22, 2019

This fixes pathological cases where the write throughput and snapshot size are both so large that more than 10k log entries are written in the time it takes to restore the snapshot from disk. In that case, a follower that restarts can never catch up with leader replication again: it enters a loop of downloading a full snapshot and restoring it, only to find that the snapshot is already out of date and the leader has truncated its logs, so a new snapshot is sent, and so on.

In general if you need to adjust this, you are probably abusing Consul for purposes outside its design envelope and should reconsider your usage to reduce data size and/or write volume.
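
For intuition, here's a back-of-envelope sketch of the failure condition described above. The numbers are illustrative assumptions, not measurements from any report: a follower gets stuck whenever the entries written during a snapshot restore exceed the leader's retained trailing logs.

```go
package main

import "fmt"

func main() {
	// Illustrative numbers only; these are assumptions, not figures from this PR.
	restoreSeconds := 120.0 // time a follower takes to restore the snapshot from disk
	writesPerSec := 150.0   // leader write throughput, in raft log entries per second
	trailingLogs := 10240.0 // hashicorp/raft's default TrailingLogs (the "10k" above)

	written := restoreSeconds * writesPerSec // entries appended while the restore runs
	if written > trailingLogs {
		// The leader has truncated past the follower's position before the
		// restore finishes, so another full snapshot is sent and the cycle
		// repeats. Raising TrailingLogs above this product breaks the loop.
		fmt.Printf("stuck: %.0f entries written during restore > %.0f trailing logs\n",
			written, trailingLogs)
	}
}
```

Raising TrailingLogs above the worst-case restore time multiplied by the write rate gives a restarting follower room to catch up from the log instead of looping on snapshots.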

Notes

This is the minimal change needed to make that situation recoverable. There are nicer solutions we could add later with Raft library changes, e.g. dynamically deciding how much to truncate based on current recovery progress, but those changes are much more involved to test and reason about, as well as to backport to existing versions. This is a low-risk, simple change we can potentially backport to older versions in case users are stuck on those and hitting this situation in production with no other way to resolve it.

Questions

Should we document this config? If we do, I think we should warn against its use like the other raft tunables, but this one seems especially like a crutch to work around deployments that are using Consul in cases it's not well suited for. On the other hand, if you are in that situation, not being able to self-discover that a solution exists also seems kinda silly.

@banks requested a review from a team on Jul 22, 2019 14:20
@mkeeler (Member) left a comment

This all looks good. Should we also set the previous 10k limit here:

```go
leader_lease_timeout = "` + raft.LeaderLeaseTimeout.String() + `"
```

@freddygv (Contributor) commented on Jul 22, 2019

This should probably be documented with a warning. Here's a potential one based on your original comment:

> This should only be adjusted when followers cannot catch up to the leader due to a large snapshot size and high write throughput. However, consider reducing write throughput or the amount of data stored in Consul first. Consul is likely under a load it was not designed to handle.

It seems unlikely that knowing this setting exists would encourage people to get into the situation where they need it. If they're already in that situation, posting the warning with alternative courses of action may help get them out of it.

By the way, I think @mkeeler mentioned that we should also improve the logging around installing snapshots, so that users can diagnose the situation if it does come up.

@banks (Member, Author) commented on Jul 22, 2019

@mkeeler I looked, but right now we don't default any of the other raft configs: we only set them conditionally when they are non-zero in the Consul config, so we accept Raft's defaults implicitly. I.e. I did the same as we do for -raft-snapshot-threshold and friends.
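
A minimal sketch of that pattern, with hypothetical struct and field names standing in for Consul's real runtime config (which differs): each raft tunable overrides hashicorp/raft's default only when the operator actually set it.

```go
package main

import (
	"time"

	"github.com/hashicorp/raft"
)

// AgentConfig is a hypothetical stand-in for Consul's runtime config;
// the real struct and field names differ.
type AgentConfig struct {
	RaftSnapshotThreshold int
	RaftSnapshotInterval  time.Duration
	RaftTrailingLogs      int
}

// applyRaftTunables overrides hashicorp/raft's defaults only for values
// the operator set, i.e. the "non-zero" pattern described above.
func applyRaftTunables(agent AgentConfig) *raft.Config {
	cfg := raft.DefaultConfig()
	if agent.RaftSnapshotThreshold != 0 {
		cfg.SnapshotThreshold = uint64(agent.RaftSnapshotThreshold)
	}
	if agent.RaftSnapshotInterval != 0 {
		cfg.SnapshotInterval = agent.RaftSnapshotInterval
	}
	if agent.RaftTrailingLogs != 0 { // the option added by this PR
		cfg.TrailingLogs = uint64(agent.RaftTrailingLogs)
	}
	return cfg
}
```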

@banks (Member, Author) commented on Jul 22, 2019

I added docs. I also realised the other raft tunables were documented as CLI flags out of habit but are not actually valid CLI flags, so I've moved them, preserving the old anchor names too.

@freddygv (Contributor) left a comment

LGTM, just one small comment inline

Review thread on website/source/docs/agent/options.html.md (outdated, resolved)
@pearkes (Contributor) left a comment

Makes sense as part of 1.5.3 👍

@pearkes added this to the 1.5.3 milestone on Jul 22, 2019
@banks merged commit f38da47 into master on Jul 23, 2019
@banks deleted the trailing-logs branch on Jul 23, 2019 14:20