Make raft snapshot commit threshold configurable #4105

preetapan · 2018-05-10T15:25:18Z

This PR adds raft_snapshot_threshold as a new configurable parameter that is passed down to the raft layer. The threshold was previously hard-coded to 8192 and could not be changed by Consul operators. This change should give operators of larger Consul installations with heavy write loads better control over how often snapshots are taken.

kyhavlov · 2018-05-11T00:09:42Z

agent/consul/config.go

@@ -448,9 +448,12 @@ func DefaultConfig() *Config {
 	// Disable shutdown on removal
 	conf.RaftConfig.ShutdownOnRemove = false

-	// Check every 5 seconds to see if there are enough new entries for a snapshot
+	// Check every 5 seconds to see if there are enough new entries for a snapshot, can be overridden
 	conf.RaftConfig.SnapshotInterval = 5 * time.Second


I wonder if we should bump this back up again to something like 30-60s just to make things a little nicer in the case where a busy cluster hasn't configured this and gets really frequent snapshots.

The plan was to set this to 30 seconds and the threshold to 16384.

banks

I don't want to block this if there is already consensus, but I've left some thoughts inline.

banks · 2018-05-11T13:11:10Z

agent/config/config.go

@@ -267,6 +269,7 @@ type Consul struct {
 		ElectionTimeout    *string `json:"election_timeout,omitempty" hcl:"election_timeout" mapstructure:"election_timeout"`
 		HeartbeatTimeout   *string `json:"heartbeat_timeout,omitempty" hcl:"heartbeat_timeout" mapstructure:"heartbeat_timeout"`
 		LeaderLeaseTimeout *string `json:"leader_lease_timeout,omitempty" hcl:"leader_lease_timeout" mapstructure:"leader_lease_timeout"`
+		SnapshotThreshold  *int    `json:"snapshot_threshold,omitempty" hcl:"snapshot_threshold" mapstructure:"snapshot_threshold"`


I'm sure you tested this, so the interval must be passed to the server somewhere else, but I thought I'd mention it in case we missed adding Interval here too.

Actually, this can be removed. This specific struct is for config values that are controlled/changed via raftMultiplier.

banks · 2018-05-11T13:18:28Z

website/source/docs/agent/options.html.md

+
+* <a name="_raft_snapshot_interval"></a><a href="#_raft_snapshot_interval">`-raft-snapshot-interval`</a> - This
+  controls how often servers check if they need to save a snapshot to disk.
+


I wonder if we should actually not document these immediately.

I know Armon was concerned about them being kinda low level, and that as soon as we document them we're committed to keeping them working/available. Once we have a proper WAL we shouldn't really need to allow users to tune these.

I think if we do keep them though we should be more explicit about when you'd need to change them and what effect they have.

Something like:

-raft-snapshot-threshold - This controls the minimum number of raft commit entries between snapshots that are saved to disk. This is a low-level parameter that should rarely need to be changed. Very busy clusters experiencing excessive disk IO because the servers are constantly snapshotting may increase this to reduce disk IO and to reduce the chance that multiple servers snapshot at the same time. Increasing this trades disk IO for disk space, since the log will grow much larger and the space in the raft.db file can never be reclaimed. Servers may also take longer to recover from crashes or failover if this is increased significantly, as more logs will need to be replayed.

-raft-snapshot-interval - This controls how often servers check whether they need to save a snapshot to disk. This is a low-level parameter that should rarely need to be changed. Very busy clusters experiencing excessive disk IO because the servers are constantly snapshotting may increase this to reduce disk IO and to reduce the chance that multiple servers snapshot at the same time. Increasing this trades disk IO for disk space, since the log will grow much larger and the space in the raft.db file can never be reclaimed. Servers may also take longer to recover from crashes or failover if this is increased significantly, as more logs will need to be replayed.

banks

Nice

preetapan added this to the 1.1.0 milestone May 10, 2018

preetapan requested review from banks and kyhavlov May 10, 2018 15:25

kyhavlov reviewed May 11, 2018

banks reviewed May 11, 2018

banks approved these changes May 11, 2018

Preetha Appan added 5 commits May 11, 2018 10:43

Make raft snapshot commit threshold configurable

66f31cd

fix spacing

ad09865

Also make snapshot interval configurable

d721da7

More docs and removed SnapShotInterval from raft timing struct stanza

3ff5fd6

Change default raft threshold config values and add a section to upgrade notes

ca67094

preetapan force-pushed the f-raft-threshold-config branch from 0fcf701 to ca67094 May 11, 2018 15:46

preetapan merged commit 4c2c4c8 into master May 11, 2018

preetapan deleted the f-raft-threshold-config branch May 11, 2018 15:47
