Correct ae.commit on recovery to equal call to applyCommit(index) #5946
server/jetstream_cluster.go
Outdated
@@ -3452,8 +3452,6 @@ func (js *jetStream) processStreamAssignment(sa *streamAssignment) bool {
return false
}

var didRemove bool
didRemove used to be set a long time ago, but now it would always be false.
To ensure the meta snapshot is written both for creation and removal, return true at the end instead.
Force-pushed from 3893551 to 650b155.
We do not need to snapshot as the log with the add will be replayed. Do you mean the whole machine goes away?
I haven't seen that to be the case. Same machine and same storage, just a non-graceful shutdown so the meta snapshot can't be written before restart.
Most of the time the snapshot is not written, and the log remaining after any given snapshot is replayed. I was testing something the other day and disabled meta snapshots in the monitorCluster (and setLeader) calls and behavior was ok. Do we have a test that can show what you are seeing?
Yeah, this PR has a test that simulates a hard kill by copying the storage directories after As well as spinning up a cluster myself locally and doing a real hard kill, results in an empty snapshot being sent and the stream being removed.
Sending empty snapshot could be a problem. We should protect against that for sure, but not force a snapshot IMO on add stream.
This might be related to those calls to
Looked at the replaying, and it does replay it at least.
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Force-pushed from 90ed0ab to 191de4a.
Pushed a fix for an off-by-one that fixes this issue as well.
The above was an incorrect fix. I instead arrived at an alternative, which actually turned out to be @neilalexander's PR #5700. This PR can be closed when Neil's PR is approved/merged 🎉
Test was included in #5700 and got merged, closing.
A stream would be removed from the cluster when hard-killing all servers directly after the stream has been added (or at least before meta was snapshotted).
To reproduce:
Then restart all servers and notice the stream being removed once a new meta leader has been chosen:
$ nats server report jetstream
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                               JetStream Summary                                               │
├─────────┬──────────────┬─────────┬───────────┬──────────┬───────┬────────┬──────┬─────────┬─────────┬─────────┤
│ Server  │ Cluster      │ Streams │ Consumers │ Messages │ Bytes │ Memory │ File │ API Req │ API Err │ Pending │
├─────────┼──────────────┼─────────┼───────────┼──────────┼───────┼────────┼──────┼─────────┼─────────┼─────────┤
│ nats-0* │ nats-cluster │       0 │         0 │        0 │   0 B │    0 B │  0 B │       0 │       0 │       0 │
│ nats-1  │ nats-cluster │       0 │         0 │        0 │   0 B │    0 B │  0 B │       0 │       0 │       0 │
│ nats-2  │ nats-cluster │       0 │         0 │        0 │   0 B │    0 B │  0 B │       0 │       0 │       0 │
├─────────┼──────────────┼─────────┼───────────┼──────────┼───────┼────────┼──────┼─────────┼─────────┼─────────┤
│         │              │       0 │         0 │        0 │   0 B │    0 B │  0 B │       0 │       0 │       0 │
╰─────────┴──────────────┴─────────┴───────────┴──────────┴───────┴────────┴──────┴─────────┴─────────┴─────────╯
On recovery, the only snapshot the server has doesn't contain the stream, because the snapshot was created before the stream was. When a new meta leader is chosen, it creates a snapshot and sends it out to the followers, but because that snapshot is empty the stream is removed.
This PR ensures that the stream addition is properly replayed on recovery.
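The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the nats-server implementation: the entry type, the recoverState function, and the stream name ORDERS are all hypothetical. It models recovery as "snapshot plus replay of log entries up to the commit index" and shows that if the replay stops short of the stream-add entry, the recovered state is empty.

```go
package main

import "fmt"

// entry is a hypothetical meta log entry that adds or removes one stream.
type entry struct {
	index  uint64
	add    bool
	stream string
}

// recoverState rebuilds the set of known streams from a snapshot plus every
// log entry whose index is in the range (snapshotIndex, commit].
func recoverState(snapshot map[string]bool, snapshotIndex, commit uint64, log []entry) map[string]bool {
	state := make(map[string]bool, len(snapshot))
	for name := range snapshot {
		state[name] = true
	}
	for _, e := range log {
		if e.index <= snapshotIndex || e.index > commit {
			continue // already covered by the snapshot, or not committed
		}
		if e.add {
			state[e.stream] = true
		} else {
			delete(state, e.stream)
		}
	}
	return state
}

func main() {
	// The empty snapshot was taken at index 1; the stream was added at index 2.
	log := []entry{{index: 2, add: true, stream: "ORDERS"}}

	// Replay covers index 2, so the stream survives recovery.
	fmt.Println(len(recoverState(nil, 1, 2, log)))

	// Replay stops at index 1, so the recovered state is empty. A snapshot
	// taken from this state would tell followers the stream does not exist.
	fmt.Println(len(recoverState(nil, 1, 1, log)))
}
```

In this model the only difference between the two calls is the commit index used during replay, which is why the index recorded on recovery matters for whether the stream addition is applied.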
Also added a test that reproduces the hard kill scenario, by copying storage directories and later reverting them to a previous state.
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>