[Jetstream] Stream and consumer went out of sync after rolling restart of NATS servers [v2.10.6, v2.10.7] #4875
Comments
You could try to make sure the leader is correct and scale to R1 and back up to R3.
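For reference, the scale-down/scale-up suggested above can be done with the `nats` CLI; a sketch, where the stream name `ORDERS` is a hypothetical placeholder and the flags follow recent natscli syntax:

```sh
# Scale the stream down to a single replica (keeps only the current leader's state)
nats stream edit ORDERS --replicas 1 -f

# Scale back up to three replicas so the other peers re-sync from the leader
nats stream edit ORDERS --replicas 3 -f
```

Note that this keeps whatever state the current leader holds, so the leader must be the replica with the correct sequence numbers before scaling down.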
Yea, that's a good idea to try to bring the stream and consumer back in sync without losing the state. Since in my case the leader (nats-0) had its sequence number reset while the other two replicas still had the right sequence number, maybe what could have been done is to … Sorry, as the broken stream was in a critical environment, I had to mitigate the issue by dropping and recreating the consumer, so I can't verify other mitigations.

@derekcollison: are you aware of any issue that could cause the stream sequence number to go out of sync in the first place? I do use ephemeral storage (`emptyDir`).
Yes, moving the leader might have helped. We would need to look closely at your setup and upgrade procedure to properly triage.
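For reference, moving the leader can be requested through the `nats` CLI as well; a sketch, again with a hypothetical stream name:

```sh
# Ask the current stream leader to step down; a new leader is then elected
# from the remaining peers (nats-1 or nats-2 in this report), which still
# hold the correct sequence numbers
nats stream cluster step-down ORDERS
```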
I think the impact of this issue is concerning, and it would be great if it could be addressed. I'm happy to provide the full deployment setup for troubleshooting, though I'm not confident this issue can be reproduced, as I feel it's a race condition: something like the leader being elected on a new node before that node was in sync with the cluster. Please let me know, thanks.
Did more testing in a test environment with the same NATS setup, and found that this issue isn't related to the NATS version upgrade: a rolling restart of the NATS StatefulSet could trigger this issue on its own. To reproduce this issue: …
After the above, the stream and its consumer are out of sync: messages published to the stream increase the stream sequence number on the bad leader node only, and the consumer never gets them. @derekcollison I did capture the …
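For anyone triaging a similar situation, the per-replica state can be captured with the `nats` CLI before mitigating; a sketch, where the stream and consumer names are hypothetical placeholders:

```sh
# Stream state as reported by the current leader (first/last sequence, message count)
nats stream info ORDERS --json

# Consumer state (delivered and ack floor sequences)
nats consumer info ORDERS WORKER --json

# Cluster-wide JetStream report, useful to compare state across the replicas
nats server report jetstream
```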
We do not recommend ephemeral storage in general. However, if you decide to use it, do you make sure that healthz returns ok from a restarted / updated server before moving to the next one?
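For context, each NATS server exposes `healthz` on the monitoring port (8222 by default); a minimal readiness loop, assuming the monitoring endpoint is reachable on localhost:

```sh
# Poll healthz until the restarted server reports ok before moving to the next pod
until curl -sf http://localhost:8222/healthz > /dev/null; do
  echo "waiting for server to become healthy..."
  sleep 2
done
echo "server healthy, safe to proceed to the next pod"
```

In a Kubernetes deployment this check would normally be expressed as a readiness probe against the same endpoint rather than run by hand.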
We are using the latest NATS Helm chart (…). So could healthz return ok before the server has fully caught up?
It could, since healthz could possibly return ok before it knows certain aspects are behind. Keep us posted.
This sounds very similar to the problem we have: #4351
Indeed, and nice theory about the headless service. @yoadey: did you test your theory about it?
Just moved all NATS clients to the ClusterIP SVC (well, except NATS itself, which uses the headless SVC for discovery as expected). Unfortunately, the issue is still reproducible: after a rolling restart, one of the nodes (kubernetes-nats-0 again) had its stream sequence numbers reset to zero.
Maybe @wallyqs could share his opinion.
Hi @jzhn, I've tested it and it definitely wasn't the headless service; I've already tested with the latest Helm chart and the latest version and can still reproduce the issue.
@derekcollison @wallyqs: here are the … I'm not sure if the below looks interesting to you: I wonder if stall detection could be a source of the race condition. Logs (stream name is redacted): …
@wallyqs any updates from your side here?
Observed behavior
In a Kubernetes StatefulSet deployment of a NATS cluster, I have a simple 3-replica interest-based stream with a single 3-replica consumer.

After a rolling update deployment that upgraded the NATS cluster from v2.10.6 to v2.10.7, the stream and consumer went into an unrecoverable bad state:

- `nats-0`, which was the stream leader at the time, got its stream seq numbers reset to 0, while `nats-1` and `nats-2` kept the previous stream seq numbers (~23K).
- On new publishes, only `nats-0` sees seq numbers increasing; `nats-1` and `nats-2` have seq numbers stuck still.

I've made several attempts to fix this:

- Rolling restart of the NATS servers
- Rolling restart of my consumer application
- Delete and recreate the consumer
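The last attempt (deleting and recreating the consumer, which is what eventually mitigated the issue) can be sketched with the `nats` CLI; the stream/consumer names and the consumer options below are hypothetical, and note that deleting the consumer discards its delivery and ack state:

```sh
# Drop the out-of-sync consumer (this discards its ack/delivery state)
nats consumer rm ORDERS WORKER -f

# Recreate it with the same configuration, e.g. a 3-replica pull consumer
nats consumer add ORDERS WORKER --pull --deliver all --ack explicit --replicas 3 --defaults
```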
Expected behavior
Throughout the rolling update and version upgrade, the stream and its consumer should stay in sync across all replicas.
Server and client version
Server: upgraded from `v2.10.6` to `v2.10.7`
Client: jnats (java) `2.17.1`
Host environment
Kubernetes StatefulSet with ephemeral storage (`emptyDir`)

Steps to reproduce
No response