work queue stream losing messages #5612
Comments
I've been able to reproduce this in 2.10.17, and also when using pull consumers.
@neilalexander were you able to reproduce the bug?
I am taking a look. I cannot re-create it from the top of main with a server test that re-creates what you are doing. I will run your bash script next to see if it is doing something different.
Running the script with the latest NATS CLI and the server from main also succeeds. Will try the 2.10.17 release now.
OK, running against the 2.10.17 release also shows no issues.
Using your script from above and a 3-node cluster.
Apologies, I can sometimes see only 20 msgs with 2.10.17.
On 2.10.18-RC1 it seems good.
@derekcollison thanks for taking a look.
server 1
server 2
server 3
OK, I will continue to test with your script.
@derekcollison It seems that messages are incorrectly removed at this point in the code: nats-server/server/consumer.go, line 5614 in e01679d
With this test and the original code, we get
Looking.
I see it now. Working on a fix.
Observed behavior
In an R3 cluster, a work queue policy stream with explicit acks and push consumers can lose messages. To reproduce it, repeatedly create and delete ephemeral consumers that listen to different subjects (I've attached a script that reproduces it quite reliably). Based on the docs, I'd expect the messages not to be deleted: the consumers never send an ack, so the work queue stream (with explicit acks) has to retain them.
Expected behavior
All messages sent should be present in the stream.
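To make that expectation concrete, here is a toy in-memory model in Python. It is not NATS code and not the attached script: the class `ToyWorkQueueStream`, its methods, and the subjects `jobs.a` and `jobs.b` are hypothetical names made up for illustration. It assumes, as I read the docs, that a work queue stream with explicit acks drops a message only when a consumer acks it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorkQueueStream:
    """Toy model of work-queue retention with explicit acks (hypothetical, not NATS)."""

    messages: dict = field(default_factory=dict)   # seq -> (subject, payload)
    consumers: dict = field(default_factory=dict)  # consumer name -> filter subject
    next_seq: int = 1

    def publish(self, subject, payload):
        """Store a message and return its sequence number."""
        seq = self.next_seq
        self.messages[seq] = (subject, payload)
        self.next_seq += 1
        return seq

    def add_consumer(self, name, filter_subject):
        self.consumers[name] = filter_subject

    def delete_consumer(self, name):
        # Deleting a consumer must never touch stored messages.
        self.consumers.pop(name, None)

    def ack(self, name, seq):
        """Remove a message only on an explicit ack from a matching, existing consumer."""
        subject, _ = self.messages.get(seq, (None, None))
        if name in self.consumers and subject == self.consumers[name]:
            del self.messages[seq]
            return True
        return False


# Publish a few messages on two subjects (counts are arbitrary).
stream = ToyWorkQueueStream()
for i in range(5):
    stream.publish("jobs.a", f"a-{i}")
    stream.publish("jobs.b", f"b-{i}")

# Churn ephemeral consumers on different subjects without ever acking.
for round_no in range(100):
    for subject in ("jobs.a", "jobs.b"):
        name = f"ephemeral-{round_no}-{subject}"
        stream.add_consumer(name, subject)
        stream.delete_consumer(name)

remaining = len(stream.messages)
print(remaining)  # 10: nothing was acked, so nothing may be removed
```

In this model, consumer churn alone can never shrink the stream. The report is that the real server violates this invariant under the same pattern.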
Server and client version
nats-server: v2.10.16, v2.10.17
nats --version: 0.1.1
Host environment
linux, either cloud k8s or a local cluster (kind) with 3 replicas for the servers
Steps to reproduce
The main idea is to send some messages to different subjects in the stream, subscribe to the delivery subjects, then repeatedly create and delete ephemeral consumers that target the delivery subjects. On my system, with a kind cluster in an R3 setup, this script reliably causes message loss: