
Queue Groups on leaf clusters not balancing correctly when messages are routed in from hub cluster [v2.10.21] #5972

Closed
roeschter opened this issue Oct 7, 2024 · 16 comments · Fixed by #5982 or #6161
Labels: defect (Suspected defect such as a bug or regression), triage

Comments

@roeschter

Observed behavior

Of two subscribers in a queue group connected to two different nodes in a leaf cluster, only one will receive messages IFF those messages arrive from a hub cluster to which the leaf cluster is connected. The second subscriber (initially idle) will start receiving messages once the primary is terminated. This is reversible (restart the primary subscriber against the original node and it will take over again).

Expected behavior

All subscribers in the same queue group on a leaf cluster are treated equally.

Server and client version

Server 2.10.21
nats CLI - compiled from main

Host environment

Windows/Mac/Linux - reproduced by customer, Borja and me

Steps to reproduce

To reproduce deterministically, all leaf node connections and nats CLI connections must be made to specific nodes.

  1. Create HUB cluster with nodes HUB1, HUB2, HUB3
  2. Create LEAF cluster with nodes LEAF1, LEAF2, LEAF3
  3. Connect leaf nodes such that LEAF1 --> HUB1, LEAF2 --> HUB2, LEAF3 --> HUB3
  4. Start queue group listeners:
     • nats --server LEAF1 sub --queue=q1 foo
     • nats --server LEAF2 sub --queue=q1 foo
  5. Publish messages to HUB1:
     • nats --server HUB1 pub foo Hello  // Only the subscriber on LEAF2 will receive messages
  6. Publish messages to HUB3:
     • nats --server HUB3 pub foo Hello  // Both subscribers receive messages
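
For reference, here is a minimal sketch of the leaf-side configuration that pins the connections in step 3. The host names and ports are placeholders of my own (7422 is the default leafnode port), not taken from the original report; each leaf node lists only its designated hub node as a leafnode remote.

# leaf1.conf (sketch; leaf2.conf and leaf3.conf are analogous, pointing at hub2 and hub3)
server_name: LEAF1
listen: 0.0.0.0:4222
cluster { name: leaf, listen: 0.0.0.0:6222, routes = [ nats-route://leaf2:6222, nats-route://leaf3:6222 ] }
leafnodes { remotes = [ { url: "nats-leaf://hub1:7422" } ] }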

Tentative explanation: The "prefer local cluster listeners in work queues" logic is not leaf node aware. When the message is published such that both listeners have the same "distance" (routing hops) to the publisher, the queue group load balancing works. When the message is published such that the listeners have different "distances" to the publisher, the load balancing fails.

@roeschter roeschter added the defect Suspected defect such as a bug or regression label Oct 7, 2024
@wallyqs wallyqs changed the title Queue Groups on leaf clusters not balancing correctly when messages are routed in from hub cluster Queue Groups on leaf clusters not balancing correctly when messages are routed in from hub cluster [v2.10.21] Oct 8, 2024
@linear linear bot added the triage label Oct 9, 2024
@kozlovic
Member

kozlovic commented Oct 9, 2024

@roeschter Are you sure of the setup? I tried to write a test based on your description and it passes. I only check that each queue sub receives at least 10% of the messages (we usually test for close-to-even distribution), but since you report that one does not receive any messages at all, the 10% check should be enough.
Here is the test, let me know what changes to make to reproduce the issue:

func TestLeafNodeQueueGroupDistribution(t *testing.T) {
	hc := createClusterWithName(t, "HUB", 3)
	defer hc.shutdown()

	// Now have a cluster of leafnodes with each one connecting to corresponding HUB(n) node.
	c1 := `
	server_name: LEAF1
	listen: 127.0.0.1:-1
	cluster { name: ln22, listen: 127.0.0.1:-1 }
	leafnodes { remotes = [{ url: nats-leaf://127.0.0.1:%d }] }
	`
	lconf1 := createConfFile(t, []byte(fmt.Sprintf(c1, hc.opts[0].LeafNode.Port)))
	ln1, lopts1 := RunServerWithConfig(lconf1)
	defer ln1.Shutdown()

	c2 := `
	server_name: LEAF2
	listen: 127.0.0.1:-1
	cluster { name: ln22, listen: 127.0.0.1:-1, routes = [ nats-route://127.0.0.1:%d] }
	leafnodes { remotes = [{ url: nats-leaf://127.0.0.1:%d }] }
	`
	lconf2 := createConfFile(t, []byte(fmt.Sprintf(c2, lopts1.Cluster.Port, hc.opts[1].LeafNode.Port)))
	ln2, _ := RunServerWithConfig(lconf2)
	defer ln2.Shutdown()

	c3 := `
	server_name: LEAF3
	listen: 127.0.0.1:-1
	cluster { name: ln22, listen: 127.0.0.1:-1, routes = [ nats-route://127.0.0.1:%d] }
	leafnodes { remotes = [{ url: nats-leaf://127.0.0.1:%d }] }
	`
	lconf3 := createConfFile(t, []byte(fmt.Sprintf(c3, lopts1.Cluster.Port, hc.opts[2].LeafNode.Port)))
	ln3, _ := RunServerWithConfig(lconf3)
	defer ln3.Shutdown()

	// Check leaf cluster is formed and all connected to the HUB.
	lnServers := []*Server{ln1, ln2, ln3}
	checkClusterFormed(t, lnServers...)
	for _, s := range lnServers {
		checkLeafNodeConnected(t, s)
	}
	// Check each node in the hub has 1 connection from the leaf cluster.
	for i := 0; i < 3; i++ {
		checkLeafNodeConnectedCount(t, hc.servers[i], 1)
	}

	// Create a client and qsub on LEAF1 and LEAF2.
	nc1 := natsConnect(t, ln1.ClientURL())
	defer nc1.Close()
	var qsub1Count atomic.Int32
	natsQueueSub(t, nc1, "foo", "queue1", func(_ *nats.Msg) {
		qsub1Count.Add(1)
	})

	nc2 := natsConnect(t, ln2.ClientURL())
	defer nc2.Close()
	var qsub2Count atomic.Int32
	natsQueueSub(t, nc1, "foo", "queue1", func(_ *nats.Msg) {
		qsub2Count.Add(1)
	})

	sendAndCheck := func(idx int) {
		t.Helper()
		nchub := natsConnect(t, hc.servers[idx].ClientURL())
		defer nchub.Close()
		total := 1000
		for i := 0; i < total; i++ {
			natsPub(t, nchub, "foo", []byte("from hub"))
		}
		checkFor(t, time.Second, 15*time.Millisecond, func() error {
			if trecv := int(qsub1Count.Load() + qsub2Count.Load()); trecv != total {
				return fmt.Errorf("Expected %v messages, got %v", total, trecv)
			}
			return nil
		})
		// Now that we have made sure that all messages were received,
		// check that qsub1 and qsub2 are getting at least some.
		if n := int(qsub1Count.Load()); n <= total/10 {
			t.Fatalf("Expected qsub1 to get some messages, but got %v", n)
		}
		if n := int(qsub2Count.Load()); n <= total/10 {
			t.Fatalf("Expected qsub2 to get some messages, but got %v", n)
		}
		// Reset the counters.
		qsub1Count.Store(0)
		qsub2Count.Store(0)
	}
	// Send from HUB1, HUB2 and HUB3
	for i := 0; i < 3; i++ {
		sendAndCheck(i)
	}
}

@derekcollison
Member

@kozlovic do you observe step #8? Meaning if you publish through HUB1, does only LEAF2 receive?

@kozlovic
Member

kozlovic commented Oct 9, 2024

do you observe step #8? Meaning if you publish through HUB1, does only LEAF2 receive?

@derekcollison No, that's my point. The test passes, in that all queue subs receive about the same number of messages, regardless of where I publish from (the test publishes 1000 msgs from HUB1, checks that all is ok, then repeats from HUB2, etc.).

@roeschter
Author

roeschter commented Oct 9, 2024

Tested it again; the effect is real, BUT to make it reproducible you need to connect the leaf nodes to the hub exactly as described, have the clients connect to different leaf nodes, and publish to the one hub node that has different routing paths to the clients on the leaf cluster.

The key to the effect is that the messages need to be able to travel on two different routes with different numbers of hops from the hub to the leaf. If the connections are made randomly, the effect will appear randomly.

So messages CAN travel either:

  1. HUB1 --> LEAF1 --> Subscriber1
  OR
  2. HUB1 --> HUB2 --> LEAF2 --> Subscriber2

If the possible paths are like the above, LB will NOT work. ONLY the second path will be taken.

If the paths are symmetric, it will work:

  1. HUB1 --> LEAF1 --> Subscriber1
  2. HUB1 --> LEAF1 --> Subscriber2
     WORKS // Both subscribers on the same leaf node

  3. HUB3 --> HUB1 --> LEAF1 --> Subscriber1
  4. HUB3 --> HUB2 --> LEAF2 --> Subscriber2
     WORKS

@kozlovic
Member

kozlovic commented Oct 9, 2024

@roeschter But that's what the test is doing. I make sure that LEAF1 connects to HUB1, LEAF2 to HUB2, etc., and the 2 queue subs are connected to LEAF1 and LEAF2 respectively. The publisher is connected to HUB1, then to HUB2 and HUB3.

Can you share your config files?

@kozlovic
Member

kozlovic commented Oct 9, 2024

Oops, just noticed in the test that both queue subs use nc1.. let me fix that and check.
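
(Presumably the fix is simply to create the second queue subscription on nc2; something like the following, reusing the connections and helpers from the test above:)

nc2 := natsConnect(t, ln2.ClientURL())
defer nc2.Close()
var qsub2Count atomic.Int32
// Subscribe with nc2 (the LEAF2 connection) instead of nc1.
natsQueueSub(t, nc2, "foo", "queue1", func(_ *nats.Msg) {
	qsub2Count.Add(1)
})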

@roeschter
Author

roeschter commented Oct 9, 2024

And when sending you must only send to HUB1!

I know it's a bit special - but this does happen in real life. If there is only one client sending requests to the workers, and it is by chance connected in this way, the effect happens.

@kozlovic
Member

kozlovic commented Oct 9, 2024

And when sending you must only send to HUB1!

Understood. But what the test is doing is to send only from HUB1, then check everything. When that is done, repeat the same test from HUB2, etc.. and then HUB3. To make sure that it works in all cases.

@roeschter
Author

roeschter commented Oct 9, 2024

And when sending you must only send to HUB1!

Understood. But what the test is doing is to send only from HUB1, then check everything. When that is done, repeat the same test from HUB2, etc.. and then HUB3. To make sure that it works in all cases.

It should fail for HUB1 and HUB2 and succeed for HUB3.

@kozlovic
Member

kozlovic commented Oct 9, 2024

Ok, I reproduced it. Going to check the code, but I think it was intentional to treat LEAF connections as clients for the message routing. But again, I will check the code and discuss with @derekcollison if needed.

@kozlovic
Member

kozlovic commented Oct 9, 2024

I may have found the place where we override and end up always sending to the route instead of to the leaf. Will see whether that was intentional or not, or whether changing it breaks lots of tests. But yes, if I "fix" the issue, this test at least passes now.

@derekcollison
Member

ok great @kozlovic thank you!

derekcollison added a commit that referenced this issue Oct 10, 2024
…#5982)

While writing the test, I needed to make sure that each server in
the hub has registered interest for 2 queue subscribers from the
same group. I noticed that `Sublist.NumInterest()` (which I was
invoking from `Account.Interest()`) was returning 1, even after
I knew that the propagation should have happened. It turns out
that `NumInterest()` was returning the number of queue groups, not
the number of queue subs in all those queue groups.

For the leafnode queue balancing issue, the code was favoring
local/routed queue subscriptions, so in the described issue, the
message would always go from HUB1->HUB2->LEAF2->QSub instead of
HUB1->LEAF1->QSub.

Since we had another test that was a bit reversed (a HUB with
LEAF1<->LEAF2 connecting to the HUB, a qsub on both HUB and LEAF1,
requests originating from LEAF2, and all responses expected to come
from LEAF1 instead of the responder on HUB), I went with the
following approach:

If the message originates from a client that connects to a server
that has a connection from a remote LEAF, then we pick that LEAF the
same as if it was a local client or routed server.
However, if the client connects to a server that has a leaf
connection to another server, then we keep track of the sub
but do not send to that one if we have local or routed qsubs.

This makes the 2 tests pass, fixing the new test while maintaining
the behavior of the old test.

Resolves #5972 

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
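
To make the selection rule above concrete, here is a small, illustrative Go sketch of how that description reads. The types, names, and structure are hypothetical, not the nats-server implementation (which, among other things, balances randomly among the favored candidates); it only models which kinds of queue subscribers are favored.

package main

import "fmt"

// subKind classifies how a queue subscriber is reachable from the server
// handling the published message. This is a hypothetical model, not server code.
type subKind int

const (
	localClient subKind = iota // client attached directly to this server
	routed                     // reachable over a cluster route
	leafIn                     // reachable over a leaf connection accepted from a remote leaf
	leafOut                    // reachable only over a leaf connection this server dialed out
)

type qsub struct {
	name string
	kind subKind
}

// favoredQueueSubs models the rule from the commit message: subs reached via an
// inbound leaf connection are grouped together with local and routed subs (the
// set the server balances across), while subs only reachable via an outbound
// leaf connection are kept aside and used only when that set is empty.
func favoredQueueSubs(candidates []qsub) []qsub {
	var favored, leafOnly []qsub
	for _, c := range candidates {
		switch c.kind {
		case localClient, routed, leafIn:
			favored = append(favored, c)
		case leafOut:
			leafOnly = append(leafOnly, c)
		}
	}
	if len(favored) > 0 {
		return favored // the server balances (e.g. randomly) across these
	}
	return leafOnly // fall back to outbound-leaf subs only when nothing else exists
}

func main() {
	// Hub-side view (the #5972 case): a routed qsub (seen via HUB2) and a qsub
	// on an inbound leaf connection (LEAF1) are now balanced together.
	hubView := []qsub{
		{"qsub on LEAF2, seen via route to HUB2", routed},
		{"qsub on LEAF1, seen via inbound leaf connection", leafIn},
	}
	fmt.Println(favoredQueueSubs(hubView))

	// Spoke-side view: a local qsub still wins over one that is only
	// reachable through the outbound leaf connection to the hub.
	spokeView := []qsub{
		{"qsub reachable via outbound leaf to the hub", leafOut},
		{"local qsub", localClient},
	}
	fmt.Println(favoredQueueSubs(spokeView))
}
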
@roeschter
Author

User tested with v2.10.22 and found another condition under which the queue groups are imbalanced.
The original issue has disappeared.

New problem:

We have been testing NATS server version v2.10.22 to evaluate improvements in load balancing for queue groups over leafnode connections. While the queue groups now seem to work more consistently, we’ve observed an issue when two leaf nodes in the same cluster are connected to the same hub server: only one of the leaf nodes is processing the messages.
Issue
When two leaf nodes (within the same cluster) connect to the same hub, messages are processed by only one of the leaf nodes, leaving the other idle.
Steps to Reproduce
To reproduce this issue deterministically, all leaf node and NATS CLI connections are established to specific nodes.

  1. Set up a hub cluster with nodes HUB1, HUB2

  2. Set up a leaf cluster with nodes LEAF1, LEAF2

  3. Connect the leaf nodes as follows:
    • LEAF1 → HUB1
    • LEAF2 → HUB1

  4. Start queue group listeners:
    • nats --context LEAF1 sub --queue=q1 foo
    • nats --context LEAF2 sub --queue=q1 foo

  5. Publish messages:
    • Send messages to HUB3
    • nats --context HUB3 pub foo Hello --count 1000

configs.zip

Expected Behavior: Messages should be distributed between subscribers on LEAF1
Actual Behavior: Only the subscriber on LEAF2 receives messages.
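
For clarity, a minimal sketch of the leaf remotes for this topology as described in step 3, with placeholder host names and the default leafnode port; both leaf nodes dial the same hub node:

# leaf1.conf (sketch)
server_name: LEAF1
leafnodes { remotes = [ { url: "nats-leaf://hub1:7422" } ] }

# leaf2.conf (sketch): same hub node as LEAF1
server_name: LEAF2
leafnodes { remotes = [ { url: "nats-leaf://hub1:7422" } ] }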

@kozlovic
Member

@roeschter Could you open a new issue and clarify the test? It says to send from HUB3, but there is no mention of HUB3 in the setup. It also says that messages should be distributed between subscribers on LEAF1, but according to the description there is only 1 queue sub on LEAF1 and 1 on LEAF2.

@kozlovic
Member

@roeschter Also, the configs show 2 sets of LEAF clusters (leafA and leafB); I guess this is not relevant and the issue happens with a single LEAF cluster? (By the way, there are 3 servers in the HUB and LEAF clusters; is that required, or are 2 enough?)

@roeschter
Author

OK, reproduced it: see #6040
It requires the leaf and hub clusters to each have 3 nodes.

kozlovic added a commit that referenced this issue Nov 19, 2024
There were multiple issues, but basically the fact that we would
not store the routed subscriptions with the origin of the LEAF they
came from made the server unable to differentiate them compared to
"local" routed subscriptions, which in some cases (like a server
restart and the resend of subscriptions) could lead to servers
incorrectly sending subscription interest to leaf connections.

We are now storing the subscriptions with the origin as part of
the key. This makes it possible to differentiate "regular" routed
subs from the ones registered on behalf of a leafnode.
An INFO boolean `LNOCU` is added to indicate support for the origin
in the `LS-` protocol, which is required to properly handle the
removal. Therefore, if a route does not have `LNOCU`, the server
will behave like an old server and store the sub with a key that
does not contain the origin, so that it can be removed when getting
an LS- without the origin. Note that with a mix of servers in the
same cluster, some of the issues this PR is trying to fix will
still be present (since the server will basically behave like a
server without the fix).

Having distinct routed subs for leaf connections allows revisiting
fix #5982, which was done for issue #5972 and was about a fairer
queue distribution to a cluster of leaf connections. That fix
possibly changed an existing behavior: we always want to favor
queue subscriptions of the cluster where the message is produced.
With this PR, the server can now know whether a remote queue sub is
for a "local" queue sub there or on behalf of a leaf, and therefore
will not favor that route compared to a leaf subscription that it
may have directly attached.

Resolves #5972
Resolves #6148

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
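
As an aside, here is a tiny illustrative Go sketch of the keying idea described above. The function and key format are hypothetical, not the actual nats-server data structures; it only shows how including the leaf origin in the key (when the peer route advertises `LNOCU`) separates leaf-originated routed subs from regular ones.

package main

import "fmt"

// routedSubKey sketches how a routed subscription could be keyed. When the
// peer route supports LNOCU and the sub was registered on behalf of a leaf,
// the leaf origin becomes part of the key, so an LS- that carries the origin
// removes exactly that entry. Without LNOCU, the old-style key (no origin) is
// kept so an LS- without the origin can still remove it.
func routedSubKey(account, subject, queue, leafOrigin string, peerHasLNOCU bool) string {
	key := account + " " + subject
	if queue != "" {
		key += " " + queue
	}
	if peerHasLNOCU && leafOrigin != "" {
		key = leafOrigin + " " + key // origin prefix marks a leaf-originated sub
	}
	return key
}

func main() {
	// Same account/subject/queue, but one sub was registered on behalf of LEAF2:
	fmt.Println(routedSubKey("$G", "foo", "q1", "", true))      // regular routed sub
	fmt.Println(routedSubKey("$G", "foo", "q1", "LEAF2", true)) // routed sub on behalf of LEAF2
}
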
kozlovic added a commit that referenced this issue Nov 20, 2024
kozlovic added a commit that referenced this issue Nov 22, 2024
kozlovic added a commit that referenced this issue Nov 22, 2024
derekcollison added a commit that referenced this issue Nov 22, 2024
neilalexander pushed a commit that referenced this issue Nov 22, 2024