Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LeafNode: (no)interest propagation issues #6148

Closed
kozlovic opened this issue Nov 19, 2024 · 0 comments · Fixed by #6161
Closed

LeafNode: (no)interest propagation issues #6148

kozlovic opened this issue Nov 19, 2024 · 0 comments · Fixed by #6161
Assignees
Labels
defect Suspected defect such as a bug or regression

Comments

@kozlovic
Copy link
Member

Observed behavior

Probably more cases than described here, but one would be that in a cluster hub/spoke setup, it was possible for interest registered in the leaf cluster to not be fully unregistered when the subscriptions were going away (depending if subscription was removed one by one or if a connection having multiple was closed), and more over, the hub may have incorrectly sent to the leaf cluster an interest on the subject, making publish from the leaf go to the hub where no interest was really present. This would happen if one of the hub server was restarted while subscription was still active, but then the subscriptions were removed.

Expected behavior

Subscription interest properly handled when subscriptions go away, either individually or as a whole when their connection is closed, or when servers are restarted.

Server and client version

Latest

Host environment

N/A

Steps to reproduce

	// B1 <--- route ---> B2
	//  |                 |
	// Leaf              Leaf
	//  |                 |
	// A1 <--- route ---> A2

Have queue subs connecting to A2, restart B1, unsubscribe one of the qsub from A2. When that happens, B1 may have sent a subscription interest to A1 (A1's Leafz would show a subscription interest). Removing the remaining subscriptions would not remove the subscription interest from A1->B1.
Test can be repeated where a single connection has all 3 subs and connection is closed.

@kozlovic kozlovic added the defect Suspected defect such as a bug or regression label Nov 19, 2024
@kozlovic kozlovic self-assigned this Nov 19, 2024
kozlovic added a commit that referenced this issue Nov 19, 2024
There were multiple issues, but basically the fact that we would
not store the routed subscriptions with the origin of the LEAF they
came from made the server unable to differentiate those compare to
"local" routed subscriptions, which in some cases (like a server
restart and the resend of subscriptions) could lead to servers
sending incorrectly subscription interest to leaf connections.

We are now storing the subscriptions with an origin with that origin
as part of the key. This allows to differentiate "regular" routed
subs versus the ones on behalf of a leafnode.
An INFO boolean is added `LNOCU` to indicate support for origin
in the `LS-` protocol, which is required to properly handle the
removal. Therefore, if a route does not have `LNOCU`, the server
will behave like an old server, and store with the key that does
not contain the origin, so that it can be removed when getting
an LS- without the origin. Note that in the case of a mix of servers
in the same cluster, some of the issues this PR is trying to fix
will be present (since the server will basically behave like a
server without the fix).

Having a different routed subs for leaf connections allow to revisit
the fix #5982 that was done for issue #5972, which was about
a more fair queue distribution to a cluster of leaf connections.
That fix actually introduced a change in that we always wanted to
favor queue subscriptions of the cluster where the message is produced,
which that fix possibly changed. With this current PR, the server
can now know if a remote queue sub is for a "local" queue sub there
or on behalf of a leaf and therefore will not favor that route compared
to a leaf subscription that it may have directly attached.

Resolves #5972
Resolves #6148

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
kozlovic added a commit that referenced this issue Nov 20, 2024
There were multiple issues, but basically the fact that we would
not store the routed subscriptions with the origin of the LEAF they
came from made the server unable to differentiate those compare to
"local" routed subscriptions, which in some cases (like a server
restart and the resend of subscriptions) could lead to servers
sending incorrectly subscription interest to leaf connections.

We are now storing the subscriptions with an origin with that origin
as part of the key. This allows to differentiate "regular" routed
subs versus the ones on behalf of a leafnode.
An INFO boolean is added `LNOCU` to indicate support for origin
in the `LS-` protocol, which is required to properly handle the
removal. Therefore, if a route does not have `LNOCU`, the server
will behave like an old server, and store with the key that does
not contain the origin, so that it can be removed when getting
an LS- without the origin. Note that in the case of a mix of servers
in the same cluster, some of the issues this PR is trying to fix
will be present (since the server will basically behave like a
server without the fix).

Having a different routed subs for leaf connections allow to revisit
the fix #5982 that was done for issue #5972, which was about
a more fair queue distribution to a cluster of leaf connections.
That fix actually introduced a change in that we always wanted to
favor queue subscriptions of the cluster where the message is produced,
which that fix possibly changed. With this current PR, the server
can now know if a remote queue sub is for a "local" queue sub there
or on behalf of a leaf and therefore will not favor that route compared
to a leaf subscription that it may have directly attached.

Resolves #5972
Resolves #6148

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
kozlovic added a commit that referenced this issue Nov 22, 2024
There were multiple issues, but basically the fact that we would
not store the routed subscriptions with the origin of the LEAF they
came from made the server unable to differentiate those compared to
"local" routed subscriptions, which in some cases (like a server
restart and the resend of subscriptions) could lead to servers
sending incorrectly subscription interest to leaf connections.

We are now storing the subscriptions with a sub type indicator and
the origin (for leaf subscriptions) as part of the key. This allows
to differentiate "regular" routed subs versus the ones on behalf
of a leafnode.
An INFO boolean is added `LNOCU` to indicate support for origin
in the `LS-` protocol, which is required to properly handle the
removal. Therefore, if a route does not have `LNOCU`, the server
will behave like an old server, and store with the key that does
not contain the origin, so that it can be removed when getting
an LS- without the origin. Note that in the case of a mix of servers
in the same cluster, some of the issues this PR is trying to fix
will be present (since the server will basically behave like a
server without the fix).

Having a different routed subs for leaf connections allow to revisit
the fix #5982 that was done for issue #5972, which was about
a more fair queue distribution to a cluster of leaf connections.
That fix actually introduced a change in that we always wanted to
favor queue subscriptions of the cluster where the message is produced,
which that fix possibly changed. With this current PR, the server
can now know if a remote queue sub is for a "local" queue sub there
or on behalf of a leaf and therefore will not favor that route compared
to a leaf subscription that it may have directly attached.

Resolves #5972
Resolves #6148

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
derekcollison added a commit that referenced this issue Nov 22, 2024
There were multiple issues, but basically the fact that we would not
store the routed subscriptions with the origin of the LEAF they came
from made the server unable to differentiate those compared to "local"
routed subscriptions, which in some cases (like a server restart and the
resend of subscriptions) could lead to servers sending incorrectly
subscription interest to leaf connections.

We are now storing the subscriptions with a sub type indicator and the
origin (for leaf subscriptions) as part of the key. This allows to
differentiate "regular" routed subs versus the ones on behalf of a
leafnode.
An INFO boolean is added `LNOCU` to indicate support for origin in the
`LS-` protocol, which is required to properly handle the removal.
Therefore, if a route does not have `LNOCU`, the server will behave like
an old server, and store with the key that does not contain the origin,
so that it can be removed when getting an LS- without the origin. Note
that in the case of a mix of servers in the same cluster, some of the
issues this PR is trying to fix will be present (since the server will
basically behave like a server without the fix).

Having a different routed subs for leaf connections allow to revisit the
fix #5982 that was done for issue #5972, which was about a more fair
queue distribution to a cluster of leaf connections. That fix actually
introduced a change in that we always wanted to favor queue
subscriptions of the cluster where the message is produced, which that
fix possibly changed. With this current PR, the server can now know if a
remote queue sub is for a "local" queue sub there or on behalf of a leaf
and therefore will not favor that route compared to a leaf subscription
that it may have directly attached.

Resolves #5972
Resolves #6148

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
neilalexander pushed a commit that referenced this issue Nov 22, 2024
There were multiple issues, but basically the fact that we would
not store the routed subscriptions with the origin of the LEAF they
came from made the server unable to differentiate those compared to
"local" routed subscriptions, which in some cases (like a server
restart and the resend of subscriptions) could lead to servers
sending incorrectly subscription interest to leaf connections.

We are now storing the subscriptions with a sub type indicator and
the origin (for leaf subscriptions) as part of the key. This allows
to differentiate "regular" routed subs versus the ones on behalf
of a leafnode.
An INFO boolean is added `LNOCU` to indicate support for origin
in the `LS-` protocol, which is required to properly handle the
removal. Therefore, if a route does not have `LNOCU`, the server
will behave like an old server, and store with the key that does
not contain the origin, so that it can be removed when getting
an LS- without the origin. Note that in the case of a mix of servers
in the same cluster, some of the issues this PR is trying to fix
will be present (since the server will basically behave like a
server without the fix).

Having a different routed subs for leaf connections allow to revisit
the fix #5982 that was done for issue #5972, which was about
a more fair queue distribution to a cluster of leaf connections.
That fix actually introduced a change in that we always wanted to
favor queue subscriptions of the cluster where the message is produced,
which that fix possibly changed. With this current PR, the server
can now know if a remote queue sub is for a "local" queue sub there
or on behalf of a leaf and therefore will not favor that route compared
to a leaf subscription that it may have directly attached.

Resolves #5972
Resolves #6148

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect Suspected defect such as a bug or regression
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant