ARTEMIS-4455 - Improve message redistribution balance for OFF_WITH_REDISTRIBUTION #4644

Draft
wants to merge 1 commit into main
Conversation

AntonRoskvist
Contributor

This benefits the case where messages arrive on a clustered node without a local consumer but with multiple remote targets.

Out of scope for this PR, but perhaps it makes sense to implement initial distribution and redistribution weighted on the consumerCount of the remote targets, so that brokers with more consumers also receive more messages? This could be a nice addition to the ON_DEMAND and OFF_WITH_REDISTRIBUTION load-balancing types, I think.
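
For illustration only, a rough stand-alone sketch of what consumerCount-weighted selection could look like; `RemoteTarget` and `pickTarget` are made-up names for this sketch, not anything in the Artemis code base:

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

class WeightedTargetSketch {

   // Hypothetical stand-in for a remote cluster binding; not an Artemis type.
   record RemoteTarget(String nodeId, int consumerCount) { }

   // Pick a target with probability proportional to its consumer count,
   // so a node with 3 consumers receives roughly 3x the messages of a node with 1.
   static RemoteTarget pickTarget(List<RemoteTarget> targets) {
      int totalConsumers = targets.stream().mapToInt(RemoteTarget::consumerCount).sum();
      if (totalConsumers == 0) {
         return targets.get(0); // no consumer info: fall back to the first target
      }
      int pick = ThreadLocalRandom.current().nextInt(totalConsumers);
      for (RemoteTarget target : targets) {
         pick -= target.consumerCount();
         if (pick < 0) {
            return target;
         }
      }
      return targets.get(targets.size() - 1); // not reached when counts are consistent
   }

   public static void main(String[] args) {
      List<RemoteTarget> targets = List.of(
         new RemoteTarget("nodeB", 1),
         new RemoteTarget("nodeC", 3));
      System.out.println(pickTarget(targets).nodeId()); // "nodeC" roughly 75% of the time
   }
}
```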

@clebertsuconic
Contributor

clebertsuconic commented Oct 11, 2023

That's a semantic change, isn't it?

OFF_WITH_REDISTRIBUTION should send messages locally, and after the "redistribution-delay" setting elapses, messages should then be moved to a different node. The idea is that the local node is always favored.

The idea of OFF_WITH_REDISTRIBUTION is that you always route locally, with the possibility of eventually redistributing...

And you have the redistribution delay to respect before you can actually redistribute.

This change goes straight to the routing, making OFF_WITH_REDISTRIBUTION behave more like ON_DEMAND.

This might work for your use case, but that semantic would be changed for others who need to always route locally.
A lot of users are trying to keep messages local. You could have a consumer that is off but coming back in, say, 5 minutes (after a reboot or something like that)... so I definitely think we shouldn't merge this.

Shouldn't you just use ON_DEMAND? You would get what you need here.

You mentioned right here:

"This benefits the case where messages arrive on a clustered node without a local consumer but with multiple remote targets."

And that's the benefit of ON_DEMAND. I'm not sure we should merge this.

clebertsuconic marked this pull request as draft on October 11, 2023, 18:54
@clebertsuconic
Contributor

I'm converting this to a draft so we can discuss it.

@AntonRoskvist
Contributor Author

Hi @clebertsuconic

No, there should be no semantic change with regard to initial distribution, as far as I've been able to tell at least. Messages still arrive at the local node first, as is expected with OFF_WITH_REDISTRIBUTION.

For this load-balancing type, the matchBinding() method only returns true for the local binding, which then sets lastLowPriorityBinding to its position.

During the "secondary level of routing logic", the local binding is picked as the destination of the message.
This works as intended and all messages are sent to the local binding.
Since this is the only outcome of getNextBinding, though, it means that the next step:
nextPosition = moveNextPosition(lastLowPriorityBinding, bindingsCount);
will always set nextPosition to the same value (provided the topology does not change).

After that, the bindingIndex is set to nextPosition, i.e. set to the same value after every incoming message.

The bindingIndex, in turn, is used by the redistributor (org.apache.activemq.artemis.core.postoffice.impl.BindingsImpl#redistribute()) to set the starting point of its iteration over the bindings to find a suitable target. This means that for a continuous stream of incoming messages, the redistributor will pick the same remote Binding as the target most of the time, leading to an uneven redistribution of messages.
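
To make the effect concrete, here is a toy model of the behavior described above (not the actual BindingsImpl code; names and numbers are illustrative): when the scan always starts from the same index, the same remote binding wins every time, whereas advancing the index spreads the messages across the remote nodes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RedistributionSketch {

   public static void main(String[] args) {
      List<String> remoteBindings = List.of("nodeB", "nodeC", "nodeD");
      Map<String, Integer> received = new HashMap<>();

      // Case 1: the start index is reset to the same value for every message
      // (the behavior described above) -> one remote node receives everything.
      int fixedIndex = 1;
      for (int msg = 0; msg < 9; msg++) {
         String target = remoteBindings.get(fixedIndex % remoteBindings.size());
         received.merge(target, 1, Integer::sum);
      }
      System.out.println("fixed start index:     " + received); // e.g. {nodeC=9}

      // Case 2: the start index advances after each pick (what this PR aims for)
      // -> messages are spread evenly across the remote nodes.
      received.clear();
      int movingIndex = 1;
      for (int msg = 0; msg < 9; msg++) {
         String target = remoteBindings.get(movingIndex % remoteBindings.size());
         received.merge(target, 1, Integer::sum);
         movingIndex++;
      }
      System.out.println("advancing start index: " + received); // e.g. {nodeB=3, nodeC=3, nodeD=3}
   }
}
```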

To be clear though, this is my conclusion after poking at this for some time; I might well have missed something.

@clebertsuconic
Contributor

This change is making at least one test fail:

org.apache.activemq.artemis.tests.integration.amqp.connect.AMQPRedistributeClusterTest.testTopicRedistributionAMQP
and testTopicRedistributionCORE (same test basically with different protocols)

@AntonRoskvist
Contributor Author

Yes, @clebertsuconic, you are right, I missed that one... though looking closer at that test, I'm actually getting a bit confused as to what the expected or intended behavior really should be...

With ON_DEMAND semantics:

First, given topic/multicast semantics for a single node:
10 messages sent to an address with 10 multicast/topic queues results in:
10 messages on each queue, a grand total of 100 messages.
This aligns with my understanding of how topic/multicast semantics work.

However, given the same scenario on a two-node cluster:
10 messages sent to an address with 10 multicast/topic queues distributed over 2 nodes:
The 10 messages get distributed between the nodes, 5 each.
5 messages on each queue within a node, a grand total of 50 messages.

Is that the expected behavior?

...I'm also not 100% sure that the test is working as intended, and neither is OFF_WITH_REDISTRIBUTION, with or without this PR. Probably the condition if ( !Objects.equals(message.getRoutingType(), RoutingType.MULTICAST) ... in BindingImpl#matchBinding should be removed or formulated differently so that it just relies on redistribution, not initial distribution, regardless of RoutingType... I will do some additional testing on this.

Anyway, in the failing test (org.apache.activemq.artemis.tests.integration.amqp.connect.AMQPRedistributeClusterTest.testTopicRedistributionAMQP), when messages are sent there are no active consumers on any node, and as such all messages should end up on just the producer's target node (or possibly all messages on all nodes?). This is also what happens in the same test when switching over to ON_DEMAND load balancing...

@AntonRoskvist
Contributor Author

@clebertsuconic
The topic/multicast thing aside, I changed my approach so it doesn't interfere as much with how things currently work. I hope this change makes sense; otherwise just let me know.
