Make the outbox batch size configurable. #1209

Open
JakkeFiejest opened this issue May 19, 2023 · 6 comments

Comments

@JakkeFiejest

Describe the suggested improvement

At the moment, the batch size used when the outbox is cleaned up is set to 10000.
Changing this value is not exposed in the public configuration.

This, however, causes issues when using Azure SQL servers in a DTU model.
These DTU models are a very nice fit for altering commands, and we can easily grow the application and database (more load = more DTUs and more logical message processors).
They also allow for a good implementation with one data source per service.

But all those benefits go out the window as soon as the outbox starts cleaning up and needs to clean more than 10000 messages in one cycle.
This is because cleaning that number of messages takes up all DTUs when creating the log needed to roll back the query.
It can also have bad results, even causing a complete crash of the messaging pipeline when we have too many consecutive failures.

At the moment we only have two workarounds: massively overscaling the database, or disabling the cleanup and moving it to a dedicated process/SQL job. Both are lesser options.

It would be better if the batch size could be exposed via the endpoint configuration, especially because the value can already be set via the internal constructor.
This way the outbox can also be configured/tweaked for the expected load of the service.

Additional Context

No response

@DavidBoike
Member

Thanks @JakkeFiejest, this is a good idea. I actually peeked at the code and found that the query is already parameterized; we just have not exposed an API for setting that value.

I'm not sure when we'll get around to doing it, as it would probably need to be grouped with other improvements to persistence stuff.

As a workaround, I actually would encourage you to disable Outbox cleanup entirely within the endpoint and implement the cleanup as a SQL Agent task to run on a schedule. The query is pretty simple and aside from being tweakable to the batch size that works for you, it also has the advantage of having one canonical place to run. When an endpoint is scaled out, the individual endpoint instances don't have any way to coordinate and so they end up competing to clean the outbox, which is arguably even worse for the DTU model.
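To illustrate the workaround, here is a minimal Python sketch of an externally scheduled cleanup with a tunable batch size. It is not the query NServiceBus runs: it uses an in-memory SQLite database as a stand-in for SQL Server, and the table name `OutboxData`, the columns `Dispatched` and `DispatchedAt`, and the function `cleanup_outbox` are all assumptions made for the example. On SQL Server the same idea would typically be written with `DELETE TOP (n)` inside a loop.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def cleanup_outbox(conn, dispatched_before, batch_size):
    """Delete dispatched rows older than the cutoff, at most batch_size rows per transaction.

    Table and column names are assumptions for this sketch, not the real schema.
    """
    total = 0
    while True:
        with conn:  # each batch is committed on its own, so every transaction stays small
            cur = conn.execute(
                "DELETE FROM OutboxData WHERE rowid IN ("
                "  SELECT rowid FROM OutboxData"
                "  WHERE Dispatched = 1 AND DispatchedAt < ? LIMIT ?)",
                (dispatched_before, batch_size),
            )
        total += cur.rowcount
        if cur.rowcount < batch_size:  # a short batch means the backlog is cleared
            return total


def demo():
    """Build a small fake outbox and clean it in batches of 1000."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OutboxData ("
        "MessageId TEXT PRIMARY KEY, Dispatched INTEGER, DispatchedAt TEXT)"
    )
    now = datetime.now(timezone.utc)
    old = (now - timedelta(days=10)).isoformat()
    recent = now.isoformat()
    rows = [(f"old-{i}", 1, old) for i in range(2500)]
    rows += [(f"new-{i}", 1, recent) for i in range(100)]
    rows += [(f"pending-{i}", 0, None) for i in range(50)]
    conn.executemany("INSERT INTO OutboxData VALUES (?, ?, ?)", rows)
    conn.commit()

    cutoff = (now - timedelta(days=7)).isoformat()
    deleted = cleanup_outbox(conn, cutoff, batch_size=1000)
    remaining = conn.execute("SELECT COUNT(*) FROM OutboxData").fetchone()[0]
    return deleted, remaining
```

Running a job like this from a single scheduled place avoids the competition between scaled-out endpoint instances described above, and `batch_size` can be tuned to what the database tier tolerates.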

@JakkeFiejest
Author

Will this be tackled soon? Small side remark: on an Azure database with a DTU model, you cannot schedule tasks (we found a way around it with a cron job).

@ttran-ipkg

ttran-ipkg commented Sep 14, 2023

For reference, the batch size for the outbox cleanup was changed from 10_000 to 4_000 (along with other lock related improvements) in:

@ramonsmits
Member

FYI: https://github.com/Particular/NServiceBus.Persistence.Sql/releases/tag/7.0.3 was released. Does that bring any benefits?

Using larger values is more likely to result in deadlocks. Even lower values would result in each cleanup interval requiring even more sequential cleanup queries.

@JakkeFiejest Even if we expose the batch size, making it smaller will only increase the number of queries for each cleanup interval. Could you please clarify what value you would want to set it to, and what behavior requirements you would have when using that value?
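The trade-off can be made concrete with a little arithmetic. The sketch below assumes the cleanup keeps issuing delete queries until the backlog is cleared, which is a simplification for illustration; the function names are hypothetical, and the 100,000-row backlog is an arbitrary example figure.

```python
def cleanup_queries(rows_to_delete, batch_size):
    """Number of sequential delete queries needed to clear a backlog (ceiling division)."""
    return (rows_to_delete + batch_size - 1) // batch_size


def compare(rows_to_delete, batch_sizes):
    """Map each candidate batch size to the number of queries it would take."""
    return {size: cleanup_queries(rows_to_delete, size) for size in batch_sizes}


# Example: a backlog of 100,000 dispatched rows
print(compare(100_000, [10_000, 4_000, 1_000]))
# prints {10000: 10, 4000: 25, 1000: 100}
```

Smaller batches keep each transaction cheaper but multiply the number of round trips per cleanup interval.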

@JakkeFiejest
Author

JakkeFiejest commented Nov 17, 2023 via email

@ramonsmits
Member

So basically you want to set your delete size to match what the DTUs allow.

@JakkeFiejest So you would want to increase the size? The problem with increasing it is that this will likely result in lock escalation.

Presumably you would then also suggest increasing not only the batch size but also the cleanup interval duration, so that it runs less often?

Do you have example sizes for different DTU levels, or a link to documentation on this?
