make the outbox batch size configuable. #1209

JakkeFiejest · 2023-05-19T08:15:06Z

Describe the suggested improvement

At the moment the batch size when the outbox is cleaned up, is set to 10000.
Changing this value is not explosed in the public configuration.

This however causes issues when using Azure SQL-servers in a DTU model.
These DTU models are a super nice fit in for altering commands and we can easiliy grow the application and database (more load = more dtu's and more logical message processors)
It also allows for a good implementation to have one datasource per service.

But all those benifits gow out the window as soon as the outbox starts cleaning up and needs to clean more then 10000 messgaes in one cycle.
This is because cleaning that amount of messages takes up all DTU's when creating the log to rollbak the query.
And it can have bad results causing even a complete crash of the messaging pipeline when we have to much consecutive fails.

At the moment we only have two workarrounds: massivly overscaling the database or disabling the cleanup and moving it to a dedicated process/sql job. which are both lesser options.

It would be better if the batch size could be exposed via the endpoint configuration, escpecially because the value can be set via the internal constructor.
This whay the outbox can be also configured/tweaked with the expected load of the service.

Additional Context

No response

DavidBoike · 2023-05-19T15:36:26Z

Thanks @JakkeFiejest, this is a good idea. I actually peeked at the code and found that the query is already parameterized, we just have not exposed an API for setting that value.

I'm not sure when we'll get around to doing it, as it would probably need to be grouped with other improvements to persistence stuff.

As a workaround, I actually would encourage you to disable Outbox cleanup entirely within the endpoint and implement the cleanup as a SQL Agent task to run on a schedule. The query is pretty simple and aside from being tweakable to the batch size that works for you, it also has the advantage of having one canonical place to run. When an endpoint is scaled out, the individual endpoint instances don't have any way to coordinate and so they end up competing to clean the outbox, which is arguably even worse for the DTU model.

JakkeFiejest · 2023-09-07T06:21:21Z

Will this be tackled soon? small side remark, on azure database with a dtu model, you cannot schedule tasks (we found a way around it with a cron-job)

ttran-ipkg · 2023-09-14T14:37:50Z

For reference, the batch size for the outbox cleanup was changed from 10_000 to 4_000 (along with other lock related improvements) in:

Optimize TSQL outbox cleanup by reducing batch size and adding rowlock hint #1249

ramonsmits · 2023-11-16T16:39:47Z

FYI: https://github.com/Particular/NServiceBus.Persistence.Sql/releases/tag/7.0.3 was released. Does that bring any benefits?

Using larger values will be more likely to result in deadlocks. Even lower values would result in each cleanup interval to even require more sequential cleanup queries.

@JakkeFiejest Even if we expose the batchsize making it smaller will only increase the number of queries for each cleanup interval. Could you please clarify to what value you would want to set it and what behavior requirements you would have when using that value?

JakkeFiejest · 2023-11-17T08:55:39Z

Hi, At the moment we run the cleanup in a different application (because the Azure DTU model does not support scheduling tasks) But there we see benefits from the rowlock (we also use it in the cleaup) and also the benifit from better using the query plan executer. However, the big reason why the setting the cleanup interval size could be useful, is because the DTU model allows you to scale the database to your throughput, etc... however deleting 4000 records from a heap table will screw up the dtu model, because the transaction log that you have, is to small. So basically you want to set your delete size to match what the dtu's allow. (but the changes do improve the issues with the application crashing due to outbox delete timeouts, because not all apps in our microservices landscape have the outbox cleanup out of process, only the business critical ones) KInd regards, Jan Op do 16 nov 2023 om 17:40 schreef Ramon Smits ***@***.***>:

…

FYI: https://github.com/Particular/NServiceBus.Persistence.Sql/releases/tag/7.0.3 was released. Does that bring any benefits? Using larger values will be more likely to result in deadlocks. Even lower values would result in each cleanup interval to even require more sequential cleanup queries. @JakkeFiejest <https://github.com/JakkeFiejest> Even if we expose the batchsize making it smaller will only increase the number of queries for each cleanup interval. Could you please clarify to what value you would want to set it and what behavior requirements you would have when using that value? — Reply to this email directly, view it on GitHub <#1209 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AJGG3KDNA6EMJVLPDJ5G2B3YEY6V7AVCNFSM6AAAAAAYHOUTWGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMJUHAZDAMRTHA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

ramonsmits · 2023-11-20T13:16:36Z

So basically you want to set your delete size to match what the dtu's allow.

@JakkeFiejest So you would want to increase the size? The problem with increasing is that this likely will result in lock escalation.

You likely also suggest to then not only increase the batch size but also the cleanup interval duration so that is runs less?

Do you have example size for different DTU's or have a link to documentation on this?

JakkeFiejest added the Improvement label May 19, 2023

ttran-ipkg mentioned this issue Oct 3, 2023

Deadlocks/timeouts occur on SQL outbox table cleanup if it is under high load #778

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make the outbox batch size configuable. #1209

make the outbox batch size configuable. #1209

JakkeFiejest commented May 19, 2023

DavidBoike commented May 19, 2023

JakkeFiejest commented Sep 7, 2023

ttran-ipkg commented Sep 14, 2023 •

edited by ramonsmits

Loading

ramonsmits commented Nov 16, 2023

JakkeFiejest commented Nov 17, 2023 via email

ramonsmits commented Nov 20, 2023

make the outbox batch size configuable. #1209

make the outbox batch size configuable. #1209

Comments

JakkeFiejest commented May 19, 2023

Describe the suggested improvement

Additional Context

DavidBoike commented May 19, 2023

JakkeFiejest commented Sep 7, 2023

ttran-ipkg commented Sep 14, 2023 • edited by ramonsmits Loading

ramonsmits commented Nov 16, 2023

JakkeFiejest commented Nov 17, 2023 via email

ramonsmits commented Nov 20, 2023

ttran-ipkg commented Sep 14, 2023 •

edited by ramonsmits

Loading