
Fix dissemination large shard count #22977

Conversation

@mmaslankaprv (Member) commented Aug 21, 2024

Previously, each time a leadership update request was received by the
`metadata_dissemination_handler`, it created a copy of the updates for every
shard on the shard handling the request. This is inefficient and may lead to
an OOM on the handler shard, especially on machines with very large core counts.

Instead of creating a copy for each shard, we can simply pass a const
reference: the updates vector does not change, so it is safe to access it
from other cores.
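
A minimal sketch of the pattern described above, not the exact Redpanda code:
`leader_update` and `apply_on_shard` are hypothetical placeholders. The updates
are materialized once on the receiving shard, kept alive for the whole fan-out,
and every shard reads them through a const reference instead of its own copy.

#include <seastar/core/do_with.hh>
#include <seastar/core/future.hh>
#include <seastar/core/loop.hh>
#include <seastar/core/smp.hh>

#include <boost/range/irange.hpp>

#include <vector>

namespace ss = seastar;

struct leader_update {}; // hypothetical stand-in for an ntp leader update

// hypothetical per-shard application of the updates
ss::future<> apply_on_shard(const std::vector<leader_update>& updates) {
    (void)updates;
    return ss::make_ready_future<>();
}

ss::future<> handle_update(std::vector<leader_update> updates) {
    // do_with keeps `updates` alive until every per-shard task finishes
    return ss::do_with(
      std::move(updates), [](const std::vector<leader_update>& updates) {
          return ss::parallel_for_each(
            boost::irange<ss::shard_id>(0, ss::smp::count),
            [&updates](ss::shard_id shard) {
                // only a const reference crosses the core boundary; the
                // vector is never mutated, so no per-shard copy is needed
                return ss::smp::submit_to(
                  shard, [&updates] { return apply_on_shard(updates); });
            });
      });
}

Whether the lifetime is anchored with do_with or with a coroutine frame (as
discussed in the review below) is mostly a question of allocation overhead
and readability.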

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Improvements

  • optimized memory usage on nodes with large core counts

vlog(clusterlog.trace, "Received a metadata update");
co_await ss::parallel_for_each(
  boost::irange<ss::shard_id>(0, ss::smp::count),
  [this, leaders = std::move(leaders)](ss::shard_id shard) {
Member
Is there a specific reason you switched to ss::do_with?

Just doing &leaders should do the same, no? The coroutine will keep it alive until ss::parallel_for_each returns?

Member Author

This part is called very often; I dropped coroutines to prevent allocations, and in this case the coroutine does not help much with readability.

Contributor

why not use _leaders.invoke_on_all(...)?

Member

> This part is called very often; I dropped coroutines to prevent allocations, and in this case the coroutine does not help much with readability.

I really wouldn't overthink this in general. do_with will need an extra alloc as well to store the state.
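
For comparison, a hedged sketch of the coroutine variant discussed in this
thread, reusing the hypothetical `leader_update` / `apply_on_shard` names from
the sketch above (plus `<seastar/core/coroutine.hh>`): the coroutine frame keeps
the updates alive until `parallel_for_each` resolves, so the lambdas can capture
a plain reference without `do_with`, at the cost of one coroutine-frame
allocation per call.

#include <seastar/core/coroutine.hh>

ss::future<> handle_update_coro(std::vector<leader_update> updates) {
    // the coroutine frame owns `updates` until the co_await below resolves,
    // so capturing a reference in the per-shard lambdas is safe
    co_await ss::parallel_for_each(
      boost::irange<ss::shard_id>(0, ss::smp::count),
      [&updates](ss::shard_id shard) {
          return ss::smp::submit_to(
            shard, [&updates] { return apply_on_shard(updates); });
      });
    // `updates` is destroyed here, after all shards have finished
}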

@StephanDollberg (Member)

This path isn't covered by any microbench right now, right? (Just asking)

@mmaslankaprv (Member Author)

> This path isn't covered by any microbench right now, right? (Just asking)

No, it is not.

The `update_leader_request` may contain only a few ntp leader updates. In
this case, using a `fragmented_vector` with a large chunk size wastes memory.

Signed-off-by: Michał Maślanka <michal@redpanda.com>
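
A rough, purely illustrative calculation of the waste this commit describes;
the fragment and update sizes below are assumptions, not Redpanda's real
constants. A fragmented vector allocates whole fragments, so a request carrying
only a handful of updates still pays for at least one full chunk.

#include <cstddef>
#include <cstdio>

int main() {
    constexpr std::size_t fragment_bytes = 32 * 1024; // assumed large chunk size
    constexpr std::size_t update_bytes = 64;          // assumed size of one update
    constexpr std::size_t updates = 3;                // "only a few" updates

    const std::size_t used = updates * update_bytes;
    // at least one full fragment is allocated, so for a small request
    // most of the chunk sits unused
    std::printf(
      "used %zu of %zu bytes, %.1f%% of the fragment is wasted\n",
      used,
      fragment_bytes,
      100.0 * double(fragment_bytes - used) / double(fragment_bytes));
    return 0;
}
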
Previously, each time a leadership update request was received by the
`metadata_dissemination_handler`, it created a copy of the updates for every
shard on the shard handling the request. This is inefficient and may lead to
an OOM on the handler shard, especially on machines with very large core counts.

Instead of creating a copy for each shard, we can simply pass a const
reference: the updates vector does not change, so it is safe to access it
from other cores.

Signed-off-by: Michał Maślanka <michal@redpanda.com>
@mmaslankaprv force-pushed the fix-dissemination-large-shard-count branch from 377dceb to 8e4f51f on August 21, 2024 12:09

@mmaslankaprv merged commit ec8493e into redpanda-data:dev on Aug 21, 2024
17 checks passed
@vbotbuildovich (Collaborator)

/backport v24.2.x

@vbotbuildovich (Collaborator)

/backport v24.1.x
