Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-7750] Creating topic with huge number of partitions leads to segfault #24135

Merged
merged 3 commits into from
Nov 21, 2024

Conversation

IoannisRP
Copy link
Contributor

@IoannisRP IoannisRP commented Nov 15, 2024

Fixes: CORE-7750

The size of the allocation_request created as part of the partition allocation scales linearly with the number of partitions needed to be created. Currently, the check to see if the request can be accepted happens on the allocation_request object. When the number of requested partitions is too big, we run out of memory while creating this object.

The fix is to be lazy on the creation of the allocation_request, where possible, and do the first step of the validation before we create the full object.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Bug Fixes

  • Fixed an issue where creating a topic with a huge number of partitions could lead to a crash.

Copy link
Contributor

@rockwotj rockwotj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me generally thanks! Just some small nits on docs/naming.

src/v/cluster/scheduling/partition_allocator.h Outdated Show resolved Hide resolved
src/v/cluster/scheduling/types.h Outdated Show resolved Hide resolved
src/v/cluster/topics_frontend.cc Outdated Show resolved Hide resolved
@IoannisRP
Copy link
Contributor Author

@rockwotj good points. I'll address them and i'll add the do_create_partion lazy path.

@michael-redpanda
Copy link
Contributor

/dt

@IoannisRP IoannisRP force-pushed the ik-partitions-segfault branch 2 times, most recently from 26e44b1 to 155d046 Compare November 18, 2024 14:58
@IoannisRP IoannisRP marked this pull request as ready for review November 18, 2024 14:59
@IoannisRP IoannisRP changed the title [CORE-7750] PoC - Creating topic with huge number of partitions leads to segfault [CORE-7750] Creating topic with huge number of partitions leads to segfault Nov 18, 2024
@IoannisRP IoannisRP requested a review from rockwotj November 18, 2024 14:59
@vbotbuildovich
Copy link
Collaborator

the below tests from https://buildkite.com/redpanda/redpanda/builds/58192#01933fcb-6e85-465d-9b8e-7ea931b6e484 have failed and will be retried

translator_test_rpfixture

@dotnwat
Copy link
Member

dotnwat commented Nov 18, 2024

seems like this is primarily a replication team code area. added reviewers from that team

src/v/cluster/scheduling/types.h Outdated Show resolved Hide resolved
src/v/cluster/scheduling/types.h Outdated Show resolved Hide resolved
@@ -802,8 +809,12 @@ ss::future<topic_result> topics_frontend::do_create_topic(
partition_allocator::shard,
[assignable_config, topic_aware = _partition_autobalancing_topic_aware()](
partition_allocator& al) {
if (assignable_config.has_custom_assignment()) {
return al.allocate(
make_custom_allocation_request(assignable_config, topic_aware));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this path (or through do_increase_replication_factor) not susceptible to the problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly, we cannot be lazy about these paths, as the request has per-partition requirements.

Maybe we could improve this by delaying the transform, but I don't think there would be any obvious benefits.
The raw transformation takes an integer and instantiates an array of that size.
This path translates an array into another array.
While it is very easy to request for a "billion" new partitions, it is not as easy to request for a billion custom assignments. If someone were to indeed ask for that, the point of failure would probably be around the parsing of the request and the population of this array. It would have to be handled there.

move static make_allocation_request helper functions into an
anonymous namespace
Create a new partition allocation path. The new path doesn't explicitly
instantiate a full allocation_request upfront, as the size of this request
scales with the number of requested partitions. This allows to check for
system limitations and reject a request earlier, without the need to
create a potentially resource-intense allocation_request.
Where applicable, use the simple_allocation_request interface.
@IoannisRP IoannisRP force-pushed the ik-partitions-segfault branch from 155d046 to 16a95fb Compare November 19, 2024 21:13
@IoannisRP
Copy link
Contributor Author

  • Renamed raw_allocation_request to simple_allocation_request
  • Renamed partition_count -> additional_partitions
  • Removed single argument constructor
  • Split implementation of new path into 2 separate commits.

@@ -341,6 +381,27 @@ FIXTURE_TEST(allocation_units_test, partition_allocator_fixture) {
// we do not decrement the highest raft group
BOOST_REQUIRE_EQUAL(allocator().state().last_group_id()(), 10);
}
FIXTURE_TEST(allocation_units_test_raw_req, partition_allocator_fixture) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: raw vs simple

@IoannisRP IoannisRP merged commit a91f6cb into redpanda-data:dev Nov 21, 2024
16 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v24.3.x

@vbotbuildovich
Copy link
Collaborator

/backport v24.2.x

@vbotbuildovich
Copy link
Collaborator

Failed to create a backport PR to v24.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-24135-v24.2.x-772 remotes/upstream/v24.2.x
git cherry-pick -x ef6a6ac0ee fc1047866e 16a95fb467

Workflow run logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants