storage/compacted_index_chunk_reader: fix memory calculation #11138

Merged

BenPope
Member

@BenPope BenPope commented Jun 1, 2023

Base the memory calculation on actual memory usage, that is, the
in-memory size of a compaction::entry, currently 56 bytes. To avoid
exceeding max_chunk_memory, calculate how big the container will be
on the next insert.

Fixes #10311
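
The check described above can be sketched as follows. This is only an illustration, not the code in this PR: `entry` is a 56-byte stand-in for compaction::entry, the helper names (`next_capacity`, `can_insert`, `fill`) and the 128 KiB budget are hypothetical, and the container is assumed to double its capacity when full.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for compaction::entry; only its size (56 bytes) matters.
struct entry {
    unsigned char bytes[56];
};

// Assumption: the container doubles its capacity when it is full,
// starting from a capacity of 1.
constexpr std::size_t next_capacity(std::size_t size, std::size_t capacity) {
    if (size < capacity) {
        return capacity; // room left, no reallocation on the next insert
    }
    return capacity == 0 ? 1 : capacity * 2;
}

// True if one more insert keeps the container's allocation within the budget.
constexpr bool can_insert(
  std::size_t size, std::size_t capacity, std::size_t max_chunk_memory) {
    return next_capacity(size, capacity) * sizeof(entry) <= max_chunk_memory;
}

// Simulate inserting until the next insert would exceed the budget;
// returns the number of entries that fit.
constexpr std::size_t fill(std::size_t max_chunk_memory) {
    assert(max_chunk_memory >= sizeof(entry));
    std::size_t size = 0;
    std::size_t capacity = 0;
    while (can_insert(size, capacity, max_chunk_memory)) {
        capacity = next_capacity(size, capacity);
        ++size;
    }
    return size;
}

// With a hypothetical 128 KiB budget, filling stops at 2048 entries
// (114688 bytes) instead of growing to 4096 entries (229376 bytes).
static_assert(fill(128 * 1024) == 2048, "stops before the next doubling");

// Plain arithmetic: the 458752-byte allocation named in the linked issue
// is exactly 8192 entries of 56 bytes.
static_assert(8192 * sizeof(entry) == 458752, "8192 * 56");
```

The point of checking the next capacity rather than the current size is that a doubling container can jump well past the budget in a single insert, even when the live entries still fit.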

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Improvements

  • Compaction: Reduce large allocations

BenPope added 2 commits June 1, 2023 13:35
Signed-off-by: Ben Pope <ben@redpanda.com>
Signed-off-by: Ben Pope <ben@redpanda.com>
@BenPope BenPope marked this pull request as ready for review June 1, 2023 12:51
@piyushredpanda piyushredpanda requested a review from VladLazar June 1, 2023 13:20
@andijcr
Contributor

andijcr commented Jun 1, 2023

most of the ducktape failures already have a ticket, but https://buildkite.com/redpanda/redpanda/builds/30353#0188770b-f994-4fd5-95ac-cbf5f7c83cde

Module: rptest.tests.cloud_storage_chunk_read_path_test
Class:  CloudStorageChunkReadTest
Method: test_read_when_segment_size_smaller_than_chunk_size

this one seems new, the consumer could not finish in the allotted time

@BenPope BenPope self-assigned this Jun 1, 2023
@BenPope
Member Author

BenPope commented Jun 1, 2023

most of the ducktape failures already have a ticket, but https://buildkite.com/redpanda/redpanda/builds/30353#0188770b-f994-4fd5-95ac-cbf5f7c83cde

Module: rptest.tests.cloud_storage_chunk_read_path_test
Class:  CloudStorageChunkReadTest
Method: test_read_when_segment_size_smaller_than_chunk_size

this one seems new, the consumer could not finish in the allotted time

The timeouts seem to be in rpc:

TRACE 2023-06-01 13:28:54,543 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:28:54,577 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:28:54,801 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:28:54,805 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:28:54,888 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:28:54,892 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:28:55,031 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:28:55,033 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:01,985 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:01,986 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:02,152 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:02,153 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:02,511 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:02,517 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:02,632 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:02,643 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:02,987 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:03,014 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:03,331 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:03,354 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:03,910 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:03,912 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:04,219 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:04,374 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:04,536 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:04,547 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:04,699 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:04,701 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:06,236 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:06,359 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:33,088 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:33,149 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:44,823 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:44,882 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:48,165 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:48,170 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)

172.16.16.2 is minio-s3. There don't seem to be any logs for it.

Locally I ran the test 18 times:

ducktape version: 0.8.8
session_id:       2023-06-01--024
run time:         2 minutes 55.721 seconds
tests run:        18
passed:           18
failed:           0
ignored:          0
opassed:          0
ofailed:          0

That is 9 of them in parallel, twice, so they take around 1 min each.

@BenPope
Member Author

BenPope commented Jun 1, 2023

/ci-repeat 1
skip-units
dt-repeat=1
tests/rptest/tests/cloud_storage_chunk_read_path_test.py::CloudStorageChunkReadTest.test_read_when_segment_size_smaller_than_chunk_size

@BenPope
Member Author

BenPope commented Jun 1, 2023

/ci-repeat 1 skip-units dt-repeat=1 tests/rptest/tests/cloud_storage_chunk_read_path_test.py::CloudStorageChunkReadTest.test_read_when_segment_size_smaller_than_chunk_size

Same failure.

@BenPope
Member Author

BenPope commented Jun 1, 2023

@abhijat any thoughts on the failure in CloudStorageChunkReadTest.test_read_when_segment_size_smaller_than_chunk_size (you added the test recently)?

Should there be some output from minio that I can investigate?

The failure is unrelated to this PR. I raised the failure here: #11151

Contributor

@VladLazar VladLazar left a comment

Thx for this. Makes sense to me. Might have been more straightforward to use circular_buffer_fixed_capacity instead of tracking the size.

@abhijat
Contributor

abhijat commented Jun 2, 2023

@BenPope I have a PR to fix this test now: #11161

@BenPope
Member Author

BenPope commented Jun 2, 2023

Thx for this. Makes sense to me. Might have been more straightforward to use circular_buffer_fixed_capacity instead of tracking the size.

Good point, the use-case is compaction, so I guess we expect to fill the buffer most of the time anyway. Going straight to a fully allocated buffer may significantly reduce pressure on the allocator. Should I rewrite it?

@VladLazar
Contributor

Thx for this. Makes sense to me. Might have been more straightforward to use circular_buffer_fixed_capacity instead of tracking the size.

Good point, the use-case is compaction, so I guess we expect to fill the buffer most of the time anyway. Going straight to a fully allocated buffer may significantly reduce pressure on the allocator. Should I rewrite it?

Only if you have the bandwidth. This is objectively an improvement, so I'm happy for us to merge it.
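
For readers unfamiliar with the alternative being discussed, a fixed-capacity ring can be sketched as below. This is a simplified illustration of the idea (all storage reserved up front, no growth, so no size tracking against a memory budget), not seastar's circular_buffer_fixed_capacity; the class name `fixed_ring` and its interface are hypothetical, and it assumes `T` is default-constructible and copyable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative fixed-capacity FIFO ring. Storage is reserved once, so the
// memory footprint is known at compile time and never grows.
template<typename T, std::size_t Capacity>
class fixed_ring {
    static_assert(
      Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
      "Capacity must be a power of two so indices can be masked");

public:
    bool empty() const { return _begin == _end; }
    bool full() const { return _end - _begin == Capacity; }
    std::size_t size() const { return _end - _begin; }

    // Append an element; the caller must check full() first.
    void push_back(const T& value) {
        assert(!full());
        _data[_end++ & (Capacity - 1)] = value;
    }

    // Remove and return the oldest element; the caller must check empty() first.
    T pop_front() {
        assert(!empty());
        return _data[_begin++ & (Capacity - 1)];
    }

private:
    std::array<T, Capacity> _data{};
    // Monotonic counters; masking maps them to slots, and unsigned
    // wrap-around is harmless because Capacity is a power of two.
    std::size_t _begin = 0;
    std::size_t _end = 0;
};
```

With this shape the producer only needs to ask `full()` before inserting, which is the simplification the review comment points at; the cost is that the whole buffer is reserved even when it is rarely filled.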

@BenPope
Member Author

BenPope commented Jun 2, 2023

I'll merge since the fix is currently self-contained and circular_buffer is in the interface of several other related types and machinery. As long as entry stays at 56 bytes or more, and below 64 bytes, we get 87.5% utilisation or more, which is fine.
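
The 87.5% figure is 56/64. A small sketch of the arithmetic, assuming (these are assumptions, not taken from the PR) that the budget is a power of two and that the container capacity is always a power of two, so the usable capacity is the largest power of two that fits; the helper names and the 128 KiB budget are hypothetical:

```cpp
#include <cstddef>

// Largest power of two that is <= n. Precondition: n > 0.
constexpr std::size_t floor_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p <= n / 2) {
        p *= 2;
    }
    return p;
}

// Share of the budget actually used, in tenths of a percent (integer maths),
// when the capacity is the largest power of two that fits in the budget.
constexpr std::size_t utilisation_permille(
  std::size_t entry_size, std::size_t budget) {
    return floor_pow2(budget / entry_size) * entry_size * 1000 / budget;
}

constexpr std::size_t budget = 128 * 1024; // hypothetical budget

static_assert(utilisation_permille(56, budget) == 875, "56/64 = 87.5%");
static_assert(utilisation_permille(63, budget) == 984, "63/64, rounded down");
static_assert(utilisation_permille(64, budget) == 1000, "exact fit");
static_assert(utilisation_permille(48, budget) == 750, "48/64 = 75%");
static_assert(utilisation_permille(65, budget) == 507, "capacity halves");
```

Under these assumptions utilisation rises from 87.5% to 100% as the entry grows from 56 to 64 bytes, and drops to roughly half once the entry exceeds 64 bytes, because the capacity falls back to the next lower power of two.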

@vbotbuildovich
Collaborator

/backport v23.1.x

Successfully merging this pull request may close these issues.

Oversized allocation: 458752 bytes in storage::internal::compacted_index_chunk_reader::load_slice