Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v23.3.x] [CORE-7061] sr/store: Deep copy schema_def iobufs at the interface #23137

Merged

Conversation

vbotbuildovich
Copy link
Collaborator

Backport of PR #23114

sharded_store's main purpose is to supply a consistent interface
to an underlying (sharded) in-memory store. So to process a given
request, it must first work out the home shard for the desired
schema ID or subject, dispatching requests to that shard.

Recently, schema registry was refactored to store schema definitions
as iobufs (rather than contiguous strings) to avoid large allocations.
To avoid semantically unnecessary _copies_, get_schema_definition was
implemented to return a iobuf share across a shard boundary. This is
an unsafe operation, as the reference counting and deletion machinery
of the underlying temporary buffers is totally unsynchronized. This is
UB in principle (data race) and leads to memory access faults (UAF, etc.)
in practice.

So this commit changes 'store::get_schema_definition' to return a copy
of the stored schema definition (iobuf), eliminating any cross-shard
sharing of the underlying temp buffers. This approach is rather blunt
and carries some cost at deallocation time (seastar allocator must
work out the appropriate site for memory reclamation for each component
temp buffer of a given schema def), but it should be safe since the
component temp buffers are never live on multiple shards simultaneously.

Note that this path could be optimized somewhat by returning a shared
iobuf managed by foreign_ptr and carrying out the necessary copy (a
read-only operation) on the requesting shard.

This commit also includes a "unit test" that is more of a memory access
canary, of sorts. Reverting the change in 'store::gets_schema_definition'
will cause the test to fail reliably, producing memory faults in line with
the reasoning above. With the change in place, this test reliably runs to
completion without issue.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
(cherry picked from commit 420bd28)
@vbotbuildovich vbotbuildovich added this to the v23.3.x-next milestone Aug 29, 2024
@vbotbuildovich vbotbuildovich added the kind/backport PRs targeting a stable branch label Aug 29, 2024
@vbotbuildovich vbotbuildovich requested a review from oleiman August 29, 2024 12:14
@oleiman
Copy link
Member

oleiman commented Aug 29, 2024

CI Failures:

  • Memory leak in gtest_raft_rpunit? 😬
  • some other nastiness in TransactionsStreamsTest

Neither seemingly having anything to do with this diff.

  • Latest failed ducktape looks like infra
  • and a kafka_server_rpunit failure, not related to this change in any obvious way

@oleiman
Copy link
Member

oleiman commented Sep 3, 2024

/ci-repeat 1

@BenPope BenPope modified the milestones: v23.3.x-next, v23.3.21 Sep 3, 2024
@rockwotj rockwotj merged commit 4dfc253 into redpanda-data:v23.3.x Sep 3, 2024
14 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/redpanda kind/backport PRs targeting a stable branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants