Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

direct mapping: harden, fix memory permission handling #32649

Merged
merged 12 commits into from
Aug 30, 2023

Conversation

alessandrod
Copy link
Contributor

@alessandrod alessandrod commented Jul 28, 2023

This fixes the following issues with the direct mapping feature, it has no impact on !direct_mapping code:

  • bpf_loader: direct_mapping: enforce account info pointers to be immutable ensures that all AccountInfo pointers are not mutated. CPI looks up memory regions based on an account's data address so it can't be changed, and other pointers are locked too since there's no (legitimate) reason to ever mutate them, they point to address space setup by the program runtime.

  • bpf_loader: cpi: access ref_to_len_in_vm through VmValue disables "caching" of vm->host address translations. Vm address space under direct mapping has some dynamic properties (CoW, changing permissions), so memory accesses must always to go through the vm.

  • direct mapping: improve memory permission tracking across CPI calls ensures that account data and realloc memory regions are correctly updated to match their respective account's permissions, and that the permissions are all synchronized in one go before the individual accounts are synchronized. This is required to guarantee that while synchronizing accounts across CPI calls memory region permissions are in a consistent state.

  • cpi: fix slow edge case zeroing extra account capacity after shrinking an account fixes a slow edge case when resizing large accounts.

  • cpi: direct_mapping: error out if ref_to_len_in_vm points to account memory forbids AccountInfo::data from being put in account memory. CPI updates that slice and we don't want CPI to write into account data.

  • bpf_loader: direct_mapping: map AccessViolation -> InstructionError maps low level AccessViolation errors to higher level InstructionErrors for better debuggability

Extracted from #31986 to not include other small refactorings and new tests which will be PR-d separately.

@codecov
Copy link

codecov bot commented Jul 28, 2023

Codecov Report

Merging #32649 (685c439) into master (166f9e2) will decrease coverage by 0.1%.
The diff coverage is 63.7%.

❗ Current head 685c439 differs from pull request most recent head 96045be. Consider uploading reports for the commit 96045be to get more accurate results

@@            Coverage Diff            @@
##           master   #32649     +/-   ##
=========================================
- Coverage    82.0%    82.0%   -0.1%     
=========================================
  Files         784      784             
  Lines      212877   212747    -130     
=========================================
- Hits       174610   174493    -117     
+ Misses      38267    38254     -13     

@github-actions github-actions bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Aug 14, 2023
@github-actions github-actions bot closed this Aug 22, 2023
@alessandrod alessandrod reopened this Aug 22, 2023
@github-actions github-actions bot removed the stale [bot only] Added to stale content; results in auto-close after a week. label Aug 23, 2023
programs/bpf_loader/src/syscalls/cpi.rs Outdated Show resolved Hide resolved
programs/bpf_loader/src/syscalls/cpi.rs Show resolved Hide resolved
programs/bpf_loader/src/syscalls/cpi.rs Outdated Show resolved Hide resolved
programs/bpf_loader/src/syscalls/cpi.rs Show resolved Hide resolved
@alessandrod alessandrod force-pushed the cpi-perms-fixes branch 2 times, most recently from ea33b5c to 685c439 Compare August 24, 2023 12:24
@ryoqun
Copy link
Contributor

ryoqun commented Aug 28, 2023

status: phew, reviewing the dm is really hard... almost ready for lgtm with nits from me. really appreciate months of patience for code reviewing by me....

i need to cool down my brain a bit. i'll approve this and the other #31986 tomorrow unless tomorrow me beats tonight's me. :)

@alessandrod alessandrod force-pushed the cpi-perms-fixes branch 2 times, most recently from 2761745 to 96045be Compare August 29, 2023 01:39
@alessandrod
Copy link
Contributor Author

status: phew, reviewing the dm is really hard... almost ready for lgtm with nits from me. really appreciate months of patience for code reviewing by me....

i need to cool down my brain a bit. i'll approve this and the other #31986 tomorrow unless tomorrow me beats tonight's me. :)

Thank you again for your review, I know this is complex code nobody wants to get near to 😅 I think I've addressed all your nits, let me know if there's more

@ryoqun ryoqun self-requested a review August 29, 2023 04:46
@alessandrod alessandrod force-pushed the cpi-perms-fixes branch 2 times, most recently from 0783119 to 57b760b Compare August 29, 2023 11:11
let post_len = callee_account.get_data().len();
let realloc_bytes_used = post_len.saturating_sub(caller_account.original_data_len);
if prev_len != post_len {
Copy link
Contributor

@ryoqun ryoqun Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe, we should always zero spare capacity always here if zero_all_spare_capacity == true, regardless post_len < prev_lenor not?

i mean, is it safe to skip zeroing conditionally while indicated to do so as such by the flag?

say, account data vector could be changed by native program inside the called ix in the manner of Arc's Clone-on-Write fashion via set_data_from_slice's early-returning .set_data(), while still holding prev_len == post_len, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then, assume {prev,post}_len < original_data_len, on-chain program can see uninitialized memory in spare capacity?

hmm, wait, to create the precondition, CoW must have been performed already before the cpi?? so, actually this is impossible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mean, is it safe to skip zeroing conditionally while indicated to do so as such by the flag?

It is safe because slice::to_vec() is Vec::with_capacity(slice.len()). So if slice.len() < original_data_len, it'll hit the if callee_account.capacity() < min_capacity { case and set zero_all_spare_capacity=true.

I guess I can make this more explicit and inline slice::to_vec() in our code.

ryoqun
ryoqun previously approved these changes Aug 29, 2023
Copy link
Contributor

@ryoqun ryoqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with nits.

really thanks for the hard work.

We now have a way to provide CallerAccount with trusted values coming
from our internal serialization code and not from untrusted vm space
…able

When direct mapping is enabled, we might need to update account data
memory regions across CPI calls. Since the only way we have to retrieve
the regions is based on their vm addresses, we enforce vm addresses to
be stable.  Accounts can still be mutated and resized of course, but it
must be done in place.

This also locks all other AccountInfo pointers, since there's no legitimate
reason to make them point to anything else.
Direct mapping needs to translate vm values at each access since
permissions of the underlying memory might have changed.
Ensure that the data and realloc regions of an account always track the
account's permissions. In order to do this, we also need to split
realloc regions in their own self contained regions, where before we
had:

[account fields][account data][account realloc + more account fields + next account fields][next account data][...]

we now have:

[account fields][account data][account realloc][more account fields + next account fields][next account data][...]

Tested in TEST_[FORBID|ALLOW]_WRITE_AFTER_OWNERSHIP_CHANGE*

Additionally when direct mapping is on, we must update all perms at once before
doing account data updates. Otherwise, updating an account might write into
another account whose perms we haven't updated yet. Tested in
TEST_FORBID_LEN_UPDATE_AFTER_OWNERSHIP_CHANGE.
…g an account

When returning from CPI we need to zero all the account memory up to the
original length only if we know we're potentially dealing with uninitialized
memory.

When we know that the spare capacity has deterministic content, we only need to
zero new_len..prev_len.

This fixes a slow edge case that was triggerable by the following scenario:

- load a large account (say 10MB) into the vm
- shrink to 10 bytes - would memset 10..10MB
- shrink to 9 bytes - would memset 9..10MB
- shrink to 8 bytes - would memset 8..10MB
- ...

Now instead in the scenario above the following will happen:

- load a large account (say 10MB) into the vm
- shrink to 10 bytes - memsets 10..10MB
- shrink to 9 bytes - memsets 9..10
- shrink to 8 bytes - memset 8..9
- ...
Shared between serialization and CPI to figure out the MemoryState of an
account.
…memory

If ref_to_len_in_vm is allowed to be in account memory, calles could mutate it,
essentially letting callees directly mutate callers memory.
Return the proper ReadonlyDataModified / ExecutableDataModified /
ExternalAccountDataModified depending on where the violation occurs
@alessandrod
Copy link
Contributor Author

lgtm with nits.

really thanks for the hard work.

Thank you for the great review! 😊

@alessandrod alessandrod merged commit 0f41719 into solana-labs:master Aug 30, 2023
@alessandrod alessandrod changed the title direct mapping: misc fixes direct mapping: harden, fix memory permission handling Sep 1, 2023
alessandrod added a commit to alessandrod/solana that referenced this pull request May 6, 2024
We used call make_data_mut() from set_data_from_slice() from the days
when direct mapping couldn't deal with accounts getting shrunk. That
changed in solana-labs#32649 (see the
if callee_account.capacity() < min_capacity check in
cpi.rs:update_caller_account()).

With this change we don't call make_data_mut() anymore before
set_data_from_slice(), saving the cost of a memcpy since
set_data_from_slice() overrides the whole account content anyway.
alessandrod added a commit to alessandrod/solana that referenced this pull request May 6, 2024
We used call make_data_mut() from set_data_from_slice() from the days
when direct mapping couldn't deal with accounts getting shrunk. That
changed in solana-labs#32649 (see the
if callee_account.capacity() < min_capacity check in
cpi.rs:update_caller_account()).

With this change we don't call make_data_mut() anymore before
set_data_from_slice(), saving the cost of a memcpy since
set_data_from_slice() overrides the whole account content anyway.
alessandrod added a commit to alessandrod/solana that referenced this pull request May 6, 2024
We used call make_data_mut() from set_data_from_slice() from the days
when direct mapping couldn't deal with accounts getting shrunk. That
changed in solana-labs#32649 (see the
if callee_account.capacity() < min_capacity check in
cpi.rs:update_caller_account()).

With this change we don't call make_data_mut() anymore before
set_data_from_slice(), saving the cost of a memcpy since
set_data_from_slice() overrides the whole account content anyway.
alessandrod added a commit to anza-xyz/agave that referenced this pull request May 7, 2024
* AccountSharedData::reserve: remove extra alloc + memcpy

Calling data_mut().reserve(additional) used to result in _two_ allocs
and memcpys: the first to unshare the underlying vector, and the second
upon calling reserve since Arc::make_mut clones so it uses capacity ==
len.

With this fix we manually "unshare" allocating with capacity = len +
additional, therefore saving on extra alloc and memcpy.

* AccountSharedData::set_data_from_slice: skip extra alloc + memcpy

We used call make_data_mut() from set_data_from_slice() from the days
when direct mapping couldn't deal with accounts getting shrunk. That
changed in solana-labs#32649 (see the
if callee_account.capacity() < min_capacity check in
cpi.rs:update_caller_account()).

With this change we don't call make_data_mut() anymore before
set_data_from_slice(), saving the cost of a memcpy since
set_data_from_slice() overrides the whole account content anyway.
CriesofCarrots pushed a commit to CriesofCarrots/solana that referenced this pull request May 7, 2024
* AccountSharedData::reserve: remove extra alloc + memcpy

Calling data_mut().reserve(additional) used to result in _two_ allocs
and memcpys: the first to unshare the underlying vector, and the second
upon calling reserve since Arc::make_mut clones so it uses capacity ==
len.

With this fix we manually "unshare" allocating with capacity = len +
additional, therefore saving on extra alloc and memcpy.

* AccountSharedData::set_data_from_slice: skip extra alloc + memcpy

We used call make_data_mut() from set_data_from_slice() from the days
when direct mapping couldn't deal with accounts getting shrunk. That
changed in solana-labs#32649 (see the
if callee_account.capacity() < min_capacity check in
cpi.rs:update_caller_account()).

With this change we don't call make_data_mut() anymore before
set_data_from_slice(), saving the cost of a memcpy since
set_data_from_slice() overrides the whole account content anyway.
vovkman pushed a commit to helius-labs/solana that referenced this pull request Aug 20, 2024
* AccountSharedData::reserve: remove extra alloc + memcpy

Calling data_mut().reserve(additional) used to result in _two_ allocs
and memcpys: the first to unshare the underlying vector, and the second
upon calling reserve since Arc::make_mut clones so it uses capacity ==
len.

With this fix we manually "unshare" allocating with capacity = len +
additional, therefore saving on extra alloc and memcpy.

* AccountSharedData::set_data_from_slice: skip extra alloc + memcpy

We used call make_data_mut() from set_data_from_slice() from the days
when direct mapping couldn't deal with accounts getting shrunk. That
changed in solana-labs#32649 (see the
if callee_account.capacity() < min_capacity check in
cpi.rs:update_caller_account()).

With this change we don't call make_data_mut() anymore before
set_data_from_slice(), saving the cost of a memcpy since
set_data_from_slice() overrides the whole account content anyway.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants