-
Notifications
You must be signed in to change notification settings - Fork 4.5k
direct mapping: harden, fix memory permission handling #32649
Conversation
Codecov Report
@@ Coverage Diff @@
## master #32649 +/- ##
=========================================
- Coverage 82.0% 82.0% -0.1%
=========================================
Files 784 784
Lines 212877 212747 -130
=========================================
- Hits 174610 174493 -117
+ Misses 38267 38254 -13 |
4ef4432
to
18ba47c
Compare
ea33b5c
to
685c439
Compare
status: phew, reviewing the dm is really hard... almost ready for lgtm with nits from me. really appreciate months of patience for code reviewing by me.... i need to cool down my brain a bit. i'll approve this and the other #31986 tomorrow unless tomorrow me beats tonight's me. :) |
2761745
to
96045be
Compare
Thank you again for your review, I know this is complex code nobody wants to get near to 😅 I think I've addressed all your nits, let me know if there's more |
0783119
to
57b760b
Compare
let post_len = callee_account.get_data().len(); | ||
let realloc_bytes_used = post_len.saturating_sub(caller_account.original_data_len); | ||
if prev_len != post_len { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe, we should always zero spare capacity always here if zero_all_spare_capacity == true
, regardless post_len < prev_len
or not?
i mean, is it safe to skip zeroing conditionally while indicated to do so as such by the flag?
say, account data vector could be changed by native program inside the called ix in the manner of Arc's Clone-on-Write fashion via set_data_from_slice
's early-returning .set_data()
, while still holding prev_len == post_len
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then, assume {prev,post}_len < original_data_len
, on-chain program can see uninitialized memory in spare capacity?
hmm, wait, to create the precondition, CoW must have been performed already before the cpi?? so, actually this is impossible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i mean, is it safe to skip zeroing conditionally while indicated to do so as such by the flag?
It is safe because slice::to_vec()
is Vec::with_capacity(slice.len())
. So if slice.len() < original_data_len
, it'll hit the if callee_account.capacity() < min_capacity {
case and set zero_all_spare_capacity=true
.
I guess I can make this more explicit and inline slice::to_vec()
in our code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm with nits.
really thanks for the hard work.
We now have a way to provide CallerAccount with trusted values coming from our internal serialization code and not from untrusted vm space
…able When direct mapping is enabled, we might need to update account data memory regions across CPI calls. Since the only way we have to retrieve the regions is based on their vm addresses, we enforce vm addresses to be stable. Accounts can still be mutated and resized of course, but it must be done in place. This also locks all other AccountInfo pointers, since there's no legitimate reason to make them point to anything else.
Direct mapping needs to translate vm values at each access since permissions of the underlying memory might have changed.
Ensure that the data and realloc regions of an account always track the account's permissions. In order to do this, we also need to split realloc regions in their own self contained regions, where before we had: [account fields][account data][account realloc + more account fields + next account fields][next account data][...] we now have: [account fields][account data][account realloc][more account fields + next account fields][next account data][...] Tested in TEST_[FORBID|ALLOW]_WRITE_AFTER_OWNERSHIP_CHANGE* Additionally when direct mapping is on, we must update all perms at once before doing account data updates. Otherwise, updating an account might write into another account whose perms we haven't updated yet. Tested in TEST_FORBID_LEN_UPDATE_AFTER_OWNERSHIP_CHANGE.
…r from push_account_region
…g an account When returning from CPI we need to zero all the account memory up to the original length only if we know we're potentially dealing with uninitialized memory. When we know that the spare capacity has deterministic content, we only need to zero new_len..prev_len. This fixes a slow edge case that was triggerable by the following scenario: - load a large account (say 10MB) into the vm - shrink to 10 bytes - would memset 10..10MB - shrink to 9 bytes - would memset 9..10MB - shrink to 8 bytes - would memset 8..10MB - ... Now instead in the scenario above the following will happen: - load a large account (say 10MB) into the vm - shrink to 10 bytes - memsets 10..10MB - shrink to 9 bytes - memsets 9..10 - shrink to 8 bytes - memset 8..9 - ...
Shared between serialization and CPI to figure out the MemoryState of an account.
…memory If ref_to_len_in_vm is allowed to be in account memory, calles could mutate it, essentially letting callees directly mutate callers memory.
Return the proper ReadonlyDataModified / ExecutableDataModified / ExternalAccountDataModified depending on where the violation occurs
57b760b
to
7095299
Compare
Thank you for the great review! 😊 |
We used call make_data_mut() from set_data_from_slice() from the days when direct mapping couldn't deal with accounts getting shrunk. That changed in solana-labs#32649 (see the if callee_account.capacity() < min_capacity check in cpi.rs:update_caller_account()). With this change we don't call make_data_mut() anymore before set_data_from_slice(), saving the cost of a memcpy since set_data_from_slice() overrides the whole account content anyway.
We used call make_data_mut() from set_data_from_slice() from the days when direct mapping couldn't deal with accounts getting shrunk. That changed in solana-labs#32649 (see the if callee_account.capacity() < min_capacity check in cpi.rs:update_caller_account()). With this change we don't call make_data_mut() anymore before set_data_from_slice(), saving the cost of a memcpy since set_data_from_slice() overrides the whole account content anyway.
We used call make_data_mut() from set_data_from_slice() from the days when direct mapping couldn't deal with accounts getting shrunk. That changed in solana-labs#32649 (see the if callee_account.capacity() < min_capacity check in cpi.rs:update_caller_account()). With this change we don't call make_data_mut() anymore before set_data_from_slice(), saving the cost of a memcpy since set_data_from_slice() overrides the whole account content anyway.
* AccountSharedData::reserve: remove extra alloc + memcpy Calling data_mut().reserve(additional) used to result in _two_ allocs and memcpys: the first to unshare the underlying vector, and the second upon calling reserve since Arc::make_mut clones so it uses capacity == len. With this fix we manually "unshare" allocating with capacity = len + additional, therefore saving on extra alloc and memcpy. * AccountSharedData::set_data_from_slice: skip extra alloc + memcpy We used call make_data_mut() from set_data_from_slice() from the days when direct mapping couldn't deal with accounts getting shrunk. That changed in solana-labs#32649 (see the if callee_account.capacity() < min_capacity check in cpi.rs:update_caller_account()). With this change we don't call make_data_mut() anymore before set_data_from_slice(), saving the cost of a memcpy since set_data_from_slice() overrides the whole account content anyway.
* AccountSharedData::reserve: remove extra alloc + memcpy Calling data_mut().reserve(additional) used to result in _two_ allocs and memcpys: the first to unshare the underlying vector, and the second upon calling reserve since Arc::make_mut clones so it uses capacity == len. With this fix we manually "unshare" allocating with capacity = len + additional, therefore saving on extra alloc and memcpy. * AccountSharedData::set_data_from_slice: skip extra alloc + memcpy We used call make_data_mut() from set_data_from_slice() from the days when direct mapping couldn't deal with accounts getting shrunk. That changed in solana-labs#32649 (see the if callee_account.capacity() < min_capacity check in cpi.rs:update_caller_account()). With this change we don't call make_data_mut() anymore before set_data_from_slice(), saving the cost of a memcpy since set_data_from_slice() overrides the whole account content anyway.
* AccountSharedData::reserve: remove extra alloc + memcpy Calling data_mut().reserve(additional) used to result in _two_ allocs and memcpys: the first to unshare the underlying vector, and the second upon calling reserve since Arc::make_mut clones so it uses capacity == len. With this fix we manually "unshare" allocating with capacity = len + additional, therefore saving on extra alloc and memcpy. * AccountSharedData::set_data_from_slice: skip extra alloc + memcpy We used call make_data_mut() from set_data_from_slice() from the days when direct mapping couldn't deal with accounts getting shrunk. That changed in solana-labs#32649 (see the if callee_account.capacity() < min_capacity check in cpi.rs:update_caller_account()). With this change we don't call make_data_mut() anymore before set_data_from_slice(), saving the cost of a memcpy since set_data_from_slice() overrides the whole account content anyway.
This fixes the following issues with the direct mapping feature, it has no impact on !direct_mapping code:
bpf_loader: direct_mapping: enforce account info pointers to be immutable
ensures that all AccountInfo pointers are not mutated. CPI looks up memory regions based on an account's data address so it can't be changed, and other pointers are locked too since there's no (legitimate) reason to ever mutate them, they point to address space setup by the program runtime.bpf_loader: cpi: access ref_to_len_in_vm through VmValue
disables "caching" of vm->host address translations. Vm address space under direct mapping has some dynamic properties (CoW, changing permissions), so memory accesses must always to go through the vm.direct mapping: improve memory permission tracking across CPI calls
ensures that account data and realloc memory regions are correctly updated to match their respective account's permissions, and that the permissions are all synchronized in one go before the individual accounts are synchronized. This is required to guarantee that while synchronizing accounts across CPI calls memory region permissions are in a consistent state.cpi: fix slow edge case zeroing extra account capacity after shrinking an account
fixes a slow edge case when resizing large accounts.cpi: direct_mapping: error out if ref_to_len_in_vm points to account memory
forbidsAccountInfo::data
from being put in account memory. CPI updates that slice and we don't want CPI to write into account data.bpf_loader: direct_mapping: map AccessViolation -> InstructionError
maps low levelAccessViolation
errors to higher levelInstructionError
s for better debuggabilityExtracted from #31986 to not include other small refactorings and new tests which will be PR-d separately.