
Proposed fix to enforce byte alignment for static_array_backend feature #86

Closed

wants to merge 10 commits

Conversation

nand-nor

Problem statement

The wee_alloc implementation of the static_array_backend feature erroneously reports an out-of-memory error. The problem arises when compiling for a bare-metal embedded system (OS independent; the target is something like armv7-unknown-linux-gnueabihf), using the static_array_backend feature with wee_alloc assigned as the global allocator.

The root cause of the resulting error is the implementation of imp_static_array::alloc_pages(). The function creates a pointer to a slice of SCRATCH_HEAP of the requested size, which is used to create the FreeCell objects that compose the free list. The function does not enforce any byte-boundary alignment on that slice pointer, resulting in unaligned FreeCell objects. This creates an edge case in which allocation fails despite adequate memory remaining in SCRATCH_HEAP. The error occurs regardless of the static backend array size exported at compile time, and regardless of size policy (e.g. whether or not the size_classes feature is enabled).
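The failure mode can be illustrated in miniature: a `[u8]` array only guarantees 1-byte alignment for its elements, so a pointer taken at an arbitrary offset into it need not satisfy the word alignment a FreeCell header requires. A minimal sketch (the `Buf` and `is_aligned_to` names are illustrative, not wee_alloc's code):

```rust
// A byte buffer whose start is forced to 4-byte alignment, so the example
// is deterministic; interior offsets still carry no alignment guarantee.
#[repr(align(4))]
struct Buf([u8; 64]);

fn is_aligned_to(ptr: *const u8, align: usize) -> bool {
    (ptr as usize) % align == 0
}

fn main() {
    let buf = Buf([0; 64]);
    let base = buf.0.as_ptr();
    // The buffer start is 4-byte aligned by construction...
    assert!(is_aligned_to(base, 4));
    // ...but a slice taken at offset 1 is not, so a FreeCell-like header
    // placed there would fail a word-alignment check.
    let unaligned = buf.0[1..].as_ptr();
    assert!(!is_aligned_to(unaligned, 4));
    println!("ok");
}
```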

Detailed description of the control flow in which the problem arises

The problem is immediately detectable by enabling the extra_assertions feature. With this feature enabled, the first call to allocate memory from the free list results in an alignment assertion failure because, as described above, the FreeCell objects created from SCRATCH_HEAP do not have the proper alignment. When extra_assertions is not enabled, the edge case in the example system occurs once the first 256 KB of the free list are exhausted, i.e. when a full walk of the initial free list results in alloc_with_refill() executing the code branch that creates and inserts a new FreeCell.

Upon an allocation request, the allocator attempts to allocate from SCRATCH_HEAP. This happens via the GlobalAlloc trait's implementation of alloc(), which calls alloc_impl(), which in turn calls alloc_with_refill() within a closure passed to self.with_free_list_and_policy_for_size. alloc_with_refill first calls alloc_first_fit(), which walks the free list to find a FreeCell of adequate size. After the 256 KB exhaustion point, alloc_first_fit fails to find a FreeCell of adequate size, so the branch for creating and inserting a new FreeCell is executed via calls to new_cell_for_free_list() and insert_into_free_list().

Control flow then returns to alloc_with_refill, which calls alloc_first_fit again, which in turn calls try_alloc() on the newly inserted FreeCell; this is the point of failure. The checks in try_alloc() establish that the newly inserted FreeCell is large enough to hold the requested allocation, but not big enough to be split in two. This leads to the final check in try_alloc(), which determines whether the FreeCell is appropriately aligned and which, because of how alloc_pages() is implemented in the imp_static_array module, fails, so the allocation fails.

Steps to reproduce problem

  1. Use the static_array_backend feature, compile for a no_std target (e.g. `armv7-unknown-linux-gnueabihf`), and use the default backend array byte size
    
  2. Allocate an empty vector of u32s via `alloc::vec::Vec::new()`, then push 65 u32 values to it.
    
  3. wee_alloc will fail to allocate more heap memory on the attempt to push the 65th u32 to the vector, triggering an oom() error (if the system has one defined) via the control flow described above
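The steps above can be sketched as a tiny program. Hedged: the wee_alloc global-allocator setup is shown only in comments, because as written this snippet runs against the default allocator and succeeds; the failure manifests only with the static_array_backend configuration described above.

```rust
// Repro sketch for the steps above. To actually reproduce, build with
// wee_alloc's static_array_backend feature and uncomment:
//
// #[global_allocator]
// static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;

fn push_u32s(count: u32) -> Vec<u32> {
    let mut v: Vec<u32> = Vec::new();
    for i in 0..count {
        v.push(i); // with the buggy backend, the 65th push triggers oom()
    }
    v
}

fn main() {
    let v = push_u32s(65);
    assert_eq!(v.len(), 65);
    println!("ok");
}
```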
    

Proposed problem fix

To address this error, imp_static_array::alloc_pages() should enforce alignment requirements so that pointers to slices of SCRATCH_HEAP are always aligned to the needed byte boundary. The resulting FreeCell objects will then always pass alignment checks, whether or not the extra_assertions feature is enabled.
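A sketch of what such enforcement could look like (simplified to indices into the heap; the helper names and the bump-allocation shape are assumptions for illustration, not the PR's exact code): round the current offset up to the requested alignment before handing out a region, and fail only when the aligned region would run past the end of the heap.

```rust
// Round `n` up to the next multiple of `align` (a power of two).
fn round_up_to_alignment(n: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (n + align - 1) & !(align - 1)
}

/// Bump-allocate `bytes` from a heap of `heap_len` bytes starting at
/// `*offset`, aligned to `align`. Returns the start index of the aligned
/// region, or `None` if out of memory.
fn alloc_aligned(heap_len: usize, offset: &mut usize, bytes: usize, align: usize) -> Option<usize> {
    let aligned = round_up_to_alignment(*offset, align);
    let end = aligned.checked_add(bytes)?;
    if end <= heap_len {
        *offset = end;
        Some(aligned)
    } else {
        None // memory may remain, but not enough past the aligned start
    }
}

fn main() {
    let mut offset = 1usize; // simulate an unaligned bump pointer
    let start = alloc_aligned(4096, &mut offset, 64, 8).unwrap();
    assert_eq!(start % 8, 0); // the handed-out region is aligned
    assert_eq!(start, 8);     // offset 1 was rounded up to 8
    assert_eq!(offset, 72);   // and the bump pointer advanced past it
    println!("ok");
}
```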

@nand-nor
Author

Edit: fixing the imp_windows.rs alloc_pages() function call that broke CI

@fitzgen
Member

fitzgen commented Oct 23, 2019

Thanks, I'll take a look at this within the next couple days.


@fitzgen fitzgen left a comment


Thanks for the in-depth analysis @nand-nor! That really helps, and I appreciate the effort that went into it.

I have some inline comments below suggesting some improvements and also some questions. Thanks!

static mut SCRATCH_HEAP: [u8; SCRATCH_LEN_BYTES] = [0; SCRATCH_LEN_BYTES];
static mut OFFSET: Mutex<usize> = Mutex::new(0);

pub(crate) unsafe fn alloc_pages(pages: Pages) -> Result<NonNull<u8>, AllocErr> {
pub(crate) unsafe fn alloc_pages<B: Into<Bytes>>(
Member


We should take Bytes directly, rather than B: Into<Bytes>. We don't want multiple, monomorphized copies of this function, so that we can keep code size small.

Author


Fixed in upcoming push, I see that the into() capability is superfluous compared to accessing align.0.

loop {
*offset += 1;
end += 1;
ptr = SCRATCH_HEAP[*offset..end].as_mut_ptr() as *mut u8;
Member


I don't think there is anything here ensuring that offset and end stay in bounds of SCRATCH_HEAP as they are pushed forward while we search for the alignment. That could lead to out-of-bounds accesses and memory unsafety.

Additionally, we don't need to do a one-byte-at-a-time loop while searching for alignment. Instead, we can round up to the next aligned offset with:

fn round_up_to_alignment(n: usize, align: usize) -> usize {
    extra_assert!(align > 0);
    extra_assert!(align.is_power_of_two());
    (n + align - 1) & !(align - 1) 
}

and then check that the aligned version is within bounds:

let scratch_heap_start = SCRATCH_HEAP.as_mut_ptr() as usize;
let scratch_heap_end = scratch_heap_start + SCRATCH_HEAP.len();
let unaligned_ptr = scratch_heap_start + *offset;
let aligned_ptr = round_up_to_alignment(unaligned_ptr, align);
// NB: `unaligned_ptr <= aligned_ptr` handles potential overflow in `round_up_to_alignment`.
if unaligned_ptr <= aligned_ptr && aligned_ptr < scratch_heap_end
    && aligned_ptr.checked_add(bytes.0).ok_or(AllocErr)? < scratch_heap_end
{
    // success!
} else {
   // error!
}

Author


Ah, great catch. I foolishly got rid of an additional loop where end was incremented to byte alignment before the end < SCRATCH_HEAP_LEN check, in an attempt to make it cleaner. That was a mistake, and this looks much nicer. I will add this better fix to the next push in just a moment.

Author


This specific change caused a large number of tests to fail. After confirming that the failures came neither from the alignment enforcement change nor from the Into trait removal, I made some small adjustments. Tests are now all passing (including tests on the system where I encountered the error) and I have pushed. I have pasted an edited-for-brevity test log below to show where the failures were emitted.

+ export WEE_ALLOC_STATIC_ARRAY_BACKEND_BYTES=1073741824
+ WEE_ALLOC_STATIC_ARRAY_BACKEND_BYTES=1073741824
+ cargo test --release --features 'static_array_backend extra_assertions size_classes'
   Compiling...

running 21 tests
test allocate_size_zero ... ok
...
failures:
    allocate_many_large
    allocate_many_small
    multi_threaded_quickchecks
    quickchecks_0
    quickchecks_1
    regression_test_0
    regression_test_1
    regression_test_2
    regression_test_3
    single_allocation_with_size_and_align
    smoke
    test_trace_cpp_demangle
    test_trace_dogfood
    test_trace_ffmpeg
    test_trace_find
    test_trace_gcc_hello
    test_trace_grep_random_data
    test_trace_grep_recursive
    test_trace_ls
    test_trace_source_map

test result: FAILED. 1 passed; 20 failed; 0 ignored; 0 measured; 0 filtered out

const SCRATCH_LEN_BYTES: usize = include!(concat!(
env!("OUT_DIR"),
"/wee_alloc_static_array_backend_size_bytes.txt"
));
static mut SCRATCH_HEAP: [u8; SCRATCH_LEN_BYTES] = [0; SCRATCH_LEN_BYTES];
Member


Another thing we should do to avoid most scenarios where we don't have proper alignment is to force an alignment on the SCRATCH_HEAP via

#[repr(align(4096))]
struct ScratchHeap([u8; SCRATCH_LEN_BYTES]);

static mut SCRATCH_HEAP: ScratchHeap = ScratchHeap([0; SCRATCH_LEN_BYTES]);
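The effect of the `#[repr(align)]` wrapper can be demonstrated in isolation (with a small example size; in wee_alloc the real SCRATCH_LEN_BYTES comes from the build script): the wrapper type forces the scratch heap to start on the chosen boundary, so offset 0 of the heap is always aligned.

```rust
// Example size only; not the configured wee_alloc value.
const SCRATCH_LEN_BYTES: usize = 1 << 16;

// The wrapper raises the type's alignment from 1 (a bare [u8; N]) to 4096.
#[repr(align(4096))]
struct ScratchHeap([u8; SCRATCH_LEN_BYTES]);

static mut SCRATCH_HEAP: ScratchHeap = ScratchHeap([0; SCRATCH_LEN_BYTES]);

fn main() {
    // The type-level alignment guarantee:
    assert_eq!(std::mem::align_of::<ScratchHeap>(), 4096);
    // And the actual placement of the static (addr_of! avoids taking a
    // reference to the static mut):
    let addr = unsafe { std::ptr::addr_of!(SCRATCH_HEAP) } as usize;
    assert_eq!(addr % 4096, 0);
    println!("ok");
}
```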

Member


In fact, is just this change enough to completely fix the problem, since the only call to alloc_pages ensures that the number of pages is guaranteed to be big enough to contain a cell of the requested alignment? Which extra assertion in particular is failing right now? Is it assert_is_word_aligned? If so, then I think just this change is good enough.

Author

@nand-nor nand-nor Oct 25, 2019


I have added this as well to the upcoming push, and it does seem to be the ultimate fix, but I have one possibly unfounded reservation. Once I implemented the rough alignment fix above, no extra assertions failed. Prior to the fix, the extra_assertions feature forced failure at around the first allocation attempt, I believe somewhere in alloc_first_fit at the call to assert_aligned_to(allocated.data(), align);. Looking at assert_is_word_aligned, it is a wrapper for assert_aligned_to, so yes, potentially just this change is good enough.

However, I had instrumented your code to speed up discovery of the problem source, and I recall noticing that somewhere in the allocation process the resulting size request was not always a multiple of the byte alignment requirement. I (also foolishly) reverted all of that instrumentation to submit a nice clean PR and do not recall exactly where, so this is potentially not an issue. For instance, it may have been where, in new_cell_for_free_list, the size of the requested allocation is compared against, and changed based on, whether the value ensures the minimum cell size is left over after allocation (e.g. in new_cell_for_free_list at lib.rs:780: let size: Bytes = cmp::max(size.into(), (align + Self::MIN_CELL_SIZE) * Words(2));). Edit: the size I observed over many allocations may have been in pages rather than bytes, so again this may be a non-issue.

I'm not sure without instrumenting the code again, but is it possible that the final resulting size request could leave *offset unaligned, so that the next allocation will be unaligned even if SCRATCH_HEAP is forced to align at a page boundary? If that is possible, then I'm not sure just this fix would solve the issue. I can look into this at some point if you would like.

Member


This code should ensure that the requested number of pages can always satisfy the allocation request, if alloc_pages succeeds: https://github.com/rustwasm/wee_alloc/blob/master/wee_alloc/src/lib.rs#L776-L783

The allocation doesn't have to be a multiple of the alignment; it just has to be big enough that an aligned chunk of the requested size must exist somewhere within it.
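That claim can be checked mechanically: a region padded with `align - 1` extra bytes always contains an `align`-aligned chunk of the requested size, no matter where the region itself starts. A small sketch (helper names are illustrative):

```rust
// Round `n` up to the next multiple of `align` (a power of two).
fn round_up_to_alignment(n: usize, align: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

// Does the region [start, start + len) contain an `align`-aligned
// sub-region of `bytes` bytes?
fn contains_aligned_chunk(start: usize, len: usize, bytes: usize, align: usize) -> bool {
    let aligned = round_up_to_alignment(start, align);
    aligned + bytes <= start + len
}

fn main() {
    let (bytes, align) = (100, 16);
    let padded = bytes + align - 1; // worst-case padding, not a multiple of align
    // Try every possible misalignment of the region's start address.
    for start in 0..align {
        assert!(contains_aligned_chunk(start, padded, bytes, align));
    }
    println!("ok");
}
```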

aclifford added 4 commits October 25, 2019 09:00
@fitzgen
Member

fitzgen commented Oct 28, 2019

FYI: I've split out just the alignment of the scratch heap into #87

@nand-nor
Author

FYI: I've split out just the alignment of the scratch heap into #87

Should I close this PR then? Based on the above discussion, testing, and #87, the problem seems solved.

@fitzgen
Member

fitzgen commented Nov 4, 2019

Based on the above discussion / testing /and #87 the problem seems solved

Great! That was what I was hoping would happen, but I didn't want to close this out until you confirmed that :)

Should I close this PR then?

Yep, I think we can close this PR. If you run into anything else, please file an issue!

@fitzgen fitzgen closed this Nov 4, 2019