GC unit test infra, start adding GC tests #2573
Conversation
This PR does not affect the produced WebAssembly code.
I must admit, this PR is kinda large to review now, but hopefully some of the comments are useful....
pub fn test() {
    println!("Testing garbage collection ...");

    // TODO: Add more tests
Indeed, I wonder if adding some monadic syntax for building heap graphs would be useful for writing tests. Does Rust let you do that easily?
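Rust has no built-in do-notation, but a builder-style API could get close. A hypothetical sketch (none of these names exist in the RTS; it only illustrates describing a heap graph declaratively in a test):

// Hypothetical sketch: a fluent builder for describing heap graphs in tests.
struct HeapBuilder {
    objects: Vec<(u32, Vec<u32>)>, // (object tag, indices of referenced objects)
}

impl HeapBuilder {
    fn new() -> Self {
        HeapBuilder { objects: Vec::new() }
    }

    // Add an object with the given tag and references; returns its index.
    fn obj(&mut self, tag: u32, refs: &[u32]) -> u32 {
        self.objects.push((tag, refs.to_vec()));
        (self.objects.len() - 1) as u32
    }
}

fn example() {
    let mut b = HeapBuilder::new();
    let leaf = b.obj(0, &[]);
    let _root = b.obj(1, &[leaf, leaf]); // a root pointing at the leaf twice
    // b.objects would then be materialized into an actual test heap.
}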
        unsafe {
            copying_gc_internal(
                &mut heap,
                heap_base,
                // get_hp
                || heap_1.heap_ptr_address(),
                // set_hp
                move |hp| heap_2.set_heap_ptr_address(hp as usize),
                static_roots,
                closure_table_address,
                // note_live_size
                |_live_size| {},
                // note_reclaimed
                |_reclaimed| {},
            );
        }
    }

    GC::MarkCompact => {
        unsafe {
            compacting_gc_internal(
                &mut heap,
                heap_base,
                // get_hp
                || heap_1.heap_ptr_address(),
                // set_hp
                move |hp| heap_2.set_heap_ptr_address(hp as usize),
                static_roots,
                closure_table_address,
                // note_live_size
                |_live_size| {},
                // note_reclaimed
                |_reclaimed| {},
            );
        }
    }
}
The arguments look identical here. Can you instead compute the gc function and apply the result to the arguments, sans repetition?
So the problem here is that the compacting_gc_internal and copying_gc_internal functions have generic parameters, so they're like C++ templates rather than OCaml/Haskell functions with type parameters. We can't return them from functions for two reasons: (1) the type system is not powerful enough for higher-rank types, so we can't return a value of type forall x y z . ...; (2) the runtime does not support values with type parameters (which would require monomorphising at runtime). I also mentioned this briefly in the GC type; copying it here:
/// Enum for the GC implementations. GC functions are generic so we can't put them into arrays or
/// other data types; we use this type instead.
#[derive(Debug, Clone, Copy)]
pub enum GC {
    Copying,
    MarkCompact,
}
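To illustrate the point, a minimal sketch (not the actual RTS test code; run_gc and TestMemory are made-up names): instead of returning the generic GC function, the test driver matches on the enum and instantiates the generic function at a concrete memory type in each arm.

// Sketch only: illustrative names, not RTS APIs.
fn run_gc(gc: GC /*, ... test heap arguments ... */) {
    match gc {
        GC::Copying => {
            // Each arm monomorphises the generic function at a concrete type,
            // so no higher-rank types or runtime polymorphism are needed:
            // unsafe { copying_gc_internal::<TestMemory, _, _, _, _>(/* ... */) }
        }
        GC::MarkCompact => {
            // unsafe { compacting_gc_internal::<TestMemory, _, _, _, _>(/* ... */) }
        }
    }
}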
-#[no_mangle]
-unsafe extern "C" fn float_fmt(a: f64, prec: u32, mode: u32) -> SkewedPtr {
+#[ic_mem_fn]
+unsafe fn float_fmt<M: Memory>(mem: &mut M, a: f64, prec: u32, mode: u32) -> SkewedPtr {
     // prec and mode are tagged (TODO (osa): what tag???)
Unresolved TODO? @ggreif might know the answers to your questions...
Right, this code was ported from the C implementation of the RTS and I added this TODO to ask about it later. It would be good to know what "tag" means here and document it.
    let reclaimed = (end_from_space - begin_from_space) - (end_to_space - begin_to_space);
    note_reclaimed(Bytes(reclaimed as u32));

    // Copy to-space to the beginning of from-space
I'm curious: I wonder why we do this rather than just switching the roles of the two spaces (to-space and from-space). Is that to accommodate a memory.grow on the next round of mutation and avoid interpreting pointers as offset relative to the current to_space base?
Is that to accommodate a memory.grow on the next round of mutation and avoid interpreting pointers as offset relative to the current to_space base?
It is to allow bump allocation until the end of the Wasm heap without loss of efficiency. If we keep using to-space we will have less room to bump the heap pointer before reaching the end of the Wasm heap. Before copying GC the heap looks like this:
| static heap | dynamic heap |........(unallocated)..........|
^             ^              ^                               ^
0             heap base      heap pointer                    end of Wasm heap
                             (allocation pointer)            (4GiB)
Allocation is done by bumping the heap pointer until it reaches the end of the Wasm heap, calling memory.grow along the way to allocate Wasm pages when needed.
After copying live data to to-space it looks like this:
| static heap | from-space | to-space |....(unallocated)....|
^             ^            ^          ^                     ^
0             heap base    to-space   heap pointer          end of Wasm heap
Now if we don't move to-space back to from-space we'll have much less space to allocate (the "unallocated" part).
We could implement a slow path for allocation that, when hp == end of Wasm heap, checks whether we have any space between the heap base and the beginning of the current space. Or, more generally, we could make the allocation area a list of ranges on the heap (though I think we would need at most two ranges in this list), to handle cases like:
| static heap | to-space | from-space | to-space |
                                                 ^
                                                 end of Wasm heap
Which becomes this after GC:
| static heap | alloc area | live data | alloc area |
                                                    ^
                                                    end of Wasm heap
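For reference, a minimal sketch of the bump-allocation scheme described above (illustrative only; BumpHeap and its fields are made-up names, not RTS code):

// Illustrative sketch of bump allocation up to the end of the Wasm heap.
const WASM_PAGE_SIZE: u32 = 65536;

struct BumpHeap {
    hp: u32,    // allocation (heap) pointer, bumped on every allocation
    pages: u32, // number of Wasm pages currently available (memory.size)
}

impl BumpHeap {
    fn alloc(&mut self, bytes: u32) -> u32 {
        let ptr = self.hp;
        self.hp += bytes;
        // Slow path: once the pointer passes the currently allocated pages we
        // would call memory.grow. Moving to-space back to the heap base after
        // GC keeps this bumpable range (hp .. end of Wasm heap) as large as
        // possible.
        while (self.hp as u64) > (self.pages as u64) * (WASM_PAGE_SIZE as u64) {
            self.pages += 1; // stands in for memory.grow(1)
        }
        ptr
    }
}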
Yeah, all makes sense, thanks for the explanation!
Originally discussed here: #2573 (comment)

Previously grow_memory would allocate one more page than necessary when the argument is a multiple of the Wasm page size. The new calculation fixes that. The code is basically

    (ptr + WASM_PAGE_SIZE - 1) / WASM_PAGE_SIZE

i.e. divide the argument by WASM_PAGE_SIZE, rounding up. However, we do this in u64 to avoid overflows when `ptr` is larger than `u32::MAX - WASM_PAGE_SIZE + 1`.

Diff of mo-rts.wasm before and after:

--- mo-rts.wat 2021-06-30 08:50:10.319312072 +0300
+++ ../../motoko_2/rts/mo-rts.wat 2021-06-30 08:50:08.507322577 +0300
@@ -18760,12 +18760,14 @@
   (func $motoko_rts::alloc::alloc_impl::grow_memory::hd1a4f99ad46d13e9 (type 7) (param i32)
     block ;; label = @1
       local.get 0
-      i32.const 16
-      i32.shr_u
+      i64.extend_i32_u
+      i64.const 65535
+      i64.add
+      i64.const 16
+      i64.shr_u
+      i32.wrap_i64
       memory.size
       i32.sub
-      i32.const 1
-      i32.add
       local.tee 0
       i32.const 1
       i32.lt_s

So two extra instructions. CanCan backend benchmark reports a 0.004% increase in cycles (42,055,656 to 42,057,358, +1,702).
Co-authored-by: Claudio Russo <claudio@dfinity.org>
Originally discussed here: #2573 (comment)

Previously grow_memory would allocate one more page than necessary when the argument is a multiple of the Wasm page size. The new calculation fixes that. The code is basically

    (ptr + WASM_PAGE_SIZE - 1) / WASM_PAGE_SIZE

i.e. divide the argument by WASM_PAGE_SIZE, rounding up. However, we do this in u64 to avoid overflows when `ptr` is larger than `u32::MAX - WASM_PAGE_SIZE + 1`.

Also moves `total_pages - current_pages` to the slow path to improve performance slightly. CanCan backend benchmark reports a 0.002% increase in cycles.
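A hedged sketch of the rounding described in these commit messages (the function name and test harness are illustrative, not the actual RTS code):

// Illustrative only: compute how many Wasm pages are needed to cover `ptr`
// bytes, rounding up. The arithmetic is done in u64 so that values of `ptr`
// near u32::MAX do not overflow when WASM_PAGE_SIZE - 1 is added.
const WASM_PAGE_SIZE: u64 = 65536;

fn pages_needed(ptr: u32) -> u32 {
    ((ptr as u64 + WASM_PAGE_SIZE - 1) / WASM_PAGE_SIZE) as u32
}

fn main() {
    // Exact multiple of the page size: previously one extra page was requested.
    assert_eq!(pages_needed(65536), 1);
    assert_eq!(pages_needed(65537), 2);
    assert_eq!(pages_needed(u32::MAX), 65536); // no overflow in u64
}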
Curious why you got rid of ic_fn, but LGTM
It wasn't as useful as […] into […]. I can do the same without using the macro by adding […] and use it in both tests and from generated code.
Accidental renaming was done in #2573 while renaming the "Heap" trait to "Memory" and "heap" identifiers to "mem".
This refactors the RTS to allow testing garbage collectors and adds a simple GC test (more tests to be implemented in a new PR).
Major changes:
- New trait Memory is introduced, with an alloc_words method for heap allocation. (A rough sketch of this trait is shown after this list.)
- All allocating functions now take a generic Memory argument. Part of this is necessary for testing the collectors, but it also enables more testing: we now test more code paths in the mark stack implementation.
- Copying and mark-compact GC entry points now take these arguments:
  - mem: &mut M: memory implementation
  - get_hp: callback to get the heap pointer
  - set_hp: SetHp: callback to set the heap pointer
  - mem_base: u32: beginning of the dynamic heap
  - mem_end: u32: end of the dynamic heap
  - static_roots: SkewedPtr: pointer to the static root array
  - closure_table_loc: *mut SkewedPtr: address of the closure table in the dynamic heap
- Monomorphised versions of the copying and mark-compact GC entry points are added, to be called from compiled code.
- Copying GC is refactored to avoid using grow_memory; it now uses alloc_words to allocate in to-space.
- New crate motoko-rts-macros is added. It defines a macro ic_mem_fn which is used to generate monomorphised versions of allocating functions, to be called from compiled code.
- Randomized testing in RTS tests now uses proptest instead of quickcheck. proptest is much more flexible than quickcheck and allows easier testing in some cases.
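As a rough illustration of the Memory trait and the ic_mem_fn pattern described above (a sketch under assumptions: alloc_array and IcMemory are made-up names, and the actual RTS types use Words/SkewedPtr wrappers rather than raw integers):

// Sketch only: the real RTS trait, macro output, and pointer types differ in
// the details.
trait Memory {
    // Allocate `n_words` words on the heap and return the address.
    unsafe fn alloc_words(&mut self, n_words: u32) -> usize;
}

// Allocating functions take a generic `Memory` argument, so unit tests can
// pass an in-process test heap while compiled code passes the IC heap.
unsafe fn alloc_array<M: Memory>(mem: &mut M, len: u32) -> usize {
    // 2 header words plus one word per element; the layout here is illustrative.
    unsafe { mem.alloc_words(2 + len) }
}

// The `ic_mem_fn` macro then generates a monomorphised wrapper roughly like
// the following, to be called from compiled code (hypothetical `IcMemory`):
//
// #[no_mangle]
// unsafe extern "C" fn alloc_array_ic(len: u32) -> usize {
//     alloc_array(&mut IcMemory, len)
// }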
Perf changes:

Using the CanCan backend benchmark as described here: #2033 (comment)

- With copying GC the benchmark uses 1.5% more instructions. I suspect this is caused by the change in copying GC to use alloc_words instead of grow_memory.
- With compacting GC the benchmark uses 0.12% fewer instructions.
TODOs:

- big-array-access. It's a failing test, and the backtrace changes depending on RTS build flags (debug or release).