-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
alloc_slice in TypedArena #37220
alloc_slice in TypedArena #37220
Conversation
self.mk_ty(TyTuple(self.mk_type_list(ts))) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation looks messed up here
} | ||
} | ||
|
||
fn keep_local<'tcx, T: ty::TypeFoldable<'tcx>>(x: &T) -> bool { | ||
x.has_type_flags(ty::TypeFlags::KEEP_IN_LOCAL_TCX) | ||
} | ||
|
||
fn keep_local_slice<'tcx, T: ty::TypeFoldable<'tcx>>(xs: &[T]) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a slice? Could it be:
fn keep_local_iter<'tcx, T, I>(xs: I) -> bool
where T: ty::TypeFoldable<'tcx>,
I: IntoIterator<T>
{
xs.into_iter().any(keep_local)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no point, it's a tiny helper. Besides, what you wrote doesn't work because the iterator is of references and for keep_local
the type behind the reference is foldable.
☔ The latest upstream changes (presumably #37129) made this pull request unmergeable. Please resolve the merge conflicts. |
self.grow(slice.len()); | ||
} | ||
|
||
//while (self.end.get() as usize - self.ptr.get() as usize) < slice.len() * mem::size_of::<T>() { self.grow_old() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented out debug code.
//let arena_slice = slice::from_raw_parts_mut(v.as_mut_ptr(), v.len()); | ||
//mem::forget(v); | ||
//self.ptr.set(start_ptr.offset(arena_slice.len() as isize)); | ||
//arena_slice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented out debug code.
unsafe { | ||
let start_ptr = self.ptr.get(); | ||
let arena_slice = slice::from_raw_parts_mut(start_ptr, slice.len()); | ||
assert!(start_ptr as *const _ != slice.as_ptr()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove debug assert.
assert!(start_ptr as *const _ != slice.as_ptr()); | ||
self.ptr.set(start_ptr.offset(arena_slice.len() as isize)); | ||
arena_slice.copy_from_slice(slice); | ||
assert!(self.ptr.get() <= self.end.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove debug assert.
macro_rules! slice_interners { | ||
($($field:ident: $method:ident($ty:ident)),+) => ( | ||
$(intern_method!('tcx, $field: $method(&[$ty<'tcx>], alloc_slice, Deref::deref, | ||
|xs: &[$ty]| -> &Slice<$ty> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
is misaligned (moving it two spaces to the right should be fine).
($($field:ident: $method:ident($ty:ident)),+) => ( | ||
$(intern_method!('tcx, $field: $method(&[$ty<'tcx>], alloc_slice, Deref::deref, | ||
|xs: &[$ty]| -> &Slice<$ty> { | ||
debug!("KEK ({:?}, {}) => {:?}", xs.as_ptr(), xs.len(), xs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove debug logging.
$(intern_method!('tcx, $field: $method(&[$ty<'tcx>], alloc_slice, Deref::deref, | ||
|xs: &[$ty]| -> &Slice<$ty> { | ||
debug!("KEK ({:?}, {}) => {:?}", xs.as_ptr(), xs.len(), xs); | ||
unsafe { mem::transmute(xs) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unindent once.
6073a47
to
3865da2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, waiting on Travis to succeed. Should write a description and include perf numbers.
3865da2
to
233f57d
Compare
where T: Copy { | ||
assert!(mem::size_of::<T>() != 0); | ||
if slice.len() == 0 { | ||
return unsafe { slice::from_raw_parts_mut(heap::EMPTY as *mut T, 0) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this isn't just &mut []
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
&mut []
doesn't have a guaranteed stable address across instantiations and we depend on pointer comparisons working. That said, this would only ever be invoked once so maybe it's fine to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, should we replace this with &mut []
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds far too fragile to do that.
#[inline] | ||
pub fn alloc_slice(&self, slice: &[T]) -> &mut [T] | ||
where T: Copy { | ||
assert!(mem::size_of::<T>() != 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be mentioned in the doc comment I guess, since we don't have a type system way to enforce it.
r=me with @bluss' comments addressed |
Added documentation about panicking on ZSTs. Up to @eddyb if we want to merge now and I can work on further optimizations in a future PR. |
@bors r+ |
📌 Commit f9c0c30 has been approved by |
b88bbfa
to
7a38599
Compare
@bors r+ |
📌 Commit 7a38599 has been approved by |
…r=eddyb alloc_slice in TypedArena Added `TypedArena::alloc_slice`, and moved from using `TypedArena<Vec<T>>` to `TypedArena<T>`. `TypedArena::alloc_slice` is implemented by copying the slices elements into the typed arena, requiring that `T: Copy` when using it. We allocate a new chunk when there's insufficient space remaining in the previous chunk, and we cannot resize the old chunk in place. This is non-optimal, since we may waste allocated space when allocating (especially longer) slices, but is considered good enough for the time being. This change also reduces heap fragmentation, since the arena now directly stores objects instead of storing the Vec's length and pointer to its contents. Performance: ``` futures-rs-test 5.048s vs 5.061s --> 0.997x faster (variance: 1.028x, 1.020x) helloworld 0.284s vs 0.295s --> 0.963x faster (variance: 1.207x, 1.189x) html5ever-2016- 8.396s vs 8.208s --> 1.023x faster (variance: 1.019x, 1.036x) hyper.0.5.0 5.768s vs 5.797s --> 0.995x faster (variance: 1.027x, 1.028x) inflate-0.1.0 5.213s vs 5.069s --> 1.028x faster (variance: 1.008x, 1.022x) issue-32062-equ 0.428s vs 0.467s --> 0.916x faster (variance: 1.188x, 1.015x) issue-32278-big 1.949s vs 2.010s --> 0.970x faster (variance: 1.112x, 1.049x) jld-day15-parse 1.795s vs 1.877s --> 0.956x faster (variance: 1.037x, 1.015x) piston-image-0. 13.554s vs 13.522s --> 1.002x faster (variance: 1.019x, 1.020x) rust-encoding-0 2.489s vs 2.465s --> 1.010x faster (variance: 1.047x, 1.086x) syntex-0.42.2 34.646s vs 34.593s --> 1.002x faster (variance: 1.007x, 1.005x) syntex-0.42.2-i 17.181s vs 17.163s --> 1.001x faster (variance: 1.004x, 1.004x) ``` r? @eddyb
🔒 Merge conflict |
☔ The latest upstream changes (presumably #37269) made this pull request unmergeable. Please resolve the merge conflicts. |
7a38599
to
83b1982
Compare
@eddyb Rebased. |
@bors r+ |
📌 Commit 83b1982 has been approved by |
alloc_slice in TypedArena Added `TypedArena::alloc_slice`, and moved from using `TypedArena<Vec<T>>` to `TypedArena<T>`. `TypedArena::alloc_slice` is implemented by copying the slices elements into the typed arena, requiring that `T: Copy` when using it. We allocate a new chunk when there's insufficient space remaining in the previous chunk, and we cannot resize the old chunk in place. This is non-optimal, since we may waste allocated space when allocating (especially longer) slices, but is considered good enough for the time being. This change also reduces heap fragmentation, since the arena now directly stores objects instead of storing the Vec's length and pointer to its contents. Performance: ``` futures-rs-test 5.048s vs 5.061s --> 0.997x faster (variance: 1.028x, 1.020x) helloworld 0.284s vs 0.295s --> 0.963x faster (variance: 1.207x, 1.189x) html5ever-2016- 8.396s vs 8.208s --> 1.023x faster (variance: 1.019x, 1.036x) hyper.0.5.0 5.768s vs 5.797s --> 0.995x faster (variance: 1.027x, 1.028x) inflate-0.1.0 5.213s vs 5.069s --> 1.028x faster (variance: 1.008x, 1.022x) issue-32062-equ 0.428s vs 0.467s --> 0.916x faster (variance: 1.188x, 1.015x) issue-32278-big 1.949s vs 2.010s --> 0.970x faster (variance: 1.112x, 1.049x) jld-day15-parse 1.795s vs 1.877s --> 0.956x faster (variance: 1.037x, 1.015x) piston-image-0. 13.554s vs 13.522s --> 1.002x faster (variance: 1.019x, 1.020x) rust-encoding-0 2.489s vs 2.465s --> 1.010x faster (variance: 1.047x, 1.086x) syntex-0.42.2 34.646s vs 34.593s --> 1.002x faster (variance: 1.007x, 1.005x) syntex-0.42.2-i 17.181s vs 17.163s --> 1.001x faster (variance: 1.004x, 1.004x) ``` r? @eddyb
…ddyb Add ArrayVec and AccumulateVec to reduce heap allocations during interning of slices Updates `mk_tup`, `mk_type_list`, and `mk_substs` to allow interning directly from iterators. The previous PR, #37220, changed some of the calls to pass a borrowed slice from `Vec` instead of directly passing the iterator, and these changes further optimize that to avoid the allocation entirely. This change yields 50% less malloc calls in [some cases](https://pastebin.mozilla.org/8921686). It also yields decent, though not amazing, performance improvements: ``` futures-rs-test 4.091s vs 4.021s --> 1.017x faster (variance: 1.004x, 1.004x) helloworld 0.219s vs 0.220s --> 0.993x faster (variance: 1.010x, 1.018x) html5ever-2016- 3.805s vs 3.736s --> 1.018x faster (variance: 1.003x, 1.009x) hyper.0.5.0 4.609s vs 4.571s --> 1.008x faster (variance: 1.015x, 1.017x) inflate-0.1.0 3.864s vs 3.883s --> 0.995x faster (variance: 1.232x, 1.005x) issue-32062-equ 0.309s vs 0.299s --> 1.033x faster (variance: 1.014x, 1.003x) issue-32278-big 1.614s vs 1.594s --> 1.013x faster (variance: 1.007x, 1.004x) jld-day15-parse 1.390s vs 1.326s --> 1.049x faster (variance: 1.006x, 1.009x) piston-image-0. 10.930s vs 10.675s --> 1.024x faster (variance: 1.006x, 1.010x) reddit-stress 2.302s vs 2.261s --> 1.019x faster (variance: 1.010x, 1.026x) regex.0.1.30 2.250s vs 2.240s --> 1.005x faster (variance: 1.087x, 1.011x) rust-encoding-0 1.895s vs 1.887s --> 1.005x faster (variance: 1.005x, 1.018x) syntex-0.42.2 29.045s vs 28.663s --> 1.013x faster (variance: 1.004x, 1.006x) syntex-0.42.2-i 13.925s vs 13.868s --> 1.004x faster (variance: 1.022x, 1.007x) ``` We implement a small-size optimized vector, intended to be used primarily for collection of presumed to be short iterators. This vector cannot be "upsized/reallocated" into a heap-allocated vector, since that would require (slow) branching logic, but during the initial collection from an iterator heap-allocation is possible. We make the new `AccumulateVec` and `ArrayVec` generic over implementors of the `Array` trait, of which there is currently one, `[T; 8]`. In the future, this is likely to expand to other values of N. Huge thanks to @nnethercote for collecting the performance and other statistics mentioned above.
Added
TypedArena::alloc_slice
, and moved from usingTypedArena<Vec<T>>
toTypedArena<T>
.TypedArena::alloc_slice
is implemented by copying the slices elements into the typed arena, requiring thatT: Copy
when using it. We allocate a new chunk when there's insufficient space remaining in the previous chunk, and we cannot resize the old chunk in place. This is non-optimal, since we may waste allocated space when allocating (especially longer) slices, but is considered good enough for the time being.This change also reduces heap fragmentation, since the arena now directly stores objects instead of storing the Vec's length and pointer to its contents.
Performance:
r? @eddyb