alloc_slice in TypedArena #37220

Mark-Simulacrum · 2016-10-16T18:21:41Z

Added TypedArena::alloc_slice, and moved from using TypedArena<Vec<T>> to TypedArena<T>.

TypedArena::alloc_slice is implemented by copying the slice's elements into the typed arena, which requires T: Copy. When the current chunk has insufficient space remaining and cannot be resized in place, we allocate a new chunk. This is suboptimal, since the unused tail of the old chunk is wasted (especially when allocating longer slices), but it is considered good enough for the time being.
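To illustrate the copy-into-chunk strategy described above, here is a minimal sketch. It is not the libarena code: `ToyArena`, `with_capacity`, and `chunk_count` are hypothetical names, and the chunks are plain `Vec<T>` buffers rather than raw pointers. It only assumes that pushing into a `Vec` within its capacity never moves the elements already stored.

```rust
use std::cell::RefCell;
use std::mem;
use std::slice;

/// Toy arena (hypothetical, for illustration only). Each chunk is a `Vec<T>`
/// that is never grown past its initial capacity, so element addresses stay stable.
pub struct ToyArena<T> {
    chunks: RefCell<Vec<Vec<T>>>,
}

impl<T: Copy> ToyArena<T> {
    pub fn with_capacity(cap: usize) -> Self {
        ToyArena { chunks: RefCell::new(vec![Vec::with_capacity(cap.max(1))]) }
    }

    /// Copies `slice` into the arena and returns the arena-owned copy.
    /// Panics if `T` is a zero-sized type.
    pub fn alloc_slice(&self, slice: &[T]) -> &mut [T] {
        assert!(mem::size_of::<T>() != 0);
        let mut chunks = self.chunks.borrow_mut();
        let (cap, len) = {
            let last = chunks.last().unwrap();
            (last.capacity(), last.len())
        };
        if cap - len < slice.len() {
            // Not enough room: start a new chunk. The unused tail of the
            // old chunk is simply wasted, as the description notes.
            chunks.push(Vec::with_capacity(cap.saturating_mul(2).max(slice.len())));
        }
        let chunk = chunks.last_mut().unwrap();
        let start = chunk.len();
        chunk.extend_from_slice(slice); // stays within capacity, so no reallocation
        // SAFETY (sketch-level argument): the range [start, start + slice.len())
        // was just initialised, is handed out only once, and its buffer is not
        // freed or moved while the arena is alive.
        unsafe { slice::from_raw_parts_mut(chunk.as_mut_ptr().add(start), slice.len()) }
    }

    pub fn chunk_count(&self) -> usize {
        self.chunks.borrow().len()
    }
}
```

With an initial capacity of 4, allocating two three-element slices leaves one slot of the first chunk unused and puts the second slice in a new chunk, which is the wasted-space case mentioned above.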

This change also reduces heap fragmentation, since the arena now directly stores objects instead of storing the Vec's length and pointer to its contents.
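As a rough illustration of the per-list bookkeeping involved, the sketch below compares the size of a `Vec<T>` header with the bytes needed to store the elements inline. The helper names are hypothetical, and the three-word `Vec` layout is what current standard library implementations use in practice rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Bytes an arena slot spends on the `Vec<T>` header alone; the elements
/// themselves live in a separate heap allocation.
pub fn vec_header_bytes<T>() -> usize {
    size_of::<Vec<T>>()
}

/// Bytes needed when `n` elements are stored directly in an arena of `T`.
pub fn inline_bytes<T>(n: usize) -> usize {
    n * size_of::<T>()
}
```

On a 64-bit target, a three-element list of `u32` needs 12 bytes inline, while a `Vec<u32>` in the arena costs a 24-byte header plus a separate allocation for the contents.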

Performance:

futures-rs-test  5.048s vs  5.061s --> 0.997x faster (variance: 1.028x, 1.020x)
helloworld       0.284s vs  0.295s --> 0.963x faster (variance: 1.207x, 1.189x)
html5ever-2016-  8.396s vs  8.208s --> 1.023x faster (variance: 1.019x, 1.036x)
hyper.0.5.0      5.768s vs  5.797s --> 0.995x faster (variance: 1.027x, 1.028x)
inflate-0.1.0    5.213s vs  5.069s --> 1.028x faster (variance: 1.008x, 1.022x)
issue-32062-equ  0.428s vs  0.467s --> 0.916x faster (variance: 1.188x, 1.015x)
issue-32278-big  1.949s vs  2.010s --> 0.970x faster (variance: 1.112x, 1.049x)
jld-day15-parse  1.795s vs  1.877s --> 0.956x faster (variance: 1.037x, 1.015x)
piston-image-0. 13.554s vs 13.522s --> 1.002x faster (variance: 1.019x, 1.020x)
rust-encoding-0  2.489s vs  2.465s --> 1.010x faster (variance: 1.047x, 1.086x)
syntex-0.42.2   34.646s vs 34.593s --> 1.002x faster (variance: 1.007x, 1.005x)
syntex-0.42.2-i 17.181s vs 17.163s --> 1.001x faster (variance: 1.004x, 1.004x)
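
For reading the table: the reported ratio appears to be the first time divided by the second, rounded to three decimals. The sketch below reproduces that arithmetic; the function names are hypothetical, and which column is "before" and which is "after" is not stated here.

```rust
/// Ratio of the first timing to the second, as the table seems to report it.
pub fn ratio(first_secs: f64, second_secs: f64) -> f64 {
    first_secs / second_secs
}

/// Formats the ratio the way the table prints it, e.g. "0.997x".
pub fn format_ratio(first_secs: f64, second_secs: f64) -> String {
    format!("{:.3}x", ratio(first_secs, second_secs))
}
```

Given the variance columns, ratios this close to 1.000x are hard to distinguish from noise.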

r? @eddyb

jonas-schievink · 2016-10-16T18:24:52Z

src/librustc/ty/context.rs

        self.mk_ty(TyTuple(self.mk_type_list(ts)))
-    }
+        }


Indentation looks messed up here

frewsxcv · 2016-10-16T18:34:00Z

src/librustc/ty/context.rs

    }
 }

 fn keep_local<'tcx, T: ty::TypeFoldable<'tcx>>(x: &T) -> bool {
    x.has_type_flags(ty::TypeFlags::KEEP_IN_LOCAL_TCX)
 }

+fn keep_local_slice<'tcx, T: ty::TypeFoldable<'tcx>>(xs: &[T]) -> bool {


Does this need to be a slice? Could it be:

fn keep_local_iter<'tcx, T, I>(xs: I) -> bool where T: ty::TypeFoldable<'tcx>, I: IntoIterator<T> { xs.into_iter().any(keep_local) }

There's no point, it's a tiny helper. Besides, what you wrote doesn't work: the iterator yields references, and for keep_local it is the type behind the reference that is foldable.
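
A sketch of the point being made, using a hypothetical `Foldable` trait and `Ty` struct as stand-ins for `ty::TypeFoldable` and the real types: the slice helper works directly, while an iterator-based variant has to say that the items are references to a foldable type.

```rust
/// Hypothetical stand-in for `ty::TypeFoldable`.
pub trait Foldable {
    fn has_local_flag(&self) -> bool;
}

/// Hypothetical stand-in for a type that carries flags.
pub struct Ty {
    pub local: bool,
}

impl Foldable for Ty {
    fn has_local_flag(&self) -> bool {
        self.local
    }
}

pub fn keep_local<T: Foldable>(x: &T) -> bool {
    x.has_local_flag()
}

/// Slice version: `xs.iter()` yields `&T`, which is what `keep_local` takes.
pub fn keep_local_slice<T: Foldable>(xs: &[T]) -> bool {
    xs.iter().any(keep_local)
}

/// Iterator version: the item type must be `&'a T` with `T: Foldable`,
/// not a foldable `T` itself.
pub fn keep_local_iter<'a, T, I>(xs: I) -> bool
where
    T: Foldable + 'a,
    I: IntoIterator<Item = &'a T>,
{
    xs.into_iter().any(keep_local)
}
```

Both helpers behave the same on a slice; the generic version only adds bounds.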

bors · 2016-10-17T03:07:22Z

☔ The latest upstream changes (presumably #37129) made this pull request unmergeable. Please resolve the merge conflicts.

eddyb · 2016-10-17T03:11:52Z

src/libarena/lib.rs

+            self.grow(slice.len());
+        }
+
+        //while (self.end.get() as usize - self.ptr.get() as usize) < slice.len() * mem::size_of::<T>() { self.grow_old() }


Remove commented out debug code.

eddyb · 2016-10-17T03:11:57Z

src/libarena/lib.rs

+            //let arena_slice = slice::from_raw_parts_mut(v.as_mut_ptr(), v.len());
+            //mem::forget(v);
+            //self.ptr.set(start_ptr.offset(arena_slice.len() as isize));
+            //arena_slice


Remove commented out debug code.

eddyb · 2016-10-17T03:12:06Z

src/libarena/lib.rs

+        unsafe {
+            let start_ptr = self.ptr.get();
+            let arena_slice = slice::from_raw_parts_mut(start_ptr, slice.len());
+            assert!(start_ptr as *const _ != slice.as_ptr());


Remove debug assert.

eddyb · 2016-10-17T03:12:11Z

src/libarena/lib.rs

+            assert!(start_ptr as *const _ != slice.as_ptr());
+            self.ptr.set(start_ptr.offset(arena_slice.len() as isize));
+            arena_slice.copy_from_slice(slice);
+            assert!(self.ptr.get() <= self.end.get());


Remove debug assert.

eddyb · 2016-10-17T03:12:52Z

src/librustc/ty/context.rs

+macro_rules! slice_interners {
+    ($($field:ident: $method:ident($ty:ident)),+) => (
+        $(intern_method!('tcx, $field: $method(&[$ty<'tcx>], alloc_slice, Deref::deref,
+                                             |xs: &[$ty]| -> &Slice<$ty> {


| is misaligned (moving it two spaces to the right should be fine).

eddyb · 2016-10-17T03:12:59Z

src/librustc/ty/context.rs

+    ($($field:ident: $method:ident($ty:ident)),+) => (
+        $(intern_method!('tcx, $field: $method(&[$ty<'tcx>], alloc_slice, Deref::deref,
+                                             |xs: &[$ty]| -> &Slice<$ty> {
+                debug!("KEK ({:?}, {}) => {:?}", xs.as_ptr(), xs.len(), xs);


Remove debug logging.

eddyb · 2016-10-17T03:13:14Z

src/librustc/ty/context.rs

+        $(intern_method!('tcx, $field: $method(&[$ty<'tcx>], alloc_slice, Deref::deref,
+                                             |xs: &[$ty]| -> &Slice<$ty> {
+                debug!("KEK ({:?}, {}) => {:?}", xs.as_ptr(), xs.len(), xs);
+                unsafe { mem::transmute(xs) }


Unindent once.

eddyb

LGTM, waiting on Travis to succeed. Should write a description and include perf numbers.

bluss · 2016-10-17T06:12:46Z

src/libarena/lib.rs

+        where T: Copy {
+        assert!(mem::size_of::<T>() != 0);
+        if slice.len() == 0 {
+            return unsafe { slice::from_raw_parts_mut(heap::EMPTY as *mut T, 0) };


Is there a reason this isn't just &mut [] ?

&mut [] doesn't have a guaranteed stable address across instantiations and we depend on pointer comparisons working. That said, this would only ever be invoked once so maybe it's fine to do that.

To clarify, should we replace this with &mut []?

Sounds far too fragile to do that.
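
For illustration, here is one way to build an empty slice from a fixed sentinel pointer instead of `&mut []`. The PR uses the then-unstable `heap::EMPTY`; this sketch swaps in the stable `NonNull::dangling()`, and `empty_slice` is a hypothetical helper name. The address is the same on every call in current implementations, which is not a documented guarantee.

```rust
use std::ptr::NonNull;
use std::slice;

/// Returns an empty slice backed by a dangling but well-aligned pointer.
pub fn empty_slice<'a, T>() -> &'a mut [T] {
    // SAFETY: a zero-length slice only requires a non-null, aligned pointer;
    // nothing is ever read or written through it.
    unsafe { slice::from_raw_parts_mut(NonNull::<T>::dangling().as_ptr(), 0) }
}
```

Because the pointer comes from the type's alignment rather than from wherever the compiler places a literal, pointer comparisons between two empty results behave predictably.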

bluss · 2016-10-17T06:14:09Z

src/libarena/lib.rs

+    #[inline]
+    pub fn alloc_slice(&self, slice: &[T]) -> &mut [T]
+        where T: Copy {
+        assert!(mem::size_of::<T>() != 0);


This should be mentioned in the doc comment I guess, since we don't have a type system way to enforce it.
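
A sketch of what such a doc comment could look like, on a hypothetical free function `copy_non_zst` that uses a `Vec` as a stand-in for arena storage. The `# Panics` heading is the usual rustdoc convention; the exact wording added in the PR is not shown here.

```rust
use std::mem;

/// Copies `slice` into freshly owned storage (a stand-in for the arena).
///
/// # Panics
///
/// Panics if `T` is a zero-sized type, since that cannot be ruled out
/// through the type system.
pub fn copy_non_zst<T: Copy>(slice: &[T]) -> Vec<T> {
    assert!(mem::size_of::<T>() != 0, "zero-sized types are not supported");
    slice.to_vec()
}
```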

eddyb · 2016-10-17T07:40:11Z

r=me with @bluss' comments addressed

Mark-Simulacrum · 2016-10-17T21:06:28Z

Added documentation about panicking on ZSTs.

Up to @eddyb if we want to merge now and I can work on further optimizations in a future PR.

eddyb · 2016-10-17T21:08:38Z

@bors r+

bors · 2016-10-17T21:08:39Z

📌 Commit f9c0c30 has been approved by eddyb

eddyb · 2016-10-18T15:07:11Z

@bors r+

bors · 2016-10-18T15:07:13Z

📌 Commit 7a38599 has been approved by eddyb

@eddyb

…r=eddyb alloc_slice in TypedArena

Rollup of 23 pull requests - Successful merges: #36964, #37108, #37117, #37124, #37161, #37176, #37182, #37193, #37198, #37202, #37208, #37218, #37221, #37224, #37230, #37231, #37233, #37236, #37240, #37254, #37257, #37265, #37267 - Failed merges: #37213, #37220, #37261

bors · 2016-10-19T09:53:28Z

🔒 Merge conflict

bors · 2016-10-19T09:57:49Z

☔ The latest upstream changes (presumably #37269) made this pull request unmergeable. Please resolve the merge conflicts.

Mark-Simulacrum · 2016-10-19T13:54:21Z

@eddyb Rebased.

eddyb · 2016-10-19T14:59:34Z

@bors r+

bors · 2016-10-19T14:59:35Z

📌 Commit 83b1982 has been approved by eddyb

bors · 2016-10-19T16:52:49Z

⌛ Testing commit 83b1982 with merge d337f34...

@eddyb

alloc_slice in TypedArena

bors · 2016-10-19T20:02:46Z

@nnethercote

…ddyb Add ArrayVec and AccumulateVec to reduce heap allocations during interning of slices

Updates `mk_tup`, `mk_type_list`, and `mk_substs` to allow interning directly from iterators. The previous PR, #37220, changed some of the calls to pass a borrowed slice from `Vec` instead of directly passing the iterator, and these changes further optimize that to avoid the allocation entirely.

This change yields 50% less malloc calls in [some cases](https://pastebin.mozilla.org/8921686). It also yields decent, though not amazing, performance improvements:

```
futures-rs-test  4.091s vs  4.021s --> 1.017x faster (variance: 1.004x, 1.004x)
helloworld       0.219s vs  0.220s --> 0.993x faster (variance: 1.010x, 1.018x)
html5ever-2016-  3.805s vs  3.736s --> 1.018x faster (variance: 1.003x, 1.009x)
hyper.0.5.0      4.609s vs  4.571s --> 1.008x faster (variance: 1.015x, 1.017x)
inflate-0.1.0    3.864s vs  3.883s --> 0.995x faster (variance: 1.232x, 1.005x)
issue-32062-equ  0.309s vs  0.299s --> 1.033x faster (variance: 1.014x, 1.003x)
issue-32278-big  1.614s vs  1.594s --> 1.013x faster (variance: 1.007x, 1.004x)
jld-day15-parse  1.390s vs  1.326s --> 1.049x faster (variance: 1.006x, 1.009x)
piston-image-0. 10.930s vs 10.675s --> 1.024x faster (variance: 1.006x, 1.010x)
reddit-stress    2.302s vs  2.261s --> 1.019x faster (variance: 1.010x, 1.026x)
regex.0.1.30     2.250s vs  2.240s --> 1.005x faster (variance: 1.087x, 1.011x)
rust-encoding-0  1.895s vs  1.887s --> 1.005x faster (variance: 1.005x, 1.018x)
syntex-0.42.2   29.045s vs 28.663s --> 1.013x faster (variance: 1.004x, 1.006x)
syntex-0.42.2-i 13.925s vs 13.868s --> 1.004x faster (variance: 1.022x, 1.007x)
```

We implement a small-size optimized vector, intended to be used primarily for collection of presumed to be short iterators. This vector cannot be "upsized/reallocated" into a heap-allocated vector, since that would require (slow) branching logic, but during the initial collection from an iterator heap-allocation is possible.

We make the new `AccumulateVec` and `ArrayVec` generic over implementors of the `Array` trait, of which there is currently one, `[T; 8]`. In the future, this is likely to expand to other values of N. Huge thanks to @nnethercote for collecting the performance and other statistics mentioned above.
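
The idea of choosing between inline and heap storage once, at collection time, can be sketched as follows. This is not the code from #37270: `SmallAccum`, `collect_from`, and `INLINE_CAP` are hypothetical names, the `Copy + Default` bound is a simplification to keep the sketch free of unsafe code, and using the iterator's `size_hint` upper bound to pick the variant is an assumption about how such a type could decide.

```rust
const INLINE_CAP: usize = 8;

/// Either up to eight elements stored inline, or a heap-allocated `Vec`.
pub enum SmallAccum<T> {
    Inline { buf: [T; INLINE_CAP], len: usize },
    Heap(Vec<T>),
}

impl<T: Copy + Default> SmallAccum<T> {
    /// Picks the storage once, up front; an inline value never grows into a heap one.
    pub fn collect_from<I: IntoIterator<Item = T>>(iter: I) -> Self {
        let iter = iter.into_iter();
        match iter.size_hint() {
            // The iterator promises at most INLINE_CAP items: no heap allocation.
            (_, Some(upper)) if upper <= INLINE_CAP => {
                let mut buf = [T::default(); INLINE_CAP];
                let mut len = 0;
                for item in iter {
                    buf[len] = item; // panics if the iterator broke its promise
                    len += 1;
                }
                SmallAccum::Inline { buf, len }
            }
            // Unknown or large upper bound: fall back to a Vec.
            _ => SmallAccum::Heap(iter.collect()),
        }
    }

    pub fn as_slice(&self) -> &[T] {
        match self {
            SmallAccum::Inline { buf, len } => &buf[..*len],
            SmallAccum::Heap(v) => v,
        }
    }

    pub fn is_inline(&self) -> bool {
        matches!(self, SmallAccum::Inline { .. })
    }
}
```

The resulting slice can then be handed to an interning function without the caller ever building a `Vec` for short inputs.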

rust-highfive assigned eddyb Oct 16, 2016

jonas-schievink reviewed Oct 16, 2016

frewsxcv reviewed Oct 16, 2016

eddyb suggested changes Oct 17, 2016

Mark-Simulacrum force-pushed the arena-alloc-slice branch from 6073a47 to 3865da2 Compare October 17, 2016 03:25

eddyb approved these changes Oct 17, 2016

Mark-Simulacrum force-pushed the arena-alloc-slice branch from 3865da2 to 233f57d Compare October 17, 2016 04:58

bluss reviewed Oct 17, 2016

Mark-Simulacrum force-pushed the arena-alloc-slice branch 3 times, most recently from b88bbfa to 7a38599 Compare October 18, 2016 12:51

eddyb mentioned this pull request Oct 18, 2016

Rollup of 16 pull requests #37262

Closed

eddyb mentioned this pull request Oct 19, 2016

Rollup of 23 pull requests #37269

Merged

Mark-Simulacrum mentioned this pull request Oct 19, 2016

Add ArrayVec and AccumulateVec to reduce heap allocations during interning of slices #37270

Merged

Add TypedArena::alloc_slice.

a714c2a

Use TypedArena::alloc_slice in rustc.

83b1982

Mark-Simulacrum force-pushed the arena-alloc-slice branch from 7a38599 to 83b1982 Compare October 19, 2016 13:54

bors merged commit 83b1982 into rust-lang:master Oct 19, 2016

Mark-Simulacrum deleted the arena-alloc-slice branch October 22, 2016 02:12
