Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move to using a styled buffer for rendering #6

Merged
merged 2 commits into from
Apr 12, 2016

Conversation

sophiajt
Copy link
Collaborator

Add a 2-D styled buffer that we can easily write into and later render into a Vec of RenderedLines. Refactor render_line to use this new styled buffer.

@nikomatsakis
Copy link
Owner

Nice!

}

fn render(&self, source_kind: RenderedLineKind) -> Vec<RenderedLine> {
let mut output : Vec<RenderedLine> = Vec::new();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: let mut output: Vec<RenderedLine> = Vec::new();

(I'd probably write vec![] too.)

@nikomatsakis nikomatsakis merged this pull request into nikomatsakis:borrowck-snippet Apr 12, 2016
nikomatsakis pushed a commit that referenced this pull request Dec 1, 2016
Add small-copy optimization for copy_from_slice

## Summary

During benchmarking, I found that one of my programs spent between 5 and 10 percent of the time doing memmoves. Ultimately I tracked these down to single-byte slices being copied with a memcopy. Doing a manual copy if the slice contains only one element can speed things up significantly. For my program, this reduced the running time by 20%.

## Background

I am optimizing a program that relies heavily on reading a single byte at a time. To avoid IO overhead, I read all data into a vector once, and then I use a `Cursor` around that vector to read from. During profiling, I noticed that `__memmove_avx_unaligned_erms` was hot, taking up 7.3% of the running time. It turns out that these were caused by calls to `Cursor::read()`, which calls `<&[u8] as Read>::read()`, which calls `&[T]::copy_from_slice()`, which calls `ptr::copy_nonoverlapping()`. This one is implemented as a memcopy. Copying a single byte with a memcopy is very wasteful, because (at least on my platform) it involves calling `memcpy` in libc. This is an indirect call when libc is linked dynamically, and furthermore `memcpy` is optimized for copying large amounts of data at the cost of a bit of overhead for small copies.

## Benchmarks

Before I made this change, `perf` reported the following for my program. I only included the relevant functions, and how they rank. (This is on a different machine than where I ran the original benchmarks. It has an older CPU, so `__memmove_sse2_unaligned_erms` is called instead of `__memmove_avx_unaligned_erms`.)

```
#3   5.47%  bench_decode  libc-2.24.so      [.] __memmove_sse2_unaligned_erms
#5   1.67%  bench_decode  libc-2.24.so      [.] memcpy@GLIBC_2.2.5
#6   1.51%  bench_decode  bench_decode      [.] memcpy@plt
```

`memcpy` is eating up 8.65% of the total running time, and the overhead of dispatching to a specialized fast copy function (`memcpy@GLIBC` showing up) is clearly visible. The price of dynamic linking (`memcpy@plt` showing up) is visible too.

After this change, this is what `perf` reports:

```
#5   0.33%  bench_decode  libc-2.24.so      [.] __memmove_sse2_unaligned_erms
#14  0.01%  bench_decode  libc-2.24.so      [.] memcpy@GLIBC_2.2.5
```

Now only 0.34% of the running time is spent on memcopies. The dynamic linking overhead is not significant at all any more.

To add some more data, my program generates timing results for the operation in its main loop. These are the timings before and after the change:

| Time before   | Time after    | After/Before |
|---------------|---------------|--------------|
| 29.8 ± 0.8 ns | 23.6 ± 0.5 ns |  0.79 ± 0.03 |

The time is basically the total running time divided by a constant; the actual numbers are not important. This change reduced the total running time by 21% (much more than the original 9% spent on memmoves, likely because the CPU is stalling a lot less because data dependencies are more transparent). Of course YMMV and for most programs this will not matter at all. But when it does, the gains can be significant!

## Alternatives

* At first I implemented this in `io::Cursor`. I moved it to `&[T]::copy_from_slice()` instead, but this might be too intrusive, especially because it applies to all `T`, not just `u8`. To restrict this to `io::Read`, `<&[u8] as Read>::read()` is probably the best place.
* I tried copying bytes in a loop up to 64 or 8 bytes before calling `Read::read`, but both resulted in about a 20% slowdown instead of speedup.
nikomatsakis pushed a commit that referenced this pull request Feb 16, 2017
LeakSanitizer, ThreadSanitizer, AddressSanitizer and MemorySanitizer support

```
$ cargo new --bin leak && cd $_

$ edit Cargo.toml && tail -n3 $_
```

``` toml
[profile.dev]
opt-level = 1
```

```
$ edit src/main.rs && cat $_
```

``` rust
use std::mem;

fn main() {
    let xs = vec![0, 1, 2, 3];
    mem::forget(xs);
}
```

```
$ RUSTFLAGS="-Z sanitizer=leak" cargo run --target x86_64-unknown-linux-gnu; echo $?
    Finished dev [optimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/leak`

=================================================================
==10848==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x557c3488db1f in __interceptor_malloc /shared/rust/checkouts/lsan/src/compiler-rt/lib/lsan/lsan_interceptors.cc:55
    #1 0x557c34888aaa in alloc::heap::exchange_malloc::h68f3f8b376a0da42 /shared/rust/checkouts/lsan/src/liballoc/heap.rs:138
    #2 0x557c34888afc in leak::main::hc56ab767de6d653a $PWD/src/main.rs:4
    #3 0x557c348c0806 in __rust_maybe_catch_panic ($PWD/target/debug/leak+0x3d806)

SUMMARY: LeakSanitizer: 16 byte(s) leaked in 1 allocation(s).
23
```

```
$ cargo new --bin racy && cd $_

$ edit src/main.rs && cat $_
```

``` rust
use std::thread;

static mut ANSWER: i32 = 0;

fn main() {
    let t1 = thread::spawn(|| unsafe { ANSWER = 42 });
    unsafe {
        ANSWER = 24;
    }
    t1.join().ok();
}
```

```
$ RUSTFLAGS="-Z sanitizer=thread" cargo run --target x86_64-unknown-linux-gnu; echo $?
==================
WARNING: ThreadSanitizer: data race (pid=12019)
  Write of size 4 at 0x562105989bb4 by thread T1:
    #0 racy::main::_$u7b$$u7b$closure$u7d$$u7d$::hbe13ea9e8ac73f7e $PWD/src/main.rs:6 (racy+0x000000010e3f)
    #1 _$LT$std..panic..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h2e466a92accacc78 /shared/rust/checkouts/lsan/src/libstd/panic.rs:296 (racy+0x000000010cc5)
    #2 std::panicking::try::do_call::h7f4d2b38069e4042 /shared/rust/checkouts/lsan/src/libstd/panicking.rs:460 (racy+0x00000000c8f2)
    #3 __rust_maybe_catch_panic <null> (racy+0x0000000b4e56)
    #4 std::panic::catch_unwind::h31ca45621ad66d5a /shared/rust/checkouts/lsan/src/libstd/panic.rs:361 (racy+0x00000000b517)
    #5 std::thread::Builder::spawn::_$u7b$$u7b$closure$u7d$$u7d$::hccfc37175dea0b01 /shared/rust/checkouts/lsan/src/libstd/thread/mod.rs:357 (racy+0x00000000c226)
    #6 _$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::hd880bbf91561e033 /shared/rust/checkouts/lsan/src/liballoc/boxed.rs:605 (racy+0x00000000f27e)
    #7 std::sys::imp::thread::Thread::new::thread_start::hebdfc4b3d17afc85 <null> (racy+0x0000000abd40)

  Previous write of size 4 at 0x562105989bb4 by main thread:
    #0 racy::main::h23e6e5ca46d085c3 $PWD/src/main.rs:8 (racy+0x000000010d7c)
    #1 __rust_maybe_catch_panic <null> (racy+0x0000000b4e56)
    #2 __libc_start_main <null> (libc.so.6+0x000000020290)

  Location is global 'racy::ANSWER::h543d2b139f819b19' of size 4 at 0x562105989bb4 (racy+0x0000002f8bb4)

  Thread T1 (tid=12028, running) created by main thread at:
    #0 pthread_create /shared/rust/checkouts/lsan/src/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:902 (racy+0x00000001aedb)
    #1 std::sys::imp::thread::Thread::new::hce44187bf4a36222 <null> (racy+0x0000000ab9ae)
    #2 std::thread::spawn::he382608373eb667e /shared/rust/checkouts/lsan/src/libstd/thread/mod.rs:412 (racy+0x00000000b5aa)
    #3 racy::main::h23e6e5ca46d085c3 $PWD/src/main.rs:6 (racy+0x000000010d5c)
    #4 __rust_maybe_catch_panic <null> (racy+0x0000000b4e56)
    #5 __libc_start_main <null> (libc.so.6+0x000000020290)

SUMMARY: ThreadSanitizer: data race $PWD/src/main.rs:6 in racy::main::_$u7b$$u7b$closure$u7d$$u7d$::hbe13ea9e8ac73f7e
==================
ThreadSanitizer: reported 1 warnings
66
```

```
$ cargo new --bin oob && cd $_

$ edit src/main.rs && cat $_
```

``` rust
fn main() {
    let xs = [0, 1, 2, 3];
    let y = unsafe { *xs.as_ptr().offset(4) };
}
```

```
$ RUSTFLAGS="-Z sanitizer=address" cargo run --target x86_64-unknown-linux-gnu; echo $?
=================================================================
==13328==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff29f3ecd0 at pc 0x55802dc6bf7e bp 0x7fff29f3ec90 sp 0x7fff29f3ec88
READ of size 4 at 0x7fff29f3ecd0 thread T0
    #0 0x55802dc6bf7d in oob::main::h0adc7b67e5feb2e7 $PWD/src/main.rs:3
    #1 0x55802dd60426 in __rust_maybe_catch_panic ($PWD/target/debug/oob+0xfe426)
    #2 0x55802dd58dd9 in std::rt::lang_start::hb2951fc8a59d62a7 ($PWD/target/debug/oob+0xf6dd9)
    #3 0x55802dc6c002 in main ($PWD/target/debug/oob+0xa002)
    #4 0x7fad8c3b3290 in __libc_start_main (/usr/lib/libc.so.6+0x20290)
    #5 0x55802dc6b719 in _start ($PWD/target/debug/oob+0x9719)

Address 0x7fff29f3ecd0 is located in stack of thread T0 at offset 48 in frame
    #0 0x55802dc6bd5f in oob::main::h0adc7b67e5feb2e7 $PWD/src/main.rs:1

  This frame has 1 object(s):
    [32, 48) 'xs' <== Memory access at offset 48 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow $PWD/src/main.rs:3 in oob::main::h0adc7b67e5feb2e7
Shadow bytes around the buggy address:
  0x1000653dfd40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfd50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfd60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfd70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x1000653dfd90: 00 00 00 00 f1 f1 f1 f1 00 00[f3]f3 00 00 00 00
  0x1000653dfda0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfdb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfdc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfdd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfde0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==13328==ABORTING
1
```

```
$ cargo new --bin uninit && cd $_

$ edit src/main.rs && cat $_
```

``` rust
use std::mem;

fn main() {
    let xs: [u8; 4] = unsafe { mem::uninitialized() };
    let y = xs[0] + xs[1];
}
```

```
$ RUSTFLAGS="-Z sanitizer=memory" cargo run; echo $?
==30198==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x563f4b6867da in uninit::main::hc2731cd4f2ed48f8 $PWD/src/main.rs:5
    #1 0x563f4b7033b6 in __rust_maybe_catch_panic ($PWD/target/debug/uninit+0x873b6)
    #2 0x563f4b6fbd69 in std::rt::lang_start::hb2951fc8a59d62a7 ($PWD/target/debug/uninit+0x7fd69)
    #3 0x563f4b6868a9 in main ($PWD/target/debug/uninit+0xa8a9)
    #4 0x7fe844354290 in __libc_start_main (/usr/lib/libc.so.6+0x20290)
    #5 0x563f4b6864f9 in _start ($PWD/target/debug/uninit+0xa4f9)

SUMMARY: MemorySanitizer: use-of-uninitialized-value $PWD/src/main.rs:5 in uninit::main::hc2731cd4f2ed48f8
Exiting
77
```
nikomatsakis pushed a commit that referenced this pull request Sep 12, 2019
rebase code from rust-lang/rust master branch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants