UnicodeWidthChar should inline an ascii fast path #1

pnkfelix · 2015-04-19T01:04:33Z

(imported from improperly closed bug rust-lang/rust#20658 )

So, the good news is that this cargo package is beating the deprecated stdlib code on my machine.

The bad news is that it is still slower than what the original issue author expects to see.

(Though I also don't think the original issue author's code is conformant with what the docs here say the fn width method should do on char.)

Anyway, here is a port of the benchmark provided by the original author, adapted to also test this cargo package. It is followed with the benchmark results on my machine.

#![feature(test, unicode)]

extern crate test;
extern crate unicode_width;

use test::{Bencher};

use unicode_width::UnicodeWidthChar;

const STRING: &'static str = 
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
     aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
     aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
     aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
     aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
     aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
     aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
     aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
     aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
     aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
     aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
     aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";

#[bench]
fn stdlib(b: &mut Bencher) {
    b.iter(|| {
        for c in STRING.chars() {
            test::black_box(c.width(false));
        }
    });
}

#[bench]
fn cargo(b: &mut Bencher) {
    b.iter(|| {
        for c in STRING.chars() {
            test::black_box(UnicodeWidthChar::width(c));
        }
    });
}

#[bench]
fn simple(b: &mut Bencher) {
    b.iter(|| {
        for c in STRING.chars() {
            if (c as u32) < 127 {
                if (c as u32) > 31 {
                    test::black_box(Some(1));
                } else {
                    test::black_box(None::<usize>);
                }
            } else {
                test::black_box(c.width(false));
            }
        }
    });
}

03-03-58 char_fast_path/char_fast_path (git:fsk-sized-bounded-iter) % rustc --version
rustc 1.0.0-nightly (1284be404 2015-04-18) (built 2015-04-17)
03-04-12 char_fast_path/char_fast_path (git:fsk-sized-bounded-iter) % cargo bench
     Running target/release/char_fast_path-935b3c19b07d9282

running 3 tests
test cargo  ... bench:      3197 ns/iter (+/- 401)
test simple ... bench:      1465 ns/iter (+/- 500)
test stdlib ... bench:      3427 ns/iter (+/- 571)

test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured

03-04-21 char_fast_path/char_fast_path (git:fsk-sized-bounded-iter) %

The text was updated successfully, but these errors were encountered:

kwantam · 2015-04-19T01:12:45Z

Thanks! I'll have a look.

This diff adds benchmarks to get more info regarding Issue #1. It appears that the remaining difference between the "simple" case and the "cargo" case is the result of a difference in performance between using `match` and `if` for tight loops. I suspect it's because of the way that match arms get reordered: if I manually reorder the "if" statement, I can reproduce the match performance. Also added a couple #[inline] annotations in tables.rs, though the difference in performance in my measurements is negligible. Bumped version number to 0.1.1.

kwantam · 2015-04-19T04:05:41Z

Strange. On the machines on which I've tested, it appears that the remaining difference in performance between the "simple" case and the "cargo" case is due to the use of if in the former and match in the latter. I created another "simple" case that uses match instead, and its performance is identical to the "cargo" case.

I suppose it might be worthwhile to modify the code to use an if statement, but then again one assumes that performance of generated code will increase in the future, and so it might not be worth it.

pnkfelix · 2015-04-19T06:43:11Z

Hmm, that might be an interesting bug to file against the Rust match implementation, at least.

kwantam · 2015-04-19T07:04:25Z

Thinking about it a bit more, it's not necessarily surprising that the if implementation is slightly faster, since the conditionals are ordered to check the desired case (printable ASCII) first. I haven't checked the LLVM IR, but I'm guessing that the match ends up ordering the checks differently, and this probably explains the difference.

I'll have a closer look at the generated IR and try to confirm.

mjguynn mentioned this issue Jun 10, 2022

Improve performance, especially on non-ASCII codepoints #28

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UnicodeWidthChar should inline an ascii fast path #1

UnicodeWidthChar should inline an ascii fast path #1

pnkfelix commented Apr 19, 2015

kwantam commented Apr 19, 2015

kwantam commented Apr 19, 2015

pnkfelix commented Apr 19, 2015

kwantam commented Apr 19, 2015

UnicodeWidthChar should inline an ascii fast path #1

UnicodeWidthChar should inline an ascii fast path #1

Comments

pnkfelix commented Apr 19, 2015

kwantam commented Apr 19, 2015

kwantam commented Apr 19, 2015

pnkfelix commented Apr 19, 2015

kwantam commented Apr 19, 2015