Merge pull request #24 from rusticstuff/v.next
Prepare v0.1.1
hkratz authored Apr 26, 2021
2 parents 84b79cf + acea3c2 commit 3063e4c
Showing 32 changed files with 26,044 additions and 16,448 deletions.
4 changes: 3 additions & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "simdutf8"
version = "0.1.0"
version = "0.1.1"
authors = ["Hans Kratz <hans@appfour.com>"]
edition = "2018"
description = "SIMD-accelerated UTF-8 validation."
@@ -16,8 +16,10 @@ exclude = ["/.github", "/.vscode", "/bench", "/afl", "/fuzz", "/img", "expected-
[features]
default = ["std"]

# enable CPU feature detection, on by default, turn off for no-std support
std = []

# expose SIMD implementations in basic::imp::* and compat::imp::*
public_imp = []

# use branch hints - requires nightly
60 changes: 34 additions & 26 deletions README.md
@@ -8,15 +8,14 @@ Blazingly fast API-compatible UTF-8 validation for Rust using SIMD extensions, b
[simdjson](https://github.com/simdjson/simdjson). Originally ported to Rust by the developers of [simd-json.rs](https://simd-json.rs).

## Disclaimer
This software should be considered alpha quality and should not (yet) be used in production, though it has been tested
with sample data as well as a fuzzer and there are no known bugs. It will be tested more rigorously before the first
production release.
This software should not (yet) be used in production, though it has been tested with sample data as well as
fuzzing and there are no known bugs.

## Features
* `basic` API for the fastest validation, optimized for valid UTF-8
* `compat` API as a fully compatible replacement for `std::str::from_utf8()`
* Up to twenty times faster than the std library on non-ASCII, up to twice as fast on ASCII
* Up to 28% faster on non-ASCII input compared to the original simdjson implementation
* Up to 22 times faster than the std library on non-ASCII, up to three times faster on ASCII
* As fast as or faster than the original simdjson implementation
* Supports AVX 2 and SSE 4.2 implementations on x86 and x86-64. ARMv7 and ARMv8 neon support is planned
* Selects the fastest implementation at runtime based on CPU support
* Written in pure Rust
@@ -28,7 +27,7 @@ production release.
Add the dependency to your Cargo.toml file:
```toml
[dependencies]
simdutf8 = { version = "0.1.0" }
simdutf8 = { version = "0.1.1" }
```

Use `simdutf8::basic::from_utf8` as a drop-in replacement for `std::str::from_utf8()`.
@@ -59,7 +58,8 @@ is not valid UTF-8. `simdutf8::basic::Utf8Error` is a zero-sized error struct.

### Compat flavor
The `compat` flavor is fully API-compatible with `std::str::from_utf8`. In particular, `simdutf8::compat::from_utf8()`
returns a `simdutf8::compat::Utf8Error`, which has `valid_up_to()` and `error_len()` methods. The first is useful for verification of streamed data. The second is useful e.g. for replacing invalid byte sequences with a replacement character.
returns a `simdutf8::compat::Utf8Error`, which has `valid_up_to()` and `error_len()` methods. The first is useful for
verification of streamed data. The second is useful e.g. for replacing invalid byte sequences with a replacement character.

It also fails early: errors are checked on-the-fly as the string is processed and once
an invalid UTF-8 sequence is encountered, it returns without processing the rest of the data.
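
As a rough illustration of how `valid_up_to()` and `error_len()` can be combined to replace invalid byte sequences, here is a minimal sketch. It uses `std::str::from_utf8` so that it compiles without any dependency. Since the `compat` flavor is described as API-compatible, the assumption is that swapping the import for `simdutf8::compat::from_utf8` would work unchanged. The function name `decode_lossy` is made up for this example.

```rust
// Assumption: replace this import with `simdutf8::compat::from_utf8`
// to use the SIMD implementation.
use std::str::from_utf8;

/// Decodes `input`, replacing each invalid byte sequence with U+FFFD.
/// (`decode_lossy` is a hypothetical helper, not part of any crate.)
pub fn decode_lossy(mut input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    loop {
        match from_utf8(input) {
            Ok(valid) => {
                out.push_str(valid);
                return out;
            }
            Err(err) => {
                // Everything before `valid_up_to()` is known to be valid UTF-8.
                let (valid, rest) = input.split_at(err.valid_up_to());
                out.push_str(from_utf8(valid).expect("prefix was already validated"));
                out.push('\u{FFFD}');
                match err.error_len() {
                    // Skip the invalid sequence and continue after it.
                    Some(len) => input = &rest[len..],
                    // `None`: the input ended in the middle of a character.
                    None => return out,
                }
            }
        }
    }
}
```

For streamed data the `None` case is the interesting one: instead of emitting a replacement character, the bytes after `valid_up_to()` would be kept and prepended to the next chunk.
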
@@ -75,47 +75,56 @@ For no-std support (compiled with `--no-default-features`) the implementation is
the targeted CPU. Use `RUSTFLAGS="-C target-feature=+avx2"` for the AVX 2 implementation or `RUSTFLAGS="-C target-feature=+sse4.2"`
for the SSE 4.2 implementation.

If you want to be able to call A SIMD implementation directly, use the `public_imp` feature flag. The validation
If you want to be able to call a SIMD implementation directly, use the `public_imp` feature flag. The validation
implementations are then accessible via `simdutf8::(basic|compat)::imp::x86::(avx2|sse42)::validate_utf8()`.

## When not to use
If you are only processing short byte sequences (less than 64 bytes), the excellent scalar algorithm in the standard
library is likely faster. Also, this library uses unsafe code which has not been battle-tested and should not (yet)
be used in production.
This library uses unsafe code which has not been battle-tested and should not (yet) be used in production.

## Minimum Supported Rust Version (MSRV)
This crate's minimum supported Rust version is 1.38.0.

## Benchmarks

The benchmarks have been done with [criterion](https://bheisler.github.io/criterion.rs/book/index.html), the tables
are created with [critcmp](https://github.com/BurntSushi/critcmp). Source code and data are in the
[bench directory](https://github.com/rusticstuff/simdutf8/tree/main/bench).

The name schema is id-charset/size. _0-empty_ is the empty byte slice, _x-error/66536_ is a 64KiB slice where the very
first character is invalid UTF-8. All benchmarks were run on a laptop with an Intel Core i7-10750H CPU (Comet Lake) on
Windows with Rust 1.51.0. Library versions are simdutf8 v0.1.0 and simdjson v0.9.2.
Windows with Rust 1.51.0 if not otherwise stated. Library versions are simdutf8 v0.1.1 and simdjson v0.9.2. When comparing
with simdjson simdutf8 is compiled with `#inline(never)`.

### simdutf8 basic vs std library UTF-8 validation
![critcmp stimdutf8 basic vs std lib](https://raw.githubusercontent.com/rusticstuff/simdutf8/main/img/basic-vs-std.png)
simdutf8 performs better except for inputs ≤ 64 bytes.
![critcmp stimdutf8 v0.1.1 basic vs std lib](https://user-images.githubusercontent.com/3736990/116121179-a8271f80-a6c0-11eb-9b2b-6233c3c824f2.png)
simdutf8 performs better or as well as the std library.

### simdutf8 basic vs simdjson UTF-8 validation on Intel Comet Lake
![critcmp stimdutf8 v0.1.1 basic vs simdjson WSL](https://user-images.githubusercontent.com/3736990/116121748-38656480-a6c1-11eb-8cb4-385c7516a46a.png)
simdutf8 beats simdjson on almost all inputs on this CPU. This benchmark is run on
[WSL](https://docs.microsoft.com/en-us/windows/wsl/install-win10)
since I could not get simdjson to reach maximum performance on Windows with any C++ toolchain (see also simdjson issues
[847](https://github.com/simdjson/simdjson/issues/847) and [848](https://github.com/simdjson/simdjson/issues/848)).

### simdutf8 basic vs simdjson UTF-8 validation on AMD Zen 2
![critcmp stimdutf8 v0.1.1 basic vs simdjson AMD Zen 2](https://user-images.githubusercontent.com/3736990/116122729-731bcc80-a6c2-11eb-82a5-6e297778a1c4.png)

### simdutf8 basic vs simdjson UTF-8 validation
![critcmp st lib vs stimdutf8 basic](https://raw.githubusercontent.com/rusticstuff/simdutf8/main/img/basic-vs-simdjson.png)
simdutf8 is faster than simdjson except for some crazy optimization by clang for the pure ASCII
loop (to be investigated). simdjson is compiled using clang and gcc from MSYS.
On AMD Zen 2 aligning reads apparently does not matter at all. The extra step for aligning even hurts performance a bit around
an input size of 4096.

### simdutf8 basic vs simdutf8 compat UTF-8 validation
![critcmp st lib vs stimdutf8 basic](https://raw.githubusercontent.com/rusticstuff/simdutf8/main/img/basic-vs-compat.png)
![image](https://user-images.githubusercontent.com/3736990/116122427-0dc7db80-a6c2-11eb-8434-f9879742d90d.png)
There is a small performance penalty to continuously checking the error status while processing data, but detecting
errors early provides a huge benefit for the _x-error/66536_ benchmark.

## Technical details
The implementation is similar to the one in simdjson except that it aligns reads to the block size of the
SIMD extension, which leads to better peak performance compared to the implementation in simdjson. This alignment
means that an incomplete block needs to be processed before the aligned data is read, which would lead to worse
performance on short byte sequences. Thus, aligned reads are only used with 2048 bytes of data or more. Incomplete
reads for the first unaligned and the last incomplete block are done in two aligned 64-byte buffers.
On X86 for inputs shorter than 64 bytes validation is delegated to `core::str::from_utf8()`.

The SIMD implementation is similar to the one in simdjson except that it aligns reads to the block size of the
SIMD extension, which leads to better peak performance compared to the implementation in simdjson on some CPUs.
This alignment means that an incomplete block needs to be processed before the aligned data is read, which
leads to worse performance on byte sequences shorter than 2048 bytes. Thus, aligned reads are only used with
2048 bytes of data or more. Incomplete reads for the first unaligned and the last incomplete block are done in
two aligned 64-byte buffers.
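
The size thresholds described above can be summarized in a small sketch. The names `Strategy`, `pick_strategy`, `SIMD_MIN_LEN` and `ALIGNED_MIN_LEN` are hypothetical and do not come from the crate. Only the two thresholds (64 and 2048 bytes) are taken from the text, and the real dispatch logic may well be structured differently.

```rust
/// Hypothetical labels for the three code paths described in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    /// Delegate to `core::str::from_utf8()`.
    ScalarStd,
    /// SIMD validation with unaligned reads.
    SimdUnaligned,
    /// SIMD validation with reads aligned to the SIMD block size.
    SimdAligned,
}

/// Inputs shorter than this are delegated to the std library (per the text, on x86).
const SIMD_MIN_LEN: usize = 64;
/// Aligned reads are only used at or above this input length.
const ALIGNED_MIN_LEN: usize = 2048;

/// Picks a code path purely from the input length (illustrative only).
pub fn pick_strategy(len: usize) -> Strategy {
    if len < SIMD_MIN_LEN {
        Strategy::ScalarStd
    } else if len < ALIGNED_MIN_LEN {
        Strategy::SimdUnaligned
    } else {
        Strategy::SimdAligned
    }
}
```

The point of the middle tier is that aligning requires handling an incomplete first block, a fixed cost that only pays off once the input is long enough.
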

For the compat API we need to check the error buffer on each 64-byte block instead of just aggregating it. If an
error is found, the last bytes of the previous block are checked for a cross-block continuation and then
@@ -137,5 +146,4 @@ the MIT license and Apache 2.0 license.
simdjson itself is distributed under the Apache License 2.0.

## References

John Keiser, Daniel Lemire, [Validating UTF-8 In Less Than One Instruction Per Byte](https://arxiv.org/abs/2010.03090), Software: Practice and Experience 51 (5), 2021
1 change: 0 additions & 1 deletion TODO.md
@@ -9,4 +9,3 @@
* investigate aarch64 support

# NEXT
* v0.1.1 benchmarks
5 changes: 2 additions & 3 deletions bench/BENCHMARKING.md
@@ -46,6 +46,5 @@ Adding `-- --save-baseline some_name` to the bench commandline and then using [c
* Beware of BD PROCHOT on aged machines, can cause severe throttling

### Test machines
* Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz (Sandy bridge)
* Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (Skylake)
* Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz (Comet Lake)
* Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz (Comet Lake)
* AMD Ryzen 7 PRO 3700 8-Core Processor @ 3.60 GHz (Zen 2)
4 changes: 4 additions & 0 deletions bench/Cargo.toml
@@ -24,6 +24,10 @@ simdjson-utf8 = { version = "*", path = "simdjson-utf8", optional = true }
name = "throughput_basic"
harness = false

[[bench]]
name = "throughput_basic_noinline"
harness = false

[[bench]]
name = "throughput_compat"
harness = false