
Performance comparison with other existing libraries in README.md #6

Closed
StefanoD opened this issue Jun 18, 2016 · 27 comments

@StefanoD

StefanoD commented Jun 18, 2016

This would be interesting for many developers and a strong argument for using this library, and Rust in general, instead of another language.

The comparison should indicate the version number of each library. A comparison with serde would be especially interesting:
https://github.com/serde-rs/serde

@dtolnay

dtolnay commented Jun 18, 2016

This would be interesting, although my impression is that this library optimizes for an easy API and a low learning curve over performance.

We have some benchmarks of HashMap and Vec deserialization with varying types and sizes, using both Serde and RustcDecodable, in bench_map.rs and bench_vec.rs. I would be curious to see the same benchmarks against json-rust. EDIT: never mind, these are benchmarks of core Serde, not serde_json.

@dtolnay

dtolnay commented Jun 18, 2016

@StefanoD I tried a benchmark of deserializing this Log object. My code is at dtolnay/serde-json@2acc39d.

| Library | Output | Time | Throughput |
|---|---|---|---|
| serde | Log | 5,135 ns/iter (+/- 73) | 117 MB/s |
| serde | serde_json::Value | 8,455 ns/iter (+/- 103) | 71 MB/s |
| rustc-serialize | json::Json | 12,287 ns/iter (+/- 53) | 49 MB/s |
| json-rust | json::JsonValue | 14,548 ns/iter (+/- 1,094) | 41 MB/s |
| rustc-serialize | Log | 17,170 ns/iter (+/- 219) | 35 MB/s |

Notice that Serde takes advantage of knowing the shape of the output: it is significantly faster to deserialize to Log than to the enum, so json-rust has a handicap there. (That said, the opposite is true for rustc-serialize, which deserializes Log by going through the Json enum first.)
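For readers following along, here is a minimal sketch of the two deserialization targets being compared. The Log fields are made up for illustration (the benchmark's actual schema is in the linked commit), and this uses today's serde derive syntax rather than the 2016-era codegen:

```rust
use serde::Deserialize;
use serde_json::Value;

// Hypothetical fields, standing in for the real Log schema.
#[derive(Deserialize)]
struct Log {
    message: String,
    level: u64,
}

fn main() -> Result<(), serde_json::Error> {
    let input = r#"{"message":"hello","level":3}"#;

    // Typed target: serde knows the shape up front and fills the
    // struct fields directly, with no intermediate tree.
    let log: Log = serde_json::from_str(input)?;

    // Untyped target: every value is boxed into the generic Value enum,
    // the same handicap json-rust's JsonValue runs under.
    let value: Value = serde_json::from_str(input)?;

    println!("{} {} {}", log.message, log.level, value["message"]);
    Ok(())
}
```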

@StefanoD
Author

@dtolnay Thank you very much. This is very impressive!
By the way: did I see correctly that a json-rust to Log row is missing?
Could you publish similar benchmarks on the serde GitHub page and compare, e.g., serde with popular JSON libs?

@dtolnay

dtolnay commented Jun 18, 2016

The json-rust to Log row is missing because json-rust does not support that.

I filed serde-rs/json#82 to add benchmarks to the serde_json readme.

@StefanoD
Author

Thank you very much!

@maciejhirsz
Owner

@dtolnay This is pretty awesome. It's good to see I have room for improvement, and it's good to know that I'm not terribly far behind to begin with and things are usable as they are. I reckon my string parsing can get much better, since I just hacked in proper escaped-character handling yesterday.

maciejhirsz added a commit that referenced this issue Jun 22, 2016
0.7.1 performance boost, related to #6
@maciejhirsz
Owner

maciejhirsz commented Jun 22, 2016

@dtolnay I waited a bit on this issue until I got time to look into performance. I took a cue from Serde and reworked things to use bytes instead of characters; test results on the same JSON as above, on my machine, as of 0.7.1:

| Library | Parsing | Serializing |
|---|---|---|
| Serde | 10,316 ns/iter (+/- 797) = 62 MB/s | 3,341 ns/iter (+/- 298) = 192 MB/s |
| json-rust | 10,587 ns/iter (+/- 642) = 60 MB/s | 2,685 ns/iter (+/- 314) = 239 MB/s |
| rustc-serialize | 15,995 ns/iter (+/- 1,179) = 40 MB/s | 5,022 ns/iter (+/- 278) = 128 MB/s |

Codez
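A minimal sketch of the byte-based idea, assuming nothing about json-rust's internals: JSON's structural tokens are all ASCII, so a parser can match raw bytes and skip UTF-8 character decoding on the hot path entirely.

```rust
// Illustrative only, not json-rust's actual code: scanning raw bytes
// instead of decoded chars avoids per-step UTF-8 decoding, since all
// of JSON's structural characters are single ASCII bytes.
fn skip_whitespace(bytes: &[u8], mut pos: usize) -> usize {
    while pos < bytes.len() {
        match bytes[pos] {
            b' ' | b'\t' | b'\n' | b'\r' => pos += 1,
            _ => break,
        }
    }
    pos
}

fn main() {
    let source = " \n\t{\"key\": true}";
    let pos = skip_whitespace(source.as_bytes(), 0);
    assert_eq!(source.as_bytes()[pos], b'{');
}
```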

I haven't put in benches with the Log struct; there is a lot of boilerplate there, so I might just copy the file over tomorrow. I reckon it makes sense to inform people about the pros and cons of serializing to generic values vs. structs, so before I push anything to the README I'd rather have a complete breakdown.

Sidenote: this is probably the most nerdy fun I've had in a long while :).

@StefanoD
Author

Oh, please don't close this! :)
A language comparison would also be interesting. At least a comparison with the popular Java lib Jackson would be nice.
And don't forget to add the version number of each lib, since performance improvements can always happen, as you've experienced yourself. ;)

@maciejhirsz
Owner

Funny that you mention Jackson, since Jackson was one of the main inspirations for this (in the sense of: "oh dear, I really don't want to deal with that again").

I'll add versions. Currently the benchmarks run via cargo bench on nightly; benchmarking Java might get a bit more complicated, but I'll look into that.

@maciejhirsz maciejhirsz reopened this Jun 25, 2016
@dtolnay

dtolnay commented Jun 25, 2016

Thanks for the great work on this, Maciej.

I was pretty surprised by the difference in serialization time between Serde and json-rust (nicely done!), so I did some digging. It looks like the difference is 100% accounted for by two things:

  • We pre-allocate a vector of 128 bytes while you pre-allocate 1024 bytes. The benchmark object was large, so the larger allocation paid off.
  • Serialization of u64. Your write_digits_from_u64 is really fast. We basically use write!(&mut writer, "{}", value), which is astonishingly slow. I filed serde-rs/json#84 ("serialize_u64 is slow") to look into a better approach and/or contribute a fix to Rust for whatever is making write! slow. (A sketch of the fast approach follows this list.)
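For context, the general shape of the fast approach, as a sketch of the technique rather than json-rust's actual write_digits_from_u64: format the digits backwards into a small stack buffer, then copy the used tail into the output in one call, bypassing the fmt machinery entirely.

```rust
// A sketch of the stack-buffer technique (not json-rust's actual code):
// write the decimal digits backwards into a fixed array, then append
// the used tail with a single extend_from_slice.
fn write_u64(buf: &mut Vec<u8>, mut value: u64) {
    let mut digits = [0u8; 20]; // u64::MAX has 20 decimal digits
    let mut pos = digits.len();
    loop {
        pos -= 1;
        digits[pos] = b'0' + (value % 10) as u8;
        value /= 10;
        if value == 0 {
            break;
        }
    }
    buf.extend_from_slice(&digits[pos..]);
}

fn main() {
    let mut buf = Vec::new();
    write_u64(&mut buf, 18_446_744_073_709_551_615);
    assert_eq!(buf, b"18446744073709551615");
}
```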

@maciejhirsz
Owner

I tried to reproduce the benchmark in Node.js (v6.1.0): https://gist.github.com/maciejhirsz/89b9813cc3fc875bd0723f6cf85dbbb9

Benching parse
- 5681ns per iteration
- throughput 101.56 MB/s

Benching serialize
- 4900ns per iteration
- throughput 117.76 MB/s

Parsing performance is really interesting given that V8 has to parse things into dynamic data; not sure if my methodology is correct.

@StefanoD
Author

Wow, thanks! That's the reason why I wanted to compare with other libs and languages: now you can discuss bottlenecks with other developers!

@maciejhirsz
Owner

maciejhirsz commented Jun 25, 2016

I suspect V8 might be doing some caching magic, so I edited the gist and added this before each iteration:

data = JSON.parse(json);
json = JSON.stringify(data);

Basically forcing a reset of the variables. Values after:

Benching parse
- 6603ns per iteration
- throughput 87.38 MB/s

Benching serialize
- 5181ns per iteration
- throughput 111.35 MB/s

This roughly matches Serde parsing to a struct, but loses in serialization.

Also note: that's not JavaScript doing the work, it's V8's built-in native code (C++).

Another edit: this is a single iteration, before Node gets "hot":

Benching parse
- 237541ns per iteration
- throughput 2.43 MB/s

Benching serialize
- 151885ns per iteration
- throughput 3.8 MB/s

maciejhirsz added a commit that referenced this issue Jun 26, 2016
@maciejhirsz
Owner

Fun times with 0.8.0:

test json_rust_parse                  ... bench:       6,670 ns/iter (+/- 507) = 90 MB/s
test json_rust_stringify              ... bench:       2,440 ns/iter (+/- 131) = 247 MB/s
test rustc_serialize_parse            ... bench:      14,554 ns/iter (+/- 815) = 41 MB/s
test rustc_serialize_stringify        ... bench:       4,523 ns/iter (+/- 357) = 133 MB/s
test rustc_serialize_struct_parse     ... bench:      20,580 ns/iter (+/- 2,144) = 29 MB/s
test rustc_serialize_struct_stringify ... bench:       4,233 ns/iter (+/- 596) = 142 MB/s
test serde_json_parse                 ... bench:       9,032 ns/iter (+/- 970) = 66 MB/s
test serde_json_stringify             ... bench:       3,415 ns/iter (+/- 254) = 177 MB/s
test serde_json_struct_parse          ... bench:       6,645 ns/iter (+/- 518) = 91 MB/s
test serde_json_struct_stringify      ... bench:       3,132 ns/iter (+/- 187) = 193 MB/s

🎉

@dtolnay

dtolnay commented Jun 30, 2016

Make sure to use serde_json 0.7.4 for the next run.

@maciejhirsz
Owner

maciejhirsz commented Jun 30, 2016

Will do; I've already seen the numbers, nice job!

test json_rust_parse                  ... bench:       5,807 ns/iter (+/- 207) = 104 MB/s
test json_rust_stringify              ... bench:       2,047 ns/iter (+/- 186) = 295 MB/s
test rustc_serialize_parse            ... bench:      14,515 ns/iter (+/- 574) = 41 MB/s
test rustc_serialize_stringify        ... bench:       4,549 ns/iter (+/- 706) = 132 MB/s
test rustc_serialize_struct_parse     ... bench:      20,472 ns/iter (+/- 1,316) = 29 MB/s
test rustc_serialize_struct_stringify ... bench:       4,383 ns/iter (+/- 132) = 138 MB/s
test serde_json_parse                 ... bench:       9,050 ns/iter (+/- 371) = 66 MB/s
test serde_json_stringify             ... bench:       2,264 ns/iter (+/- 194) = 267 MB/s
test serde_json_struct_parse          ... bench:       6,836 ns/iter (+/- 734) = 88 MB/s
test serde_json_struct_stringify      ... bench:       2,079 ns/iter (+/- 273) = 291 MB/s

FYI: I sped up strings a lot. As long as I don't encounter escaped characters, I just iterate through the bytes without doing anything and then use the original slice in its entirety, which is much faster than pushing individual bytes. The buffer doesn't have to increment its length or do capacity checks on each iteration; it can just memcpy the whole thing at once.

edit: Just looked at the PR on serde_json and saw you're doing the same, but with a LUT, nice :)

edit: with lookups:

test json_rust_parse                  ... bench:       5,863 ns/iter (+/- 416) = 103 MB/s
test json_rust_stringify              ... bench:       1,795 ns/iter (+/- 126) = 337 MB/s

If I have time I might send a commit to serde_json. The strategies we use for strings do vary in the end; I've tried simplifying mine, but I take a dive in the numbers. Basically I assume a happy path (no characters needing to be escaped) for my strings, and only when I encounter one do I jump to a method that can write escapes.

This means that for the majority of strings I don't have to keep track of where I am, and can just do this at the end:

self.write(string.as_bytes());

Instead of:

self.write(string[start..].as_bytes());

The former is faster for whatever reason. ...and I found a bug my tests didn't cover while writing this...
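Putting the pieces together, a minimal sketch of this happy-path strategy, illustrative rather than json-rust's actual generator (the escape table here is simplified and skips the remaining control characters a real generator would handle):

```rust
// Sketch of the happy-path idea described above: scan for bytes that
// need escaping; if none are found, copy the whole string with one
// bulk extend_from_slice instead of pushing byte by byte.
fn write_json_string(buf: &mut Vec<u8>, string: &str) {
    buf.push(b'"');
    if string.bytes().all(|b| b != b'"' && b != b'\\' && b >= 0x20) {
        // Happy path: one bulk copy, no per-byte bookkeeping.
        buf.extend_from_slice(string.as_bytes());
    } else {
        // Rare path: walk the bytes, flushing unescaped runs and
        // emitting escape sequences as they come up (simplified).
        let bytes = string.as_bytes();
        let mut start = 0;
        for (index, &byte) in bytes.iter().enumerate() {
            let escape: &[u8] = match byte {
                b'"' => b"\\\"",
                b'\\' => b"\\\\",
                b'\n' => b"\\n",
                b'\r' => b"\\r",
                b'\t' => b"\\t",
                _ => continue,
            };
            buf.extend_from_slice(&bytes[start..index]);
            buf.extend_from_slice(escape);
            start = index + 1;
        }
        buf.extend_from_slice(&bytes[start..]);
    }
    buf.push(b'"');
}

fn main() {
    let mut buf = Vec::new();
    write_json_string(&mut buf, "easy");
    write_json_string(&mut buf, "needs \"escapes\"\n");
    println!("{}", String::from_utf8(buf).unwrap());
}
```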

@dtolnay

dtolnay commented Jul 1, 2016

Interestingly, for stringify I get reversed results on some much larger files. Take a look here.

@maciejhirsz
Owner

Aye, that looks reasonable considering canada.json has a ton of floats in it; I haven't gotten around to figuring out how to optimize those properly. Also, I should really pull all the fail/pass JSONs into my unit tests.

@dtolnay

dtolnay commented Jul 1, 2016

Ignore canada.json, I think it is a silly benchmark. But I am surprised by the other two, because for the Log benchmark my results pretty much line up with yours (faster CPU but same ratios).

@maciejhirsz
Owner

maciejhirsz commented Jul 1, 2016

Ha, interesting. It might have to do with the fact that I'm writing directly to a Vec<u8> instead of using std::io::Write; maybe the latter handles reallocation better. I'll investigate when I get time.

Edit: reading the actual stdlib source, both push and extend_from_slice double the capacity when it's reached, so capacity growth is exponential, which sounds like it should be more than good enough for super large files. Will have to dig deeper.
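A quick way to observe that growth behaviour (the exact factor is a standard-library implementation detail, but growth is geometric, so pushes stay amortized O(1) even for very large output):

```rust
// Print the buffer's capacity each time it changes while pushing bytes;
// the jumps show the geometric growth described above.
fn main() {
    let mut buf: Vec<u8> = Vec::with_capacity(16);
    let mut last = buf.capacity();
    for i in 0..100_000u32 {
        buf.push(i as u8);
        if buf.capacity() != last {
            println!("len {:>6} -> capacity {:>6}", buf.len(), buf.capacity());
            last = buf.capacity();
        }
    }
}
```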

@maciejhirsz
Owner

Looked over my stuff and made some very quick micro-optimizations, particularly replacing instances where I was writing a single byte as a slice (&[u8]) instead of just a u8.

I didn't expect much from that, but the total number of nanos is so small by now that any deviation makes a huge difference:

test json_rust_parse                  ... bench:       5,800 ns/iter (+/- 984) = 104 MB/s
test json_rust_stringify              ... bench:       1,600 ns/iter (+/- 172) = 378 MB/s
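A sketch of the micro-optimization in question (function names are hypothetical): pushing one byte directly skips the slice-copy machinery that extend_from_slice goes through for a one-byte slice.

```rust
// Before: a one-byte slice still pays for length checks and a memcpy call.
fn write_colon_as_slice(buf: &mut Vec<u8>) {
    buf.extend_from_slice(b":");
}

// After: a single push is just a capacity check and a store.
fn write_colon_as_byte(buf: &mut Vec<u8>) {
    buf.push(b':');
}

fn main() {
    let mut buf = Vec::new();
    write_colon_as_slice(&mut buf);
    write_colon_as_byte(&mut buf);
    assert_eq!(buf, b"::");
}
```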

@maciejhirsz
Owner

@dtolnay Ok, I know what's happening: since json-rust can't write to io::Write atm, the benchmark has to first dump to an allocated String and then write that into a writer. I'll make a specialized generator for that (#51).

@dtolnay

dtolnay commented Jul 1, 2016

I wonder if, using dtoa, you could beat RapidJSON's stringify time. I have not been able to get nativejson-benchmark running locally, so I do not have comparable numbers, but you may have better luck. It must be pretty close.
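For reference, stringifying a float with the dtoa crate looks like this (shown with the crate's current Buffer API; the 2016-era crate exposed a dtoa::write function over io::Write instead):

```rust
// Requires the dtoa crate as a dependency. Buffer::format returns the
// shortest-style decimal representation as a &str borrowed from the buffer.
fn main() {
    let mut buffer = dtoa::Buffer::new();
    let printed: &str = buffer.format(2.718281828459045_f64);
    println!("{printed}");
}
```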

@maciejhirsz
Owner

maciejhirsz commented Jul 2, 2016

Numbers with dtoa:

| Library | File | parse (value) | stringify (value) | parse (struct) | stringify (struct) |
|---|---|---|---|---|---|
| serde_json | data/canada.json | 51.4ms | 24.1ms | 36.9ms | 20.4ms |
| serde_json | data/citm_catalog.json | 30.0ms | 3.3ms | 20.7ms | 1.7ms |
| serde_json | data/twitter.json | 10.4ms | 1.3ms | 8.5ms | 1.4ms |
| json-rust | data/canada.json | 42.0ms | 24.7ms | — | — |
| json-rust | data/citm_catalog.json | 15.9ms | 2.6ms | — | — |
| json-rust | data/twitter.json | 6.8ms | 1.7ms | — | — |
| rustc_serialize | data/canada.json | 39.8ms | 71.3ms | 46.5ms | 67.5ms |
| rustc_serialize | data/citm_catalog.json | 26.9ms | 6.0ms | 32.6ms | 4.1ms |
| rustc_serialize | data/twitter.json | 13.4ms | 2.9ms | 17.8ms | 2.6ms |

I'll have to look into why I'm slower on twitter.json, among others.

Your (re-)implementation of dtoa uses Grisu2; there is also a Grisu3 algorithm, which is faster but rejects ~0.5% of numbers, for which a fallback can be used. That sounds like a strategy worth pursuing. I found an implementation of it, though I have no idea how well it fares.

Fun fact: the author of RapidJSON is also an author of the C++ implementation of Grisu2 and the maintainer of the nativejson benchmark, which kinda explains why 90% of the performance measured is how fast you can stringify floats, eh?

@maciejhirsz
Owner

I should have spent more time researching this. Grisu3 isn't faster; it just detects when the result isn't optimal so a fallback can take over, whereas Grisu2 just goes through with it, producing suboptimal results for some strings, which is totally fine.

@maciejhirsz maciejhirsz added this to the 0.9.0 milestone Jul 15, 2016
@maciejhirsz
Owner

maciejhirsz commented Jul 15, 2016

Should update the README with charts for the log benchmark, as well as the 3 benchmarks tested in json-benchmark, with descriptions of what they are actually testing:

  • log.json: a pretty standard JSON object with short keys and short string or integer values. This is a relatively "easy" benchmark.
  • canada.json: a vector graphic containing vectors of vectors of floats. This benchmark mostly tests decimal number parsing and stringifying.
  • citm_catalog.json: a catalog of objects contained in one huge object that uses keys as ids. This benchmark tests fast key ordering in a map.
  • twitter.json: again, pretty standard JSON objects in an array. This is mostly a benchmark of sheer volume of data combined with long strings of Unicode characters (Japanese).

@maciejhirsz
Owner

This issue has been open forever now. The new README links to json-benchmark; I'd like to see improvements to performance presentation happen there. Related issue.
