Performance comparison with other existing libraries in README.md #6
This would be interesting, although my impression is that this library optimizes for easy API and low learning curve over performance.
@StefanoD I tried a benchmark of deserializing this Log object. My code is at dtolnay/serde-json@2acc39d.
Notice that Serde takes advantage of knowing the shape of the output. It is significantly faster to deserialize to a typed struct than to a generic value.
@dtolnay Thank you very much. This is very impressive!
A json-rust-to-Log benchmark is missing because json-rust does not support that. I filed serde-rs/json#82 to add benchmarks to the serde_json readme.
Thank you very much!
@dtolnay This is pretty awesome. It's good to see I've room for improvement, and it's good to know that I'm not terribly far behind to begin with and things are usable as they are. I reckon my string parsing can be much better, since I just hacked in proper handling of escaped characters yesterday.
0.7.1 performance boost, related to #6
@dtolnay I waited a bit on this issue until I had time to look into performance. I took a cue from Serde and reworked things to use bytes instead of characters. Test results on the same JSON as above, on my machine, as of 0.7.1:
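A stdlib-only sketch of why operating on bytes helps: a `&[u8]` loop does plain byte compares, while `str::chars()` has to decode UTF-8 on every step. The function name is illustrative, not json-rust's actual internals:

```rust
// Skip JSON whitespace by comparing raw bytes; no UTF-8 decoding needed,
// since all JSON structural characters are ASCII.
fn skip_whitespace(bytes: &[u8], mut pos: usize) -> usize {
    while pos < bytes.len() {
        match bytes[pos] {
            b' ' | b'\t' | b'\n' | b'\r' => pos += 1,
            _ => break,
        }
    }
    pos
}

fn main() {
    let src = "  \t\n  {\"key\": 1}";
    let start = skip_whitespace(src.as_bytes(), 0);
    assert_eq!(&src[start..start + 1], "{");
}
```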
I haven't put in benches with the Log struct; there is a lot of boilerplate there, so I might just copy over the file tomorrow. I reckon it makes sense to inform people about the pros and cons of serializing to generic values vs. structs, so before I push anything to the README I'd rather have a complete breakdown. Sidenote: this is probably the most nerdy fun I've had in a long while :).
Oh, please don't close this! :)
Funny that you mention Jackson, since Jackson was one of the main inspirations for this (in the sense of: "oh dear, I really don't want to deal with that again"). I'll add versions. Currently the benchmarks are running via
Thanks for the great work on this, Maciej. I was pretty surprised by the difference in serialization time between Serde and json-rust (nicely done!) so I did some digging. It looks like the difference is 100% accounted for by two things:
I tried to reproduce the benchmark in Node.js (v6.1.0): https://gist.github.com/maciejhirsz/89b9813cc3fc875bd0723f6cf85dbbb9
Parsing performance is really interesting given that V8 has to parse into dynamic data; I'm not sure if my methodology is correct.
Wow, thx! That's the reason why I wanted to compare with other libs and languages! Now you can discuss bottlenecks with other developers!
I suspect V8 might be doing some caching magic. I edited the gist and added this before iteration, basically forcing a reset of the variables:

data = JSON.parse(json);
json = JSON.stringify(data);

Values after:
This basically matches Serde parsing to a struct, but loses in serialization. Also note: that's not JavaScript, it's the built-in native code of V8 doing the work (C++). Another edit: this is running a single iteration, before Node gets "hot":
Fun times with 0.8.0:
🎉
Make sure to use serde_json 0.7.4 for the next run. |
Will do; I've already seen the numbers, nice job!
FYI: I sped up strings a lot. As long as I don't encounter escaped characters, I just iterate through the bytes without doing anything and then use the original slice in its entirety, which is much faster than pushing individual bytes. The buffer doesn't have to increment its length or do capacity checks on each iteration; it can just memcopy the whole thing at once.
edit: Just looked at the PR on serde_json to see you are doing the same, but with a LUT, nice :)
edit: with lookups:
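A sketch of the technique described above, under the assumption of well-formed input and a reduced escape set; the real parser handles the full escape table and `\uXXXX` sequences:

```rust
// Parse the body of a JSON string (everything up to the closing quote).
// Clean runs of bytes are copied in one extend_from_slice (a memcpy)
// instead of being pushed byte by byte.
fn parse_string_body(bytes: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut start = 0; // beginning of the current clean run
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'"' => break, // end of string
            b'\\' => {
                // Flush the clean run accumulated so far in one copy.
                out.extend_from_slice(&bytes[start..i]);
                i += 1; // assumes well-formed input: a byte follows '\'
                match bytes[i] {
                    b'n' => out.push(b'\n'),
                    b't' => out.push(b'\t'),
                    c => out.push(c), // covers \" \\ \/ in this sketch
                }
                i += 1;
                start = i;
            }
            _ => i += 1, // clean byte: just advance, no write yet
        }
    }
    // For a string with no escapes this is the only copy performed.
    out.extend_from_slice(&bytes[start..i]);
    out
}

fn main() {
    let out = parse_string_body(br#"plain\tvalue""#);
    assert_eq!(out, b"plain\tvalue");
}
```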
If I have time I might commit to serde_json. The strategies for strings we use do vary in the end; I've tried simplifying mine, but I take a dive in the numbers. Basically I'm assuming a happy path (no characters needing to be escaped) on my strings, and only when I encounter one do I jump to a method that can write those. This means that for the majority of strings I don't have to keep track of where I am, and can just do this at the end:

self.write(string.as_bytes());

Instead of:

self.write(string[start..].as_bytes());

Which is faster for whatever reason. ...and I found a bug my tests didn't cover while writing this...
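The happy-path idea on the serialization side can be sketched like this (simplified escape handling; the function names are illustrative, not the library's actual API):

```rust
// Serialize a string value: if nothing needs escaping, write the whole
// slice in one go with no position bookkeeping; otherwise take the slow
// path that escapes byte by byte.
fn write_json_string(out: &mut Vec<u8>, s: &str) {
    out.push(b'"');
    if s.bytes().all(|b| b != b'"' && b != b'\\' && b >= 0x20) {
        out.extend_from_slice(s.as_bytes()); // single copy, happy path
    } else {
        write_escaped(out, s); // rare path
    }
    out.push(b'"');
}

// Simplified slow path; a real serializer handles all control characters.
fn write_escaped(out: &mut Vec<u8>, s: &str) {
    for b in s.bytes() {
        match b {
            b'"' => out.extend_from_slice(b"\\\""),
            b'\\' => out.extend_from_slice(b"\\\\"),
            b'\n' => out.extend_from_slice(b"\\n"),
            _ => out.push(b),
        }
    }
}

fn main() {
    let mut out = Vec::new();
    write_json_string(&mut out, "hello");
    assert_eq!(out, b"\"hello\"");
}
```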
Interestingly, for stringify I get reversed results for some much larger files. Take a look here.
Aye, that looks reasonable considering
Ignore canada.json, I think it is a silly benchmark. But I am surprised by the other two because for the Log benchmark my results pretty much line up with yours (faster CPU but same ratios). |
Ha, interesting. It might deal with the fact that I'm writing directly to
Edit: reading the actual stdlib source, both
Looked over my stuff and made some quick, very micro optimizations, particularly replacing instances where I was writing a single byte as a slice. I didn't expect much from that, but the total number of nanos is so small by now that any deviation makes a huge difference:
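For illustration, the two write paths being compared; pushing a single byte directly skips the slice-copy machinery that a one-byte `extend_from_slice` still goes through:

```rust
fn main() {
    let mut buf: Vec<u8> = Vec::with_capacity(16);
    buf.extend_from_slice(b","); // generic slice path: length math plus memcpy setup
    buf.push(b',');              // single-byte path: one capacity check, one store
    assert_eq!(buf, b",,");
}
```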
I wonder if, using dtoa, you could beat the stringify time of RapidJSON. I have not been able to get nativejson-benchmark running locally, so I do not have numbers that are comparable, but you may have better luck. It must be pretty close.
Numbers with dtoa:
I'll have to look into why I'm slower on twitter.json, among others. Your (re-)implementation of dtoa uses Grisu2; however, there is also a Grisu3 algorithm. Fun fact: the author of RapidJSON is also an author of the C++ implementation of Grisu2 and the maintainer of the nativejson benchmark, which kinda explains why 90% of the performance measured is how fast you can stringify floats, eh?
I should have spent more time researching this. Grisu3 isn't faster; it just detects when its result isn't optimal so that a fallback can be used, while Grisu2 just goes through with it, producing suboptimal results for some floats, which is totally fine.
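For context on what "optimal" means here: a shortest-round-trip formatter emits the fewest digits that still parse back to the identical f64 bits. Rust's standard float formatting has this property, which a stdlib-only check can show:

```rust
fn main() {
    // Shortest output: the fewest digits that still round-trip to the
    // exact same bits. A suboptimal (Grisu2-style, no fallback) result
    // would just mean extra digits; the value would still round-trip.
    let x: f64 = 0.1 + 0.2;
    let s = format!("{}", x);
    assert_eq!(s, "0.30000000000000004");
    assert_eq!(s.parse::<f64>().unwrap(), x);

    // When the value is exactly representable by a short decimal,
    // the short decimal is what gets printed.
    let y: f64 = 0.3;
    assert_eq!(format!("{}", y), "0.3");
}
```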
Should update the README with charts for the log benchmark, as well as the 3 benchmarks tested in json-benchmark, with descriptions of what they are actually testing:
This issue has been open forever now. The new README links to json-benchmark. I'd like to see improvements to the performance presentation happen there. Related issue.
This would be interesting for many developers and a strong argument for using this library, and Rust rather than another language.
The comparison should indicate the version number of each library.
A comparison with serde would be especially interesting:
https://github.com/serde-rs/serde