-
Notifications
You must be signed in to change notification settings - Fork 1k
[json] Optimize primitive numbers to string (50%-250% faster) #7819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I am seeing the time about cut in half |
|
A note on this PR, the ryu crate which I implemented to convert f64 values outputs scientific notation sometimes, so |
|
Thank you for this contribution @abacef I think @psvri switched arrow-cast to use ryu as well recently in #5401 which improved the performance
I am not sure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @abacef
| TapeElement::I32(low) => { | ||
| let val = ((high as i64) << 32) | (low as u32) as i64; | ||
| builder.append_value(val.to_string()); | ||
| builder.append_value(int_formatter.format(val)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this still copies the bytes twice -- once into the ryu buffer and then again into the StringBuilder's buffer . I suspect we could make it even faster by writing into the StringBuilder directly
Note StringBuffer implements Write so you can do stuff like
https://docs.rs/arrow/latest/arrow/array/type.GenericStringBuilder.html#example-incrementally-writing-strings-with-stdfmtwrite
Is there some way to get ryu to write directly to that buffer?
using ryu may still be faster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so I tested simply adding write! instead of to_string:
write!(builder, "{n}").unwrap();
builder.append_value("");vs
builder.append_value(n.to_string());And the benches I wrote were 25% faster, so slower than itoa and ryu, but for some reason the previously written benches were all regressing by 15%:
bench_integer time: [6.6288 ms 6.6426 ms 6.6605 ms]
change: [+16.009% +16.295% +16.619%] (p = 0.00 < 0.05)
Performance has regressed.
So at least that is a positive for using these libraries.
For ryu writing directly to the buffer, I was not able to find a way to do this since it seems like their internal buffer is coupled to the write implementation 😢
|
I'll also kick off a benchmark run |
This comment was marked as outdated.
This comment was marked as outdated.
| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| use criterion::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please update the existing writer benchmark rather than making a new one?
https://github.com/apache/arrow-rs/blob/main/arrow/benches/json_writer.rs
Also, if you make the benchmark in a separate PR it will be easier for me to run in separately from this prposed code change and reproduce your numbers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a separate PR #7864
# Which issue does this PR close? - Closes: None # Rationale for this change It is suggested to merge benches before merging a speed optimization (see #7819) # What changes are included in this PR? adding the following benches to convert the following type arrays to a string - i64 - i32 - f64 - f32 - i64, i32, f64, f32 # Are these changes tested? I am not sure we are testing benches # Are there any user-facing changes? No
This comment was marked as outdated.
This comment was marked as outdated.
4 similar comments
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
|
@alamb Are the benchmark results supposed to be posted here? |
Sorry -- yes they should be. Let me look |
|
🤖 |
|
🤖: Benchmark completed Details
|
🤔 my benchmark machine suggests this branch is slower than main. I will rerun to double check Maybe the different is related to compiler settings / target architectures |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Given that this change seems to slow performance down, I don't think we should proceed with it until we can not regress performance |
|
Do you know why the new benchmarks I wrote are not showing up here? 57f96f2 |
|
On my machine when I run the above benchmarks I get some being a little slower and some being a little faster, but no specific bias to a particular branch like shown above |
|
@alamb can you run the above benchmark script against the same branch? On my machine when I run the benchmark twice on the same branch I get slower for some benches the second time. |
I did run the benchmark twice on the same machine 🤔 : |
My script compares against Perhaps you can merge up your branch to pick up the issues on main |
I am wondering what the results would be back to back |
|
Ok, I just rebased my commits on top of |
|
The reason why I am questioning the validity of these benchmark results is because the only bench that touches the code path of the function I changed is |
Yeah, I agree the benchmark results look suspicious. It could also be a difference in architecture (perhaps your machine / rust settings do better with what is in itoa / ryu than what is done on the relatively crappy x86_64 gcp machine I am using) If your performance results reproducible on another machine? |
|
I am not seeing any better results for the older benchmarks on my computer. I guess we are not willing to take a slight hit on performance across the board to cut this specific use case's runtime in half? |
|
@alamb would you be able to run a benchmark with the env Would this be a convincing proof to resolve conversations about the benchmarks? Or do we specifically want to take into consideration most people compiling using the default CPU flags (I believe Rust defaults to x86 CPUs from 2000)? |
|
🤖 |
|
Hi @abacef -- I kicked off this job: BENCH_NAME="json_writer" BENCH_FILTER="" ./gh_compare_arrow.sh https://github.com/apache/arrow-rs/pull/7819
RUSTFLAGS='-C target-cpu=native' BENCH_NAME="json_writer" BENCH_FILTER="" ./gh_compare_arrow.sh https://github.com/apache/arrow-rs/pull/7819(so the second results have the |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| lexical-core = { version = "1.0", default-features = false} | ||
| memchr = "2.7.4" | ||
| simdutf8 = "0.1.5" | ||
| ryu = "1.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was concerned about adding these new dependencies, however, it seems serde_json already depends on ryu and ito so this is not a net-new dependency, it is just now explicit.
name = "serde_json"
version = "1.0.140"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "20068b6e96dc6c9bd23e01df8827e6c7e1f2fddd43c21810382803c136b99373"
dependencies = [
"itoa",
"memchr",
"ryu",
"serde",
]| TapeElement::I32(low) => { | ||
| let val = ((high as i64) << 32) | (low as u32) as i64; | ||
| builder.append_value(val.to_string()); | ||
| builder.append_value(int_formatter.format(val)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW saving the string allocation also likely makes a non trivial difference
|
Thanks again @abacef |
Which issue does this PR close?
We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.
Rationale for this change
Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
What changes are included in this PR?
There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
Added ryu and itoa to convert primitive numbers to strings
Are these changes tested?
We typically require tests for all PRs in order to:
If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
I assume since I am not adding any functionality there are already tests covering this
Are there any user-facing changes?
If there are user-facing changes then we may require documentation to be updated before approving the PR.
If there are any breaking changes to public APIs, please call them out.
There should not be