@abacef abacef commented Jun 29, 2025

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

Rationale for this change

Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.

What changes are included in this PR?

There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.

Added ryu and itoa to convert primitive numbers to strings

Are these changes tested?

We typically require tests for all PRs in order to:

  1. Prevent the code from being accidentally broken by subsequent changes
  2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?

I assume existing tests already cover this, since I am not adding any functionality.

Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

There should not be any.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jun 29, 2025

abacef commented Jun 29, 2025

I am seeing the time cut roughly in half when running:
cargo bench --bench json_writer_coerce_primitive

f64_rows                time:   [4.7327 ms 4.8176 ms 4.9280 ms]
                        change: [−61.878% −61.081% −60.120%] (p = 0.00 < 0.05)
                        Performance has improved.

f32_rows                time:   [3.0712 ms 3.1112 ms 3.1661 ms]
                        change: [−44.925% −43.676% −42.438%] (p = 0.00 < 0.05)
                        Performance has improved.

i64_rows                time:   [3.8781 ms 3.9195 ms 3.9665 ms]
                        change: [−47.707% −46.591% −45.626%] (p = 0.00 < 0.05)
                        Performance has improved.

i32_rows                time:   [3.5450 ms 3.5870 ms 3.6408 ms]
                        change: [−48.915% −47.916% −46.916%] (p = 0.00 < 0.05)
                        Performance has improved.

mixed_rows              time:   [15.824 ms 15.994 ms 16.196 ms]
                        change: [−49.339% −48.584% −47.802%] (p = 0.00 < 0.05)
                        Performance has improved.

abacef commented Jun 29, 2025

A note on this PR: the ryu crate, which I used to convert f64 values, sometimes outputs scientific notation, so 300000. may be converted to 3.0e5. Is it important not to break users who rely on a certain format? As I understand it, scientific notation for numbers is valid JSON, but I wonder whether a customer who is manually parsing the number value may break because of this. If this is important, we can consider a different method to reduce the time.
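
To make the compatibility question concrete, here is a minimal sketch using only the Rust standard library (not ryu, and not a JSON parser). It shows that a plain decimal and a scientific-notation literal can denote the same f64, while a consumer that only expects digits and a decimal point would reject the latter. `same_f64` and `naive_is_plain_decimal` are hypothetical helper names introduced for this illustration.

```rust
/// Returns true when both literals parse to the same f64 value.
/// Uses Rust's own float parser as a stand-in for a JSON number parser.
fn same_f64(a: &str, b: &str) -> bool {
    match (a.parse::<f64>(), b.parse::<f64>()) {
        (Ok(x), Ok(y)) => x == y,
        _ => false,
    }
}

/// Models a hand-written consumer that assumes numbers contain only
/// an optional sign, digits, and a decimal point (no exponent).
fn naive_is_plain_decimal(s: &str) -> bool {
    s.chars().all(|c| c.is_ascii_digit() || c == '.' || c == '-')
}
```

A standards-compliant parser treats both spellings as the same number; only a consumer with stricter, format-specific assumptions (like the naive check here) would notice the change.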

@alamb alamb changed the title Coerce primitive numbers to string faster [json] Coerce primitive numbers to string faster Jul 3, 2025

alamb commented Jul 3, 2025

Thank you for this contribution @abacef

I think @psvri switched arrow-cast to use ryu as well recently in #5401 which improved the performance

> A note on this PR: the ryu crate, which I used to convert f64 values, sometimes outputs scientific notation, so 300000. may be converted to 3.0e5. Is it important not to break users who rely on a certain format? As I understand it, scientific notation for numbers is valid JSON, but I wonder whether a customer who is manually parsing the number value may break because of this. If this is important, we can consider a different method to reduce the time.

I am not sure

@alamb alamb left a comment

Thank you @abacef

  TapeElement::I32(low) => {
      let val = ((high as i64) << 32) | (low as u32) as i64;
-     builder.append_value(val.to_string());
+     builder.append_value(int_formatter.format(val));

I think this still copies the bytes twice -- once into the ryu buffer and then again into the StringBuilder's buffer. I suspect we could make it even faster by writing into the StringBuilder directly.

Note that StringBuilder implements Write, so you can do stuff like
https://docs.rs/arrow/latest/arrow/array/type.GenericStringBuilder.html#example-incrementally-writing-strings-with-stdfmtwrite

Is there some way to get ryu to write directly to that buffer?

Using ryu may still be faster.
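
The "write directly into the destination" idea can be sketched with a plain `String` standing in for the builder's buffer. This is an analogy under that assumption, not the arrow `GenericStringBuilder` API; `append_via_to_string` and `append_via_write` are hypothetical names.

```rust
use std::fmt::Write;

/// Formats each value into a temporary String, then copies it into `out`.
/// This costs one heap allocation per value plus an extra copy.
fn append_via_to_string(out: &mut String, values: &[i64]) {
    for v in values {
        let s = v.to_string();
        out.push_str(&s);
        out.push('\n');
    }
}

/// Formats each value straight into `out` through std::fmt::Write,
/// avoiding the temporary String.
fn append_via_write(out: &mut String, values: &[i64]) {
    for v in values {
        // Writing into a String does not fail, so unwrap is acceptable here.
        write!(out, "{}", v).unwrap();
        out.push('\n');
    }
}
```

Both functions produce identical output; they differ only in whether an intermediate allocation is made. Whether the `write!` path is actually faster depends on the formatting machinery involved, which is what the measurements in the reply below explore.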

Ok so I tested simply adding write! instead of to_string:

write!(builder, "{n}").unwrap();
builder.append_value("");

vs

builder.append_value(n.to_string());

With write!, the benches I wrote were 25% faster, so still slower than with itoa and ryu, but for some reason the previously written benches all regressed by about 15%:

bench_integer           time:   [6.6288 ms 6.6426 ms 6.6605 ms]
                        change: [+16.009% +16.295% +16.619%] (p = 0.00 < 0.05)
                        Performance has regressed. 

So at least that is a positive for using these libraries.

For ryu writing directly to the buffer, I was not able to find a way to do this since it seems like their internal buffer is coupled to the write implementation 😢
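
For readers unfamiliar with the snippet under review in this thread: `val` is rebuilt from two 32-bit halves before it is formatted. A minimal standalone sketch of that expression follows, assuming (as the snippet suggests) that the upper half arrives as `high` and the lower half as `low`. `combine_halves` and `split_halves` are hypothetical names, not arrow-rs functions.

```rust
/// Rebuilds an i64 from its upper and lower 32-bit halves, mirroring the
/// expression in the reviewed snippet.
fn combine_halves(high: i32, low: i32) -> i64 {
    // `low as u32` reinterprets the bits so that widening to i64
    // zero-extends instead of sign-extending into the upper half.
    ((high as i64) << 32) | (low as u32) as i64
}

/// Splits an i64 into (high, low) halves; a hypothetical inverse used
/// only to check the round trip.
fn split_halves(val: i64) -> (i32, i32) {
    ((val >> 32) as i32, val as i32)
}
```

The `as u32` cast matters: with `high = 0` and `low = -1`, the result is 4294967295 rather than -1.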

alamb commented Jul 3, 2025

I'll also kick off a benchmark run

// specific language governing permissions and limitations
// under the License.

use criterion::*;

Can you please update the existing writer benchmark rather than making a new one?

https://github.com/apache/arrow-rs/blob/main/arrow/benches/json_writer.rs

Also, if you make the benchmark in a separate PR, it will be easier for me to run it separately from this proposed code change and reproduce your numbers.

I created a separate PR #7864

alamb pushed a commit that referenced this pull request Jul 5, 2025
# Which issue does this PR close?

- Closes: None

# Rationale for this change

It is suggested to merge benches before merging a speed optimization
(see #7819)

# What changes are included in this PR?

adding the following benches to convert the following type arrays to a
string

- i64
- i32
- f64
- f32
- i64, i32, f64, f32

# Are these changes tested?

I am not sure we are testing benches

# Are there any user-facing changes?

No

abacef commented Jul 8, 2025

@alamb Are the benchmark results supposed to be posted here?

alamb commented Jul 10, 2025

> @alamb Are the benchmark results supposed to be posted here?

Sorry -- yes they should be. Let me look

alamb commented Jul 10, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing abacef (40f9c11) to 674dc17 diff
BENCH_NAME=json_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench json_writer
BENCH_FILTER=
BENCH_BRANCH_NAME=abacef
Results will be posted here when complete

alamb commented Jul 10, 2025

🤖: Benchmark completed

Details

group                    abacef                                 main
-----                    ------                                 ----
bench_dict_array         1.09      7.6±0.12ms        ? ?/sec    1.00      6.9±0.14ms        ? ?/sec
bench_float              1.00      6.1±0.04ms        ? ?/sec    1.00      6.1±0.03ms        ? ?/sec
bench_integer            1.02      5.5±0.05ms        ? ?/sec    1.00      5.4±0.03ms        ? ?/sec
bench_list               1.17     87.8±0.39ms        ? ?/sec    1.00     75.1±0.43ms        ? ?/sec
bench_mixed              1.04     33.1±0.32ms        ? ?/sec    1.00     31.9±0.21ms        ? ?/sec
bench_nullable_list      1.15     12.4±0.39ms        ? ?/sec    1.00     10.8±0.25ms        ? ?/sec
bench_nullable_struct    1.16     31.2±0.40ms        ? ?/sec    1.00     26.8±0.26ms        ? ?/sec
bench_string             1.14     16.2±0.14ms        ? ?/sec    1.00     14.2±0.13ms        ? ?/sec
bench_struct             1.06     53.6±0.27ms        ? ?/sec    1.00     50.5±0.33ms        ? ?/sec
bench_struct_list        1.14      8.5±0.58ms        ? ?/sec    1.00      7.5±0.49ms        ? ?/sec

alamb commented Jul 10, 2025

🤔 my benchmark machine suggests this branch is slower than main. I will rerun to double check

Maybe the difference is related to compiler settings / target architectures.

alamb commented Jul 10, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing abacef (40f9c11) to 674dc17 diff
BENCH_NAME=json_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench json_writer
BENCH_FILTER=
BENCH_BRANCH_NAME=abacef
Results will be posted here when complete

alamb commented Jul 10, 2025

🤖: Benchmark completed

Details

group                    abacef                                 main
-----                    ------                                 ----
bench_dict_array         1.03      7.5±0.12ms        ? ?/sec    1.00      7.2±0.15ms        ? ?/sec
bench_float              1.00      6.1±0.03ms        ? ?/sec    1.01      6.1±0.05ms        ? ?/sec
bench_integer            1.03      5.5±0.04ms        ? ?/sec    1.00      5.4±0.03ms        ? ?/sec
bench_list               1.16     87.7±0.30ms        ? ?/sec    1.00     75.6±0.38ms        ? ?/sec
bench_mixed              1.03     32.9±0.33ms        ? ?/sec    1.00     31.9±0.29ms        ? ?/sec
bench_nullable_list      1.14     12.1±0.15ms        ? ?/sec    1.00     10.6±0.14ms        ? ?/sec
bench_nullable_struct    1.16     31.3±0.30ms        ? ?/sec    1.00     26.9±0.40ms        ? ?/sec
bench_string             1.12     16.2±0.15ms        ? ?/sec    1.00     14.5±0.09ms        ? ?/sec
bench_struct             1.05     53.8±0.35ms        ? ?/sec    1.00     51.2±0.24ms        ? ?/sec
bench_struct_list        1.19      7.6±0.73ms        ? ?/sec    1.00      6.4±0.34ms        ? ?/sec

alamb commented Jul 10, 2025

Given that this change seems to slow performance down, I don't think we should proceed with it until it no longer regresses performance.

abacef commented Jul 10, 2025

Do you know why the new benchmarks I wrote are not showing up here? 57f96f2

abacef commented Jul 10, 2025

On my machine, when I run the above benchmarks, some are a little slower and some a little faster, but there is no consistent bias toward a particular branch like the one shown above.

abacef commented Jul 10, 2025

@alamb can you run the above benchmark script against the same branch? On my machine, when I run the benchmark twice on the same branch, I get slower results for some benches the second time.

alamb commented Jul 10, 2025

> @alamb can you run the above benchmark script against the same branch? On my machine, when I run the benchmark twice on the same branch, I get slower results for some benches the second time.

I did run the benchmark twice on the same machine 🤔 :

alamb commented Jul 10, 2025

> Do you know why the new benchmarks I wrote are not showing up here? 57f96f2

My script compares against the merge-base (where the PR was branched from main).

Perhaps you can merge up your branch to pick up the changes on main.

abacef commented Jul 10, 2025

> I did run the benchmark twice on the same machine

I am wondering what the results would be back to back

abacef commented Jul 10, 2025

Ok, I just rebased my commits on top of main.

abacef commented Jul 10, 2025

The reason I am questioning the validity of these benchmark results is that the only bench that touches the code path of the function I changed is bench_struct_list, so I would expect only this bench to change, if at all. The only other explanation I can think of is that the Rust compiler is optimizing the code less because ryu and itoa add code complexity.

alamb commented Jul 12, 2025

> The reason I am questioning the validity of these benchmark results is that the only bench that touches the code path of the function I changed is bench_struct_list, so I would expect only this bench to change, if at all. The only other explanation I can think of is that the Rust compiler is optimizing the code less because ryu and itoa add code complexity.

Yeah, I agree the benchmark results look suspicious. It could also be a difference in architecture (perhaps your machine / rust settings do better with what is in itoa / ryu than what is done on the relatively crappy x86_64 gcp machine I am using)

Are your performance results reproducible on another machine?

abacef commented Jul 15, 2025

I am not seeing any better results for the older benchmarks on my computer. I guess we are not willing to take a slight hit on performance across the board to cut this specific use case's runtime in half?

abacef commented Sep 22, 2025

@alamb would you be able to run a benchmark with the env RUSTFLAGS='-C target-cpu=native' for the benchmark json-writer? On my machine (2020 x86_64) I do not get any regressions with newer CPU flags, but I do get a 15% regression with the unchanged code paths with the default cpu flags. I am hopeful your GCP x86_64 benchmarking machine will find similar results.

Would this be a convincing proof to resolve conversations about the benchmarks? Or do we specifically want to take into consideration most people compiling using the default CPU flags (I believe Rust defaults to x86 CPUs from 2000)?
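
One way to see what the CPU flags change is to check which target features the compiler was allowed to assume at build time. The sketch below uses only `cfg!(target_feature = ...)` from the standard language; the choice of features is an arbitrary illustration and `compiled_in_features` is a hypothetical helper name.

```rust
/// Lists a few x86 SIMD/bit-manipulation features that were enabled at
/// compile time. Built with default flags this list is typically short;
/// built with RUSTFLAGS='-C target-cpu=native' it reflects the build machine.
fn compiled_in_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(target_feature = "sse2") {
        enabled.push("sse2");
    }
    if cfg!(target_feature = "sse4.2") {
        enabled.push("sse4.2");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    if cfg!(target_feature = "bmi2") {
        enabled.push("bmi2");
    }
    enabled
}
```

Printing the result from a small binary built once with and once without the flag shows how much the set of assumed features differs on a given machine, which is the variable the two benchmark runs below are meant to isolate.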

alamb commented Sep 26, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing abacef (91ce361) to 010d0e7 diff
BENCH_NAME=json_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench json_writer
BENCH_FILTER=
BENCH_BRANCH_NAME=abacef
Results will be posted here when complete

alamb commented Sep 26, 2025

Hi @abacef -- I kicked off this job:

BENCH_NAME="json_writer" BENCH_FILTER="" ./gh_compare_arrow.sh  https://github.com/apache/arrow-rs/pull/7819
RUSTFLAGS='-C target-cpu=native' BENCH_NAME="json_writer" BENCH_FILTER="" ./gh_compare_arrow.sh  https://github.com/apache/arrow-rs/pull/7819

(so the second results have the RUSTFLAGS set)

alamb commented Sep 26, 2025

🤖: Benchmark completed

Details

group                      abacef                                 main
-----                      ------                                 ----
bench_dict_array           1.00      7.5±0.20ms        ? ?/sec    1.03      7.7±0.20ms        ? ?/sec
bench_float                1.00      5.9±0.04ms        ? ?/sec    1.03      6.1±0.04ms        ? ?/sec
bench_integer              1.00      5.3±0.05ms        ? ?/sec    1.03      5.5±0.05ms        ? ?/sec
bench_list                 1.00     82.6±0.33ms        ? ?/sec    1.00     83.0±0.33ms        ? ?/sec
bench_mixed                1.00     31.9±0.45ms        ? ?/sec    1.00     32.0±0.42ms        ? ?/sec
bench_nullable_list        1.00     12.0±0.23ms        ? ?/sec    1.01     12.1±0.26ms        ? ?/sec
bench_nullable_struct      1.00     28.7±0.40ms        ? ?/sec    1.00     28.6±0.54ms        ? ?/sec
bench_string               1.00     15.8±0.16ms        ? ?/sec    1.04     16.4±0.18ms        ? ?/sec
bench_struct               1.00     53.4±0.28ms        ? ?/sec    1.01     54.1±1.55ms        ? ?/sec
bench_struct_list          1.05      7.6±0.81ms        ? ?/sec    1.00      7.3±0.80ms        ? ?/sec
f32_to_string              1.00      2.9±0.02ms        ? ?/sec    1.41      4.1±0.02ms        ? ?/sec
f64_to_string              1.00      4.6±0.02ms        ? ?/sec    2.49     11.5±0.06ms        ? ?/sec
i32_to_string              1.00      3.4±0.02ms        ? ?/sec    1.43      4.8±0.02ms        ? ?/sec
i64_to_string              1.00      3.6±0.01ms        ? ?/sec    1.40      5.1±0.02ms        ? ?/sec
mixed_numbers_to_string    1.00     15.0±0.36ms        ? ?/sec    1.72     25.8±0.32ms        ? ?/sec

alamb commented Sep 26, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing abacef (91ce361) to 010d0e7 diff
BENCH_NAME=json_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench json_writer
BENCH_FILTER=
BENCH_BRANCH_NAME=abacef
Results will be posted here when complete

alamb commented Sep 26, 2025

🤖: Benchmark completed

Details

group                      abacef                                 main
-----                      ------                                 ----
bench_dict_array           1.05      7.5±0.15ms        ? ?/sec    1.00      7.1±0.21ms        ? ?/sec
bench_float                1.01      5.9±0.03ms        ? ?/sec    1.00      5.8±0.04ms        ? ?/sec
bench_integer              1.02      5.3±0.06ms        ? ?/sec    1.00      5.2±0.03ms        ? ?/sec
bench_list                 1.02     83.8±0.23ms        ? ?/sec    1.00     82.0±0.31ms        ? ?/sec
bench_mixed                1.00     31.9±0.48ms        ? ?/sec    1.00     31.8±0.41ms        ? ?/sec
bench_nullable_list        1.02     12.2±0.23ms        ? ?/sec    1.00     11.9±0.26ms        ? ?/sec
bench_nullable_struct      1.02     28.7±0.55ms        ? ?/sec    1.00     28.1±0.48ms        ? ?/sec
bench_string               1.02     16.1±0.15ms        ? ?/sec    1.00     15.8±0.40ms        ? ?/sec
bench_struct               1.02     53.5±0.34ms        ? ?/sec    1.00     52.4±0.38ms        ? ?/sec
bench_struct_list          1.05      7.9±1.23ms        ? ?/sec    1.00      7.5±0.84ms        ? ?/sec
f32_to_string              1.00      3.0±0.01ms        ? ?/sec    1.41      4.3±0.01ms        ? ?/sec
f64_to_string              1.00      4.7±0.02ms        ? ?/sec    2.49     11.6±0.05ms        ? ?/sec
i32_to_string              1.00      3.5±0.01ms        ? ?/sec    1.44      5.0±0.03ms        ? ?/sec
i64_to_string              1.00      3.7±0.02ms        ? ?/sec    1.43      5.2±0.02ms        ? ?/sec
mixed_numbers_to_string    1.00     14.6±0.28ms        ? ?/sec    1.76     25.6±0.44ms        ? ?/sec
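
For reading the tables above, the improvement can be expressed either as a speedup ratio or as a reduction in runtime; the two numbers are easy to confuse. A small sketch of the arithmetic, using the rounded times printed in the table (so results are approximate); `speedup` and `time_reduction` are hypothetical helper names.

```rust
/// Ratio of the baseline time to the candidate time
/// (2.0 means the candidate is twice as fast).
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

/// Fraction of the baseline runtime that was removed
/// (0.5 means the candidate takes half the time).
fn time_reduction(baseline_ms: f64, candidate_ms: f64) -> f64 {
    1.0 - candidate_ms / baseline_ms
}
```

For f64_to_string (11.6 ms on main, 4.7 ms on the branch), `speedup` gives roughly 2.47 and `time_reduction` roughly 0.59, which is in line with the roughly 61% change reported in the first local measurements.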

@alamb alamb left a comment

Thank you @abacef -- the new benchmark numbers indeed look compelling

I spent some more time reviewing this carefully and I think this PR is looking great

Thank you for your patience @abacef

lexical-core = { version = "1.0", default-features = false}
memchr = "2.7.4"
simdutf8 = "0.1.5"
ryu = "1.0"

I was concerned about adding these new dependencies; however, it seems serde_json already depends on ryu and itoa, so this is not a net-new dependency, it is just now explicit.

name = "serde_json"
version = "1.0.140"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "20068b6e96dc6c9bd23e01df8827e6c7e1f2fddd43c21810382803c136b99373"
dependencies = [
 "itoa",
 "memchr",
 "ryu",
 "serde",
]

  TapeElement::I32(low) => {
      let val = ((high as i64) << 32) | (low as u32) as i64;
-     builder.append_value(val.to_string());
+     builder.append_value(int_formatter.format(val));

FWIW, saving the string allocation also likely makes a non-trivial difference.
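
The allocation point can be illustrated with a hand-rolled formatter that writes digits into a caller-owned stack buffer and returns a borrowed `&str`, so no heap `String` is created per value. This is a sketch of the general technique only; it is not itoa's actual implementation, and `format_i64` is a hypothetical name.

```rust
/// Writes the decimal form of `n` into `buf` from the right and returns
/// the used tail as a &str. 20 bytes fits i64::MIN including its sign.
fn format_i64(buf: &mut [u8; 20], n: i64) -> &str {
    let mut pos = buf.len();
    // unsigned_abs avoids overflow when n is i64::MIN.
    let mut rest = n.unsigned_abs();
    loop {
        pos -= 1;
        buf[pos] = b'0' + (rest % 10) as u8;
        rest /= 10;
        if rest == 0 {
            break;
        }
    }
    if n < 0 {
        pos -= 1;
        buf[pos] = b'-';
    }
    // Only ASCII digits and '-' were written, so this is valid UTF-8.
    std::str::from_utf8(&buf[pos..]).expect("ASCII digits are valid UTF-8")
}
```

The same buffer can be reused for every value in a loop, which is the property that removes the per-value allocation that `to_string()` incurs.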

@alamb alamb changed the title [json] Coerce primitive numbers to string faster [json] Optimize primitive numbers to string (50%-250% faster) Sep 26, 2025
@alamb alamb merged commit 3cdafaf into apache:main Sep 29, 2025
25 checks passed

alamb commented Sep 29, 2025

Thanks again @abacef

Labels

arrow Changes to the arrow crate performance

Development

Successfully merging this pull request may close these issues.

JSON Reader Faster Coercion of Primitives to String

2 participants