Skip to content

Conversation

zjregee
Copy link
Member

@zjregee zjregee commented Feb 16, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

By calculating the length in advance and allocating memory accordingly, the repeat function achieves improved performance to a certain degree.

The following are the local test results:

repeat 3 times/repeat_string_view [size=1024, repeat_times=3]
                        time:   [40.286 µs 40.438 µs 40.607 µs]
                        change: [-0.1541% +0.4780% +1.0910%] (p = 0.17 > 0.05)
                        No change in performance detected.
repeat 3 times/repeat_string [size=1024, repeat_times=3]
                        time:   [38.556 µs 38.744 µs 38.944 µs]
                        change: [-10.201% -8.3456% -6.4266%] (p = 0.00 < 0.05)
                        Performance has improved.
repeat 3 times/repeat_large_string [size=1024, repeat_times=3]
                        time:   [30.212 µs 30.365 µs 30.523 µs]
                        change: [-3.3156% -2.5561% -1.7929%] (p = 0.00 < 0.05)
                        Performance has improved.

repeat 30 times/repeat_string_view [size=1024, repeat_times=30]
                        time:   [74.830 µs 75.354 µs 75.885 µs]
                        change: [-51.695% -51.232% -50.754%] (p = 0.00 < 0.05)
                        Performance has improved.
repeat 30 times/repeat_string [size=1024, repeat_times=30]
                        time:   [73.079 µs 73.638 µs 74.246 µs]
                        change: [-53.258% -52.647% -52.008%] (p = 0.00 < 0.05)
                        Performance has improved.
repeat 30 times/repeat_large_string [size=1024, repeat_times=30]
                        time:   [74.278 µs 74.651 µs 75.251 µs]
                        change: [-52.307% -51.550% -50.794%] (p = 0.00 < 0.05)
                        Performance has improved.

repeat 1073741824 times/repeat_string overflow [size=1024, repeat_times=1073741824]
                        time:   [155.57 ns 156.74 ns 157.77 ns]
                        change: [-29.625% -28.588% -27.670%] (p = 0.00 < 0.05)
                        Performance has improved.

repeat 3 times/repeat_string_view [size=4096, repeat_times=3]
                        time:   [158.52 µs 159.74 µs 161.16 µs]
                        change: [-8.5460% -7.5443% -6.4563%] (p = 0.00 < 0.05)
                        Performance has improved.
repeat 3 times/repeat_string [size=4096, repeat_times=3]
                        time:   [155.23 µs 156.10 µs 156.95 µs]
                        change: [-10.086% -9.3897% -8.7024%] (p = 0.00 < 0.05)
                        Performance has improved.
repeat 3 times/repeat_large_string [size=4096, repeat_times=3]
                        time:   [118.43 µs 119.50 µs 120.71 µs]
                        change: [-13.262% -12.286% -11.215%] (p = 0.00 < 0.05)
                        Performance has improved.

repeat 30 times/repeat_string_view [size=4096, repeat_times=30]
                        time:   [407.53 µs 411.95 µs 416.35 µs]
                        change: [-38.905% -37.817% -36.784%] (p = 0.00 < 0.05)
                        Performance has improved.
repeat 30 times/repeat_string [size=4096, repeat_times=30]
                        time:   [400.81 µs 404.55 µs 408.14 µs]
                        change: [-40.494% -39.392% -38.262%] (p = 0.00 < 0.05)
                        Performance has improved.
repeat 30 times/repeat_large_string [size=4096, repeat_times=30]
                        time:   [407.21 µs 412.32 µs 417.80 µs]
                        change: [-37.954% -37.020% -36.066%] (p = 0.00 < 0.05)
                        Performance has improved.

repeat 1073741824 times/repeat_string overflow [size=4096, repeat_times=1073741824]
                        time:   [150.92 ns 152.68 ns 154.44 ns]
                        change: [-31.467% -30.488% -29.483%] (p = 0.00 < 0.05)
                        Performance has improved.

Are these changes tested?

Yes.

Are there any user-facing changes?

None.

@github-actions github-actions bot added the functions Changes to functions implementation label Feb 16, 2025
@zjregee
Copy link
Member Author

zjregee commented Feb 16, 2025

Hi, @alamb.

In addition, thank you very much for your help, but I have tried some suggestions in #14610 to reduce memory copy by using the write!, but I have not achieved any results. I think there may be the following problems:

  • By looking at the source code, the implementation of the write! seems to be the same as append_value, and is also implemented by calling append_slice. There is still a memory copy here, and using write! does not seem to have any special meaning.
  • When using the following method for loop insertion, there seems to be no performance advantage when the number of loops is too large, because the str::repeat reduces the number of loops through exponential copying (although there are still more memory allocations here).
for _ in 0..number {
    write!(builder, "{}", string)?;
}
builder.append_value("")

@alamb alamb changed the title optimize performance of the repeat function optimize performance of the repeat function (up to 50% faster) Feb 16, 2025
@alamb
Copy link
Contributor

alamb commented Feb 16, 2025

Hi, @alamb.

In addition, thank you very much for your help, but I have tried some suggestions in #14610 to reduce memory copy by using the write!, but I have not achieved any results. I think there may be the following problems:

  • By looking at the source code, the implementation of the write! seems to be the same as append_value, and is also implemented by calling append_slice. There is still a memory copy here, and using write! does not seem to have any
    special meaning.
  • When using the following method for loop insertion, there seems to be no performance advantage when the number of loops is too large, because the str::repeat reduces the number of loops through exponential copying (although there are still more memory allocations here).

That is an interesting finding -- I see that std::String::repeat uses slice::repeat which seems quite optimized. Thanks for testing it out. If avoiding the allocation doesn't help then that is the end of that

@alamb
Copy link
Contributor

alamb commented Feb 16, 2025

BTW I got a flamegraph of this run while using

andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ sudo flamegraph -- target/release/deps/repeat-f851d6dc5c039e6c --bench times/repeat_large_string

Here it is in case you are interested (fixed)
flamegraph

While there is some malloc/free that shows up a lot of the time is spent copying the strings which makes sense

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zjregee

(Some(string), Some(number)) if number >= 0 => {
if number as usize * string.len() > max_str_len {
let item_capcaity = string.len() * number as usize;
if item_capcaity > max_str_len {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo item_capcaity

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It has been updated.

@alamb alamb merged commit 8b45d2d into apache:main Feb 17, 2025
24 checks passed
@alamb
Copy link
Contributor

alamb commented Feb 17, 2025

Thanks again @Dandandan and @zjregee

@zjregee zjregee deleted the optimize-repeat-function branch February 17, 2025 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize repeat function

3 participants