Improve performance of trim for string view (10%) #12395

Merged
merged 26 commits into from
Sep 25, 2024

Conversation

Rachelint
Contributor

@Rachelint Rachelint commented Sep 9, 2024

Which issue does this PR close?

Closes #12387

Rationale for this change

Similar to the string view version of substr, we can implement a string view version of trim to improve performance.

What changes are included in this PR?

  • Implement a string view version of trim, which avoids copying the whole string when trimming long (> 12 bytes) strings.
  • Introduce basic unit tests for trim.
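
A minimal sketch of why a string view trim can skip the copy, assuming the Arrow string view layout (a 16-byte view holding either up to 12 inline bytes, or a length, 4-byte prefix, buffer index, and offset). `make_view`, `view_len`, and `view_offset` are hypothetical helpers written for illustration; they are not the PR's `make_and_append_view`.

```rust
/// Hypothetical helper: encode a 16-byte string view as a little-endian u128.
/// `data` is the (already trimmed) string, `offset` is where it starts inside
/// the shared data buffer identified by `buffer_index`.
fn make_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let len = data.len() as u32;
    if data.len() <= 12 {
        // Short string: the length, then the bytes themselves, stored inline.
        let mut bytes = [0u8; 16];
        bytes[..4].copy_from_slice(&len.to_le_bytes());
        bytes[4..4 + data.len()].copy_from_slice(data);
        u128::from_le_bytes(bytes)
    } else {
        // Long string: only metadata is written; the bytes stay in the buffer.
        let prefix = u32::from_le_bytes([data[0], data[1], data[2], data[3]]);
        (len as u128)
            | ((prefix as u128) << 32)
            | ((buffer_index as u128) << 64)
            | ((offset as u128) << 96)
    }
}

/// Length field: the lowest 4 bytes of the view.
fn view_len(view: u128) -> u32 {
    view as u32
}

/// Offset field of a long view: the highest 4 bytes.
fn view_offset(view: u128) -> u32 {
    (view >> 96) as u32
}
```

Under these assumptions, trimming a 64-byte value down to 60 bytes only needs a new view with length 60 and an adjusted offset; the underlying buffer can be reused as-is.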

Are these changes tested?

Tested by new unit tests and other existing tests.

Are there any user-facing changes?

No.

@Kev1n8
Contributor

Kev1n8 commented Sep 9, 2024

FYI @Rachelint that #12383 is modifying make_and_append_view, the original implementation is not correct, which is my fault.

@Rachelint
Contributor Author

Rachelint commented Sep 9, 2024

FYI @Rachelint that #12383 is modifying make_and_append_view, the original implementation is not correct, which is my fault.

Thanks! I will wait to push this forward until #12383 is merged.

@Rachelint Rachelint marked this pull request as ready for review September 11, 2024 13:13
Contributor

@alamb alamb left a comment

Thank you @Rachelint -- this looks really nice and quite close 🙏

I left some comments, but I don't think they are required to merge this.

I do think we should have benchmark numbers showing this makes things faster in order to merge it. Could you please make a StringView based benchmark for trim -- perhaps in

// regarding copyright ownership. The ASF licenses this file
?

Then we can run that benchmark and show that this PR improves the performance.

Thanks again!

@@ -82,7 +82,11 @@ impl ScalarUDFImpl for BTrimFunc {
     }

     fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
-        utf8_to_str_type(&arg_types[0], "btrim")
+        if arg_types[0] == DataType::Utf8View {
+            Ok(DataType::Utf8View)
Contributor

👍

Eventually it would also be possible to return Utf8View when the input is Utf8 and save a copy there as well

use datafusion_common::cast::{as_generic_string_array, as_string_view_array};
use datafusion_common::Result;
use datafusion_common::{exec_err, ScalarValue};
use datafusion_expr::ColumnarValue;

/// Make a `u128` based on the given substr, start(offset to view.offset), and
/// push into to the given buffers
pub(crate) fn make_and_append_view(
Contributor

🤔 I wonder if we should (as a follow on PR) propose adding this upstream to arrow-rs as it seems valuable for any trim related kernels on stringview

Contributor Author

@Rachelint Rachelint Sep 17, 2024

That sounds great! And #12383 (comment) can be solved if it becomes a function in arrow-rs.

Contributor

It seems like what would be really useful is a StringViewBuilder that could be modified perhaps 🤔

I started to write a ticket in arrow-rs but I didn't know exactly what API to suggest. I think we would have to try it out

datafusion/functions/src/string/ltrim.rs (resolved)
@Rachelint
Contributor Author

Rachelint commented Sep 17, 2024

I think maybe we should put LTrim/RTrim/BTrim in the same place (like trim.rs)?

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Sep 17, 2024
@Kev1n8
Contributor

Kev1n8 commented Sep 17, 2024

For benchmarking, I would recommend PR #12111, for what it's worth.

@Rachelint
Contributor Author

For benchmarking, I would recommend PR #12111, for what it's worth.

Thanks, it is really helpful!

@Rachelint
Contributor Author

Rachelint commented Sep 17, 2024

I ran the benchmark introduced in #12513 and see about a 10~20% improvement for long strings (64 bytes).

Highlights: as expected, the string view trim mainly reduces copying when the trimmed result is longer than 12 bytes:

group                                                                                    main                                   string-view-trim
-----                                                                                    ----                                   ----------------
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=1024, len_before=64, len_after=60]     1.16     41.2±0.19µs        ? ?/sec    1.00     35.6±0.21µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=4096, len_before=64, len_after=60]     1.25    173.1±5.68µs        ? ?/sec    1.00    138.5±0.78µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=8192, len_before=64, len_after=60]     1.24    341.3±3.67µs        ? ?/sec    1.00    276.1±1.17µs        ? ?/sec
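
For reading these tables: the leading number in each column appears to be that run's mean time relative to the fastest run in the same row, so 1.00 marks the faster branch. A small sketch that reproduces the ratios from the reported mean times (`relative_time` is a hypothetical helper, not part of critcmp):

```rust
/// Hypothetical helper: a run's mean time divided by the fastest mean time
/// in the same row, formatted to two decimals like the table's first column.
fn relative_time(time_us: f64, fastest_us: f64) -> String {
    format!("{:.2}", time_us / fastest_us)
}
```

For the three highlighted rows, `relative_time(41.2, 35.6)`, `relative_time(173.1, 138.5)`, and `relative_time(341.3, 276.1)` give 1.16, 1.25, and 1.24, matching the `main` column.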

The detailed, sorted benchmark results:

group                                                                                    main                                   string-view-trim
-----                                                                                    ----                                   ----------------
INPUT LEN <= 12/large_string [size=1024, len_before=12, len_after=8]                     1.00     35.9±0.07µs        ? ?/sec    1.01     36.1±0.36µs        ? ?/sec
INPUT LEN <= 12/large_string [size=4096, len_before=12, len_after=8]                     1.00    139.6±0.51µs        ? ?/sec    1.00    139.1±0.49µs        ? ?/sec
INPUT LEN <= 12/large_string [size=8192, len_before=12, len_after=8]                     1.01    281.2±2.01µs        ? ?/sec    1.00    278.4±2.06µs        ? ?/sec
INPUT LEN <= 12/string [size=1024, len_before=12, len_after=8]                           1.00     35.9±0.31µs        ? ?/sec    1.00     35.9±0.14µs        ? ?/sec
INPUT LEN <= 12/string [size=4096, len_before=12, len_after=8]                           1.00    138.5±0.41µs        ? ?/sec    1.01    139.4±0.52µs        ? ?/sec
INPUT LEN <= 12/string [size=8192, len_before=12, len_after=8]                           1.00    279.1±3.72µs        ? ?/sec    1.00    278.6±1.07µs        ? ?/sec
INPUT LEN <= 12/string_view [size=1024, len_before=12, len_after=8]                      1.00     36.2±1.13µs        ? ?/sec    1.00     36.1±1.98µs        ? ?/sec
INPUT LEN <= 12/string_view [size=4096, len_before=12, len_after=8]                      1.00    139.7±1.54µs        ? ?/sec    1.00    139.0±2.41µs        ? ?/sec
INPUT LEN <= 12/string_view [size=8192, len_before=12, len_after=8]                      1.01    277.5±1.31µs        ? ?/sec    1.00    275.5±2.25µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=1024, len_before=64, len_after=4]    1.03    135.5±4.86µs        ? ?/sec    1.00    131.6±1.33µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=4096, len_before=64, len_after=4]    1.00    522.5±2.32µs        ? ?/sec    1.00    522.1±2.30µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=8192, len_before=64, len_after=4]    1.00   1039.3±3.48µs        ? ?/sec    1.00   1040.9±3.07µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=1024, len_before=64, len_after=4]          1.01    132.5±1.17µs        ? ?/sec    1.00    131.3±0.92µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=4096, len_before=64, len_after=4]          1.01    527.6±3.43µs        ? ?/sec    1.00    522.2±1.72µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=8192, len_before=64, len_after=4]          1.00   1043.3±2.28µs        ? ?/sec    1.00   1040.7±3.50µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=1024, len_before=64, len_after=4]     1.01    131.3±0.40µs        ? ?/sec    1.00    130.5±0.60µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=4096, len_before=64, len_after=4]     1.01    524.0±2.79µs        ? ?/sec    1.00    519.3±2.52µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=8192, len_before=64, len_after=4]     1.00   1041.1±3.21µs        ? ?/sec    1.00   1040.1±9.73µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=1024, len_before=64, len_after=60]    1.00     41.2±0.30µs        ? ?/sec    1.00     41.2±0.16µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=4096, len_before=64, len_after=60]    1.01    169.9±4.30µs        ? ?/sec    1.00    168.1±1.83µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=8192, len_before=64, len_after=60]    1.01   345.1±10.96µs        ? ?/sec    1.00    342.5±4.26µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=1024, len_before=64, len_after=60]          1.02     41.8±0.62µs        ? ?/sec    1.00     41.0±0.12µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=4096, len_before=64, len_after=60]          1.01    171.6±1.73µs        ? ?/sec    1.00    169.2±2.07µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=8192, len_before=64, len_after=60]          1.00    343.0±6.30µs        ? ?/sec    1.00    341.8±6.00µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=1024, len_before=64, len_after=60]     1.16     41.2±0.19µs        ? ?/sec    1.00     35.6±0.21µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=4096, len_before=64, len_after=60]     1.25    173.1±5.68µs        ? ?/sec    1.00    138.5±0.78µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=8192, len_before=64, len_after=60]     1.24    341.3±3.67µs        ? ?/sec    1.00    276.1±1.17µs        ? ?/sec

@alamb
Contributor

alamb commented Sep 18, 2024

I merged this PR up to main and am running another round of benchmarks. Thank you @Rachelint

@alamb
Contributor

alamb commented Sep 18, 2024

++ critcmp main string-view-trim
group                                                                                    main                                   string-view-trim
-----                                                                                    ----                                   ----------------
INPUT LEN <= 12/large_string [size=1024, len_before=12, len_after=8]                     1.07     45.0±0.03µs        ? ?/sec    1.00     42.0±0.03µs        ? ?/sec
INPUT LEN <= 12/large_string [size=4096, len_before=12, len_after=8]                     1.07    173.8±0.20µs        ? ?/sec    1.00    163.1±0.18µs        ? ?/sec
INPUT LEN <= 12/large_string [size=8192, len_before=12, len_after=8]                     1.07    345.0±0.15µs        ? ?/sec    1.00    321.9±0.34µs        ? ?/sec
INPUT LEN <= 12/string [size=1024, len_before=12, len_after=8]                           1.00     42.3±0.02µs        ? ?/sec    1.02     43.3±0.02µs        ? ?/sec
INPUT LEN <= 12/string [size=4096, len_before=12, len_after=8]                           1.00    162.9±0.12µs        ? ?/sec    1.03    167.2±0.06µs        ? ?/sec
INPUT LEN <= 12/string [size=8192, len_before=12, len_after=8]                           1.00    323.2±0.17µs        ? ?/sec    1.03    332.3±0.49µs        ? ?/sec
INPUT LEN <= 12/string_view [size=1024, len_before=12, len_after=8]                      1.04     42.1±0.08µs        ? ?/sec    1.00     40.5±0.04µs        ? ?/sec
INPUT LEN <= 12/string_view [size=4096, len_before=12, len_after=8]                      1.02    163.1±0.14µs        ? ?/sec    1.00    159.4±0.16µs        ? ?/sec
INPUT LEN <= 12/string_view [size=8192, len_before=12, len_after=8]                      1.02    323.0±0.25µs        ? ?/sec    1.00    317.1±0.14µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=1024, len_before=64, len_after=4]    1.00    184.2±0.17µs        ? ?/sec    1.01    186.2±0.22µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=4096, len_before=64, len_after=4]    1.00   740.0±16.98µs        ? ?/sec    1.00    741.3±0.94µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=8192, len_before=64, len_after=4]    1.00   1464.2±2.95µs        ? ?/sec    1.01   1482.4±2.26µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=1024, len_before=64, len_after=4]          1.00    181.8±0.09µs        ? ?/sec    1.03    187.6±0.06µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=4096, len_before=64, len_after=4]          1.00    722.9±0.86µs        ? ?/sec    1.03    746.0±0.42µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=8192, len_before=64, len_after=4]          1.00   1440.5±1.30µs        ? ?/sec    1.04   1491.1±3.38µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=1024, len_before=64, len_after=4]     1.00    182.3±0.19µs        ? ?/sec    1.01    184.0±3.93µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=4096, len_before=64, len_after=4]     1.00    724.5±1.23µs        ? ?/sec    1.01    732.5±0.40µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=8192, len_before=64, len_after=4]     1.00   1443.6±1.95µs        ? ?/sec    1.02  1465.6±24.73µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=1024, len_before=64, len_after=60]    1.07     46.6±0.07µs        ? ?/sec    1.00     43.4±0.05µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=4096, len_before=64, len_after=60]    1.06    179.5±0.27µs        ? ?/sec    1.00    168.9±0.19µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=8192, len_before=64, len_after=60]    1.06    363.9±0.73µs        ? ?/sec    1.00    341.8±0.64µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=1024, len_before=64, len_after=60]          1.00     44.2±0.11µs        ? ?/sec    1.02     45.3±0.17µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=4096, len_before=64, len_after=60]          1.00    168.6±0.13µs        ? ?/sec    1.03    174.0±0.30µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=8192, len_before=64, len_after=60]          1.00    343.9±0.93µs        ? ?/sec    1.02    352.1±0.62µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=1024, len_before=64, len_after=60]     1.07     44.6±0.05µs        ? ?/sec    1.00     41.7±0.04µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=4096, len_before=64, len_after=60]     1.05    170.7±0.11µs        ? ?/sec    1.00    163.0±1.12µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=8192, len_before=64, len_after=60]     1.07    348.4±0.87µs        ? ?/sec    1.00    324.8±0.51µs        ? ?/sec

Looks like a reasonable improvement to me

alamb
alamb previously approved these changes Sep 23, 2024
Contributor

@alamb alamb left a comment

Thanks @Rachelint -- I went through this PR again and it looks good

Since I had this PR checked out locally for review, I went ahead and removed the unsafe pointer calculation to try and move this PR along (I know it has been outstanding for too long)

Thanks again!

let views_buf = ScalarBuffer::from(views_buf);
let nulls_buf = null_builder.finish();

// Safety:
Contributor

Related discussion: apache/arrow-rs#6430

// Safety:
// `trim_str` is computed from `str::trim_xxx_matches`,
// and its addr is ensured to be >= `origin_str`'s
let start = unsafe { trim_str.as_ptr().offset_from(src_str.as_ptr()) as u32 };
Contributor

I ran this diff:

diff --git a/datafusion/functions/src/string/common.rs b/datafusion/functions/src/string/common.rs
index 4f70374b7..f796d10c2 100644
--- a/datafusion/functions/src/string/common.rs
+++ b/datafusion/functions/src/string/common.rs
@@ -204,10 +204,7 @@ fn trim_and_append_str<'a>(
     if let (Some(src_str), Some(characters)) = (src_str_opt, trim_characters_opt) {
         let trim_str = trim_func(src_str, characters);

-        // Safety:
-        // `trim_str` is computed from `str::trim_xxx_matches`,
-        // and its addr is ensured to be >= `origin_str`'s
-        let start = unsafe { trim_str.as_ptr().offset_from(src_str.as_ptr()) as u32 };
+        let start = (src_str.as_bytes().len() - trim_str.as_bytes().len()) as u32;

         make_and_append_view(views_buf, null_builder, raw, trim_str, start);
     } else {

And all tests passed.

@alamb
Contributor

alamb commented Sep 23, 2024

How about we merge this PR and then you can continue work on the optimizations as follow on PRs?

@Rachelint
Contributor Author

How about we merge this PR and then you can continue work on the optimizations as follow on PRs?

I am checking about #12395 (comment)
Just wait a minute for me.

@alamb
Contributor

alamb commented Sep 23, 2024

How about we merge this PR and then you can continue work on the optimizations as follow on PRs?

I am checking about #12395 (comment) Just wait a minute for me.

Sure -- no worries -- we can wait too. I just feel bad about how long this PR has been outstanding

@alamb alamb dismissed their stale review September 23, 2024 18:34

Hmm, some newly added tests seem to be failing

@Rachelint
Contributor Author

How about we merge this PR and then you can continue work on the optimizations as follow on PRs?

I am checking about #12395 (comment) Just wait a minute for me.

Sure -- no worries -- we can wait too. I just feel bad about how long this PR has been outstanding

Thanks. I added some cases to check it. Unfortunately, I found that we may not be able to remove the unsafe code currently.

@alamb
Contributor

alamb commented Sep 23, 2024

Thanks. I added some cases to check it. Unfortunately, I found that we may not be able to remove the unsafe code currently.

I just can't explain why pointer arithmetic is needed -- I think it is important to fix (or really understand) before merging

@Rachelint
Contributor Author

Rachelint commented Sep 23, 2024

Thanks. I added some cases to check it. Unfortunately, I found that we may not be able to remove the unsafe code currently.

I just can't explain why pointer arithmetic is needed -- I think it is important to fix (or really understand) before merging

Maybe the discussion in #12387 can help.

The logic of str::trim_xxx_matches is well explained by @Kev1n8:

I've looked into the implementation of [general_trim](https://github.com/apache/datafusion/blob/f5c47fa274d53c1d524a1fb788d9a063bf5240ef/datafusion/functions/src/string/common.rs#L51); it uses the str::trim_xxx_matches methods to obtain the "substring". Furthermore, inside the str::trim_xxx_matches method, it first computes the [start, end) boundary and then slices the str.

Unfortunately, the Pattern feature needed to get the start index with safe code is still unstable. @Kev1n8 mentioned that, too:

The index here is useful for modifying views. Unfortunately, currently the feature Pattern it uses is unstable.

So, for now, the only way we have to get the start index is pointer arithmetic...


Update:
The unsafe code has been removed; we can actually get the needed index in a safe way.
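
One likely reason a single length subtraction could not replace the pointer offset for every trim direction (this is an inference from the thread, not something it states): subtracting lengths counts bytes removed from both ends, while the view needs only the bytes removed from the front. The sketch below compares the two on plain `&str` values; both function names are hypothetical.

```rust
/// Hypothetical: start offset guessed from the length difference alone.
/// Only correct when bytes were removed from the front of `src`.
fn start_by_length(src: &str, trimmed: &str) -> usize {
    src.len() - trimmed.len()
}

/// Hypothetical: start offset taken from the address of the trimmed slice,
/// assuming `trimmed` is a sub-slice of `src`.
fn start_by_address(src: &str, trimmed: &str) -> usize {
    trimmed.as_ptr() as usize - src.as_ptr() as usize
}
```

With `src = "hello  "` and `trimmed = src.trim_end_matches(' ')`, the address-based offset is 0 while the length-based one is 2, which would point the view at the wrong bytes. For a left trim the two agree.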

@Rachelint
Contributor Author

Rachelint commented Sep 23, 2024

I have filed issue #12597 to track the introduced unsafe code.

I also added a TODO that references this issue and explains why we introduce unsafe code here.

@github-actions github-actions bot added the physical-expr Physical Expressions label Sep 25, 2024
@Rachelint
Contributor Author

Rachelint commented Sep 25, 2024

@alamb I found that we indeed don't need the unsafe pointer arithmetic to get the start_offset, and I have switched to a safe approach here. Thanks a lot for the suggestion!

https://github.com/Rachelint/arrow-datafusion/blob/f8174626e47d147e90c6715f5052ccfa269f0493/datafusion/functions/src/string/common.rs#L80
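
A minimal sketch of one way to get the start offset without pointer arithmetic, assuming each trim direction is handled separately. The function names are hypothetical, and this is not necessarily the code at the link above.

```rust
/// Left trim: the result is `input[start..]`, so `start` is the length difference.
fn ltrim_with_offset<'a>(input: &'a str, chars: &[char]) -> (&'a str, u32) {
    let trimmed = input.trim_start_matches(chars);
    let start = (input.len() - trimmed.len()) as u32;
    (trimmed, start)
}

/// Right trim: the result is `input[..new_len]`, so the start offset stays 0.
fn rtrim_with_offset<'a>(input: &'a str, chars: &[char]) -> (&'a str, u32) {
    (input.trim_end_matches(chars), 0)
}

/// Both sides: trim the left first to learn the offset, then trim the right.
fn btrim_with_offset<'a>(input: &'a str, chars: &[char]) -> (&'a str, u32) {
    let left_trimmed = input.trim_start_matches(chars);
    let start = (input.len() - left_trimmed.len()) as u32;
    (left_trimmed.trim_end_matches(chars), start)
}
```

For example, `btrim_with_offset("xxhelloxx", &['x'])` returns `("hello", 2)`: the offset counts only the two bytes removed from the front.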

@github-actions github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Sep 25, 2024
Contributor

@alamb alamb left a comment

Thanks @Rachelint -- very nice. Thank you for sticking with it

I reran the benchmarks one more time and they look good to me. Nice work.

I merged up from main and removed some redundant tests and plan to merge this PR when it passes CI.

++ critcmp main string-view-trim
group                                                                                    main                                   string-view-trim
-----                                                                                    ----                                   ----------------
INPUT LEN <= 12/large_string [size=1024, len_before=12, len_after=8]                     1.03     42.9±0.05µs        ? ?/sec    1.00     41.7±0.06µs        ? ?/sec
INPUT LEN <= 12/large_string [size=4096, len_before=12, len_after=8]                     1.02    165.5±0.56µs        ? ?/sec    1.00    161.5±0.06µs        ? ?/sec
INPUT LEN <= 12/large_string [size=8192, len_before=12, len_after=8]                     1.02    327.8±0.19µs        ? ?/sec    1.00    320.2±0.19µs        ? ?/sec
INPUT LEN <= 12/string [size=1024, len_before=12, len_after=8]                           1.01     41.6±0.03µs        ? ?/sec    1.00     41.2±0.12µs        ? ?/sec
INPUT LEN <= 12/string [size=4096, len_before=12, len_after=8]                           1.01    160.3±0.11µs        ? ?/sec    1.00    159.4±0.09µs        ? ?/sec
INPUT LEN <= 12/string [size=8192, len_before=12, len_after=8]                           1.01    318.5±1.89µs        ? ?/sec    1.00    316.4±0.48µs        ? ?/sec
INPUT LEN <= 12/string_view [size=1024, len_before=12, len_after=8]                      1.06     41.6±0.01µs        ? ?/sec    1.00     39.4±0.02µs        ? ?/sec
INPUT LEN <= 12/string_view [size=4096, len_before=12, len_after=8]                      1.03    160.5±0.07µs        ? ?/sec    1.00    155.1±0.13µs        ? ?/sec
INPUT LEN <= 12/string_view [size=8192, len_before=12, len_after=8]                      1.03    318.0±0.20µs        ? ?/sec    1.00    309.3±1.15µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=1024, len_before=64, len_after=4]    1.10    184.3±0.27µs        ? ?/sec    1.00    167.4±0.10µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=4096, len_before=64, len_after=4]    1.09    727.0±0.67µs        ? ?/sec    1.00    665.8±0.41µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=8192, len_before=64, len_after=4]    1.09   1448.7±1.76µs        ? ?/sec    1.00   1329.6±1.77µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=1024, len_before=64, len_after=4]          1.09    181.8±0.18µs        ? ?/sec    1.00    167.4±0.11µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=4096, len_before=64, len_after=4]          1.09   725.8±11.87µs        ? ?/sec    1.00    664.9±0.62µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=8192, len_before=64, len_after=4]          1.09   1443.8±2.64µs        ? ?/sec    1.00   1324.8±1.25µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=1024, len_before=64, len_after=4]     1.11    181.8±0.11µs        ? ?/sec    1.00    163.8±0.22µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=4096, len_before=64, len_after=4]     1.11    724.6±0.84µs        ? ?/sec    1.00    651.6±0.22µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=8192, len_before=64, len_after=4]     1.11   1444.8±0.63µs        ? ?/sec    1.00   1302.7±0.98µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=1024, len_before=64, len_after=60]    1.01     44.4±0.12µs        ? ?/sec    1.00     44.0±0.07µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=4096, len_before=64, len_after=60]    1.01    171.7±0.30µs        ? ?/sec    1.00    170.4±0.15µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=8192, len_before=64, len_after=60]    1.01    349.8±0.83µs        ? ?/sec    1.00    347.1±0.53µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=1024, len_before=64, len_after=60]          1.00     43.0±0.06µs        ? ?/sec    1.00     43.1±0.20µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=4096, len_before=64, len_after=60]          1.00    166.2±0.21µs        ? ?/sec    1.01    167.5±0.32µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=8192, len_before=64, len_after=60]          1.00    336.5±1.03µs        ? ?/sec    1.01    341.1±0.42µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=1024, len_before=64, len_after=60]     1.08     43.8±0.17µs        ? ?/sec    1.00     40.6±0.03µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=4096, len_before=64, len_after=60]     1.06    169.0±0.55µs        ? ?/sec    1.00    159.0±0.21µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=8192, len_before=64, len_after=60]     1.08    342.9±0.96µs        ? ?/sec    1.00    316.5±0.57µs        ? ?/sec

@@ -982,5 +982,93 @@ logical_plan
01)Projection: temp.column2 || temp.column3
02)--TableScan: temp projection=[column2, column3]

################################################
Contributor

I double checked and @goldmedal 's recent changes (I think created after this PR) make these tests redundant

Which then runs the tests in https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/string/string_query.slt.part

I removed these tests from this PR to keep things moving

@alamb alamb changed the title Improve performance of trim for string view Improve performance of trim for string view (10%) Sep 25, 2024
@alamb alamb merged commit dbfde67 into apache:main Sep 25, 2024
25 checks passed
@alamb
Contributor

alamb commented Sep 25, 2024

🚀

Labels
functions physical-expr Physical Expressions

Successfully merging this pull request may close these issues.

Improve performance of *TRIM functions for StringViewArray
3 participants