fix: Fallback to Spark for lpad/rpad for unsupported arguments & fix negative length handling #2630
Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@             Coverage Diff              @@
##               main    #2630      +/-   ##
============================================
+ Coverage     56.12%   59.17%    +3.04%
- Complexity      976     1447      +471
============================================
  Files           119      147       +28
  Lines         11743    13743     +2000
  Branches       2251     2360      +109
============================================
+ Hits           6591     8132     +1541
- Misses         4012     4388      +376
- Partials       1140     1223       +83
cc @coderfender since they were looking at this recently: #2099

Thank you for the mention @mbutrovich. @andygrove, let me know if I can help in any way to get this fixed soon.

It would be great if you could review the new test and make sure it covers everything the original tests covered. I believe that the underlying issues are resolved now.

Sure @andygrove

LGTM. Thank you for the prompt fix @andygrove

Moving to draft until #2635 is merged
    is_left_pad,
)?),
Some(string) => {
    if length < 0 {
It's better to put the happy path first in the if statement for compute-intensive parts, so the CPU doesn't eagerly execute instructions for the uncommon branch and then have to fall back.
Thanks. I have updated this.
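For context on the branch being discussed: a negative length is valid input to Spark's lpad/rpad and simply yields an empty string, and it is also the uncommon case compared to normal padding. A minimal spark-shell sketch (illustrative only, not taken from the PR's tests; the expected behavior is an assumption based on Spark's string padding semantics):

// Negative lengths should not error; Spark returns an empty string,
// so Comet needs to match that rather than padding or failing.
spark.sql("SELECT lpad('abc', -1, 'x') AS l, rpad('abc', -1, 'x') AS r").show()
// Both columns are expected to come back as empty strings.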
parthchandra left a comment:
lgtm.
val edgeCases = Seq(
  "é", // unicode 'e\\u{301}'
  "é", // unicode '\\u{e9}'
  "తెలుగు")
Out of curiosity, what makes this an edge case?
The first two were added in #772 to make sure Comet was consistent with Spark even though Rust and Java have different ways of representing unicode and graphemes.
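To make the difference concrete, here is a small illustration (mine, not from the PR): the two strings render the same but are encoded differently, so length-based padding treats them differently.

val precomposed = "\u00e9"   // single code point U+00E9
val combining = "e\u0301"    // 'e' followed by a combining acute accent
println(precomposed.length)  // 1
println(combining.length)    // 2 -- padding to a fixed length sees two characters here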
if (expr.str.isInstanceOf[Literal]) {
  return Unsupported(Some("Scalar values are not supported for the str argument"))
}
if (!expr.pad.isInstanceOf[Literal]) {
I don't know if we'll ever hit this. As far as I can see (in functions.lpad), Spark expects the pad argument to be a literal as well.
Spark doesn't require pad to be a literal:
scala> spark.sql("select a, lpad('foo', 6, a) from t1").show
+---+---------------+
| a|lpad(foo, 6, a)|
+---+---------------+
| $| $$$foo|
| @| @@@foo|
+---+---------------+
For reference, Spark's expression definition defaults pad to a literal:
case class StringRPad(str: Expression, len: Expression, pad: Expression = Literal(" "))
I suppose this is a good example of the benefit of fuzz testing (which is how this issue was discovered). The fuzzer will generate test cases that most developers would not consider. It does seem unlikely that anyone would want to use a column for the pad value, but I suppose it is possible that someone may have that requirement.
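As a concrete illustration of the kind of case a fuzzer surfaces (hypothetical, not from the PR's suite), the pad value can come from a column:

val df = Seq(("$", "foo"), ("@", "bar")).toDF("a", "str")
df.createOrReplaceTempView("t1")
// pad is a column reference here, which Spark accepts even though it is
// rare in hand-written queries
spark.sql("SELECT a, lpad(str, 6, a) FROM t1").show()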
// test all combinations of scalar and array arguments
for (str <- Seq("'hello'", "str")) {
  for (len <- Seq("6", "-6", "0", "len % 10")) {
👍
df.createOrReplaceTempView("t1")

// test all combinations of scalar and array arguments
for (str <- Seq("'hello'", "str")) {
The Spark doc says it also supports binary string input, e.g. unhex('aabb').
Good catch, thanks. That opens up another set of issues! 😭
I added separate tests for binary inputs.
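For reference, a quick spark-shell sketch of the binary case (illustrative; the exact output is an assumption, so verify against your Spark version):

// lpad/rpad also accept BINARY input, per the Spark docs referenced above;
// with binary input the pad argument should be binary as well.
spark.sql("SELECT hex(lpad(unhex('AABB'), 4, unhex('00'))) AS padded").show()
// Expect the value to be left-padded with zero bytes to a 4-byte result.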
hsiang-c left a comment:
LGTM
Which issue does this PR close?
Closes #2624
Closes #2631
Rationale for this change
Fix various bugs that caused queries to fail at runtime.
What changes are included in this PR?
Fall back to Spark for lpad/rpad when arguments are not supported natively, and fix handling of negative length values.
How are these changes tested?
New tests