@Shekharrajak
@Shekharrajak Shekharrajak commented Nov 13, 2025

Contributor

@comphead comphead left a comment

Thanks @Shekharrajak for your contribution. Please add the function to the fuzz testing kit, similar to #2755.

@mbutrovich
Contributor

mbutrovich commented Nov 13, 2025

In the past I think we've encountered differences in Java and Rust's regex engines wrt graphemes. Could we get some larger UTF-8 characters in the tests?
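
To make the request concrete, here is a minimal sketch of the kind of inputs such tests might use. It is not taken from the PR: the object and helper names (`Utf8LengthSketch`, `lengths`, `splitOnComma`) are hypothetical, and it only exercises the JVM side through `java.lang.String`. It shows that UTF-16 code units, code points, and UTF-8 bytes all differ for the same string, which is where two regex engines could plausibly disagree.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of Comet or Spark.
object Utf8LengthSketch {
  final case class Lengths(utf16Units: Int, codePoints: Int, utf8Bytes: Int)

  // Measure one string three ways: JVM chars, Unicode code points, UTF-8 bytes.
  def lengths(s: String): Lengths =
    Lengths(
      s.length,
      s.codePointCount(0, s.length),
      s.getBytes(StandardCharsets.UTF_8).length)

  // U+1F600 (grinning face): one code point, a surrogate pair on the JVM.
  val grinningFace: String = new String(Character.toChars(0x1F600))

  // 'e' followed by U+0301 (combining acute accent): two code points, one grapheme.
  val eAcute: String = "e\u0301"

  // Split on a literal comma, keeping trailing empty strings (limit = -1).
  def splitOnComma(s: String): Seq[String] = s.split(",", -1).toSeq
}
```

Inputs like `grinningFace + "," + eAcute` would give a native implementation something to disagree with if it counted bytes or code units differently from the JVM.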

@andygrove
Member

> In the past I think we've encountered differences in Java and Rust's regex engines wrt graphemes. Could we get some larger UTF-8 characters in the tests?

We probably need to fall back to Spark unless this config is enabled:

  val COMET_REGEXP_ALLOW_INCOMPATIBLE: ConfigEntry[Boolean] =
    conf("spark.comet.regexp.allowIncompatible")
      .category(CATEGORY_EXEC)
      .doc("Comet is not currently fully compatible with Spark for all regular expressions. " +
        s"Set this config to true to allow them anyway. $COMPAT_GUIDE.")
      .booleanConf
      .createWithDefault(false)
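
The gating described here could look roughly like the sketch below. This is an illustration under stated assumptions, not Comet's actual serde code: `RegexpFallbackSketch`, `Decision`, and `decide` are hypothetical names, and the configuration is modeled as a plain `Map` so the example stays self-contained. Only the key `spark.comet.regexp.allowIncompatible` and its default of `false` come from the snippet above.

```scala
// Hypothetical sketch of the fallback decision; not Comet's real API.
object RegexpFallbackSketch {
  // Config key taken from the snippet above.
  val AllowIncompatibleKey = "spark.comet.regexp.allowIncompatible"

  sealed trait Decision
  case object RunNatively extends Decision
  final case class FallBackToSpark(reason: String) extends Decision

  // Falls back to Spark when the expression uses a regular expression and the
  // user has not opted in. A missing key behaves like the default (false).
  def decide(conf: Map[String, String], usesRegexp: Boolean): Decision = {
    val allowIncompatible =
      conf.get(AllowIncompatibleKey).exists(_.trim.equalsIgnoreCase("true"))
    if (!usesRegexp || allowIncompatible) RunNatively
    else FallBackToSpark(
      s"regular expressions may not match Spark; set $AllowIncompatibleKey=true to allow")
  }
}
```

Returning a reason alongside the fallback is a design choice in this sketch; it makes it easier to tell a user why an expression did not run natively.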

@Shekharrajak
Author

> Thanks @Shekharrajak for your contribution. Please add the function to the fuzz testing kit, similar to #2755.

Thanks! Added in commit 8eddd29

@Shekharrajak
Author

> In the past I think we've encountered differences in Java and Rust's regex engines wrt graphemes. Could we get some larger UTF-8 characters in the tests?

Added tests in commit 987b646

@Shekharrajak
Author

> In the past I think we've encountered differences in Java and Rust's regex engines wrt graphemes. Could we get some larger UTF-8 characters in the tests?
>
> We probably need to fall back to Spark unless this config is enabled:
>
>   val COMET_REGEXP_ALLOW_INCOMPATIBLE: ConfigEntry[Boolean] =
>     conf("spark.comet.regexp.allowIncompatible")
>       .category(CATEGORY_EXEC)
>       .doc("Comet is not currently fully compatible with Spark for all regular expressions. " +
>         s"Set this config to true to allow them anyway. $COMPAT_GUIDE.")
>       .booleanConf
>       .createWithDefault(false)

How can we verify that the query runs natively and does not fall back to Spark's JVM execution? @andygrove
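
One possible way to approach this, sketched under assumptions: if native operators appear in the physical plan with a `Comet` prefix (for example `CometProject`), then scanning the plan text for operators without that prefix would reveal a fallback. The object `PlanCheckSketch`, its methods, and the plan strings used with it are hypothetical and only illustrate the idea; the project's own test helpers may already provide an equivalent check.

```scala
// Hypothetical plan-text check; assumes native operators are prefixed with "Comet".
object PlanCheckSketch {
  // Extract the operator name from each plan line by stripping tree
  // decorations such as "+- ", ":  " or "*(1) " and keeping the first word.
  def operatorNames(planText: String): Seq[String] =
    planText
      .split("\n")
      .toSeq
      .map(_.replaceAll("^[\\s:+\\-*()0-9]*", ""))
      .map(_.takeWhile(_.isLetterOrDigit))
      .filter(_.nonEmpty)

  // Operators that would run on the JVM under the stated assumption.
  def nonCometOperators(planText: String): Seq[String] =
    operatorNames(planText).filterNot(_.startsWith("Comet"))
}
```

An empty result from `nonCometOperators` would suggest the whole plan ran natively; any remaining name points at the operator that fell back.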

@wForget changed the title from "Support for StringSplit" to "feat: Support for StringSplit" on Nov 17, 2025

@Shekharrajak force-pushed the feature/add-string-split-support branch from dbb34d5 to 1f8f2b2 on November 17, 2025 18:52

@codecov-commenter

codecov-commenter commented Nov 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 45.46%. Comparing base (f09f8af) to head (bcb6ed4).
⚠️ Report is 708 commits behind head on main.

Additional details and impacted files
@@              Coverage Diff              @@
##               main    #2772       +/-   ##
=============================================
- Coverage     56.12%   45.46%   -10.66%     
- Complexity      976     1206      +230     
=============================================
  Files           119      157       +38     
  Lines         11743    14124     +2381     
  Branches       2251     2365      +114     
=============================================
- Hits           6591     6422      -169     
- Misses         4012     6683     +2671     
+ Partials       1140     1019      -121     


Development

Successfully merging this pull request may close these issues.

Add support for StringSplit

6 participants