[GLUTEN-11355][UT] Add new Spark 4.1 tests #11380

Merged
baibaichen merged 19 commits into apache:main from baibaichen:feature/41_ut_ext
Jan 13, 2026

Conversation

@baibaichen baibaichen commented Jan 8, 2026

This PR automates the migration of Gluten test suites and SQL test suites from upstream Spark 4.1.
Fixes #11355

How to Generate Spark 4.1 Gluten Test Suites

With the help of AI, we follow a four-step process (the details are linked from the original PR description):

1 Baseline Analysis

  • Scan existing Gluten Spark 4.0 test suites (Gluten*Suite.scala)
  • Map each Gluten suite to its corresponding Spark 4.0 source file
  • Generate mapping report with found/not-found status
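The baseline step could be sketched roughly as follows. This is a minimal illustration, not the script used for this PR: it assumes that a Gluten suite named `GlutenFooSuite.scala` corresponds to an upstream file named `FooSuite.scala`, and the function names `map_gluten_suites` and `mapping_report` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Optional


def map_gluten_suites(gluten_root: Path, spark_root: Path) -> Dict[str, Optional[Path]]:
    """Map each Gluten*Suite.scala file to a same-named upstream suite, if any."""
    # Index upstream suites by file name; the first path in sorted order wins on duplicates.
    spark_index: Dict[str, Path] = {}
    for src in sorted(spark_root.rglob("*Suite.scala")):
        spark_index.setdefault(src.name, src)

    mapping: Dict[str, Optional[Path]] = {}
    for suite in sorted(gluten_root.rglob("Gluten*Suite.scala")):
        upstream_name = suite.name[len("Gluten"):]  # assumed naming convention
        mapping[suite.name] = spark_index.get(upstream_name)
    return mapping


def mapping_report(mapping: Dict[str, Optional[Path]]) -> List[str]:
    """Render one 'status<TAB>suite' line per Gluten suite."""
    return [
        f"{'found' if src is not None else 'not-found'}\t{name}"
        for name, src in mapping.items()
    ]
```

Matching by file name alone is a simplification; suites that share a name across modules would need to be disambiguated by package path.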

2 Package Scope Extraction

  • Extract unique package paths from successfully mapped suites
  • Create a focused list of directories to monitor for changes
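Extracting the package scope amounts to collecting the unique parent directories of the mapped upstream files. A small sketch, assuming the mapping is available as relative path strings with `None` for unmapped suites; the suite names and the `package_dirs` helper are hypothetical:

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Optional


def package_dirs(mapped_sources: Iterable[Optional[str]]) -> List[str]:
    """Return the sorted, de-duplicated parent directories of mapped suites."""
    return sorted(
        {str(PurePosixPath(src).parent) for src in mapped_sources if src is not None}
    )


# Hypothetical relative paths; None marks a Gluten suite with no upstream match.
sources = [
    "sql/core/src/test/scala/org/apache/spark/sql/FooSuite.scala",
    "sql/core/src/test/scala/org/apache/spark/sql/BarSuite.scala",
    "sql/core/src/test/scala/org/apache/spark/sql/execution/BazSuite.scala",
    None,
]
packages = package_dirs(sources)
```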

3 Delta Detection

  • Compare Spark 4.0 vs Spark 4.1 test files in identified packages
  • Identify net-new *Suite.scala files introduced in Spark 4.1
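Delta detection can be illustrated as a per-package set difference on file names. The sketch below assumes two local checkouts of the Spark sources (one per version) and a list of package directories relative to each checkout root; `new_suites` is a hypothetical helper, not part of the PR.

```python
from pathlib import Path
from typing import Iterable, List


def new_suites(old_root: Path, new_root: Path, packages: Iterable[str]) -> List[str]:
    """List *Suite.scala files present under new_root but absent under old_root."""
    added: List[str] = []
    for pkg in sorted(packages):
        old_dir, new_dir = old_root / pkg, new_root / pkg
        if not new_dir.is_dir():
            continue  # package does not exist in the newer checkout
        old_names = (
            {p.name for p in old_dir.glob("*Suite.scala")} if old_dir.is_dir() else set()
        )
        # Non-recursive on purpose: each package directory is compared on its own.
        for suite in sorted(new_dir.glob("*Suite.scala")):
            if suite.name not in old_names:
                added.append(suite.relative_to(new_root).as_posix())
    return added
```

A suite that was renamed or moved between versions would show up as new under this comparison, so the output still needs a manual review.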

4 Code Generation

  • For each new Spark 4.1 suite, generate corresponding Gluten wrapper
  • Preserve original package structure
  • Auto-create directory hierarchy as needed
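A wrapper generator along these lines might look like the sketch below. The template is an assumption: it mixes in `GlutenSQLTestsTrait` by default, whereas the trait actually required depends on the upstream suite, and it omits the license header that a real source file would carry. `write_wrapper` and `WRAPPER_TEMPLATE` are hypothetical names.

```python
from pathlib import Path

# Assumed wrapper shape; the trait to mix in varies by suite type.
WRAPPER_TEMPLATE = """package {package}

class Gluten{suite} extends {suite} with {trait} {{}}
"""


def write_wrapper(out_root: Path, package: str, suite: str,
                  trait: str = "GlutenSQLTestsTrait") -> Path:
    """Write Gluten<suite>.scala under out_root, mirroring the package as directories."""
    target_dir = out_root.joinpath(*package.split("."))
    target_dir.mkdir(parents=True, exist_ok=True)  # auto-create the hierarchy
    target = target_dir / f"Gluten{suite}.scala"
    target.write_text(
        WRAPPER_TEMPLATE.format(package=package, suite=suite, trait=trait)
    )
    return target
```

For example, `write_wrapper(Path("out"), "org.apache.spark.sql", "FooSuite")` would create `out/org/apache/spark/sql/GlutenFooSuite.scala`, where `FooSuite` stands in for any newly detected upstream suite.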

Result: Automated generation of 27 Gluten test suites for Gluten Spark 4.1, ensuring test coverage parity with upstream changes.

How to Migrate SQL Test Suites to Spark 4.1

Three-Way Git Merge Approach

This PR migrates SQL test suites (inputs/*.sql and results/*.sql.out) from Spark 4.1 using Git's three-way merge algorithm to preserve both upstream changes and Gluten-specific modifications.
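For a single file, a three-way merge can be reproduced with `git merge-file`, which takes the current file, the common base, and the other side. The wrapper below is only a sketch of that idea and not necessarily how the merge in this PR was scripted; `three_way_merge` is a hypothetical helper, "left" denotes the upstream file and "right" the Gluten copy, and `git` is assumed to be on the `PATH`.

```python
import subprocess
from pathlib import Path
from typing import Tuple


def three_way_merge(base: Path, left: Path, right: Path) -> Tuple[str, int]:
    """Merge the base->left changes into right; return (merged_text, conflict_count)."""
    # `git merge-file -p <current> <base> <other>` prints the merged result
    # to stdout instead of overwriting <current>.
    proc = subprocess.run(
        ["git", "merge-file", "-p", str(right), str(base), str(left)],
        capture_output=True,
        text=True,
    )
    # The exit status is the number of conflicts (capped at 127);
    # any other value signals an error.
    if not 0 <= proc.returncode <= 127:
        raise RuntimeError(proc.stderr.strip() or "git merge-file failed")
    return proc.stdout, proc.returncode
```

A conflict count of zero means the file merged cleanly; otherwise the returned text contains conflict markers that have to be resolved by hand.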


Result:

  • Auto-merged: 165 files
  • New tests added: 31 files (collations, edge cases, recursion, spatial, etc.)
  • Modified tests: 134 files
  • Deleted tests: 2 files (collations.sql -> split into 4 files, timestamp-ntz.sql)

Fix and Exclusion

Cause: #52406
Type: 4.1.0
Category: Test Exclusion
Description: Exclude "infer shredding with mixed scale" test. Shredding behavior with mixed decimal scales changed in Spark 4.1.
Affected files:
  • gluten-ut/spark41/.../velox/VeloxTestSettings.scala
Excluded test:
  • "infer shredding with mixed scale" in GlutenFileBasedDataSourceSuite

Cause: #50599
Type: Fix
Category: Bug Fix
Description: Implement Kryo serialization for CachedColumnarBatch. Adds proper Kryo serialization support for columnar cache to work with Spark's Kryo-based serialization.
Affected files:
  • backends-velox/.../ColumnarCachedBatchSerializer.scala
  • gluten-ut/spark41/.../GlutenCacheTableInKryoSuite.scala

Cause: #50230
Type: 4.1.0
Category: Test Exclusion
Description: Exclude GlutenMapStatusEndToEndSuite and configure parallelism. MapStatus serialization behavior changed in Spark 4.1, requiring test adjustments.
Affected files:
  • gluten-ut/spark41/.../velox/VeloxTestSettings.scala
  • gluten-ut/spark41/.../GlutenMapStatusEndToEndSuite.scala
Excluded tests:
  • GlutenMapStatusEndToEndSuite (entire suite)

Cause: #52473, #52870, #52891
Type: 4.1.0
Category: Test Exclusion
Description: Exclude Spark Structured Streaming tests in Gluten. Multiple streaming-related tests need fixes due to upstream Spark changes.
Affected files:
  • gluten-ut/spark41/.../velox/VeloxTestSettings.scala
Excluded tests:
  • GlutenStreamRealTimeModeAllowlistSuite: "rtm operator allowlist", "repartition not allowed", "stateful queries not allowed"
  • GlutenStreamRealTimeModeE2ESuite: "foreach", "to_json and from_json round-trip", "generateExec passthrough"
  • GlutenStreamRealTimeModeSuite: "processAllAvailable"

Cause: -
Type: 4.1.0
Category: Test Exclusion
Description: Exclude failing SQL tests on Spark 4.1. Multiple SQL test files need to be fixed for Spark 4.1 compatibility.
Affected files:
  • gluten-ut/spark41/.../velox/VeloxSQLQueryTestSettings.scala
Excluded tests:
  • cast.sql
  • describe.sql
  • nonansi/cast.sql
  • nonansi/st-functions.sql
  • scripting/randomly_generated_scripts.sql
  • st-functions.sql
  • type-coercion-edge-cases.sql
  • variant-field-extractions.sql

Cause: #50287
Type: Fix
Category: Bug Fix
Description: Replace RuntimeReplaceable with its replacement to fix unit tests. This handles the RuntimeReplaceable expression changes in Spark to ensure proper expression transformation.
Affected files:
  • gluten-ut/common/.../GlutenTestsTrait.scala
  • gluten-ut/spark41/.../GlutenTimeExpressionsSuite.scala
  • gluten-ut/spark41/.../shim/GlutenTestsTrait.scala

How was this patch tested?

Run new tests.

@github-actions github-actions bot added the CORE works for Gluten Core label Jan 8, 2026

github-actions bot commented Jan 8, 2026

Run Gluten Clickhouse CI on x86

@github-actions github-actions bot added the INFRA label Jan 8, 2026

- 'shims/spark40/**'
- 'shims/spark41/**'
- 'gluten-ut/spark40/**'
- 'gluten-ut/spark41/**'
Contributor Author

@zzcclp @lwz9103
Since the ClickHouse backend does not support Spark 4.0 and later, I ignored it. I also ignored clickhouse_be_trigger.yml. This should be fine.

Contributor

It's OK with me.

Member

LGTM

@baibaichen baibaichen marked this pull request as ready for review January 12, 2026 10:03

baibaichen and others added 10 commits January 13, 2026 12:12
Three-way merge performed using Git:
- Base: Spark 4.0.1 (29434ea766b)
- Left: Spark 4.1.0 (e221b56be7b)
- Right: Gluten Spark 4.1 backends-velox

Summary:
- Auto-merged: 165 files
- New tests added: 31 files (collations, edge cases, recursion, spatial, etc.)
- Modified tests: 134 files
- Deleted tests: 2 files (collations.sql -> split into 4 files, timestamp-ntz.sql)

Conflicts resolved:
- inputs/timestamp-ntz.sql: Right deleted + Left modified -> DELETED (per resolution rule)
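The file-level decision behind such a resolution can be sketched as a small classifier. Only the "Right deleted + Left modified -> DELETED" rule comes from the commit message above; the remaining branches are the usual three-way conventions and are assumptions, as are the `resolve` function and its outcome labels.

```python
from typing import Optional


def resolve(base: Optional[str], left: Optional[str], right: Optional[str]) -> str:
    """Classify one file from its content on each side; None means the file is absent.

    base = older upstream, left = newer upstream, right = Gluten copy.
    """
    if left == right:
        return "DELETED" if right is None else "KEEP_RIGHT"
    if right is None:
        # A file only upstream added is new; otherwise Gluten deleted it, and the
        # deletion wins even if upstream modified it (rule from the commit message).
        return "TAKE_LEFT" if base is None else "DELETED"
    if left is None:
        if base is None:
            return "KEEP_RIGHT"  # Gluten-only file
        # Upstream deleted it: follow the deletion unless Gluten changed the file.
        return "DELETED" if right == base else "MANUAL"
    if left == base:
        return "KEEP_RIGHT"  # only Gluten changed it
    if right == base:
        return "TAKE_LEFT"  # only upstream changed it
    return "MERGE"  # both sides changed it: needs a content-level merge
```

Under these rules, inputs/timestamp-ntz.sql (deleted on the right, modified on the left) classifies as `DELETED`, matching the resolution recorded above.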

New test suites from Spark 4.1.0:
- Collations (4 files): aliases, basic, padding-trim, string-functions
- Edge cases (6 files): alias-resolution, extract-value, join-resolution, etc.
- Advanced features: cte-recursion, generators, kllquantiles, thetasketch, time
- Name resolution: order-by-alias, session-variable-precedence, runtime-replaceable
- Spatial functions: st-functions (ANSI and non-ANSI variants)
- Various resolution edge cases

Total files after merge: 671 (up from 613)

Member

@zhouyuan zhouyuan left a comment

👍

@baibaichen baibaichen merged commit f194cbe into apache:main Jan 13, 2026
117 of 119 checks passed
@baibaichen baibaichen deleted the feature/41_ut_ext branch January 13, 2026 23:33

Development

Successfully merging this pull request may close these issues.

Add new spark 4.1.x suite

4 participants