Spark 4: Fix parquet_test.py [databricks] #11519

Merged: 2 commits into NVIDIA:branch-24.12 on Oct 8, 2024

Conversation

@mythrocks (Collaborator) commented Sep 27, 2024

Fixes #11015.
Fixes #11531.
Contributes to #11004 and #10661.

This commit addresses the tests that fail in `parquet_test.py` when run on Spark 4.

1. Some of the tests were failing as a result of #5114 ([FEA] Support aggregates when ANSI mode is enabled). Those tests have been disabled, at least until aggregations with ANSI mode enabled are supported (a skip pattern is sketched after this list).

2. `test_parquet_check_schema_compatibility` fails on Spark 4 regardless of ANSI mode, because it tests implicit type promotions where the read schema includes wider columns than the write schema. Supporting this will require new code. The test is disabled until #11512 ([BUG] Support for wider types in read schemas for Parquet Reads) is addressed.

3. `test_parquet_int32_downcast` had an erroneous setup phase that fails in ANSI mode. This has been corrected, and the test was refactored to run in both ANSI and non-ANSI mode (see the parametrization sketch after this list).
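For items 1 and 2, here is a minimal sketch of how a test can be guarded on Spark 4 until the linked issue is resolved. The `is_spark_400_or_later()` helper and the test name are hypothetical stand-ins for illustration, not the actual helpers or tests changed in this PR; the real integration tests have their own version-check utilities, and the test body is elided.

```python
# Hedged sketch only: the helper below is a hypothetical stand-in, not the
# version check the spark-rapids integration tests actually use.
import os
import pytest

def is_spark_400_or_later():
    # Hypothetical stand-in: a real test would query the active Spark session's
    # version rather than an environment variable.
    version = os.environ.get("SPARK_VERSION", "3.5.0")
    major, minor = (int(p) for p in version.split(".")[:2])
    return (major, minor) >= (4, 0)

@pytest.mark.skipif(
    is_spark_400_or_later(),
    reason="ANSI-mode aggregations are not yet supported; "
           "see https://github.com/NVIDIA/spark-rapids/issues/5114")
def test_parquet_read_roundtrip_example():
    ...  # test body elided
```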
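For item 3, this is a sketch of running a test in both ANSI and non-ANSI mode by parametrizing over `spark.sql.ansi.enabled`. The fixture, conf-handling style, and test name are assumptions for illustration; the actual refactor goes through the integration suite's own conf helpers.

```python
# Hedged sketch: a local SparkSession fixture is used for illustration; the
# real integration tests manage their own session and conf plumbing.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="module")
def spark_session():
    spark = (SparkSession.builder
             .master("local[1]")
             .appName("ansi-mode-sketch")
             .getOrCreate())
    yield spark
    spark.stop()

@pytest.mark.parametrize("ansi_enabled", [True, False], ids=["ansi", "non_ansi"])
def test_parquet_int32_downcast_example(spark_session, ansi_enabled):
    # Toggle ANSI mode for this run, then restore the default afterwards.
    spark_session.conf.set("spark.sql.ansi.enabled", str(ansi_enabled).lower())
    try:
        # Write data and read it back with a narrower schema. In ANSI mode the
        # setup data must avoid values that would overflow the narrower type,
        # which is the kind of setup issue the refactor corrects.
        ...
    finally:
        spark_session.conf.unset("spark.sql.ansi.enabled")
```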

@mythrocks added the test (Only impacts tests) and Spark 4.0+ (Spark 4.0+ issues) labels on Sep 27, 2024
@mythrocks self-assigned this on Sep 27, 2024
@mythrocks (Collaborator, Author) commented:

Build

@razajafri (Collaborator) commented:

Thanks for this.

The fix for `test_parquet_check_schema_compatibility` will port directly to Databricks 14.3.

@razajafri previously approved these changes Oct 2, 2024
@mythrocks changed the title from "Spark 4: Fix parquet_test.py." to "Spark 4: Fix parquet_test.py [databricks]" on Oct 4, 2024
@mythrocks (Collaborator, Author) commented:

Build

@mythrocks (Collaborator, Author) commented:

@razajafri: I've modified this PR to also include a fix for #11531.

I have tested that this addresses the test failure in Databricks 14.3.

In making this change, I had to dismiss your earlier review. If you wouldn't mind taking another look, I'd appreciate it.

@mythrocks merged commit 5eeddc6 into NVIDIA:branch-24.12 on Oct 8, 2024
45 checks passed
@mythrocks (Collaborator, Author) commented:

This has been merged. Thank you for the review, @razajafri.
