Fix test failures in parquet_test.py #11015

Open
Tracked by #11004
razajafri opened this issue Jun 8, 2024 · 2 comments
Labels: bug (Something isn't working), Spark 4.0+ (Spark 4.0+ issues)

Comments

@razajafri
Collaborator

FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_many_column_project
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_nested_pruning_and_case_insensitive
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_check_schema_compatibility
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_compress_read_round_trip
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_decimal_read_legacy
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_fallback
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_input_meta
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_int32_downcast
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_nested_column_missing
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_partitioned_read_just_partitions
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_pred_push_round_trip
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_avoid_coalesce_incompatible_files
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_buffer_allocation_empty_blocks
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_coalescing_multiple_files
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_count
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_encryption
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_forced_binary_schema
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_ignore_missing
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_int_upcast
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_merge_schema
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_merge_schema_from_conf
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_round_trip
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_round_trip_binary_as_string
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_roundtrip_datetime_with_legacy_rebase
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_schema_missing_cols
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_read_with_corrupt_files
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_reading_from_unaligned_pages_all_types
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_reading_from_unaligned_pages_all_types_dict_optimized
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_reading_from_unaligned_pages_basic_filters
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_reading_from_unaligned_pages_basic_filters_with_nulls
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_parquet_simple_partitioned_read
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_read_case_col_name
FAILED ../../../../integration_tests/src/main/python/parquet_test.py::test_small_file_memory
razajafri added the "bug" and "? - Needs Triage" labels on Jun 8, 2024
razajafri added the "Spark 4.0+" label on Jun 8, 2024
mattahrens removed the "? - Needs Triage" label on Jun 11, 2024
@razajafri
Collaborator Author

With ANSI mode set to false, the following tests still fail (cause in parentheses):

test_nested_pruning_and_case_insensitive (missing gpu metric bufferTime)
test_parquet_check_schema_compatibility (didn't raise the right Exception)
test_parquet_compress_read_round_trip (missing gpu metric bufferTime)
test_parquet_decimal_read_legacy (missing gpu metric bufferTime)
test_parquet_input_meta (key not found: gpuDecodeTime)
test_parquet_int32_downcast (key not found: bufferTime)
test_parquet_nested_column_missing (key not found: bufferTime)
test_parquet_partitioned_read_just_partitions (The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' was removed in the version 4.0.0. Use 'spark.sql.parquet.datetimeRebaseModeInWrite' instead)
test_parquet_read_avoid_coalesce_incompatible_files (key not found: bufferTime)
test_parquet_read_buffer_allocation_empty_blocks (key not found: gpuDecodeTime)
test_parquet_read_coalescing_multiple_files (key not found: bufferTime)
test_parquet_read_encryption (Could not read footer.)
test_parquet_read_forced_binary_schema (key not found: bufferTime)
test_parquet_read_int_upcast (key not found: bufferTime)
test_parquet_read_merge_schema (The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' was removed in the version 4.0.0. Use 'spark.sql.parquet.datetimeRebaseModeInWrite' instead.)
test_parquet_read_merge_schema_from_conf (The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' was removed in the version 4.0.0. Use 'spark.sql.parquet.datetimeRebaseModeInWrite' instead.)
test_parquet_read_round_trip (key not found: bufferTime)
test_parquet_read_round_trip_binary_as_string (key not found: bufferTime)
test_parquet_read_roundtrip_datetime_with_legacy_rebase (The SQL config 'spark.sql.legacy.parquet.int96RebaseModeInWrite' was removed in the version 4.0.0. Use 'spark.sql.parquet.int96RebaseModeInWrite' instead.)
test_parquet_read_schema_missing_cols (key not found: bufferTime)
test_parquet_read_with_corrupt_files (different output length)
test_parquet_reading_from_unaligned_pages_all_types (key not found: gpuDecodeTime)
test_parquet_reading_from_unaligned_pages_all_types_dict_optimized (key not found: bufferTime)
test_parquet_reading_from_unaligned_pages_basic_filters (key not found: gpuDecodeTime)
test_parquet_reading_from_unaligned_pages_basic_filters_with_nulls (key not found: gpuDecodeTime)
test_parquet_simple_partitioned_read (The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' was removed in the version 4.0.0. Use 'spark.sql.parquet.datetimeRebaseModeInWrite' instead.)
test_read_case_col_name (key not found: bufferTime)
test_small_file_memory (key not found: bufferTime)
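
Several of the failures above come from the rebase-mode config keys that Spark 4.0.0 removed. A minimal sketch of choosing the key by Spark version follows. The helper name `rebase_write_conf_key` is hypothetical and not part of the test suite. The key names are taken from the error messages above. The sketch assumes the legacy keys are still the ones to use on anything before 4.0.0; some 3.x releases may also accept the newer names, which it does not rely on.

```python
# Hypothetical helper, not part of spark-rapids. Key names are copied from
# the Spark 4.0.0 error messages quoted in this issue.
LEGACY_PREFIX = "spark.sql.legacy.parquet."
CURRENT_PREFIX = "spark.sql.parquet."


def rebase_write_conf_key(spark_version: str, kind: str = "datetime") -> str:
    """Return the rebase-mode-in-write config key for a Spark version string.

    kind selects the "datetime" or "int96" variant of the key.
    """
    if kind not in ("datetime", "int96"):
        raise ValueError(f"unknown rebase kind: {kind}")
    # Only the major version matters here: the legacy keys were removed in 4.0.0.
    major = int(spark_version.split(".")[0])
    prefix = CURRENT_PREFIX if major >= 4 else LEGACY_PREFIX
    return f"{prefix}{kind}RebaseModeInWrite"
```

A test could then build its write conf as `{rebase_write_conf_key(version): 'CORRECTED'}` instead of hard-coding the legacy key, where `version` is whatever version string the test harness exposes.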

razajafri self-assigned this on Aug 26, 2024
mythrocks assigned mythrocks and unassigned razajafri on Sep 24, 2024
@mythrocks
Collaborator

Taking this over now.

mythrocks added a commit to mythrocks/spark-rapids that referenced this issue Sep 27, 2024
Fixes NVIDIA#11015.
Contributes to NVIDIA#11004.

This commit addresses the tests that fail in parquet_test.py, when
run on Spark 4.

1. Some of the tests were failing as a result of NVIDIA#5114.  Those tests
have been disabled, at least until we get around to supporting
aggregations with ANSI mode enabled.

2. `test_parquet_check_schema_compatibility` fails on Spark 4 regardless
of ANSI mode, because it tests implicit type promotions where the read
schema includes wider columns than the write schema.  This will require
new code.  The test is disabled until NVIDIA#11512 is addressed.

3. `test_parquet_int32_downcast` had an erroneous setup phase that fails
in ANSI mode.  This has been corrected. The test was refactored to
run in ANSI and non-ANSI mode.

Signed-off-by: MithunR <mithunr@nvidia.com>
mythrocks added a commit that referenced this issue Oct 8, 2024
* Spark 4:  Fix parquet_test.py.

Fixes #11015. (Spark 4 failure.)
Also fixes #11531. (Databricks 14.3 failure.)
Contributes to #11004.

This commit addresses the tests that fail in parquet_test.py, when
run on Spark 4.

1. Some of the tests were failing as a result of #5114.  Those tests
have been disabled, at least until we get around to supporting
aggregations with ANSI mode enabled.

2. `test_parquet_check_schema_compatibility` fails on Spark 4 regardless
of ANSI mode, because it tests implicit type promotions where the read
schema includes wider columns than the write schema.  This will require
new code.  The test is disabled until #11512 is addressed.

3. `test_parquet_int32_downcast` had an erroneous setup phase that fails
in ANSI mode.  This has been corrected. The test was refactored to
run in ANSI and non-ANSI mode.

Signed-off-by: MithunR <mithunr@nvidia.com>
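
Item 3 in the commit message describes running a test in both ANSI and non-ANSI mode. One way to picture that is a pair of conf dictionaries, one per mode, that a test iterates over. This is only an illustrative sketch, not the code from the commit: `with_ansi` and `ansi_variants` are hypothetical names, and the only real identifier used is the Spark config key `spark.sql.ansi.enabled`.

```python
# Illustrative sketch only; the helper names are hypothetical and this is
# not the refactoring that was actually committed.
ANSI_KEY = "spark.sql.ansi.enabled"  # standard Spark SQL config key


def with_ansi(base_conf: dict, ansi_enabled: bool) -> dict:
    """Return a copy of base_conf with ANSI mode switched on or off."""
    conf = dict(base_conf)  # copy so the caller's dict is left untouched
    conf[ANSI_KEY] = "true" if ansi_enabled else "false"
    return conf


def ansi_variants(base_conf: dict) -> list:
    """Return two confs: ANSI enabled first, then ANSI disabled."""
    return [with_ansi(base_conf, enabled) for enabled in (True, False)]
```

In a pytest-based suite the same idea is usually expressed by parametrizing the test over the two modes, so each mode is reported as a separate test case.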