
[BUG] parquet_testing_test.py failed on "AssertionError: GPU and CPU boolean values are different" #11715

Closed
NvTimLiu opened this issue Nov 12, 2024 · 4 comments · Fixed by rapidsai/cudf#17302
Assignees: pmattione-nvidia
Labels: bug (Something isn't working)

NvTimLiu (Collaborator) commented on Nov 12, 2024

Describe the bug

This failure has so far only occurred on Databricks 12.2; let's keep an eye on it to see whether it is reproducible.

 =================================== FAILURES ===================================
 _ test_parquet_testing_valid_files[confs0-/home/ubuntu/spark-rapids/thirdparty/parquet-testing/data/alltypes_tiny_pages.parquet] _
 [gw0] linux -- Python 3.8.10 /usr/bin/python
 
 path = '/home/ubuntu/spark-rapids/thirdparty/parquet-testing/data/alltypes_tiny_pages.parquet'
 confs = {'spark.rapids.sql.format.parquet.reader.footer.type': 'NATIVE', 'spark.sql.legacy.parquet.datetimeRebaseModeInRead': 'CORRECTED', 'spark.sql.legacy.parquet.int96RebaseModeInRead': 'CORRECTED'}
 
     @pytest.mark.parametrize("path", gen_testing_params_for_valid_files())
     @pytest.mark.parametrize("confs", [_native_reader_confs, _java_reader_confs])
     @allow_non_gpu(*non_utc_allow)
     def test_parquet_testing_valid_files(path, confs):
 >       assert_gpu_and_cpu_are_equal_collect(lambda spark: spark.read.parquet(path), conf=confs)
 
 ../../src/main/python/parquet_testing_test.py:162: 
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
 ../../src/main/python/asserts.py:599: in assert_gpu_and_cpu_are_equal_collect
     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
 ../../src/main/python/asserts.py:521: in _assert_gpu_and_cpu_are_equal
     assert_equal(from_cpu, from_gpu)
 ../../src/main/python/asserts.py:111: in assert_equal
     _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
 ../../src/main/python/asserts.py:43: in _assert_equal
     _assert_equal(cpu[index], gpu[index], float_check, path + [index])
 ../../src/main/python/asserts.py:36: in _assert_equal
     _assert_equal(cpu[field], gpu[field], float_check, path + [field])
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
 
 cpu = False, gpu = True
 float_check = <function get_float_check.<locals>.<lambda> at 0x7f73eac6a5e0>
 path = [2047, 'bool_col']
 
     def _assert_equal(cpu, gpu, float_check, path):
         t = type(cpu)
         if (t is Row):
             assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
             if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
                 assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)
                 for field in cpu.__fields__:
                     _assert_equal(cpu[field], gpu[field], float_check, path + [field])
             else:
                 for index in range(len(cpu)):
                     _assert_equal(cpu[index], gpu[index], float_check, path + [index])
         elif (t is list):
             assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
             for index in range(len(cpu)):
                 _assert_equal(cpu[index], gpu[index], float_check, path + [index])
         elif (t is tuple):
             assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
             for index in range(len(cpu)):
                 _assert_equal(cpu[index], gpu[index], float_check, path + [index])
         elif (t is pytypes.GeneratorType):
             index = 0
             # generator has no zip :( so we have to do this the hard way
             done = False
             while not done:
                 sub_cpu = None
                 sub_gpu = None
                 try:
                     sub_cpu = next(cpu)
                 except StopIteration:
                     done = True
     
                 try:
                     sub_gpu = next(gpu)
                 except StopIteration:
                     done = True
     
                 if done:
                     assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
                 else:
                     _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
     
                 index = index + 1
         elif (t is dict):
             # The order of key/values is not guaranteed in python dicts, nor are they guaranteed by Spark
             # so sort the items to do our best with ignoring the order of dicts
             cpu_items = list(cpu.items()).sort(key=_RowCmp)
             gpu_items = list(gpu.items()).sort(key=_RowCmp)
             _assert_equal(cpu_items, gpu_items, float_check, path + ["map"])
         elif (t is int):
             assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
         elif (t is float):
             if (math.isnan(cpu)):
                 assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
             else:
                 assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
         elif isinstance(cpu, str):
             assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)
         elif isinstance(cpu, datetime):
             assert cpu == gpu, "GPU and CPU timestamp values are different at {}".format(path)
         elif isinstance(cpu, date):
             assert cpu == gpu, "GPU and CPU date values are different at {}".format(path)
         elif isinstance(cpu, bool):
 >           assert cpu == gpu, "GPU and CPU boolean values are different at {}".format(path)
 E           AssertionError: GPU and CPU boolean values are different at [2047, 'bool_col']
 
 ../../src/main/python/asserts.py:91: AssertionError
  ---------------------------- Captured stderr setup -----------------------------
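
For reference, here is a minimal, self-contained sketch of the recursive comparison that produces this assertion. It paraphrases the `_assert_equal` logic shown in the traceback, covering only the Row and bool branches; the real asserts.py handles many more types.

```python
from pyspark.sql import Row

def compare(cpu, gpu, path=[]):
    # Recurse into Row fields, then compare leaf boolean values,
    # mirroring the Row/bool branches of _assert_equal above.
    if isinstance(cpu, Row):
        for field in cpu.__fields__:
            compare(cpu[field], gpu[field], path + [field])
    elif isinstance(cpu, bool):
        assert cpu == gpu, \
            "GPU and CPU boolean values are different at {}".format(path)

# What this failure amounts to: row 2047's bool_col read as False on the CPU
# but True on the GPU (values taken from the log above; file contents elided).
compare(Row(bool_col=False), Row(bool_col=True), path=[2047])
# -> AssertionError: GPU and CPU boolean values are different at [2047, 'bool_col']
```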

NvTimLiu added the '? - Needs Triage' and 'bug' labels on Nov 12, 2024
jlowe (Contributor) commented on Nov 12, 2024

@NvTimLiu please include the DATAGEN_SEED setting for any test failure, as it may be crucial for reproducing it.

In this case, it was DATAGEN_SEED=1731408247.
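
A hedged sketch of pinning that seed when re-running the failing test locally (this assumes the spark-rapids integration_tests layout and its run_pyspark_from_build.sh entry point; adjust the path and pytest filter for your environment):

```python
# Illustrative only: re-run the parquet_testing tests with the seed from this
# report pinned via DATAGEN_SEED, matching the CI environment's UTC timezone.
import os
import subprocess

env = dict(os.environ, DATAGEN_SEED="1731408247", TZ="UTC")
subprocess.run(
    ["./run_pyspark_from_build.sh", "-k", "test_parquet_testing_valid_files"],
    cwd="integration_tests",
    env=env,
    check=True,
)
```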

parthosa (Collaborator) commented:

Similar failure for Databricks Azure 13.3

integration_tests/src/test/resources/parquet-testing/data/alltypes_tiny_pages.parquet][DATAGEN_SEED=1731427707, TZ=UTC, INJECT_OOM]
    - AssertionError: GPU and CPU boolean values are different at [2855, 'bool_col']

pmattione-nvidia self-assigned this on Nov 12, 2024
pmattione-nvidia (Collaborator) commented:

I hit it locally, but it only reproduces intermittently. It must be a timing bug; looking into it.

mattahrens removed the '? - Needs Triage' label on Nov 12, 2024
rapids-bot pushed a commit to rapidsai/cudf that referenced this issue on Nov 13, 2024
This fixes a synchronization bug in the parquet microkernels for plain-decoding bools.  This closes [several](NVIDIA/spark-rapids#11715) timing [issues](NVIDIA/spark-rapids#11716) found during testing of spark-rapids.

Authors:
  - Paul Mattione (https://github.com/pmattione-nvidia)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #17302
pmattione-nvidia (Collaborator) commented:

Fixed by cuDF PR rapidsai/cudf#17302.
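
A hedged sketch of how one might spot-check the fix once a cuDF build containing rapidsai/cudf#17302 is picked up: read the same file with the RAPIDS plugin disabled and then enabled, and diff `bool_col`. The path and session setup below are illustrative, not the exact CI configuration.

```python
from pyspark.sql import SparkSession

# Assumes a session launched with the spark-rapids plugin on the classpath.
spark = SparkSession.builder.getOrCreate()
path = "thirdparty/parquet-testing/data/alltypes_tiny_pages.parquet"

# CPU baseline: plugin disabled.
spark.conf.set("spark.rapids.sql.enabled", "false")
cpu = [r.bool_col for r in spark.read.parquet(path).orderBy("id").collect()]

# GPU read: plugin enabled.
spark.conf.set("spark.rapids.sql.enabled", "true")
gpu = [r.bool_col for r in spark.read.parquet(path).orderBy("id").collect()]

assert cpu == gpu, "bool_col still differs between CPU and GPU reads"
```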
