
[tests] Ray nightly image tests with pandas+numpy fails with TensorDType error #2452

Closed
tgaddair opened this issue Sep 5, 2022 · 6 comments · Fixed by #2493 or #2553

tgaddair (Collaborator) commented Sep 5, 2022

(_map_block_nosplit pid=31146)   self._tensor = np.array([np.asarray(v) for v in values])
(_map_block_nosplit pid=31146) 2022-09-04 16:31:45,179	INFO worker.py:754 -- Task failed with retryable exception: TaskID(85e1c1d08ad412b6ffffffffffffffffffffffff01000000).
(_map_block_nosplit pid=31146) Traceback (most recent call last):
(_map_block_nosplit pid=31146)   File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/ray/air/util/data_batch_conversion.py", line 158, in _cast_ndarray_columns_to_tensor_extension
(_map_block_nosplit pid=31146)     df.loc[:, col_name] = TensorArray(col)
(_map_block_nosplit pid=31146)   File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/ray/air/util/tensor_extensions/pandas.py", line 720, in __init__
(_map_block_nosplit pid=31146)     raise TypeError(
(_map_block_nosplit pid=31146) TypeError: Tried to convert an ndarray of ndarray pointers (object dtype) to a well-typed ndarray but this failed; convert the ndarray to a well-typed ndarray before casting it as a TensorArray, and note that ragged tensors are NOT supported by TensorArray. First 5 subndarray types: [dtype('uint8'), dtype('uint8'), dtype('uint8'), dtype('uint8'), dtype('uint8')]
(_map_block_nosplit pid=31146) 
(_map_block_nosplit pid=31146) The above exception was the direct cause of the following exception:
(_map_block_nosplit pid=31146) 
(_map_block_nosplit pid=31146) Traceback (most recent call last):
(_map_block_nosplit pid=31146)   File "python/ray/_raylet.pyx", line 715, in ray._raylet.execute_task
(_map_block_nosplit pid=31146)   File "python/ray/_raylet.pyx", line 719, in ray._raylet.execute_task
(_map_block_nosplit pid=31146)   File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/ray/data/_internal/compute.py", line 449, in _map_block_nosplit
(_map_block_nosplit pid=31146)     for new_block in block_fn(block, *fn_args, **fn_kwargs):
(_map_block_nosplit pid=31146)   File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/ray/data/dataset.py", line 482, in transform
(_map_block_nosplit pid=31146)     yield output_buffer.next()
(_map_block_nosplit pid=31146)   File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/ray/data/_internal/output_buffer.py", line 74, in next
(_map_block_nosplit pid=31146)     block = self._buffer.build()
(_map_block_nosplit pid=31146)   File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/ray/data/_internal/delegating_block_builder.py", line 64, in build
(_map_block_nosplit pid=31146)     return self._builder.build()
(_map_block_nosplit pid=31146)   File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/ray/data/_internal/table_block.py", line 85, in build
(_map_block_nosplit pid=31146)     return self._concat_tables(tables)
(_map_block_nosplit pid=31146)   File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/ray/data/_internal/pandas_block.py", line 110, in _concat_tables
(_map_block_nosplit pid=31146)     df = _cast_ndarray_columns_to_tensor_extension(df)
(_map_block_nosplit pid=31146)   File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/ray/air/util/data_batch_conversion.py", line 160, in _cast_ndarray_columns_to_tensor_extension
(_map_block_nosplit pid=31146)     raise ValueError(
(_map_block_nosplit pid=31146) ValueError: Tried to cast column value to the TensorArray tensor extension type but the conversion failed. To disable automatic casting to this tensor extension, set ctx = DatasetContext.get_current(); ctx.enable_tensor_extension_casting = False.
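The inner TypeError above can be reproduced with plain NumPy: when the subarrays have unequal shapes, building an array from them yields an object-dtype ndarray of ndarray pointers rather than a well-typed array, which is exactly what Ray's `TensorArray` rejects as a "ragged tensor". A minimal sketch (the shapes are illustrative, not Ludwig's actual data):

```python
import numpy as np

# Ragged subarrays: same element dtype (uint8) but different shapes, matching
# the "First 5 subndarray types: [dtype('uint8'), ...]" hint in the error.
values = [np.zeros((2,), dtype=np.uint8), np.zeros((3,), dtype=np.uint8)]

# Mirrors the traceback line np.array([np.asarray(v) for v in values]);
# dtype=object is passed explicitly here because newer NumPy versions refuse
# to build ragged arrays implicitly. The result is an array of ndarray
# pointers, not a well-typed ndarray.
arr = np.array([np.asarray(v) for v in values], dtype=object)
print(arr.dtype)  # object
```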

https://github.com/ludwig-ai/ludwig/runs/8177672264?check_suite_focus=true#step:10:7153

For some reason this does not reproduce locally, so it could be an issue with different versions of pyarrow or another dependency.

@tgaddair tgaddair added the tests Issue with the tests label Sep 5, 2022
arnavgarg1 (Contributor)

This seems to be okay on CI now; I'll investigate if it starts showing up again.

arnavgarg1 (Contributor)

It seems like the issue is back; I'll investigate: https://github.com/ludwig-ai/ludwig/runs/8273624112?check_suite_focus=true

tgaddair (Collaborator, Author) commented Sep 9, 2022

@arnavgarg1 the CI was okay because the test is being skipped on nightly. We still need to fix it.

It's curious that it's now suddenly showing up for Ray 2.0 tests as well.

@arnavgarg1 arnavgarg1 linked a pull request Sep 14, 2022 that will close this issue
@tgaddair tgaddair reopened this Sep 25, 2022
tgaddair (Collaborator, Author)

@arnavgarg1 re-opening this issue to track.

arnavgarg1 (Contributor)

@tgaddair Thanks for re-opening the issue. It seems like this happens because of the non-determinism of "nan_percentage", particularly when the last row in a partition is NaN, since the missing-value strategy is bfill. The result is that every NaN gets filled except the one in the last row, which leads to the creation of ragged tensors.

I'll create a fix that ensures the last row of our random sample isn't NaN so this situation is avoided. It might be worth calling this out in our documentation as well, since it can cause other errors downstream beyond our tests.
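The failure mode described above can be seen with plain pandas (a minimal sketch with illustrative values, not Ludwig's actual sampling):

```python
import numpy as np
import pandas as pd

# Backfill (bfill) copies the next valid value upward, so a NaN in the
# last row has nothing to copy from and survives the fill.
s = pd.Series([1.0, np.nan, 3.0, np.nan])
filled = s.bfill()
print(filled.tolist())  # [1.0, 3.0, 3.0, nan] -- the trailing NaN remains
```

Per-partition, that surviving NaN row is what ends up producing subarrays of mismatched shape and hence the ragged-tensor error.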

arnavgarg1 (Contributor)

Actually, a better approach might be a bfill followed by an ffill (or an ffill followed by a bfill) to ensure there are never any remaining NaNs.
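A minimal sketch of that idea with plain pandas (illustrative values; it assumes the column has at least one non-NaN entry, otherwise no fill strategy can help):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2.0, np.nan, np.nan])

# bfill alone leaves trailing NaNs (nothing after them to copy from);
# chaining ffill afterwards fills those from the last valid value,
# so no NaNs remain.
filled = s.bfill().ffill()
print(filled.tolist())  # [2.0, 2.0, 2.0, 2.0]
```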
