[Datasets] LazyBlockList.split fails to split a heterogeneous list #32950

@jianoaix

Description


On Ray master, the following will fail:

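import ray

# Read 100 copies of the same CSV file with 10 read tasks.
inputs = ["example://iris.csv"] * 100
ds = ray.data.read_csv(inputs, parallelism=10)
ds.schema()
# Split the dataset's internal LazyBlockList into 2 parts; this raises the ValueError below.
ds._plan._in_blocks.split(2)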

Error message:

Traceback (most recent call last):
  File "test_split.py", line 6, in <module>
    ds._plan._in_blocks.split(2)
  File "/home/ubuntu/ray/python/ray/data/_internal/lazy_block_list.py", line 165, in split
    cached_metadata = np.array_split(self._cached_metadata, num_splits)
  File "<__array_function__ internals>", line 200, in array_split
  File "/home/ubuntu/.local/lib/python3.8/site-packages/numpy/lib/shape_base.py", line 786, in array_split
    sary = _nx.swapaxes(ary, axis, 0)
  File "<__array_function__ internals>", line 200, in swapaxes
  File "/home/ubuntu/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 594, in swapaxes
    return _wrapfunc(a, 'swapaxes', axis1, axis2)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 54, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (10,) + inhomogeneous part.
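The traceback shows that the failure happens inside `np.array_split(self._cached_metadata, num_splits)`: NumPy first converts the list to an array, and since NumPy 1.24 a ragged nested sequence raises `ValueError` unless `dtype=object` is passed. The detected shape `(10,)` matches the 10 read tasks, so presumably some entries of `_cached_metadata` are lists of block metadata while others are not (yet) populated. A minimal sketch of one possible workaround, assuming the cache is a plain Python list, is to split it by index with the same chunk sizes `np.array_split` would use, so the elements are never coerced into an ndarray. `split_list` is a hypothetical helper, not a Ray API, and the string metadata below is only a stand-in for real block metadata.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def split_list(items: List[T], num_splits: int) -> List[List[T]]:
    """Hypothetical helper: split ``items`` into ``num_splits`` contiguous
    chunks with the same sizes np.array_split would produce (the first
    ``len(items) % num_splits`` chunks get one extra element), without
    converting the elements into a NumPy array."""
    if num_splits <= 0:
        raise ValueError("num_splits must be a positive integer")
    base, extra = divmod(len(items), num_splits)
    chunks: List[List[T]] = []
    start = 0
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Stand-in for a partially populated metadata cache (an assumption): the
# first of 10 read tasks has a list of metadata entries, the rest are None.
cached_metadata: List[Optional[List[str]]] = [["meta-0"]] + [None] * 9

left, right = split_list(cached_metadata, 2)
print(len(left), len(right))  # 5 5
print(left[0])                # ['meta-0']
```

Slicing keeps every element exactly as it was, whatever its type or length. Another option would be to build an explicit `dtype=object` array before calling `np.array_split`, at the cost of converting the chunks back to lists afterwards.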

Labels

bug: Something that is supposed to be working; but isn't