Closed
Labels
P1 (issue that should be fixed within a few weeks), bug (something that is supposed to be working, but isn't), community-backlog, data (Ray Data-related issues), usability
Description
What happened + What you expected to happen
Daft code
import daft
import daft.expressions as exp
# Construct a simple Daft dataframe with a string column
df = daft.from_pydict({
    "name": ["Alice", "Bob", "Charlie"]
})
print("Original:")
df.show()
# Append a literal string "_X" to each item in the "name" column
df2 = df.with_column(
    "name_with_suffix",
    df["name"] + exp.lit("_X")
)
print("Modified:")
df2.show()
Output:
Original:
╭─────────╮
│ name    │
│ ---     │
│ Utf8    │
╞═════════╡
│ Alice   │
├╌╌╌╌╌╌╌╌╌┤
│ Bob     │
├╌╌╌╌╌╌╌╌╌┤
│ Charlie │
╰─────────╯
(Showing first 3 of 3 rows)
Modified:
╭─────────┬──────────────────╮
│ name    ┆ name_with_suffix │
│ ---     ┆ ---              │
│ Utf8    ┆ Utf8             │
╞═════════╪══════════════════╡
│ Alice   ┆ Alice_X          │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Bob     ┆ Bob_X            │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Charlie ┆ Charlie_X        │
╰─────────┴──────────────────╯
(Showing first 3 of 3 rows)
Ray Data:
import ray
from ray.data import from_items
from ray.data.expressions import col
ray.init()
# Construct a dataset with a string column
ds = from_items([
    {"name": "Alice"},
    {"name": "Bob"},
    {"name": "Charlie"},
])
print("Original:")
ds.show()
# Append "_X" to each value in the "name" column using with_column
ds2 = ds.with_column(
    "name_with_suffix",
    col("name") + "_X"
)
print("Modified:")
ds2.show()
Traceback:
Traceback (most recent call last):
File "/home/admin/test/t1.py", line 24, in <module>
ds2.show()
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/dataset.py", line 3373, in show
for row in self.take(limit):
^^^^^^^^^^^^^^^^
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/dataset.py", line 3295, in take
for row in limited_ds.iter_rows():
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/iterator.py", line 246, in _wrapped_iterator
for batch in batch_iterable:
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/iterator.py", line 190, in _create_iterator
) = self._to_ref_bundle_iterator()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_internal/iterator/iterator_impl.py", line 27, in _to_ref_bundle_iterator
ref_bundles_iterator, stats = self._base_dataset._execute_to_iterator()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/dataset.py", line 6293, in _execute_to_iterator
bundle_iter, stats, executor = self._plan.execute_to_iterator()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/exceptions.py", line 87, in handle_trace
raise e.with_traceback(None)
ray.exceptions.RayTaskError(UserCodeException): ray::Project() (pid=61248, ip=127.0.0.1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_expression_evaluator.py", line 90, in eval_expr
return _eval_expr_recursive(expr, batch, _ARROW_EXPR_OPS_MAP)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_expression_evaluator.py", line 62, in _eval_expr_recursive
return ops[expr.op](
^^^^^^^^^^^^^
File "/home/admin/venv/ray/lib/python3.11/site-packages/pyarrow/compute.py", line 252, in wrapper
return func.call(args, None, memory_pool)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/_compute.pyx", line 399, in pyarrow._compute.Function.call
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Function 'add' has no kernel matching input types (string, string)
The above exception was the direct cause of the following exception:
ray::Project() (pid=61248, ip=127.0.0.1)
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_operator.py", line 556, in _map_task
for b_out in map_transformer.apply_transform(iter(blocks), ctx):
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 528, in __call__
for data in iter:
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 373, in __call__
yield from self._block_fn(input, ctx)
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 591, in transform_fn
out_block = fn(block)
^^^^^^^^^
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 152, in fn
_try_wrap_udf_exception(e)
File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 410, in _try_wrap_udf_exception
raise UserCodeException("UDF failed to process a data block.") from e
ray.exceptions.UserCodeException: UDF failed to process a data block.
Versions / Dependencies
Ray nightly build: https://github.com/ray-project/ray/commits/f896b686f52fa46ec2ab2d75d4660b99b3f54c3f
Reproduction script
import ray
from ray.data import from_items
from ray.data.expressions import col
ray.init()
# Construct a dataset with a string column
ds = from_items([
    {"name": "Alice"},
    {"name": "Bob"},
    {"name": "Charlie"},
])
print("Original:")
ds.show()
# Append "_X" to each value in the "name" column using with_column
ds2 = ds.with_column(
    "name_with_suffix",
    col("name") + "_X"
)
print("Modified:")
ds2.show()
Issue Severity
None