
[Ray Data] with_column string concatenation with + fails: pyarrow.lib.ArrowNotImplementedError: Function 'add' has no kernel matching input types (string, string) #56572

@codingl2k1

Description

What happened + What you expected to happen

Daft code (works as expected):

import daft
import daft.expressions as exp

# Construct a simple Daft dataframe with a string column
df = daft.from_pydict({
    "name": ["Alice", "Bob", "Charlie"]
})

print("Original:")
df.show()

# Append a literal string "_X" to each item in the "name" column
df2 = df.with_column(
    "name_with_suffix",
    df["name"] + exp.lit("_X")
)

print("Modified:")
df2.show()

Output:

Original:
╭─────────╮
│ name    │
│ ---     │
│ Utf8    │
╞═════════╡
│ Alice   │
├╌╌╌╌╌╌╌╌╌┤
│ Bob     │
├╌╌╌╌╌╌╌╌╌┤
│ Charlie │
╰─────────╯

(Showing first 3 of 3 rows)
Modified:
╭─────────┬──────────────────╮
│ name    ┆ name_with_suffix │
│ ---     ┆ ---              │
│ Utf8    ┆ Utf8             │
╞═════════╪══════════════════╡
│ Alice   ┆ Alice_X          │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Bob     ┆ Bob_X            │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Charlie ┆ Charlie_X        │
╰─────────┴──────────────────╯

(Showing first 3 of 3 rows)

Equivalent Ray Data code, which fails:

import ray
from ray.data import from_items
from ray.data.expressions import col

ray.init()

# Construct a dataset with a string column
ds = from_items([
    {"name": "Alice"},
    {"name": "Bob"},
    {"name": "Charlie"},
])

print("Original:")
ds.show()

# Append "_X" to each value in the "name" column using with_column
ds2 = ds.with_column(
    "name_with_suffix",
    col("name") + "_X"
)

print("Modified:")
ds2.show()

Traceback:

Traceback (most recent call last):
  File "/home/admin/test/t1.py", line 24, in <module>
    ds2.show()
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/dataset.py", line 3373, in show
    for row in self.take(limit):
               ^^^^^^^^^^^^^^^^
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/dataset.py", line 3295, in take
    for row in limited_ds.iter_rows():
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/iterator.py", line 246, in _wrapped_iterator
    for batch in batch_iterable:
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/iterator.py", line 190, in _create_iterator
    ) = self._to_ref_bundle_iterator()
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_internal/iterator/iterator_impl.py", line 27, in _to_ref_bundle_iterator
    ref_bundles_iterator, stats = self._base_dataset._execute_to_iterator()
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/dataset.py", line 6293, in _execute_to_iterator
    bundle_iter, stats, executor = self._plan.execute_to_iterator()
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/exceptions.py", line 87, in handle_trace
    raise e.with_traceback(None)
ray.exceptions.RayTaskError(UserCodeException): ray::Project() (pid=61248, ip=127.0.0.1)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_expression_evaluator.py", line 90, in eval_expr
    return _eval_expr_recursive(expr, batch, _ARROW_EXPR_OPS_MAP)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_expression_evaluator.py", line 62, in _eval_expr_recursive
    return ops[expr.op](
           ^^^^^^^^^^^^^
  File "/home/admin/venv/ray/lib/python3.11/site-packages/pyarrow/compute.py", line 252, in wrapper
    return func.call(args, None, memory_pool)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_compute.pyx", line 399, in pyarrow._compute.Function.call
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Function 'add' has no kernel matching input types (string, string)
The above exception was the direct cause of the following exception:
ray::Project() (pid=61248, ip=127.0.0.1)
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_operator.py", line 556, in _map_task
    for b_out in map_transformer.apply_transform(iter(blocks), ctx):
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 528, in __call__
    for data in iter:
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 373, in __call__
    yield from self._block_fn(input, ctx)
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 591, in transform_fn
    out_block = fn(block)
                ^^^^^^^^^
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 152, in fn
    _try_wrap_udf_exception(e)
  File "/home/admin/venv/ray/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 410, in _try_wrap_udf_exception
    raise UserCodeException("UDF failed to process a data block.") from e
ray.exceptions.UserCodeException: UDF failed to process a data block.

Versions / Dependencies

Ray nightly build: https://github.com/ray-project/ray/commits/f896b686f52fa46ec2ab2d75d4660b99b3f54c3f

Reproduction script

import ray
from ray.data import from_items
from ray.data.expressions import col

ray.init()

# Construct a dataset with a string column
ds = from_items([
    {"name": "Alice"},
    {"name": "Bob"},
    {"name": "Charlie"},
])

print("Original:")
ds.show()

# Append "_X" to each value in the "name" column using with_column
ds2 = ds.with_column(
    "name_with_suffix",
    col("name") + "_X"
)

print("Modified:")
ds2.show()

Issue Severity

None

Labels

P1 (Issue that should be fixed within a few weeks), bug (Something that is supposed to be working; but isn't), community-backlog, data (Ray Data-related issues), usability
