This repository was archived by the owner on Nov 1, 2024. It is now read-only.

Natively support creating a TorchArrow column from a numpy array #179

Open
@scotts

Description

If a user creates a column from a Python list, we dispatch that directly to C++. For example,

import torcharrow as ta

vals = [1, 2, 3, 4, 5]
col = ta.Column(vals, device="cpu")

We dispatch that directly to C++ through pybind11:
https://github.com/facebookresearch/torcharrow/blob/d680bfdc0f6a6bb6c3a29c2a67d62006782d6558/csrc/velox/lib.cpp#L135-L141
However, if a user creates a column from a numpy array, we currently have to handle that (slowly) in Python. For example,

import numpy
import torcharrow as ta

vals = [1, 2, 3, 4, 5]
arr = numpy.array(vals)
col = ta.Column(arr, device="cpu")

That will be handled only on the Python side:
https://github.com/facebookresearch/torcharrow/blob/d680bfdc0f6a6bb6c3a29c2a67d62006782d6558/torcharrow/scope.py#L226-L233
We should be able to handle numpy arrays natively in C++: pybind11 already exposes numpy array types (py::array and the typed py::array_t<T>, from pybind11/numpy.h).
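
To illustrate why a native path seems feasible, here is a minimal sketch of what a numpy array already exposes through the Python buffer protocol: a single typed, contiguous block of memory rather than a sequence of boxed Python objects. The helper name describe_buffer is hypothetical and is not part of TorchArrow. The assumption is that a C++ implementation could read the same buffer through pybind11 (for example via py::array_t<T>::request(), which returns a py::buffer_info) instead of converting elements one at a time in Python.

```python
import numpy as np


def describe_buffer(arr):
    """Hypothetical helper: summarize the raw buffer a numpy array exposes."""
    # memoryview() goes through the buffer protocol and does not copy the data.
    view = memoryview(arr)
    return {
        "dtype": str(arr.dtype),
        "itemsize": view.itemsize,          # bytes per element
        "shape": view.shape,                # dimensions, as a tuple
        "nbytes": view.nbytes,              # total size of the data in bytes
        "c_contiguous": view.c_contiguous,  # True if laid out as one C-order block
    }


vals = [1, 2, 3, 4, 5]
arr = np.array(vals, dtype=np.int64)  # dtype fixed explicitly for the example
info = describe_buffer(arr)
print(info)
```

For a one-dimensional, C-contiguous array like this one, the element type, element count, and data pointer are all that a native constructor should need. Non-contiguous or strided arrays (for example, slices with a step) would likely need either a copy or explicit stride handling.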
