Support PyArrow arrays as tokenizer input #1415
Comments
Would you like to open a PR for this? 🤗
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Marking this issue as a good first issue. If it doesn't get addressed after a while, I'll take a stab at it.
Interested in this. Can someone get it assigned to me?
@WenheLI Assigned :)
@mariosasko - Sorry, I just saw this! Can you guide me on how to get started? I am still new to this project. Thanks a lot for your help!
Sure! The idea is to use the […] To build the project, check this workflow file, in particular the part that installs the dependencies.
Hello! @WenheLI are you still working on this?
@shreya-51 - Hi! Sorry for the late reply. And yes, I am still working on that.
Most data processing libraries (Datasets, Polars, Pandas, DuckDB, etc.) are integrated with PyArrow, so native (zero-copy if possible) support for PyArrow arrays as tokenizer input would be nice. It would avoid the unnecessary PyArrow to Python/NumPy conversion, which is pretty slow for string arrays.
PS: PyArrow has recently added support for the PyCapsule interface, which should help with the implementation.
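To illustrate why zero-copy input is plausible, here is a minimal stdlib-only sketch of the layout Arrow uses for string arrays: one contiguous UTF-8 data buffer plus an int32 offsets buffer (the validity bitmap is omitted here). The helper names `build_string_array` and `iter_utf8_views` are hypothetical and are not part of PyArrow or tokenizers; this is only a sketch of the idea, not how either library is implemented.

```python
import struct


def build_string_array(values):
    """Pack strings into Arrow-style buffers: int32 offsets plus one UTF-8 data buffer.

    Hypothetical helper for illustration only. Native byte order is used for
    the offsets so that memoryview.cast("i") below can read them directly.
    """
    data = bytearray()
    offsets = [0]
    for value in values:
        data += value.encode("utf-8")
        offsets.append(len(data))  # end of this string == start of the next
    offsets_buf = struct.pack(f"{len(offsets)}i", *offsets)
    return offsets_buf, bytes(data)


def iter_utf8_views(offsets_buf, data_buf):
    """Yield one memoryview per string without copying the bytes
    or creating a Python str for each element."""
    offsets = memoryview(offsets_buf).cast("i")
    data = memoryview(data_buf)
    for i in range(len(offsets) - 1):
        # Slicing a memoryview references the same underlying buffer.
        yield data[offsets[i]:offsets[i + 1]]


offsets_buf, data_buf = build_string_array(["hello", "h\u00e9llo", ""])
views = list(iter_utf8_views(offsets_buf, data_buf))

# Decoding is done here only to check the round trip; a native consumer
# could read the UTF-8 bytes in place instead.
decoded = [bytes(view).decode("utf-8") for view in views]
```

The conversion this issue wants to avoid is the per-element step at the end: turning every slice into a separate Python object before the tokenizer sees it. A consumer that understands offsets and data buffers could skip that step entirely.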