Feature/improvements batch #1181
1. apply_array_func for applying array operations recursively. Use it in to_numpy and to_torch
2. isnull, hasnull, dropnull
3. set_array_at_key for setting a subarray at a desired index in place

Added extensive tests for the new methods
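Items 1 and 2 above describe applying array operations recursively (used by to_numpy and to_torch) and the isnull, hasnull and dropnull methods. Below is a minimal stdlib sketch of how such null checks can be built on top of a recursive apply. It rests on assumptions that are not taken from the PR: a batch is modelled as a nested dict of equal-length lists, None and float NaN count as null, and dropnull removes every row that is null in any leaf. apply_recursively is a hypothetical stand-in for apply_array_func; none of this is Tianshou's actual code.

```python
import math
from typing import Any, Callable, Dict, Set

# Assumption: a "batch" is modelled as a nested dict whose leaves are
# equal-length lists (one entry per row). Tianshou's Batch is richer than this.
NestedBatch = Dict[str, Any]


def apply_recursively(batch: NestedBatch, func: Callable[[Any], Any]) -> NestedBatch:
    """Hypothetical helper: return a new nested dict with `func` applied to every leaf."""
    return {
        key: apply_recursively(value, func) if isinstance(value, dict) else func(value)
        for key, value in batch.items()
    }


def _is_null(x: Any) -> bool:
    # Assumption: None and float NaN are the values treated as null.
    return x is None or (isinstance(x, float) and math.isnan(x))


def isnull(batch: NestedBatch) -> NestedBatch:
    """Same structure as `batch`, each leaf replaced by per-row null flags."""
    return apply_recursively(batch, lambda leaf: [_is_null(x) for x in leaf])


def _leaves(batch: NestedBatch):
    # Yield every leaf list, at any nesting depth.
    for value in batch.values():
        if isinstance(value, dict):
            yield from _leaves(value)
        else:
            yield value


def hasnull(batch: NestedBatch) -> bool:
    """True if any leaf contains at least one null value."""
    return any(any(flags) for flags in _leaves(isnull(batch)))


def dropnull(batch: NestedBatch) -> NestedBatch:
    """Return a copy without the rows at which any leaf holds a null."""
    bad_rows: Set[int] = {
        i for flags in _leaves(isnull(batch)) for i, flag in enumerate(flags) if flag
    }
    return apply_recursively(
        batch, lambda leaf: [x for i, x in enumerate(leaf) if i not in bad_rows]
    )


batch = {"obs": [1.0, float("nan"), 3.0], "info": {"done": [False, True, None]}}
doubled_obs = apply_recursively({"obs": [1, 2, 3]}, lambda leaf: [2 * x for x in leaf])
# doubled_obs == {"obs": [2, 4, 6]}
# hasnull(batch) is True
# dropnull(batch) == {"obs": [1.0], "info": {"done": [False]}}
```

For to_numpy or to_torch, the leaf function could be a conversion such as np.asarray or torch.as_tensor instead of the list comprehensions used here; the recursion itself stays the same.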
A typo led to None instead of arr being returned
@dantp-ai Please have a look if you have time; you also worked a lot with Batch in the past.
When the review is done, I'd extend the PR description and add a changelog entry. With this and the buffer improvements merged, I think we should release version 1.1.
Removed part of the tests of cat_ that were handling incompatible batches
quick skim, lgtm; haven't looked into it in detail
needed now due to the stricter Batch.cat_
Thanks for the review, @Trinkle23897! Some more fixes in the tests were needed; CI should now pass. I'd merge then.
I'll merge now and will update the changelog later.
* Seems that batch slicing leads to slightly different floats some of the time (See: thu-ml#1181)
This PR contains several important extensions and improvements of Batch.
The new methods isnull, hasnull and dropnull help find errors early.

A schema can now be derived from a batch. This was used in Batch.cat_ to perform additional input validation (we now make sure that the structures are the same when concatenating batches). This input validation is a breaking change! Some tests that concatenated incompatible batches were removed. Eventually, we can add a get_schema method to the batch that will retrieve metainfo like shapes and datatypes. For now, this is delegated to the user, who can use the new apply_values_transform for this.
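The structure check now performed by Batch.cat_ can be illustrated with a sketch that compares key layouts before concatenating. It assumes the schema is only the nested key layout (no shapes or datatypes, consistent with the note that those are not covered yet). get_structure and cat are hypothetical names, and unlike Batch.cat_ this sketch returns a new dict instead of modifying a batch in place.

```python
from typing import Any, Dict, List

# Assumption: nested dict model of a batch; leaves are lists of rows.
NestedBatch = Dict[str, Any]


def get_structure(batch: NestedBatch) -> Dict[str, Any]:
    """Hypothetical helper: keep only the nested key layout, dropping the data."""
    return {
        k: get_structure(v) if isinstance(v, dict) else None
        for k, v in batch.items()
    }


def cat(batches: List[NestedBatch]) -> NestedBatch:
    """Concatenate batches row-wise after checking they share one structure."""
    if not batches:
        raise ValueError("need at least one batch to concatenate")
    reference = get_structure(batches[0])
    for i, other in enumerate(batches[1:], start=1):
        # Dict equality ignores key order, so only the key layout is compared.
        if get_structure(other) != reference:
            raise ValueError(f"batch {i} has a different structure than batch 0")

    def merge(parts: List[Any]) -> Any:
        if isinstance(parts[0], dict):
            return {k: merge([p[k] for p in parts]) for k in parts[0]}
        merged: List[Any] = []
        for p in parts:
            merged.extend(p)
        return merged

    return merge(batches)


a = {"obs": [1, 2], "info": {"rew": [0.0, 1.0]}}
b = {"obs": [3], "info": {"rew": [2.0]}}
# cat([a, b]) == {"obs": [1, 2, 3], "info": {"rew": [0.0, 1.0, 2.0]}}
# cat([a, {"obs": [4]}]) raises ValueError because "info" is missing
```

Failing fast on a mismatched layout is what makes the stricter concatenation a breaking change: inputs that previously were merged silently now raise an error.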
Batches containing torch Distribution objects were not properly sliced, inviting bugs and errors in user code. This is now fixed, albeit not in a pretty way (torch doesn't allow slicing these objects natively).

The new code was extensively tested and documented.
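Because a torch Distribution cannot be indexed directly, one workaround is to index its parameters and rebuild the distribution from them (for a torch.distributions.Normal, e.g. from dist.loc[idx] and dist.scale[idx]). The sketch below shows that idea with a stdlib dataclass standing in for a batched distribution. NormalParams and slice_distribution are hypothetical, and this is not necessarily how the PR implements the fix.

```python
from dataclasses import dataclass, fields, replace
from typing import List, Union


@dataclass(frozen=True)
class NormalParams:
    """Stand-in for a batched distribution: one (loc, scale) pair per row."""
    loc: List[float]
    scale: List[float]


def slice_distribution(dist: NormalParams, index: Union[int, slice]) -> NormalParams:
    """Hypothetical helper: index every parameter, then rebuild the object.

    An int index yields scalar parameters; a slice keeps the batch dimension.
    """
    sliced = {f.name: getattr(dist, f.name)[index] for f in fields(dist)}
    return replace(dist, **sliced)


dist = NormalParams(loc=[0.0, 1.0, 2.0], scale=[1.0, 0.5, 0.25])
# slice_distribution(dist, slice(1, 3)) == NormalParams(loc=[1.0, 2.0], scale=[0.5, 0.25])
```

Iterating over all fields rather than hard-coding loc and scale keeps the helper usable for any parameterisation whose fields share the batch dimension; parameters without that dimension would need special handling.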