Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/improvements batch #1181

Merged
merged 13 commits into from
Jul 31, 2024

Conversation

MischaPanch
Copy link
Collaborator

@MischaPanch MischaPanch commented Jul 29, 2024

This PR contains several important extensions and improvements of Batch.

  1. A method for setting a new sequence. This is a first step towards using code that's less reliant on setting attributes directly. The method also permits setting a subsequence and filling up with default values. This will help ensuring that all entries in a batch are of the same length, something that we should start enforcing soon.
  2. A new method for applying arbitrary transformations to "leaves" in a Batch. This simplifies already existing code and is generally very handy for users
  3. The arbitrary transformations allowed implementing isnull, hasnull and dropnull, which helps finding errors early.
  4. The arbitrary transformations now also allow extracting a schema from a batch. This was used in Batch.cat_ to perform additional input validation (we now make sure there that the structures are the same when concatenating batches). This input validation is a breaking change! Some tests that concatenated incompatible batches were removed. Eventually, we can add a get_schema method to the batch that will retrieve metainfo like shapes and datatypes. For now, this is delegated to the user who can use the new apply_values_transform
  5. New feature: slicing of torch distributions. Previously several batches contained instances of Distribution which were not properly sliced, inviting bugs and errors in user code. This is now fixed - albeit not in a pretty way (torch doesn't allow slicing the objects natively)
  6. Stricter typing and some further minor extensions and simplifications

The new code was extensively tested and documented.

Michael Panchenko added 8 commits July 20, 2024 16:24
1. apply_array_func for applying array operations recursively. Use it in to_numpy and to_torch
2. isnull, hasnull, dropnull
3. set_array_at_key for setting a subarray at a desired index inplace

Added extensive tests for the new methods
A typo led to None instead of arr being returned
@MischaPanch
Copy link
Collaborator Author

@dantp-ai Pls have a look if you have time, you also worked a lot with Batch in the past

@MischaPanch
Copy link
Collaborator Author

When review is done I'd extend the PR description and add a changelog entry. With this and buffer improvements merged, I think we should release version 1.1

Michael Panchenko added 3 commits July 30, 2024 12:49
Copy link
Collaborator

@Trinkle23897 Trinkle23897 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick skim, lgtm; haven't looked into detail

Michael Panchenko added 2 commits July 31, 2024 19:14
@MischaPanch
Copy link
Collaborator Author

Thanks for the review @Trinkle23897 !

Some more fixes in tests were needed, now CI should run through. I'd merge then

@MischaPanch
Copy link
Collaborator Author

I'll merge now and will update the changelog later

@MischaPanch MischaPanch merged commit 6154292 into thu-ml:master Jul 31, 2024
4 checks passed
@MischaPanch MischaPanch deleted the feature/improvements-batch branch July 31, 2024 22:54
dantp-ai added a commit to dantp-ai/tianshou that referenced this pull request Aug 2, 2024
 * Seems that batch slicing leads to slightly different floats some of the time
 (See: thu-ml#1181)
MischaPanch pushed a commit that referenced this pull request Aug 2, 2024
Fixes: #1182

Note: Updated `test_batch.test_slice_distribution()` to use allclose
(See: #1181).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants