Batch: don't create new objects on getitem #1086
Comments
I can look into this. I think the example above has a small typo. It should be |
Yes, you're right about the typo. From all batch issues this might be the hardest one. I'm not sure how it can be solved at all, tbh. |
Interesting:
|
Even more confusing: for batches containing only sub-batches, getitem does work as expected, but if a sequence is involved it creates a new object:
b = Batch(a=[1, 2, 3])
b[0] == b[0]
>>> False |
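To make the behaviour above concrete, here is a minimal sketch using a hypothetical `ToyBatch` class. It is an assumption for illustration only and not the real `Batch` implementation; it just shows why an indexing method that builds a new container makes `==` return `False` when no `__eq__` is defined.

```python
class ToyBatch:
    """Hypothetical stand-in for a dict-like batch; NOT the real Batch class."""

    def __init__(self, **fields):
        self.fields = fields

    def __getitem__(self, index):
        # Indexing every field builds a brand-new container on each call.
        return ToyBatch(**{key: value[index] for key, value in self.fields.items()})


b = ToyBatch(a=[1, 2, 3])
first, second = b[0], b[0]

same_object = first is second                   # two distinct objects
default_equal = first == second                 # no __eq__, so == falls back to identity
same_contents = first.fields == second.fields   # the wrapped data itself is equal

print(same_object, default_equal, same_contents)  # False False True
```

In this sketch the two results hold identical data, so the `False` comes purely from the default identity-based comparison.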
Note that if there is a solution, it should also work for slices. Right now:
b[:2] == b[:2]
>>> False
One idea: we likely can't make it return the same object, but we could add |
Yes, this seems to be quite involved at this point. I wonder how
Yes, this sounds good. I'll try this out. I don't think it would hurt later if we do find a solution for the object equality. |
As I found out just now, Python's own list actually cannot do this, so
Since |
Huh, actually, I was slightly wrong, but in a weird way. There seems to be some magic happening when a var is assigned to the id of a list view. Anyhow, the id of Python list slices is not completely fixed |
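For reference, a small sketch of how plain Python lists behave here. The variable names are illustrative. Slicing returns a new list each time, yet `==` still succeeds because lists compare by value rather than by identity.

```python
values = [1, 2, 3]
first, second = values[:2], values[:2]

distinct_objects = first is not second   # each slice is a fresh list object
equal_by_value = first == second         # list equality compares element-wise

# Comparing id(values[:2]) == id(values[:2]) directly is unreliable: each
# temporary list may be freed before the next one is created, so the
# interpreter is free to reuse the same id. Keeping both slices alive in
# variables, as done here, avoids that effect.
print(distinct_objects, equal_by_value)  # True True
```

This is likely the "magic" mentioned above: an id is only guaranteed to be unique among objects that are alive at the same time.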
Closes: #1086

### Api Extensions
- Batch received new method: `to_numpy_`. #1098
- `to_dict` in Batch supports also non-recursive conversion. #1098
- Batch `__eq__` now implemented, semantic equality check of batches is now possible. #1098

### Breaking Changes
- The method `to_numpy` in `data.utils.batch.Batch` is not in-place anymore. Instead, a new method `to_numpy_` does the conversion in-place. #1098
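As a rough illustration of what a semantic equality check means, here is a sketch on a hypothetical `ToyBatch` class. It is not the code from #1098, and it assumes the fields hold only plain Python values and nested `ToyBatch` objects. Arrays and tensors compare element-wise under `==`, so a real implementation would need extra handling for them.

```python
class ToyBatch:
    """Hypothetical stand-in for a dict-like batch; NOT the real Batch class."""

    def __init__(self, **fields):
        self.fields = fields

    def __getitem__(self, index):
        # Still returns a new object on every call, for ints and slices alike.
        return ToyBatch(**{key: value[index] for key, value in self.fields.items()})

    def __eq__(self, other):
        if not isinstance(other, ToyBatch):
            return NotImplemented
        # Dict comparison recurses into nested ToyBatch values via this method.
        return self.fields == other.fields


b = ToyBatch(a=[1, 2, 3], sub=ToyBatch(x=[4, 5, 6]))

index_equal = b[0] == b[0]       # True: same contents
slice_equal = b[:2] == b[:2]     # True: works for slices as well
still_distinct = b[0] is b[0]    # False: the objects still have different ids
differs = b[0] == b[1]           # False: different contents

print(index_equal, slice_equal, still_distinct, differs)
```

Note that defining `__eq__` without `__hash__` makes instances unhashable in Python, which seems appropriate for a mutable container.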
For reference: the objects returned on getitem still have different ids. This issue was resolved by implementing |
I just had the case where I wanted to compare two batches that contained torch distributions logged during the training process. This comparison fails with a |
Thx for spotting it! It should indeed work. There are some tests that cover this, but as I was digging into it I noticed that it fails for some other cases, e.g:
I will look into it asap. I apologize for the inconvenience. EDIT:
|
@maxhuettenrauch So far it seems that the issue arises when dealing with zero-dimensional arrays. To remain flexible wrt DeepDiff's, I suggest that we perform an additional processing step in |
In the last months I implemented a lot of helper things that also could help with this issue. Gonna open a PR tomorrow and assign you two as reviewers |
@MischaPanch Should I go ahead with the proposal above? Or does one of your helper methods already cover this edge case? |
@MischaPanch I experimented today with the new Batch API (#1181), specifically |
Currently `Batch.__getitem__` will always create a new object. This is counterintuitive and destroys equality checks. E.g., will result in `id1 != id2`, which leads to `b[0] == b[0]` being `False`.
Related to #922