[DataPipe] Correct the type of exception that is being raised by ShufflerMapDataPipe #82666
[ghstack-poisoned]
Then, why not change `Shuffler` to raise `IndexError` when `key` doesn't exist?
I have considered that and I'm mostly indifferent. I found the definitions:
The exception comes from More practically, I worry that if there is another MapDataPipe that raises
Even though we don't necessarily think of

```python
class DP(MapDataPipe):
    def __init__(self, dp):
        # Store the wrapped pipe so __getitem__ and __len__ can delegate to it.
        self.dp = dp

    def __getitem__(self, index):
        return self.dp[index]

    def __len__(self):
        return len(self.dp)


dp = SequenceWrapper(...).shuffle()
dp = DP(dp)
list(dp)
```

I would expect
I think this would be more dangerous as such. WDYT?
If we leave This is because we aren't calling
Yeah. Here is the link from Python official about I would expect
Okay if we treat
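The exchange above turns on how `list(dp)` behaves when an object exposes only `__getitem__`. As a minimal sketch, assuming iteration falls back to Python's `__getitem__`-based protocol (the two toy classes below are hypothetical and are not the torch DataPipes), an `IndexError` ends iteration cleanly while a `KeyError` propagates to the caller:

```python
class ListBacked:
    """Hypothetical map-style container backed by a list."""

    def __init__(self, n):
        self._data = [i * 10 for i in range(n)]

    def __getitem__(self, index):
        # A list raises IndexError for an out-of-range position.
        return self._data[index]


class DictBacked:
    """Hypothetical map-style container backed by a dict."""

    def __init__(self, n):
        self._data = {i: i * 10 for i in range(n)}

    def __getitem__(self, index):
        # A dict raises KeyError for a missing key.
        return self._data[index]


def drain(obj):
    """Return (items, error) after iterating obj via the __getitem__ fallback."""
    try:
        return list(obj), None
    except KeyError as exc:
        return None, exc


list_items, list_err = drain(ListBacked(3))
dict_items, dict_err = drain(DictBacked(3))

print(list_items, list_err)
print(dict_items, repr(dict_err))
```

The fallback iterator calls `obj[0]`, `obj[1]`, and so on, and treats only `IndexError` (or `StopIteration`) as the end of the sequence, which is why the exception type raised for an out-of-range position matters to any wrapper that is consumed this way.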
Fixes pytorch/data#708

The following code snippet used to fail; it has now been added as a test case:

```python
dp1 = dp.map.SequenceWrapper(range(10))
shuffle_dp1 = dp1.shuffle()
dp2 = dp.map.SequenceWrapper(range(10))
shuffle_dp2 = dp2.shuffle()
zip_dp = shuffle_dp1.zip(shuffle_dp2)
list(zip_dp)  # This used to fail
```

The issue was that `ShufflerMapDataPipe` raises a `KeyError` when an out-of-bounds index is passed into it, but that was not handled by `zip_dp`'s `__getitem__`, which only handled `IndexError`. With this change, it handles both.

[ghstack-poisoned]
ghstack-source-id: 05929c9b4dbe55d2968c8123217dc6a27775be6d
Pull Request resolved: #82666
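To make the failure mode concrete, here is a self-contained sketch that does not use torch. `ToyShuffler` and `ToyZipper` are hypothetical stand-ins, and their constructor parameters (`translate`, `handled`) are invented for illustration only. It contrasts the two remedies that come up in this PR: having the zipper handle both exception types, as the description says, or having the shuffler raise `IndexError` itself, as suggested in review.

```python
import random


class ToyShuffler:
    """Hypothetical shuffled map-style pipe that looks positions up in a dict."""

    def __init__(self, values, seed=0, translate=False):
        self._values = list(values)
        order = list(range(len(self._values)))
        random.Random(seed).shuffle(order)
        # position -> source index; a missing position raises KeyError
        self._index_map = dict(enumerate(order))
        self._translate = translate

    def __getitem__(self, index):
        try:
            return self._values[self._index_map[index]]
        except KeyError:
            if self._translate:
                # Remedy B: report an out-of-range position as IndexError.
                raise IndexError(f"index {index} is out of range") from None
            raise


class ToyZipper:
    """Hypothetical zipper that re-raises handled lookup errors as IndexError."""

    def __init__(self, *pipes, handled=(IndexError,)):
        self._pipes = pipes
        self._handled = handled

    def __getitem__(self, index):
        try:
            return tuple(pipe[index] for pipe in self._pipes)
        except self._handled as exc:
            raise IndexError(f"index {index} is out of range") from exc


def make(translate=False):
    return ToyShuffler(range(10), seed=1, translate=translate)


# Before: the zipper handles only IndexError, so the KeyError escapes list().
try:
    list(ToyZipper(make(), make()))
    escaped = False
except KeyError:
    escaped = True

# Remedy A: the zipper handles both IndexError and KeyError.
pairs_a = list(ToyZipper(make(), make(), handled=(IndexError, KeyError)))

# Remedy B: the shuffler raises IndexError, so the zipper needs no change.
pairs_b = list(ToyZipper(make(translate=True), make(translate=True)))
```

Remedy B keeps every consumer that relies on `IndexError` working without changes, whereas Remedy A has to be repeated in each wrapper that indexes into the shuffler.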
LGTM, thanks
@pytorchbot merge
@pytorchbot successfully started a merge job. Check the current status here
…flerMapDataPipe (#82666) (#82666)

Summary:
Fixes pytorch/data#708

The following code snippet used to fail; it has now been added as a test case:

```python
dp1 = dp.map.SequenceWrapper(range(10))
shuffle_dp1 = dp1.shuffle()
dp2 = dp.map.SequenceWrapper(range(10))
shuffle_dp2 = dp2.shuffle()
zip_dp = shuffle_dp1.zip(shuffle_dp2)
list(zip_dp)  # This used to fail
```

The issue was that `ShufflerMapDataPipe` raises a `KeyError` when an out-of-bounds index is passed into it, but that was not handled by `zip_dp`'s `__getitem__`, which only handled `IndexError`. With this change, it handles both.

Pull Request resolved: #82666
Approved by: https://github.com/ejguan

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/14b660fcc0c32a7478c69a25a253b67d0ed36364

Reviewed By: kit1980

Differential Revision: D38424978

Pulled By: NivekT

fbshipit-source-id: f3de6e0bb0d74249b51dd77f7a58de4ac35c4e2e