Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added slice + flatten for iterable datapipes #730

Closed
wants to merge 3 commits into from
Closed

Conversation

dbish
Copy link
Contributor

@dbish dbish commented Aug 13, 2022

Please read through our contribution guide prior to
creating your pull request.

  • Note that there is a section on requirements related to adding a new DataPipe.

Fixes #656

Changes

  • added new iterdatapipe called islice for selecting a slice of an iterable item returned (ex: list, tuple, dict)
  • added indexing as part of this islice functionality, allowing a list of indexes (or keys) to also be used in place of a slice for selection
  • also added flatten iterdatapipe to allow for an item in an iterable to be flattened one level if an index or list of indices are provided
  • added default for flatten (no args) that yields all items for all iterables in the provided datapipe chained together

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 13, 2022
@dbish dbish self-assigned this Aug 13, 2022
@dbish dbish changed the title added islice for iterable datapipes added islice + flatten for iterable datapipes Aug 13, 2022
@dbish dbish changed the title added islice + flatten for iterable datapipes [WIP] added islice + flatten for iterable datapipes Aug 13, 2022
@dbish dbish marked this pull request as draft August 13, 2022 07:15
@dbish dbish marked this pull request as ready for review August 16, 2022 05:46
@dbish dbish changed the title [WIP] added islice + flatten for iterable datapipes added islice + flatten for iterable datapipes Aug 16, 2022
@dbish dbish changed the title added islice + flatten for iterable datapipes added slice + flatten for iterable datapipes Aug 16, 2022
Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chatted with Vitaly and understand what the intention is with the default behavior when nothing is passed in.

We also chatted about changing the name from flatten to flatten_sample but that is probably too much and potentially more confusing.

I would recommend:

  1. Being explicit in the docstring about the flatten operation is being applied at the element/sample level.
  2. Adding one example in docstring with the default behavior when nothing is passed in. This can be copied over from one of the unit tests.

Otherwise, LGTM!

@facebook-github-bot
Copy link
Contributor

@dbish has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

slice_dp = input_dp.slice(0, 2)
self.assertEqual([[0, 1], [3, 4], [6, 7]], list(slice_dp))

# Functional Test: filter with list of indices for list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous lines?

Copy link
Contributor

@VitalyFedyunin VitalyFedyunin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall good.

As part of the exercise you can consider including assertWarns tests for borderline situations:

https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertWarns

input_dp = IterableWrapper([["zero", ["one", "2"]], ["3", ["4", "5"]], ["6", ["7", "8"]]])
flatten_dp = input_dp.flatten()
self.assertEqual([["zero", "one", "2"], ["3", "4", "5"], ["6", "7", "8"]], list(flatten_dp))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Probably include test and mention in the doc, that we are not supporting nesting levels > 1

@facebook-github-bot
Copy link
Contributor

@dbish has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@andrewkho andrewkho deleted the dbish-islice branch September 26, 2024 04:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to manipulate columns and fields
5 participants