Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
apply formatting after iter_arrow to speed up format -> map, filter for iterable datasets (#7207)

* apply formatting after iter_arrow
* add support for formatting to map iteration
* formatted iterator for filter
* fix filtered formatting
* option to disable formatting for outputs of map
* remove format_outputs kwarg
* rename batched_examples_iterator -> inputs_iterator
* support arbitrary input formatting in filtered examples iterable iter arrow
* preserve formatting on filtered shuffle
* pass token_per_repo_id to python_feature_decoder in formatters
* implement FormattedExamplesIterator
* fix formatted examples iterable
* restore is_typed property
* pass formatting config to formatted examples iterable
* fix formatter init
* map examples iterable expects to receive rebatchedarrowexamplesiterable instance
* only apply features if they exist
* fix shuffle and shard
* remove formatting from FilteredExamplesIterable
* run pre commit
* filtered iter_arrow always allowed if available
* filtered examples iterable needs formatting when iter_arrow enabled
* only iter arrow on filter if formatting is set
* add features property to support feature inference
* fix features property
* don't re-encode features
* avoid re-encoding outputs of map
* map should not preserve formatting
* update comment
* update map features property
* return bool for mapped ex iterable is typed
* pass return features to mapped examples iterable constructor
* don't iter arrow with formatted filter to avoid re formatting
* avoid re-formatting data
* rename return features -> features
* update refs to return_features
* decode features in batched map
* preserve formatting in with_format
* fix features (mapped ex iterable)
* remove formatted examples iterable from with_format
* avoid reapplying features when chaining filter, map
* preserve formatting in map
* fix tests
* style
* fix tests

---------

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
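The idea in the commit title, iterating in columnar batches first and applying the output format once afterwards, before `map` runs, can be illustrated with a small standalone sketch. This is not the library's implementation: plain dicts of lists stand in for Arrow tables, and every name below (`iter_batches`, `format_batch`, `formatted_map`) is hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, Iterator, List

# A columnar batch: column name -> list of values.
# Assumption: this stands in for an Arrow table; no Arrow is involved here.
Batch = Dict[str, List[Any]]
Example = Dict[str, Any]


def iter_batches(columns: Batch, batch_size: int) -> Iterator[Batch]:
    """Yield column-wise slices of the data, batch_size rows at a time."""
    num_rows = len(next(iter(columns.values()))) if columns else 0
    for start in range(0, num_rows, batch_size):
        yield {name: values[start:start + batch_size] for name, values in columns.items()}


def format_batch(batch: Batch, formatter: Callable[[Example], Example]) -> Iterator[Example]:
    """Split one batch into rows and apply the formatter to each row."""
    names = list(batch)
    for row in zip(*(batch[name] for name in names)):
        yield formatter(dict(zip(names, row)))


def formatted_map(
    columns: Batch,
    batch_size: int,
    formatter: Callable[[Example], Example],
    fn: Callable[[Example], Example],
) -> Iterator[Example]:
    """Iterate in batches, format after the batch step, then apply fn."""
    for batch in iter_batches(columns, batch_size):
        for example in format_batch(batch, formatter):
            yield fn(example)


def to_float(example: Example) -> Example:
    """Toy formatter: convert every value to float."""
    return {key: float(value) for key, value in example.items()}


def double_x(example: Example) -> Example:
    """Toy map function: expects already-formatted input."""
    return {"y": example["x"] * 2}


results = list(formatted_map({"x": [1, 2, 3]}, batch_size=2, formatter=to_float, fn=double_x))
```

In this sketch the user function only ever sees formatted examples, and the formatting step sits directly on top of the batch iterator rather than on a separate per-example pass.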
1 parent 7a1a84b, commit 75e61d1
Showing 7 changed files with 239 additions and 74 deletions.