Fix the documentation of buffer_size in max_token_bucketize (pytorch#834)
Summary:
This PR fixes a documentation issue in bucketbatcher.py.
Fixes pytorch#831

Pull Request resolved: pytorch#834

Reviewed By: NivekT

Differential Revision: D40430887

Pulled By: ejguan

fbshipit-source-id: e132a3a24e8d09815c36bba3ccd4ffaced7b17d4
ling0322 authored and ejguan committed Oct 21, 2022
1 parent 2cb957c commit eaec62c
Showing 1 changed file with 1 addition and 1 deletion.

torchdata/datapipes/iter/transform/bucketbatcher.py
@@ -220,7 +220,7 @@ class MaxTokenBucketizerIterDataPipe(IterDataPipe[DataChunk[T_co]]):
     len_fn: Function to be applied to each element to get lengths. ``len(data)`` is used by default.
     min_len: Optional minimum length to be included into each batch
     max_len: Optional maximum length to be included into each batch.
-    buffer_size: This restricts how many tokens are taken from prior DataPipe to bucketize
+    buffer_size: This restricts how many samples are taken from prior DataPipe to bucketize
     include_padding: If True, the size of each batch includes the extra padding to the largest length in the batch.
     Example:
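
For context, a minimal usage sketch (not part of this commit; values are illustrative) of what the corrected wording means: ``buffer_size`` bounds how many samples are pulled ahead from the prior DataPipe and held for bucketizing, while ``max_token_count`` bounds the total token count per batch.

    # Minimal sketch, assuming a torchdata release with the
    # max_token_bucketize functional DataPipe (e.g. torchdata 0.4+).
    from torchdata.datapipes.iter import IterableWrapper

    # Five samples of lengths 1..5; the default len_fn is len(data).
    source = IterableWrapper(["1", "11", "111", "1111", "11111"])

    # Each yielded batch holds at most 6 tokens in total, while at most
    # 3 samples (not tokens) are buffered and sorted by length at a time.
    batches = source.max_token_bucketize(max_token_count=6, buffer_size=3)
    for batch in batches:
        print(batch)

A larger ``buffer_size`` lets the bucketizer sort over more pending samples, producing batches with more uniform lengths at the cost of memory.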
