Fix the documentation of buffer_size in max_token_bucketize (pytorch#834)
Summary:
This PR fixes a documentation issue in bucketbatcher.py.
Fixes pytorch#831

Pull Request resolved: pytorch#834

Reviewed By: NivekT

Differential Revision: D40430887

Pulled By: ejguan

fbshipit-source-id: e132a3a24e8d09815c36bba3ccd4ffaced7b17d4
ling0322 authored and ejguan committed Oct 21, 2022
1 parent 2cb957c commit eaec62c
Showing 1 changed file with 1 addition and 1 deletion.

torchdata/datapipes/iter/transform/bucketbatcher.py
@@ -220,7 +220,7 @@ class MaxTokenBucketizerIterDataPipe(IterDataPipe[DataChunk[T_co]]):
     len_fn: Function to be applied to each element to get lengths. ``len(data)`` is used by default.
     min_len: Optional minimum length to be included into each batch
     max_len: Optional maximum length to be included into each batch.
-    buffer_size: This restricts how many tokens are taken from prior DataPipe to bucketize
+    buffer_size: This restricts how many samples are taken from prior DataPipe to bucketize
     include_padding: If True, the size of each batch includes the extra padding to the largest length in the batch.
     Example:
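
For context, a minimal usage sketch (not part of this commit; values are illustrative) of what the corrected wording means: ``buffer_size`` bounds how many samples are pulled ahead from the prior DataPipe and held for bucketizing, while ``max_token_count`` bounds the total token count per batch.

    # Minimal sketch, assuming a torchdata release with the
    # max_token_bucketize functional DataPipe (e.g. torchdata 0.4+).
    from torchdata.datapipes.iter import IterableWrapper

    # Five samples of lengths 1..5; the default len_fn is len(data).
    source = IterableWrapper(["1", "11", "111", "1111", "11111"])

    # Each yielded batch holds at most 6 tokens in total, while at most
    # 3 samples (not tokens) are buffered and sorted by length at a time.
    batches = source.max_token_bucketize(max_token_count=6, buffer_size=3)
    for batch in batches:
        print(batch)

A larger ``buffer_size`` lets the bucketizer sort over more pending samples, producing batches with more uniform lengths at the cost of memory.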
