polina's comments

huggingface · Dec 15, 2022 · 00958f2 · github-actions
1 parent 11a2158
commit 00958f2

Showing 2 changed files with 3 additions and 3 deletions.
diff --git a/docs/source/use_with_pytorch.mdx b/docs/source/use_with_pytorch.mdx
@@ -150,7 +150,7 @@ Like `torch.utils.data.Dataset` objects, a [`Dataset`] can be passed directly to
 ### Optimize data loading
 
 There are several ways you can increase the speed your data is loaded which can save you time, especially if you are working with large datasets.
-PyTorch offers parallelized data loading, retrieving batches of indices instead of individually, and streaming to progressively download datasets.
+PyTorch offers parallelized data loading, retrieving batches of indices instead of individually, and streaming to iterate over the dataset without downloading it on disk.
 
 #### Use multiple Workers
 
@@ -200,7 +200,7 @@ You must use a `BatchSampler` if you want the transform to be given full batches
 
 ### Stream data
 
-Loading a dataset in streaming mode is useful to progressively download the data you need while iterating over the dataset.
+Loading a dataset in streaming mode allows one to iterate over the dataset without downloading it on disk.
 An iterable dataset from `datasets` inherits from `torch.utils.data.IterableDataset` so you can pass it to a `DataLoader`:
 
 ```py

diff --git a/src/datasets/iterable_dataset.py b/src/datasets/iterable_dataset.py
@@ -727,7 +727,7 @@ class ShufflingConfig:
 
 
 def _maybe_add_torch_iterable_dataset_parent_class(cls):
-    """Add torch.utils.data.IterableDataset as a parent class if 'torch' is imported"""
+    """Add torch.utils.data.IterableDataset as a parent class if 'torch' is available"""
     if config.TORCH_AVAILABLE:
         import torch.utils.data
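
Note on the docstring change: "available" (the package is installed and can be imported) is a different condition from "imported" (the package is already loaded in the current process). The sketch below illustrates that distinction with the standard library only. The helper names `is_available` and `is_imported` are hypothetical and not part of `datasets`, and this diff does not show how `config.TORCH_AVAILABLE` is actually computed.

```python
import importlib.util
import sys


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def is_imported(module_name: str) -> bool:
    """Return True if the module has already been loaded in this process."""
    return module_name in sys.modules


if __name__ == "__main__":
    # "torch" may or may not be installed here; both checks are safe either way.
    print("torch available:", is_available("torch"))
    print("torch imported:", is_imported("torch"))
```

The hunk above gates on an availability flag and then imports `torch.utils.data` itself, so the new wording matches the code: the user does not need to have imported `torch` beforehand.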