Hello,
I am trying to download the openwebtext dataset from Hugging Face, but I keep getting the following error:
Downloading data: 100%|________________________________________________________________________________________________________________| 12.9G/12.9G [25:43<00:00, 8.35MB/s]
/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/site-packages/datasets/download/download_manager.py:527: FutureWarning: 'num_proc' was deprecated in version 2.6.2 and will be removed in 3.0.0. Pass `DownloadConfig(num_proc=<num_proc>)` to the initializer instead.
warnings.warn(
Extracting data files: 100%|________________________________________________________________________________________________________| 20610/20610 [9:43:42<00:00, 1.70s/it]
Traceback (most recent call last):
File "ssd_process_data.py", line 485, in <module>
main()
File "ssd_process_data.py", line 369, in main
raw_datasets["train"] = load_dataset(
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/site-packages/datasets/load.py", line 1782, in load_dataset
builder_instance.download_and_prepare(
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/site-packages/datasets/builder.py", line 872, in download_and_prepare
self._download_and_prepare(
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/site-packages/datasets/builder.py", line 1649, in _download_and_prepare
super()._download_and_prepare(
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/site-packages/datasets/builder.py", line 985, in _download_and_prepare
verify_splits(self.info.splits, split_dict)
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/site-packages/datasets/utils/info_utils.py", line 100, in verify_splits
raise NonMatchingSplitsSizesError(str(bad_splits))
datasets.utils.info_utils.NonMatchingSplitsSizesError: [{'expected': SplitInfo(name='train', num_bytes=39769494896, num_examples=8013769, shard_lengths=None, dataset_name=None), 'recorded': SplitInfo(name='train', num_bytes=39769065791, num_examples=8013740, shard_lengths=[101000, 100000, 101000, 101000, 102000, 102000, 101000, 102000, 101000, 101000, 101000, 101000, 101000, 102000, 101000, 101000, 101000, 101000, 102000, 102000, 100000, 101000, 100000, 101000, 102000, 101000, 102000, 101000, 102000, 102000, 102000, 101000, 101000, 101000, 101000, 102000, 101000, 102000, 101000, 101000, 100000, 101000, 101000, 101000, 101000, 101000, 101000, 101000, 101000, 101000, 101000, 100000, 101000, 102000, 101000, 101000, 101000, 101000, 101000, 102000, 102000, 101000, 102000, 101000, 102000, 102000, 101000, 101000, 102000, 102000, 102000, 101000, 102000, 102000, 102000, 101000, 101000, 102000, 101000, 13740], dataset_name='openwebtext')}]
I have tried forcing the redownload of the dataset by passing the download_mode="force_redownload" parameter, but it yielded the same error.
I have also tried passing the ignore_verifications=True parameter, but this in turn yielded the following error:
raw_datasets["train"] = load_dataset(
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/site-packages/datasets/load.py", line 1754, in load_dataset
verification_mode = VerificationMode(
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/enum.py", line 339, in __call__
return cls.__new__(cls, value)
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/enum.py", line 663, in __new__
raise ve_exc
ValueError: 'none' is not a valid VerificationMode
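For reference, the ValueError seems to come from an enum lookup: ignore_verifications=True apparently gets translated internally to the string 'none', which is not a member of the VerificationMode enum (as far as I can tell from the datasets 2.9.x source, the valid values are 'no_checks', 'basic_checks', and 'all_checks' — please correct me if that's wrong for other versions). A minimal stdlib-only sketch that reproduces the failure mode with a stand-in enum:

```python
from enum import Enum

# Stand-in for datasets.VerificationMode; member values are taken from the
# datasets 2.9.x source and are an assumption for other versions.
class VerificationMode(Enum):
    NO_CHECKS = "no_checks"
    BASIC_CHECKS = "basic_checks"
    ALL_CHECKS = "all_checks"

# Looking up a value that is not a member raises ValueError — the same
# failure shown in the traceback above.
try:
    VerificationMode("none")
except ValueError as exc:
    err = exc
    print(err)

# The lookup a fixed translation would perform instead succeeds:
assert VerificationMode("no_checks") is VerificationMode.NO_CHECKS
```

If I understand correctly, recent datasets releases let you pass verification_mode="no_checks" (or VerificationMode.NO_CHECKS) directly to load_dataset instead of the deprecated ignore_verifications flag, which might sidestep this lookup entirely.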
Has anyone encountered such a problem, or does anyone know what I can do?