Incompatibility issue when using load_dataset with datasets==3.0.1 #7238

Open
jupiterMJM opened this issue Oct 18, 2024 · 2 comments

@jupiterMJM

Describe the bug

load_dataset fails with datasets version 3.0.1.
See "Steps to reproduce the bug" below.
To work around the bug, I had to downgrade to version 2.21.0.
OS: Ubuntu 24 (AWS instance)
Python: same bug under 3.12 and 3.10

The error I had was:
Traceback (most recent call last):
File "", line 1, in
File "/home/ubuntu/miniconda3/envs/maxence_env/lib/python3.10/site-packages/datasets/load.py", line 2096, in load_dataset
builder_instance.download_and_prepare(
File "/home/ubuntu/miniconda3/envs/maxence_env/lib/python3.10/site-packages/datasets/builder.py", line 924, in download_and_prepare
self._download_and_prepare(
File "/home/ubuntu/miniconda3/envs/maxence_env/lib/python3.10/site-packages/datasets/builder.py", line 1647, in _download_and_prepare
super()._download_and_prepare(
File "/home/ubuntu/miniconda3/envs/maxence_env/lib/python3.10/site-packages/datasets/builder.py", line 977, in _download_and_prepare
split_generators = self._split_generators(dl_manager, split_generators_kwargs)
File "/home/ubuntu/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_6_0/cb17afd34f5799f97e8f48398748f83006335b702bd785f9880797838d541b81/common_voice_6_0.py", line 159, in _split_generators
archive_path = dl_manager.download(self._get_bundle_url(self.config.name, bundle_url_template))
File "/home/ubuntu/miniconda3/envs/maxence_env/lib/python3.10/site-packages/datasets/download/download_manager.py", line 150, in download
download_config = self.download_config.copy()
File "/home/ubuntu/miniconda3/envs/maxence_env/lib/python3.10/site-packages/datasets/download/download_config.py", line 73, in copy
return self.__class__(**{k: copy.deepcopy(v) for k, v in self.__dict__.items()})
TypeError: DownloadConfig.__init__() got an unexpected keyword argument 'ignore_url_params'
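
The last two frames suggest the mechanism: copy() rebuilds the object by passing every entry of the instance's __dict__ to the constructor, so any attribute attached after construction that the constructor does not accept would raise this TypeError. The sketch below reproduces that pattern with a hypothetical ToyDownloadConfig class, which is a stand-in and not the real DownloadConfig. Where ignore_url_params actually gets set in the failing run is not visible in the traceback, so the assignment in the sketch is an assumption.

```python
import copy
from dataclasses import dataclass


@dataclass
class ToyDownloadConfig:
    """Hypothetical stand-in for a config class; the fields are made up."""

    cache_dir: str = "~/.cache"
    force_download: bool = False

    def copy(self):
        # Same pattern as the line shown in the traceback: every instance
        # attribute is forwarded to __init__ as a keyword argument.
        return self.__class__(**{k: copy.deepcopy(v) for k, v in self.__dict__.items()})


cfg = ToyDownloadConfig()
clean_copy = cfg.copy()  # works: only declared fields are in __dict__

# Assumption: some code attaches an attribute that is not a declared field.
cfg.ignore_url_params = True

try:
    cfg.copy()
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

If this is what happens, the error would appear whenever the installed library's constructor no longer accepts an attribute that other code still sets on the config object.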

Steps to reproduce the bug

  1. install datasets with pip install datasets --upgrade
  2. launch python; from datasets import load_dataset
  3. run load_dataset("mozilla-foundation/common_voice_6_0") (this raises the TypeError above)
  4. exit python
  5. uninstall datasets; then pip install datasets==2.21.0
  6. launch python; from datasets import load_dataset
  7. run load_dataset("mozilla-foundation/common_voice_6_0")
  8. Everything runs fine now
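
The interactive steps above could also be scripted so the same check can be run under each installed version. This is a minimal sketch: the helper names installed_version and reproduce are made up for illustration, and it only catches the TypeError reported in this issue.

```python
from importlib import metadata


def installed_version(package: str = "datasets"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def reproduce(dataset_name: str = "mozilla-foundation/common_voice_6_0") -> str:
    """Try to load the dataset and return a one-line outcome summary."""
    # Imported lazily so this file can be imported without datasets installed.
    from datasets import load_dataset

    try:
        load_dataset(dataset_name)
    except TypeError as exc:
        return f"datasets {installed_version()}: TypeError: {exc}"
    return f"datasets {installed_version()}: loaded without error"
```

Calling print(reproduce()) once after pip install datasets --upgrade and once after pip install datasets==2.21.0 should give two lines that can be pasted into the issue side by side.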

Expected behavior

Be able to download a dataset without error

Environment info

Copy-and-paste the text below in your GitHub issue.

  • datasets version: 3.0.1
  • Platform: Linux-6.8.0-1017-aws-x86_64-with-glibc2.39
  • Python version: 3.12.4
  • huggingface_hub version: 0.26.0
  • PyArrow version: 17.0.0
  • Pandas version: 2.2.3
  • fsspec version: 2024.6.1
@huongngo-8

Hi! I'm also getting the same issue - have you been able to find a solution to this?

@jupiterMJM
Author

From what I remember, I stayed on the downgraded version of datasets (2.21.0).
