NotADirectoryError while loading the CNN/Dailymail dataset #996
Comments
It is working now, thank you. Should I leave this issue open to address the quota-exceeded error?
Yes, please. It has happened several times; we definitely need to address it.
Any updates on this one? I'm facing a similar issue while trying to add CelebA.
I've looked into it and couldn't find a solution. This looks like a Google Drive limitation.
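One way this limitation might surface is that, once the quota is exceeded, the download comes back as an HTML error page instead of the expected archive, which could explain errors like the NotADirectoryError in this issue. That behaviour is an assumption, not something confirmed in this thread. A minimal sketch for checking what a cached download actually is; `looks_like_html_error_page` and `classify_download` are hypothetical helper names, not part of the library:

```python
import tarfile
import zipfile
from pathlib import Path


def looks_like_html_error_page(path: Path, sniff_bytes: int = 512) -> bool:
    """Heuristic: return True if the file starts like an HTML document.

    Assumption (hypothetical): a quota-exceeded response from Google Drive is
    served as an HTML page rather than as the requested archive.
    """
    with open(path, "rb") as f:
        head = f.read(sniff_bytes).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def classify_download(path: Path) -> str:
    """Return 'directory', 'zip', 'tar', 'html' or 'unknown' for a downloaded path."""
    if path.is_dir():
        return "directory"
    # Check zip before tar to reduce the chance of a false tar match.
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    if looks_like_html_error_page(path):
        return "html"
    return "unknown"
```

If a path under the downloads cache classifies as `html` rather than `tar` or `zip`, the cached file is probably not the dataset archive and would need to be removed before retrying.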
The original links are Google Drive links. Would it be feasible for HF to maintain its own servers for this? Also, I think the same issue must exist with TFDS as well.
It's possible to host the data on our side, but we should ask the authors first. TFDS has the same issue and, as far as I know, doesn't have a solution either.
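If the data were also hosted somewhere other than Google Drive, a loader could try each location in turn and only fail when every source fails. A minimal sketch, assuming a list of candidate URLs and an injected `fetch` callable; both are hypothetical, and this is not how the library currently downloads files:

```python
from typing import Callable, List, Sequence


def download_with_fallback(urls: Sequence[str], fetch: Callable[[str], bytes]) -> bytes:
    """Try each URL in order and return the first successful payload.

    `fetch` (hypothetical) is any function that downloads a URL and raises on
    failure, e.g. on a quota-exceeded response. Per-source errors are collected
    so the final exception explains why every source failed.
    """
    errors: List[str] = []
    for url in urls:
        try:
            return fetch(url)
        except Exception as exc:  # move on to the next source on any failure
            errors.append(f"{url}: {exc}")
    raise RuntimeError("All download sources failed:\n" + "\n".join(errors))
```

Injecting `fetch` keeps the fallback logic independent of any particular HTTP client, which also makes it easy to test without network access.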
Okay. I imagine asking every author who shares their dataset on Google Drive would also be cumbersome.
I am getting this error as well. Is there a fix?
Unfortunately not, as long as the data is stored on Google Drive. Hi @JafferWilson, is there a download link for CNN/DailyMail on a host other than Google Drive? To give you some context, this library provides tools to download and process datasets. For CNN/DailyMail, the data is downloaded from the link you provide in your GitHub repository. Unfortunately, because of Google Drive quotas, many users are unable to load this dataset.
The following copy of the CNN/DM dataset fixed the problem for me:
Downloading and preparing dataset cnn_dailymail/3.0.0 (download: 558.32 MiB, generated: 1.28 GiB, post-processed: Unknown size, total: 1.82 GiB) to /root/.cache/huggingface/datasets/cnn_dailymail/3.0.0/3.0.0/0128610a44e10f25b4af6689441c72af86205282d26399642f7db38fa7535602...
NotADirectoryError Traceback (most recent call last)
in ()
22
23
---> 24 train = load_dataset('cnn_dailymail', '3.0.0', split='train')
25 validation = load_dataset('cnn_dailymail', '3.0.0', split='validation')
26 test = load_dataset('cnn_dailymail', '3.0.0', split='test')
5 frames
/root/.cache/huggingface/modules/datasets_modules/datasets/cnn_dailymail/0128610a44e10f25b4af6689441c72af86205282d26399642f7db38fa7535602/cnn_dailymail.py in _find_files(dl_paths, publisher, url_dict)
132 else:
133 logging.fatal("Unsupported publisher: %s", publisher)
--> 134 files = sorted(os.listdir(top_dir))
135
136 ret_files = []
NotADirectoryError: [Errno 20] Not a directory: '/root/.cache/huggingface/datasets/downloads/1bc05d24fa6dda2468e83a73cf6dc207226e01e3c48a507ea716dc0421da583b/cnn/stories'
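The failure comes from `sorted(os.listdir(top_dir))` being called on a path whose parent (the cached download) is a file rather than an extracted directory. A minimal sketch of a more defensive version of that step, which checks the path first and raises an error pointing at the likely cause; `list_story_files` is a hypothetical name, not the actual `cnn_dailymail.py` code, and the Google Drive hint reflects this thread rather than a confirmed diagnosis:

```python
import os
from typing import List


def list_story_files(top_dir: str) -> List[str]:
    """Return the sorted file names in top_dir, failing with a clear message.

    Hypothetical helper modelled on the `sorted(os.listdir(top_dir))` line in
    the traceback; not the dataset script's real implementation.
    """
    # os.path.isdir returns False (instead of raising) when a parent
    # component of the path is a regular file, as in this traceback.
    if not os.path.isdir(top_dir):
        raise RuntimeError(
            f"Expected a directory at {top_dir!r}, but it does not exist or is not a directory. "
            "The cached download may not be the expected archive "
            "(possibly a Google Drive quota-exceeded page); "
            "try deleting the cached download and downloading again."
        )
    return sorted(os.listdir(top_dir))
```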