Cannot download reazonspeech dataset #1675
Comments
@Triplecq Could you have a look?
Thanks for the note! I just checked our hosting server on ABCI, and it seems that their service is down: https://abci.ai/en/about_abci/info.html We will send you a note if we can provide alternative access to the data. Thank you for your interest and patience.
I can download the dataset from HF, but I don't know the directory structure that the ReazonSpeech recipe expects, so I don't know how to store the downloaded data so that 'lhotse prepare reazonspeech' can process it properly.
I have already used the Common Voice recipe for training. It requires a bit of work to combine the ReazonSpeech data with the Common Voice data. For modeling, I am using character-level modeling instead of BPE.
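For anyone unsure what character-level modeling means in practice, here is a minimal sketch, not the recipe's actual code. It builds a vocabulary with one entry per distinct character in the transcripts and maps text to integer IDs. The function names and the `<blk>` and `<unk>` symbols are assumptions chosen for illustration; the token table a real recipe generates may differ.

```python
from typing import Dict, Iterable, List


def build_char_vocab(transcripts: Iterable[str]) -> Dict[str, int]:
    """Map every distinct non-whitespace character to an integer ID.

    The two special symbols are placeholders (an assumption), reserved
    for a blank token and for characters unseen during training.
    """
    specials = ["<blk>", "<unk>"]
    chars = sorted({ch for text in transcripts for ch in text if not ch.isspace()})
    return {token: idx for idx, token in enumerate(specials + chars)}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Convert text to a list of character IDs, skipping whitespace."""
    unk_id = vocab["<unk>"]
    return [vocab.get(ch, unk_id) for ch in text if not ch.isspace()]
```

Unlike BPE, this needs no trained subword model, which is convenient for Japanese text where words are not separated by spaces. The trade-off is longer token sequences per utterance.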
Hi everyone, I'm trying to download and prepare ReazonSpeech, but I get this error:
2024-07-01 15:17:59 (prepare.sh:37:main) Running prepare.sh
2024-07-01 15:17:59 (prepare.sh:39:main) dl_dir: /home/hoang/PycharmProjects/icefall/egs/reazonspeech/ASR/download
2024-07-01 15:17:59 (prepare.sh:42:main) Stage 0: Download data
2024-07-01 15:18:00,919 INFO [config.py:58] PyTorch version 2.3.1+cpu available.
2024-07-01 15:18:01,070 INFO [reazonspeech.py:98] Downloading ReazonSpeech part: tiny
Downloading builder script: 100%|███████████████████████████████████| 4.74k/4.74k [00:00<00:00, 7.02MB/s]
Downloading readme: 100%|███████████████████████████████████████████| 2.27k/2.27k [00:00<00:00, 4.26MB/s]
Traceback (most recent call last):
File "/home/hoang/miniconda3/envs/icefall/bin/lhotse", line 8, in <module>
sys.exit(cli())
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/lhotse/bin/modes/recipes/reazonspeech.py", line 52, in reazonspeech
download_reazonspeech(target_dir, dataset_parts=subset)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/lhotse/recipes/reazonspeech.py", line 99, in download_reazonspeech
ds = load_dataset(
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/load.py", line 2616, in load_dataset
builder_instance.download_and_prepare(
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/builder.py", line 1029, in download_and_prepare
self._download_and_prepare(
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/builder.py", line 1791, in _download_and_prepare
super()._download_and_prepare(
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/builder.py", line 1102, in _download_and_prepare
split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
File "/home/hoang/.cache/huggingface/modules/datasets_modules/datasets/reazon-research--reazonspeech/e16d1ee2aae813b6ea960f564f4dc8481f58bfa6074be491eb4a6ddde66330bb/reazonspeech.py", line 79, in _split_generators
meta_path = dl_manager.download_and_extract(_BASE_URL + ds['tsv'])
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/download/download_manager.py", line 434, in download_and_extract
return self.extract(self.download(url_or_urls))
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/download/download_manager.py", line 257, in download
downloaded_path_or_paths = map_nested(
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/utils/py_utils.py", line 484, in map_nested
mapped = function(data_struct)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/download/download_manager.py", line 313, in _download_batched
return [
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/download/download_manager.py", line 314, in <listcomp>
self._download_single(url_or_filename, download_config=download_config)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/download/download_manager.py", line 323, in _download_single
out = cached_path(url_or_filename, download_config=download_config)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 201, in cached_path
output_path = get_from_cache(
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 633, in get_from_cache
raise ConnectionError(f"Couldn't reach {url} ({repr(head_error)})")
ConnectionError: Couldn't reach https://reazonspeech.s3.abci.ai/v2-tsv/tiny.tsv (ConnectTimeout(MaxRetryError("HTTPSConnectionPool(host='reazonspeech.s3.abci.ai', port=443): Max retries exceeded with url: /v2-tsv/tiny.tsv (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x754709e817c0>, 'Connection to reazonspeech.s3.abci.ai timed out. (connect timeout=100)'))")))
Please help me! Thank you very much!
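The traceback ends in a connect timeout rather than an HTTP error, which suggests the host itself is not accepting connections. A quick way to confirm this before re-running the download stage is a plain TCP probe. The following is only a sketch using the Python standard library; `host_is_reachable` is a hypothetical helper name, not part of lhotse or icefall.

```python
import socket


def host_is_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    DNS failures, refused connections, and timeouts all raise subclasses
    of OSError, so any of them is reported as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `host_is_reachable("reazonspeech.s3.abci.ai")` uses the host name from the error message above. If it returns False, the problem is on the server or network side, and retrying prepare.sh will not help until the service is back. A True result only shows that the port accepts connections; it does not guarantee that the dataset files are being served.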