Cannot download reazonspeech dataset #1675

Open
hoangtm-aimesoft opened this issue Jul 1, 2024 · 4 comments
Comments

@hoangtm-aimesoft

Hi everyone, I'm trying to download and prepare ReazonSpeech, but I got this error:

2024-07-01 15:17:59 (prepare.sh:37:main) Running prepare.sh
2024-07-01 15:17:59 (prepare.sh:39:main) dl_dir: /home/hoang/PycharmProjects/icefall/egs/reazonspeech/ASR/download
2024-07-01 15:17:59 (prepare.sh:42:main) Stage 0: Download data
2024-07-01 15:18:00,919 INFO [config.py:58] PyTorch version 2.3.1+cpu available.
2024-07-01 15:18:01,070 INFO [reazonspeech.py:98] Downloading ReazonSpeech part: tiny
Downloading builder script: 100%|███████████████████████████████████| 4.74k/4.74k [00:00<00:00, 7.02MB/s]
Downloading readme: 100%|███████████████████████████████████████████| 2.27k/2.27k [00:00<00:00, 4.26MB/s]
Traceback (most recent call last):
File "/home/hoang/miniconda3/envs/icefall/bin/lhotse", line 8, in <module>
sys.exit(cli())
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/lhotse/bin/modes/recipes/reazonspeech.py", line 52, in reazonspeech
download_reazonspeech(target_dir, dataset_parts=subset)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/lhotse/recipes/reazonspeech.py", line 99, in download_reazonspeech
ds = load_dataset(
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/load.py", line 2616, in load_dataset
builder_instance.download_and_prepare(
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/builder.py", line 1029, in download_and_prepare
self._download_and_prepare(
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/builder.py", line 1791, in _download_and_prepare
super()._download_and_prepare(
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/builder.py", line 1102, in _download_and_prepare
split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
File "/home/hoang/.cache/huggingface/modules/datasets_modules/datasets/reazon-research--reazonspeech/e16d1ee2aae813b6ea960f564f4dc8481f58bfa6074be491eb4a6ddde66330bb/reazonspeech.py", line 79, in _split_generators
meta_path = dl_manager.download_and_extract(_BASE_URL + ds['tsv'])
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/download/download_manager.py", line 434, in download_and_extract
return self.extract(self.download(url_or_urls))
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/download/download_manager.py", line 257, in download
downloaded_path_or_paths = map_nested(
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/utils/py_utils.py", line 484, in map_nested
mapped = function(data_struct)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/download/download_manager.py", line 313, in _download_batched
return [
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/download/download_manager.py", line 314, in <listcomp>
self._download_single(url_or_filename, download_config=download_config)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/download/download_manager.py", line 323, in _download_single
out = cached_path(url_or_filename, download_config=download_config)
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 201, in cached_path
output_path = get_from_cache(
File "/home/hoang/miniconda3/envs/icefall/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 633, in get_from_cache
raise ConnectionError(f"Couldn't reach {url} ({repr(head_error)})")
ConnectionError: Couldn't reach https://reazonspeech.s3.abci.ai/v2-tsv/tiny.tsv (ConnectTimeout(MaxRetryError("HTTPSConnectionPool(host='reazonspeech.s3.abci.ai', port=443): Max retries exceeded with url: /v2-tsv/tiny.tsv (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x754709e817c0>, 'Connection to reazonspeech.s3.abci.ai timed out. (connect timeout=100)'))")))

Please help me! Thank you very much!

@csukuangfj
Collaborator

@Triplecq Could you have a look?

@Triplecq
Contributor

Triplecq commented Jul 2, 2024

Thanks for the note!

I just checked our hosting server on ABCI, and it seems that their service is down: https://abci.ai/en/about_abci/info.html

We will also let you know if we can provide alternative access to the data.

Thank you for your interest and patience.
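
To tell a server-side outage like this apart from a local network or proxy problem, a quick reachability check can help. The following is only a minimal sketch using the Python standard library; the helper name `can_connect` is hypothetical and not part of lhotse or icefall. The host name is the one shown in the error message above.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    Hypothetical helper for illustration; it only tests TCP reachability,
    not whether the requested file exists on the server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers DNS failures, refused connections, and timeouts.
        return False


# Example call (host taken from the traceback in this issue):
# can_connect("reazonspeech.s3.abci.ai")
```

If this returns False for the ABCI host but True for other HTTPS hosts, the problem is most likely on the hosting side rather than in the local setup.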

@yuyun2000

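Thanks for the note!

I just checked our hosting server on ABCI, and it seems that their service is down: https://abci.ai/en/about_abci/info.html

We will also let you know if we can provide alternative access to the data.

Thank you for your interest and patience.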

I can download the dataset from Hugging Face, but I don't know the directory structure that the ReazonSpeech recipe expects, so I don't know how to store the downloaded data so that it can be properly processed by 'lhotse prepare reazonspeech'.
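
Until the expected layout is confirmed, one workaround is to pair audio files with transcripts by hand. The sketch below is not the layout that 'lhotse prepare reazonspeech' expects, which this thread does not describe. It assumes a metadata file such as tiny.tsv in which each row holds a relative audio path and a transcript separated by a tab; that row format, the helper name `read_tsv_index`, and the `audio_root` argument are all assumptions for illustration.

```python
from pathlib import Path
from typing import Dict, Iterator


def read_tsv_index(tsv_path: Path, audio_root: Path) -> Iterator[Dict[str, str]]:
    """Yield {"audio", "text"} entries for TSV rows whose audio file exists.

    Assumed row format (not confirmed in this thread):
        <relative audio path> TAB <transcript>
    """
    with open(tsv_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank rows
            # Split on the first tab only, so tabs inside the transcript survive.
            rel_path, _, text = line.partition("\t")
            audio_path = audio_root / rel_path
            if audio_path.is_file():
                yield {"audio": str(audio_path), "text": text}
```

The resulting entries could then be turned into recording and supervision manifests manually, or merged into another recipe's data.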

@yuyun2000

I have already used the Common Voice recipe for training. It requires a bit of work to combine the ReazonSpeech data with the Common Voice data. For modeling, I am using character-level modeling instead of BPE.
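
For anyone unfamiliar with character-level modeling, the idea is that each distinct character in the transcripts becomes one token, so no BPE model has to be trained. A minimal sketch follows; the special symbols `<blk>` and `<unk>`, their IDs, and the function names are assumptions for illustration and may not match what a given icefall recipe uses.

```python
from typing import Dict, Iterable, List


def build_char_vocab(transcripts: Iterable[str]) -> Dict[str, int]:
    """Map every non-whitespace character seen in the transcripts to an integer ID."""
    # Assumed special symbols; a real recipe may use different ones.
    specials = ["<blk>", "<unk>"]
    chars = sorted({ch for text in transcripts for ch in text if not ch.isspace()})
    return {sym: i for i, sym in enumerate(specials + chars)}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a transcript to token IDs, mapping unseen characters to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(ch, unk) for ch in text if not ch.isspace()]
```

Japanese text has no spaces between words, which is one reason character-level tokens are a natural fit here; whitespace is simply dropped in this sketch.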
