Refine librispeech.py for DeepSpeech2. #78

Merged
merged 2 commits into from
Jun 9, 2017

xinghai-sun
Contributor

resolve #77

Summary:

  1. Add manifest line check.
  2. Avoid re-unpacking if unpacked data already exists.
  3. Add full_download (download all 7 sub-datasets of LibriSpeech).
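Item 2 can be sketched as follows. This is only an illustration, assuming the archive is a tar file and that a non-empty target directory means unpacking already happened; `unpack_if_needed` and its parameters are hypothetical names, not taken from `librispeech.py`.

```python
import os
import tarfile


def unpack_if_needed(archive_path, target_dir):
    """Unpack a tar archive unless target_dir already holds data.

    Hypothetical helper for illustration only.
    Returns True if the archive was unpacked, False if the step was skipped.
    """
    # Assumption: a non-empty target directory means a previous run
    # already unpacked the archive, so the work is not repeated.
    if os.path.isdir(target_dir) and os.listdir(target_dir):
        return False
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        tar.extractall(target_dir)
    return True
```

A non-empty directory is a cheap signal but not proof of a complete unpack; an interrupted earlier run would also leave files behind.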

@xinghai-sun xinghai-sun requested review from pkuyym, qingqing01 and kuke and removed request for pkuyym June 7, 2017 09:47
Contributor

@pkuyym pkuyym left a comment

Almost LGTM. Please explain why the manifest line count is checked when the MD5 has already been verified.

@@ -18,6 +18,7 @@ For some machines, we also need to install libsndfile1. Details to be added.
```
cd data
python librispeech.py
cat manifest.libri.train-* > manifest.libri.train-all
Contributor

Should we explain what the manifest.libri.train-* files are?
I see that the details are introduced in the following section, but the manifest file appears abruptly here.

Contributor Author

Done.
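
For readers without a shell, the `cat manifest.libri.train-* > manifest.libri.train-all` step in the diff above can be approximated in Python. This is a rough sketch, not part of the PR; `concat_manifests` is a hypothetical helper. One thing to watch: the output name `manifest.libri.train-all` itself matches the pattern `manifest.libri.train-*`, so the sketch excludes it to keep reruns from feeding the output back into itself.

```python
import glob
import os


def concat_manifests(pattern, output_path):
    """Concatenate all files matching pattern into output_path.

    Hypothetical helper for illustration only. Inputs are read in sorted
    order, and the output file is skipped if it matches the pattern.
    Returns the list of input paths that were concatenated.
    """
    out_abs = os.path.abspath(output_path)
    # Collect inputs before opening the output for writing.
    paths = sorted(
        p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs
    )
    with open(output_path, "w") as out:
        for path in paths:
            with open(path) as src:
                out.write(src.read())
    return paths
```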

MD5_TRAIN_CLEAN_100 = "2a93770f6d5c6c964bc36631d331a522"
MD5_TRAIN_CLEAN_360 = "c0e676e450a7ff2f54aeade5171606fa"
MD5_TRAIN_OTHER_500 = "d1a0fd59409feb2c614ce4d30c387708"

NUM_LINES_TEST_CLEAN = 2620
Contributor

Please explain why it's necessary to check the line count when the MD5 has already been checked.

Collaborator

I'm also confused by this.

Contributor Author

Removed.
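
As background for this thread: an MD5 check confirms that a downloaded archive is byte-for-byte the expected file, which is why the reviewers saw a separate line-count check as redundant. A minimal sketch of such a check follows; `md5_of_file` is a hypothetical name and the chunked reading is an assumption, not necessarily how `librispeech.py` computes it.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Hypothetical helper for illustration only. Chunked reads keep
    memory use flat for multi-gigabyte archives.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

The result would then be compared against a constant such as `MD5_TRAIN_CLEAN_100` from the diff above before unpacking.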

Contributor

@pkuyym pkuyym left a comment

LGTM

@xinghai-sun xinghai-sun merged commit 548c675 into PaddlePaddle:develop Jun 9, 2017
@xinghai-sun xinghai-sun deleted the ds2_libri branch June 12, 2017 12:40