Improve audio featurizer and add shift augmentor for DS2. #114
1. Improve the audio featurizer.
2. Add a shift augmentor.
3. Update the default arguments to the current best suggestions.
4. Add checkpoints tagged with the pass id.
@@ -67,6 +67,54 @@ def from_file(cls, file):
return cls(samples, sample_rate)

@classmethod
def slice_from_file(cls, file, start=None, end=None):
@reviewers: `slice_from_file` and `make_silence` are unchanged; they are only reordered.
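For context on the two reordered methods, here is a minimal sketch of what slicing audio by time and generating silence involve, written against a plain NumPy sample array rather than the `AudioSegment` class. The helper names `slice_samples` and `make_silence_samples` and the convention that negative times count back from the end are assumptions for illustration, not the code in audio.py.

```python
import numpy as np


def slice_samples(samples, sample_rate, start=None, end=None):
    """Return the samples between `start` and `end` seconds.

    Hypothetical helper: None means the beginning/end of the audio, and
    negative values count back from the end (assumed convention).
    """
    duration = len(samples) / float(sample_rate)
    start = 0.0 if start is None else start
    end = duration if end is None else end
    if start < 0.0:
        start += duration
    if end < 0.0:
        end += duration
    if start < 0.0 or end > duration or start > end:
        raise ValueError("Invalid slice [%f, %f] for %f s of audio."
                         % (start, end, duration))
    start_frame = int(round(start * sample_rate))
    end_frame = int(round(end * sample_rate))
    return samples[start_frame:end_frame]


def make_silence_samples(duration, sample_rate):
    """Return `duration` seconds of all-zero float32 samples (hypothetical)."""
    return np.zeros(int(duration * sample_rate), dtype=np.float32)
```

For example, `slice_samples(samples, 16000, -0.25)` returns the last quarter second of a 16 kHz signal.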
Almost LGTM.
deep_speech_2/data_utils/audio.py
:type shift_ms: float
:raises ValueError: If shift_ms is longer than audio duration.
"""
if shift_ms / 1000.0 > self.duration:
Should this be `abs(shift_ms)`?
Done.
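A minimal sketch of a zero-padded time shift that applies the `abs(shift_ms)` bound check discussed above, on a plain NumPy array. The function name `shift_samples` and the sign convention (positive shift_ms advances the audio, negative delays it) are assumptions; the full method body is not shown in this diff.

```python
import numpy as np


def shift_samples(samples, sample_rate, shift_ms):
    """Return a copy of `samples` shifted by shift_ms, zero-padded.

    Hypothetical helper; positive shift_ms = time advance,
    negative shift_ms = time delay (assumed convention).
    """
    # abs() so that large negative (delay) shifts are rejected as well.
    if abs(shift_ms) / 1000.0 > len(samples) / float(sample_rate):
        raise ValueError("Absolute value of shift_ms should be no greater "
                         "than the audio duration.")
    n = int(shift_ms * sample_rate / 1000)
    shifted = np.zeros_like(samples)
    if n > 0:
        # Time advance: drop the first n samples, zero-pad the end.
        shifted[:-n] = samples[n:]
    elif n < 0:
        # Time delay: zero-pad the start, drop the last |n| samples.
        shifted[-n:] = samples[:n]
    else:
        shifted[:] = samples
    return shifted
```

Without `abs()`, a call such as `shift_samples(x, 1000, -20)` on 10 ms of audio would slip past the check.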
extracting spectrogram features.
:type target_sample_rate: float
:param use_dB_normalization: Whether to normalize the audio to a certain
decibels before extracting the features.
It would be better to change `decibels` to `dB` for consistency.
In comments, the full name `decibels` is used for clarity, while in arguments the short name `dB` is used instead. I think that makes sense?
if audio_segment.sample_rate != self._target_sample_rate:
raise ValueError("Audio sample rate is not supported. "
"Turn allow_downsampling or allow up_sampling on.")
# decibel normalization
Would `dB` be better?
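The sample-rate check in the featurizer excerpt above raises only when the rates differ and neither allow_downsampling nor allow_upsampling applies. A hedged sketch of that decision logic follows; `match_sample_rate` is a hypothetical name, the default flag values are assumptions, and the linear-interpolation resampler is a crude stand-in (no anti-aliasing filter) for whatever resampler the featurizer actually uses.

```python
import numpy as np


def _linear_resample(samples, sample_rate, target_sample_rate):
    """Crude stand-in resampler: linear interpolation, no anti-aliasing."""
    duration_s = len(samples) / float(sample_rate)
    num_out = int(round(duration_s * target_sample_rate))
    old_t = np.arange(len(samples)) / float(sample_rate)
    new_t = np.arange(num_out) / float(target_sample_rate)
    return np.interp(new_t, old_t, samples)


def match_sample_rate(samples, sample_rate, target_sample_rate,
                      allow_downsampling=False, allow_upsampling=False):
    """Return samples at target_sample_rate or raise (hypothetical helper)."""
    if sample_rate == target_sample_rate:
        return samples
    if sample_rate > target_sample_rate and allow_downsampling:
        return _linear_resample(samples, sample_rate, target_sample_rate)
    if sample_rate < target_sample_rate and allow_upsampling:
        return _linear_resample(samples, sample_rate, target_sample_rate)
    raise ValueError("Audio sample rate is not supported. "
                     "Turn allow_downsampling or allow_upsampling on.")
```

In practice a polyphase resampler with a low-pass filter would replace `_linear_resample`; the branching is the part that mirrors the check above.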
:param use_dB_normalization: Whether to normalize the audio to a certain
decibels before extracting the features.
:type use_dB_normalization: bool
:param target_dB: Target audio decibels for normalization.
Would `dB` be better?
Great. LGTM.
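The use_dB_normalization / target_dB parameters discussed above amount to scaling the signal so that its RMS level reaches a target in dB. A minimal sketch, assuming float samples in [-1, 1] and measuring level as 10 * log10(mean square); the function names and the -20 dB default are illustrative assumptions, not the featurizer's actual code.

```python
import numpy as np


def rms_db(samples):
    """RMS level in dB relative to full scale 1.0: 10 * log10(mean(x**2))."""
    mean_square = np.mean(np.square(samples))
    if mean_square == 0.0:
        raise ValueError("Cannot compute the dB level of silent audio.")
    return 10.0 * np.log10(mean_square)


def normalize_db(samples, target_db=-20.0):
    """Scale samples so that rms_db(result) equals target_db (hypothetical)."""
    gain_db = target_db - rms_db(samples)
    # An amplitude gain of g dB corresponds to a factor of 10 ** (g / 20),
    # which scales the mean square by 10 ** (g / 10).
    return samples * 10.0 ** (gain_db / 20.0)
```

A constant signal of amplitude 0.5 sits at about -6.02 dB, so `normalize_db` would attenuate it by roughly 13.98 dB to reach -20 dB.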
Patchset adding the missing shift_perturb.py to PR #114.
resolve #113
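The contents of shift_perturb.py are not shown in this conversation. As a rough sketch only, a random shift augmentor could draw shift_ms uniformly from a configured range and apply a zero-padded shift; the class name `RandomShiftAugmentor`, its constructor arguments, the `transform_audio` method, and the sign convention are all hypothetical.

```python
import random

import numpy as np


class RandomShiftAugmentor(object):
    """Hypothetical augmentor: shifts audio by a random number of ms.

    shift_ms is drawn uniformly from [min_shift_ms, max_shift_ms];
    positive values advance the audio, negative values delay it
    (assumed convention, not taken from shift_perturb.py).
    """

    def __init__(self, rng, min_shift_ms, max_shift_ms):
        if min_shift_ms > max_shift_ms:
            raise ValueError("min_shift_ms must not exceed max_shift_ms.")
        self._rng = rng
        self._min_shift_ms = min_shift_ms
        self._max_shift_ms = max_shift_ms

    def transform_audio(self, samples, sample_rate):
        """Return a randomly shifted, zero-padded copy of `samples`."""
        shift_ms = self._rng.uniform(self._min_shift_ms, self._max_shift_ms)
        if abs(shift_ms) / 1000.0 > len(samples) / float(sample_rate):
            raise ValueError("Shift is longer than the audio duration.")
        n = int(shift_ms * sample_rate / 1000)
        shifted = np.zeros_like(samples)
        if n > 0:
            shifted[:-n] = samples[n:]  # time advance
        elif n < 0:
            shifted[-n:] = samples[:n]  # time delay
        else:
            shifted[:] = samples
        return shifted


# Usage: a seeded random.Random makes the augmentation reproducible.
augmentor = RandomShiftAugmentor(random.Random(0), -5, 5)
augmented = augmentor.transform_audio(np.arange(1.0, 101.0), 1000)
```

Passing an explicit `random.Random` instance, rather than using the global generator, keeps data augmentation reproducible across training runs.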
The training experiment is in progress; its results will be posted here as soon as possible.