Update audio feature extraction tutorial #2391
ca76d1c to 7c35c28
The image in Section 2 does not seem to render correctly.
@nateanl Fixed the asset link. Thanks!
LGTM overall. Just some nits to be addressed.
# The following diagram shows the relationship between common audio features
# and torchaudio APIs to generate them.
#
# .. image:: https://download.pytorch.org/torchaudio/tutorial-assets/torchaudio_feature_extractions.png
Do you plan to add ComputeDelta to this diagram? It can be above the MFCC feature IMO. Delta features are often used along with MFCC by Kaldi users for ASR.
Not this time. Let's follow up later.
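For readers unfamiliar with the delta features mentioned in this thread: they approximate the time derivative of each feature coefficient (for example, each MFCC bin) with a short regression window. Below is a minimal, library-free sketch of that computation, assuming a window of 5 frames and edge replication at the boundaries. The helper name `compute_deltas` and the sample values are illustrative only; in torchaudio the corresponding transform is `torchaudio.transforms.ComputeDeltas`, which is presumably what "ComputeDelta" refers to above.

```python
def compute_deltas(frames, win_length=5):
    """Delta coefficients over time for a single feature track.

    `frames` is a list of floats (one coefficient per time frame).
    This is an illustrative helper, not torchaudio's implementation.
    """
    if win_length < 3 or win_length % 2 == 0:
        raise ValueError("win_length must be an odd integer >= 3")
    n_max = (win_length - 1) // 2
    # Normalizer of the regression formula: 2 * sum(n^2) for n = 1..N.
    denom = 2 * sum(n * n for n in range(1, n_max + 1))
    last = len(frames) - 1

    def at(t):
        # Replicate the first/last frame instead of reading out of range.
        return frames[min(max(t, 0), last)]

    return [
        sum(n * (at(t + n) - at(t - n)) for n in range(1, n_max + 1)) / denom
        for t in range(len(frames))
    ]


if __name__ == "__main__":
    # Made-up values standing in for one MFCC coefficient across six frames.
    track = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    print(compute_deltas(track))
```

In a typical ASR front end the deltas (and often delta-deltas, obtained by applying the same operation twice) are stacked with the static MFCCs along the feature axis.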

######################################################################
#

plot_spectrogram(melspec[0], title="MelSpectrogram - torchaudio", ylabel="mel freq")

######################################################################
# Comparison against librosa
This section doesn't have a section index. Is that intended?
- Adopt torchaudio.utils.download_asset to simplify asset management.
- Break down the first section about helper functions.
- Reduce the number of helper functions
- Add section number
7c35c28 to 894e7b3
pitch = F.detect_pitch_frequency(waveform, sample_rate)
plot_pitch(waveform, sample_rate, pitch)
play_audio(waveform, sample_rate)
plot_pitch(SPEECH_WAVEFORM, SAMPLE_RATE, pitch)

######################################################################
# Kaldi Pitch (beta)
Out of curiosity, is this still in the beta phase?
Yes.
Co-authored-by: nateanl <zni@fb.com>
Co-authored-by: Caroline Chen <carolinechen@fb.com>
@mthrok has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Hey @mthrok.
https://app.circleci.com/pipelines/github/pytorch/audio/11065/workflows/8fd5da50-71af-46f3-8c95-cadbc7be3a1a/jobs/704592