Added a new Spectrogram layer based on Conv1D operations, supporting GPU-parallelization and fine-tuning #20313
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files
@@ Coverage Diff @@
## master #20313 +/- ##
==========================================
+ Coverage 78.81% 78.87% +0.06%
==========================================
Files 512 513 +1
Lines 49058 49233 +175
Branches 9033 9075 +42
==========================================
+ Hits 38665 38834 +169
- Misses 8529 8532 +3
- Partials 1864 1867 +3
Flags with carried forward coverage won't be shown.
Interesting feature, thanks for the PR!
What would be the main use cases for this? And what is the usage pattern in terms of what initializer to use, when to set `trainable = False`, etc.? Can you add a simple tutorial that demonstrates the value?
@keras_export("keras.layers.Spectrogram")
class Spectrogram(layers.Layer):
"spectrogram" is a bit generic, maybe there could be a more specific name?
I renamed it to `STFTSpectrogram`, which is more specific.
However, I aimed for this to be extended in later PRs to also include the Mel-Spectrogram, LogMel-Spectrogram, and MFCCs. These are all audio-based spectrograms, unlike the layer I just committed, which is more generic for time-series signals in general. Supporting those output modes would require extra computations at the end of the `__call__` function.
If all of these variations end up in one layer in the future, then maybe the name `Spectrogram` is better, since it is more generic. However, if that would be too monolithic and the variations should be handled in a new layer (maybe inheriting from the current one), then I think the current name `STFTSpectrogram` is sufficient.
What do you think? Should I use `STFTSpectrogram` or `Spectrogram`? (Keeping in mind the possible future extension to Mel-Spectrograms and MFCCs.)
This layer comes with its default initializer. The main two use cases are: …
I can craft an example using this layer. There are mainly three reasons for this contribution: …
K. W. Cheuk, H. Anderson, K. Agres and D. Herremans, "nnAudio: An On-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks," IEEE Access, vol. 8, pp. 161981-162003, 2020.
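(Not part of the PR itself, but for context: a minimal sketch of the Conv1D-based STFT idea described in the nnAudio paper above. The `frame_length`, `frame_step`, and Hann window below are arbitrary illustrative choices, not the layer's defaults.)

```python
import numpy as np
import keras

frame_length = 256  # assumed window size, for illustration only
frame_step = 128    # assumed hop size
num_bins = frame_length // 2 + 1

# Windowed DFT basis: real (cosine) and imaginary (sine) parts, Hann-weighted.
n = np.arange(frame_length)
k = np.arange(num_bins)[:, None]
window = np.hanning(frame_length)
real_basis = (np.cos(2 * np.pi * k * n / frame_length) * window).T.astype("float32")
imag_basis = (-np.sin(2 * np.pi * k * n / frame_length) * window).T.astype("float32")

def dft_conv(basis):
    # One Conv1D per component. Frozen here; leaving it trainable would let the
    # time-frequency basis be fine-tuned together with the rest of the model.
    layer = keras.layers.Conv1D(
        filters=num_bins,
        kernel_size=frame_length,
        strides=frame_step,
        use_bias=False,
        trainable=False,
    )
    layer.build((None, None, 1))
    layer.set_weights([basis[:, None, :]])  # kernel shape: (kernel_size, channels, filters)
    return layer

real_conv = dft_conv(real_basis)
imag_conv = dft_conv(imag_basis)

signal = np.random.randn(1, 16000, 1).astype("float32")  # (batch, time, channels)
magnitude = keras.ops.sqrt(real_conv(signal) ** 2 + imag_conv(signal) ** 2)
print(magnitude.shape)  # (batch, frames, num_bins)
```

Freezing the kernels reproduces a plain STFT front-end; leaving them trainable is what enables the fine-tuning use case mentioned above.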
Thanks for the update!
)
output = spectrogram_layer(input_signal)
Since this layer is fairly niche, please illustrate its usage with detailed code examples (with descriptions of what they do and what they'd be used for), e.g. "different modes of output", "non trainable init vs fine-tuning", etc.
The user should be able to read the docstring and understand: what is this layer for? when would I need it? how do I use it?
I have now added three code examples demonstrating the different use cases. Also check the more comprehensive tutorial I added here.
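(A hypothetical sketch of the "non-trainable init vs fine-tuning" pattern discussed above. The constructor arguments shown (`frame_length`, `frame_step`, `mode`) are assumptions for illustration; check the layer's docstring for the actual signature.)

```python
import keras

inputs = keras.Input(shape=(16000, 1))  # e.g. one second of 16 kHz mono audio

# (a) Frozen front-end: behaves like a fixed STFT feature extractor.
#     Argument names are assumed here, for illustration only.
frozen_spec = keras.layers.STFTSpectrogram(
    frame_length=256, frame_step=128, mode="log", trainable=False
)

# (b) Fine-tunable front-end: the STFT kernels start from the default
#     initialization but are updated during training.
tunable_spec = keras.layers.STFTSpectrogram(
    frame_length=256, frame_step=128, mode="log", trainable=True
)

features = frozen_spec(inputs)  # swap in `tunable_spec` to fine-tune the front-end
pooled = keras.layers.GlobalAveragePooling1D()(features)
outputs = keras.layers.Dense(10, activation="softmax")(pooled)
model = keras.Model(inputs, outputs)
model.summary()
```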
Looking good -- thank you for the contribution!
Added a Spectrogram layer that computes several modes of spectrograms (such as the magnitude, log-magnitude, and power spectral density). It aims to serve as an official, standardized layer for computing spectrograms as part of the model. The computations are based on Conv1D operations, which makes them parallelizable when running the model on GPUs. Also, since the computations are based on trainable kernels (which, like any other kernel, can be set to be trainable or not), further fine-tuning of the spectrogram computation is possible.
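(As a rough illustration, not the PR's actual code: how the output modes named above relate to one another, given real and imaginary STFT components such as those produced by the Conv1D kernels.)

```python
import numpy as np
import keras

def spectrogram_modes(real_part, imag_part, eps=1e-6):
    """Illustrative relation between the modes. A proper PSD would additionally
    normalize by window energy and sampling rate; that is omitted here."""
    power = real_part ** 2 + imag_part ** 2          # power per frequency bin
    magnitude = keras.ops.sqrt(power)                # "magnitude" mode
    log_magnitude = keras.ops.log(magnitude + eps)   # "log-magnitude" mode
    return {"magnitude": magnitude, "log": log_magnitude, "psd": power}

# Arbitrary shapes for demonstration: (batch, frames, frequency bins).
re = np.random.randn(1, 124, 129).astype("float32")
im = np.random.randn(1, 124, 129).astype("float32")
print({name: tuple(out.shape) for name, out in spectrogram_modes(re, im).items()})
```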