
Added a new Spectrogram layer based on Conv1D operations, supporting GPU-parallelization and fine-tuning #20313

Merged · 17 commits · Oct 5, 2024

Conversation

mostafa-mahmoud (Contributor) commented:

Added a Spectrogram layer that computes several spectrogram modes (such as magnitude, log-magnitude, and power spectral density). The layer aims to serve as an official, standardized way of computing spectrograms as part of the model. The computations are based on Conv1D operations, which makes them parallelizable when running the model on GPUs. Also, since the computations are based on kernels (which can be set trainable or not, like any other kernel), further fine-tuning of the spectrogram computation is possible.
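To make the conv-based idea concrete, here is a minimal NumPy sketch (illustrative only, not the PR's actual implementation): each STFT frame is a dot product with windowed DFT basis vectors, which is exactly what a strided Conv1D computes with one output channel per frequency bin.

```python
import numpy as np

frame_len, hop = 64, 16
window = np.hanning(frame_len)
freqs = np.arange(frame_len // 2 + 1)  # one-sided spectrum bins
n = np.arange(frame_len)
# Real and imaginary DFT kernels, pre-multiplied by the analysis window.
cos_k = window * np.cos(2 * np.pi * freqs[:, None] * n / frame_len)
sin_k = -window * np.sin(2 * np.pi * freqs[:, None] * n / frame_len)

signal = np.random.default_rng(0).standard_normal(256)
frames = np.lib.stride_tricks.sliding_window_view(signal, frame_len)[::hop]
# The strided "convolution" as a matrix product: (frames, bins).
real, imag = frames @ cos_k.T, frames @ sin_k.T
magnitude = np.sqrt(real**2 + imag**2)

# Reference: FFT of the same windowed frames.
reference = np.abs(np.fft.rfft(frames * window, axis=-1))
print(np.allclose(magnitude, reference))  # True
```

The two formulations agree exactly; the conv form simply trades the FFT's O(N log N) per frame for a batched matrix multiply that GPUs execute very efficiently.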

google-cla bot commented Oct 1, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.


codecov-commenter commented Oct 1, 2024

Codecov Report

Attention: Patch coverage is 97.64706% with 4 lines in your changes missing coverage. Please review.

Project coverage is 78.87%. Comparing base (6ec0f46) to head (738e774).
Report is 4 commits behind head on master.

Files with missing lines                               Patch %   Lines
keras/src/initializers/constant_initializers.py         95.34%   1 missing, 1 partial ⚠️
keras/api/_tf_keras/keras/initializers/__init__.py       0.00%   1 missing ⚠️
keras/src/layers/preprocessing/stft_spectrogram.py      99.17%   0 missing, 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #20313      +/-   ##
==========================================
+ Coverage   78.81%   78.87%   +0.06%     
==========================================
  Files         512      513       +1     
  Lines       49058    49233     +175     
  Branches     9033     9075      +42     
==========================================
+ Hits        38665    38834     +169     
- Misses       8529     8532       +3     
- Partials     1864     1867       +3     
Flag Coverage Δ
keras 78.73% <97.64%> (+0.06%) ⬆️
keras-jax 62.38% <96.47%> (+0.11%) ⬆️
keras-numpy 57.40% <55.88%> (-0.01%) ⬇️
keras-tensorflow 63.64% <88.23%> (+0.08%) ⬆️
keras-torch 62.37% <96.47%> (+0.11%) ⬆️


@fchollet (Member) left a comment:

Interesting feature, thanks for the PR!

What would be the main use cases for this? And what is the usage pattern in terms of which initializer to use, when to set trainable = False, etc.? Can you add a simple tutorial that demonstrates the value?



```python
@keras_export("keras.layers.Spectrogram")
class Spectrogram(layers.Layer):
```
@fchollet (Member) commented:

"spectrogram" is a bit generic, maybe there could be a more specific name?

mostafa-mahmoud (Contributor, Author) replied:

I renamed to STFTSpectrogram, which is more specific.

However, I aimed for this to be extended in later PRs to also include Mel-Spectrogram, LogMel-Spectrogram, and MFCCs. These are all audio-based spectrograms, unlike the layer I just committed which is more generic for time-series signals generally. Supporting these output modes would require extra computations at the end of the __call__ function.
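For illustration, a mel-spectrogram would amount to a fixed linear map (triangular mel filters) applied to the STFT power spectrogram, so it could indeed be appended at the end of the computation. A hypothetical NumPy sketch (not the PR's code; the filter-edge placement below is one common convention, not necessarily what a future PR would use):

```python
import numpy as np

def mel_filterbank(num_mels, num_fft_bins, sample_rate):
    # Triangular filters evenly spaced on the mel scale (hypothetical sketch).
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2), num_mels + 2))
    bin_freqs = np.linspace(0.0, sample_rate / 2, num_fft_bins)
    weights = np.zeros((num_mels, num_fft_bins))
    for i in range(num_mels):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (center - lo)
        falling = (hi - bin_freqs) / (hi - center)
        weights[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return weights

power_spec = np.random.default_rng(0).random((13, 129))   # (frames, fft_bins)
mel_spec = power_spec @ mel_filterbank(24, 129, 16000).T  # (frames, mel_bands)
```

Since this is just another fixed (or trainable) matrix multiply, it would fit either inside the current layer as an extra output mode or in a subclass.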

If all of these variations end up in one layer in the future, then the name Spectrogram may be better, since it is more generic. However, if that would be too monolithic and the variations should be handled in new layers (maybe inheriting from the current one), then the current name STFTSpectrogram is sufficient.

What do you think? Should I use STFTSpectrogram or Spectrogram? (keeping in mind the possible future extension to Mel-Spectrograms and MFCCs)

keras/src/layers/preprocessing/spectrogram.py — review thread (outdated, resolved)
keras/src/layers/preprocessing/spectrogram_test.py — review thread (outdated, resolved)

mostafa-mahmoud commented Oct 2, 2024

> Interesting feature, thanks for the PR!
>
> What would be the main use cases for this? And what is the usage pattern in terms of which initializer to use, when to set trainable = False, etc.? Can you add a simple tutorial that demonstrates the value?

This layer comes with its default initializer (the STFTInitializer in this pushed code), which computes STFT out-of-the-box.

The main two use-cases are:

  1. Set trainable=False: the layer is then purely a preprocessing layer.
  2. Set trainable=True: the layer becomes a fine-tunable spectrogram, which can be beneficial in some cases. Cheuk et al. (2020) study this.

I can craft an example using this layer.
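As a hedged sketch of the two modes, a plain Conv1D can stand in for the PR's STFTSpectrogram layer (the construction below is illustrative, not the layer's actual signature): the windowed DFT basis is loaded into the kernel, and the `trainable` flag decides whether it stays fixed or gets fine-tuned.

```python
import numpy as np
import keras

frame_len, hop, bins = 64, 16, 33
n = np.arange(frame_len)
# Windowed cosine basis as the conv kernel: (kernel_size, in_channels, filters).
kernel = np.stack(
    [np.hanning(frame_len) * np.cos(2 * np.pi * k * n / frame_len)
     for k in range(bins)],
    axis=-1,
).astype("float32")[:, None, :]

# Use case 1: trainable=False -- a frozen, pure preprocessing layer.
frozen = keras.layers.Conv1D(
    bins, frame_len, strides=hop, use_bias=False, trainable=False
)
frozen.build((None, None, 1))
frozen.set_weights([kernel])

# Use case 2: same initialization, but the optimizer may now fine-tune
# the basis jointly with the rest of the model.
tunable = keras.layers.Conv1D(
    bins, frame_len, strides=hop, use_bias=False, trainable=True
)
tunable.build((None, None, 1))
tunable.set_weights([kernel])

x = np.random.default_rng(0).standard_normal((2, 256, 1)).astype("float32")
out = frozen(x)  # expected shape: (batch=2, frames=13, freq_bins=33)
```

Both layers start from identical weights; only the gradient flow differs.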

There are three main reasons for this contribution:

  1. Standardizing spectrogram preprocessing as part of the model. This also allows deploying the model directly, without shipping separate preprocessing code to the server.

     I have often had issues with implementations differing across libraries.

     The convenience of this layer could make it standard practice, especially with Keras serialization.

  2. Using an implementation based on convolutions, which makes the operation faster on GPUs due to parallelization, compared to the standard FFT. It also removes the bottleneck of computing spectrograms on the CPU before transferring them to the GPU.

     Together, these allow faster execution.

  3. The trainable=True use case, which could lead to some improvements.


K. W. Cheuk, H. Anderson, K. Agres and D. Herremans, "nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks," in IEEE Access, vol. 8, pp. 161981-162003, 2020.

mostafa-mahmoud (Contributor, Author) commented:

@fchollet I added another use case: outputting 2D images as well as the original 1D time signals. I also added a code example as you suggested; you will find the PR on keras-io here.

@fchollet (Member) left a comment:

Thanks for the update!

keras/src/layers/preprocessing/stft_spectrogram.py — 4 review threads (outdated, resolved)
```python
)
output = spectrogram_layer(input_signal)
```

@fchollet (Member) commented:
Since this layer is fairly niche, please illustrate its usage with detailed code examples (with descriptions of what they do and what they would be used for), e.g. "different modes of output", "non-trainable init vs. fine-tuning", etc.

The user should be able to read the docstring and understand: what is this layer for? when would I need it? how do I use it?

mostafa-mahmoud (Contributor, Author) replied:
I have now added three code examples demonstrating the different use cases. Also check the more comprehensive tutorial I added here.

@fchollet (Member) left a comment:

Looking good -- thank you for the contribution!

@google-ml-butler bot added the kokoro:force-run and ready-to-pull (ready to be merged into the codebase) labels on Oct 5, 2024
@fchollet fchollet merged commit f52f9f5 into keras-team:master Oct 5, 2024
6 checks passed
@google-ml-butler bot removed the ready-to-pull and kokoro:force-run labels on Oct 5, 2024