scatter-sounds

This is a web visualization and listening project based on scatter-gl, Web Audio, and AI sound models from TensorFlow Hub. Sound datasets are usually explored visually, mainly through spectrograms or other time-frequency representations. This project shows how quickly you can get an overview of a dataset by listening to it arranged in a similarity space.

Python preprocessing scripts are provided to handle either a single long audio file or a dataset of sound clips stored in a single folder, such as ESC-50 or UrbanSound8K. The similarity space is generated with YAMNet, which is readily available through the TensorFlow Hub repository.

Examples

ESC50: 2000 clips of 50 sound categories.
Urbansound8k fold1: 875 clips of 10 sound categories.
Urbansound8k fold2: 888 clips of 10 sound categories.

Interactions

When you hover over a point or spectrogram image, you'll hear the clip and its metadata will appear on the left side of the screen.
When you click a point, you'll hear the clip looped 4 times.
Selecting a group of points plays a random clip from the selection.

Note: You'll have to wait until the page loads the soundfile in order to hear the clips.

Usage

The webpage needs 4 files to render a sound dataset.

config.json: Stores the metadata of the dataset and the paths to the data files.
projections.json: Stores the 3D projections of the YAMNet embeddings, plus the labels and other useful metadata of the clips.
sprite.jpg: The spritesheet image with the log-mel spectrogram that YAMNet uses as input for each clip of the dataset.
audio.flac: The "spriteclip" audio, with all the dataset clips merged into a single file.
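
Before opening the page, it can help to confirm that all four files are in place. The following is only a minimal sketch: the helper name `missing_dataset_files` is hypothetical, and it assumes the files keep the names listed above and sit together in one folder (the actual paths are whatever config.json points to).

```python
from pathlib import Path

# File names taken from the list above; real paths may differ if
# config.json points elsewhere (assumption).
REQUIRED_FILES = ("config.json", "projections.json", "sprite.jpg", "audio.flac")


def missing_dataset_files(dataset_dir):
    """Return the names of required files that are absent from dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the folder has everything the page expects under these assumptions.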

ESC50 Spritesheet of 2000 mel-spectrograms

Data preprocessing

The Python preprocessing script takes as input either a path to a folder that contains audio clips or a path to a long audio file.

Dataset case

Audio clips in a folder (e.g. ESC-50, UrbanSound8K)

cd preprocess 
python preprocess.py -d <path_to_audio_folder>

This will try to:

Load all the audio files in the folder.
Extract a 0.96-second region (the YAMNet analysis window size) around the maximum amplitude of each clip. Clips shorter than 0.96 seconds are padded with zeros.
Merge all the trimmed clips and resample the merged audio to the sample rate the model expects (16 kHz).
Get the YAMNet embeddings and log-mel spectrogram of the merged signal. Note: YAMNet frames overlap by half a window, so odd-index embeddings straddle two adjacent clips; they are discarded to get one embedding per clip and to avoid clipwise aggregation and inference.
Compute audio descriptors and parse labels from the clip filenames for the metadata.
Reduce the dimensionality of the YAMNet embeddings (1024) to 3 components.
Generate the spritesheet image, the sprite clip, and the projections file.
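
The trimming, merging, and frame-selection steps above can be sketched with NumPy. This is an illustration under stated assumptions, not the code in preprocess.py: the function names are hypothetical, the window is centered on the peak (the script may position it differently), and short clips are padded at the end.

```python
import numpy as np

SAMPLE_RATE = 16_000  # sample rate YAMNet expects
WINDOW_S = 0.96       # YAMNet analysis window, in seconds


def extract_peak_window(signal, sample_rate=SAMPLE_RATE, window_s=WINDOW_S):
    """Return window_s seconds of signal around its maximum amplitude."""
    win = int(round(window_s * sample_rate))
    signal = np.asarray(signal, dtype=np.float32)
    if len(signal) <= win:
        # Short clip: zero-pad at the end up to the window length (assumption).
        return np.pad(signal, (0, win - len(signal)))
    peak = int(np.argmax(np.abs(signal)))
    # Center the window on the peak, then clamp it inside the signal.
    start = min(max(peak - win // 2, 0), len(signal) - win)
    return signal[start:start + win]


def merge_clips(clips, sample_rate=SAMPLE_RATE):
    """Trim every clip to one window and concatenate the results."""
    return np.concatenate([extract_peak_window(c, sample_rate) for c in clips])


def keep_clip_aligned(frames):
    """Keep even-index frames: with a hop of half a window, frames
    0, 2, 4, ... start exactly on clip boundaries of the merged audio."""
    return frames[::2]
```

Because every trimmed clip is exactly one window long, the merged signal has one even-index YAMNet frame per clip, which is why no per-clip aggregation is needed.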

A label-parsing function and a label list have to be defined and passed as arguments to the process_clips_from_folder function. Examples are provided for the ESC-50 and UrbanSound8K datasets.
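
As an illustration of what such a label-parsing function can look like, here is a hedged sketch based on the public filename conventions of the two datasets. The function names are hypothetical and the exact signature that process_clips_from_folder expects is not shown here, so check the examples shipped in the preprocess folder before reusing this.

```python
from pathlib import Path

# UrbanSound8K class names, indexed by classID.
URBANSOUND8K_LABELS = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]


def parse_label_urbansound8k(filename):
    """UrbanSound8K files are named [fsID]-[classID]-[occurrenceID]-[sliceID].wav."""
    class_id = int(Path(filename).stem.split("-")[1])
    return URBANSOUND8K_LABELS[class_id]


def parse_target_esc50(filename):
    """ESC-50 files are named {fold}-{clip_id}-{take}-{target}.wav.

    Returns the integer target; mapping it to a name needs the 50-entry
    label list, which is omitted here.
    """
    return int(Path(filename).stem.split("-")[3])
```

For example, `parse_label_urbansound8k("100032-3-0-0.wav")` gives `"dog_bark"`, and `parse_target_esc50("1-100032-A-0.wav")` gives `0`.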

Long audio case

cd preprocess
python preprocess.py -f <path_to_long_audio>

This will try to:

Load the long audio.
Resample it to the sample rate the model expects (16 kHz).
Get the YAMNet embeddings and log-mel spectrogram of the signal.
Compute audio descriptors and, for the metadata, generate a display filename for each segment from its starting second.
Reduce the dimensionality of the YAMNet embeddings (1024) to 3 components.
Generate the spritesheet image, the sprite clip, and the projections file.
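
The display names for the segments can be derived from the frame index alone. The sketch below assumes one segment per YAMNet frame with the model's 0.48-second hop; the helper name, the `prefix` parameter, and the name format are hypothetical, and the script may use a different hop (for example 0.96 seconds if it keeps only non-overlapping frames).

```python
HOP_S = 0.48  # YAMNet hop between consecutive frames, in seconds


def segment_names(n_segments, hop_s=HOP_S, prefix="segment"):
    """Build one display name per segment from its start time in seconds."""
    names = []
    for i in range(n_segments):
        start_s = i * hop_s  # start of the i-th analysis window
        names.append(f"{prefix}_{start_s:.2f}s")
    return names
```

Calling `segment_names(3)` yields `segment_0.00s`, `segment_0.48s`, and `segment_0.96s`.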

The generated files are stored in the data folder of the project. Once they are generated, you can pass the name of your dataset as a URL argument and it will be rendered.

Note: Remember that the space rendered in the page is a projection onto 3 components computed with either UMAP (default) or t-SNE, each with its own advantages and caveats. Please read "Understanding UMAP" and "How to Use t-SNE Effectively".

Acknowledgements

This project builds on work done during GSoC 2021 with Orcasound on exploring sound datasets in embedding spaces; big thanks to my mentors. Also, if you're a Spanish speaker, I recommend this course developed by Irán Roman, who introduced me to this exciting field of sound and ML and continues to teach me about it.
