Handle audio FileObjects/FileSets in Croissant #240

marcenacp · 2023-10-12T08:21:48Z

Proposal:

We propose to handle audio features using https://schema.org/AudioObject.

Technical strategy:

- Done: Check that https://schema.org/AudioObject has all the needed attributes. Decision: use additionalProperties if any are missing (see the discussion in Handle audio ml:Fields using sc:AudioObject #242).
- Add a toy example dataset (used as a fixture in the integration tests). Make the mp3 file as small as possible so that it can be committed. You can generate the audio. For example, we generated 1-pixel images in pass-mini.
- Add a schema.org constant for sc:AudioObject in _src/core/constants.py.
- Add a case to handle audio in Python in _src/operation_graph/operations/field.py. We have to choose a library to handle audio. We recommend choosing between librosa, sounddevice, and pydub. Before choosing, list the pros and cons of each library and post them here so that a maintainer can validate the choice.
- Add unit tests when needed.
- Update the Croissant standard in the paragraph Known supported data types.

This can be split into several PRs.
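
As a rough illustration of the "generate the audio" step, here is a minimal sketch that writes a very short sine tone using only the Python standard library. It produces WAV rather than mp3, because the standard library has no MP3 encoder; converting the result to mp3 would need an external encoder such as ffmpeg. The function name `write_tiny_wav` and its default parameters are hypothetical and are not part of Croissant.

```python
import math
import struct
import wave


def write_tiny_wav(path, duration_s=0.01, sample_rate=8000, freq_hz=440.0):
    """Write a short mono 16-bit PCM sine tone to `path`; return the frame count.

    Hypothetical helper for building a small test fixture.
    """
    n_frames = round(duration_s * sample_rate)
    frames = bytearray()
    for i in range(n_frames):
        # Half-amplitude sine wave, scaled to the signed 16-bit range.
        value = 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        frames += struct.pack("<h", int(32767 * value))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return n_frames
```

With the defaults above the file holds 80 frames, i.e. 160 bytes of sample data plus the WAV header, which is small enough to commit as a fixture.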


monke6942021 · 2023-10-12T18:38:02Z

I think we should look into adding support for https://schema.org/VideoObject and plain binary files at some point too.

fineguy · 2023-10-16T10:00:42Z

I had a look at some audio libraries; here are my thoughts. In short: I'm in favor of using librosa.

Libraries overview

Things in common:

pros:
- All use an MIT, ISC, or BSD 3-Clause license.
cons:
- All are still at a 0.* version.

librosa:

Uses soundfile or audioread.
pros:
- Has rich documentation with lots of examples.
- Downstream libraries support many audio formats.
cons:
- Doesn't support integer-value samples. The rationale is that downstream analyses would implicitly convert to floating point anyway.

sounddevice:

Provides bindings for PortAudio. It's mainly focused on playing and recording audio.

pydub:

Uses ffmpeg or libav (an abandoned project) for file reading/writing.
pros:
- Supports practically all audio formats.
cons:
- Returns integer-value samples which might require additional conversion.
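
To make the integer-versus-float point concrete, here is a minimal sketch of the "additional conversion" between signed 16-bit PCM samples and floating-point samples. It is written in plain Python and calls neither pydub nor librosa; the function names are hypothetical, and 16-bit samples are an assumption (other sample widths would need a different scale factor).

```python
from array import array


def int16_to_float(samples):
    """Map signed 16-bit PCM samples to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def float_to_int16(samples):
    """Map floats in [-1.0, 1.0] to signed 16-bit integers, clipping overflow."""
    return array(
        "h",
        (max(-32768, min(32767, round(s * 32768.0))) for s in samples),
    )
```

Note that +1.0 has no exact 16-bit counterpart, so it is clipped to 32767 on the way back.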

soundfile:

Uses libsndfile for file reading/writing.
pros:
- Supports many audio formats.
cons:
- The online documentation is slightly behind the actual repository. E.g. it lacks information about MP3 support.

audioread:

pros:
- Supports many backends for file reading.
cons:
- Doesn't support file writing.

Conclusion

It looks to me like librosa and pydub are the two most used Python libraries for audio processing. pydub was last released in 2021, while librosa has been steadily updated. Given that librosa also has better documentation, I'd recommend using it.

fineguy · 2023-10-20T14:30:27Z

I also had a look at the most popular audio datasets from Hugging Face and Papers With Code. They all use either FLAC or WAV audio formats. The only exception is Common Voice which uses MP3.

monke6942021 · 2023-12-19T19:44:12Z

Hey, I noticed that in #242, one of the attributes we look at is the bitrate. What do we do if there are multiple bitrates because there are multiple mp3 files?

marcenacp assigned monke6942021 Oct 13, 2023
