Handle audio FileObjects/FileSets in Croissant #240

Open
marcenacp opened this issue Oct 12, 2023 · 4 comments
@marcenacp

marcenacp commented Oct 12, 2023

Proposal:

We propose to handle audio features using https://schema.org/AudioObject.
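For illustration, here is a minimal sketch of what a schema.org AudioObject description could look like when built from Python. The property names (`contentUrl`, `encodingFormat`, `bitrate`, `duration`) come from schema.org; the helper `audio_object`, the example URL, and the way this would eventually be embedded in Croissant metadata are assumptions, not part of the proposal.

```python
import json


def audio_object(content_url, encoding_format, bitrate=None, duration=None):
    """Build a schema.org AudioObject description as a JSON-LD-style dict.

    Hypothetical helper for illustration only; not a Croissant API.
    """
    node = {
        "@context": "https://schema.org",
        "@type": "AudioObject",
        "contentUrl": content_url,
        "encodingFormat": encoding_format,
    }
    # Optional properties are only emitted when they are known.
    if bitrate is not None:
        node["bitrate"] = bitrate
    if duration is not None:
        node["duration"] = duration
    return node


# Placeholder URL and an ISO 8601 duration, both made up for the example.
example = audio_object(
    "https://example.org/clip.flac", "audio/flac", duration="PT12S"
)
print(json.dumps(example, indent=2))
```

Leaving out unknown properties instead of writing empty values keeps the description valid for files whose bitrate or duration has not been inspected yet.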

Technical strategy:

This can be split into several PRs.

@monke6942021

I think we should look into adding support for https://schema.org/VideoObject and plain binary files at some point too.

@fineguy

fineguy commented Oct 16, 2023

I had a look at some audio libraries; here are my thoughts. In short: I'm in favor of using librosa.

Libraries overview

Things in common:

  • pros:
    • All use a permissive license (MIT, ISC, or BSD 3-Clause).
  • cons:
    • All are still at a 0.* version.

librosa:

  • Uses soundfile or audioread.
  • pros:
    • Has rich documentation with lots of examples.
    • Its backend libraries (soundfile, audioread) support many audio formats.
  • cons:
    • Doesn't support integer-valued samples. The rationale is that downstream analyses would implicitly cast to floating point anyway.

sounddevice:

  • Provides bindings for PortAudio. It's mainly focused on playing and recording audio.

pydub:

  • Uses ffmpeg or libav (an abandoned project) for file reading/writing.
  • pros:
  • cons:
    • Returns integer-valued samples, which might require additional conversion.
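
To make the integer vs. floating-point point concrete, here is a rough sketch of the kind of conversion that integer-valued samples would need. It uses only the Python standard library (`wave`, `array`), not pydub or librosa, and assumes 16-bit little-endian PCM; the helper name `pcm16_to_float` is made up for this example.

```python
import array
import io
import math
import sys
import wave


def pcm16_to_float(frames: bytes) -> list[float]:
    """Convert little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")  # signed 16-bit integers, native byte order
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return [s / 32768.0 for s in samples]


# Write one second of a 440 Hz tone to an in-memory mono WAV file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
    writer.setframerate(8000)
    tone = array.array(
        "h",
        (int(32767 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(8000)),
    )
    if sys.byteorder == "big":
        tone.byteswap()
    writer.writeframes(tone.tobytes())

# Read the integer samples back and scale them to floating point.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    floats = pcm16_to_float(reader.readframes(reader.getnframes()))

print(len(floats), min(floats), max(floats))
```

Dividing by 32768 maps the int16 range onto [-1.0, 1.0); other sample widths would need a different scale factor.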

soundfile:

audioread:

  • pros:
    • Supports many backends for file reading.
  • cons:
    • Doesn't support file writing.

Conclusion

It looks to me like librosa and pydub are the two most widely used Python libraries for audio processing. pydub was last released in 2021, while librosa has been updated steadily. Given that librosa also has better documentation, I'd recommend using it.

@fineguy

fineguy commented Oct 20, 2023

I also had a look at the most popular audio datasets from Hugging Face and Papers With Code. They all use either FLAC or WAV audio formats. The only exception is Common Voice which uses MP3.
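
If we ever need to tell these three formats apart without trusting file extensions, a heuristic check on the first bytes of a file might be enough. The sketch below is only an illustration: `sniff_audio_format` is a hypothetical helper, and the MP3 frame-sync test in particular can match other data (AAC ADTS streams start with similar bits, for example).

```python
from typing import Optional


def sniff_audio_format(header: bytes) -> Optional[str]:
    """Guess "wav", "flac" or "mp3" from the first bytes of a file."""
    # WAV: RIFF container whose form type is WAVE.
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # FLAC: stream marker "fLaC".
    if header[:4] == b"fLaC":
        return "flac"
    # MP3 with a leading ID3v2 tag.
    if header[:3] == b"ID3":
        return "mp3"
    # MP3 without a tag: 11 set bits of MPEG audio frame sync (heuristic).
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return None


print(sniff_audio_format(b"fLaC\x00\x00\x00\x22"))
```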

@monke6942021

monke6942021 commented Dec 19, 2023

Hey, I noticed that in #242, one of the attributes we look at is the bitrate. What do we do if there are multiple bitrates because there are multiple MP3 files?
