Error loading MP3 files from CommonVoice #5488

Closed
kradonneoh opened this issue Jan 31, 2023 · 4 comments

@kradonneoh

Describe the bug

When loading a CommonVoice dataset with datasets==2.9.0 and torchaudio>=0.12.0, I get an error reading the audio arrays:

---------------------------------------------------------------------------
LibsndfileError                           Traceback (most recent call last)
~/.local/lib/python3.8/site-packages/datasets/features/audio.py in _decode_mp3(self, path_or_file)
    310             try:  # try torchaudio anyway because sometimes it works (depending on the os and os packages installed)
--> 311                 array, sampling_rate = self._decode_mp3_torchaudio(path_or_file)
    312             except RuntimeError:

~/.local/lib/python3.8/site-packages/datasets/features/audio.py in _decode_mp3_torchaudio(self, path_or_file)
    351 
--> 352         array, sampling_rate = torchaudio.load(path_or_file, format="mp3")
    353         if self.sampling_rate and self.sampling_rate != sampling_rate:

~/.local/lib/python3.8/site-packages/torchaudio/backend/soundfile_backend.py in load(filepath, frame_offset, num_frames, normalize, channels_first, format)
    204     """
--> 205     with soundfile.SoundFile(filepath, "r") as file_:
    206         if file_.format != "WAV" or normalize:

~/.local/lib/python3.8/site-packages/soundfile.py in __init__(self, file, mode, samplerate, channels, subtype, endian, format, closefd)
    654                                          format, subtype, endian)
--> 655         self._file = self._open(file, mode_int, closefd)
    656         if set(mode).issuperset('r+') and self.seekable():

~/.local/lib/python3.8/site-packages/soundfile.py in _open(self, file, mode_int, closefd)
   1212             err = _snd.sf_error(file_ptr)
-> 1213             raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
   1214         if mode_int == _snd.SFM_WRITE:

LibsndfileError: Error opening <_io.BytesIO object at 0x7fa539462090>: File contains data in an unknown format.

I assume this is because of some issue with the mp3 decoding process. I've verified that I have ffmpeg>=4 (on a Linux distro), which appears to be the fallback backend for torchaudio (at least according to #4889).
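For anyone who wants to repeat the ffmpeg check, here is a minimal sketch using only the standard library. The helper names `parse_ffmpeg_major` and `installed_ffmpeg_major` are hypothetical, not part of datasets or torchaudio. It inspects the `ffmpeg` executable on PATH, which is only a proxy: torchaudio loads the ffmpeg shared libraries rather than the command-line tool, so a passing check here does not guarantee that torchaudio can find them.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_ffmpeg_major(version_output: str) -> Optional[int]:
    """Return the major version found in `ffmpeg -version` output, or None."""
    # Release builds print e.g. "ffmpeg version 4.2.7-0ubuntu0.1 ...".
    # Some builds prefix the number with "n"; git snapshots have no
    # leading number at all, in which case None is returned.
    match = re.search(r"ffmpeg version n?(\d+)", version_output)
    return int(match.group(1)) if match else None


def installed_ffmpeg_major() -> Optional[int]:
    """Return the major version of the ffmpeg on PATH, or None if not found."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True)
    return parse_ffmpeg_major(result.stdout)


if __name__ == "__main__":
    print("ffmpeg major version:", installed_ffmpeg_major())
```

A result of `None` means either that no `ffmpeg` executable was found or that its version line could not be parsed.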

Steps to reproduce the bug

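from datasets import load_dataset

dataset = load_dataset("mozilla-foundation/common_voice_11_0", "be", split="train")
dataset[0]  # accessing an example decodes its audio, which raises the error above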

Expected behavior

The same behavior as with torchaudio<0.12.0, which doesn't raise a LibsndfileError.

Environment info

  • datasets version: 2.9.0
  • Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • PyArrow version: 10.0.1
  • Pandas version: 1.5.1

@albertvillanova
Member

albertvillanova commented Feb 1, 2023

Hi @kradonneoh, thanks for reporting.

Please note that to work with audio datasets (and specifically with MP3 files) we have detailed installation instructions in our docs: https://huggingface.co/docs/datasets/installation#audio

  • one of the requirements is torchaudio<0.12.0

Let us know if the problem persists after having followed them.

@kradonneoh
Author

I saw that and have followed it (hence the Expected Behavior section of the bug report).

Is there no intention of updating to the latest version? It does limit the version of torch I can use, which isn’t ideal.

@polinaeterna
Contributor

@kradonneoh hey! Actually, loading mp3 files should work with ffmpeg 4, so this isn't expected behavior and we need to investigate it. It works on my side with torchaudio==0.13 and ffmpeg==4.2.7. Which torchaudio version do you use?

datasets should support decoding mp3 files with torchaudio when its version is >=0.12, but as you noted, that requires ffmpeg>=4. We need to fix this in the documentation, thank you for pointing this out!
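To see which side of that version boundary an environment falls on, something like the sketch below may help. It is only an illustration of the rule as described in this thread, not code from datasets: `release_tuple` and `mp3_requirement` are hypothetical helper names, and the 0.12 threshold and the ffmpeg>=4 requirement are taken from the discussion here.

```python
import re
from importlib import metadata
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Turn '0.13.1+cu117' into (0, 13, 1), ignoring any trailing suffix."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def mp3_requirement(torchaudio_version: str) -> str:
    """Summarize the MP3 requirement discussed in this thread."""
    if release_tuple(torchaudio_version) >= (0, 12):
        return "needs ffmpeg>=4"
    return "matches the documented torchaudio<0.12.0 pin"


if __name__ == "__main__":
    try:
        installed = metadata.version("torchaudio")
    except metadata.PackageNotFoundError:
        print("torchaudio is not installed")
    else:
        print(f"torchaudio {installed}: {mp3_requirement(installed)}")
```

The version parsing is deliberately simple: it reads the leading numeric components and ignores any suffix such as `+cu117`, which is enough for a comparison against 0.12.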

But according to your traceback, it's trying to use the libsndfile backend for mp3 decoding. The libsndfile library only supports mp3 decoding starting from version 1.1.0, which on Linux has to be compiled from source for now, afaik.
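A quick way to check which libsndfile the soundfile package has loaded is sketched below. The helper names `libsndfile_release` and `supports_mp3` are hypothetical, and the 1.1.0 threshold is the one mentioned above. The sketch reads `soundfile.__libsndfile_version__`, which reports the version of the library soundfile loaded.

```python
import re
from typing import Tuple

# Minimum libsndfile version with MP3 support, as stated in this thread.
MIN_MP3_LIBSNDFILE = (1, 1, 0)


def libsndfile_release(version: str) -> Tuple[int, int, int]:
    """Parse '1.0.31' or 'libsndfile-1.0.31' into (1, 0, 31)."""
    match = re.search(r"\d+(?:\.\d+)+", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(part) for part in match.group(0).split(".")][:3]
    parts += [0] * (3 - len(parts))  # pad '1.1' to (1, 1, 0)
    return (parts[0], parts[1], parts[2])


def supports_mp3(version: str) -> bool:
    """Return True if this libsndfile version is at least 1.1.0."""
    return libsndfile_release(version) >= MIN_MP3_LIBSNDFILE


if __name__ == "__main__":
    try:
        import soundfile
    except (ImportError, OSError):
        # OSError is raised when the libsndfile shared library is missing.
        print("soundfile is not installed or libsndfile could not be loaded")
    else:
        found = soundfile.__libsndfile_version__
        print(f"libsndfile {found}: MP3 supported = {supports_mp3(found)}")
```

If this reports a version below 1.1.0, the "File contains data in an unknown format" error in the traceback is what one would expect when an MP3 reaches the soundfile backend.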

FYI: we're also aiming to drop the torchaudio dependency entirely by the next major library release, in favor of libsndfile.

@mariosasko
Collaborator

We now decode MP3 with soundfile, so I'm closing this issue.
