`audio/wave` `.wav` files not supported #603

NightMachinery · 2024-11-03T15:51:21Z

I'm recording audio from my microphone using sox and saving the recordings as .wav files. When I try to attach these files to the gemini-1.5-flash-8b-latest model, I receive this error:

Error: This model does not support attachments of type 'audio/wave', only application/pdf, image/png, image/jpeg, image/webp, image/heic, image/heif, audio/wav, audio/mp3, audio/aiff, audio/aac, audio/ogg, audio/flac, audio/mpeg, video/mp4, video/mpeg, video/mov, video/avi, video/x-flv, video/mpg, video/webm, video/wmv, video/3gpp

I suspect the issue is simply that llm doesn't recognize that audio/wave and audio/wav are actually the same MIME type. Is this correct?

simonw · 2024-11-08T00:16:03Z

Yup, that's a bug - thanks. You can work around it with the --at option, which lets you specify the type directly:

 llm -m gemini-1.5-flash-latest --at output.wav audio/wav transcribe

Thanks for the tip about sox by the way, this worked for me on macOS:

brew install sox
sox -d output.wav                                                 
# Hit Ctrl+C when done

simonw · 2024-11-08T00:20:11Z

It looks like audio/wav is indeed the correct content type here. Not clear where audio/wave came from, but the library I'm using for content type detection - https://pypi.org/project/puremagic/ - apparently supports both wave https://github.com/cdgriffith/puremagic/blob/763349ec4d02ba930fb1142c6eb684afdf06c6ab/puremagic/magic_data.json#L103 and wav https://github.com/cdgriffith/puremagic/blob/763349ec4d02ba930fb1142c6eb684afdf06c6ab/puremagic/magic_data.json#L1118 and it looks like it detects audio/wave in preference for some reason.

simonw · 2024-11-08T00:22:37Z

puremagic uses data from https://www.garykessler.net/library/file_sigs.html - it lists two byte sequences for WAV

The first of those matches the puremagic definition of audio/wave, the second matches its audio/wav.

simonw · 2024-11-08T00:24:52Z

Interesting, the output.wav file I created using sox looks like this:

hexdump -C output.wav | head -n 4

00000000  52 49 46 46 48 e0 02 00  57 41 56 45 66 6d 74 20  |RIFFH...WAVEfmt |
00000010  28 00 00 00 fe ff 01 00  44 ac 00 00 10 b1 02 00  |(.......D.......|
00000020  04 00 20 00 16 00 20 00  04 00 00 00 01 00 00 00  |.. ... .........|
00000030  00 00 10 00 80 00 00 aa  00 38 9b 71 66 61 63 74  |.........8.qfact|

Which is BOTH of the lines in the file_sigs.html thing, so maybe I misinterpreted that and there is only one audio/wave file format and it's that?

In which case, why does puremagic have those two sequences listed separately in their magic_data.json file?
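To make the "both lines at once" observation concrete, here is a minimal sketch that splits the first 16 bytes of the hexdump above into the standard RIFF fields. The function name `describe_riff_header` is made up for illustration; it is not part of puremagic or LLM. It only shows that `RIFF` (offset 0) and `WAVEfmt ` (offset 8) sit in the same header, which is consistent with there being a single format here.

```python
import struct

# First 16 bytes of output.wav, copied from the hexdump above.
HEADER = bytes.fromhex("52 49 46 46 48 e0 02 00 57 41 56 45 66 6d 74 20")


def describe_riff_header(data: bytes) -> dict:
    """Split the first 16 bytes of a RIFF file into its fields.

    Hypothetical helper for illustration only.
    """
    if len(data) < 16:
        raise ValueError("need at least 16 bytes")
    # Layout: 4-byte chunk ID, little-endian uint32 size,
    # 4-byte form type, then the ID of the first sub-chunk.
    chunk_id, chunk_size, form_type, first_subchunk = struct.unpack(
        "<4sI4s4s", data[:16]
    )
    return {
        "chunk_id": chunk_id,
        "chunk_size": chunk_size,
        "form_type": form_type,
        "first_subchunk": first_subchunk,
    }


info = describe_riff_header(HEADER)
print(info["chunk_id"], info["form_type"], info["first_subchunk"])
print(info["chunk_size"])  # 188488

# Both signatures from the file_sigs.html listing match this one header.
matches_riff = HEADER.startswith(b"RIFF")
matches_wavefmt = HEADER[8:16] == b"WAVEfmt "
print(matches_riff, matches_wavefmt)  # True True
```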

simonw · 2024-11-08T00:29:51Z

This file in the puremagic tests has the same header: https://github.com/cdgriffith/puremagic/blob/master/test/resources/audio/test.wav

That's one of four audio files in the tests https://github.com/cdgriffith/puremagic/tree/master/test/resources/audio - and the only assertion it runs is that the file extension .wav is correctly determined: https://github.com/cdgriffith/puremagic/blob/763349ec4d02ba930fb1142c6eb684afdf06c6ab/test/test_common_extensions.py#L43-L49

simonw · 2024-11-08T00:41:45Z

Filed an issue here:

.wav files detected as audio/wave when maybe they should be audio/wav cdgriffith/puremagic#104

But seeing as IANA doesn't list either audio/wav or audio/wave on https://www.iana.org/assignments/media-types/media-types.xhtml#audio it's not clear that there IS a correct answer here!

simonw · 2024-11-08T00:45:10Z

Also relevant:

python -c 'import puremagic, pprint, sys; pprint.pprint(puremagic.magic_stream(open(sys.argv[-1], "rb")))' output.wav

[PureMagicWithConfidence(byte_match=b'RIFFH\xe0\x02\x00WAVE', offset=8, extension='.wav', mime_type='audio/wave', name='Waveform Audio File Format', confidence=0.8),
 PureMagicWithConfidence(byte_match=b'WAVEfmt ', offset=8, extension='.wav', mime_type='audio/x-wav', name='Windows audio file ', confidence=0.8),
 PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.4xm', mime_type='', name='4X Movie video', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.cdr', mime_type='', name='CorelDraw document', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.avi', mime_type='video/avi', name='Resource Interchange File Format', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.cda', mime_type='', name='Resource Interchange File Format', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.qcp', mime_type='audio/vnd.qcelp', name='Resource Interchange File Format', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.rmi', mime_type='audio/mid', name='Resource Interchange File Format', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.wav', mime_type='audio/wav', name='Resource Interchange File Format', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.ds4', mime_type='', name='Micrografx Designer graphic', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.ani', mime_type='application/x-navi-animation', name='Windows animated cursor', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.dat', mime_type='video/mpeg', name='Video CD MPEG movie', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.cmx', mime_type='', name='Corel Presentation Exchange metadata', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.webp', mime_type='image/webp', name='RIFF WebP', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'WAVE', offset=8, extension='.wav', mime_type='audio/x-wav', name='WAV audio', confidence=0.4)]

simonw · 2024-11-08T00:49:07Z

For the moment I'm going to take the opinion that audio/wav is correct and have LLM treat audio/wave as audio/wav in core. I'll change that if it turns out to be a mistake in the future.
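A rough sketch of what "treat audio/wave as audio/wav" could look like, assuming a simple alias table applied after content type detection. This is not the actual change made in LLM; `MIME_ALIASES` and `normalize_mime_type` are hypothetical names, and the only alias grounded in this thread is `audio/wave`.

```python
# Hypothetical alias table. Only audio/wave -> audio/wav comes from this thread.
MIME_ALIASES = {
    "audio/wave": "audio/wav",
}


def normalize_mime_type(mime_type: str) -> str:
    """Map a detected MIME type to the spelling the models expect.

    Unknown types pass through unchanged (lower-cased and stripped).
    """
    cleaned = mime_type.strip().lower()
    return MIME_ALIASES.get(cleaned, cleaned)


print(normalize_mime_type("audio/wave"))  # audio/wav
print(normalize_mime_type("image/png"))   # image/png
```

Keeping the mapping in one place means a later decision that audio/wave is the right spelling after all would be a one-line change.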

simonw · 2024-11-08T01:19:34Z

This works:

llm -m gemini-1.5-flash-latest -a output.wav transcribe

This is a quick test that I'm doing

NightMachinery · 2024-11-09T20:19:37Z

Thanks! ❤️ So llm detects the MIME type and hardcodes it for the API call? How does llm know whether the API accepts a given MIME type?

simonw · 2024-11-09T21:10:44Z

Each plugin defines its list of accepted MIME types like this:

llm/llm/default_plugins/openai_models.py

Lines 315 to 333 in 5d1d723

    
        self.attachment_types = set()
        if vision:
            self.attachment_types.update(
                {
                    "image/png",
                    "image/jpeg",
                    "image/webp",
                    "image/gif",
                }
            )
        if audio:
            self.attachment_types.update(
                {
                    "audio/wave",
                    "audio/mpeg",
                }
            )

Full docs here: https://llm.datasette.io/en/stable/plugins/advanced-model-plugins.html#attachments-for-multi-modal-models
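To illustrate the idea, here is a self-contained sketch of how a declared `attachment_types` set can be used to reject an attachment before any API call is made. It does not import LLM and is not LLM's real code: `SketchModel`, `UnsupportedAttachment` and `check_attachment` are hypothetical names, and the error wording is modelled on the message quoted at the top of this issue.

```python
class UnsupportedAttachment(ValueError):
    """Raised when a model does not list the attachment's MIME type."""


class SketchModel:
    # Hypothetical model: the plugin author declares what the API accepts.
    model_id = "sketch-audio"
    attachment_types = {"audio/wav", "audio/mpeg"}


def check_attachment(model, mime_type: str) -> None:
    """Raise if mime_type is not in the model's declared attachment_types."""
    if mime_type not in model.attachment_types:
        supported = ", ".join(sorted(model.attachment_types))
        raise UnsupportedAttachment(
            f"This model does not support attachments of type "
            f"'{mime_type}', only {supported}"
        )


model = SketchModel()
check_attachment(model, "audio/wav")  # accepted, no exception

try:
    check_attachment(model, "audio/wave")
except UnsupportedAttachment as ex:
    print(ex)
```

Under this scheme the check is a plain set membership test, which is why a detected type of audio/wave fails even though the model lists audio/wav.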

simonw added the bug and attachments labels Nov 6, 2024

simonw closed this as completed in 5d1d723 Nov 8, 2024

simonw added a commit that referenced this issue Nov 13, 2024

audio/wav not audio/wave, refs #603

145b5cd

simonw added a commit that referenced this issue Nov 13, 2024

audio/wav not audio/wave, refs #603

7520671

simonw added a commit that referenced this issue Nov 14, 2024

Release 0.18a0

041730d

Refs #507, #599, #600, #603, #608, #611, #612, #613, #614, #615, #616, #621, #622, #623, #626, #629

simonw added a commit that referenced this issue Nov 17, 2024

Release 0.18

a6d62b7

Refs #507, #600, #603, #608, #611, #612, #614

simonw added a commit that referenced this issue Nov 18, 2024

Release 0.18

fcdac08

Refs #507, #600, #603, #608, #611, #612, #614
