
Raw audio recording not supported #2391

Closed
goldwaving opened this issue Mar 23, 2021 · 38 comments
Labels: category: new feature

Comments

@goldwaving

I am creating an audio editor app. To allow editing of newly recorded audio, raw audio needs to be obtained. Unless I am missing something, this basic functionality seems to be missing from the specification. Here is what I have found so far:

  1. ScriptProcessorNode is deprecated and should not be used.
  2. MediaRecorder does not support raw audio.
  3. AnalyserNode does not provide seamless sequential data (it has gaps or overlaps).
  4. AudioWorklet is not supported on Safari and is needlessly complicated for such a simple task.

Is there another option I have not discovered yet? If not, could an AudioPassThroughNode be considered? It would have an ondata event that provides seamless, raw audio data passing through the node, perhaps with a starting time, an ending time, and other details. Alternatively, requiring support for raw audio in MediaRecorder would work.

@rtoy (Member) commented Mar 23, 2021

The ScriptProcessorNode is deprecated, but I personally do not expect it to be removed any time soon. It will likely continue to work in all browsers for a very long time.

It's also unlikely WebAudio will fix this since MediaRecorder exists. If you want raw audio, you should file an issue on the MediaRecorder spec, and/or with the browsers, to support raw audio in some form. (I personally would like it to support some kind of lossless mode, compressed or not.)

@goldwaving (Author)

Thanks for the reply. I'll use ScriptProcessorNode for now. I would still like to see an AudioPassThroughNode, which should be very easy to implement, or maybe even just a timestamp added to the AnalyserNode so we can know the exact time frame of the samples. (AudioContext may not provide a sample-accurate time.)
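
For reference, a minimal sketch of the ScriptProcessorNode capture path (untested, error handling omitted; assumes microphone input via getUserMedia and an async/module context):

const ctx = new AudioContext();
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const source = ctx.createMediaStreamSource(stream);
// 4096-frame buffer, 1 input channel, 1 output channel.
const recorder = ctx.createScriptProcessor(4096, 1, 1);
const chunks = [];
recorder.onaudioprocess = (e) => {
  // Copy the data: the engine reuses the underlying buffer between callbacks.
  chunks.push(new Float32Array(e.inputBuffer.getChannelData(0)));
};
source.connect(recorder);
recorder.connect(ctx.destination); // some browsers only fire callbacks when the node has a connected output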

@rtoy (Member) commented Mar 24, 2021

What is an AudioPassThroughNode and what does it do? You can file a feature request for this node, if you like.

We could add a timestamp to the AnalyserNode. Most likely this would tell you the context time of the first sample in the time domain data.

@guest271314's comment was marked as off-topic.

@goldwaving (Author)

An AudioWorklet or any kind of processor node is too extreme for such a simple requirement (and it is not supported in Safari anyway).

An AudioPassThroughNode would be a very basic node that allows data to pass through it unmodified. It would simply have an event that provides AudioBuffers sequentially (gapless/seamless). This would allow an app to extract the raw audio data from any point within the graph.
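
Purely for illustration (no such node exists in any spec; the node name, ondata, startTime, and endTime are all hypothetical), usage might look like:

// Hypothetical node and event shape; nothing here is spec'd.
const tap = new AudioPassThroughNode(ctx);
tap.ondata = (e) => {
  // e.buffer: an AudioBuffer of sequential, gapless samples
  // e.startTime / e.endTime: context times bounding the buffer
  chunks.push(e.buffer.getChannelData(0).slice());
};
source.connect(tap);
tap.connect(ctx.destination);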

@guest271314 posted several comments that were marked as off-topic.

@goldwaving (Author)

Thanks for the suggestions. MediaRecorder and decodeAudioData are not options. Any encoding/decoding of the audio must be avoided in audio editing software, and MediaRecorder does not support unencoded audio on most web browsers. Also, live access to the audio data is required for peak meters and waveform drawing. decodeAudioData resamples to the AudioContext's sampling rate, which is also undesirable if getUserMedia uses a different sampling rate.

Maybe AudioPassThroughNode is not the best name. Perhaps AudioBufferNode or AudioDataEventNode would be better. Right now there is no node to passively access audio data passing through the graph in a seamless/gapless way.

For now I'll have to use ScriptProcessorNode until an alternative becomes more widely available.

@guest271314's comment was marked as off-topic.

@goldwaving (Author)

I should have stated in the OP that access to the live recorded data is required. Sorry about that.

If MediaRecorder is not or cannot be used, how do you record the audio in the first place?

That is the problem and why ScriptProcessorNode is the only solution at the moment.

Yes, AudioWorklet could be used on most browsers, but it has the following disadvantages:

  1. It requires developers to implement this simple functionality each time. I think a basic node like this should be part of Web Audio, rather than something every developer has to implement.
  2. It adds complexity, requiring a separate js file, registration, etc.
  3. It runs on a different thread.
  4. It has more overhead when all we need is to copy the data and not modify it.

@guest271314's comment was marked as off-topic.

@goldwaving (Author)

Every new developer who needs to examine or copy the raw audio in a graph (there seem to be quite a few of them, judging by my search for a solution) would have to re-implement this simple functionality as an AudioWorklet. It would be easier and better to have a dedicated node for this purpose. Instead of making developers search for how to do it using an AudioWorklet, they'd just create the dedicated node and use it.

Making developers write many lines of code for something this trivial is not developer friendly. Requiring developers to learn about AudioWorklet (and moving data between threads) for something this trivial also isn't. I can see many new developers struggling with this in the future. A dedicated node would have made things easier for me. I had to invest way too much time in figuring this out, then resort to a deprecated feature. Something like this should already be there.

AudioWorklets are great for many different things. This isn't one of them.

@rtoy (Member) commented Mar 30, 2021

So, you basically want an AudioPassThruNode that takes its input and periodically fires an event containing the audio data received since the last event was fired?

That could be a lot of events, generating lots of garbage.

Introspection like this was never part of the original design. I can see it being useful for looking at various parts of the graph to see what is happening. Kind of like an oscilloscope probe at desired points in the graph.

@bradisbell

> That could be a lot of events, generating lots of garbage.

That "garbage" is exactly the audio data we're using for our applications. :-) Yes, that's a lot of events, and yes they are useful/required for many use cases. Sure, there's a lot of overhead, and yet doing audio is exactly what we need to do.

In my own usage of the Web Audio API, almost everything I've built requires the ScriptProcessorNode to capture data, precisely because there is no other way to capture raw audio data. Even once AudioWorklet becomes viable, I think there will be a lot of overhead in shuffling buffers around to get that data back to the main thread/context. Even if MediaRecorder were to support WAV, it still only emits chunks when it feels like it, rather than being locked to the audio processing time. (You can specify a time, but it can't be guaranteed, as it depends on the container and such. And realistically, we don't always want a container. WAV container support would be great, but there are plenty of use cases where we just want raw PCM samples.)
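
To illustrate the timeslice behavior (a sketch; `stream` is assumed to be a getUserMedia stream, the mimeType is an assumption, and chunk delivery timing is best-effort):

const rec = new MediaRecorder(stream, { mimeType: 'audio/webm;codecs=opus' });
const blobs = [];
rec.ondataavailable = (e) => blobs.push(e.data); // encoded, containerized chunks
rec.start(1000); // ask for ~1 s chunks; actual timing is not guaranteed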

@rtoy (Member) commented Mar 30, 2021

Basically, this is a synchronous version of an AnalyserNode that only returns time-domain data.

@guest271314's comments were marked as off-topic.

@padenot (Member) commented Apr 1, 2021

AudioWG call:

  • Adding a new node for this while MediaRecorder exists doesn't feel right
  • It's better to add RIFF WAV (possibly with different bit depth, probably f32 and i16 first) support to MediaRecorder, which is specifically designed to do what is needed in this thread
  • Any complex use-case can be implemented using an AudioWorklet and ring buffers, as is done natively (see the sketch after this list). WebCodecs is coming as well, but for WAV it's easy. There is no more overhead than native, because SharedArrayBuffer is available now.
  • In general there is no "requirement" on which codecs a browser must support (a bit like there's no requirement for browsers to implement specifications in their entirety, or at all)
  • Safari's AudioWorklet implementation is around the corner
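
A minimal sketch of the ring-buffer idea (illustrative only: single channel, no overflow handling, producer and consumer actually live on different threads; SharedArrayBuffer requires cross-origin isolation):

// Shared layout: two Int32 indices, then CAPACITY Float32 samples.
const CAPACITY = 48000; // one second at 48 kHz
const sab = new SharedArrayBuffer(8 + CAPACITY * 4);
const indices = new Int32Array(sab, 0, 2); // [0] = write index, [1] = read index
const samples = new Float32Array(sab, 8, CAPACITY);

// Producer side, called from an AudioWorkletProcessor's process() with one
// 128-frame channel of input. Wait-free: a single atomic store publishes the data.
function push(chunk) {
  const w = Atomics.load(indices, 0);
  for (let i = 0; i < chunk.length; i++) samples[(w + i) % CAPACITY] = chunk[i];
  Atomics.store(indices, 0, (w + chunk.length) % CAPACITY);
}

// Consumer side, polled from a Worker: drain everything written so far.
function pull() {
  const w = Atomics.load(indices, 0);
  const r = Atomics.load(indices, 1);
  const n = (w - r + CAPACITY) % CAPACITY;
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) out[i] = samples[(r + i) % CAPACITY];
  Atomics.store(indices, 1, w);
  return out;
}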

@guest271314's comment was marked as off-topic.

@goldwaving (Author)

As someone who's been dealing with Wave files for over 25 years, my advice about adding RIFF Wave to MediaRecorder, and potentially having malformed Wave files with zero-length RIFF and 'data' chunks, is: don't do it! Also, it is not safe to assume the Wave header is always 44 bytes. If the file contains 24-bit or multichannel audio, WAVE_FORMAT_EXTENSIBLE must be used, which has a completely different chunk size. Forcing developers to skip the Wave header just to get to the raw data is not a good idea. Trust me. Just give us the raw data, please.

I will reiterate that many developers (myself included) will continue to use ScriptProcessorNode because it does exactly what we need: we have some control over latency/block size, we get real-time raw data, and (very important) it is very easy to use (much easier than setting up an AudioWorklet).

If raw audio support were mandatory for MediaRecorder, that would help.

Good to know that Safari will eventually support AudioWorklet, but I'm still waiting for SharedArrayBuffer support. :)

@guest271314's comments were marked as off-topic.

@mdjp transferred this issue from WebAudio/web-audio-api-v2 Sep 23, 2021
@mdjp added the category: new feature label Sep 23, 2021
@emilfihlman

It's absurd that this still hasn't been fixed.

The amount of work that needs to be done just to get raw samples from the microphone is absurd, and very prone to introducing bugs and wasting everyone's time.

There is simply no reason why webaudio doesn't directly support getting raw sample chunks.

@padenot (Member) commented Nov 27, 2023

This has been possible for years; many web apps are doing it. AudioWorklet is the solution for accessing audio sample data on the web. If you don't care about performance at all, you can simply postMessage the input chunks and call it a day: interleave, convert to e.g. S16 samples, and slap a RIFF header on it.
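
For example (a sketch for mono 16-bit PCM only; a real implementation would handle multiple channels, and WAVE_FORMAT_EXTENSIBLE for other layouts):

function float32ToWav(samplesF32, sampleRate) {
  // 44-byte canonical PCM header + 16-bit little-endian samples.
  const buf = new ArrayBuffer(44 + samplesF32.length * 2);
  const v = new DataView(buf);
  const str = (off, s) => { for (let i = 0; i < s.length; i++) v.setUint8(off + i, s.charCodeAt(i)); };
  str(0, 'RIFF'); v.setUint32(4, 36 + samplesF32.length * 2, true); str(8, 'WAVE');
  str(12, 'fmt '); v.setUint32(16, 16, true);
  v.setUint16(20, 1, true);               // PCM
  v.setUint16(22, 1, true);               // mono
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, sampleRate * 2, true);  // byte rate = rate * block align
  v.setUint16(32, 2, true);               // block align
  v.setUint16(34, 16, true);              // bits per sample
  str(36, 'data'); v.setUint32(40, samplesF32.length * 2, true);
  for (let i = 0; i < samplesF32.length; i++) {
    const s = Math.max(-1, Math.min(1, samplesF32[i])); // clamp, then scale to S16
    v.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buf;
}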

https://ringbuf-js.netlify.app/example/audioworklet-to-worker/ is a full example that is suited for high-performance workloads, heavily commented, and does Web Audio API -> WAV file. It does so without touching the main thread, so it is robust against load and real-time safe.

ScriptProcessorNode is not a valid solution because there is no resilience against any kind of load when using it. It's trivial to make a page that drops some audio.

Web Codecs is also now available in Chromium, and soon others, so sample-type conversion and encoding (to lossless and lossy audio formats) are supported. It's a few lines of code to get real-time microphone data, encode it in real time to (e.g.) mp3, aac, opus, or flac, and do something with it.
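
A sketch of that path (assumes WebCodecs and MediaStreamTrackProcessor, currently Chromium-only; error handling and final flush omitted):

const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const track = stream.getAudioTracks()[0];
const frames = new MediaStreamTrackProcessor({ track });
const encoder = new AudioEncoder({
  output: (chunk) => { /* store/upload the EncodedAudioChunk */ },
  error: (e) => console.error(e),
});
encoder.configure({ codec: 'opus', sampleRate: 48000, numberOfChannels: 1 });
const reader = frames.readable.getReader();
for (;;) {
  const { value, done } = await reader.read(); // value is an AudioData frame
  if (done) break;
  // For raw samples, skip the encoder entirely:
  // value.copyTo(dest, { planeIndex: 0 });
  encoder.encode(value);
  value.close();
}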

I'm closing this because the Web Audio API doesn't really deal with encoding: it's a processing API, and there are solutions already.

@padenot closed this as completed Nov 27, 2023
@emilfihlman commented Nov 27, 2023

AudioWorklet is an abysmal "solution", and not simple at all.

I do not want any encoding, I want raw samples, and there is simply no reason why MediaRecorder can't be required to support raw samples, or why an analyser node on the microphone input can't hand out chunks via a callback without dropping any of them.

The absurd thing is that sending raw samples has been made super easy with buffers, but getting raw samples is put behind absurd complexity. Even more absurd is that one can get time-domain samples, but not via callbacks, with a guarantee that no samples are dropped.

@padenot (Member) commented Nov 27, 2023

It's so absurdly complex that an entire example to get raw samples is 25 lines of code, with about 50% of the lines being boilerplate.

<script type="worklet">
  registerProcessor('test', class param extends AudioWorkletProcessor {
    constructor() { super(); }
    process(input, output, parameters) {
      this.port.postMessage(input[0]);
      return true;
    }
  });
</script>
<script type=module>
  var ac = new AudioContext;
  var worklet_src = document.querySelector("script[type=worklet]")
  const blob = new Blob([worklet_src.innerText],
                        {type: "application/javascript"});
  var url = URL.createObjectURL(blob);
  await ac.audioWorklet.addModule(url);
  var worklet = new AudioWorkletNode(ac, 'test', {});
  var osc = new OscillatorNode(ac);
  osc.start();
  osc.connect(worklet)
  worklet.connect(ac.destination);
  worklet.port.onmessage = (e) => {
    console.log(e.data[0]);
  }
</script>

@emilfihlman

And the reason it couldn't be one line
analyser.addEventListener("dataavailable", callback)
is?

@nickjillings

Callbacks on the main thread are a terrible idea for performance; you'll just lock up your app. You'll get more callbacks than you need, whereas an AudioWorklet gives you a dedicated thread to process the audio and send back only the information you actually need.

@emilfihlman

That's based on literally nothing. At 48 kS/s and 1024 samples per callback, that's only about 47 calls per second, far fewer than a typical requestAnimationFrame, which usually runs at 60 fps (or 120/144 fps on modern phones) and usually does far more work than analysing audio requires.

Patronizing or spreading FUD is not ok.

@padenot (Member) commented Nov 28, 2023

> And the reason it couldn't be one line
> analyser.addEventListener("dataavailable", callback)
> is?

Yes. Unconditionally firing an event isochronously from a real-time audio thread to the main thread, with an expectation of real-time guarantees and no dropped audio buffers, with fixed buffering, and with no way to handle main-thread overload or any other form of control, is simply bad design and doesn't work, in the same way that ScriptProcessorNode doesn't work. Under ideal, controlled conditions it's fine, but in the real world it isn't: main-thread load (more precisely, main-thread responsiveness) isn't something the website developer can control in practice.

If the main thread isn't responsive for some time, you suddenly have a large number of events queued to its event loop. When it's processing events again, it now has to process all those events, loading the main thread again, delaying more events to be delivered, etc.

This is the same reason why requestAnimationFrame(...) isn't an event: avoiding problems piling up under load without a way to handle back-pressure.

In the case of the Web Audio API, developers can instead devise their own mechanism that suits their use-case better, using either the lower-level primitive that is the AudioWorklet, or the higher-level primitive that is the AnalyserNode.

If it's for recording and not dropping buffers is important, use a worker and buffer there. If it's for visualization, maybe compute the values needed on the real-time audio thread and send only the desired characteristics to the main thread, etc.

For that, it's possible to use message passing (postMessage(...)), which is easy to use but less efficient (because it can cause allocations and garbage collection); it's also possible to use real-time-safe, wait-free ring buffers based on atomics if the application's needs are greater and its resilience is important.

Finally, the AnalyserNode suits the common case of needing some form of data for analysis, with some windowing, but with an explicit non-guarantee that you can get the entirety of the time-domain data without discontinuities or overlaps.
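
For the visualization case, that path is only a few lines (a sketch; assumes an existing AudioContext `ctx` and source node `source`, and the fftSize of 2048 is an arbitrary choice):

const analyser = new AnalyserNode(ctx, { fftSize: 2048 });
source.connect(analyser);
const buf = new Float32Array(analyser.fftSize);
(function draw() {
  analyser.getFloatTimeDomainData(buf); // latest window; gaps/overlaps possible
  // ...render buf to a canvas...
  requestAnimationFrame(draw);
})();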

@goldwaving (Author) commented Nov 28, 2023

To follow up to my previous post and give some real world feedback...

Safari finally supports AudioWorklets, so I redesigned playback and recording in my app to use them. Unfortunately Safari had a major bug that caused distortion, but that has since been fixed. However audio playback on Safari is still very poor with frequent crackling and glitches when simply tapping the screen, even in the most basic app.

Using AudioWorklets requires more coding and has a steeper learning curve than should be required for such a simple task. It is really just a work-around to overcome the real problem.

The biggest design flaw is that AudioContext is tied to the main thread (which may explain Safari's poor quality). You have to create an AudioContext on the main thread. Playback has to be started, stopped, and managed on the main thread. AudioContext is tangled up in so many other things that prevent it from being available in Workers. If AudioContexts could be created in Workers, it would make this whole argument about ScriptProcessorNode and AudioWorklets irrelevant. There would be zero burden on the main thread.

On a separate note, decodeAudioData does not belong in AudioContext at all. Moving that functionality to the WebCodecs API, so it can actually handle containers, would make far more sense. Web browsers already have all that code to handle many different file types, but it is mostly wasted in the very limited decodeAudioData function.

@padenot (Member) commented Nov 28, 2023

> Safari finally supports AudioWorklets, so I redesigned playback and recording in my app to use them. Unfortunately Safari had a major bug that caused distortion, but that has since been fixed. However audio playback on Safari is still very poor with frequent crackling and glitches when simply tapping the screen, even in the most basic app.
>
> Using AudioWorklets requires more coding and has a steeper learning curve than should be required for such a simple task. It is really just a work-around to overcome the real problem.
>
> The biggest design flaw is that AudioContext is tied to the main thread (which may explain Safari's poor quality). You have to create an AudioContext on the main thread. Playback has to be started, stopped, and managed on the main thread. AudioContext is tangled up in so many other things that prevent it from being available in Workers. If AudioContexts could be created in Workers, it would make this whole argument about ScriptProcessorNode and AudioWorklets irrelevant. There would be zero burden on the main thread.

Those are simply Safari bugs that have nothing to do with the fact that you instantiate an AudioContext on the main thread. Case in point: the two other implementations are able to run very heavy workloads at low latency on the same host.

AudioWorkletProcessor's process method is very much how you do low-latency, high-performance, real-time audio on every desktop and mobile platform I know of (and I know all of them that aren't niche): a callback called on a real-time thread that provides input and requests output. Since it's not possible to create a Web Worker that has real-time priority, being able to instantiate an AudioContext in a worker wouldn't change the fact that ScriptProcessorNode isn't adequate. We're doing it anyway for other reasons (#2423). And even if it were possible (something we're investigating), it's not OK to force a context switch just to do real-time audio.

> On a separate note, decodeAudioData does not belong in AudioContext at all. Moving that functionality to the WebCodecs API, so it can actually handle containers, would make far more sense. Web browsers already have all that code to handle many different file types, but it is mostly wasted in the very limited decodeAudioData function.

decodeAudioData shipped on the Web for too long before the cross-browser standardization effort started, and it's impossible to remove now; that would break too many websites. We were at least able to remove the version that blocks the main thread until the entire file is decoded.

@goldwaving (Author)

> Those are simply Safari bugs

Agreed, but the difficulty Apple is having with playing defect-free audio might suggest that WebAudio is more complicated than it should be.

> Since it's not possible to create a Web Worker that has real-time priority, being able to instantiate an AudioContext in a worker wouldn't change the fact that ScriptProcessorNode isn't adequate

In real-world apps, AudioWorklets have to interact with Workers or the main thread, where real-time priority is not available. Without careful implementation, you end up with the same problems as ScriptProcessorNode, but with even more overhead (as in your example above). AudioWorklets may be great for the small percentage of high-end developers who need super-low-latency, real-time audio for a very specific use, but what about everyone else who just wants a simple way to get the audio data? ScriptProcessorNode with a larger buffer size was perfectly adequate.

> decodeAudioData shipped on the Web for too long before the cross-browser standardization effort started, and it's impossible to remove now

Of course, but the lack of a better API after all this time is disappointing, and if ScriptProcessorNode can be deprecated, so can decodeAudioData (eventually). There is a lot of code locked behind that one function that could be the foundation for a container-handling API, which WebCodecs lacks. However, that probably should be discussed in a different topic.

@SevenSystems

@padenot

Thanks so much for posting this code. It's funny, I just spent an hour googling how to accomplish THE most basic, fundamental task of an audio recording API: RECORD RAW AUDIO 😂
