This is a proposal for a new web API, Audio Device Client, which functions as an intermediate layer between the Web Audio API and the actual audio devices used by the browser. It exposes various low-level properties that have so far been hidden from, or unavailable to, developers.
The Web Audio API has been criticized for its lack of support for proper low-level audio processing. The W3C Audio Working Group addressed the problem with the newly added Audio Worklet, but Audio Worklet is still confined by the boundaries of the Web Audio API's graph rendering mechanism.
To overcome this limitation, some "pro-audio" web apps were designed around a new processing model that bypasses most of the Web Audio API's graph system by combining Audio Worklet, SharedArrayBuffer, and Worker.
This setup lends itself to code compiled to WebAssembly, so the new model cuts engineering cost significantly: audio developers can bring existing source code (e.g. media encoding/decoding, signal processing, or an entire audio application) to the web platform with minimal effort. Additionally, deploying the same source code to multiple targets ensures identical sonic results across platforms.
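For illustration, here is a minimal sketch of the real-time side of this pattern. The processor name, the ring-buffer layout, and the index synchronization are simplified assumptions rather than part of any spec, and the Worker that produces the audio (often via WebAssembly) is elided:

```js
// ring-consumer-processor.js (runs in AudioWorkletGlobalScope): the real-time
// side of the workaround. A Worker (not shown) runs the heavy DSP and writes
// frames into a SharedArrayBuffer ahead of this processor's read index. All
// names and the synchronization details here are illustrative assumptions.
class RingConsumerProcessor extends AudioWorkletProcessor {
  constructor(options) {
    super();
    // A SharedArrayBuffer passed from the main thread via |processorOptions|.
    this.ring = new Float32Array(options.processorOptions.sab);
    this.readIndex = 0;
  }

  process(inputs, outputs) {
    const channel = outputs[0][0];
    // Copy the next 128 sample-frames out of the shared ring buffer; the
    // Worker producer is assumed to stay ahead of |readIndex|.
    for (let i = 0; i < channel.length; ++i) {
      channel[i] = this.ring[this.readIndex];
      this.readIndex = (this.readIndex + 1) % this.ring.length;
    }
    return true;
  }
}

registerProcessor('ring-consumer', RingConsumerProcessor);
```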
However, this convoluted workaround solves only some of the problems we face. Processing is still locked to the graph's render quantum size (128 sample-frames), and there is no unified API for controlling essential properties of the audio system, such as I/O device selection, multi-channel I/O support, configurable sample rates, and more.
Lastly, the slow adoption of the Web Audio API by the pro-audio and game industries is clear evidence of why this problem is important to solve.
The proposed API has the following goals:
- Provides a dedicated global scope on a separate thread (real-time/high-priority when permitted) for audio processing purposes
- Selecting audio I/O devices can be done via the MediaTrackConstraints pattern
- Supports a variable callback buffer size (as opposed to the 128 sample-frame limitation of the Web Audio API)
- Provides implicit sample rate conversion when needed
- Serves I/O audio data from hardware in a single callback function
- Provides an AudioContext instance when requested
- HTTPS-only and follows the autoplay policy
- Does NOT replace the Web Audio API
These capabilities enable use cases such as:
- Media encoding/decoding services
- A teleconference app with custom audio processing and direct hardware access (e.g. complex echo cancellation, source separation, audio spatialization, and auditory scene analysis/music information retrieval)
- Porting existing audio engines to the web platform with minimal engineering cost
- A hybrid audio application that uses both the Web Audio API's graph system (for processing and synthesis) and a customized audio hardware configuration
As discussed, the Audio Device Client will be an intermediate layer between the Web Audio API and the actual audio devices used by the browser's audio service. When it is instantiated, the user agent (UA) will configure the hardware accordingly and set up a global scope with an audio rendering thread.
For querying available devices, the existing mediaDevices.enumerateDevices() pattern can be used. The device client constructor takes a constraint dictionary and generates an instance when the query is acceptable to the UA; otherwise, the returned promise is rejected.
```js
(async () => {
  const devices = await navigator.mediaDevices.enumerateDevices();

  // Scenario: device #0 and #2 are audio input and output devices,
  // respectively.
  const constraints = {
    inputDeviceId: devices[0].deviceId,
    outputDeviceId: devices[2].deviceId,
    sampleRate: 8000,
    callbackBufferSize: 512,
    inputChannelCount: 2,
    outputChannelCount: 6,
  };

  const client =
      await navigator.mediaDevices.getAudioDeviceClient(constraints);
  await client.addModule('my-client.js');
  await client.start();
})();
```
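Since the constructor's promise rejects when the UA cannot satisfy the query, callers would presumably guard the call. A hedged sketch (inside an async function; the error handling shown is illustrative):

```js
// Guard against the UA rejecting the requested constraints.
let client = null;
try {
  client = await navigator.mediaDevices.getAudioDeviceClient(constraints);
} catch (error) {
  // The UA could not satisfy the constraints (e.g. an unsupported sample rate
  // or channel count); fall back to less demanding settings or notify the user.
  console.error('AudioDeviceClient request was rejected:', error);
}
```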
AudioDeviceClientGlobalScope is similar to AudioWorkletGlobalScope. In this scope, a callback function is defined (in JS, optionally delegating to WASM) so that it can be invoked by the user agent periodically (isochronously). Note that user code can trigger the associated AudioContext to render its graph by calling the contextCallback function. After that, the data rendered by the context can be processed in the same global scope.
```js
import Processor from './my-audio-processor.js';

// An imaginary function that creates storage for multi-channel audio data.
const contextOutput = generateFloat32Arrays(2, callbackBufferSize);

// Main process function of the device client's global scope.
const process = (input, output, contextCallback) => {
  // |input| will be routed to the Web Audio API's |context.source| node, and
  // the result from the context renderer will fill up |contextOutput|. With a
  // callback buffer size of 512 sample frames, this pulls the 128-frame graph
  // 4 times.
  contextCallback(input, contextOutput);

  // Takes the data rendered by the AudioContext and performs custom
  // processing to fill |output|.
  Processor.process(contextOutput, output);
};

setDeviceCallback(process);
```
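The callback could also delegate to WebAssembly, as mentioned above. A hedged sketch of that path, assuming the global scope can fetch and instantiate a module and that top-level await is available; the file name, the exports (allocate, process), and the memory layout are illustrative assumptions, and only the first channel is shown:

```js
// A hedged sketch: delegating the device callback to a compiled WASM module.
const response = await fetch('my-dsp.wasm');
const {instance} = await WebAssembly.instantiate(await response.arrayBuffer());

// Byte pointers into WASM linear memory where audio frames are exchanged;
// |allocate| is an assumed export of the module's own allocator.
const inPtr = instance.exports.allocate(callbackBufferSize * 4);
const outPtr = instance.exports.allocate(callbackBufferSize * 4);

// Recreate the view on each access in case the WASM memory grows.
const heap = () => new Float32Array(instance.exports.memory.buffer);

const process = (input, output) => {
  // Copy the first input channel into WASM memory, run the compiled DSP,
  // and copy the result back into the first output channel.
  heap().set(input[0], inPtr / 4);
  instance.exports.process(inPtr, outPtr, callbackBufferSize);
  output[0].set(heap().subarray(outPtr / 4, outPtr / 4 + callbackBufferSize));
};

setDeviceCallback(process);
```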
Lastly, an AudioContext instance can be obtained from the device client. This is optional; the user can choose not to use an AudioContext at all.
```js
// A client can instantiate an AudioContext.
const audioContext = client.getContext();
const oscillator = new OscillatorNode(audioContext);
oscillator.connect(audioContext.destination);
oscillator.start();
```
An alternative considered was changing the Web Audio API itself, but the idea was quickly turned down by the Audio Working Group because it would alter fundamental aspects of the Web Audio API implementation. It also would not address problems outside of the Web Audio API, such as audio hardware I/O configuration.
The Audio Device Client brings low-level audio functionality to the web platform without unnecessary complexity. It provides closer access to audio hardware, with configurable parameters such as sample rate, callback buffer size, and channel count. User code can process data in an isolated global scope that runs on a dedicated thread with high priority when permitted. This new design solves many issues the Web Audio API currently faces by exposing a fundamental layer of the audio system via an easy and safe path for developers.