AudioBufferSourceNode: How to interpolate after last sample? #2032
Comments
Teleconf: Firefox and Chrome pass the buffer-resampling test, so we should spec that behavior.
buffer-resampling.html doesn't expect that the value of the last sample is used for the whole subsequent interval. It expects that interpolation after the last sample of one buffer is consistent with the interpolation before the first sample of the subsequent (adjacent) buffer. Interpolation after the last sample should assume that the imaginary next sample in the buffer is zero, just like interpolation before the first sample should assume that the imaginary previous sample in the buffer was zero. That is, it would be correct to interpolate with silence.
I understand where your intuition comes from, but there is an inherent asymmetry: whenever playback starts, it starts at the first sample in the buffer at the earliest (maybe a bit later if the start time is sub-sample accurate). Therefore, there is no situation in which we interpolate with an imaginary previous sample if we are interpolating linearly. But the plot thickens! I just implemented "use the last sample for the whole subsequent interval" in Servo, and it does not make the test pass.
Apparently Blink extrapolates from the last two samples (this behavior was added along with buffer-resampling.html in [0]), and Firefox uses an interpolation algorithm that is better than linear (via libspeex). So buffer-resampling.html seems to require a better algorithm for the last interval.
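For concreteness, here is a minimal sketch of the two behaviors being compared: holding the last sample for the whole trailing interval versus extrapolating from the last two samples as Blink reportedly does. This is illustrative Rust (a Servo implementation is what's under discussion here); the function names and the `frac` convention are my own, not any engine's actual API.

```rust
/// Hold the value of the last frame for the whole trailing interval.
fn hold_last(buf: &[f32]) -> f32 {
    *buf.last().unwrap_or(&0.0)
}

/// Linearly extrapolate from the last two frames.
/// `frac` is the fractional position past the last frame, in [0, 1).
fn extrapolate_last_two(buf: &[f32], frac: f32) -> f32 {
    match buf {
        [] => 0.0,
        [only] => *only,
        [.., prev, last] => last + (last - prev) * frac,
    }
}

fn main() {
    let buf = [0.2_f32, 0.4, 0.6];
    println!("hold: {}", hold_last(&buf));                        // 0.6
    println!("extrapolate: {}", extrapolate_last_two(&buf, 0.5)); // ~0.7, continues the ramp
}
```

For a smoothly varying signal the two strategies diverge over the whole trailing interval, which is enough to trip a test comparing against an ideal resampler.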
That asymmetry is a problem when upsampling as in this test (and also if applied to non-sample-aligned start times). The noise threshold of 0.09 seems fairly high, so I wonder whether that is involved. In buffer-resampling.html, the buffer has a sample rate of 8000 Hz while the context is rendering at 48000 Hz. The first sample in the buffer represents a sinc function with frequency 8000 Hz. If that is centered on the start time, then there would be a few significantly non-zero 48000 Hz rendering samples before the start time. I found https://www.psaudio.com/article/cardinal-sinc/ helpful.
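For reference, the ideal band-limited reconstruction that the sinc argument appeals to is, with $F_s = 8000\,\text{Hz}$ the buffer's sample rate and $x[n]$ its samples:

$$
y(t) \;=\; \sum_{n=0}^{N-1} x[n]\,\operatorname{sinc}\!\left(F_s t - n\right),
\qquad
\operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u},
$$

so even the first sample $x[0]$ contributes non-zero values at times $t < 0$, which is why rendering at 48000 Hz would produce a few significantly non-zero frames before the nominal start time.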
This is definitely an issue from an audio quality standpoint, but outputting samples before the node's start time violates lines 95–96 of the playback algorithm (besides being a bit counter-intuitive). This would be impossible to implement if the node is to start playing immediately, but I agree with you: it should be considered for nodes scheduled to play in the future. I filed issue #2047 to track a specific way of implementing this.
Personally, I agree that linear interpolation is a bad idea for physical/DSP reasons (and I plan on using libspeex in Servo too, once I decide how to handle the case where the loop length is not a multiple of the "buffer offset per tick"), and I would be happy to see the spec mandate better-than-linear interpolation. But the issue you raise is mostly orthogonal to the present one, because the spec strongly suggests that linear interpolation is a valid implementation strategy. The spec says playback "may require additional interpolation between sample frames" (emphasis mine), which in my opinion requires clarification. From reading the spec, it didn't occur to me that linear extrapolation (or better, such as sinc interpolation) would be required rather than merely desirable.
I had to check the code to see what Chrome is doing. See https://cs.chromium.org/chromium/src/third_party/blink/renderer/modules/webaudio/audio_buffer_source_node.cc?rcl=d0788ba8029af2c73443ef598ed5871f1cc44450&l=348. Based on the comment there, it's linearly extrapolating the last two samples to find the output sample. I guess that's kind of reasonable, since you don't know what the following value would be once you're at the end of the buffer.
See also WebAudio/web-audio-api-v2#38.
AudioWG call: We're going to do a fix for the bit in bold, but it is indeed related to WebAudio/web-audio-api-v2#38, which we'll get clarified in V2.
A little more info from the call. We'll probably say the value is extrapolated, but leave the extrapolation method unspecified. Simple justification: if you're doing buffer stitching and have ABSNs that are contiguous parts of a large audio source, where all the pieces are basically continuous, then extrapolation will produce a value that is close to the next value from the next buffer. If you interpolate between the last sample and zero, the output will probably differ quite a bit from the next buffer's value, unless that value happened to be 0.
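A toy illustration of that justification, with made-up ramp data split across two hypothetical contiguous buffers:

```rust
fn main() {
    // A smooth ramp split across two contiguous buffers (illustrative values).
    let a = [0.1_f32, 0.2, 0.3, 0.4]; // first buffer
    let b = [0.5_f32, 0.6, 0.7, 0.8]; // the buffer stitched right after it
    let frac = 0.5; // an output position halfway through a's trailing interval

    // Extrapolating a's last two samples stays close to where b picks up:
    let extrapolated = a[3] + (a[3] - a[2]) * frac; // 0.45
    // Interpolating between the last sample and zero tears the waveform:
    let toward_zero = a[3] * (1.0 - frac);          // 0.2

    println!(
        "next buffer starts at {}, extrapolated {}, toward zero {}",
        b[0], extrapolated, toward_zero
    );
}
```

Here extrapolation lands at 0.45, next to b's first value of 0.5, while interpolating toward zero gives 0.2, an audible discontinuity at every seam.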
Also related v2 issue: WebAudio/web-audio-api-v2#25.
Let's see how this goes. Assume an `AudioBufferSourceNode` whose `buffer` has a lower sample rate than the context, so several output frames fall between consecutive buffer frames. Assume the user calls `start()` with no offset, so playback begins exactly on the first frame. Output positions that land between two buffer frames are interpolated from the frames on either side.

I think the above is straightforward. But what about the output positions that fall after the last buffer frame? There were two options here: use the value of the last sample for the whole trailing interval, or interpolate with silence.

The conclusion from the teleconf was to extrapolate to produce this output. So the last two frames of the buffer (and possibly more) are used to extrapolate an appropriate output value. Whenever we run out of data but need one more sample, we extrapolate from previous values. This includes the case where the sample rates are different or the `playbackRate` is not 1. I don't intend to put this much detail into the spec; I think we can just say that if any of the following holds for a non-looping source:

- the buffer's sample rate differs from the context's sample rate,
- the `playbackRate` is not exactly 1, or
- the start time is not aligned to a sample frame,

then the last output value is extrapolated from the last values of the buffer (see the sketch below). The extrapolation method is implementation-dependent.
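A sketch of the resulting rule, assuming linear interpolation inside the buffer and linear extrapolation (one possible implementation-dependent choice) in the trailing interval. The function name `playback_signal` and the convention that `pos` is measured in buffer frames are illustrative, not the spec's actual algorithm:

```rust
/// Value of a non-looping source at playhead position `pos`,
/// measured in buffer frames (frame k sits at pos == k).
fn playback_signal(buf: &[f32], pos: f64) -> f32 {
    let n = buf.len();
    if n == 0 || pos < 0.0 {
        return 0.0;
    }
    let i = pos.floor() as usize;
    let frac = (pos - pos.floor()) as f32;
    if i + 1 < n {
        // Interior interval: interpolate between the surrounding frames.
        buf[i] + (buf[i + 1] - buf[i]) * frac
    } else if i + 1 == n {
        // Trailing interval: no next frame exists, so extrapolate
        // (here linearly, from the last two frames when available).
        if n >= 2 {
            buf[n - 1] + (buf[n - 1] - buf[n - 2]) * frac
        } else {
            buf[0]
        }
    } else {
        // Past the trailing interval: playback has ended.
        0.0
    }
}
```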
Update `playbackSignal` to mention that extrapolation is used to compute the output sample when the buffer is not looping and we're at the end of the buffer, but need to output a sample that falls after the last buffer frame and before the next frame boundary.
The case for extrapolation is even stronger with buffer stitching. If, however, the first sample is interpolated with zero, then the last sample can be interpolated with zero, which provides consistent interpolation between contiguous buffers. If the playback algorithm doesn't support this, then it is not conforming to the stated principles "Sub-sample start offsets or loop points may require additional interpolation between sample frames" and "Resampling of the buffer may be performed arbitrarily by the UA at any desired point to increase the efficiency or quality of the output."
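To make the consistency argument concrete: if node A's trailing interval fades from its last sample toward an imaginary zero, node B's leading interval fades up from an imaginary zero to its first sample, and B is scheduled one buffer-frame period after A's last frame, then over the shared interval the two nodes' summed output is exactly the linear interpolation between the adjacent samples:

$$
\underbrace{A_{N-1}\,(1-f)}_{\text{tail of }A}
\;+\;
\underbrace{B_{0}\,f}_{\text{head of }B}
\;=\;
(1-f)\,A_{N-1} + f\,B_{0},
\qquad f \in [0,1),
$$

which is what a single node playing the concatenated buffer would produce under linear interpolation.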
Describe the issue
The spec requires interpolation for playhead positions that do not correspond to sampled times. Therefore, after each sample there is an interval of duration 1/buffer.sampleRate of valid playhead positions whose values must be interpolated. This is also true for the interval after the last sample, but the spec doesn't specify how to produce the interpolated values for this region (since there is no next point to interpolate with).
This affects existing WPT tests: buffer-resampling.html seems to assume that the value of the last sample should be used for the whole interval, but another option is to interpolate with silence.
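With the test's rates (an 8000 Hz buffer rendered by a 48000 Hz context), each buffer frame spans six output frames, so five output positions fall strictly inside the trailing interval. A quick sketch with a hypothetical 4-frame buffer, using integer arithmetic to keep the positions exact:

```rust
fn main() {
    // Rates matching buffer-resampling.html; the 4-frame buffer is hypothetical.
    let (buffer_rate, context_rate, frames) = (8_000u64, 48_000u64, 4u64);

    // Output frame k maps to buffer position k * buffer_rate / context_rate.
    for k in 0..(frames * context_rate / buffer_rate) {
        let num = k * buffer_rate; // position numerator, over context_rate
        if num > (frames - 1) * context_rate {
            // Past the last buffer frame: the spec doesn't say what to output.
            let pos = num as f64 / context_rate as f64;
            println!("output frame {k} -> buffer position {pos:.3} (unspecified)");
        }
    }
}
```

This prints the five positions between buffer frames 3 and 4 whose values the spec currently leaves undefined.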
Where Is It
https://webaudio.github.io/web-audio-api/#playback-AudioBufferSourceNode -- more precisely, "Sub-sample start offsets or loop points may require additional interpolation between sample frames" does not cover the last interval described above, because it is not between sample frames.
Additional Information
This is not related to looping, and in fact only applies when looping is disabled.