AudioBufferSourceNode: Allow for (small) negative offsets in Start() when subsampling #2047
I'm not sure how this would work in general and be self-consistent. Let's say you called … But let's say … This also begs the question of what does …
I understand your point (I had the same objections in #2032 (comment) too), but my proposal is more conservative than that. Here's an example that might help my explanation: suppose buffer.sampleRate = 8 kHz and context.sampleRate = 48 kHz. Denote by EPS the time between two buffer samples (six context ticks, 0.000125 s). In this setup, the last buffer sample corresponds to time … Similarly, my proposal is to allow …
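For concreteness, here is the tick arithmetic in that example (a trivial sketch; placing the time origin at buffer sample 0 is an assumption):

```ts
const bufferRate = 8000;    // buffer.sampleRate
const contextRate = 48000;  // context.sampleRate
const EPS = 1 / bufferRate;       // 0.000125 s between adjacent buffer samples
console.log(EPS * contextRate);   // 6 context-rate ticks per buffer sample
// With buffer sample 0 at time 0, sample k sits at k * EPS, so the last
// sample of an N-sample buffer sits at (N - 1) * EPS.
```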
I'm not saying this is worth the trouble, but it would at least make the algorithm symmetric with regard to playback direction.
I suspect there may be another way to address this.
I think that is a key part of the issue here. Half the energy of the first sample in the buffer is before the time of the first sample and half is after, so centering the first buffer sample on time …

When the sample rate of the buffer matches that of the AudioContext and …

I've been assuming that the first sample in the buffer corresponds to buffer playhead position 0 and that the last sample corresponds to playhead position …

There are similar interpretations possible for the time of the first sample rendered by the AudioContext. Presumably this should be consistent with that of the buffer playhead position. I have been assuming that the first sample rendered by the context corresponds to time zero. However, there is a strong case to indicate that this is wrong.
For zero …
Maybe we can rephrase the question a bit. Let's say the AudioBuffer creates a new (internal) resampled array. Let's also assume that the resampling is done using either a truncated sinc function or a typical linear-phase FIR interpolating (decimating) filter, so that we know the precise delay caused by the filter. Apply the filter to the original buffer to create a new buffer at the context rate. We know what the delay is, so drop the samples before the delay and keep only the remaining samples in the new buffer. Then we can process the AudioBuffer using the new array.

If the start time is on a frame boundary, we are done. If not, we can just linearly interpolate, or we can get fancy. Say the requested start time t0 lies between n/Fs and (n+1)/Fs. Create a new filter like the original interpolating filter, but one that adds an additional delay of t0 - n/Fs seconds. Apply this filter to get a new audio buffer, but keep only the samples from (n+1)/Fs onward. Output this signal at the frame boundaries.

I think this approach produces the output you want, and it still preserves the fact that for start(t0), the output is 0 for t < t0 and non-zero for t > t0.

But fundamentally, the developers who really need this kind of precise output should not depend on WebAudio doing exactly what they want. They should ensure that all buffers have rates that match the context sample rate, or that the context sample rate matches the buffer rate. They should also not do sub-sample starts and should always start on a frame boundary.
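Here is a minimal sketch of that fractional-delay resampling idea, assuming a mono Float32Array, a Hann-windowed truncated-sinc kernel, and the upsampling case (so no anti-alias scaling of the kernel is needed); the function names are hypothetical and this is not part of the Web Audio API. The interpolation is evaluated non-causally, so the filter delay discussed above is compensated directly rather than dropped afterwards, and the final cropping to samples at or after (n+1)/Fs is left out for brevity.

```ts
// Hann-windowed truncated sinc; zero for |x| >= halfWidth (in input samples).
function hannSinc(x: number, halfWidth: number): number {
  if (x === 0) return 1;
  if (Math.abs(x) >= halfWidth) return 0;
  const sinc = Math.sin(Math.PI * x) / (Math.PI * x);
  const hann = 0.5 * (1 + Math.cos((Math.PI * x) / halfWidth));
  return sinc * hann;
}

// Resample `input` (rate fsIn) to rate fsOut, delaying the whole signal by
// `fracDelaySec` seconds so that a requested sub-sample start time lands on
// an output frame boundary. Output sample n reads the input's continuous
// time axis at position n * (fsIn / fsOut) - fracDelaySec * fsIn.
function resampleWithDelay(
  input: Float32Array,
  fsIn: number,
  fsOut: number,
  fracDelaySec: number,
  halfWidth = 16,
): Float32Array {
  const ratio = fsIn / fsOut;
  const out = new Float32Array(Math.ceil(input.length / ratio));
  for (let n = 0; n < out.length; n++) {
    const pos = n * ratio - fracDelaySec * fsIn;
    const k0 = Math.max(0, Math.ceil(pos - halfWidth));
    const k1 = Math.min(input.length - 1, Math.floor(pos + halfWidth));
    let acc = 0;
    for (let k = k0; k <= k1; k++) {
      acc += input[k] * hannSinc(pos - k, halfWidth);
    }
    out[n] = acc;
  }
  return out;
}
```

For the 8 kHz → 48 kHz case from this thread, resampleWithDelay(buf, 8000, 48000, t0 - n / 48000) would yield context-rate samples already aligned to the frame grid, and cropping the leading samples as described above preserves silence before t0.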
Those leading samples are there because they contain some of the energy from the buffer signal. If delay is measured to the time corresponding to the center of the first buffer sample (this is not the only interpretation of delay), then dropping the samples before the delay when up-sampling would drop much of the energy of the first sample. I would agree, though, that it is reasonable to expect clients wanting precise output to resample themselves.
AudioWG call: We discussed this and, bottom line, this sentence:

"But fundamentally, the developers who really need this kind of precise output should not depend on WebAudio doing exactly what they want."

more or less summarizes the group's position. In this day and age with …
This refers to the discussion at #2032 (comment). In my opinion, @karlt makes a very good point in his comment: due to subsampling, "interpolating" before the first sample provides (in some cases) meaningful audio content to be played. Consider the case, extracted from buffer-resampling.html, where the buffer's sample rate is 8000 Hz and the context's sample rate is 48000 Hz. Interpolating, even linearly, before the first sample would give us 5 non-silent samples, which improve the audio quality in stitching cases.
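To make that count concrete, a small sketch (assumed linear interpolation; the sample values are hypothetical):

```ts
const ratio = 48000 / 8000;  // 6 context frames per buffer frame
const prevLast = 0.8;        // hypothetical last sample of the preceding buffer
const firstNext = 1.0;       // hypothetical first sample of the next buffer
// The 5 context frames that fall strictly between "buffer sample -1" and
// buffer sample 0; interpolating from prevLast instead of from silence is
// what keeps them non-silent when stitching.
for (let i = 1; i < ratio; i++) {
  const t = i / ratio;
  console.log((1 - t) * prevLast + t * firstNext);
}
```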
For backwards compatibility, the AudioBufferSourceNode can't start playing before its start time, so I am suggesting an opt-in alternative: allowing for negative offsets (at least those falling between "buffer sample -1" (silence) and buffer sample 0) would have the desired outcome and address @karlt's audio-quality concern. In this way, buffer stitching (as in sub-sample-buffer-stitching.html) could be achieved in a robust manner without linear extrapolation at the endpoints. The spec already allows for offsets a bit after the last buffer sample (see #2032; this behavior is tested in buffer-resampling.html), so this would merely make playback symmetric with regard to the two endpoints.
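A hypothetical usage sketch of the proposed opt-in (this is not current spec behavior; today start() throws a RangeError for a negative offset, and context, bufferB, and joinTime are assumed to be defined elsewhere):

```ts
declare const context: AudioContext;
declare const bufferB: AudioBuffer;  // the 8 kHz buffer being stitched in
declare const joinTime: number;      // context time where buffer sample 0 should land

const eps = 1 / bufferB.sampleRate;  // one buffer-sample period (0.000125 s)
const offset = -0.5 * eps;           // in (-eps, 0): between "sample -1" and sample 0
const srcB = new AudioBufferSourceNode(context, { buffer: bufferB });
srcB.connect(context.destination);
// With start(when, offset), the playhead is at `offset` at time `when`,
// so buffer sample 0 is reached exactly at joinTime, and the negative
// offset lets the resampler render the interpolated lead-in frames.
srcB.start(joinTime + offset, offset);
```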