Support sample accurate audio splicing using timestampOffset/appendWindowStart/appendWindowEnd #37
Comments
Does the suggested change not fall within the non-normative note already in the spec? (pasted below)
On reread, this looks like a request to make gapless behavior (as is in Chrome) normative, not non-normative. For v1, this remains a quality-of-implementation issue. @Melatonin64 do you have a better non-normative note that we could put into v1? Per triage process, marking V1NonBlocking to resolve any non-normative note fixes.
Thanks for your comments. I'm not entirely sure why this requires multiple decoders (for audio codecs, where every frame is a random access point AFAIK). Otherwise, I don't have anything to add to the note.
It's my understanding this would be implemented post-decode. I'm not sure either why the note mentions multiple decoders, but it is not a simple change.
It is not a simple change. Chrome does this post-decode, with parser-time marking of the encoded frames to assist the post-decode. As for multiple decoders: at minimum, a faster-than-realtime decoder would probably be necessary if the splicing for gapless is done post-decode at playback time, because some of the decoder outputs from before and after the splice are extra decoder work beyond the decoded samples actually kept. Some implementations may not be able to do this faster-than-realtime decode without having more than one decoder (this is my educated guess as to why that non-normative note is phrased that way).
Ok, thanks for your comments. I think it's a shame that implementers are not required to implement this. Are there any plans to make this behavior normative in subsequent versions of the spec?
@Melatonin64 at the moment we are focused on getting the MSE v1 spec across the line. A feature request like this imposes constraints that may be too much for some implementations, especially at this point in the spec process. I propose we move this to VNext.
@wolenetz Ok, thanks. |
IIUC, this is asking for PCM sample-accurate audio splicing, which is definitely a desirable feature. |
Yup, correct! |
See also #165 |
This feature needs to cover the various ways the audio could be spliced after decoding and applying the append window in the decoded domain. For example, if there remains overlap between the old and new audio, what are the options for browsers? Cross-fading should remain an option. This would allow sites to specify the exact position of the start of the cross-fade within an audio frame.
This was discussed on today's Media Working Group call. Since not all implementations may be capable of supporting sample-accurate splicing, perhaps a normative feature detection would be best, so apps can adapt appropriately, along with normative behavior for what an implementation must do to support interoperable gapless/sample-accurate splicing. This would enable interop tests and ideally promote better interop of bytestream-specific interpretations of metadata such as negative timestamps, edit lists, decoder preroll, and decoder delay, all to improve interoperability of gapless/sample-accurate implementations in particular. Note that cross-fading at whatever audio splice points result may depend on implementation capability (for example, Chrome currently doesn't cross-fade, yet does do sample-accurate audio splicing). See #165 for a related issue around splice rendering behavior.
Minutes from 8 Nov 2022 Media WG meeting: https://www.w3.org/2022/11/08-mediawg-minutes.html#t03 |
One of the use cases for sample-accurate-audio-splicing is gapless audio playback (by removing the excess front/back padding added by most audio codecs).
Step 9 of the for loop in the Coded Frame Processing algorithm states:
If frame end timestamp is greater than appendWindowEnd, then set the need random access point flag to true, drop the coded frame, and jump to the top of the loop to start processing the next coded frame.
For audio, this means that we'll be dropping a complete coded frame (e.g., for AAC, 1024 audio samples) even if some of those audio samples would have fallen within the append window.
This coded-frame granularity is not sufficient to achieve gapless audio playback.
It would be great if, instead, we could keep the frame and mark a range of its samples to be discarded, so that once the frame is decoded, only the samples that fall within the append window are used.
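The requested behavior could be sketched roughly as follows. This is a hypothetical illustration, not spec text: the frame field names, the function name, and the post-decode trim bookkeeping are all illustrative assumptions about how step 9 might be amended.

```javascript
// Hypothetical sketch of the requested change to step 9 of the coded frame
// processing loop: instead of dropping a coded frame whose end timestamp
// crosses appendWindowEnd, keep it and record how many decoded samples to
// discard post-decode. Field names (pts, duration) are illustrative.
function processFrameEnd(frame, appendWindowEnd, sampleRate) {
  const frameEnd = frame.pts + frame.duration;
  if (frameEnd <= appendWindowEnd) {
    return { action: 'keep' };  // frame fully inside the append window
  }
  if (frame.pts >= appendWindowEnd) {
    return { action: 'drop' };  // frame fully outside: today's behavior
  }
  // Partial overlap: keep the coded frame, and after decoding discard the
  // trailing samples that fall outside the append window.
  const keepDuration = appendWindowEnd - frame.pts;
  return {
    action: 'trim',
    discardFromEnd: Math.round((frame.duration - keepDuration) * sampleRate),
  };
}

// An AAC frame (1024 samples at 44.1 kHz) straddling appendWindowEnd keeps
// the first 441 samples and marks the remaining 583 for post-decode discard.
const r = processFrameEnd({ pts: 0, duration: 1024 / 44100 }, 441 / 44100, 44100);
```

Keeping the frame and trimming after decode is, per the comments above, essentially what Chrome's implementation already does with its parser-time marking of encoded frames.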
It's possible that this change alone is not sufficient to support sample-accurate-audio-splicing...
A test webpage can be found here (based on Dale Curtis' article).
Further discussion can be found here.
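For context, the append-window technique from Dale Curtis' gapless article boils down to computing, per track, a `timestampOffset` that shifts out the encoder priming samples and an append window that clips the padding. A minimal sketch, where the helper name and the example numbers are illustrative; a real player must read the actual encoder delay and padding from the stream's metadata:

```javascript
// Compute SourceBuffer parameters for appending the i-th track of a gapless
// playlist. Helper name and values are illustrative assumptions.
function spliceParams(trackIndex, trackDuration, encoderDelaySamples, sampleRate) {
  const delay = encoderDelaySamples / sampleRate;
  const start = trackIndex * trackDuration;
  return {
    // Shift the segment back so its first *audible* sample lands at `start`.
    timestampOffset: start - delay,
    // Clip the priming samples at the front...
    appendWindowStart: start,
    // ...and anything (trailing padding) past the track's real duration.
    appendWindowEnd: start + trackDuration,
  };
}

// Applying it to a SourceBuffer (browser-only, shown for shape):
//   const p = spliceParams(1, 10.0, 1024, 44100); // 2nd of 10 s AAC tracks
//   sourceBuffer.timestampOffset   = p.timestampOffset;
//   sourceBuffer.appendWindowStart = p.appendWindowStart;
//   sourceBuffer.appendWindowEnd   = p.appendWindowEnd;
//   sourceBuffer.appendBuffer(segmentBytes);
```

As this issue points out, this only yields sample accuracy if the implementation trims decoded samples at the window edges rather than dropping whole coded frames.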