Media elements should support a rational time value for seek() #609
There is an old Bugzilla bug for this, which I'll close and redirect here:
If the At least the WebM container stores a Basically, would it make sense to expose the resource's |
Apple's media frameworks support seeking to any rational time value. Generally, the final seek value is created by finding the least common denominator between the input value and the media's time scale. If a final value can't be created without losing precision, the value is marked as having been rounded. See CMTime and -[AVPlayer seekToTime:] for examples of platform support. But keep in mind that for MPEG containers, tracks can have different time scales from one another, and different from the movie as a whole. I suspect WebM is similar (though I can't tell from the documentation whether TrackTimecodeScale is a multiplier of the segment TimecodeScale, or an independent value). The Web platform can generally only support a single video track per |
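The conversion described above can be sketched roughly as follows. This is only an illustration of the idea (rescale a rational time and flag any loss of precision), not Apple's CMTime API; the names `convertTimescale` and `commonTimescale` and the `{ value, timescale, rounded }` shape are assumptions made for this example.

```javascript
// Hypothetical helpers illustrating rational time conversion.
// BigInt is used so that intermediate products stay exact.

function gcd(a, b) {
  while (b !== 0n) {
    [a, b] = [b, a % b];
  }
  return a;
}

// Smallest timescale in which times from both input scales are exact
// (the least common multiple of the two scales).
function commonTimescale(scaleA, scaleB) {
  const a = BigInt(scaleA);
  const b = BigInt(scaleB);
  return (a / gcd(a, b)) * b;
}

// Re-express `time` (value / timescale seconds) in `targetScale` units.
// The result is truncated toward zero; `rounded` reports whether any
// precision was lost in the conversion.
function convertTimescale(time, targetScale) {
  const target = BigInt(targetScale);
  const scale = BigInt(time.timescale);
  const numerator = BigInt(time.value) * target;
  return {
    value: numerator / scale,
    timescale: target,
    rounded: numerator % scale !== 0n,
  };
}
```

For example, `convertTimescale({ value: 29029, timescale: 30000 }, 1000)` cannot be exact, so the result carries `rounded: true`, whereas converting to `commonTimescale(30000, 48000)` loses nothing.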
For WebM, the muxer guidelines say "The @tomfinegan @vigneshvg @jzern, I see that you are the main recent contributors to libwebm, can any of you summarize how frame- or sample-accurate seeking in WebM must work, and what the structure of the metadata is? (global? per track? can vary between clusters?) |
The reasoning is listed just below the quote you provided:
It could go on and say "because block timecodes are expressed as signed 16 bit integers relative to the cluster timecode" to make things clearer. Frame accurate seeking:
I'm assuming same-accurate is a typo/autocorrect mishap and it should be sample. It's essentially the same-- just that for audio the marker is set for all blocks, so you can get a little closer to exactly what you want without the pre-roll decoding you end up doing to reach a video non-keyframe. (though one needs to be careful about codecdelay and discard padding when handling Opus audio)

Not sure about your metadata question. Are you asking if things like the video frame rate and audio sample rate are non-constant? The video definitely can be (i.e. a webcam feed run through a live encoder is rarely constant frame rate). I don't think a non-constant audio sample rate would work in any player, but I've been wrong before.
Thanks, @tomfinegan!
I saw this, but didn't realize that blocks and clusters used different representations for the timecodes. I still can't tell from the documentation, but see in libwebm that blocks use
Oops, edited in place.
In essence, is there only a single timescale (recommended to be 1.000.000) across a whole WebM file? It seems so from the documentation, and if so that's good; something like chained Ogg would not have this property, I think.

Anyway, if the timecode scale is constant across the file, how can one seek to a specific audio sample if the sample rate isn't a divisor of 1.000.000? Or indeed the start of a video frame if the source material was 29.97 fps or some such? I'm guessing that times are simply rounded to the closest possible value, but if so it seems tricky for a precise seeking API like we're discussing here to know what rational number will correspond to a specific frame, even assuming one knows what the (constant) framerate of the source material was?

@jernoble, how does this work for MPEG? Are audio and video frame time offsets stored as rational numbers, or is it also always converted to a file-wide timescale?
It's kind of buried on https://www.matroska.org/technical/specs/index.html You'll find the relevant bit in the second row of the block header structure table: It took me a couple minutes to find it again, and I already knew it was there. :) Abusing a quote from the WebM guidelines again:
Basically, the timecode scale of 1.000.000 is a requirement.
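To make the relationship between cluster and block timecodes concrete, here is a minimal sketch. It assumes the block header layout discussed above (a big-endian signed 16-bit timecode relative to the cluster) and a timecode scale of 1.000.000 ns per tick; the function names are made up for illustration and are not libwebm APIs.

```javascript
// Hypothetical helpers, not part of libwebm or any real parser.

// Interpret two header bytes as a big-endian signed 16-bit integer.
function readInt16BE(hi, lo) {
  const unsigned = (hi << 8) | lo;
  return unsigned >= 0x8000 ? unsigned - 0x10000 : unsigned;
}

// Absolute block time in nanoseconds: the cluster timecode plus the
// block's relative timecode, both in ticks, multiplied by the scale
// (nanoseconds per tick; 1000000 gives millisecond ticks).
function blockTimeNs(clusterTimecode, blockRelativeTimecode, timecodeScale = 1000000) {
  return (clusterTimecode + blockRelativeTimecode) * timecodeScale;
}
```

With millisecond ticks, a signed 16-bit offset spans -32768 to 32767 ms, so a block can sit at most about 32.8 seconds away from its cluster's timecode.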
Yes, there is rounding of the timestamp values (and the durations) when converting from time-in-{audio-samples|video-frames} to time-in-milliseconds. It is going to be tricky to seek to a precise frame or sample as a result.

TBH I don't think frame/sample accurate seeking were primary concerns at the time the guidelines were written. Being off by a frame or a sample usually isn't a big deal in terms of playback, excepting A/V sync wrt entire frames and w/sensitive viewers or very low frame rates.

Anyway, when dealing with combined playback of A/V streams most playback solutions I've seen treat the pts as informative but rely on audio hardware clocks for syncing with the video (i.e. read the samples actually played back from the hardware, and then use that to determine when it's time to render a video frame).
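A small sketch of that rounding, assuming a constant frame rate of 30000/1001 (29.97 fps) and millisecond timestamps. Both function names are hypothetical, and real muxers may round differently (for instance, by truncating).

```javascript
// Start time of frame n in whole milliseconds, for a rational frame
// rate of fpsNum / fpsDen frames per second.
function frameStartMs(n, fpsNum = 30000, fpsDen = 1001) {
  return Math.round((n * fpsDen * 1000) / fpsNum);
}

// Recover the frame index from a rounded millisecond timestamp.
// This only works if the frame rate is constant and known in advance.
function frameIndexFromMs(ms, fpsNum = 30000, fpsDen = 1001) {
  return Math.round((ms * fpsNum) / (fpsDen * 1000));
}
```

The first few frame starts come out as 0, 33, 67 and 100 ms, so the stored spacing alternates between 33 and 34 ms even though every frame lasts the same 1001/30000 s. Mapping a timestamp back to a frame therefore needs the frame rate from somewhere other than the timestamps themselves.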
Very interesting, thanks again! I don't know what that means for the HTMLMediaElement feature under consideration here, but it's a safe bet that it's not trivial to get this right across container formats, or even a single one.
Using floating point time values when seeking is inherently imprecise. Specifically, authors attempting to precisely seek to the beginning of a specific video frame will often find that they have seeked to the end of the previous video frame. This is due to rounding errors when converting from double-width floating point values used by JavaScript to the rational integer values used by media file formats.
Double-width floating point values can accurately represent integers up to 2^52. So a rational integer time value can be represented by two JavaScript numbers so long as those numbers are both smaller than 2^52.
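As a rough illustration of carrying a rational time as two JavaScript numbers, the sketch below validates both parts and compares two times exactly. The `{ value, timeScale }` shape and both function names are assumptions for this example. Note that `Number.isSafeInteger` accepts magnitudes up to 2^53 − 1, a slightly looser bound than the 2^52 mentioned above.

```javascript
// Hypothetical representation of a rational time: value / timeScale seconds.
function makeRationalTime(value, timeScale) {
  if (!Number.isSafeInteger(value) || !Number.isSafeInteger(timeScale) || timeScale <= 0) {
    throw new RangeError("value and timeScale must be safe integers, timeScale > 0");
  }
  return { value, timeScale };
}

// Exact comparison by cross-multiplication. BigInt avoids overflowing
// the safe-integer range in the intermediate products.
// Returns -1, 0 or 1.
function compareRationalTimes(a, b) {
  const lhs = BigInt(a.value) * BigInt(b.timeScale);
  const rhs = BigInt(b.value) * BigInt(a.timeScale);
  return lhs < rhs ? -1 : lhs > rhs ? 1 : 0;
}
```

Unlike a comparison of two doubles, `compareRationalTimes` treats 1/3 and 2/6 as equal without any rounding step.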
Building off of @foolip's work in issue #553, rational time seeking could be supported by adding an optional `timeScale` parameter to the `SeekOptions`, which defaults to `1` if absent.

An author with a 29.97 fps video file could then accurately seek to the 30th frame by issuing:
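A minimal sketch of what such a call might look like, under several assumptions: that the proposed method is `seek(time, options)`, that the target time in seconds is `time / timeScale`, that frames are counted from zero (so the 30th frame has index 29), and that 29.97 fps means a frame duration of 1001 ticks at a time scale of 30000. The `seekToFrame` wrapper is hypothetical and `seek()` itself is only a proposal, so the sketch falls back to `currentTime` when it is missing.

```javascript
// HYPOTHETICAL: media.seek() with a timeScale option is a proposal
// discussed in this issue, not a shipped API.
function seekToFrame(media, frameIndex, frameDuration, timeScale) {
  // Integer number of ticks; exact as long as it stays a safe integer.
  const value = frameIndex * frameDuration;
  if (typeof media.seek === "function") {
    media.seek(value, { timeScale });
  } else {
    // Lossy fallback: the division produces a double, which is the
    // imprecision the proposal is trying to avoid.
    media.currentTime = value / timeScale;
  }
  return { value, timeScale };
}
```

With a video element in `video`, `seekToFrame(video, 29, 1001, 30000)` would request the time 29029/30000 s.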
Left unspecced for now is how the author would determine the correct time scale for the movie or track. For the current proposal, the time scale could be provided out of band.