Frame accurate seeking of HTML5 MediaElement #4
Yes. Similar discussions happened during the MSE project: https://www.w3.org/Bugs/Public/show_bug.cgi?id=19676
There's some interesting research here, with a survey of current browser behaviour.
I should also mention that there is some uncertainty about the precise meaning of currentTime - particularly when you have a media pipeline where the frame/sample coming out of the end may be 0.5s further along the media timeline than the ones entering the media pipeline. Some people think currentTime reflects what is coming out of the display/speakers/headphones. Some people think it should reflect the time at which video and graphics are composited, as this is easy to test and suits apps trying to sync graphics to video or audio. Simple implementations may re-use a time available in a media decoder.
Related to the matter of frame accuracy on the whole, one idea would be to add a new property to VideoElement called …
The currently displayed frame can be hard to determine, e.g. if the UA is running on a device without a display, with video being output over HDMI, or (perhaps) in a remote playback scenario (https://w3c.github.io/remote-playback/).
Remote playback cases are always going to be best effort to keep the video element in sync with the remote playback state. For video editing use cases, remote playback is not as relevant (except maybe to render the final output). There are a number of implementation constraints that are going to make it challenging to provide a completely accurate instantaneous frame number or presentation timestamp in a modern browser during video playback.
Some estimates could be made based on knowing the latency of the downstream pipeline. It might be more useful to surface the last presentation timestamp submitted to the renderer and the estimated latency until frame paint. It may also be more feasible to surface the final presentation timestamp/time code when a seek is completed. That seems more useful for a video editing use case. Understanding the use cases here and what exactly you need to know would help guide concrete feedback from browsers.
One of the main use cases for me would be the ability to synchronize content changes outside the video to frame changes in the video. As a simple example, the test case in the frame-accurate-ish repo shows this with the background color change. In my case the main thing would be the ability to accurately synchronize custom subtitle rendering with frame changes. Being even one or two screen refreshes off becomes a noticeable issue when you want to ensure subtitles appear and disappear with scene changes - even a frame or two of subtitles hanging on the screen after a scene change is very noticeable and ugly to look at during playback.
It depends on the inputs to the custom subtitle rendering algorithm. How do you determine when to render a text cue?
Currently, I'm using …
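(The specifics were lost from the comment above, but for illustration, here is a minimal sketch of one common pattern for driving a custom subtitle renderer from currentTime: polling inside a requestAnimationFrame loop. The cue list and overlay element are hypothetical, and the approach is only as accurate as currentTime itself, which is the crux of this thread.)

```js
// Hypothetical cue list: start/end offsets in seconds on the media timeline.
const cues = [
  { start: 1.0, end: 3.5, text: 'Hello' },
  { start: 4.0, end: 6.0, text: 'World' },
];

function renderLoop(video, overlay) {
  const t = video.currentTime;
  // Show exactly the cues active at the current media time.
  overlay.textContent = cues
    .filter((cue) => t >= cue.start && t < cue.end)
    .map((cue) => cue.text)
    .join('\n');
  requestAnimationFrame(() => renderLoop(video, overlay));
}

renderLoop(document.querySelector('video'), document.querySelector('#overlay'));
```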
Perhaps @palemieux could comment on how the imsc.js library handles this?
This highlights the importance of being clear what currentTime means, as hardware-based implementations or devices outputting via HDMI may have several frames difference between the media time of the frame being output from the display and the frame being composited with graphics.
With the timingsrc [1] library we are able to sync content changes outside the video with errors <10ms (less than a frame). The library achieves this by …

This still leaves delays from DOM changes to on-screen rendering. In any case, this should typically be sub-framerate sync. This assumes that currentTime is a good representation of the reality of video presentation. If it isn't, but you know how wrong it is, you can easily compensate. Not sure if this is relevant to the original issue, which I understood to be about accurate frame stepping, not sync during playback?

Ingar Arntzen
@jpiesing I can't speak for @palemieux obviously but my understanding is that imsc.js does not play back video and therefore does not do any alignment; it merely identifies the times at which the presentation should change. However it is integrated into the dash.js player, which does need to synchronise the subtitle presentation with the media. I believe it uses Text Track Cues, and from what I've seen they can be up to 250ms late depending on when the Time Marches On algorithm happens to be run, which can be as infrequent as every 250ms, and in my experience often is. As @Daiz points out, that's not nearly accurate enough.
What @nigelmegitt said :) What is needed is a means of displaying/hiding HTML (or TTML) snippets at precise offsets on the media timeline.
@palemieux this is exactly what I described above. The sequencer of the timingsrc library does this. It may be used with any data, including HTML or TTML.
@ingararntzen It is a different use case, but a good one nonetheless. Presumably, frame accurate time reporting would help with synchronised media playback across multiple devices, particularly where different browser engines are involved, each with a different pipeline delay. But, you say you're already achieving sub-frame rate sync in your library, based on currentTime, so maybe not?
@ingararntzen forgive my lack of detailed knowledge, but the approach you describe does raise some questions, at least in my mind: …

Just questions for my understanding, I'm not trying to be negative!
On the matter of "sub-framerate sync", I would like to point out that for the purposes of high-quality media playback, this is not enough. Things like subtitle scene bleeds (where a cue remains visible after a scene change occurs in the video) are noticeable and ugly even if they remain on-screen for just an extra 15-30 milliseconds (i.e. less than a single 24fps frame, which is ~42ms) after a scene change occurs. Again, you can clearly see this yourself with the background color change in this test case (which has various tricks applied to increase accuracy) - it is very clear when the sync is even slightly off. Desktop video playback software outside browsers does not have issues in this regard, and I would really like to be able to replicate that on the web as well.
@nigelmegitt These are excellent questions, thank you 👍
Yes. The sequencer is separate from the media element (which also means that you can use it for use cases where you don't have a media element). It takes direction from a timing object, which is basically just a thin wrapper around the system clock. The sequencer uses setTimeout() to schedule enter/exit events at the correct time.

Being run in the JS environment, sequencer timeouts may be subject to delay if there are many other activities going on (just like any app code). The sequencer guarantees the correct ordering, and will report how much it was delayed. If something like the sequencer were implemented natively by browsers, this situation could be improved further, I suppose. The sequencer itself is lightweight, and you may use multiple sequencers for different data sources and/or different timing objects.

Excellent question! The model does not mandate one or the other. You may 1) continuously update the timing object from currentTime, or 2) continuously monitor and adjust currentTime to match the timing object (e.g. using variable playbackRate). Method 1) is fine if you only have one media element, you are doing sync only within one webpage, and you are OK with letting the media element be the master of whatever else you want to synchronize. In other scenarios you'll need method 2), for at least (N−1) synchronized things. We use method 1) only occasionally. The timingsrc has a mediasync function for method 2) and a reversesync function for method 1) (...I think).

The short answer: using mediasync or reversesync you don't have to think about that, it's all taken care of. Some more details: …
So, while the results are pretty good, there is no way to ensure that they are always that good (or that they will stay this good), unless these issues are put on the agenda through standardization work. There are a number of ways to improve/simplify sync.
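(To make the sequencer idea concrete: a minimal sketch of the scheduling technique described above, with setTimeout firing enter/exit events at media-timeline offsets. This is not the timingsrc implementation, just an illustration of the idea; it assumes playbackRate is 1 and ignores the skew correction the real library performs.)

```js
// Schedule enter/exit callbacks for one cue against a playing video element.
function scheduleCue(video, cue, onEnter, onExit) {
  const now = video.currentTime;
  const msUntil = (t) => Math.max(0, (t - now) * 1000);
  const timers = [
    setTimeout(() => onEnter(cue), msUntil(cue.start)),
    setTimeout(() => onExit(cue), msUntil(cue.end)),
  ];
  // Return a cancel function so timers can be cleared on seek or pause.
  return () => timers.forEach(clearTimeout);
}

const video = document.querySelector('video');
const cancel = scheduleCue(
  video,
  { start: 12.0, end: 14.5, text: 'Example cue' },
  (cue) => console.log('enter', cue.text),
  (cue) => console.log('exit', cue.text)
);
video.addEventListener('seeking', cancel); // timers are stale after a seek
```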
@ingararntzen in this forum we certainly do want to think about the details of how the thing works, so we can assure ourselves that eventual users genuinely do not have to think about them. Having been "bitten" by the impact of timeupdate and Time Marches On, we need to get it right next time!
Having noted that Time Marches On can conformantly not be run frequently enough to meet subtitle and caption use cases, it does have a lot of other things going for it, like smooth handling of events that take too long to process. In the spirit of making the smallest change possible to resolve it, here's an alternative proposal: …

I would expect that to be enough to get frame accuracy at 25fps.
@nigelmegitt - sure thing - I was more thinking of the end user here - not you guys :) If you want me to go more into details that's ok too :) |
Assuming that framerates are uniform is going to go astray at some point, as mp4 can contain media with different rates. Walking through the media rates and getting frame times is going to give you glitches with longer files. If you want to construct an API like this I'd suggest mirroring what QuickTime did - this had 2 parts: the movie export API, which would give you callbacks for each frame rendered in sequence, telling you the media and movie times. Mozilla did make seekToNextFrame, but that was deprecated.
@Daiz For your purposes, is it more important to have a frame counter, or an accurate currentTime?
@mfoltzgoogle That depends - what exactly do you mean by a frame counter? As in, a value that would tell me the absolute frame number of the currently displayed frame? Like, if I have a 40000-frame-long video with a constant frame rate of 23.976 fps, and currentTime is about 00:12:34.567 (754.567s), this hypothetical frame counter would have a value of 18091? This would most certainly be useful for me. To reiterate, for me the most important use case for frame accuracy right now would be to accurately snap subtitle cue changes to frame changes. A frame counter as described above would definitely work for this. Though since I personally work on premium VOD content where I'm in full control of the content pipeline, an accurate currentTime (assuming that with a constant frame rate / full frame rate information I would be able to reliably calculate the currently displayed frame number) would also work. But I think the kind of frame counter described above would be a better fit as more general-purpose functionality.
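(For concreteness, the arithmetic behind the example above, as a hypothetical helper — the browser exposes neither the frame rate nor a frame counter, so fps must come from out-of-band knowledge of the content:)

```js
// Hypothetical: map a media time to a frame number, assuming a constant,
// known frame rate. A tiny epsilon guards against floating-point values
// landing just below a frame boundary and flooring to the previous frame.
function frameNumberAt(currentTime, fps) {
  return Math.floor(currentTime * fps + 1e-6);
}

// 754.567 s at 23.976 fps (24000/1001) -> frame 18091, as in the example.
console.log(frameNumberAt(754.567, 24000 / 1001)); // 18091
```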
We would need to consider skipped frames, buffering states, splicing MSE buffers, and variable FPS video to nail down the algorithm to advance the "frame counter", but let's go with that as a straw-man. Say, adding a … When you observe the …
@mfoltzgoogle Instead of a "frame counter", which is video-centric, I would consider adding a combination of …
For frame accuracy purposes, it should obviously correspond to the currently displayed frame on the screen. Also, something that I wanted to say: I understand that there's a lot of additional complexity to this subject under various playback scenarios, and that it's probably not possible to guarantee frame accuracy under all scenarios. However, I don't think that should stop us from pursuing frame accuracy where it would indeed be possible. If I have just a normal browser window in full control of video playback, playing video on a normal screen attached to my computer, even having frame accuracy just there alone would be a huge win in my books.
@kevinmarks-b "media time" is also used elsewhere as a generic term for "the timeline related to the media", independently of the syntax used, i.e. it can be expressed as an arbitrary fraction or a number of frames etc., for example in TTML.
I've seen this issue with MP3 before, and it is always with the VBR ones; CBR worked fine (but most MP3s are VBR).
Thanks for the extra analysis @Laurian. I suspect you're right that MP3 is a particular offender, but rather than focusing on one format specifically, we should address the more general problem: for some media encodings it can be difficult to seek accurately, and we should look for a solution that might work more widely. Typically, I think implementers have gone down the route of finding some detailed specifications of media types that work for their particular application. In the web context it seems to me that we need something that would work widely. The two approaches I can think of so far that might work are: …
Nigel, I'm not seeing the difference in the demo you shared. With either MP3 or WAV selected, playback starts at time zero. I must be doing something wrong..? |
@chrisn listen to the audio description clips as they play back - the words you hear should match the text that shows under the video area, but they don't, especially for the MP3 version.
@nigelmegitt good work! I can't find the BBC Adhere repo anymore.. was it moved or removed? |
@giuliogatto unfortunately the repo itself is still not open - we're tidying some bits up before making it open source, so please bear with us. It's taking us a while to get around to alongside other priorities 😔 |
@nigelmegitt ok thanks! Keep up the good work! |
@Daiz I saw a new method here: https://stackoverflow.com/questions/60645390/nodejs-ffmpeg-play-video-at-specific-time-and-stream-it-to-client How …

Do you think it's possible to use this approach to achieve subtitle display (with near-perfect sync)? I haven't experimented with this myself. I was thinking of building this project: https://github.com/1c7/Subtitle-Timeline-Editor/blob/master/README-in-English.md in Swift & OC & SwiftUI as a Mac-only desktop app.
One more possible way to do it (for desktop): if building a desktop app with electron.js, node-mpv can be used to control a local mpv instance, so loading and displaying subtitles is doable (.ass is fine).

Node.js code

```js
const mpvAPI = require('node-mpv');

const mpv = new mpvAPI({},
  [
    "--autofit=50%", // initial window size
  ]);

mpv.start()
  .then(() => {
    // load the video
    return mpv.load('/Users/remote_edit/Documents/1111.mp4');
  })
  .then(() => {
    // load the subtitle track
    return mpv.addSubtitles('/Users/remote_edit/Documents/1111.ass');
  })
  // this catches every error from above
  .catch((error) => {
    console.log(error);
  });

// This will bind this function to the stopped event
mpv.on('stopped', () => {
  console.log("Your favorite song just finished, let's start it again!");
  // mpv.loadFile('/path/to/your/favorite/song.mp3');
});
```

package.json

Conclusion
Apologies, forgot to update this thread: the library part of the Adhere project was moved to https://github.com/bbc/adhere-lib/ so that we could open it up.
Just to make it clear, if I do …
As a next step, I suggest that we summarise this thread into a short document that covers the use cases and current limitations. It should take into account what can be achieved using new APIs such as WebCodecs and requestVideoFrameCallback, and be based on practical experience. This thread includes discussion of frame accurate seeking and frame accurate rendering of content, so I suggest that the document includes both, for completeness. Is anyone interested in helping to do this? Specifically, we'd be looking for someone who could edit such a document. |
It would be really cool to have guarantees on how to reach a specific frame. For instance, I was thinking that:

…

was always reaching the accurate frame... But it turns out it's not! (at least not using Chromium 95.0) Sometimes I need a larger value for the additional term; on at least one frame, I needed to do:

…

(this appears to fail for me when trying to reach, for instance, frame 1949 of a 24fps video)

Edit: similarly, reading this.video.currentTime (even when paused using …
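(One common mitigation — offered here as a sketch, not a guaranteed fix for the behaviour described above — is to aim for the midpoint of the target frame rather than its leading edge, so that rounding in either direction still lands inside the intended frame:)

```js
// Seek to the midpoint of frame n instead of its exact start time.
// Assumes a constant, known fps; does not help if the UA snaps elsewhere.
function seekToFrame(video, n, fps) {
  video.currentTime = (n + 0.5) / fps;
}

seekToFrame(document.querySelector('video'), 1949, 24); // the frame cited above
```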
It's way worse for me. I've made a 20 fps testing video, where seeking … This makes it impossible to implement frame-accurate video cutting/editing tools, as the time ffmpeg needs to seek to a frame is always quite different from what the video element needs to display it, and trying to add or subtract these arbitrary numbers just feels like a different kind of footgun.
What is the state of the art on this? Can it currently be achieved only with the WebCodecs API?
Here is an approach that uses …
Is that one really working?
The technique by @mzur leads to better accuracy, but in our experience doesn't always lead to perfect results either.
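(For readers arriving here: a minimal sketch of using requestVideoFrameCallback — mentioned earlier in the thread — to check which frame a seek actually landed on. The callback's metadata.mediaTime is the presentation timestamp of the frame being displayed; the fps value is assumed to be known out of band, and this API is not supported in all browsers.)

```js
// Seek, then read the mediaTime of the frame actually presented afterwards.
function seekAndVerify(video, targetTime, fps) {
  return new Promise((resolve) => {
    // Register the callback before seeking so the first new frame is caught.
    video.requestVideoFrameCallback((now, metadata) => {
      const shownFrame = Math.floor(metadata.mediaTime * fps + 1e-6);
      resolve(shownFrame);
    });
    video.currentTime = targetTime;
  });
}

const video = document.querySelector('video');
seekAndVerify(video, (1949 + 0.5) / 24, 24)
  .then((frame) => console.log('presented frame', frame));
```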
I've heard a couple of companies point out that one of the problems that makes it hard (at least harder than it could be) to do post-production of videos in Web browsers is that there is no easy way to process media elements on a frame-by-frame basis, whereas that is the usual default in Non-Linear Editors (NLEs).

The currentTime property takes a time, not a frame number or an SMPTE timecode. Conversion from/to times to/from frame numbers is doable but supposes one knows the framerate of the video, which is not exposed to Web applications (a generic NLE would thus not know about it). Plus that framerate may actually vary over time. Also, internal rounding of time values may mean that one seeks to the end of the previous frame instead of the beginning of a specific video frame.
Digging around, I've found a number of discussions and issues around the topic, most notably:
https://lists.w3.org/Archives/Public/public-whatwg-archive/2011Jan/0120.html
https://www.w3.org/Bugs/Public/show_bug.cgi?id=22678
https://www.w3.org/Bugs/Public/show_bug.cgi?id=8278#c3
Media elements should support a rational time value for seek() whatwg/html#609
There have probably been other discussions around the topic.
I'm raising this issue to collect practical use cases and requirements for the feature, and gauge interest from media companies to see a solution emerge. It would be good to precisely identify what does not work today, what minimal updates to media elements could solve the issue, and what these updates would imply from an implementation perspective.