change units of timestamp or rename to timestampMicros #122

Closed
dogben opened this issue Jan 14, 2021 · 23 comments · Fixed by #140 or #248
Labels
breaking Interface changes that would break current usage (producing errors or undesired behavior). tag-tracker Group bringing to attention of the TAG, or tracked by the TAG but not needing response.

Comments

@dogben

dogben commented Jan 14, 2021

Nearly every existing web API dealing with timestamps uses DOMTimeStamp or DOMHighResTimeStamp, both of which are defined to be in units of milliseconds. However, the timestamp attribute found throughout the WebCodecs spec has units of microseconds. This is quite confusing and has resulted in several bugs in my use of WebCodecs.

I suggest either changing the units of timestamp to DOMHighResTimeStamp or renaming the attribute timestampMicros to make it clear that it is different from other timestamps.

I'll also point out that HTMLMediaElement.currentTime is in units of seconds. The timestamp attribute of WebCodecs is analogous to the currentTime of a media element, so using seconds as the unit could also be an option.
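The three units mentioned above differ by fixed factors of 1000. Here is a minimal TypeScript sketch, assuming WebCodecs timestamps are integer microseconds as in the current draft. The helper names are hypothetical and not part of any spec:

```typescript
// Hypothetical helpers (not defined by WebCodecs or any other spec).
// WebCodecs timestamps: integer microseconds.
// DOMHighResTimeStamp: milliseconds (double).
// HTMLMediaElement.currentTime: seconds (double).
const MICROS_PER_MILLISECOND = 1000;
const MICROS_PER_SECOND = 1000000;

function microsToMillis(micros: number): number {
  return micros / MICROS_PER_MILLISECOND;
}

function microsToSeconds(micros: number): number {
  return micros / MICROS_PER_SECOND;
}

// Converting seconds back rounds to the nearest whole microsecond.
function secondsToMicros(seconds: number): number {
  return Math.round(seconds * MICROS_PER_SECOND);
}

console.log(microsToMillis(2000000));  // 2000
console.log(microsToSeconds(2500000)); // 2.5
console.log(secondsToMicros(1.5));     // 1500000
```

If a milliseconds value is passed where microseconds are expected, the result is off by a factor of 1000. That is the kind of bug described above.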

@sandersdan
Contributor

There was some discussion about this in #52, and the issue is under current discussion. I'll post updates here.

@chcunningham
Collaborator

After discussion w/ Dan, we lean toward DOMHighResTimeStamp. The current choice of integer microseconds was made mostly because that's what we used internally, and we had vague worries about floating point losing precision (maybe manifesting as A/V sync issues or audio glitches down the road). Dan did some analysis and found that we get ~285 years of microseconds before that starts to manifest, though. @sandersdan - can you give us the details of that calc?

@padenot FYI

@sandersdan
Contributor

sandersdan commented Jan 14, 2021

FYI there is a distinction between DOMTimeStamp and DOMHighResTimeStamp (thanks @dogben, I hadn't noticed DOMTimeStamp!):

  • DOMTimeStamp: "number of milliseconds, either as an absolute time ... or as a relative amount of time"
  • DOMHighResTimeStamp: "a time value measured relative to the navigationStart attribute ... or a time value that represents a duration between two DOMHighResTimeStamps"

We are not using the document epoch, so we should prefer DOMTimeStamp. A different argument could be made for capture timestamps, though.

The rough calculation is that a double has 53 bits of precision, and 2^53 microseconds is 285 years. Thus we can expect 285 years of at least microsecond precision from whatever epoch is being used, which is usually 0 at the start of a media file, but could commonly be the Unix epoch (1970) for live media cases.

If microsecond precision is required in a guaranteed-rounding sense, I'd halve that. One file could then be 142 years long with microsecond precision, or a Unix-timestamped stream could cover 1970 to 2112 (and earlier, using negative timestamps).
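The arithmetic above can be checked directly. This sketch assumes a 365.25-day year, because the comment does not say which year length was used:

```typescript
// Assumption: a 365.25-day (Julian) year.
const SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60; // 31,557,600

// A double has a 53-bit significand, so every integer up to 2^53 is exactly
// representable. Treat those integers as microseconds:
const maxExactMicros = 2 ** 53;
const yearsOfMicros = maxExactMicros / 1e6 / SECONDS_PER_YEAR;

// Halve the range to leave the guaranteed-rounding margin suggested above.
const halvedYears = yearsOfMicros / 2;

console.log(yearsOfMicros.toFixed(1)); // 285.4
console.log(halvedYears.toFixed(1));   // 142.7
```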

For comparison, our long long microseconds approach covers about 300,000 years at exactly microsecond precision, but in JS it's represented as a double and as a result has similar precision limits to DOMTimeStamp.
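Both halves of that comparison can be checked numerically. The first part computes the int64 microsecond range, again assuming a 365.25-day year. The second part shows the double limitation that applies once the value reaches JS:

```typescript
// Assumption: a 365.25-day (Julian) year.
const YEAR_SECONDS = 365.25 * 24 * 60 * 60;

// A signed 64-bit count of microseconds tops out just below 2^63.
const int64Years = 2 ** 63 / 1e6 / YEAR_SECONDS;
console.log(Math.round(int64Years)); // 292271, i.e. roughly 300,000 years

// In JS the value arrives as a Number (a double). Above
// Number.MAX_SAFE_INTEGER (2^53 - 1), some integers can no longer be told apart.
const maxSafe = 2 ** 53 - 1; // same value as Number.MAX_SAFE_INTEGER
console.log(maxSafe + 1 === maxSafe + 2); // true: 2^53 + 1 rounds to 2^53
```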

@dogben
Author

dogben commented Jan 15, 2021

DOMTimeStamp seems to be an integral type.

@sandersdan
Contributor

Oops, you're right. Hmm. Might need to talk to the hr-time folks to see if our use is acceptable without sharing an epoch.

@sandersdan
Contributor

Based on w3c/hr-time#104 it does not appear there is any concern with using DOMHighResTimeStamp without an explicit epoch.

@padenot
Collaborator

padenot commented Feb 16, 2021

What about audio alignment and rounding? Experience shows that using floating point for time when doing audio work cannot really work. In the general case, people are expected to integrate over buffer durations in sample-frames.

Generally, media APIs deal with integer time because it solves a whole class of rounding bugs, but this is JS. Is there anything we can do?
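One way to follow that advice is to count sample frames as integers and derive a timestamp from the running total only when one is needed, instead of summing per-buffer durations. The sketch below does this. The class is hypothetical, and 44.1 kHz with 1024-frame buffers is an arbitrary example:

```typescript
// Hypothetical audio clock. The running total is an exact integer count of
// sample frames, and conversion to microseconds rounds once, from the total.
class SampleFrameClock {
  private framesElapsed = 0;
  private readonly sampleRate: number;

  constructor(sampleRate: number) {
    this.sampleRate = sampleRate;
  }

  advance(numberOfFrames: number): void {
    this.framesElapsed += numberOfFrames;
  }

  currentMicros(): number {
    return Math.round((this.framesElapsed * 1e6) / this.sampleRate);
  }
}

const clock = new SampleFrameClock(44100);
let naiveMicros = 0;
for (let i = 0; i < 1000; i++) {
  // Naive alternative: round each 1024-frame buffer's duration, then sum.
  naiveMicros += Math.round((1024 * 1e6) / 44100);
  clock.advance(1024);
}

console.log(clock.currentMicros()); // 23219955
console.log(naiveMicros);           // 23220000, already 45 us ahead
```

Each buffer's rounding error is only about 0.05 µs, but that error builds up with every buffer. The frame count never drifts.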

@sandersdan
Contributor

sandersdan commented Feb 16, 2021

The best idea I have for these cases is to build a generic metadata API, so that apps can attach their own metadata to chunks and frames. Then apps can store timestamps in whatever format they want.

You can also just store integers in a DOMHighResTimeStamp, but doing so may interfere with rate control algorithms in encoders. Hopefully we can specify that in a bits-per-frame rather than bits-per-second way to avoid this limitation.

@padenot
Collaborator

padenot commented Feb 17, 2021

Or we can leave everything as integer microseconds.

@sandersdan
Contributor

sandersdan commented Feb 17, 2021

When timestamp < 275 years, DOMHighResTimeStamp has higher precision than integer microseconds. We can remain interoperable with other web specs and apps that want exact microseconds can use micros = Math.round(1000 * frame.timestamp).
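Here is a sketch of that round trip, assuming frame timestamps are DOMHighResTimeStamp milliseconds. The helper names are hypothetical. For integer microsecond values well within the ranges computed earlier in the thread, dividing by 1000 and then rounding back gives the original integer:

```typescript
// Hypothetical helpers, assuming timestamps are double milliseconds.
// Milliseconds -> integer microseconds, as suggested above.
function timestampMsToMicros(timestampMs: number): number {
  return Math.round(1000 * timestampMs);
}

// Integer microseconds -> milliseconds. The result is usually not exact in
// binary (e.g. 33.367), but it is close enough to round back correctly.
function microsToTimestampMs(micros: number): number {
  return micros / 1000;
}

for (const micros of [0, 1, 33367, 41708, 3600000000]) {
  const ms = microsToTimestampMs(micros);
  console.log(ms, timestampMsToMicros(ms) === micros); // true for each value
}
```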

@padenot
Collaborator

padenot commented Feb 18, 2021

Yes, this is the annoying part. For audio, this is the kind of design that will cause bugs, because people will use the timestamp instead of building a clock with the buffer sizes, and won't round properly. But it's not like we can remove it either.

@sandersdan
Contributor

Makes sense, it can be hard to teach this.

My position is that microseconds are just as arbitrary as float milliseconds. If we could have rational seconds (timestamp + timebase), that would be superior for media, but I didn't find a way to do that that was ergonomic enough to justify.
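Here is a minimal sketch of rational time, to make the "timestamp + timebase" idea concrete. The `RationalTime` shape and the helpers are hypothetical, not a proposed WebCodecs API. The sketch uses plain Numbers, so it assumes intermediate products stay below 2^53:

```typescript
// Hypothetical rational timestamp: an integer tick count plus a timebase,
// where one tick lasts timebaseNum / timebaseDen seconds.
interface RationalTime {
  value: number;
  timebaseNum: number;
  timebaseDen: number;
}

// Exact comparison by cross-multiplying, with no division or rounding.
// Returns -1, 0 or 1. Assumes every product stays below 2^53.
function compareRationalTimes(a: RationalTime, b: RationalTime): number {
  const lhs = a.value * a.timebaseNum * b.timebaseDen;
  const rhs = b.value * b.timebaseNum * a.timebaseDen;
  return lhs < rhs ? -1 : lhs > rhs ? 1 : 0;
}

// Rounding happens once, only when leaving the rational representation.
function rationalToMicros(t: RationalTime): number {
  return Math.round((t.value * t.timebaseNum * 1e6) / t.timebaseDen);
}

// Frame 3 of 30000/1001 fps video, and the same instant in microseconds.
const frame3: RationalTime = { value: 3, timebaseNum: 1001, timebaseDen: 30000 };
const inMicros: RationalTime = { value: 100100, timebaseNum: 1, timebaseDen: 1e6 };

console.log(compareRationalTimes(frame3, inMicros)); // 0 (same instant)
console.log(rationalToMicros(frame3));               // 100100
```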

I doubt a note in the spec that recommends counting audio in samples would be sufficient, but it could be a start.

Maybe we could convince the world to use a timebase that is a power of two, then floats are exact! 😂

@tidoust
Member

tidoust commented Feb 19, 2021

> My position is that microseconds are just as arbitrary as float milliseconds. If we could have rational seconds (timestamp + timebase), that would be superior for media, but I didn't find a way to do that that was ergonomic enough to justify.

The need to use rational numbers keeps being raised in media discussions. Past discussions include:

Conclusion seems to always be: that is more complicated than it seems, JS does not have a rational type, API would be less straightforward as a result, plus the ship has sailed... but we still need a solution with rational numbers "at some point".

WebCodecs is lower-level than other media APIs and has not shipped yet. Perhaps that's the right level and time to introduce such a mechanism, if we know that rounding errors are going to bite.

@chrisn
Member

chrisn commented Feb 19, 2021

> Perhaps that's the right level and time to introduce to such a mechanism if we know that rounding errors are going to bite.

I did a quick search for previous discussions in TC39 and only found this, in relation to a Decimal proposal: tc39/proposal-decimal#6

@padenot
Collaborator

padenot commented Feb 19, 2021

This will be critical when building professional media applications on the Web, which is supposed to be possible. When you're looking at a frame, it's necessary to know exactly what frame it is without ambiguity, and be able to do frame accurate seeking and presentation, and build a video editing timeline.

Looking around (ffmpeg, apple APIs, windows APIs, gstreamer), it's all properly done. Now is the right time to do this properly, or we'll be doing a new API that is unfit for its stated use-case.
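With integer-microsecond timestamps and a rational frame rate, frame-accurate lookup has to round in both directions. Here is a sketch under those assumptions. The helper names are hypothetical, and the video is assumed to have a constant frame rate such as 30000/1001:

```typescript
// Hypothetical helpers for constant-frame-rate video, with the rate given
// as rateNum / rateDen frames per second (e.g. 30000 / 1001).
function frameToMicros(index: number, rateNum: number, rateDen: number): number {
  // index * (rateDen / rateNum) seconds, expressed in whole microseconds.
  return Math.round((index * rateDen * 1e6) / rateNum);
}

function microsToFrame(micros: number, rateNum: number, rateDen: number): number {
  // Round to the nearest frame. The stored timestamp may be up to 0.5 us
  // off the ideal instant, so truncating could land on the previous frame.
  return Math.round((micros * rateNum) / (rateDen * 1e6));
}

console.log(frameToMicros(1, 30000, 1001));     // 33367
console.log(frameToMicros(2, 30000, 1001));     // 66733
console.log(microsToFrame(66733, 30000, 1001)); // 2 (truncation would give 1)
```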

@sandersdan
Contributor

sandersdan commented Feb 19, 2021

This is hyperbolic, I think we need to consider the actual practical implications when evaluating solutions.

> When you're looking at a frame, it's necessary to know exactly what frame it is without ambiguity

This seems to imply that floats are ambiguous. They are not.

> Looking around (ffmpeg, apple APIs, windows APIs, gstreamer), it's all properly done.

Android's MediaCodec uses integer microseconds, which is arbitrary.

Microsoft's MediaFoundation uses integer tenths-of-microseconds, which is arbitrary. The underlying D3D11VideoContext doesn't deal with timestamps at all.

FFmpeg and Apple's CMTime use rational time, as you say.

> Now is the right time to do this properly, or we'll be doing a new API that is unfit for its stated use-case.

Existing APIs differ, yet all are used in professional media applications. The solutions to these problems are well-known.

I'd like to build something as nice as is possible, but we're not going to be unfit-for-purpose with any of the proposed solutions.

@sandersdan
Contributor

Just clarifying my position:

  • There doesn't seem to be an ideal solution.
  • Currently Chrome's WebCodecs implementation is using integer microseconds, which may be acceptable if we rename the field to timestampMicroseconds/timestampMicros/timestampUs.
  • DOMHighResTimestamp seems to be a good choice, as it is about the right precision and is familiar to web developers. There is some concern that media developers will not be comfortable with correctly handling variable precision.
  • We could add a timebase field, giving us rational time. It's not very ergonomic but it doesn't seem contentious.

From an implementation point-of-view, integer microseconds is by far the easiest to implement in Chrome, as that's what our internal media timestamps already are. Integer microseconds appear to be well-supported by all platform APIs we've looked at, and so are likely easy for other implementations to use also.

@chcunningham chcunningham added the tag-tracker Group bringing to attention of the TAG, or tracked by the TAG but not needing response. label Mar 12, 2021
@padenot
Collaborator

padenot commented Mar 25, 2021

> Microsoft's MediaFoundation uses integer tenths-of-microseconds

It depends on what we're talking about. It uses rational for frame rates, and hns on frames themselves, with an entire page explaining how to properly round during computations when implementing an MFT.

Here, we want to have a rate in the video decoder config (I just noticed there is no rate on VideoDecoderConfig !?) that is rational, and then have timestamps that are in integer microseconds OR rational. I prefer rational, because it's simpler to work with for media that have a constant frame rate. But as you say, integer microseconds or hns work, and they are more ergonomic when the media is (possibly) variable frame rate (where I assume we'd put a denominator that is 1). Milliseconds are too coarse, and floating point will be rounded incorrectly.

I think my comment was essentially lumping together the (related) issue of expressing the frame rate of a video stream (which is a problem in itself) and the timestamp on the frames, and those are two different issues, which I'm going to split off now, apologies.

@chcunningham
Collaborator

Conclusions from editors call.

  • change to int64 to allow passthrough of negative timestamps (fairly common in various container formats)
  • continue using microseconds for now. leaves open possibility of adding timebase rational later, for which default is still backward compatible micros.
  • naming: keep timestamp, given flexibility to pursue adding timebase later if needed. will pursue additional documentation in spec / impl (e.g. warnings if your timestamps look like seconds)
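As an illustration, such a warning could fire when consecutive timestamps are too close together to plausibly be microseconds. Below is a minimal sketch of that heuristic. The threshold and the function are assumptions, not anything the editors specified:

```typescript
// Hypothetical heuristic. Even at 1000 frames per second, consecutive
// frames are 1000 microseconds apart, so smaller nonzero gaps suggest the
// caller passed seconds (or milliseconds) instead of microseconds.
const MIN_PLAUSIBLE_GAP_MICROS = 1000; // assumed threshold

function looksLikeWrongUnits(timestamps: readonly number[]): boolean {
  for (let i = 1; i < timestamps.length; i++) {
    // Use the absolute gap: negative timestamps and reordered (e.g. B-frame)
    // timestamps are legitimate with signed int64 microseconds.
    const gap = Math.abs(timestamps[i] - timestamps[i - 1]);
    if (gap > 0 && gap < MIN_PLAUSIBLE_GAP_MICROS) {
      return true;
    }
  }
  return false;
}

console.log(looksLikeWrongUnits([0, 1, 2]));                 // true (seconds?)
console.log(looksLikeWrongUnits([-33367, 0, 33367, 66733])); // false
```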

@chcunningham
Collaborator

> change to int64 to allow passthrough of negative timestamps (fairly common in various container formats)

leaving issue open for now to track this change.

@chcunningham
Collaborator

I'm updating the TAG review to note the resolution here (int64 microseconds). To ease their review, here's a quick summary of our path to that outcome:

  • We initially used unsigned microseconds without much thought. This matches what Chrome does internally.
  • This issue highlighted that DOMHighResTimestamp is probably more consistent with other areas of the platform. This includes media APIs like WebAudio and WebRTC.
  • Chrome (sandersdan@) suggested DOMHighResTimestamp could work.
  • Firefox (padenot@) disagreed. He discussed this in more depth over a call:
    • Media people are uncomfortable with floating point
      • It raises questions about exact vs epsilon comparison
      • It makes it harder to reason about error bounds
    • Floating point is not precise enough
      • Media frame rates are often weird rational numbers like 60 Hz * 1000/1001. He gave the example of computing timestamps by summing frame durations, stored as doubles, at the above rational rate. He argued that, over the course of a movie, enough error could accumulate to produce A/V sync drift.
  • The purest media representation is to use a rational, where the denominator (timebase) could be some multiple of 1001 in the example above and the division is done lazily. The web doesn't have a rational type, and we feel this is probably overkill anyway. Remember: both Firefox and Chrome just use microseconds internally... much easier, and working well.
  • In the end, we settled on integer microseconds because it was something everyone was comfortable with and it leaves open the door to later adding a timebase if we need.
  • We decided to deviate slightly from the original unsigned int64 -> signed int64, to allow passthrough of negative timestamps (fairly common in various container formats)
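The accumulation concern can be reproduced in a few lines. This sketch takes 60 * 1000/1001 Hz video and compares two approaches. One keeps adding a double frame duration. The other computes each timestamp from the frame count and the exact rational duration of 1001/60000 s. The sketch prints the difference instead of claiming a particular drift:

```typescript
// Frame duration for 60 * 1000/1001 Hz video: exactly 1001/60000 seconds.
const DURATION_NUM = 1001;
const DURATION_DEN = 60000;

// Exact approach: derive each timestamp from the integer frame index and
// round once to whole microseconds.
function exactFrameMicros(frameIndex: number): number {
  return Math.round((frameIndex * DURATION_NUM * 1e6) / DURATION_DEN);
}

// Approach from the example above: keep adding a double-precision duration.
function accumulatedFrameSeconds(frameIndex: number): number {
  const durationSeconds = DURATION_NUM / DURATION_DEN; // already rounded
  let t = 0;
  for (let i = 0; i < frameIndex; i++) {
    t += durationSeconds;
  }
  return t;
}

const frames = 432000; // a little over two hours of video
console.log(exactFrameMicros(frames)); // 7207200000
// Difference between the two approaches, in microseconds:
console.log(accumulatedFrameSeconds(frames) * 1e6 - exactFrameMicros(frames));
```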

@annevk
Member

annevk commented May 7, 2021

I would recommend giving feedback on tc39/proposal-decimal#6 if rational (and not decimal) would be useful, even if only in the future. If we're adding another number type at some point, let's ensure it works for media.

@chcunningham
Collaborator

Triage note: marked 'breaking', since changing the type can theoretically break existing code. Having said that, Chrome has already updated the implementation, and probably no one was broken anyway (timestamp values in the range of unsigned but not signed int64 would generally not be expected).

I'll have a PR out for this shortly.

7 participants