
DataCue API Requirements

Introduction

There is a need in the media industry for an API to support arbitrary data associated with points in time or periods of time in a continuous media (audio or video) presentation. This data may include:

  • Metadata that describes the content in some way, such as program or chapter titles or geolocation information. This is often referred to as timed metadata and is used to drive an interactive media experience
  • Control messages for the media player that are expected to take effect at specific times during media playback, such as ad insertion cues

This document presents the use cases and technical requirements for an API that supports media-timed metadata and event cues.

Use cases

Dynamic content insertion

A media content provider wants to allow insertion of content, such as personalised video, local news, or advertisements, into a video media stream that contains the main program content. To achieve this, timed metadata is used to describe the points on the media timeline, known as splice points, where switching playback to inserted content is possible.

SCTE 35 defines a data cue format for describing such insertion points. Use of these cues in MPEG-DASH streams is described in SCTE 214-1, SCTE 214-2, and SCTE 214-3. Use in HLS streams is described in SCTE 35 section 12.2.
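As an illustration of the use case, if splice cues of this kind were surfaced to the page as timed cues, an application could react at the splice point roughly as sketched below. This is a hedged sketch: the assumption that the user agent exposes these cues on a "metadata" text track, the cue payload shape, and the commented-out helper are illustrative, not part of any specification.

```ts
// Sketch only: assumes SCTE 35 splice cues are exposed on a "metadata" text
// track; payload shape and helper names are invented for illustration.
const video = document.querySelector('video') as HTMLVideoElement;

for (const track of Array.from(video.textTracks)) {
  if (track.kind !== 'metadata') continue;
  track.mode = 'hidden'; // receive cue events without rendering anything

  track.oncuechange = () => {
    for (const cue of Array.from(track.activeCues ?? [])) {
      // A splice cue's startTime marks the point where playback may switch
      // to inserted content, such as an advertisement or local news item.
      console.log('Splice point reached at', cue.startTime);
      // switchToInsertedContent(cue); // hypothetical application logic
    }
  };
}
```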

Media player control messages

MPEG-DASH defines several control messages for media streaming clients (e.g., libraries such as dash.js). Control messages exist for several scenarios, such as:

  • The media player should refresh or update its copy of the manifest document (MPD)
  • The media player should make an HTTP request to a given URL for analytics purposes
  • The media presentation will end at a time earlier than expected

These messages may be carried as in-band emsg events in the media container files.
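To make this concrete, an emsg event type is identified by its scheme_id_uri and value. The two scheme URIs shown below are defined by MPEG-DASH (MPD validity expiration and the callback event), while the routing function and cue shape are illustrative assumptions about how a player library might handle such cues.

```ts
// Sketch: routing control-message cues by scheme_id_uri, as a DASH player
// library might do. The EventCue shape here is an assumption.
interface EventCue {
  schemeIdUri: string;
  value: string;
  startTime: number;
  data: ArrayBuffer | string;
}

function handleControlMessage(cue: EventCue): void {
  switch (cue.schemeIdUri) {
    case 'urn:mpeg:dash:event:2012':
      // MPD validity expiration: refresh the manifest before cue.startTime.
      // refreshManifest(); // hypothetical player function
      break;
    case 'urn:mpeg:dash:event:callback:2015':
      // Callback event: make an HTTP GET to the URL carried in the payload,
      // e.g. for analytics purposes.
      // fetch(decodeCallbackUrl(cue.data)); // hypothetical decoding step
      break;
    default:
      // Unrecognized scheme: ignore, or hand off to the application.
      break;
  }
}
```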

Media stream with video and synchronized graphics

A content provider wants to provide synchronized graphical elements that may be rendered next to or on top of a video.

For example, in a talk show this could be a banner, shown in the lower third of the video, that displays the name of the guest. In a sports event, the graphics could show the latest lap times or current score, or highlight the location of the current active player. It could even be a full-screen overlay, to blend from one part of the program to another.

The graphical elements are delivered in a stream or file containing cues that give the start and end time of each element, similar to a subtitle stream or file. The web application takes this data as input and renders it on top of the video image according to the cues.
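A minimal sketch of that rendering step, assuming the graphics cues carry banner text and are delivered with cue enter and exit events; the overlay element and cue payload are illustrative, and a real stream would likely carry structured graphics data rather than plain text.

```ts
// Sketch: show and hide an HTML overlay in step with cue enter/exit events.
const video = document.querySelector('video') as HTMLVideoElement;
const overlay = document.querySelector('#lower-third') as HTMLElement;

const track = video.addTextTrack('metadata', 'graphics');
track.mode = 'hidden';

// Example banner for a talk show, active from 12 s to 20 s on the timeline.
const banner = new VTTCue(12.0, 20.0, "Tonight's guest: Ada Lovelace");
banner.onenter = () => {
  overlay.textContent = banner.text;
  overlay.style.display = 'block';
};
banner.onexit = () => {
  overlay.style.display = 'none';
};
track.addCue(banner);
```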

The purpose of rendering the graphical elements on the client device, rather than rendering them directly into the video image, is to allow the graphics to be optimized for the device's display parameters, such as aspect ratio and orientation. Another use case is adapting to user preferences, for localization or to improve accessibility.

This use case requires frame accurate synchronization of the content being rendered over the video.

Limitations of existing solutions

Today, most media player libraries include support for timed metadata. Support varies between players: some support only HLS timed metadata (e.g., JWPlayer), others support DASH emsg boxes (e.g., dash.js), and some support both (e.g., Shaka Player). Video.js can be used with mux.js to parse in-band timed metadata and captions.

Processing efficiency

On resource constrained devices such as smart TVs and streaming sticks, parsing media segments in JavaScript to extract timed metadata or event information leads to a significant performance penalty, which can have an impact on UI rendering updates if this is done on the UI thread. There can also be an impact on the battery life of mobile devices. Given that the media segments will be parsed anyway by the user agent, parsing in JavaScript is an expensive overhead that could be avoided.

Low latency streaming

Avoiding parsing in JavaScript is particularly important for low latency video streaming applications, where the time taken to pass media content through to the media element's playback buffer must be minimized.

If the proposed Media Source Extensions appendStream method (see GitHub issue) is used to deliver media content directly from a Fetch API response to the playback buffer, application level parsing of the timed metadata or emsg boxes adds unnecessary delay.

Requirements

Subscribing to receive media timed event cues

The API should allow web applications to subscribe to receive specific types of media timed event cue. For example, to support MPEG-DASH emsg and MPD events, the cue type is identified by a combination of the scheme_id_uri and (optional) value. The purpose of this is to make receiving cues of each type opt-in from the application's point of view. The user agent should deliver only those cues to a web application for which the application has subscribed. The API should also allow web applications to unsubscribe from specific cue types.
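A sketch of what such a subscription could look like. The method names (addCueSubscription, removeCueSubscription) and their shape are purely illustrative assumptions; the requirement is only that subscription and unsubscription by cue type be possible.

```ts
// Hypothetical subscription API, shown only to make the requirement concrete.
// None of these method names are part of an agreed specification.
const video = document.querySelector('video') as HTMLVideoElement;

// Subscribe to DASH callback events, identified by scheme_id_uri and
// (optional) value.
(video as any).addCueSubscription?.({
  schemeIdUri: 'urn:mpeg:dash:event:callback:2015',
  value: '1',
});

// The user agent would then deliver only cues of the subscribed types.
// Later, the application can opt out again:
(video as any).removeCueSubscription?.({
  schemeIdUri: 'urn:mpeg:dash:event:callback:2015',
  value: '1',
});
```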

Out-of-band events

To be able to handle out-of-band media timed event cues, including MPEG-DASH MPD events, the API should allow web applications to create and add timed data cues to the media timeline, to be triggered by the user agent. The API should allow the web application to provide all necessary parameters to define the cue, including start and end times, cue type identifier, and data payload. The payload may be of any data type.
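For example, a web application could construct a cue for an MPD event it has parsed itself and add it to the media timeline. The DataCue constructor shape below follows WebKit's existing implementation (startTime, endTime, value, type); treat it as an assumption rather than a settled interface.

```ts
// Sketch: adding an out-of-band (application-created) timed data cue.
// The DataCue constructor used here is an assumption, not a finalized API.
const video = document.querySelector('video') as HTMLVideoElement;
const track = video.addTextTrack('metadata', 'mpd-events');
track.mode = 'hidden';

const DataCueCtor = (window as any).DataCue;
if (DataCueCtor) {
  const cue = new DataCueCtor(
    30.0,                                        // start time on the media timeline
    30.0,                                        // end time (instantaneous event)
    { messageData: 'parsed MPD event payload' }, // payload: any data type
    'urn:mpeg:dash:event:2012'                   // cue type identifier
  );
  track.addCue(cue);
}
```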

Event triggering

For those events that the application has subscribed to receive, the API should:

  • Generate a DOM event when an in-band media timed event cue is parsed from the media container or media stream (DASH-IF on-receive mode).
  • Generate DOM events when the current media playback position reaches the start time and the end time of a media timed event cue during playback (DASH-IF on-start mode). This applies equally to cues generated by the user agent when parsed from the media container and cues added by the web application.

The API should guarantee that no media timed event cues are missed during linear playback of the media.
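To make the two modes concrete: on-start mode maps naturally onto cue enter and exit notifications, while on-receive mode needs a notification at parse time. In the sketch below, the "cuereceived" event name for on-receive mode is an illustrative assumption; only the cuechange/activeCues mechanism shown for on-start mode exists for text track cues today.

```ts
// Sketch of the two triggering modes.
const video = document.querySelector('video') as HTMLVideoElement;
const track = video.textTracks[0];

// On-receive mode (hypothetical event name): notified as soon as the cue is
// parsed from the container, possibly well before its start time is reached.
(track as any).addEventListener?.('cuereceived', (e: any) => {
  console.log('Cue parsed ahead of time:', e.cue?.startTime);
});

// On-start mode: notified when playback reaches the cue's start and end times.
track.oncuechange = () => {
  for (const cue of Array.from(track.activeCues ?? [])) {
    console.log('Cue active at', video.currentTime, cue);
  }
};
```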

MPEG-DASH events

Implementations should support MPEG-DASH emsg in-band events and MPD out-of-band events, as part of their support for the MPEG Common Media Application Format (CMAF).

Cues with unbounded duration

Implementations should support media timed event cues with unknown end time, where the cue is active from its start time to the end of the media stream.
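As a small illustration, an unbounded cue could be expressed with an end time of positive Infinity, on the assumption that the cue interface accepts an unrestricted end time.

```ts
// Sketch: a cue active from 60 s until the end of the stream, assuming the
// cue's endTime may be set to Infinity (unbounded duration).
const unbounded = new VTTCue(60.0, Number.POSITIVE_INFINITY, 'active until end of stream');
```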

Updating media timed event cues

The API should allow media timed event cue information to be updated, such as a cue's position on the media timeline and its data payload. Where a media timed event cue is updated by the user agent, such as for in-band events, we recommend that the API allow the web application to be notified of the changes.
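A brief sketch of updating an existing cue in place; the value property and the "update" notification event name are illustrative assumptions about the proposal, not existing APIs.

```ts
// Sketch: updating a previously added cue's timing and payload.
declare const cue: TextTrackCue & { value?: unknown };

cue.startTime = 42.0;          // move the event on the media timeline
cue.endTime = 48.0;
cue.value = { updated: true }; // replace the data payload (proposal-dependent)

// Where the user agent itself updates an in-band cue, the application could be
// notified; "update" is a hypothetical event name for such a notification.
(cue as any).addEventListener?.('update', () => {
  console.log('Cue was updated by the user agent');
});
```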

Synchronization

In order to achieve synchronization accuracy between media playback and web content rendered by a web application, media timed event cue enter and exit events should be delivered to the web application within 20 milliseconds of their positions on the media timeline.

Additionally, to allow such synchronization to happen at frame boundaries, we recommend introducing a mechanism that would allow a web application to accurately predict, using the user's wall clock, when the next frame will be rendered (e.g., as done in the Web Audio API).
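Where it is supported, HTMLVideoElement.requestVideoFrameCallback already exposes per-frame presentation metadata that an application could use for this kind of prediction; the snippet below is a sketch of that approach, not a required mechanism of this proposal.

```ts
// Sketch: estimating when the next video frame will be presented, using
// requestVideoFrameCallback where available (a separate API from this proposal).
const video = document.querySelector('video') as HTMLVideoElement;
const rvfc = (video as any).requestVideoFrameCallback?.bind(video);

if (rvfc) {
  const onFrame = (_now: number, metadata: any) => {
    // metadata.mediaTime: presentation timestamp of the frame just presented;
    // metadata.expectedDisplayTime: wall-clock estimate of when it appears.
    console.log('Frame at media time', metadata.mediaTime,
                'expected on screen at', metadata.expectedDisplayTime);
    rvfc(onFrame);
  };
  rvfc(onFrame);
}
```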