
Proposal for minimal timestamp API to allow for synchronising media with CPAL streams #363

@mitchmindtree


This is a proposal to begin addressing #279 with the most minimal API necessary.

Background

The most accurate and thorough research I could find on this topic is Ross Bencina's excellent paper PortAudio and Media Synchronisation - It's All in the Timing. It gives an overview of the media synchronisation problem, with example scenarios and diagrams that make it more intuitive.

http://www.portaudio.com/docs/portaudio_sync_acmc2003.pdf

The first few sections of the paper describe some hypothetical scenarios and different techniques for synchronising audio with some other kind of media. A MIDI clock is the primary example used in the paper, but the same techniques apply to presenting frames of graphics and other forms of media sync.

Section 6 describes the minimal set of information necessary in order to make these synchronisation techniques possible:

  • Sample rate. We already have this.
  • Buffer start times. We do not yet provide this. This refers to the most accurate form of monotonic clock time available on the system. It is also essential that users have access to the same time source that provides this value so that they can timestamp their own media events. This means 1. describing the exact source for each host in the docs and possibly 2. providing a function for easily retrieving this value (PortAudio does so via a GetStreamTime(Stream* s) function).

PortAudio decided to provide this monotonic time in seconds using a double-precision floating-point data type:

The double data type was chosen after considerable deliberation because it provides sufficient resolution to represent time with high-precision, may be manipulated using numerical operators, and is a standard part of the C and C++ languages.
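As a rough illustration of why these two pieces of information are sufficient, the sketch below maps the time of a media event onto a frame offset within an output buffer. The function name and parameters are hypothetical and not part of CPAL or PortAudio; it assumes the event time and the buffer start time come from the same clock and are expressed in seconds as f64.

```rust
/// Hypothetical helper: convert the time of a media event into a frame
/// offset within the buffer whose first frame starts at `buffer_start_secs`.
///
/// Both times are assumed to come from the same monotonic clock.
/// Returns `None` if the event falls before or after this buffer.
fn event_frame_offset(
    event_secs: f64,
    buffer_start_secs: f64,
    sample_rate: f64,
    frames_in_buffer: usize,
) -> Option<usize> {
    let delta_secs = event_secs - buffer_start_secs;
    if delta_secs < 0.0 {
        // The event is already in the past relative to this buffer.
        return None;
    }
    // Seconds multiplied by frames per second gives frames.
    let frame = (delta_secs * sample_rate).round() as usize;
    if frame < frames_in_buffer {
        Some(frame)
    } else {
        // The event belongs to a later buffer.
        None
    }
}
```

For example, with a 48 kHz stream and a 512-frame buffer starting at 1.0 s, an event stamped 1.005 s lands 240 frames into the buffer, while an event stamped 1.02 s belongs to a later buffer.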

Section 7 also describes implementation issues. They can be roughly summed up as follows:

  • 7.1 Sample rates: The observed sample rate differs subtly from the nominal sample rate, and the difference varies between sound cards and chipsets. This introduces small inaccuracies into the aforementioned synchronisation techniques. PortAudio provides an actual sample rate via its stream info parameters. Calculating this requires a high-resolution system clock, though this isn't always available.
  • 7.2 One Shared Time-base: The time source for timestamps provided via audio callbacks sometimes differs from the source used to provide timestamps for other media events (e.g. Windows' MIDI API). This is why PortAudio found it necessary to provide a function for easy access to the correct source (GetStreamTime).
  • 7.3 Buffer playback times: Exact buffer playback times are often not provided or are inaccurate. PortAudio takes the initiative of trying to calculate this for the user when the platform does not provide it. ASIO buffer timestamps have a best-case resolution of 1ms, significantly worse than necessary for sample-level synchronisation.
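To make the sample rate issue in 7.1 concrete, here is a minimal sketch of estimating the observed sample rate from two buffer start times and the number of frames delivered between them. This is only an illustration under the assumption that accurate buffer start times are available; it is not how PortAudio or any CPAL host is known to compute the value, and the function name is hypothetical.

```rust
/// Hypothetical helper: estimate the observed sample rate from the number
/// of frames delivered between two buffer start times (in seconds, taken
/// from the same monotonic clock).
///
/// Returns `None` if no time has elapsed, since the rate is undefined.
fn observed_sample_rate(frames_elapsed: u64, start_secs: f64, end_secs: f64) -> Option<f64> {
    let elapsed_secs = end_secs - start_secs;
    if elapsed_secs > 0.0 {
        Some(frames_elapsed as f64 / elapsed_secs)
    } else {
        None
    }
}
```

A device with a nominal rate of 48 kHz that takes 10.001 s to consume 480,000 frames is actually running slightly slow, at roughly 47,995 Hz. The longer the measurement window, the less the clock's resolution affects the estimate.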

Proposal

I propose that we add the following:

  • A StreamInstant struct representing a monotonic instant in time retrieved from either 1. the stream's underlying audio data callback or 2. the same time source used to generate timestamps for a stream's underlying audio data callback. No guarantees are made about the origin from which the value is measured, only that it is monotonic and that the origin is at or before the moment the stream was started. Internally we could represent the instant in a similar manner to std::time::Duration, providing methods for easy access to more accessible representations, e.g. .as_secs_f64().
  • The following timestamp structs:
    • InputStreamTimestamp
    • OutputStreamTimestamp
      Both structs contain two fields of type StreamInstant:
    1. callback, indicating the instant at which the data callback was called.
    2. buffer_adc (input) or buffer_dac (output), representing the instant of capture from, or playback by, the audio device respectively.
      An instance of each struct would be passed to the user's data callback for the corresponding stream type.
  • A fn now(&self) -> StreamInstant method for the Stream handle type, allowing users to produce an instant in time via the same source used to generate timestamps for the data callback, useful for media sync. It will be important to document exactly what system API is used for each host and to list any notable limitations (e.g. the 1ms best-case resolution on ASIO).
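The types described above could be sketched roughly as follows. This is only an illustration of the proposal, not an existing CPAL API: the internal Duration field, the from_nanos constructor and the duration_since method are assumptions added for the example, and the Stream::now method is omitted because it depends on each host's clock.

```rust
use std::time::Duration;

/// Sketch of the proposed `StreamInstant`: a monotonic instant measured
/// from an unspecified origin at or before the moment the stream started.
#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct StreamInstant {
    // Assumption: stored as the time elapsed since the unspecified origin.
    since_origin: Duration,
}

impl StreamInstant {
    /// Hypothetical constructor, as a host backend might use internally.
    pub fn from_nanos(nanos: u64) -> Self {
        StreamInstant { since_origin: Duration::from_nanos(nanos) }
    }

    /// The instant as seconds since the origin.
    pub fn as_secs_f64(&self) -> f64 {
        self.since_origin.as_secs_f64()
    }

    /// Hypothetical helper: time elapsed since `earlier`, or `None` if
    /// `earlier` is actually later than `self`.
    pub fn duration_since(&self, earlier: &StreamInstant) -> Option<Duration> {
        self.since_origin.checked_sub(earlier.since_origin)
    }
}

/// Timestamp passed to the input data callback.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct InputStreamTimestamp {
    /// The instant at which the data callback was called.
    pub callback: StreamInstant,
    /// The instant at which the buffer was captured by the device.
    pub buffer_adc: StreamInstant,
}

/// Timestamp passed to the output data callback.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct OutputStreamTimestamp {
    /// The instant at which the data callback was called.
    pub callback: StreamInstant,
    /// The instant at which the buffer will be played by the device.
    pub buffer_dac: StreamInstant,
}
```

Within an output callback, buffer_dac.duration_since(&callback) would then give the time remaining until the buffer reaches the device, which is the quantity needed to schedule other media against it.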

I've been doing some research into the way that timing information is provided by each of the different hosts supported by CPAL. I'll add a follow-up comment soon with the relevant info for some more context for those interested and for myself to refer back to during implementation.

The transport API discussed within #279 has been intentionally omitted in the hope that it can be implemented on top of the proposed timestamp API. In the case that it cannot, this is likely best left to be addressed in a future PR either way.
