How would extended AD be supported? #71

Open
nigelmegitt opened this issue Sep 27, 2022 · 2 comments

@nigelmegitt

    @nigelmegitt - audio in the case of dub/vo, but for AD, the video is the important association!

Not sure if you have considered 'Extended AD', where presentation of the AD may cause the original media to pause at certain points so that the AD can complete before playback resumes.
https://www.w3.org/TR/WCAG20-TECHS/G8.html#:~:text=Extended%20audio%20description%20temporarily%20pauses,are%20insufficient%20for%20adequate%20description.

If not, then it's probably a DAPT wishlist item, as I'm not aware of good examples of preparing such a thing at the moment.
It's not something we've implemented yet; a method of representing it has evaded me so far (although I've not thought too hard yet).

Originally posted by @btsimonh in #45 (comment)

@nigelmegitt

My thoughts on this:

The extension that Extended AD gives relative to "ordinary" AD is that it allows the audio description resource to be longer than the period within the related media that is set aside for playback of that resource.

In other words, within, say, a 2s period of media there is a 5s duration description.

For Text to Speech applications, it is very often not known at authoring time exactly how long the speech will last.

For pre-recorded audio, the duration of each description is known exactly, after it has been recorded.

The current TTML2 semantics for audio playback are that when the end time of the <span> (for TTS) or <audio> element is reached, playback is stopped, regardless of how far through its audio resource playback has progressed: if it has reached the end, fine; if it has not, it is truncated.
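To make that concrete, here is a minimal sketch (TypeScript, with made-up names; nothing here comes from TTML2 itself) of how a player might compute how much of a description that truncation behaviour would cut off:

```typescript
// Illustrative sketch only; the interface and function names are assumptions.
interface DescriptionEvent {
  begin: number;            // element begin time on the media timeline, seconds
  end: number;              // element end time on the media timeline, seconds
  resourceDuration: number; // actual duration of the recorded description, seconds
}

// Under plain TTML2 semantics, playback stops at `end` regardless of how much
// of the resource has been played; anything beyond the active duration is lost.
function truncatedSeconds(ev: DescriptionEvent): number {
  const opportunity = ev.end - ev.begin;            // the "opportunity" window
  return Math.max(ev.resourceDuration - opportunity, 0);
}

// The example above: a 5s description in a 2s window loses 3s.
console.log(truncatedSeconds({ begin: 10, end: 12, resourceDuration: 5 })); // 3
```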

From a player perspective, it is not obvious whether any truncation is intended or accidental, so we cannot just set a playback flag saying "until audio playback is complete, pause the media timeline", because for AD files authored to rely on the truncation behaviour that would be undesirable. Such authoring could be discouraged, of course.

It probably makes sense then to add some syntax whose semantic is:

  1. "the actual duration of this audio element needs to be extended to [duration]" (comes into play when the "opportunity" time, i.e. the active duration of the <audio> element is less than duration) and
  2. (possibly) "if you have to pause the media, do it at time M".

Then a player flag saying "honour extended durations" would do so by pausing for max(duration - opportunity, 0) at either: 1. the defined pause time M, or 2. some implementation-defined time.
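As a rough illustration, and assuming the hypothetical syntax above existed (the `extendedDuration` and `pauseAt` names below are inventions for this sketch, not proposed attribute names), the flag might be modelled like this:

```typescript
// Hypothetical sketch; none of this syntax exists in TTML2 or DAPT today.
interface ExtendedDescription {
  begin: number;
  end: number;
  extendedDuration: number; // proposed "actual duration needs to be extended to [duration]"
  pauseAt?: number;         // proposed optional "if you have to pause the media, do it at time M"
}

function pausePlan(ev: ExtendedDescription): { pauseTime: number; pauseFor: number } {
  const opportunity = ev.end - ev.begin;
  const pauseFor = Math.max(ev.extendedDuration - opportunity, 0);
  // Pause at the author-defined time M if given, otherwise pick an
  // implementation-defined point (here, the end of the opportunity window).
  const pauseTime = ev.pauseAt ?? ev.end;
  return { pauseTime, pauseFor };
}
```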

For text to speech playback, the pause time would probably need to be at the end of the "opportunity" window, while the player waits until the text to speech system fires a "completed utterance" event.
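A sketch of that behaviour, assuming a browser-based player built on HTMLMediaElement and the Web Speech API (both real APIs; the scheduling logic around them is illustrative, not a proposed implementation):

```typescript
// The description starts speaking when scheduled; if it is still speaking when
// its opportunity window closes, the programme pauses until the utterance's
// "end" event (the "completed utterance" signal) fires.
function scheduleSpokenDescription(
  video: HTMLVideoElement,
  text: string,
  opportunityEnd: number // media time at which the opportunity window closes
): void {
  const utterance = new SpeechSynthesisUtterance(text);
  let finished = false;
  utterance.onend = () => {
    finished = true;
    if (video.paused) video.play(); // resume if we had to hold the programme
  };
  speechSynthesis.speak(utterance);

  const onTimeUpdate = () => {
    if (video.currentTime >= opportunityEnd) {
      video.removeEventListener("timeupdate", onTimeUpdate);
      if (!finished) video.pause(); // hold the timeline until speech completes
    }
  };
  video.addEventListener("timeupdate", onTimeUpdate);
}
```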

If we're adding syntax and semantics, it would be ideal to do that in TTML2, but it may be more practicable from a standardisation perspective to do it in DAPT directly.

nigelmegitt added a commit that referenced this issue Mar 17, 2023
@nigelmegitt

@btsimonh I've proposed in #118 informative text that says implementations can support extended descriptions by varying the play rate of the audio description audio or of the programme audio so that there is enough time to play all the audio description within the time interval allowed, but that this is implementation-defined behaviour and not specified.

e.g. a player could:

  • pause video until audio description has completed
  • play audio description more quickly
  • slow down video playback
  • do some combination of all the above

This stops short of any syntax but hopefully shows a route forward, and if we find there is a practical requirement to add more syntax or semantics later, then we still can.
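For illustration only, since the behaviour is implementation-defined, the rate-based options might look something like this in a browser player (the function names and figures are assumptions, not text from #118):

```typescript
// Two of the strategies listed above, expressed against HTMLMediaElement.
// A 5s description in a 2s opportunity window could be fitted either way.

// Option: play the audio description more quickly.
function speedUpDescription(ad: HTMLAudioElement, opportunity: number, adDuration: number): void {
  ad.playbackRate = Math.max(adDuration / opportunity, 1); // e.g. 5 / 2 = 2.5x
}

// Option: slow down the programme so the window stretches to fit the description.
function slowDownProgramme(video: HTMLVideoElement, opportunity: number, adDuration: number): void {
  video.playbackRate = Math.min(opportunity / adDuration, 1); // e.g. 2 / 5 = 0.4x
}
```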

nigelmegitt added a commit that referenced this issue Mar 28, 2023
Closes #118 and #71.

* Add Audio and MixingInstruction to the class diagram
* Show shared properties in italics
* Describe syntax of data model diagram
* Define audio recording properties
* Define Mixing Instruction representation in TTML
* Add placeholder for Audio Mixing section
* Define TTML representation of Synthesized Audio
* Note that Audio is abstract
* Add Audio Recording representation
* Also change "defines" to "represents" in TTML representation sections.
* Add `xml:lang` constraints
* Add informative section explaining audio mixing
* Reference audio recording question issues
* Reference issue #117.
* Add embedded audio example, clarify multiple Source, re-add content profile
* Remove unnecessary conformance keywords
* When the conformance is defined in TTML2 already, don't duplicate.
* Add `#source-data` extension and prohibit it
* Add and prohibit `#xmlLang-audio-nonMatching` extension
* Fix up feature support for inline-only animation for mixing instructions
* Add issue #116 link below data model diagram
* Additional examples of audio

Also refer back to examples in the introduction where appropriate.
Fix some linting issues.

* Show Source as an array not one property value
* Audio sources are not nodes
* Add note about extended descriptions
* Add space before : in enum abnf of `daptm:eventType`
* Move Audio Mixing to appendix
* Address verbal review feedback