Add Support for Querying HDR Decode and Render Capabilities #118
I dig it! I'm not a color expert, but I think this seems sane and does a great job of addressing the pitfalls of previous proposals. Kudos to you all for diligence. Re: color expertise, we'll want some other folks to weigh in.
Give me a bit to collect additional Chrome feedback from folks who know the color stack.
Aside: If we do spec these values we may need to go the path of having a registry (similar to MSE byte-streams). Nudge, @mounirlamouri who's familiar with those reqs.
Great proposal, thanks! We have a small problem with the coupling of decoding and rendering. A video codec has no knowledge about the pixel data encoding, except the bit depth and spatial aspects. The four buckets correspond to:
All of these could potentially work with AV1 as well as HEVC or any other 10-bit codec (e.g. VP9). But at least the term HDR10 implies HEVC. Now, the codecs string for some codecs (including VP9 and AV1 [1]) can include information about Transfer Function, Color Space, bit depth, chroma sub-sampling, video range flag and matrix coefficients, but does not include HDR dynamic metadata information. And for other codecs (e.g. HEVC) this information is not in the codec string. I think the bucketing is fine, but the buckets should be constrained to the rendering capabilities and precisely defined in terms of Transfer Function, Color Space and metadata specification. The buckets should not carry an implication about bit depth, chroma sub-sampling, video range flag and matrix coefficients. And then, finally, we should describe the error case where the HDR capability bucket is incompatible with the codec string.
I don't like adding vendor-specific names to specifications, so I'm hesitant to enshrine "DolbyVision" into Media Capabilities. I proposed something similar in #110, but using transfer function, color space, and bit depth.
I'm also concerned about conflating whether a decoder supports these
Just to confirm, this leaves 3 parts: eotf, color gamut, and metadata? For the screen interface (#119) this would be just the first 2 (metadata handled in software)?
2 routes we could go:
I see you've found #119 ;)
Good point. Additionally, HDR profiles like Dolby Vision could theoretically support 12-bit and 10-bit color depth [1]. We favor constraining HDR capabilities to transfer function, color gamut, and metadata. What if the HdrCapability buckets explicitly reflected these properties?
That makes sense. Would this edited enum also address the reservation against proprietary names?
The display side does not technically need metadata; what do you think, though, about
Thanks for suggesting this route. We would like to strive for consistency. [1] https://www.dolby.com/us/en/technologies/dolby-vision/dolby-vision-profiles-levels.pdf
I gather you meant Pq for the DolbyVision transfer function. There are other proposed HDR formats, in particular SL-HDR1 and SL-HDR2. In any case, I think splitting the capabilities between what the user-agent can handle and what can be displayed properly is the way to go. For example, a UA using an SDR display may handle HDR content well, doing proper tone mapping etc. HDR content may still be preferred over SDR here, even if the display isn't HDR.
These formats can be added to
Agreed -- #119 complements this discussion by covering the display aspect.
@vi-dot-cpp and I chatted a bit more offline. Another approach we could take here is one similar to @jernoble's recommendation in #110:
Where ColorGamut is an enum defined as follows:
TransferFunction is an enum defined as follows:
And MetadataDescriptor is an enum defined as follows:
The MediaCapabilities spec could then define which combinations of the above enum values are valid "buckets" and we could throw a
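For concreteness, here is a sketch of how the de-bucketed enums and a validity check might look. This is illustrative only: the enum names and values are drawn from names mentioned elsewhere in this thread (srgb/p3/rec2020, pq/hlg, the SMPTE metadata specs), not from any spec text, and the validity rules are hypothetical.

```javascript
// Illustrative enum values only; drawn from names mentioned in this
// thread, not from shipped spec text.
const ColorGamut = Object.freeze(["srgb", "p3", "rec2020"]);
const TransferFunction = Object.freeze(["srgb", "pq", "hlg"]);
const HdrMetadata = Object.freeze([
  "none", "smpteSt2086", "smpteSt2094-10", "smpteSt2094-40",
]);

// Hypothetical validity check: the spec could define which combinations
// form a coherent "bucket" and reject the rest.
function isValidHdrBucket({ colorGamut, transferFunction, hdrMetadata }) {
  if (!ColorGamut.includes(colorGamut)) return false;
  if (!TransferFunction.includes(transferFunction)) return false;
  if (!HdrMetadata.includes(hdrMetadata)) return false;
  // HDR metadata paired with an SDR transfer function is incoherent.
  if (hdrMetadata !== "none" && transferFunction === "srgb") return false;
  return true;
}

console.log(isValidHdrBucket({
  colorGamut: "rec2020", transferFunction: "pq", hdrMetadata: "smpteSt2086",
})); // true
console.log(isValidHdrBucket({
  colorGamut: "srgb", transferFunction: "srgb", hdrMetadata: "smpteSt2086",
})); // false
```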
I'd vote for the de-bucketing (separate enums for gamut, transfer, and metadata). To me it's more elegant and forward-looking. It may also solve the issue of what to do for screen (no need to include metadata). This may make a case for doing away with the wrapper HdrCapability enum, flattening these new fields into VideoConfiguration directly. Then you can pick a handful for the screen API without needing a new wrapper (or a wrapper with parts that don't apply). On a related note, these are all optional inputs (HDR is new), so we'll want to choose some sane defaults for these fields. I think srgb works for ColorGamut and TransferFunction. We probably need a "none" for the MetadataDescriptor. Nit: consider renaming MetadataDescriptor to HdrMetadata?
@mwatson2 @chcunningham @jernoble @jyavenard I made a PR (#124) that reflects points brought up in this thread. I would appreciate it if you all could review it -- many thanks. |
Is this an actually useful addition to VideoConfiguration? I.e., are there any decoders that can otherwise decode the underlying frames, but are unable to meaningfully read the HDR metadata? I was under the impression that the HDR information was a container-level concept, and not a codec one. Decoders are happy to decode encoded media data, and don't really care about the interpretation of the color values emitted by the decoder; that's left to the renderer and the display hardware. |
Dynamic HDR metadata is typically inserted into the compressed bitstream (e.g in HEVC, the metadata is inserted into the bitstream via SEI messages).
While this is correct, in the case of dynamic HDR metadata (and also static HDR metadata in many cases), the decoder needs to be able to parse the metadata from the compressed bitstream (in order to pass the metadata through to the renderer / display hardware).
In that case, do we need to specify each subtype of metadata to query whether the decoder supports each individually, or would it be sufficient to add a "HDR" boolean to VideoConfiguration, and mandate that decoders which advertise support for "HDR" must be able to provide all the necessary metadata to the renderer and display hardware? In other words, could we make this an 'all-or-nothing' check?
Having thought further about it, I'm concerned that querying the display capabilities gives too many fingerprinting opportunities. Regardless of what we add to VideoConfiguration, it appears to me that we'll never cover all cases anyway. So I kind of like an HDR bool that is all or nothing and only in VideoConfiguration. @jernoble av1 has all the information you typically find in the container in the frame header: colorspace, range, primaries, coefficients, transfer characteristics etc (the way all codecs should have been :))
I'm confused about this proposal to have an HDR boolean. I'm also confused about the idea of fingerprinting using this data. If all recent Apple products support one particular set of technologies, how much fingerprinting data does that provide? If all 2019 TV sets from Samsung support one particular set of technologies and all 2019 TV sets from LG support a different set, how much fingerprinting data does this provide?
My comment was only about
The danger with HDR comes from being able to query the abilities of the display. Even for devices with built-in screens, they can be plugged into external monitors with different capabilities. Those combinations of capabilities can be extremely unique. |
We want to alleviate fingerprinting concerns; hence the proposal to just add a boolean for HdrCapability to both interfaces. To give more info: adding the same boolean to Screen would be important to give website developers the flexibility to decide whether they should serve HDR content based on whether the display supports it. Having the ability to make this decision is important especially because there can be power (for the user agent) and network (for the content provider) implications if the content provider serves HDR content even though the Screen does not support it. So as long as we give them the ability to make this decision consciously, it is good enough. We would still want to keep the HdrMetadata for the reasons @isuru-c-p mentioned above.
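A sketch of the serving decision described above: prefer the HDR variant only when the display reports HDR support, to avoid the power and network cost of serving HDR to an SDR screen. The manifest names are hypothetical, and the `(dynamic-range: high)` media query stands in here for the boolean proposed in this thread; `matchMedia` is injected so the sketch runs outside a browser.

```javascript
// Decide which variant to serve based on display HDR support.
// matchMediaFn is window.matchMedia in a real page.
function pickManifest(matchMediaFn) {
  const displayIsHdr = matchMediaFn("(dynamic-range: high)").matches;
  // Serving HDR to an SDR display wastes network bytes (content provider)
  // and power on tone mapping (user agent), so fall back to SDR.
  return displayIsHdr ? "video-hdr10.mpd" : "video-sdr.mpd"; // hypothetical names
}

// Stubbed matchMedia for illustration outside a browser:
const stub = (matches) => () => ({ matches });
console.log(pickManifest(stub(true)));  // "video-hdr10.mpd"
console.log(pickManifest(stub(false))); // "video-sdr.mpd"
```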
I'm sorry but I don't understand these. My employer is a large manufacturer of monitors and TVs. I asked a colleague about information carried in HDMI and what might be vulnerable. What he said was the following:
I don't believe the intent was ever to expose the detailed data specific to the technology he mentions. What am I missing?
Generally speaking, it only requires 33 bits of entropy in order to uniquely identify a user by fingerprinting, and these bits of entropy are cumulative. So the concern is not that exposing detailed device information alone will be able to uniquely identify a user, but that in combination with all the other sources of entropy available pages will be able to do so. "Does this display support HDR or not?" is one bit of entropy [1] (out of 33 total). "Does this display support ST2084 but not HLG?" is another two. "Does this display support Dolby Vision, but not HDR10+ and SL-HDR2" is another three. "What is the display luminance?", if expressed as a floating point number, could be as many as 32 bits of entropy.

[1] This is a theoretical maximum amount of entropy. If everyone in the world gave the same answer to that question, it wouldn't really add fingerprinting risk. So it's not as useful to be able to determine that "this user is on an iPhone", which isn't very unique, as it is "this user is attached to a LG model 27UK650 external display and their brightness setting is 55".
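The 33-bit figure above can be checked with a couple of lines of arithmetic (the population number is a rough 2019 estimate, used only for illustration):

```javascript
// Why "33 bits": 2^33 ≈ 8.6 billion, i.e. enough distinct fingerprints
// to give every person on Earth a unique one.
const worldPopulation = 7.7e9; // rough 2019 figure (assumption)
const bitsNeeded = Math.log2(worldPopulation);
console.log(bitsNeeded.toFixed(1)); // "32.8"

// Each independent yes/no capability answer contributes at most one bit;
// a 32-bit float such as display luminance could contribute up to 32 alone.
const hdrBooleanBits = Math.log2(2);
console.log(hdrBooleanBits); // 1
```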
There's a lot more information in the Privacy WG's Fingerprinting Guidance document.
Given the concerns about fingerprinting, and given that we really don't need more than a Boolean to represent HDR support for the major scenarios, I think we no longer need to discuss having more granular data representing the device and can keep it simple. I have updated the pull request with the new proposal. @jernoble, @chcunningham, @jpiesing, @jyavenard, @mwatson2, can you see if the latest pull request seems aligned with your views? We will add a similar Boolean for HDR support to
Help me clarify the meaning of the boolean. The latest PR update defines it as "hasHdrMetadata". I think we want to avoid having to say "we support all forms of HDR metadata". Can we reliably infer the type of metadata from the VideoConfiguration contentType? DV has its own contentType strings (e.g. codecs="dvhe.05.07"), so infer SMPTE ST 2094-10. But the other codecs are not tightly bound to SMPTE ST 2086, SMPTE ST 2094-40, or whatever future metadata spec arises. What does the boolean mean for non-DV codecs? Giving it a clear meaning is good to avoid ambiguity about future metadata formats. But also, even for known formats, UAs are likely to support just a subset. I can't predict whether Chrome will ever support DV. I also expect support for ST 2094-40 to be spotty for many UAs for some time.

Re: fingerprinting, the Chrome security team's position is nuanced. Please have a read here. In short, I'm happy to consider alternatives to the buckets above, but I'm not personally worried that these APIs are meaningful additions to the required 33 bits.
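A sketch of the contentType inference being asked about above. Only the Dolby Vision case (a "dvhe"/"dvh1" codec implies SMPTE ST 2094-10) is stated in the thread; returning null for everything else illustrates exactly the ambiguity under discussion. The function name and the metadata string are hypothetical.

```javascript
// Hypothetical: infer the HDR metadata type from the codecs string.
function inferHdrMetadata(contentType) {
  const m = /codecs="([^"]+)"/.exec(contentType);
  if (!m) return null;
  const codec = m[1];
  // Dolby Vision has its own codec strings, so the metadata type follows.
  if (codec.startsWith("dvhe") || codec.startsWith("dvh1")) {
    return "smpteSt2094-10";
  }
  // hvc1 / vp09 / av01 streams could carry ST 2086, ST 2094-40, or
  // nothing at all -- the codec string doesn't say.
  return null;
}

console.log(inferHdrMetadata('video/mp4; codecs="dvhe.05.07"'));      // "smpteSt2094-10"
console.log(inferHdrMetadata('video/mp4; codecs="hvc1.2.4.L153.B0"')); // null
```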
I think you should be. Having said that, I don't think the decision on whether such a feature is acceptable from a fingerprinting standpoint should be left to a single person. Maybe this is something we can put on the agenda for when the new media WG meets at the next TPAC.
Again I apologise if I'm missing something but please can you point me to where the major scenarios are documented and where this analysis is recorded? Thanks.
Definitely (see you there). Meanwhile, let's keep discussing how a boolean would work. See my questions above; it's not clear to me that it's viable yet.
Agree with @mwatson2 and @jernoble - I prefer not to formally require a particular mitigation. New/improved mitigations will arise and each UA will do it differently. For example, the latest thinking in Chrome-land is to use a "privacy budget" that throttles/blocks calls to the API above a certain threshold (to distinguish fingerprinting from legitimate use).
Do these remaining points imply a change to the spec/PR (vs just forming points of agreement)? IIUC, #1 is already true. We have a nod to #2 here - @jernoble do you think this should be amended (e.g. more complete description of the fingerprinting surface)? Switching gears for a sec, I want to return to some discussion of the colorGamut property that came up near the end of our recent meeting. Quick summary:
Picking back up with new info/questions
Yep, was just trying to get a clear resolution on it all so we can put a wrap on this issue. Let's add number 2 to the PR. Regarding the above: @vi-dot-cpp, can you add the following to the PR:
Thanks for the quick responses and feedback. |
This is also a mitigation. Please don't add this to the PR.
@gregwhitworth can we keep it here? This issue is as much about the interface (including enum values) as it is about fingerprinting concerns. As-is, the PR would add a colorGamut property to MediaCapabilities that does not yet exist. A handful of folks were concerned this is not quite right, so we should get consensus on that before landing a PR to add it.
Greg closed the separate issue (thanks). @mwatson2 @jernoble @jpiesing interested to continue the discussion re: colorGamut vs ISO_IEC_23001-8_2016. See my earlier comment. |
@chcunningham that's fine, this thread already has numerous issues so let's keep it here. With regards to your feedback on
This implies that they're overloading
So this seems to contradict the first item as you stated, and is only about the display, not the rendering capabilities and the display. I can file an issue and follow up with the CSSWG on a call following TPAC to see which direction they intended, and we can either amend our spec to build on top of theirs, or see if they'll adjust the color spec to align the color space definitions with the earlier paragraph, as it doesn't make sense to go down a code path for a color space that the display can support but the UA can't adequately render. I personally think that we want to adjust the spec to the following (for all of the color space definitions):
Would that be sufficient?
@chcunningham @mwatson2 @jernoble @jpiesing I presume I should move forward with opening an issue on the CSSWG to fix the contradictions between their propdef of color-gamut and that of the color space definitions; correct?
Regarding whether we need to separately specify matrix coefficients: to completely make sense of decoded pixel data you need to know the full range flag, eotf, matrix coefficients and color primaries:
When labelling a video stream, the values of all of these things are known and there is little reason not to declare them all in the codec string. This is just accurate labeling of a stream.

For capability discovery we can get away with a smaller set when it is known that all devices support all relevant values of one of these. Many of the values for color primaries and matrix coefficients in the codec-independent code points document are not relevant in a web context. Specifically, we only care about SDR (709) and BT.2020 for color primaries, and there is only one matrix coefficients value used with 709. I am actually not sure whether it is the case that only one value of the full range flag is used in practice or whether devices universally support both values, but I infer from the lack of problems related to this flag that one of these is true ;-) Same for the two values of matrix coefficients associated with BT.2020, though I do know here that the 'constant luminance' one is not widely supported, if at all.

So, for capability discovery we are probably fine with TF and color primaries. Matrix coefficients could be added later on if someone has support for BT.2020 constant luminance and wants that to be discoverable. But this is not so likely to happen, as I doubt people will want to double up their streams for the small benefit this option provides.
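To make the "accurate labeling" point concrete, here is a small parser for the full-form VP9 codecs string mentioned above. The field order (profile, level, bit depth, chroma subsampling, colour primaries, transfer characteristics, matrix coefficients, full range flag) follows the VP9 codecs-parameter convention as I understand it; verify against the VP9 ISO BMFF binding before relying on it.

```javascript
// Parse a full-form VP9 codecs string into its labeled fields.
// Field order is an assumption based on the VP9 codecs-parameter spec.
function parseVp9CodecString(codecs) {
  const fields = [
    "sampleEntry", "profile", "level", "bitDepth", "chromaSubsampling",
    "colourPrimaries", "transferCharacteristics", "matrixCoefficients",
    "videoFullRangeFlag",
  ];
  const parts = codecs.split(".");
  const out = {};
  fields.forEach((name, i) => {
    out[name] = i === 0 ? parts[i] : parseInt(parts[i], 10);
  });
  return out;
}

// Example: 10-bit BT.2020/PQ content, fully labeled.
const info = parseVp9CodecString("vp09.02.10.10.01.09.16.09.00");
console.log(info.colourPrimaries);         // 9  -> BT.2020 primaries
console.log(info.transferCharacteristics); // 16 -> PQ (SMPTE ST 2084)
console.log(info.matrixCoefficients);      // 9  -> BT.2020 non-constant luminance
```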
@mwatson2 said:
We’ve been down this road before with EME. Existing codec strings don’t carry this information, and bodies that standardize them are very resistant to putting stream characteristics into the codec string. So not only will this not work for existing codecs and containers, it’s unlikely to work universally for future codecs and containers as well. I don’t think we’re going to be able to get away with putting all this information into the content type.
Thanks everyone for the feedback. Based on our discussion, I have updated #124 to include the following:
The update is based on:
i think we're conflating the color-gamut media-query and the ColorGamut enum. the color-gamut media-query takes a ColorGamut enum as input and tests support by the UA and the output device. the ColorGamut enum values only represent a color space, nothing more. it is the color-gamut media-query which is returning device information for a given color space. the proposal here is to add the ColorGamut enum to represent a color space, without the color-gamut semantics. |
@gregwfreedman valid point that it's an enum and not necessarily what's doing the evaluation of support. That said, I went ahead and filed an issue with the CSSWG spec and they'll be fixing it to reflect rendering & display. w3c/csswg-drafts#4281 @vi-dot-cpp you should be able to either change your PR for this to be a note or remove the description altogether, because the CSS spec will be the definition you're expecting.
FYI, I'll be largely out of office next week as I head to Japan and squeeze in some tourism before TPAC. Looking forward to a f2f chat!
This is how I understood the proposal. Just want to make sure it has everything we need. Interested to hear @mwatson2 come back on @jernoble's last comment. @vi-dot-cpp - the PR presently says "The ColorGamut represents the color gamut supported by the UA and output device." I follow that this is the CSS wording, but we should somehow call out that calls to decodingInfo() actually aren't checking the output device. IIUC the plan has been to leave output device queries to the Screen API, meaning color gamut for decodingInfo() is purely a question of what the UA supports.
Correct me if I misunderstand -- will there not be UAs for whom decodingInfo() checks the attached screen, e.g., Cast?
Some of us will regrettably miss this opportunity; is calling in an option?
@jernoble wrote:
The VP9 and AV1 codec strings carry this information, but I understand others don't. Let me clarify my point though: I was not proposing we use codec strings for capability discovery past the identification of the codec that is common. I was pointing out the difference between describing stream properties and discovering capabilities, since someone had mentioned that I had argued for matrix coefficients as an item in the VP9 codec string, but in this discussion I think we don't need it.

If you are describing stream properties, then these are just descriptive values and you might as well include everything to be fully descriptive. When discovering capabilities the task may be simplified by known facts of the form "all implementations that support X also support Y" or "no implementation exists that supports P with Q". We don't need to separately specify matrix coefficients for discovery since there is only one relevant value for each color gamut.

Also, in future, if necessary, new capability discovery fields can be added when new capabilities are added to an implementation, but it would be much harder to add a field to the codec string since that has no forwards compatibility mechanism and is embedded in many implementations.
Just reviewing the PR and trying to understand what is now being proposed, this text seems ambiguous:
Does hasHdrCapabilities mean all of sRGB, p3 and rec2020 need to be supported and all of sRGB, pq and hlg need to be supported as the current text implies? Or is it intended to be a query covering all capabilities that is considered supported if at least one HDR-relevant color gamut and transfer function is supported (in which case, why list sRGB)? If we're aiming to have just one boolean then I can see pros and cons with either interpretation and which is best rather depends on how likely it is that a device will support some but not all of the capabilities listed. At the very least, the wording needs tightening to be clear what is being described.
This is true, but I think we have to be careful about when we explicitly mention the screen, to avoid confusing the reader. The current language makes it sound as if we will only return support for rendering a specific color gamut if the attached screen also supports outputting this gamut. We want to avoid that coupling (screen output capabilities are addressed by the Screen API). When I mentioned the Cast example earlier, it was to motivate the inclusion of eotf. In these cases, the line between the display and the UA is blurred. There will also be cases where the UA software runs entirely within the display (smart TVs). But we don't need to bring attention to this fact in the spec, because it isn't important for sites to know and it implies the coupling I mention above. IMO the way to draw the line is to continue to separate Screen vs Decoding+Rendering, such that we only put things on Screen that were traditionally Screen properties (before computers were built into screens): things like dimensions, color gamut, HDR support. Smart TVs that act as a UA + display can continue to answer the non-Screen decodingInfo() questions in the same way we would for a traditional desktop + display.
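The separation described above might look like this from a site's point of view: decodingInfo() answers "can the UA decode and render this?", and the display question is asked separately. decodingInfo() and its basic video fields are real Media Capabilities API; the colorGamut/transferFunction fields shown are the ones proposed in PR #124, and the media query stands in for the separate Screen-side check. The dependencies are injected so this sketch runs outside a browser; in a page you would pass navigator.mediaCapabilities and window.matchMedia.

```javascript
// Two separate questions: UA decode/render support vs display support.
async function canPlayHdr(mediaCapabilities, matchMediaFn) {
  const decode = await mediaCapabilities.decodingInfo({
    type: "media-source",
    video: {
      contentType: 'video/mp4; codecs="hvc1.2.4.L153.B0"',
      width: 3840, height: 2160, bitrate: 18000000, framerate: 30,
      colorGamut: "rec2020",    // proposed in PR #124; availability varies
      transferFunction: "pq",   // proposed in PR #124; availability varies
    },
  });
  const displayIsHdr = matchMediaFn("(dynamic-range: high)").matches;
  // The UA may tone-map HDR onto an SDR display, so decode.supported
  // alone can still justify preferring the HDR stream.
  return { supported: decode.supported, displayIsHdr };
}

// Stubs so the sketch runs outside a browser:
const mcStub = { decodingInfo: async () => ({ supported: true, smooth: true }) };
const mmStub = () => ({ matches: false });
canPlayHdr(mcStub, mmStub).then((r) => console.log(r)); // { supported: true, displayIsHdr: false }
```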
It was nice to speak with everyone at the TPAC face-to-face and get agreement on this issue. #124 has been updated to reflect suggestions surfaced here and at TPAC. |
I realize this is a comment from some time ago, but it may be important to note that Dolby Vision is a superset of SMPTE 2094-10, particularly when it comes to OTT video distribution. See https://www.dolby.com/us/en/technologies/dolby-vision/dolby-vision-profiles-levels_v1.3.2.pdf I believe this is why the vendor strings were chosen for Android: https://developer.android.com/reference/android/view/Display.HdrCapabilities.html |
@rdoherty0, could you clarify: I don’t see any reference to SMPTE 2094-10 in that document, only SMPTE 2086. When you say “superset”, do you mean that the bitstream carries multiple metadata formats at the same time? Or that the bitstream is capable of carrying one out of a defined set of metadata formats? The “BL signal cross-compatibility ID” section seems to indicate the latter. |
There is a lot to unpack here, unfortunately. Your second statement is closer to the truth: there is one complete metadata set per stream. There is more documentation from Dolby here which documents the inclusion of Dolby Vision streams in various formats (DASH, for example): https://www.dolby.com/us/en/technologies/dolby-vision/dolby-vision-for-creative-professionals.html#5 The 2094-10 metadata is used in several standards-based efforts, including ATSC and DVB, and is specified in the DASH-IF IOP spec. But most Dolby Vision profiles extend this metadata, including the composing metadata specified in the ETSI specification (https://www.etsi.org/deliver/etsi_gs/CCM/001_099/001/01.01.01_60/gs_CCM001v010101p.pdf), which does reference SMPTE 2094-10. Most online distribution uses Dolby Vision profiles 5 or 8.1. I would suggest none of this complexity needs to be exposed at this API layer; the simple existence bit as proposed is OK, but it would not be accurate to label the Dolby Vision "family" of HDR metadata as SMPTE 2094-10.
Celebrate!!! PR #124 is merged! This includes all the bits we agreed to in this discussion and at TPAC. It does not include the Screen API changes that are still under discussion. I'm going to close this out and file a separate issue to see if we should make any revision for the points raised by @rdoherty0. Thanks everyone!
This is part 1, which covers decoding and rendering, of the HDR two-part series. Part 2 (#119) covers display.
Modern-day scenarios, based on data and partner asks we have analyzed, increasingly require HDR capability detection in v1. We let the following design considerations guide this proposal: separating decode/render capabilities (MediaCapabilities) from display capabilities (Screen). Relevant threads/comments: [1][2][3][4][5][6]

We propose the following changes to MediaCapabilities. These changes will be complemented by changes to Screen in the aforementioned linked issue. The core change is adding an HdrCapability enum to VideoConfiguration, in similar fashion to Android's HdrCapabilities.

1. Define HdrCapability Enum
Shared in Screen and MediaCapabilities:

2. Add HdrCapability Enum to VideoConfiguration
Team: @scottlow @GurpreetV @isuru-c-p @vi-dot-cpp from Microsoft