Add description of an API for controlling SDP codec negotiation #186

alvestrand · 2023-06-14T17:21:19Z

Fixes #172

index.bs

fippo · 2023-06-20T06:17:24Z

index.bs

+#### AddSendCodecCapability #### {#add-send-codec-capability}
+
+1. Let 'capability' be the "capability" argument
+1. If 'capability' is a known codec, throw an IllegalModification error.


how do you define "known"? I would suggest forbidding the same mimeType so you can't have collisions between e.g. H264 and a newly added H264 with a different profile-level. But you seem to allow same-mimeType-different-fmtp in the explainer?

What test is used to determine whether a capability is 'known"? Do we check against what would be returned by RTCRtpSender.getCapabilities(), Media Capabilities or WebCodecs isSupported()?

The purpose of throwing here is so that you can't ask to duplicate existing codecs. If you could duplicate, how could the other side know which one to choose for sending?
So it's important on the receive side, but exists on the send side mostly for symmetry.
I think "known" should be "a codec with exactly the same name and fmtp parameters appears in the list that is generated by default" - adding special codecs for special H.264 parameters is probably a valid use case.

Trying to edit this in, I find that clock rates and channels are involved too. There can be 2 codecs that differ only in clock rate (CN codecs in particular), so I think we should throw in the kitchen sink and say that "equal if all the fields compare equal".

from discussion elsewhere: rtcp-fb might need consideration too (and * does not cut it sadly)

sdp-explainer.md

aboba · 2023-06-27T14:52:33Z

index.bs

+#### AddSendCodecCapability #### {#add-send-codec-capability}
+
+1. Let 'capability' be the "capability" argument
+1. If 'capability' is a known codec, throw an IllegalModification error.


What test is used to determine whether a capability is 'known"? Do we check against what would be returned by RTCRtpSender.getCapabilities(), Media Capabilities or WebCodecs isSupported()?

index.bs

jtsiskin · 2023-06-27T21:47:19Z

index.bs

+
+1. Let 'capability' be the "capability" argument
+1. If 'capability' is a known codec, throw an IllegalModification error.
+1. If 'packetizationMode' is defined, and is not a known codec, throw a SyntaxError DOMException.


How would one specify they want to use a generic/codec agnostic packetizer/depacketizer?

Currently, the transform must preserve whatever bits the packetizers happen to read from (example libwebrtc reference) - however, if the RTCEncodedVideoFrameMetadata is correctly filled, the packetization process could use this instead, and not need to read any bits from the frame.

Is that something we need to distinguish between - the
"I am not modifying any necessary bits in the header, go ahead and use VP8/AV1/etc. packetizer"
vs the
"I am transforming the entire frame, don't read from it, but the frame metadata is there"
cases?

Should we even support both, or could we only allow generic RTCEncodedVideoFrameMetadata packetization in the first place?

also very minor nit: the name is slightly confusing due to the matching H264 parameter 'packetization-mode'

Once that generic / codec agnostic packetizer/depacketizer has a name, we can use it.
Once a spec exists, it's likely to have a name in it.

Added some more language about packetizers while moving to naming them explicitly. See if this conversation can be resolved, or whether comments should go elsewhere.

dontcallmedom-bot · 2023-06-28T06:34:37Z

This issue had an associated resolution in WebRTC June 2023 meeting – 27 June 2023 (Encoded Transform Codec Negotiation):

RESOLUTION: Adopt PR #186 with details to be discussed in the PR

jan-ivar · 2023-08-23T23:14:57Z

index.bs

+<pre class="idl">
+// New methods for RTCPeerConnection
+partial interface RTCPeerConnection {
+   undefined AddSendCodecCapability(DOMString kind,


kind seems redundant if we throw on !mimeType.startsWith("video/") && !mimeType.startsWith("audio/")

Do I understand correctly that once e.g this is called

const acmeCodec = { mimeType: "video/acme-encrypted", clockRate: 90000, sdpFmtpLine: "encapsulated-codec=vp8", packetizationMode: "video/vp8", }; pc.addSenderCodecCapability(acmeCodec);

...then every negotiation from here on will offer sending of this new codec on ALL transceivers until the pc is deleted? And in first place?

This seems more limiting than e.g. setCodecPreferences which allows negotiation of codecs per transceiver.

In fact, I was wondering, why can't we just reuse setCodecPreferences here instead?

transceiver.setCodecPreferences([acmeCodec]);

I.e. have this be how JS both adds new codecs and specifies their position among existing ones?

We already accepts dictionaries there, but have prose to throw on unknown codecs. But if we're about to allow JS to make up codecs anyway, then that limitation seems superfluous suddenly.

We perhaps adopt a different limitation then, which seems to be that setCodecPreferences sets the same codec preference for both send and receive, do I have that right? But maybe that's something we should solve anyway, or perhaps it's not a problem in practice since JS can just create one-way transceivers?

This might also sidestep input validation question, since we could throw on partially unique codecs (since we're expecting either known codecs or new codecs that can't be mistaken for known ones).

setCodecPreferences() has the limitation that it doesn't distinguish between capacity-to-send and capacity-to-receive.
It might actually make sense to add a direction field to RTCRtpCodecCapability - possible values "send", "receive" and "sendrecv". This could be useful for the receive-only codecs that are currently a design problem (HEVC is the biggest name in town on that front currently).

There are two derived dictionaries from RTCRtpCodec: RTCRtpCodecCapability (adds nothing) and RTCRtpCodecParameters (adds payloadType). Should the direction field be in both?

One argument against using SetCodecPreferences is that it will turn unintended changes to the capabilities (on which we currently throw) into definitions for new codecs. Is that a Bad Thing?

In reply to "All negotiations will offer..." - only if unmodified by SetCodecPreferences.
SetCodecPreferences should allow for selecting the additional codecs to offer / answer with as well as the "baseline" codecs as desired by the user.

jan-ivar

Overall I have some questions of scope here, and I'd like to understand the use cases beyond adding SDP negotiation to JS-e2ee.

I also have some concerns about the number of nuts and bolts and application would need to keep straight here, e.g. having to modify frame data.

Also in the spec, we have the RTCRtpScriptTransform interface. I wonder if it would simplify things if we put some options on that interface instead? (but perhaps not since an app may switch transform on the fly AFAICT. @youennf is that right?

jan-ivar · 2023-08-23T23:15:38Z

index.bs

+<pre class="idl">
+// New methods for RTCPeerConnection
+partial interface RTCPeerConnection {
+   undefined AddSendCodecCapability(DOMString kind,


Do I understand correctly that once e.g this is called

const acmeCodec = { mimeType: "video/acme-encrypted", clockRate: 90000, sdpFmtpLine: "encapsulated-codec=vp8", packetizationMode: "video/vp8", }; pc.addSenderCodecCapability(acmeCodec);

...then every negotiation from here on will offer sending of this new codec on ALL transceivers until the pc is deleted? And in first place?

This seems more limiting than e.g. setCodecPreferences which allows negotiation of codecs per transceiver.

In fact, I was wondering, why can't we just reuse setCodecPreferences here instead?

transceiver.setCodecPreferences([acmeCodec]);

I.e. have this be how JS both adds new codecs and specifies their position among existing ones?

We already accepts dictionaries there, but have prose to throw on unknown codecs. But if we're about to allow JS to make up codecs anyway, then that limitation seems superfluous suddenly.

We perhaps adopt a different limitation then, which seems to be that setCodecPreferences sets the same codec preference for both send and receive, do I have that right? But maybe that's something we should solve anyway, or perhaps it's not a problem in practice since JS can just create one-way transceivers?

This might also sidestep input validation question, since we could throw on partially unique codecs (since we're expecting either known codecs or new codecs that can't be mistaken for known ones).

jan-ivar · 2023-08-23T23:55:10Z

index.bs

 partial interface RTCRtpSender {
    attribute RTCRtpTransform? transform;
+    undefined setEncodingCodec(RTCRtpCodecParameters parameters);


This seems redundant now that we have set codec in setParameters. Mentioned in the slides.

Yes, this will be removed, and replaced with a pointer to "set codec in setParameters" where appropriate.

I'm leaving this in with a note for now, when we're separating the encoding and packetization steps, I want to make sure we're controlling both appropriately.

index.bs

jan-ivar · 2023-08-24T00:06:05Z

index.bs

+1. The {{RTCPeerConnection/[[CustomSendCodecs]]}} is included in the list of codecs which the user agent
+    is currently capable of sending; the {{RTCPeerConnection/[[CustomReceiveCodecs]]}} is included in the list of codecs which the user agent is currently prepared to receive, as referenced in "set a session description" steps.


Again I wonder if these custom codecs could instead just live in transceiver.[[PreferredCodecs]]

Step 8 in setCodecPreferences() filters against codecCapabilities (the static).
If we modify that, it would be possible - but would we entirely stop filtering, then?

sdp-explainer.md

alvestrand

Apologies for the spam - I only thought to group these into a review late in the day.

This addresses comments on index.bs - I'll address comments on the explainer in a separate review.

index.bs

alvestrand · 2023-08-29T12:11:22Z

index.bs

+1. The {{RTCPeerConnection/[[CustomSendCodecs]]}} is included in the list of codecs which the user agent
+    is currently capable of sending; the {{RTCPeerConnection/[[CustomReceiveCodecs]]}} is included in the list of codecs which the user agent is currently prepared to receive, as referenced in "set a session description" steps.


Step 8 in setCodecPreferences() filters against codecCapabilities (the static).
If we modify that, it would be possible - but would we entirely stop filtering, then?

alvestrand

Explainer updated. PTAL!

sdp-explainer.md

alvestrand · 2023-10-06T13:15:41Z

Modified according to discussions at TPAC according to solution presented there to #199

PTAL.

dontcallmedom

Fix bikeshed markup

index.bs

dontcallmedom · 2023-10-09T09:16:10Z

(the remaining CI error is pending merge of #208)

Philipel-WebRTC · 2023-10-09T09:28:23Z

I have read this proposal, and I can't quite understand how it is suppose to work. AFAICT the goal is to allow negotiation of custom codecs, but we still only allow standardized packetizers to be (ab)used.

This part of the spec:

It is up to the transform to ensure that all data and metadata conforms
to the format required by the packetizer; the sender may drop frames that
do not conform.

seems extremely complex. The user needs to know at the lowest possible level of how the packetizers work, know which bits specifically they are allowed to touch or not touch for each codec, and AFAICT it would also be borderline impossible to debug if a packetizer decided to drop the frame for you.

Would something like this work instead?

customCodec = {
   // Type is an enum of {"VP8", "VP9", "H264", "custom"}
   type: “custom”,
   // postfixName only has an effect when the type is "custom". Setting the postfixName
   // to "VP8" would make the codec show up as "custom-VP8" in the SDP.
   postfixName: "VP8",
   clockRate: 90000
};

When custom is selected as type the packetization format is implied to be raw.

Now when a sender and receiver needs to negotiate a codec they can do string matching on the codec names, and assuming they are aware of how the bits are being transformed, then they know everything about the codec and the format.

alvestrand · 2023-10-10T12:17:21Z

to @Philipel-WebRTC - I don't understand your proposal.
In particular:

h.264 has two packetizers, distinguished by "packetization-mode=0/1" in the fmtp string. This is part of my argument for using a full codec description to identify the packetizer.
codec names have to be unique, and if standardized names are used, they need to be used for standardized codecs
there's no particular reason why the SDP exchange should have to reveal to a signaling eavesdropper what the inner content of a nonstandard encrypting codec is

There is no standard at the moment for a raw packetization format. So we can't refer to that format from a standard.
A goal of standardization is interoperability - that a depacketizer in one browser should be able to depacketize content that is packetized by another browser. So referring to an unspecified format is not a win here.

I thought about using a DOMString for "setPacketizer" rather than a codec description, and allowing implementations to put non-specified values in there. But I didn't find any particular advantage in doing so.

Philipel-WebRTC · 2023-10-10T14:12:40Z

It could be that I fundamentally misunderstand the goal of this proposal, but AFAICT the goal is to enable custom codecs to be negotiated via SDP, or am I wrong?

As soon as we negotiate a non-standardized codec name then we don't have any standardized packetizer to rely on, and I get that the proposed API would allow the sender and receiver to map the payload type of a custom codec to some other know (de)packetizer, but to me that seems very limiting and complex.

The way it is proposed right now, would we expect anyone to every use a packetizer other than VP9 (the only one that separates descriptor and bitstream data cleanly in the payload) for a custom codec? Again, this to me looks a bit more like the abuse side of things than a general solution (assuming I understand the goal of this PR).

alvestrand · 2023-10-10T14:58:08Z

At least one external WebRTC user has made this almost work by using the H.264 packetizer - he was advised that he would have an easier time if he appended the custom data as SEI (don't know the proper term for those). So no, VP9 is not the only option, but yes, VP9 is probably the best standard one at the moment.

It's always possible for implementations to define their own packetizers, but that's not useful for interoperability.

alvestrand · 2023-10-11T10:18:08Z

Actually, registering new codecs in the vnd. tree with IANA is a possibility that would allow a well defined way of using nonstandard packetizers.

alvestrand · 2023-10-11T11:18:51Z

@aboba you have a "request changes" marker. Can you re-review?

youennf · 2023-10-13T06:58:14Z

Following on yesterday's editor meeting, here are some thoughts related to this proposal.
I would tend to reduce the scope of the API to what is needed now, without precluding any future extension.

AIUI, we seem to get consensus on adding support to:

Enable media agnostic RTP payload formats like SFrame.
Allow the application to select one of these RTP payload formats as part of a RTCRtpSender transform.

For 1, we assume the UA implements the RTP payload format.
setCodecPreferences can be used if the UA does not negotiate it by default.

For 2, we could use payload types.
Another approach, potentially more convenient for web developers, is to reuse the name of the payload format directly.
This is similar to manipulating mimeType instead of payloadType.

partial interface RTCEncodedAudioFrame {
    attribute DOMString rtpPaylodFormat;
}
partial interface RTCEncodedVideoFrame {
    attribute DOMString rtpPayloadFormat;
}

On sender side, the application would set it to sframe, to make use of https://www.ietf.org/archive/id/draft-thatcher-avtcore-rtp-sframe-00.html. On receiver side, the application detects the use of sframe by checking rtpPayloadFormat.

We could also use that API internally to further define the interaction of SFrameTransform with the RTP layers.

alvestrand · 2023-10-13T07:59:40Z

Solving the SDP negotiation problem for built-in Sframe is not sufficient. We have to solve it for script transforms.

The proposal in #186 (comment) will not allow us to affect SDP negotiation for script transforms; SDP negotiation is only affected by things that happen before offer/answer time - there are no frames before offer/answer.

jan-ivar · 2023-10-13T19:58:18Z

Enable media agnostic RTP payload formats like SFrame.

Do you mean any other types here, or just SFrame? Do you have an example?

For 1, we assume the UA implements the RTP payload format.
setCodecPreferences can be used if the UA does not negotiate it by default.

What would that look like? Would it multiply the payload types in RTCRtpSender.getCapabilities("video")?

Allow the application to select one of these RTP payload formats as part of a RTCRtpSender transform.

Do you mean a RTCRtpScriptTransform here? What would that look like?

partial interface RTCEncodedVideoFrame {
   attribute DOMString rtpPayloadFormat;

What's the advantage or use case for choosing this per frame?

Apps can already specify SFrame ahead of negotiation, so doesn't the UA have all it needs to negotiate sframe?

const {sender} = pc.addTransceiver(track);
sender.transform = new SFrameTransform({role: "encrypt"});
// negotiate
const {codecs} = sender.getParameters();
if (!codecs.find(findSFrame)) sender.transform = null;

alvestrand · 2023-10-15T22:56:30Z

Enable media agnostic RTP payload formats like SFrame.

Do you mean any other types here, or just SFrame? Do you have an example?

You heard the SCIP (NATO) format being mentioned on the call. That's one.
A payload format to replace Google's nonstandard a=packetization-mode:raw would be another.

For 1, we assume the UA implements the RTP payload format.
setCodecPreferences can be used if the UA does not negotiate it by default.

What would that look like? Would it multiply the payload types in RTCRtpSender.getCapabilities("video")?

It would look like the API described in https://developer.mozilla.org/en-US/docs/Web/API/RTCRtpTransceiver/setCodecPreferences

Allow the application to select one of these RTP payload formats as part of a RTCRtpSender transform.

Do you mean a RTCRtpScriptTransform here? What would that look like?

From the explainer:

    metadata = frame.metadata();
    metadata.pt = options.payloadType;
    frame.setMetadata(metadata);

partial interface RTCEncodedVideoFrame {
   attribute DOMString rtpPayloadFormat;
What's the advantage or use case for choosing this per frame?

I believe Youenn described this case in details on the call; I added this FAQ question specifically to address his use case.

From the explainer:

Q: My application wants to send frames with multiple packetizers. How do I accomplish that?

A: Use multiple payload types. Each will be assigned a payload type. Mark each frame with the payload type they need to be packetized as.

Apps can already specify SFrame ahead of negotiation, so doesn't the UA have all it needs to negotiate sframe?

I particularly mentioned on the call (and have said in every presentation that I've given on it) that I want an interface that is able to support the existing, deployed, Javascript-based sframe format, which is not the same as the sframe transform that is supposed to use the (still not final) IETF SFrame format draft, or any other transform.

youennf · 2023-10-16T09:42:28Z

What's the advantage or use case for choosing this per frame?

Fair question.

I believe Youenn described this case in details on the call; I added this FAQ question specifically to address his use case.

From the explainer:

Q: My application wants to send frames with multiple packetizers. How do I accomplish that?

The use case I know is enable/disable SFrame dynamically. Otherwise, we just want to stick to whatever packetization is associated to the encoder media content and we can already change the media content via setParameters.

This change can be done either at the frame level or at the transform level (similarly to setParameters really).
If at the transform level, you need to call sender.transform = newTransform to change whether SFrame packetization would be used or not. This is slightly less flexible but might be more convenient/less error prone to web developers and could allow some optimisations on the UA side.
You loose some flexibility, except if you plan to switch packetization at a very specific frame. I am unsure whether we have a strong use case here.

It is interesting to look at receiver side though, in case a UA implements the SFrame packetization format and processing happens in a script transform. The web application might want to know:

packets were processed by the SFrame depacketizer
Which underlying media decoder will be used (info provided by the SFrame depacketizer from the RTP payload content).
It might be convenient to expose an API for both things and it makes sense to expose this at the frame level, since this might change for every frame potentially.

I would tend to expose the same API receiver and sender side, hence why I would tend to go with frame level.

Solving the SDP negotiation problem for built-in Sframe is not sufficient. We have to solve it for script transforms.

That is probably where we have a disconnect.
Script transforms have several potential use cases:

Implement SFrame or a variant of SFrame. Once we have the packetization format in the UA, I do not see a need for a new generic mechanism to negotiate it. I am ok with SFrame packetization format spec to expose an extension point in the SDP, and we would expose this in our API (say setCodecPreferences for instance).
Add metadata to media content. In this case, I believe we want encoded content to stick to the same format (e.g. H264 would stay H264 and metadata would be put in SEI). We do not need to change the packetization format and I do not see the need to negotiate anything in the SDP. RTP header extensions might be a more meaningful extension point if need be.
Plug a new codec. It seems best to solve this with a separate API, I do not think we want to enshrine this kind of support in encoded transforms. For instance, a video encoder takes a VideoFrame as input and EncodedVideoChunk as output. A script transform takes an EncodedVideoChunk both for input and output. Also, these two should be usable together: it makes sense to use a SFrame transform on a plugged-in codec simply with sender.transform = sframeTransform.

This plug a new codec API would probably need to let the web application handle both the encoding/decoding (UA does not know the content format) and the associated packetization (UA does not know the format so does not know the associated packetization format), in addition to SDP negotiation. Relying on the SFrame packetization for new codecs is probably not something we want, as per IETF feedback.

For this plug a new codec work, I would tend to start with a plug your own encoder API that could be used without packetization/SDP handling, for instance to let web apps fine tune a WebCodecs encoder setup (set QP per frame/macro-block for instance). We could then address new formats on top of this API to handle SDP negotiation/RTP packetization.
It could also be used for the metadata to media content use case since web applications may often carry VideoFrame and associated metadata together.

alvestrand · 2023-10-16T10:40:06Z

Solving the SDP negotiation problem for built-in Sframe is not sufficient. We have to solve it for script transforms.

That is probably where we have a disconnect. Script transforms have several potential use cases:

Implement SFrame or a variant of SFrame. Once we have the packetization format in the UA, I do not see a need for a new generic mechanism to negotiate it. I am ok with SFrame packetization format spec to expose an extension point in the SDP, and we would expose this in our API (say setCodecPreferences for instance).

Specific interfaces that can only be used for the SFrame transform should go into the SFrame transform specification, not here. setCodecPreferences chooses the preference by the receiver about what the sender should send, it does not enforce either what the sender sends or what the receiver will have to handle.

setDepacketizer(PT, depacketizer) is an API that addresses the use case on the receiver side.

Add metadata to media content. In this case, I believe we want encoded content to stick to the same format (e.g. H264 would stay H264 and metadata would be put in SEI). We do not need to change the packetization format and I do not see the need to negotiate anything in the SDP. RTP header extensions might be a more meaningful extension point if need be.

We don't agree here. There is no generic metadata mechanism, and a number of codecs lack functionality for adding metadata.

Plug a new codec. It seems best to solve this with a separate API, I do not think we want to enshrine this kind of support in encoded transforms.

I don't agree here. I think we want to allow for experimentation of this kind, we should actively ensure that we make it possible to try out new codecs without changing the browser.

This plug a new codec API would probably need to let the web application handle both the encoding/decoding (UA does not know the content format) and the associated packetization (UA does not know the format so does not know the associated packetization format), in addition to SDP negotiation. Relying on the SFrame packetization for new codecs is probably not something we want, as per IETF feedback.

For this plug a new codec work, I would tend to start with a plug your own encoder API that could be used without packetization/SDP handling, for instance to let web apps fine tune a WebCodecs encoder setup (set QP per frame/macro-block for instance).

Doing this without SDP handling would make the current situation, where people send content that does not conform to the SDP describing what they claim to be sending, strictly worse. I would strongly oppose adding such an API to the PeerConnection family of APIs without solving the SDP issue first.

We could then address new formats on top of this API to handle SDP negotiation/RTP packetization. It could also be used for the metadata to media content use case since web applications may often carry VideoFrame and associated metadata together.

If we have to handle SDP negotiation and RTP packetization for this use case, and we know that we have other use cases we need to handle it for, why not accept the current proposal into the WD?

jan-ivar · 2023-10-16T18:46:06Z

Plug a new codec. It seems best to solve this with a separate API, I do not think we want to enshrine this kind of support in encoded transforms. For instance, a video encoder takes a VideoFrame as input and EncodedVideoChunk as output. A script transform takes an EncodedVideoChunk both for input and output.

I agree this suggests an API mismatch. I see evidence of this in the Lyra use case as well, which had to:

Use a special "L16" null-codec in Chrome to bypass encoding and pass raw audio directly to the transform
SDP-munge ptime back to 20, to counter Chrome's drop to 10 ms from the large amount of (uncompressed) input
Reverse the 16-bit endian of the input, which is already in network order, back to platform order for lyra-js

It's an impressive feat, but I imagine these problems would be amplified with video. It might need its own API.

Also, these two should be usable together: it makes sense to use a SFrame transform on a plugged-in codec simply with sender.transform = sframeTransform.

This makes sense to me as well. E.g. spitballing:

sender.encoder = new RTCRtpScriptEncoder(worker, options);
sender.transform = new SFrameTransform({role: "encrypt"});

?

alvestrand · 2023-10-16T19:23:59Z

My preferential API for chained transforms is readable.pipeThrough(transform1).pipeThrough(transform2).pipeTo(writable).

It is possible to write chain links that permit this syntax when using RTCRtpScriptTransformer.

alvestrand · 2024-01-07T23:15:02Z

My point was really that if we want to add more attributes to addTransceiver(), we should not add more arguments.
Actually addTransceiver() already takes an RTCRtpTransceiverInit with 3 existing members - adding a transform here would be consistent with the existing interface.

jan-ivar · 2024-01-08T21:46:39Z

Actually addTransceiver() already takes an RTCRtpTransceiverInit with 3 existing members - adding a transform here would be consistent with the existing interface.

I've filed #221. My point here was merely to highlight that a transceiver is primarily the sender and receiver.

jan-ivar · 2024-01-08T21:53:40Z

... That way, in some wonderful future where one can construct RtpSender and RtpReceiver objects without SDP, you don't need RtpTransceivers to exist.

I agree, so we should stop adding new methods to RtpTransceiver if we can avoid it.

I favor SDP-agnostic options on new RTCRtpScriptTransform over the more SDP-specific transceiver.addCodec.

Is it SDP-specific for JS to declare the input/output codec of the transform it performs? — Seems useful to the worker regardless of SDP. E.g. based on the transform-based part of what @alvestrand and I have been working on:

    transceiver.sender.transform = new RTCRtpScriptTransform(worker, {
      inputCodecs: [{mimeType: "video/vp8"}],
      outputCodecs: [{mimeType: "video/custom-encrypted"}]
    });
    transceiver.receiver.transform = new RTCRtpScriptTransform(worker, {
      inputCodecs: [{mimeType: "video/custom-encrypted"}],
      outputCodecs: [{mimeType: "video/vp8"}]
    });

This nicely communicates the task to the (reused) worker (removing the need to invent a side property in my fiddle).

jan-ivar

@alvestrand Here's the transform example you asked for. Hope it looks OK and thanks for your patience!

I was able to reuse the same worker function. I didn't touch on acceptOnlyInputCodecs since the JS does that (but it might be a nice optimization).

sdp-explainer.md

Separate the two proposals better Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>

alvestrand · 2024-01-12T10:07:40Z

After writing up some considerations for how codecs are described in w3c/webrtc-pc#2925 I am starting to feel that the transform proposal fits better - but it would be described as modifying the codec list when a transform is added.

Comments?

Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>

Co-authored-by: Philipp Hancke <fippo@goodadvice.pages.de>

dontcallmedom-bot · 2024-02-21T07:47:28Z

This issue was mentioned in WEBRTCWG-2024-02-20 (Page 14)

alvestrand · 2024-02-27T13:36:41Z

@jan-ivar this is incomplete, but does it so far conform with what you think we've agreed on?

alvestrand · 2024-04-18T08:12:05Z

I think this is ready for an editors review now - I believe the spec changes are in a state where they are good to merge, and reflect the API used in the explainer.

Please take a look.

dontcallmedom-bot · 2024-04-23T20:11:19Z

This issue had an associated resolution in WebRTC April 23 2024 meeting – 23 April 2024 (Custom Codecs):

RESOLUTION: Consensus on on #186, discussion to continue on #202

alvestrand · 2024-04-25T14:24:09Z

index.bs

+  sequence&lt;RTCRtpCodec&gt; inputCodecs;
+  sequence&lt;RTCRtpCodec&gt; outputCodecs;
+  boolean acceptOnlyInputCodecs = false;
+};


there are specs that take an ANY and transform them to a dictionary. @jan-ivar will find an example.

From streams:

"Let underlyingSourceDict be underlyingSource, converted to an IDL value of type UnderlyingSource."

In that spec, underlyingSource is an object, vs ours which is any, so we'd probably want to check against object before doing this to avoid a TypeError here. (more below)

alvestrand · 2024-04-25T14:30:01Z

Aim to set editors can integrate on next Thursday (May 2, 2024)

jan-ivar

Sorry for the delay in reviewing this. Some questions still around how this all fits together with custom codecs, sframe etc.

jan-ivar · 2024-05-20T14:31:54Z

index.bs

+  sequence&lt;RTCRtpCodec&gt; inputCodecs;
+  sequence&lt;RTCRtpCodec&gt; outputCodecs;
+  boolean acceptOnlyInputCodecs = false;
+};


From streams:

"Let underlyingSourceDict be underlyingSource, converted to an IDL value of type UnderlyingSource."

In that spec, underlyingSource is an object, vs ours which is any, so we'd probably want to check against object before doing this to avoid a TypeError here. (more below)

jan-ivar · 2024-05-20T16:59:57Z

index.bs

@@ -130,6 +130,7 @@ The <dfn abstract-op>readEncodedData</dfn> algorithm is given a |rtcObject| as p
 1. Let |frame| be the newly produced frame.
 1. Set |frame|.`[[owner]]` to |rtcObject|.
 1. Set |frame|.`[[counter]]` to |rtcObject|.`[[lastEnqueuedFrameCounter]]`.
+1. If |rtcObject|'s |transform|'s |options| contains {{RTCRtpScriptTransformParameters/acceptOnlyInputCodecs}} with the value True, and the {{RTCEncodedVideoFrameMetadata/mimeType}} of |frame| does not match any codec in |transform|'s |options|' |inputCodecs|, execute the [$writeEncodedData$] algorithm given |rtcObject| and |frame|.


First a nit: Maybe set an [[acceptOnlyInputCodecs]] internal slot to avoid referencing the options object which gets transferred? This is meant to be an invariant, and would avoid implying cross-thread access.

Also, while reviewing this, I realized the spec is confusing atm wrt what threads the various objects are on (rtcObject is always on main-thread), so I've filed #229 which should help. I hope to do a PR there soon. — To not block on that, I think we can still merge this with the understanding that #229 might move things in the right place later, if that's OK with everyone.

But on to the substance of this line:

The writeEncodedData is the transformer.writable's writeAlgorithm, i.e. it's underlying sink. The streams spec normally manages when this is called. This says to invoke that algorithm directly, which I worry might interfere with the streams spec's view of what is happening since it owns when to call its underlying sink. E.g. this might bypass any frames queued on the writable's internal queue that the streams spec maintains.

I'd be more comfortable if we could find a more WebRTC-specific shortcut that doesn't interact with the streams spec here. If I understand correctly, the goal was to bypass the streams machinery here anyway. Otherwise this optimization doesn't seem worthwhile.

Shouldn't we also abort the remaining steps in that case? I.e. don't enqueue the frame on the readable?

jan-ivar · 2024-05-20T19:54:16Z

index.bs

@@ -146,7 +147,7 @@ The <dfn abstract-op>writeEncodedData</dfn> algorithm is given a |rtcObject| as

 On sender side, as part of [$readEncodedData$], frames produced by |rtcObject|'s encoder MUST be enqueued in |rtcObject|.`[[readable]]` in the encoder's output order.
 As [$writeEncodedData$] ensures that the transform cannot reorder frames, the encoder's output order is also the order followed by packetizers to generate RTP packets and assign RTP packet sequence numbers.
-The packetizer may expect the transformed data to still conform to the original format, e.g. a series of NAL units separated by Annex B start codes.
+The packetizer will expect the transformed data to conform to the format indicated by its {{RTCEncodedVideoFrameMetadata/mimeType}}.


I think we need to define requirements on the packetizer here, e.g. what MUST happen if it doesn't or can't conform.

jan-ivar · 2024-05-20T21:03:57Z

index.bs

+    <dt>
+        <dfn dict-member>frameId</dfn> <span class="idlMemberType">unsigned long long</span>
+    </dt>
+    <dd>
+        <p>
+            An identifier for the encoded frame, monotonically increasing in decode order. Its lower
+            16 bits match the frame_number of the AV1 Dependency Descriptor Header Extension defined in Appendix A of [[?AV1-RTP-SPEC]], if present.
+            Only present for received frames if the Dependency Descriptor Header Extension is present.
+        </p>
+    </dd>
+    <dt>
+        <dfn dict-member>dependencies</dfn> <span class="idlMemberType">sequence&lt;unsigned long long&gt;</span>
+    </dt>
+    <dd>
+        <p>
+            List of frameIds of frames this frame references.
+            Only present for received frames if the AV1 Dependency Descriptor Header Extension defined in Appendix A of [[?AV1-RTP-SPEC]] is present.
+        </p>
+    </dd>


Maybe a separate PR for this or explain its relationship? I don't get it.

jan-ivar · 2024-05-22T21:50:49Z

index.bs

+dictionary RTCRtpScriptTransformParameters {
+  sequence&lt;RTCRtpCodec&gt; inputCodecs;
+  sequence&lt;RTCRtpCodec&gt; outputCodecs;
+  boolean acceptOnlyInputCodecs = false;


Bikeshed: With "accept" I worry it's not super clear to what happens to frames that are not "accepted". Especially for E2EE, we'd want to make it super clear that frames NOT accepted are sent through untransformed.

In the explainer you refer to it as a "filter". How about filterOnInputCodecs or transformInputCodecsOnly?

Also, I forget: why can't we default this to true or even remove the option to set it to false?

jan-ivar · 2024-05-22T22:01:11Z

index.bs

+The member "packetizationMode" is added to the {{RTCRtpCodec}} definition; its value is a MIME type. When this is present, the rules for packetizing and depacketizing are those of the named MIME type. For codecs not supported by the platform, this MUST be present.
+
+<pre class='idl'>
+partial dictionary RTCRtpCodec {
+  DOMString packetizationMode;
+};


Wasn't the idea with having both inputCodecs and outputCodecs that we could infer the packetizationMode from the inputCodecs? e.g. from #186 (comment).

RTCRtpCodec is used all over the place, including setCodecPreferences, set/getParameters, and the codec match algorithm, and it's not super-clear to me when the user agent is supposed to populate this member or not, or even why it needs to be exposed when the application needs to know this a priori.

The only example uses packetizationMode: video/sframe which is a bit confusing to me. Sorry if I'm a bit rusty on this.

It's also not used in any algorithm, so it's not clear what its function is.

jan-ivar · 2024-05-22T22:15:17Z

index.bs

 ## Operations ## {#RTCRtpScriptTransform-operations}

 The <dfn constructor for="RTCRtpScriptTransform" lt="RTCRtpScriptTransform(worker, options)"><code>new RTCRtpScriptTransform(|worker|, |options|, |transfer|)</code></dfn> constructor steps are:
 1. Set |t1| to an [=identity transform stream=].
 2. Set |t2| to an [=identity transform stream=].
 3. Set |this|.`[[writable]]` to |t1|.`[[writable]]`.
 4. Set |this|.`[[readable]]` to |t2|.`[[readable]]`.
+4. Set |this|.`[[options]]?` to the result of converting |options| to an RTCRtpScriptTransformParameters dictionary.


Based on comment above, something like:

Suggested change

4. Set |this|.`[[options]]?` to the result of converting |options| to an RTCRtpScriptTransformParameters dictionary.

4. Set |this|.`[[options]]` to |options|.

5. Let |optionsObject| be |options| if |options| is of type [=object=], otherwise `{}` .

6. Set |this|.`[[optionsDict]]` to the result of converting |optionsObject| to an RTCRtpScriptTransformParameters dictionary.

alvestrand marked this pull request as draft June 15, 2023 14:39

henbos reviewed Jun 15, 2023

View reviewed changes

index.bs Outdated Show resolved Hide resolved

alvestrand mentioned this pull request Jun 15, 2023

Need to mark codec-related definitions for export w3c/webrtc-pc#2881

Open

fippo reviewed Jun 20, 2023

View reviewed changes

aboba requested changes Jun 27, 2023

View reviewed changes

jtsiskin reviewed Jun 27, 2023

View reviewed changes

jan-ivar reviewed Aug 23, 2023

View reviewed changes

jan-ivar reviewed Aug 24, 2023

View reviewed changes

alvestrand commented Aug 29, 2023

View reviewed changes

sdp-explainer.md Outdated Show resolved Hide resolved

sdp-explainer.md Outdated Show resolved Hide resolved

sdp-explainer.md Outdated Show resolved Hide resolved

alvestrand mentioned this pull request Aug 31, 2023

SDP mangling: control encoding or control packetization? #199

Closed

dontcallmedom reviewed Oct 6, 2023

View reviewed changes

index.bs Outdated Show resolved Hide resolved

index.bs Outdated Show resolved Hide resolved

alvestrand marked this pull request as ready for review October 10, 2023 12:17

jan-ivar mentioned this pull request Jan 8, 2024

Should RTCRtpTransceiverInit have sendTransform and receiveTransform members? #221

Open

jan-ivar reviewed Jan 11, 2024

View reviewed changes

sdp-explainer.md Outdated Show resolved Hide resolved

sdp-explainer.md Outdated Show resolved Hide resolved

alvestrand and others added 2 commits January 12, 2024 11:06

Update sdp-explainer.md

2958844

Separate the two proposals better Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>

Update sdp-explainer.md

78fbfbc

Separate the two proposals better Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>

alvestrand and others added 6 commits January 25, 2024 11:56

Update sdp-explainer.md

a975401

Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>

Update index.bs

8aa1533

Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>

Update index.bs

bd0c908

Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>

Update index.bs

bbc7f6b

Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>

Update sdp-explainer.md

0ca754b

Co-authored-by: Philipp Hancke <fippo@goodadvice.pages.de>

Fix "pre" section end

ccf6ea8

alvestrand added the Discuss at Next Meeting label Jan 30, 2024

alvestrand self-assigned this Feb 6, 2024

alvestrand added 2 commits February 27, 2024 13:34

Update sdp explainer to remove transceiver alternative

6206a46

Describe the enabling of codecs for SDP usage

78fa520

alvestrand added 2 commits April 17, 2024 15:38

Fix acceptOnlyInputCodecs

d9b390d

Added packetizationMode description

604da65

Use Codec not CodecParameters

36fd11b

dontcallmedom-bot mentioned this pull request Apr 23, 2024

IDL changes for setMetadata for 3.2.2 webrtc nv use case #202

Closed

alvestrand commented Apr 25, 2024

View reviewed changes

alvestrand removed the Discuss at Next Meeting label Apr 25, 2024

jan-ivar reviewed May 22, 2024

View reviewed changes

		1. The {{RTCPeerConnection/[[CustomSendCodecs]]}} is included in the list of codecs which the user agent
		is currently capable of sending; the {{RTCPeerConnection/[[CustomReceiveCodecs]]}} is included in the list of codecs which the user agent is currently prepared to receive, as referenced in "set a session description" steps.

-. Set |this|.`[[options]]?` to the result of converting |options| to an RTCRtpScriptTransformParameters dictionary.
+. Set |this|.`[[options]]` to |options|.
+. Let |optionsObject| be |options| if |options| is of type [=object=], otherwise `{}` .
+. Set |this|.`[[optionsDict]]` to the result of converting |optionsObject| to an RTCRtpScriptTransformParameters dictionary.

Add description of an API for controlling SDP codec negotiation #186

Are you sure you want to change the base?

Add description of an API for controlling SDP codec negotiation #186

Conversation

alvestrand commented Jun 14, 2023 • edited by pr-preview bot Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dontcallmedom-bot commented Jun 28, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jan-ivar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alvestrand left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alvestrand left a comment

Choose a reason for hiding this comment

alvestrand commented Oct 6, 2023

dontcallmedom left a comment

Choose a reason for hiding this comment

dontcallmedom commented Oct 9, 2023

Philipel-WebRTC commented Oct 9, 2023 • edited Loading

alvestrand commented Oct 10, 2023

Philipel-WebRTC commented Oct 10, 2023 • edited Loading

alvestrand commented Oct 10, 2023

alvestrand commented Oct 11, 2023

alvestrand commented Oct 11, 2023

youennf commented Oct 13, 2023

alvestrand commented Oct 13, 2023

jan-ivar commented Oct 13, 2023

alvestrand commented Oct 15, 2023

youennf commented Oct 16, 2023

alvestrand commented Oct 16, 2023

jan-ivar commented Oct 16, 2023

alvestrand commented Oct 16, 2023

alvestrand commented Jan 7, 2024

jan-ivar commented Jan 8, 2024

jan-ivar commented Jan 8, 2024 • edited Loading

jan-ivar left a comment

Choose a reason for hiding this comment

alvestrand commented Jan 12, 2024

dontcallmedom-bot commented Feb 21, 2024

alvestrand commented Feb 27, 2024

alvestrand commented Apr 18, 2024

dontcallmedom-bot commented Apr 23, 2024

Choose a reason for hiding this comment

jan-ivar May 20, 2024 • edited Loading

Choose a reason for hiding this comment

alvestrand commented Apr 25, 2024

jan-ivar left a comment

Choose a reason for hiding this comment

jan-ivar May 20, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alvestrand commented Jun 14, 2023 •

edited by pr-preview bot

Loading

Philipel-WebRTC commented Oct 9, 2023 •

edited

Loading

Philipel-WebRTC commented Oct 10, 2023 •

edited

Loading

jan-ivar commented Jan 8, 2024 •

edited

Loading

jan-ivar May 20, 2024 •

edited

Loading

jan-ivar May 20, 2024 •

edited

Loading