[WIP] MSC3898: Native Matrix VoIP signalling for cascaded foci (SFUs, MCUs...) #3898
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -168,8 +168,8 @@ be able to distinguish them, this therefore builds on | |
[MSC3077](https://github.com/matrix-org/matrix-spec-proposals/pull/3077) and | ||
[MSC3291](https://github.com/matrix-org/matrix-spec-proposals/pull/3291) to | ||
provide the client with the necessary metadata. Some of the data-channel events | ||
include an `m.metadata` field including a description of the stream being sent | ||
either from the SFU to the client or from the client to the SFU. | ||
include an `sdp_stream_metadata` field containing a description of the stream | |
being sent either from the SFU to the client or from the client to the SFU. | ||
|
||
Other than mute information and stream purpose, the metadata includes video | ||
track resolution. The SFU may not be able to determine the resolution of the | ||
|
@@ -195,25 +195,35 @@ in the metadata. | |
|
||
#### Event types | ||
|
||
##### Subscribe | ||
This MSC adds a few new `m.call.*` events and extends a few of the existing ones. | ||
|
||
This event is sent by the client to request a set of tracks. In the case of | ||
video tracks the client can also request a specific resolution of a given | |
track; this resolution is a resolution the client wishes to receive but the SFU | ||
may send a lower one due to bandwidth etc. | ||
##### `m.call.track_subscription` | ||
|
||
This event is sent to the focus to let it know about the tracks the client would | ||
like to start/stop subscribing to. | ||
|
||
Upon receiving this event, a focus should make the subscription changes based on | |
the `start` and `stop` arrays and respond with an `m.call.negotiate` event. | ||
|
||
In the case of video tracks, in the `start` array the client may also request a | ||
specific resolution for a given track; this resolution is a resolution the | ||
client wishes to receive, but the SFU may send a lower one, for example due to bandwidth constraints. | |
|
||
If the user for example switches from "spotlight" (one large tile) to "grid" | ||
(multiple small tiles) view, it should also send this request to let the SFU | ||
know of the resolution change. | ||
(multiple small tiles) view, the client should also send this event with the updated | |
resolution in the `start` array to let the focus know of the resolution change. | ||
|
||
Clients may request each track only once: foci should ignore multiple requests | ||
of the same track. | ||
|
||
- **TODO: how do we prove to the SFU that we have the right to subscribe to | ||
- **TODO: how do we prove to the focus that we have the right to subscribe to | ||
a track?** | |
|
||
```json | ||
{ | ||
"type": "m.subscribe", | ||
"type": "m.call.track_subscription", | ||
"content": { | ||
"m.start": [ | ||
"start": [ | ||
{ | ||
"stream_id": "streamId1", | ||
"track_id": "trackId1", | ||
Maybe we should include the user ID of the user sending the track we want here? That way we're not relying on stream/track IDs being globally unique (plus it will make the signalling much easier to understand when looking at it). The stream ID feels unnecessary in either case.

Hmm, interesting point, perhaps

Though I guess that if we have these we might as well leave the

That seems like a part of a discussion that we've recently had about the stream/track IDs. So far the I think we have 2 options here:
The current implementation in the SFU uses (1), which also seems to be ok from the RFC's standpoint:
I don't have a strong opinion, but I'm always biased toward elegant and simple solutions, so my personal preference would be option (1).

Ah yes, sorry - this is very similar, but github has hidden that comment as outdated. The RFC is only suggesting UUIDs as good practice though, so I'm not sure we can rely on it. Šimon's correct too in that we'd need the device ID too if we couldn't be sure that the track ID was globally unique. Another thing we could do here is specify the SFU(s?) to get the stream from? I think this would mean we wouldn't need the connect-to_focus message?

For cascading? - Yeah, probably, but I have not yet thought through the whole cascading thing yet (but probably we could approach the cascading topic similar to what we did with the SFU conferencing: experiment with things in code and update an MSC once we gathered more information on what works).

Actually, we shouldn't be using WebRTC track-ids at all (https://blog.mozilla.org/webrtc/the-evolution-of-webrtc/). We should identify by mids to the focus and either use this directly or make up our own ID to reference media here, mapping it to the mid on the focus with a stream_metadata.

It's not very clear to me how that would work, tbh

This is a good point, @dbkr. I also read this article in the past, but got confused and ignored the conclusion since I saw that the approach of using stream IDs and track IDs did seem to work for the EC despite that article from Mozilla stating that it's a no go (other, newer articles had similar conclusions). I tried to correlate the information between that article + another article on transceivers from Mozilla + webrtcforthecurious + WebRTC API docs from Mozilla to understand what's the correct way to tackle this problem. Since my notes were rather large for a comment, I've created a discussion page for that as we agreed. Please take a look: https://github.com/vector-im/voip-internal/discussions/79 |
||
|
@@ -226,99 +236,78 @@ track?** | |
"width": 256, | ||
"height": 144 | ||
} | ||
] | ||
} | ||
} | ||
``` | ||
|
||
##### Unsubscribe | ||
|
||
If a client no longer wishes to be subscribed to a track, it should send this event. | ||
|
||
```json | ||
{ | ||
"type": "m.unsubscribe", | ||
"content": { | ||
"m.stop": [ | ||
], | ||
"stop": [ | ||
{ | ||
"stream_id": "streamId1", | ||
"track_id": "trackId1" | ||
"stream_id": "streamId3", | ||
"track_id": "trackId4" | ||
}, | ||
{ | ||
"stream_id": "streamId2", | ||
"track_id": "trackId2" | ||
"stream_id": "streamId4", | ||
"track_id": "trackId4" | ||
} | ||
] | ||
} | ||
} | ||
``` | ||
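
The subscription handling described for `m.call.track_subscription` can be sketched roughly as below. This is only an illustration under stated assumptions, not part of the MSC: `build_track_subscription` and `apply_subscription` are hypothetical helper names, subscriptions are keyed by `(stream_id, track_id)` as in the example payload (the review discussion questions whether these IDs are the right key), and treating a `start` entry for an already subscribed track as a resolution update is one possible reading of the text.

```python
def build_track_subscription(start, stop):
    """Build an m.call.track_subscription event as a plain dict."""
    return {
        "type": "m.call.track_subscription",
        "content": {"start": list(start), "stop": list(stop)},
    }


def apply_subscription(subscriptions, content):
    """Apply the `start` and `stop` arrays to a focus-side table.

    `subscriptions` maps (stream_id, track_id) to the requested
    (width, height), or to None when no resolution was requested.
    Returns True if the table changed, which is when a focus would
    presumably need to follow up with m.call.negotiate.
    """
    changed = False
    seen = set()
    for entry in content.get("start", []):
        key = (entry["stream_id"], entry["track_id"])
        if key in seen:
            # Assumption: repeated requests for the same track within
            # one event are ignored; the first entry wins.
            continue
        seen.add(key)
        resolution = None
        if "width" in entry and "height" in entry:
            resolution = (entry["width"], entry["height"])
        if key not in subscriptions or subscriptions[key] != resolution:
            subscriptions[key] = resolution
            changed = True
    for entry in content.get("stop", []):
        key = (entry["stream_id"], entry["track_id"])
        if key in subscriptions:
            del subscriptions[key]
            changed = True
    return changed
```

A client switching from "spotlight" to "grid" view would, under this reading, call `build_track_subscription` again with the same track and a smaller `width`/`height`, and the focus would see `apply_subscription` return `True` because the stored resolution changed.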
|
||
##### Offer | ||
##### `m.call.negotiate` | ||
|
||
Whenever the client/focus creates an SDP offer, it should send it over to the | ||
other side using this event. The other side should then respond with an `m.answer` | ||
event. | ||
This event works exactly like the `m.call.negotiate` event in 1:1 calls. | ||
|
||
```json | ||
{ | ||
"type": "m.offer", | ||
"type": "m.call.negotiate", | ||
"content": { | ||
"m.sdp": "..." | ||
"description": { | ||
"type": "offer", | ||
"sdp": "..." | ||
}, | ||
"sdp_stream_metadata": {...} // As specified in the Metadata section | ||
} | ||
} | ||
``` | ||
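
As a rough illustration of the `m.call.negotiate` payload shape shown above, the sketch below builds such an event and checks whether the receiving side would be expected to answer. The helper names `build_negotiate` and `expects_answer` are hypothetical, and restricting `description.type` to `offer` and `answer` is an assumption carried over from how negotiation works in 1:1 calls.

```python
def build_negotiate(sdp_type, sdp, sdp_stream_metadata):
    """Build an m.call.negotiate event as a plain dict."""
    if sdp_type not in ("offer", "answer"):
        # Assumption: only offers and answers are exchanged this way.
        raise ValueError("unsupported description type: %r" % (sdp_type,))
    return {
        "type": "m.call.negotiate",
        "content": {
            "description": {"type": sdp_type, "sdp": sdp},
            # Copied so later changes to the caller's dict do not
            # silently alter an event that was already built.
            "sdp_stream_metadata": dict(sdp_stream_metadata),
        },
    }


def expects_answer(event):
    """Return True if the event carries an offer that needs an answer."""
    return event["content"]["description"]["type"] == "offer"
```

The same event type carries both directions of the exchange, which is why the separate offer and answer events from the earlier draft are no longer needed.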
|
||
##### Answer | ||
|
||
Whenever the client/focus creates an SDP answer in response to an SDP offer, it | ||
should send it over to the other side using this event. | ||
|
||
```json | ||
{ | ||
"type": "m.answer", | ||
"content": { | ||
"m.sdp": "..." | ||
} | ||
} | ||
``` | ||
##### `m.call.sdp_stream_metadata` | ||
|
||
##### Metadata | ||
This event works very similarly to the 1:1 call `m.call.sdp_stream_metadata`. | ||
|
||
Whenever the metadata changes (e.g. mute state changes happen), the client/focus | ||
can send an `m.metadata` event which includes an `m.metadata` field. | ||
- **TODO: Spec how foci actually use this to advertise tracks** | ||
|
||
```json | ||
{ | ||
"type": "m.metadata", | ||
"type": "m.call.sdp_stream_metadata", | ||
"content": { | ||
"m.metadata": {...} // As specified in the Metadata section | ||
"sdp_stream_metadata": {...} // As specified in the Metadata section | ||
} | ||
} | ||
``` | ||
|
||
##### Keep-alive | ||
##### `m.call.keep_alive` | ||
|
||
Clients should send `alive` message to foci every so often. If the client does | ||
not send an `alive` message for 30 seconds, the focus should hang up. | ||
Clients should send an `m.call.keep_alive` event to foci every so often. If | ||
the client does not send an `m.call.keep_alive` event for 30 seconds, the | ||
focus should hang up. | ||
|
||
- **TODO: should this be configurable somehow?** | ||
|
||
```json | ||
{ | ||
"type": "m.alive", | ||
"type": "m.call.keep_alive", | ||
"content": {} | ||
} | ||
``` | ||
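
The 30 second rule for `m.call.keep_alive` could be tracked on the focus roughly as follows. This is a sketch under assumptions: `KeepAliveTracker` and its methods are hypothetical names, clients are identified by an opaque string, timestamps are passed in explicitly (in seconds) so the logic stays testable, and a client counts as expired once exactly 30 seconds have passed without an event.

```python
KEEP_ALIVE_TIMEOUT_S = 30.0  # timeout from the proposal text


class KeepAliveTracker:
    """Tracks when each client last sent m.call.keep_alive."""

    def __init__(self, timeout_s=KEEP_ALIVE_TIMEOUT_S):
        self._timeout_s = timeout_s
        self._last_seen = {}

    def record(self, client_id, now):
        """Note that `client_id` sent a keep-alive at time `now`."""
        self._last_seen[client_id] = now

    def expired(self, now):
        """Return the clients whose last keep-alive is too old."""
        return sorted(
            client_id
            for client_id, seen in self._last_seen.items()
            if now - seen >= self._timeout_s
        )

    def drop(self, client_id):
        """Forget a client, e.g. after the focus has hung up on it."""
        self._last_seen.pop(client_id, None)
```

A focus would call `record` for every incoming keep-alive, poll `expired` periodically with a monotonic clock value, hang up on each returned client, and then `drop` it. Making the timeout a constructor argument leaves room for the open question of whether it should be configurable.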
|
||
##### Connect to focus | ||
##### `m.call.connect_to_focus` | ||
|
||
If a user is using their SFU in a call, it will need to know how to connect to | ||
other SFUs present in order to participate in the full-mesh of SFU traffic (if | ||
any). The client is responsible for doing this using the `connect` event. | ||
If a user is using their focus in a call, it will need to know how to connect to | ||
other foci present in order to participate in the full-mesh of SFU traffic (if | ||
any). The client is responsible for doing this using the | ||
`m.call.connect_to_focus` event. | ||
|
||
```json | ||
{ | ||
"type": "m.connect_to_focus", | ||
"type": "m.call.connect_to_focus", | ||
"content": { | ||
// TODO: How should this look? | ||
} | ||
|
Should we always respond to the `m.call.negotiate` (we may re-use the transceiver if there is such a possibility)? Maybe we can just mention that the server may reply with the `m.call.negotiate` if it's practical/necessary.

I'd stick with the current wording until we figure out something better and more specific