[WIP] MSC3898: Native Matrix VoIP signalling for cascaded foci (SFUs, MCUs...) #3898

Draft: wants to merge 30 commits into `main`.
The diff below shows the changes from 1 commit.

Commits (30):
- `750087f` Native Matrix VoIP signalling for cascaded SFUs (SimonBrandner, Sep 25, 2022)
- `aa53398` Update MSC number (SimonBrandner, Sep 25, 2022)
- `de302cb` Link to diagrams from MSC3401 (SimonBrandner, Oct 2, 2022)
- `7474782` Use correct number for file (SimonBrandner, Oct 2, 2022)
- `5cad46d` Update sub and unsub ops (SimonBrandner, Nov 11, 2022)
- `2cbc2d6` Merge remote-tracking branch 'upstream/main' into SimonBrandner/msc/sfu (SimonBrandner, Nov 11, 2022)
- `f542fcb` Give a reason for specifying res in metadata (SimonBrandner, Nov 11, 2022)
- `6f01a94` Specify foci by `device_id` too (SimonBrandner, Nov 12, 2022)
- `575e16c` Fixup some json (SimonBrandner, Nov 12, 2022)
- `33b1880` Typo (SimonBrandner, Nov 12, 2022)
- `65faee4` Specify how to handle foci better (SimonBrandner, Nov 13, 2022)
- `9882c97` Amend TODOs (SimonBrandner, Nov 13, 2022)
- `c66bbe4` Add rationale behind usage of data channels (daniel-abramov, Nov 15, 2022)
- `1b2d740` Add TODO (SimonBrandner, Dec 2, 2022)
- `feb064b` Update event types (SimonBrandner, Dec 2, 2022)
- `d96d101` Add unstable prefixes (SimonBrandner, Dec 2, 2022)
- `d538e1e` Use `subscribe` instead of `select` (SimonBrandner, Dec 6, 2022)
- `91470a2` `op` -> `event` (SimonBrandner, Dec 6, 2022)
- `2ef7425` Fixup formatting (SimonBrandner, Dec 6, 2022)
- `5a186e4` Use `content` (SimonBrandner, Dec 6, 2022)
- `b461525` Namespace things (SimonBrandner, Dec 6, 2022)
- `e49e80d` Further namespacing (SimonBrandner, Dec 6, 2022)
- `6b3fd47` Update the events to match current Matrix (SimonBrandner, Dec 6, 2022)
- `bf52e02` Fix typo (SimonBrandner, Dec 7, 2022)
- `f81dd9d` Use `subscribe`/`unsbuscribe` (SimonBrandner, Dec 7, 2022)
- `9c32b96` Add informational section on active/preferred foci. (dbkr, Dec 8, 2022)
- `6f8c9d1` Change keepalives to ping/pong (dbkr, Dec 8, 2022)
- `ecf2425` Add empty line (SimonBrandner, Dec 8, 2022)
- `bf04b17` Fix event name (SimonBrandner, Dec 9, 2022)
- `1896fc7` Remove encryption section as it's glossing over details (SimonBrandner, Dec 12, 2022)

`proposals/3898-sfu.md`: 115 changes (52 additions, 63 deletions)

@@ -168,8 +168,8 @@ be able to distinguish them, this therefore build on
[MSC3077](https://github.com/matrix-org/matrix-spec-proposals/pull/3077) and
[MSC3291](https://github.com/matrix-org/matrix-spec-proposals/pull/3291) to
provide the client with the necessary metadata. Some of the data-channel events
-include an `m.metadata` field including a description of the stream being sent
-either from the SFU to the client or from the client to the SFU.
+include an `sdp_stream_metadata` field including a description of the stream
+being sent either from the SFU to the client or from the client to the SFU.

Other than mute information and stream purpose, the metadata includes video
track resolution. The SFU may not be able to determine the resolution of the
@@ -195,25 +195,35 @@ in the metadata.

#### Event types

-##### Subscribe
+This MSC adds a few new `m.call.*` events and extends a few of the existing ones.

-This event is sent by the client to request a set of tracks. In the case of
-video tracks the client can also request a specific resolution of a given a
-track; this resolution is a resolution the client wishes to receive but the SFU
-may send a lower one due to bandwidth etc.
+##### `m.call.track_subscription`

+This event is sent to the focus to let it know about the tracks the client would
+like to start/stop subscribing to.

+Upon receiving this event, a focus should make the subscribe changes based on
+the `start` and `stop` arrays and respond with an `m.call.negotiate` event.

Reviewer:
Should we always respond with an `m.call.negotiate` (we may re-use the transceiver if there is such a possibility)? Maybe we can just mention that the server may reply with an `m.call.negotiate` if it's practical/necessary.

Contributor Author:
I'd stick with the current wording until we figure out something better and more specific.

+In the case of video tracks, in the `start` array the client may also request a
+specific resolution for a given track; this resolution is a resolution the
+client wishes to receive but the SFU may send a lower one due to bandwidth etc.

If the user for example switches from "spotlight" (one large tile) to "grid"
-(multiple small tiles) view, it should also send this request to let the SFU
-know of the resolution change.
+(multiple small tiles) view, it should also send this event with the updated
+resolution in the `start` array to let the focus know of the resolution change.

Clients may request each track only once: foci should ignore multiple requests
of the same track.
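As a rough illustration of the client side of these rules, the TypeScript sketch below builds an `m.call.track_subscription` event after a layout change such as "spotlight" to "grid". It assumes tracks are identified by the (`stream_id`, `track_id`) pair used in the JSON example (an open question in the review discussion); the `Tile` type and the `trackKey` and `buildLayoutUpdate` helpers are hypothetical names, not part of the proposal. Requesting a track shown in several tiles once, at the largest tile size, is an assumed tie-break that keeps the client within the "request each track only once" rule.

```typescript
// Hypothetical types mirroring the `start`/`stop` entries in the JSON example.
interface TrackRef {
  stream_id: string;
  track_id: string;
}

interface StartEntry extends TrackRef {
  width?: number; // only meaningful for video tracks
  height?: number;
}

interface TrackSubscriptionEvent {
  type: "m.call.track_subscription";
  content: { start: StartEntry[]; stop: TrackRef[] };
}

// A tile on screen showing one remote video track at a given size (hypothetical).
interface Tile {
  ref: TrackRef;
  width: number;
  height: number;
}

// Stable key for a (stream_id, track_id) pair; JSON avoids separator collisions.
function trackKey(ref: TrackRef): string {
  return JSON.stringify([ref.stream_id, ref.track_id]);
}

// Build the event sent after a layout change: every visible track goes into
// `start` exactly once (with the size the client would like to receive), and
// previously subscribed tracks that are no longer visible go into `stop`.
function buildLayoutUpdate(tiles: Tile[], previouslySubscribed: TrackRef[]): TrackSubscriptionEvent {
  const byKey = new Map<string, StartEntry>();
  for (const { ref, width, height } of tiles) {
    const key = trackKey(ref);
    const existing = byKey.get(key);
    const existingArea = existing ? (existing.width ?? 0) * (existing.height ?? 0) : -1;
    // Same track in several tiles: request it once, at the largest tile size.
    if (width * height > existingArea) {
      byKey.set(key, { stream_id: ref.stream_id, track_id: ref.track_id, width, height });
    }
  }
  const stop = previouslySubscribed.filter((ref) => !byKey.has(trackKey(ref)));
  return {
    type: "m.call.track_subscription",
    content: { start: Array.from(byKey.values()), stop },
  };
}
```

The requested sizes remain a preference: as the text says, the SFU may still send a lower resolution because of bandwidth.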

-- **TODO: how do we prove to the SFU that we have the right to subscribe to
+- **TODO: how do we prove to the focus that we have the right to subscribe to
track?**

```json
{
-"type": "m.subscribe",
+"type": "m.call.track_subscription",
"content": {
-"m.start": [
+"start": [
{
"stream_id": "streamId1",
"track_id": "trackId1",
Member:
Maybe we should include the user ID of the user sending the track we want here? That way we're not relying on stream/track IDs being globally unique (plus it will make the signalling much easier to understand when looking at it). The stream ID feels unnecessary in either case.

Contributor Author:
Hmm, interesting point, perhaps device_id as well? So it would be (track_id, device_id, track_id)? @daniel-abramov, what do you think?

Contributor Author:
Though I guess that if we have these we might as well leave the stream_id there for flexibility...

Reviewer:
That seems like part of a discussion that we've recently had about the stream/track IDs.

So far, track IDs have been unique regardless of the browser we used for the tests (we even changed the code of the waterfall to rely only on the track ID when subscribing to tracks; it seems to work just fine, and the handling is simpler and more elegant).

I think we have 2 options here:

  1. Either use the track ID only (seems to be totally fine since track IDs are GUIDs).
  2. Or use a tuple of track_id, device_id and stream_id that @SimonBrandner suggested in the comment above.

The current implementation in the SFU uses (1), which also seems to be OK from the RFC's standpoint:

> [..] A good practice is to use a UUID [rfc4122], which is 36 characters long in its canonical form. To avoid fingerprinting, implementations SHOULD use the forms in section 4.4 or 4.5 of RFC 4122 when generating UUIDs. [..]

I don't have a strong opinion, but I'm always biased toward elegant and simple solutions, so my personal preference would be option (1).

Member:
Ah yes, sorry, this is very similar, but GitHub has hidden that comment as outdated. The RFC only suggests UUIDs as good practice, though, so I'm not sure we can rely on it. Šimon's also correct that we'd need the device ID too if we couldn't be sure that the track ID was globally unique.

Another thing we could do here is specify the SFU(s?) to get the stream from? I think this would mean we wouldn't need the `connect_to_focus` message?

Reviewer:
> Another thing we could do here is specify the SFU(s?) to get the stream from? I think this would mean we wouldn't need the `connect_to_focus` message?

For cascading? Yeah, probably, but I have not thought through the whole cascading thing yet (we could probably approach the cascading topic the same way we did with SFU conferencing: experiment with things in code and update the MSC once we have gathered more information on what works).

Member:
Actually, we shouldn't be using WebRTC track-ids at all (https://blog.mozilla.org/webrtc/the-evolution-of-webrtc/). We should identify by mids to the focus and either use this directly or make up our own ID to reference media here, mapping it to the mid on the focus with a stream_metadata.

Contributor Author:
> Another thing we could do here is specify the SFU(s?) to get the stream from? I think this would mean we wouldn't need the `connect_to_focus` message?

It's not very clear to me how that would work, tbh.

Reviewer:
> Actually, we shouldn't be using WebRTC track-ids at all (https://blog.mozilla.org/webrtc/the-evolution-of-webrtc/). We should identify by mids to the focus and either use this directly or make up our own ID to reference media here, mapping it to the mid on the focus with a stream_metadata.

This is a good point, @dbkr. I also read this article in the past, but got confused and ignored its conclusion, since the approach of using stream IDs and track IDs did seem to work for the EC despite the Mozilla article stating that it's a no-go (other, newer articles reached similar conclusions).

I tried to correlate the information from that article, another Mozilla article on transceivers, webrtcforthecurious and Mozilla's WebRTC API docs to understand the correct way to tackle this problem.

Since my notes were rather long for a comment, I've created a discussion page for them, as we agreed.

Please take a look: https://github.com/vector-im/voip-internal/discussions/79

@@ -226,99 +236,78 @@ track?**
"width": 256,
"height": 144
}
-]
-}
-}
-```

-##### Unsubscribe

-If a client no longer wishes to be subscribed to a track, it should send this event.

-```json
-{
-"type": "m.unsubscribe",
-"content": {
-"m.stop": [
+],
+"stop": [
{
-"stream_id": "streamId1",
-"track_id": "trackId1"
+"stream_id": "streamId3",
+"track_id": "trackId4"
},
{
-"stream_id": "streamId2",
-"track_id": "trackId2"
+"stream_id": "streamId4",
+"track_id": "trackId4"
}
]
}
}
```
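On the receiving side, a focus has to turn each `m.call.track_subscription` event into its current set of subscriptions. The sketch below keeps that set in a map keyed by (`stream_id`, `track_id`). Applying `stop` before `start`, and treating a repeated `start` for an existing subscription as a resolution update rather than a second subscription, are assumptions made here to reconcile the "request each track only once" rule with the layout-change case; `SubscriptionTable` and `ApplyResult` are hypothetical names.

```typescript
// Hypothetical shapes for the entries of the `start` and `stop` arrays.
interface SubRef {
  stream_id: string;
  track_id: string;
}

interface SubStart extends SubRef {
  width?: number;
  height?: number;
}

interface Resolution {
  width?: number;
  height?: number;
}

interface ApplyResult {
  added: number;
  updated: number;
  removed: number;
}

class SubscriptionTable {
  // (stream_id, track_id) key -> resolution the client asked for, if any
  private readonly subs = new Map<string, Resolution>();

  private static key(ref: SubRef): string {
    return JSON.stringify([ref.stream_id, ref.track_id]);
  }

  // Apply one m.call.track_subscription event: `stop` first, then `start`.
  apply(start: SubStart[], stop: SubRef[]): ApplyResult {
    const result: ApplyResult = { added: 0, updated: 0, removed: 0 };
    for (const ref of stop) {
      if (this.subs.delete(SubscriptionTable.key(ref))) result.removed++;
    }
    for (const entry of start) {
      const k = SubscriptionTable.key(entry);
      const prev = this.subs.get(k);
      if (prev === undefined) {
        this.subs.set(k, { width: entry.width, height: entry.height });
        result.added++;
      } else if (prev.width !== entry.width || prev.height !== entry.height) {
        // Already subscribed: no duplicate subscription, only a new preferred size.
        this.subs.set(k, { width: entry.width, height: entry.height });
        result.updated++;
      }
      // Otherwise this is an exact repeat of an existing request and is ignored.
    }
    return result;
  }

  get size(): number {
    return this.subs.size;
  }
}
```

The text above has the focus respond with an `m.call.negotiate` event after applying the changes; returning counts like these would also let an implementation skip renegotiation when nothing changed, which is the option raised in the review comment on that paragraph.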

-##### Offer
+##### `m.call.negotiate`

-Whenever the client/focus creates an SDP offer, it should send it over to the
-other side using this event. The other side should then respond with an `m.answer`
-event.
+This event works exactly like the `m.call.negotiate` event in 1:1 calls.

```json
{
-"type": "m.offer",
+"type": "m.call.negotiate",
"content": {
-"m.sdp": "..."
+"description": {
+"type": "offer",
+"sdp": "..."
+},
+"sdp_stream_metadata": {...} // As specified in the Metadata section
}
}
```
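In 1:1 calls, the `description` of an `m.call.negotiate` event can carry either an offer or an answer, so presumably the reply to an offer here is another `m.call.negotiate` with `description.type` set to `"answer"`. A minimal sketch of handling an incoming event under that assumption; `PeerLike` is a hypothetical stand-in for the three `RTCPeerConnection` methods used (the real WebRTC types are broader), and glare (both sides offering at once) is not handled.

```typescript
type SdpType = "offer" | "answer";

interface SessionDescription {
  type: SdpType;
  sdp: string;
}

interface NegotiateEvent {
  type: "m.call.negotiate";
  content: {
    description: SessionDescription;
    sdp_stream_metadata: Record<string, unknown>; // see the Metadata section
  };
}

// Hypothetical subset of RTCPeerConnection used by this sketch.
interface PeerLike {
  setRemoteDescription(desc: SessionDescription): Promise<void>;
  createAnswer(): Promise<SessionDescription>;
  setLocalDescription(desc: SessionDescription): Promise<void>;
}

// Apply a received m.call.negotiate. For an offer, build the m.call.negotiate
// answer to send back; for an answer, the exchange is complete and null is returned.
async function onNegotiate(
  pc: PeerLike,
  event: NegotiateEvent,
  localMetadata: Record<string, unknown>,
): Promise<NegotiateEvent | null> {
  const { description } = event.content;
  await pc.setRemoteDescription(description);
  if (description.type !== "offer") {
    return null;
  }
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  return {
    type: "m.call.negotiate",
    content: { description: answer, sdp_stream_metadata: localMetadata },
  };
}
```

A real client or focus would also store the received `sdp_stream_metadata`, so it can label the incoming streams once they arrive.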

-##### Answer

-Whenever the client/focus creates an SDP answer in response to an SDP offer, it
-should send it over to the other side using this event.

-```json
-{
-"type": "m.answer",
-"content": {
-"m.sdp": "..."
-}
-}
-```
+##### `m.call.sdp_stream_metadata`

-##### Metadata
+This event works very similarly to the 1:1 call `m.call.sdp_stream_metadata`.

-Whenever the metadata changes (e.g. mute state changes happen), the client/focus
-can send an `m.metadata` event which includes an `m.metadata` field.
+- **TODO: Spec how foci actually use this to advertise tracks**

```json
{
-"type": "m.metadata",
+"type": "m.call.sdp_stream_metadata",
"content": {
-"m.metadata": {...} // As specified in the Metadata section
+"sdp_stream_metadata": {...} // As specified in the Metadata section
}
}
```
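Since `sdp_stream_metadata` is keyed by stream ID, a client or focus can decide whether a new `m.call.sdp_stream_metadata` event is worth sending by comparing the previous and current metadata stream by stream (for example after a mute toggle). In this sketch the `purpose`, `audio_muted` and `video_muted` fields follow the 1:1 call metadata from MSC3077 and MSC3291; the `tracks` field for per-track video resolution is a hypothetical shape, since the exact format of the Metadata section is not shown here, and `changedStreams` is an illustrative helper name.

```typescript
type StreamPurpose = "m.usermedia" | "m.screenshare";

// Per-track video sizes; this field name and shape are hypothetical.
type TrackSizes = Record<string, { width: number; height: number }>;

interface StreamMetadata {
  purpose: StreamPurpose;
  audio_muted: boolean;
  video_muted: boolean;
  tracks?: TrackSizes;
}

type SdpStreamMetadata = Record<string, StreamMetadata>;

function sameTracks(a: TrackSizes = {}, b: TrackSizes = {}): boolean {
  const keys = Object.keys(a);
  if (keys.length !== Object.keys(b).length) return false;
  return keys.every(
    (k) =>
      Object.prototype.hasOwnProperty.call(b, k) &&
      a[k].width === b[k].width &&
      a[k].height === b[k].height,
  );
}

function sameStream(a: StreamMetadata | undefined, b: StreamMetadata | undefined): boolean {
  if (a === undefined || b === undefined) return a === b;
  return (
    a.purpose === b.purpose &&
    a.audio_muted === b.audio_muted &&
    a.video_muted === b.video_muted &&
    sameTracks(a.tracks, b.tracks)
  );
}

// Stream IDs whose metadata was added, removed or changed; an empty result
// means there is nothing new to send.
function changedStreams(prev: SdpStreamMetadata, next: SdpStreamMetadata): string[] {
  const ids = new Set(Object.keys(prev).concat(Object.keys(next)));
  return Array.from(ids)
    .filter((id) => !sameStream(prev[id], next[id]))
    .sort();
}
```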

-##### Keep-alive
+##### `m.call.keep_alive`

-Clients should send `alive` message to foci every so often. If the client does
-not send an `alive` message for 30 seconds, the focus should hang up.
+Clients should send an `m.call.keep_alive` event to foci every so often. If
+the client does not send an `m.call.keep_alive` event for 30 seconds, the
+focus should hang up.
SimonBrandner marked this conversation as resolved.

- **TODO: should this be configurable somehow?**

```json
{
-"type": "m.alive",
+"type": "m.call.keep_alive",
"content": {}
}
```
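On the focus, the 30-second rule amounts to a watchdog that records when each client was last heard from and hangs up on the silent ones. In this sketch the timestamps are passed in explicitly (in milliseconds) so the logic stays testable; the per-device string key and the `KeepAliveWatchdog` class are hypothetical, how often `expire` runs is left to the implementation, and the timeout is only a constructor parameter here since the TODO about making it configurable is still open.

```typescript
// Timeout from the text above: hang up after 30 s without an m.call.keep_alive.
const KEEP_ALIVE_TIMEOUT_MS = 30_000;

class KeepAliveWatchdog {
  // device key -> time (ms) of the last m.call.keep_alive or connection
  private readonly lastSeen = new Map<string, number>();

  constructor(private readonly timeoutMs: number = KEEP_ALIVE_TIMEOUT_MS) {}

  // Record an m.call.keep_alive (or the initial connection) from a client device.
  touch(deviceKey: string, nowMs: number): void {
    this.lastSeen.set(deviceKey, nowMs);
  }

  // Return and forget the devices that have been silent for at least
  // `timeoutMs`; the focus would hang up on each of them.
  expire(nowMs: number): string[] {
    const expired: string[] = [];
    this.lastSeen.forEach((seen, key) => {
      if (nowMs - seen >= this.timeoutMs) expired.push(key);
    });
    for (const key of expired) this.lastSeen.delete(key);
    return expired;
  }
}
```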

-##### Connect to focus
+##### `m.call.connect_to_focus`

-If a user is using their SFU in a call, it will need to know how to connect to
-other SFUs present in order to participate in the full-mesh of SFU traffic (if
-any). The client is responsible for doing this using the `connect` event.
+If a user is using their focus in a call, it will need to know how to connect to
+other foci present in order to participate in the full-mesh of SFU traffic (if
+any). The client is responsible for doing this using the
+`m.call.connect_to_focus` event.

```json
{
-"type": "m.connect_to_focus",
+"type": "m.call.connect_to_focus",
"content": {
// TODO: How should this look?
}