
Commit af1fec4

Further improvements to Introduction to Elixir WebRTC tutorial (#141)
1 parent be18ebc commit af1fec4

File tree

6 files changed

+250
-173
lines changed


guides/introduction/modifying.md renamed to guides/advanced/modifying.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Modifying the session
 
-So far, we focused on forwarding the data back to the same peer. Usually, you want to connect with multiple peers, which means adding
+In the introductory tutorials, we focused on forwarding the data back to the same peer. Usually, you want to connect with multiple peers, which means adding
 more PeerConnections to the Elixir app, like in the diagram below.
 
 ```mermaid
@@ -31,7 +31,7 @@ new negotiation has to take place!
 >
 > But what does that even mean?
 > Each transceiver is responsible for sending and/or receiving a single track. When you call `PeerConnection.add_track`, we actually look for a free transceiver
-> (that is, one that is not sending a track already) and use it, or create a new transceiver if we don' find anything suitable. If you are very sure
+> (that is, one that is not sending a track already) and use it, or create a new transceiver if we don't find anything suitable. If you are very sure
 > that the remote peer added _N_ new video tracks, you can add _N_ video transceivers (using `PeerConnection.add_transceiver`) and begin the negotiation as
 > the offerer. If you didn't add the transceivers, the tracks added by the remote peer (the answerer) would be ignored.
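The transceiver advice above can be sketched as follows. This is a hypothetical snippet, not part of this commit: it assumes `pc` is an existing `ExWebRTC.PeerConnection` and that the remote peer is about to add three new video tracks.

```elixir
# Sketch (assumption, not from the commit): pre-allocate three video
# transceivers so the remote peer's three incoming video tracks are not
# ignored, then start the negotiation as the offerer.
for _ <- 1..3 do
  {:ok, _transceiver} = ExWebRTC.PeerConnection.add_transceiver(pc, :video)
end

{:ok, offer} = ExWebRTC.PeerConnection.create_offer(pc)
:ok = ExWebRTC.PeerConnection.set_local_description(pc, offer)
# send the offer to the remote peer over your signaling channel
```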

guides/introduction/consuming.md

Lines changed: 122 additions & 13 deletions
@@ -1,27 +1,136 @@
 # Consuming media data
 
-Other than just forwarding, we probably would like to be able to use the media right in the Elixir app to
-e..g feed it to a machine learning model or create a recording of a meeting.
+Other than just forwarding, we would like to be able to use the media right in the Elixir app, e.g.
+to use it as a machine learning model input or to create a recording of a meeting.
 
-In this tutorial, we are going to build on top of the simple app from the previous tutorial by, instead of just sending the packets back, depayloading and decoding
-the media, using a machine learning model to somehow augment the video, encode and payload it back into RTP packets and only then send it to the web browser.
+In this tutorial, we are going to learn how to use received media as input for ML inference.
 
-## Deplayloading RTP
+## From raw media to RTP
 
-We refer to the process of taking the media payload out of RTP packets as _depayloading_.
+When the browser sends audio or video, it does the following things:
+
+1. Captures the media from your peripheral devices, like a webcam or microphone.
+2. Encodes the media, so it takes up less space and uses less network bandwidth.
+3. Packs it into one or multiple RTP packets, depending on the media chunk (e.g., video frame) size.
+4. Sends it to the other peer using WebRTC.
+
+We have to reverse these steps in order to be able to use the media:
+
+1. We receive the media from WebRTC.
+2. We unpack the encoded media from the RTP packets.
+3. We decode the media to a raw format.
+4. We use the media however we like.
+
+We already know how to do step 1 from previous tutorials, and step 4 is completely up to the user, so let's go through steps 2 and 3 in the next sections.
 
 > #### Codecs {: .info}
-> A media codec is a program used to encode/decode digital video and audio streams. Codecs also compress the media data,
+> A media codec is a program/technique used to encode/decode digital video and audio streams. Codecs also compress the media data;
 > otherwise, it would be too big to send over the network (the bitrate of raw 24-bit color depth, FullHD, 60 fps video is about 3 Gbit/s!).
 >
-> In WebRTC, most likely you will encounter VP8, H264 or AV1 video codecs and Opus audio codec. Codecs that will be used during the session are negotiated in
-> the SDP offer/answer exchange. You can tell what codec is carried in an RTP packet by inspecting its payload type (`packet.payload_type`,
-> a non-negative integer field) and match it with one of the codecs listed in this track's transceiver's `codecs` field (you have to find
-> the `transceiver` by iterating over `PeerConnection.get_transceivers` as shown previously in this tutorial series).
+> In WebRTC, you will most likely encounter the VP8, H264 or AV1 video codecs and the Opus audio codec. Codecs used during the session are negotiated in
+> the SDP offer/answer exchange. You can tell which codec is carried in an RTP packet by inspecting its payload type (the `payload_type` field in the case of Elixir WebRTC).
+> This value should correspond to one of the codecs included in the SDP offer/answer.
+
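The payload-type matching described in the callout can be sketched like this. It is an illustration only, not code from this commit: the `payload_type` and `codecs` fields are named as in the guide's prose, while the receiver/track lookup fields are assumptions.

```elixir
# Hedged sketch: find the negotiated codec for an incoming packet by matching
# the packet's RTP payload type against the codecs negotiated for the
# transceiver that carries the packet's track.
# `tr.receiver.track` and `tr.codecs` are assumed field names.
def codec_for_packet(pc, track_id, packet) do
  pc
  |> ExWebRTC.PeerConnection.get_transceivers()
  |> Enum.find_value(fn tr ->
    if tr.receiver.track != nil and tr.receiver.track.id == track_id do
      Enum.find(tr.codecs, fn codec -> codec.payload_type == packet.payload_type end)
    end
  end)
end
```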
+## Depayloading RTP
+
+We refer to the process of getting the media payload out of RTP packets as _depayloading_. Usually, a single video frame is split into
+multiple RTP packets, and in the case of audio, each packet carries, more or less, 20 milliseconds of sound. Fortunately, you don't have to worry about this;
+just use one of the depayloaders provided by Elixir WebRTC (see the `ExWebRTC.RTP.<codec>` submodules). For instance, when receiving VP8 RTP packets, we could depayload
+the video by doing:
+
+```elixir
+def init(_) do
+  # ...
+  state = %{depayloader: ExWebRTC.RTP.VP8.Depayloader.new()}
+  {:ok, state}
+end
+
+def handle_info({:ex_webrtc, _from, {:rtp, _track_id, nil, packet}}, state) do
+  depayloader =
+    case ExWebRTC.RTP.VP8.Depayloader.write(state.depayloader, packet) do
+      {:ok, depayloader} ->
+        depayloader
+
+      {:ok, frame, depayloader} ->
+        # we collected a whole frame (it is just a binary)!
+        # we will learn what to do with it in a moment
+        depayloader
+    end
+
+  {:noreply, %{state | depayloader: depayloader}}
+end
+```
+
+Every time we collect a whole video frame consisting of a bunch of RTP packets, `VP8.Depayloader.write` returns it for further processing.
 
-_TBD_
+> #### Codec configuration {: .warning}
+> By default, `ExWebRTC.PeerConnection` will use a set of default codecs when negotiating the connection. In such a case, you have to either:
+>
+> * support depayloading/decoding for all of the negotiated codecs, or
+> * force some specific set of codecs (or even a single codec) in the `PeerConnection` configuration.
+>
+> Of course, the second option is much simpler, but it increases the risk of failing the negotiation, as the other peer might not support your codec of choice.
+> If you still want to do it the simple way, set the codecs in `PeerConnection.start_link`:
+> ```elixir
+> codec = %ExWebRTC.RTPCodecParameters{
+>   payload_type: 96,
+>   mime_type: "video/VP8",
+>   clock_rate: 90_000
+> }
+> {:ok, pc} = ExWebRTC.PeerConnection.start_link(video_codecs: [codec])
+> ```
+> This way, you will either always send/receive VP8 video, or you won't be able to negotiate a video stream at all. At least you won't encounter
+> unpleasant bugs in video decoding!
 
 ## Decoding the media to raw format
 
-_TBD_
+Before we use the video as an input to the machine learning model, we need to decode it into a raw format. Video decoding and encoding are very
+complex and resource-heavy processes, so we don't provide anything for them in Elixir WebRTC, but you can use the `xav` library, a simple wrapper over `ffmpeg`,
+to decode the VP8 video. Let's modify the snippet from the previous section to do so.
+
+```elixir
+def init(_) do
+  # ...
+  serving = nil # set up your machine learning model here (e.g. using Bumblebee)
+  state = %{
+    depayloader: ExWebRTC.RTP.VP8.Depayloader.new(),
+    decoder: Xav.Decoder.new(:vp8),
+    serving: serving
+  }
+  {:ok, state}
+end
+
+def handle_info({:ex_webrtc, _from, {:rtp, _track_id, nil, packet}}, state) do
+  depayloader =
+    with {:ok, frame, depayloader} <- ExWebRTC.RTP.VP8.Depayloader.write(state.depayloader, packet),
+         {:ok, raw_frame} <- Xav.Decoder.decode(state.decoder, frame) do
+      # a raw frame is just a 3D matrix with the shape of resolution x colors
+      # (e.g. 1920 x 1080 x 3 for a FullHD, RGB frame)
+      # we can cast it to an Elixir Nx tensor and use it as the machine learning model input
+      # machine learning itself is out of scope of this tutorial, but you probably want to check out Elixir Nx and friends
+      tensor = Xav.Frame.to_nx(raw_frame)
+      prediction = Nx.Serving.run(state.serving, tensor)
+      # do something with the prediction
+
+      depayloader
+    else
+      # the depayloader has not collected a whole frame yet
+      {:ok, depayloader} -> depayloader
+      # handle the depayloading/decoding error
+      {:error, _err} -> state.depayloader
+    end
+
+  {:noreply, %{state | depayloader: depayloader}}
+end
+```
+
+We decoded the video, used it as an input to the machine learning model, and got some kind of prediction - do whatever you want with it.
+
+> #### Jitter buffer {: .warning}
+> Do you recall that WebRTC uses UDP under the hood, and that UDP does not ensure packet ordering? We could ignore this fact when forwarding the packets (as
+> it was not our job to decode/play/save the media), but now out-of-order packets can seriously mess up the process of decoding.
+> To remedy this issue, something called a _jitter buffer_ can be used. Its basic function
+> is to delay/buffer incoming packets by some time, let's say 100 milliseconds, waiting for packets that might be late. Only if a packet does not arrive within the
+> additional 100 milliseconds do we count it as lost. To learn more about the jitter buffer, read [this](https://bloggeek.me/webrtcglossary/jitter-buffer/).
+>
+> As of now, Elixir WebRTC does not provide a jitter buffer, so you either have to build something yourself or hope that such issues won't occur; but if anything
+> is wrong with the decoded video, this might be the problem.
+
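To make the jitter-buffer idea above concrete, here is a toy reordering buffer. It is purely illustrative and not part of Elixir WebRTC: it buffers by packet count rather than by time, and it ignores sequence-number wraparound at 2^16, which a real jitter buffer must handle.

```elixir
defmodule NaiveJitterBuffer do
  # A toy reordering buffer (assumption, not an Elixir WebRTC API): it holds up
  # to `capacity` packets and always releases the one with the lowest RTP
  # sequence number first, so slightly late packets can still be slotted in.
  defstruct packets: [], capacity: 10

  def new(capacity \\ 10), do: %__MODULE__{capacity: capacity}

  # buffer the packet; once the buffer overflows, pop the packet with the
  # lowest sequence number (returns {packet_or_nil, updated_buffer})
  def push(%__MODULE__{} = jb, packet) do
    packets = Enum.sort_by([packet | jb.packets], & &1.sequence_number)

    if length(packets) > jb.capacity do
      [oldest | rest] = packets
      {oldest, %{jb | packets: rest}}
    else
      {nil, %{jb | packets: packets}}
    end
  end
end
```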
+This tutorial shows, more or less, what the [Recognizer](https://github.com/elixir-webrtc/apps/tree/master/recognizer) app does. Check it out, along with the other
+example apps, in the [apps](https://github.com/elixir-webrtc/apps) repository - it's a great reference on how to implement fully-fledged apps based on Elixir WebRTC.
 

guides/introduction/forwarding.md

Lines changed: 29 additions & 79 deletions
@@ -21,7 +21,7 @@ The `packet` is an RTP packet. It contains the media data alongside some other u
 > RTP is a network protocol created for carrying real-time data (like media) and is used by WebRTC.
 > It provides some useful features like:
 >
-> * sequence numbers: UDP (which is usually used by WebRTC) does not provide ordering, thus we need this to catch missing or out-of-order packets
+> * sequence numbers: UDP (which is usually used by WebRTC) does not provide packet ordering, thus we need these to catch missing or out-of-order packets
 > * timestamps: these can be used to correctly play the media back to the user (e.g. using the right framerate for the video)
 > * payload type: thanks to this, combined with information in the SDP offer/answer, we can tell which codec is carried by this packet
 >
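A quick, hypothetical way to peek at these header fields from the receiving process: the `payload_type` field name appears elsewhere in these guides, while `sequence_number` and `timestamp` are assumed to follow the same naming on the packet struct.

```elixir
# Sketch (field names partly assumed): log the RTP header fields of every
# incoming packet to see sequence numbers, timestamps and payload types.
def handle_info({:ex_webrtc, _from, {:rtp, track_id, nil, packet}}, state) do
  IO.puts(
    "track #{track_id}: seq=#{packet.sequence_number} " <>
      "ts=#{packet.timestamp} pt=#{packet.payload_type}"
  )

  {:noreply, state}
end
```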
@@ -39,45 +39,36 @@ flowchart LR
 	WB((Web Browser)) <-.-> PC
 ```
 
-The only thing we have to implement is the `Forwarder` GenServer. Let's combine the ideas from the previous section to write it.
+The only thing we have to implement is the `Forwarder` process. In practice, making it a `GenServer` is probably the
+easiest option, and that's what we are going to do here. Let's combine the ideas from the previous section to write it.
 
 ```elixir
-defmodule Forwarder do
-  use GenServer
-
-  alias ExWebRTC.{PeerConnection, ICEAgent, MediaStreamTrack, SessionDescription}
-
-  @ice_servers [%{urls: "stun:stun.l.google.com:19302"}]
-
-  @impl true
-  def init(_) do
-    {:ok, pc} = PeerConnection.start_link(ice_servers: @ice_servers)
-
-    # we expect to receive two tracks from the web browser - one for audio, one for video
-    # so we also need to add two tracks here, we will use these to forward media
-    # from each of the web browser tracks
-    stream_id = MediaStreamTrack.generate_stream_id()
-    audio_track = MediaStreamTrack.new(:audio, [stream_id])
-    video_track = MediaStreamTrack.new(:video, [stream_id])
-
-    {:ok, _sender} = PeerConnection.add_track(pc, audio_track)
-    {:ok, _sender} = PeerConnection.add_track(pc, video_track)
-
-    # in_tracks (tracks we will receive media from) = %{id => kind}
-    # out_tracks (tracks we will send media to) = %{kind => id}
-    out_tracks = %{audio: audio_track.id, video: video_track.id}
-    {:ok, %{pc: pc, out_tracks: out_tracks, in_tracks: %{}}}
-  end
-
-  # ...
+def init(_) do
+  {:ok, pc} = PeerConnection.start_link(ice_servers: [%{urls: "stun:stun.l.google.com:19302"}])
+
+  # we expect to receive two tracks from the web browser - audio and video
+  # so we also need to add two tracks here, we will use them to loop media back to the browser
+  # from each of the web browser tracks
+  stream_id = MediaStreamTrack.generate_stream_id()
+  audio_track = MediaStreamTrack.new(:audio, [stream_id])
+  video_track = MediaStreamTrack.new(:video, [stream_id])
+
+  {:ok, _sender} = PeerConnection.add_track(pc, audio_track)
+  {:ok, _sender} = PeerConnection.add_track(pc, video_track)
+
+  # in_tracks (tracks we will receive from the browser) = %{id => kind}
+  # out_tracks (tracks we will send to the browser) = %{kind => id}
+  in_tracks = %{}
+  out_tracks = %{audio: audio_track.id, video: video_track.id}
+  {:ok, %{pc: pc, out_tracks: out_tracks, in_tracks: in_tracks}}
 end
 ```
 
 We started by creating the PeerConnection and adding two tracks (one for audio and one for video).
 Remember that these tracks will be used to *send* data to the web browser peer. Remote tracks (the ones we will set up on the JavaScript side, like in the previous tutorial)
 will arrive as messages after the negotiation is completed.
 
-> #### Where are the tracks? {: .tip}
+> #### What are the tracks? {: .tip}
 > In the context of Elixir WebRTC, a track is simply a _track id_, _ids_ of streams this track belongs to, and a _kind_ (audio/video).
 > We can either add tracks to the PeerConnection (these tracks will be used to *send* data when calling `PeerConnection.send_rtp/4` and
 > for each one of the tracks, the remote peer should fire the `track` event)
@@ -96,39 +87,14 @@ will arrive as messages after the negotiation is completed.
 >
 > If you want to know more about transceivers, read the [Mastering Transceivers](https://hexdocs.pm/ex_webrtc/mastering_transceivers.html) guide.
 
-Next, we need to take care of the offer/answer and ICE candidate exchange. As in the previous tutorial, we assume that there's some kind
-of WebSocket relay service available that will forward our offer/answer/candidate messages to the web browser and back to us.
-
-```elixir
-@impl true
-def handle_info({:web_socket, {:offer, offer}}, state) do
-  :ok = PeerConnection.set_remote_description(state.pc, offer)
-  {:ok, answer} = PeerConnection.create_answer(state.pc)
-  :ok = PeerConnection.set_local_description(state.pc, answer)
-
-  web_socket_send(answer)
-  {:noreply, state}
-end
-
-@impl true
-def handle_info({:web_socket, {:ice_candidate, cand}}, state) do
-  :ok = PeerConnection.add_ice_candidate(state.pc, cand)
-  {:noreply, state}
-end
-
-@impl true
-def handle_info({:ex_webrtc, _from, {:ice_candidate, cand}}, state) do
-  web_socket_send(cand)
-  {:noreply, state}
-end
-```
+Next, we need to take care of the offer/answer and ICE candidate exchange. This can be done in exactly the same way as in the previous
+tutorial, so we won't go into it here.
 
-Now we can expect to receive messages with notifications about new remote tracks.
+After the negotiation, we can expect to receive messages with notifications about new remote tracks.
 Let's handle these and match them with the tracks that we are going to send to.
 We need to be careful not to send packets from the audio track on a video track by mistake!
 
 ```elixir
-@impl true
 def handle_info({:ex_webrtc, _from, {:track, track}}, state) do
   state = put_in(state.in_tracks[track.id], track.kind)
   {:noreply, state}
@@ -138,7 +104,6 @@ end
 We are ready to handle the incoming RTP packets!
 
 ```elixir
-@impl true
 def handle_info({:ex_webrtc, _from, {:rtp, track_id, nil, packet}}, state) do
   kind = Map.fetch!(state.in_tracks, track_id)
   id = Map.fetch!(state.out_tracks, kind)
@@ -154,28 +119,13 @@
 > change between two tracks, the payload types are dynamically assigned and may differ between RTP sessions), and some RTP header extensions. All of that is
 > done by Elixir WebRTC behind the scenes, but be aware - it is not as simple as forwarding the same piece of data!
 
-Lastly, let's take care of the client-side code. It's nearly identical to what we have written in the previous tutorial.
+Lastly, let's take care of the client-side code. It's nearly identical to what we have written in the previous tutorial,
+except for the fact that we need to handle tracks added by the Elixir PeerConnection.
 
 ```js
-const localStream = await navigator.mediaDevices.getUserMedia({audio: true, video: true});
-const pc = new RTCPeerConnection({iceServers: [{urls: "stun:stun.l.google.com:19302"}]});
-localStream.getTracks().forEach(track => pc.addTrack(track, localStream));
-
-// these will be the tracks that we added using `PeerConnection.add_track`
+// these will be the tracks that we added using `PeerConnection.add_track` in Elixir
+// but be careful! even for the same track, the ids might be different for each of the peers
 pc.ontrack = event => videoPlayer.srcObject = event.streams[0];
-
-// sending/receiving the offer/answer/candidates to the other peer is your responsibility
-pc.onicecandidate = event => send_to_other_peer(event.candidate);
-on_cand_received(cand => pc.addIceCandidate(cand));
-
-// remember that we set up the Elixir app to just handle the incoming offer
-// so we need to generate and send it (and thus, start the negotiation) here
-const offer = await pc.createOffer();
-await pc.setLocalDescription(offer)
-send_offer_to_other_peer(offer);
-
-const answer = await receive_answer_from_other_peer();
-await pc.setRemoteDescription(answer);
 ```
 
 And that's it! The other peer should be able to see and hear the echoed video and audio.
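For reference, the RTP-forwarding handler shown earlier is cut off by the hunk boundary. Based on the guide's mention of `PeerConnection.send_rtp/4`, it presumably finishes roughly like this (a sketch, not the commit's exact code):

```elixir
# Hedged sketch of the full forwarding handler, completing the truncated
# snippet above with a `send_rtp` call as the guide's prose suggests.
def handle_info({:ex_webrtc, _from, {:rtp, track_id, nil, packet}}, state) do
  # look up which kind (audio/video) the incoming track is...
  kind = Map.fetch!(state.in_tracks, track_id)
  # ...pick our outgoing track of the same kind...
  id = Map.fetch!(state.out_tracks, kind)
  # ...and loop the packet back to the browser on that track
  PeerConnection.send_rtp(state.pc, id, packet)
  {:noreply, state}
end
```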

guides/introduction/intro.md

Lines changed: 1 addition & 1 deletion
@@ -30,4 +30,4 @@ your web application. Here are some example use cases:
 In general, all of the use cases come down to getting media from one peer to another. In the case of Elixir WebRTC, one of the peers is usually a server,
 like your Phoenix app (although it doesn't have to be - there's no concept of server/client in WebRTC, so you might as well connect two browsers or two Elixir peers).
 
-This is what the next section of this tutorial series will focus on - we will try to get media from a web browser to a simple Elixir app.
+This is what the next tutorials will focus on - we will try to get media from a web browser to a simple Elixir app.
