A few weeks ago I shared my experience studying and implementing QUIC from scratch: in slightly less than a month, I went from not knowing anything about QUIC to having a basic stack (with WebTransport support) I could use for interacting with other implementations in simple scenarios. Getting to that point was bumpy (QUIC is not an easy protocol to implement!) but exciting as well, and I really couldn’t wait to start looking into some of the media-related protocols that people are using QUIC for.
In this blog post I’ll share some details on the work I did on one of those efforts, and specifically the ongoing standardization efforts on RTP Over QUIC, or RoQ for short, and the interop tests I made at the IETF 120 Hackathon. I’ve been working on MoQ (Media Over QUIC) as well these past few weeks, but there’s a lot more to say about that, so I’ll leave it to a future blog post: stay tuned for that!
Where were we?
As anticipated in the intro, in my previous post I described my first journey with QUIC, studying how the protocol works and prototyping what I was looking at. Eventually, I managed to implement the fundamental features of the protocol, and even though I did cut a lot of corners in this initial effort (there’s no support yet in my library for retransmissions, flow control and congestion control, for instance), it did allow me to make simple tests using both “plain” QUIC and WebTransport. The WebTransport stuff ended up in a new Janus plugin as well, so that I could demonstrate a basic interaction between WebRTC users (via data channels) and a browser using WebTransport.
This was an important milestone, as while the QUIC part itself still needs considerable work, it also laid the foundations for building more complex, and interesting, scenarios on top of all that. In fact, as explained in my original blog post, QUIC itself is “just” a transport protocol: it won’t give you anything more than that out of the box. It’s up to you to then use that transport to do cool things, e.g., negotiating custom protocols for different use cases.
Due to my background, what interests me the most is obviously real-time media, and there’s a lot happening in the QUIC world on that front. Specifically, there are at least two efforts that are quite relevant:
- RTP Over QUIC (RoQ)
- Media Over QUIC (MoQ)
RoQ, as the name suggests, is mostly an effort to try and figure out how to use QUIC as a transport for RTP sessions (where normally you’d use plain UDP, TCP or TURN for that). MoQ, on the other hand, is a much more complex and comprehensive effort to create a generic streaming and broadcasting mechanism for real-time delivery, that could be used for different things like streaming or conferencing.
I worked a lot on both, recently, but I did indeed start from RoQ, for a few different reasons: first of all, it was “closer” to where I was coming from (VoIP and WebRTC both use RTP heavily, so it’s a protocol I’m intimately familiar with already), and besides, from an implementation point of view it was a much easier entry point to QUIC and real-time media for me. As such, I rolled up my sleeves and started reading the latest version of the draft.
Having a look at the IETF draft
As with everything related to RTP, this new effort is currently documented in an internet draft in the AVTCORE Working Group (Audio/Video Transport Core Maintenance) at the IETF. At the time of writing, the draft is at its 11th revision, even though the version I used as a reference for my implementation was the version before that (-10). The main author of the draft, Mathis Engelbart, is actually well known to WebRTC developers, especially those familiar with the Pion community: if you attended past editions of CommCon, for instance, you may remember Mathis’ presentation on how he added Bandwidth Estimation to Pion.
While the document is quite long, the part that covers what actually goes on the wire (so how you encapsulate RTP in QUIC) is relatively short and straightforward. An important point it makes, for instance, is that, while in WebRTC you’d use SRTP (and DTLS-SRTP) to secure packets, on QUIC this is unneeded, as QUIC connections are always encrypted already. Apart from that, most of the document is then focused on addressing and providing guidance on how to leverage QUIC for things you normally do with RTP/RTCP: for instance, most of the statistics and information you’d usually use RTCP for, are actually made available out of the box by QUIC itself, which means there are mappings you can make accordingly to avoid exchanging unneeded RTCP messages. Similar considerations can be made for bandwidth estimation and control too, again thanks to the functionality that QUIC provides as a transport protocol.
To start working on the specification, I decided to focus on RTP alone, which meant how to physically put RTP packets on top of QUIC, and be able to exchange them between client and server endpoints. In that regard, the document explains there are basically three ways you can multiplex/send RTP packets:
- using QUIC datagrams;
- sending each RTP packet on its own QUIC STREAM;
- sending multiple RTP packets on the same STREAM.
Using QUIC datagrams is the closest thing there is to sending RTP on plain UDP, as it leverages the unreliable datagram extension to QUIC, where everything you need can be found in the QUIC packet you receive. Using STREAM is a different matter, instead, since it’s based on the main way QUIC has of exchanging data: in both cases (one stream per packet vs. multiple packets in the same stream), RoQ mandates the usage of unidirectional streams. Choosing one approach over the other depends on the requirements the application has, and how much head-of-line blocking may be considered an issue on the same stream.
No matter the multiplexing mode, RoQ adds the important concept of “flow identifiers” as a mechanism for “multiplexing multiple RTP and RTCP ports on the same QUIC connection to conserve ports, especially at NATs and firewalls“. Different flow IDs can obviously be used over the same QUIC connection, e.g., to group all RTP (and maybe RTCP) packets related to a specific audio stream on one flow ID, and a video stream on another.
Another important requirement is obviously framing. RTP itself doesn’t have a “length” property, since RTP packets are usually sent on, and self-contained in, specific datagrams. When sending RTP over TCP, which is not message based but stream based, each RTP packet is prefixed by a two-byte length field to allow recipients to extract the different packets from the stream of data. When using QUIC streams the requirement is the same, which explains why RoQ envisages framing (even though not the same framing RTP over TCP employs) as part of the serialization that happens over QUIC.
More specifically, when sending RTP over a QUIC datagram, no framing/length info is needed, but just the flow ID (as a variable length integer) the packet is associated with. When using streams, instead, the approach is different, since in that case you do need info on how long each packet is to know when it ends in the incoming stream of data: in that case, the RTP packet is also prefixed by its length (again as a variable length integer, which is a common representation on QUIC).
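To make the datagram case concrete, here is a minimal Python sketch (purely illustrative, not code from the library discussed in this post). It implements the QUIC variable-length integer encoding from RFC 9000 and uses it to prefix an RTP packet with its flow ID. The helper names (`encode_varint`, `decode_varint`, `roq_datagram_payload`, `parse_roq_datagram`) are made up for this example.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode an unsigned integer as a QUIC variable-length integer."""
    if value < 0:
        raise ValueError("varints are unsigned")
    if value < 1 << 6:
        return value.to_bytes(1, "big")                 # prefix bits 00
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")      # prefix bits 01
    if value < 1 << 30:
        return (value | 0x80000000).to_bytes(4, "big")  # prefix bits 10
    if value < 1 << 62:
        return (value | 0xC000000000000000).to_bytes(8, "big")  # prefix bits 11
    raise ValueError("value does not fit in 62 bits")


def decode_varint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset) for the varint starting at buf[offset]."""
    if offset >= len(buf):
        raise ValueError("no data to decode")
    length = 1 << (buf[offset] >> 6)  # two top bits select 1, 2, 4 or 8 bytes
    if offset + length > len(buf):
        raise ValueError("truncated varint")
    value = int.from_bytes(buf[offset:offset + length], "big")
    value &= (1 << (8 * length - 2)) - 1  # strip the two length bits
    return value, offset + length


def roq_datagram_payload(flow_id: int, rtp_packet: bytes) -> bytes:
    """Datagram mode: flow ID as a varint, then the RTP packet, no length."""
    return encode_varint(flow_id) + rtp_packet


def parse_roq_datagram(payload: bytes) -> Tuple[int, bytes]:
    """Split a received datagram payload into (flow_id, rtp_packet)."""
    flow_id, offset = decode_varint(payload)
    return flow_id, payload[offset:]
```

Since small flow IDs such as 0 and 1 fit in a single byte, the per-datagram overhead in this mode is just one byte on top of the RTP packet itself.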
A summary of the different multiplexing modes is shown here:
As depicted in the diagram above, the main difference between the two STREAM-based multiplexing modes is in how data is serialized. When putting a single RTP packet on its own stream, each RTP packet is prefixed by a flow ID; when grouping multiple RTP packets over the same stream, instead, we just need to write the flow ID once, at the beginning of the stream, as all packets that follow on that stream will share it. Each RTP packet can then be written one after the other on the stream, since the framing information would provide the means to encapsulate/decapsulate them.
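The two stream-based layouts can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the actual implementation: it treats the whole stream as already buffered in memory (a real receiver would parse incrementally as bytes arrive), and all function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer (RFC 9000), 62-bit maximum."""
    for bits, size, prefix in ((6, 1, 0x00), (14, 2, 0x40), (30, 4, 0x80), (62, 8, 0xC0)):
        if 0 <= value < 1 << bits:
            raw = bytearray(value.to_bytes(size, "big"))
            raw[0] |= prefix  # two top bits carry the encoded length
            return bytes(raw)
    raise ValueError("value out of range for a varint")


def decode_varint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset) for the varint at buf[offset]."""
    if offset >= len(buf):
        raise ValueError("no data to decode")
    length = 1 << (buf[offset] >> 6)
    if offset + length > len(buf):
        raise ValueError("truncated varint")
    value = int.from_bytes(buf[offset:offset + length], "big")
    return value & ((1 << (8 * length - 2)) - 1), offset + length


def stream_per_packet(flow_id: int, rtp_packet: bytes) -> bytes:
    """Whole content of a unidirectional stream carrying one RTP packet."""
    return encode_varint(flow_id) + encode_varint(len(rtp_packet)) + rtp_packet


def shared_stream(flow_id: int, rtp_packets: Iterable[bytes]) -> bytes:
    """Flow ID written once, then each RTP packet prefixed by its length."""
    out = bytearray(encode_varint(flow_id))
    for packet in rtp_packets:
        out += encode_varint(len(packet)) + packet
    return bytes(out)


def parse_stream(data: bytes) -> Tuple[int, List[bytes]]:
    """Recover (flow_id, [rtp_packet, ...]) from fully buffered stream data."""
    flow_id, offset = decode_varint(data)
    packets = []
    while offset < len(data):
        length, offset = decode_varint(data, offset)
        if offset + length > len(data):
            raise ValueError("truncated RTP packet")
        packets.append(data[offset:offset + length])
        offset += length
    return flow_id, packets
```

Note that a stream carrying a single packet is just the degenerate case of the shared layout, so the same parser handles both modes.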
Again, the document goes into great detail about many other aspects, but in a nutshell this is all you need to know if you want to serialize RTP over QUIC in its most basic form.
Looks easy enough, let’s start hacking!
Looking at how RTP encapsulation works in RoQ, this is basically what we need from a QUIC stack:
- ability to negotiate the RoQ ALPN (roq-10 in the case of the v10 revision of the draft);
- optional support for QUIC datagrams, if we want to use that multiplexing mode;
- ability to create unidirectional streams and send/receive data.
These are all very basic requirements, and features my simple library testbed supported already. This means that I could indeed start building a simple RoQ stack on top of that. I decided to start working on a couple of very basic endpoints, that is, a RoQ client to generate RTP packets, and a RoQ server to receive them via QUIC.
For the RoQ client, I left RTP generation to a separate application. Very simply, I configured an external GStreamer pipeline to encode an audio and a video stream, and send it via RTP to a specific address the RoQ client was listening to. The RoQ client would then create a QUIC connection to a remote QUIC endpoint, and when connected it would encapsulate the RTP packets coming in via UDP on the QUIC connection instead, using different (and configurable) flow IDs for the audio and video streams. In order to test the different multiplexing modes, I made it configurable, so that I could test sending packets using any of them. The options I exposed in this tiny application are the following:
lminiero@lminiero imquic $ ./examples/imquic-roq-client -h
Usage:
imquic-roq-client [OPTION?]
Help Options:
-h, --help Show help options
Application Options:
-d, --debug-level=1-7 Debug/logging level (0=disable debugging, 7=maximum debug level; default=4)
-a, --audio-port=port Port to bind to for incoming audio RTP packets (default=none)
-A, --audio-flow=number Flow ID of the audio RTP stream (default=none)
-v, --video-port=port Port to bind to for incoming video RTP packets (default=none)
-V, --video-flow=number Flow ID of the video RTP stream (default=none)
-m, --multiplexing=mode RTP multiplexing (datagram, stream or streams; default=datagram)
-p, --port=port QUIC port to bind to (default=0, random)
-r, --remote-host=IP QUIC server to connect to (default=none)
-R, --remote-port=port Port of the QUIC server (default=none)
-w, --webtransport Whether WebTransport should be used for the RoQ connection or not (default=no)
-H, --path=HTTP/3 path In case WebTransport is used, path to use for the HTTP/3 request (default=/)
-c, --cert-pem=path Certificate to use (default=none)
-k, --cert-key=path Certificate key to use (default=none)
-P, --cert-pwd=string Certificate password to use (default=none)
-s, --secrets-log=path Save the exchanged secrets to a file compatible with Wireshark (default=none)
For the RoQ server part, instead, I kept it even simpler. I made it so that it would wait for an incoming connection from a QUIC client, and then handle incoming packets, making them available to the application level as an RTP packet no matter how they were multiplexed on QUIC. As a proof of concept, I coded the RoQ server to only print the information contained in the RTP headers, to ensure that what I was sending on one end was indeed the same as what I was receiving on the other. Due to the limited functionality when compared to the client, the number of options is limited as well:
lminiero@lminiero imquic $ ./examples/imquic-roq-server -h
Usage:
imquic-roq-server [OPTION?]
Help Options:
-h, --help Show help options
Application Options:
-d, --debug-level=1-7 Debug/logging level (0=disable debugging, 7=maximum debug level; default=4)
-p, --port=port QUIC port to bind to (default=9000)
-w, --webtransport Whether WebTransport should be used for the RoQ connection or not (default=no)
-c, --cert-pem=path Certificate to use (default=none)
-k, --cert-key=path Certificate key to use (default=none)
-P, --cert-pwd=string Certificate password to use (default=none)
-s, --secrets-log=path Save the exchanged secrets to a file compatible with Wireshark (default=none)
The end result is something like you can see in the animated image below:
Very simply, in the example captured above I started the RoQ server first, then had the RoQ client connect to the server, and finally started the GStreamer pipeline to feed RTP packets to the client via plain UDP. As soon as that happened, the RoQ client started sending packets via QUIC (in this specific instance, using a stream per packet, since the streams property was passed via command line as the multiplexing mode to use). As you can see in the animation, the flow IDs (0 for audio, 1 for video) and RTP headers match both on the way out and the way in: success!
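For reference, comparing RTP headers on both ends boils down to decoding the 12-byte fixed header defined in RFC 3550. The Python sketch below shows one way to do that; it is not the code the demo applications use, and `RtpHeader` and `parse_rtp_header` are hypothetical names. It deliberately ignores CSRC lists and header extensions.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed part of an RTP header (first 12 bytes)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError("not an RTP version 2 packet")
    return RtpHeader(
        version=version,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
    )
```

Printing the resulting tuple on the sender and on the receiver is enough to check that sequence numbers, timestamps and SSRCs survive the trip over QUIC unchanged.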
Integrating RoQ in the Janus QUIC plugin
Getting the basic RoQ client and server to interoperate with each other was exciting, but at this point I wanted to try and do something a bit more complex and interesting. And since WebRTC uses RTP, why not try to get RoQ and WebRTC to talk to each other for a little bit? Of course, just as I did in the previous blog post for WebTransport, I decided to use Janus for the purpose, and more precisely extend the QUIC plugin I had already started working on to also optionally support RoQ.
The approach I chose basically mirrored the client/server approach I used in the command line demos, but involving the Janus API and WebRTC PeerConnections instead, so that I could send or receive actual RTP packets accordingly, within the context of a WebRTC session. As such, I modified the plugin I created in the previous post to support two additional modes:
- creating a RoQ client associated to WebRTC sendonly PeerConnections, and
- creating a RoQ server associated to WebRTC recvonly PeerConnections.
This way, for WebRTC sendonly PeerConnections my plugin would be able to get incoming RTP from the user via WebRTC, and send it as a RoQ client via QUIC somewhere else. At the same time, for WebRTC recvonly PeerConnections my plugin would create a dedicated RoQ server, and relay RTP packets coming from there back to the user via WebRTC.
Of course, considering I currently don’t have any signalling around my RoQ stack, I had to make a few assumptions in order to make this work, mainly around how to use flow IDs. We mentioned how they logically convey a mapping to a specific RTP context, so the easiest way to do it for me was to map the flow ID to the m-line of the WebRTC SDP. To make a practical example, for a WebRTC PeerConnection negotiating an audio (first m-line) and a video (second m-line) stream, I’d use flow ID 0 for the audio stream (index of the first m-line) and 1 for video. This works nicely for the client part, where we control the mapping, but it’s a bit more tricky on the receive side, where we create the PeerConnection before we know what we’re receiving: this means that, again, I made assumptions in my demo (in particular on the flow IDs employed by the remote RoQ client), simply translating an incoming flow ID to the related m-line index hoping media will match. These assumptions for both scenarios are summarized in the diagrams below:
That said, for a simple demo these assumptions were not a big deal to me (especially considering I controlled all endpoints), so I created a couple of demo pages (one for the sending part, and one for the receive part) to test how this would work in practice.
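The flow ID convention described above (flow ID equals the index of the m-line in the SDP) is simple enough to sketch in Python. The snippet is only an illustration of the mapping, not the plugin code, and `flow_ids_from_sdp` is a hypothetical helper.

```python
from typing import Dict


def flow_ids_from_sdp(sdp: str) -> Dict[int, str]:
    """Map flow ID -> media type, using the m-line index as the flow ID."""
    mapping = {}
    index = 0
    for line in sdp.splitlines():
        if line.startswith("m="):
            # "m=audio 9 UDP/TLS/RTP/SAVPF 111" -> "audio"
            mapping[index] = line[2:].split(" ", 1)[0]
            index += 1
    return mapping


sdp = (
    "v=0\r\n"
    "m=audio 9 UDP/TLS/RTP/SAVPF 111\r\n"
    "m=video 9 UDP/TLS/RTP/SAVPF 96\r\n"
)
print(flow_ids_from_sdp(sdp))
```

On the receive side the same table is used in reverse: an incoming flow ID selects the m-line to relay the packet on, which only works if the remote RoQ client happens to follow the same convention.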
The animation below shows Janus acting as a RoQ server: the command line RoQ client is configured to connect to that server, and relay the RTP coming from GStreamer via RoQ. Janus, acting as a RoQ server, demultiplexes the RTP traffic, and sends it via WebRTC according to the flow ID/m-line index mapping. I obviously used the “Be quick or be dead” video by Iron Maiden as a source, because what’s more appropriate than that for QUIC demos?
This other animation, instead, shows the reverse scenario. In this case, Janus is acting as a RoQ client on behalf of a user publishing audio and video via WebRTC (me with my new ugly haircut). The command line RoQ server is the endpoint Janus connects to via QUIC, and as before it just displays what it’s receiving and demultiplexing accordingly.
What about having Janus act both as a RoQ client and a RoQ server? Let’s give it a try!
Tobia, the Meetecho CEO, tried to hijack my cool dance by showing his hand in the video, but luckily that didn’t interfere with the success of the demo: these were two independent WebRTC PeerConnections bridged via a RoQ connection. Pretty cool, huh?
IETF 120 Hackathon!
Now that I had something working, the next step was to figure out if I had done everything correctly. Testing against yourself is useful and works for ironing out obvious issues, but won’t help you figure out if you did something wrong in general, since the same broken assumptions would probably appear on both the client and server side. This is where the IETF Hackathon kicked in!
I actually got the idea of working on RoQ from a response I got from Lucas Pardue (main author of the Cloudflare Quiche QUIC library) on Twitter to my previous post:
I had already attended previous editions of IETF Hackathons, and enjoyed them a lot: it was during one of those sessions, for instance, that I managed to improve simulcast support in Janus a lot, since I was sitting at the same table as basically all major browser developers. When you’re sitting right next to people working on other projects, discussing and testing stuff is much easier and quicker, and allows for faster iterations until problems are figured out and solved. As such, I was indeed eager to make some interop tests with other RoQ (and, why not, MoQ) implementations at the IETF meeting in Vancouver.
The session was indeed very useful, as it helped me fix some problems I had not only in my RoQ stack, but also in the QUIC stack itself (again, I won’t mention my MoQ efforts here instead; I’ll leave that to a future post). More precisely, Mathis pointed me to his RoQ implementation I should use for testing, so I compiled it and started making some tests. In particular, I focused on two examples that Mathis had added to the repo:
- savetodisk, a RoQ server that saves the frames extracted from incoming RTP packets to an IVF file;
- playfromdisk, a RoQ client that reads an IVF file and packetizes each of its frames in RTP to send over QUIC.
As soon as I tried getting my RoQ client to talk to his savetodisk RoQ server, my client crashed horribly… The root cause was that my client was advertising support for the roq-10 ALPN, while his server used the roq-09 ALPN instead, and so sent me a CONNECTION_CLOSE frame right away. This abrupt disconnection was not handled properly in my QUIC stack, which is what caused the crash. This was an easy fix, but an important one! First success as far as I was concerned!
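The ALPN mismatch is easy to model. In the Python sketch below (an illustration only, with hypothetical names, not taken from either implementation) the server picks the first protocol it supports that the client also offered. When there is no overlap, RFC 9001 requires the connection to be closed with the TLS no_application_protocol alert, which maps to QUIC error code 0x0178.

```python
from typing import Optional, Sequence

# QUIC maps TLS alerts into the CRYPTO_ERROR range (0x0100 + alert value);
# no_application_protocol is TLS alert 120, hence 0x0178.
NO_APPLICATION_PROTOCOL = 0x0100 + 120


def select_alpn(client_offers: Sequence[str],
                server_supported: Sequence[str]) -> Optional[str]:
    """Return the first server-preferred protocol the client offered."""
    for protocol in server_supported:
        if protocol in client_offers:
            return protocol
    return None  # caller should close with NO_APPLICATION_PROTOCOL


for offers in (["roq-10"], ["roq-10", "roq-09"]):
    chosen = select_alpn(offers, ["roq-09"])
    if chosen is None:
        print(offers, "-> CONNECTION_CLOSE", hex(NO_APPLICATION_PROTOCOL))
    else:
        print(offers, "->", chosen)
```

The takeaway for a client is that a CONNECTION_CLOSE can legitimately arrive before the handshake completes, so that path needs to be handled as carefully as a regular shutdown.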
After getting savetodisk to negotiate the roq-10 ALPN, I managed to go further, and to have the server save data to the IVF file. Attempting to play the IVF file, though, didn’t work, meaning that for some reason what I sent had not been processed correctly. Looking at the code of the server, I noticed it only cared about the first flow that it saw, assuming it was video: considering my client was sending both audio and video as separate flow IDs (0 and 1 respectively), the problem was that savetodisk was saving the payloads of flow ID 0 (audio) to the IVF file, and ignoring the other flow ID entirely. As a result, the IVF video file assumed the content was VP8 video, but it was actually Opus audio, since that was the content associated to flow ID 0. Launching my RTP client so that it would only send video and not audio fixed that issue, and trying to play the IVF file did show something.
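One way a receiver can avoid this kind of mix-up is to route packets by flow ID explicitly, instead of latching onto the first flow it sees. The Python sketch below illustrates the idea with a hypothetical `FlowDemux` class; it is not how savetodisk or my own server are written.

```python
from typing import Callable, Dict


class FlowDemux:
    """Dispatch RTP packets to per-flow handlers, dropping unknown flows."""

    def __init__(self) -> None:
        self._handlers: Dict[int, Callable[[bytes], None]] = {}
        self.dropped = 0  # packets received on flows nobody registered for

    def register(self, flow_id: int, handler: Callable[[bytes], None]) -> None:
        self._handlers[flow_id] = handler

    def dispatch(self, flow_id: int, rtp_packet: bytes) -> bool:
        handler = self._handlers.get(flow_id)
        if handler is None:
            self.dropped += 1
            return False
        handler(rtp_packet)
        return True


video_packets = []
demux = FlowDemux()
demux.register(1, video_packets.append)  # only the video flow is saved
demux.dispatch(0, b"opus-packet")        # audio: ignored, not written to file
demux.dispatch(1, b"vp8-packet")         # video: handed to the writer
```

With a table like this, the video writer only ever sees the flow it was configured for, whatever order the flows show up in.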
At this point, though, a different problem occurred. Depending on which multiplexing mode my client used, the video would or would not have artifacts or be partially broken. Specifically, when I used DATAGRAM or a STREAM per packet, video would look fine, while when I used a single STREAM for the video flow, the resulting video would be partially broken. Discussing this with Mathis, it turned out to be a problem in his implementation, where he was incorrectly reusing the same buffer causing broken frames to be saved to file. After he fixed that, video looked fine using all multiplexing modes: eureka!
Now, during these tests I had noticed another weird behaviour: video seemed to be transferred correctly, but I saw a lot of decrypt errors on my side of things, suggesting for some reason I wasn’t receiving his ACKs. Considering that the delivery was working, I assumed some quirkiness in my code, but ignored it for the moment. I couldn’t ignore it much longer, though, because after the success interoperating with savetodisk, I started testing Mathis’ playfromdisk RoQ client with my own RoQ server, and the problem became more apparent. In a nutshell, his RoQ client could connect successfully to mine and send me packets for a few seconds, but then the same decrypt errors on my end would appear again, and I wouldn’t detect other packets, even though I could see from Wireshark that packets were still coming in, and properly encrypted. Looking at the traffic, I noticed that the problem started appearing around packet number 100, which seemed suspiciously round as a number, and most importantly systematic. Discussing this with Mathis, he remembered that quic-go, the QUIC library his implementation uses, apparently initiates a key update after the first 100 sent packets or so: I didn’t have support for key updates in my barebones QUIC stack yet, so that was definitely the issue. Looking at how key update works in QUIC, I figured out the changes I needed to implement, and as soon as I did that, the decrypt errors disappeared in both scenarios: eureka again!
At this point, I noticed a different issue: looking at the RTP packets I was receiving from playfromdisk in my RoQ server, I noticed that the timestamps seemed to be increasing by one, like sequence numbers, instead of growing in relation to the clock rate. Discussing this with Mathis, this turned out to be a different problem in his implementation, which he fixed promptly: as soon as he did that, timestamps were as they were supposed to be, and everything worked as expected.
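As a quick worked example of what "growing in relation to the clock rate" means: VP8 over RTP uses a 90 kHz clock, so with video at 30 frames per second (an assumed frame rate, just for illustration) each frame should advance the timestamp by 90000 / 30 = 3000, not by 1. A tiny Python sketch, with a hypothetical helper name:

```python
from typing import List


def rtp_timestamps(start: int, clock_rate: int, fps: int, frames: int) -> List[int]:
    """Timestamps for consecutive frames, wrapping at 32 bits like RTP does."""
    step = clock_rate // fps  # clock ticks per frame
    return [(start + i * step) & 0xFFFFFFFF for i in range(frames)]


print(rtp_timestamps(0, 90000, 30, 4))
```

All packets belonging to the same frame share one timestamp, while the sequence number is the field that grows by one per packet.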
Solving those issues on both ends (admittedly more on my end, which was more broken), we managed to have working interoperability between both our RoQ clients and RoQ servers, which IMHO was a great result, and a confirmation that the specification is clear and easy to implement (especially considering basically all of the bugs we found seemed to be related to other issues, like buffering or QUIC stack issues, rather than RoQ ones). These considerations were shared by Mathis in his presentation on RoQ at the AVTCORE session:
He summarized the different client/server interop tests we had done, showing how we managed to indeed interoperate with each other after solving the issues (mainly my lack of QUIC key update support, that I addressed during the hackathon). During this presentation, he also mentioned a different RoQ implementation by BBC I wasn’t aware of: looking at the repo, it looks like this is an implementation of RoQ as a GStreamer plugin, which seems a very interesting project I’ll definitely need to test and interoperate with.
What’s next?
As far as RoQ is concerned, not much, I think. I’ll probably tinker with the BBC RoQ implementation, and I’ll definitely need to keep up-to-date with the changes in the specification, but apart from this there’s not much more to do at the moment. What I’ll probably have to do is streamline how I use RoQ in my demos, especially the Janus integration: it could be nice to have RoQ as an alternative transport for Janus RTP forwarders, for instance, or to allow the Streaming plugin to optionally act as a RoQ server as well, to provide a more organized RoQ-to-WebRTC translation functionality. The command line applications could be improved as well, e.g., to do more than just show what they’re doing: a RoQ server saving packets to an MJR file could be cool, for instance, but there are many other things it could do (e.g., decoding and displaying media, forwarding it somewhere else, etc.). That’s something I’ll probably leave as an exercise to users, though, as soon as the QUIC library I’m working on is released as open source: you can’t expect ME to do all the work, after all!
Apart from that, I think I’ll start focusing more on MoQ (Media Over QUIC) instead. I mentioned how I have been working on both, recently, but even though I got to a very good point, more work will be needed on MoQ before I’ll be able to share something meaningful and useful.
That’s all, folks!
This was a fun post to write, and considering it was more practical, I hope you enjoyed it. I’m looking forward to your thoughts on the subject, and should you have a RoQ implementation you want me to interoperate with, please do let me know!