QUIC has been on my mind for quite some time. I remember mentioning it as part of my “what’s next?” slides at the first edition of JanusCon, 5 years ago, and of course it was very much a topic in my opening slides at the latest JanusCon as well. For one reason or another, though, I never actually started looking into it, despite it being something that, from a distance, I found quite interesting, especially because of all the work going on in the Media Over QUIC Working Group at the IETF.
That doesn’t mean we’ve been idle, though. A few colleagues at Meetecho, for instance, have been hard at work on QUIC for a few months already, within the context of an ESA (European Space Agency) project called QUICoS, whose aim is to study and possibly improve how QUIC works on satellite-based networks. That said, this is something I haven’t worked on personally, and so I felt the need to start filling the gap myself as well.
As such, about a month ago I decided to give it a go, in order to start studying the specification, the protocols, the implementations and so on, and maybe, why not, write down some code to go with it. I’ll talk about all this in more detail at the upcoming RTC.ON in Krakow (which you shouldn’t miss!), but in the meantime: is a month enough to master QUIC, or not nearly enough? And why try to implement a new thing, instead of just using an existing stack, you ask? Only one way to find out: keep reading!
What’s QUIC, and why does it matter?
QUIC is a general-purpose and connection-oriented transport protocol, a bit like TCP, if you will. It has some considerable differences from TCP, though: first of all, it’s built on top of UDP and typically implemented in user space, at the application level, so it’s not expected to be implemented as part of the kernel (as TCP and UDP are, for instance). The main reasons for these choices were to improve latency and bandwidth estimation, and to solve the head-of-line blocking that’s typical of TCP-based connections, all while allowing QUIC (and its implementations) to evolve much faster without the need for kernel updates. This is really a tiny summary of what QUIC is from a bird’s-eye view, and may also be partly incorrect: you may want to check some introductory content on QUIC itself for a more comprehensive overview of its strengths.
What’s relevant for this blog post is that, by itself, QUIC is a transport protocol that can carry pretty much whatever you want. It’s used as a foundation for HTTP/3, for instance. Why does it matter if we’re so focused on WebRTC, then? After all, WebRTC is not that generic at all: it has ways to transport more generic data (e.g., using data channels), but even in that case, it can hardly be called a “transport protocol”. It’s very much an ecosystem for real-time multimedia protocols and applications. Why should we care about QUIC, then?
Well, the short answer is that QUIC is slowly becoming the de-facto building block for the future protocols of the internet. We mentioned HTTP/3, DNS-over-QUIC is a reality too, and there’s a ton of different scenarios that people within the IETF are starting to look at QUIC for. One important requirement is figuring out how to use QUIC for transporting real-time media as well, and for different use cases, ranging from broadcasting to conferencing and beyond. Does this ring a bell, now?
Being UDP-based, QUIC does indeed seem to be a good fit. After all, WebRTC does the same. QUIC is also always encrypted (as is WebRTC), and comes with pluggable flow and congestion control algorithms (something that, in part, comes with WebRTC as well). That’s the basis of the previously mentioned Media Over QUIC (MOQ) effort: developing “a simple low-latency media delivery solution for ingest and distribution of media”. The main rationale for working on this new effort, rather than continuing the work on WebRTC, is apparently the greater flexibility and adaptability that MOQ would be able to provide, since it would map its features on QUIC functionality, and most importantly would not address ultra-low latency playback alone, but also “near-live and VOD playback” (more info in this interesting blog post on the IETF website).
As such, you’ll probably agree with me that, even though MOQ may not be really usable in production right now (the specification has not been frozen yet, and existing implementations are evolving over time), it will definitely become an important alternative or solution for real-time media in the future.
That said, MOQ is based on QUIC and, up until a month ago, I had no idea what QUIC even looked like… So any effort aimed at targeting MOQ necessarily needed to start from QUIC itself. How hard could that be?
Where do you even start!
Normally, when you want to study a new standard protocol or specification, you grab the related RFC and read it from the first word to the last. That’s unfortunately not as easy when it comes to QUIC! In fact, even in its most basic form, there are actually four different RFCs you need to read if you want to understand the whole thing:
- RFC 8999 introduces the so-called “Version-Independent Properties of QUIC”, that is all the concepts that will be common to all versions of QUIC that will come out in the future. It’s where concepts like connection IDs and long/short header packets are first introduced, along with the importance of variable-length properties (say goodbye to those nice structures with bit mappings to quickly parse packets, as we all liked to do with RTP and RTCP, for instance!).
- RFC 9000 is where QUIC version 1 itself is introduced, with all its concepts, packets, frames, error codes, states, etc. While quite a comprehensive document, it is not enough on its own if you want to implement QUIC, since there’s fundamental information that’s actually provided in other documents.
- RFC 9001 explains how you use TLS to secure QUIC. In QUIC, there’s no such thing as “no encryption”: everything is always encrypted, and even the parts that aren’t (the header) have properties that are protected/obfuscated. This underlines how important this document is, especially considering QUIC doesn’t use TLS as you may be used to (e.g., compared to how TLS-over-TCP works).
- Finally, RFC 9002 focuses on how to deal with loss detection, and how to implement congestion control on QUIC. While this is obviously very important if you want to use QUIC on anything that isn’t your own laptop alone, for the moment I haven’t delved much into the details of this specific document (I plan to do this at a later stage), since I wanted to understand the basics of the protocol first.
As you can understand, trying to study QUIC just from the RFCs is a titanic endeavour. It’s by no means impossible, but it’s very hard: information is spread across different documents that reference each other all the time, which means you never know where to start, how much is taken for granted, and so on and so forth.
Luckily, there are plenty of resources to dip your toes in QUIC as a complete newbie (as I arguably was). One in particular that I found incredibly helpful was a couple of blog posts that Andy Pearce wrote on the subject: specifically, Andy wanted to implement his own HTTP/3 stack, and so first wrote a very detailed blog post introducing how QUIC works, followed by another very useful blog post on HTTP/3 itself (and its mapping to QUIC).
The first post really is a treasure trove of information that I can’t recommend enough to people completely new to QUIC. While high level enough when it needs to be, it’s also very detailed any time it goes into specifics, e.g., in terms of how connections are established in the first place and how encryption/protection of all packets works (spoiler alert: it will give you a headache!), how ACKs work in QUIC, the basics of STREAM frames for sending and receiving data, and so on and so forth. I won’t delve too much into the details of this (again, I strongly recommend you read Andy’s post to have a better understanding of those parts). Suffice it to say that, after reading it, I felt confident enough to try and implement a very basic QUIC message parser, and so write my very first QUIC-related code ever.
Writing code already? Why?!
This is what some people may find surprising, or confusing: why implement something, instead of just continuing to study it, or maybe having a look at existing stacks? Well, the short answer is that, for good or bad, I’ve always found that I learn faster and better when I actually implement the thing: there are things you only understand once you get your hands dirty, and it’s the main reason why I know WebRTC so well, for instance. As for existing stacks, that’s indeed where I started, initially, but how much can you really learn about a protocol when you’re just watching things happen? Besides, studying and understanding code implementing a protocol is, more often than not, much more difficult than studying the protocol itself.
This is what eventually led me to start implementing the basic library I’m writing to learn the protocol. Whether or not this will ever see the light of day (should it become good enough, I’ll definitely release it as open source), it’s an invaluable tool for me to learn the protocol by actively working on it. Once I get a much better understanding of the protocol, I may or may not decide to spend more time on more mature implementations (e.g., to start writing cool stuff with them).
Until then, writing some code it is.
A basic parser: how hard can it be!
Well, spoiler alert, it’s not easy, especially if you’ve literally just learned the basics of how QUIC works and what it looks like. What I tried to do first was get access to some pcapng captures of QUIC traffic that I could look at, so that I could then save some of those packets locally and try parsing them manually myself. That’s when you start realizing that, when they say that everything in QUIC is always encrypted/protected, they’re not fooling around!
When you look at QUIC captures in Wireshark, you’ll notice that, under normal circumstances, Wireshark will show you the contents of the first exchanged packets, and then for the others it will only be able to display a handful of properties in the header and nothing else. This might give you the illusion that those first packets are unencrypted, but that’s not really the case: they still are encrypted, they’re just encrypted with information that the first packet itself contains (which allows Wireshark to decrypt it on its own). As a consequence, any attempt to parse even that initial packet going in blind will stop you in your tracks pretty soon, since there are parts of the unencrypted header that are obfuscated (like the packet number, for instance).
Figuring out how that part works did take me a bit more time than I hoped, but once you figure it out, it’s relatively straightforward to understand. In a nutshell, QUIC has different encryption levels, each with their own packet number space: you normally start with Initial (where a connection is attempted and a first TLS handshake starts), then move to Handshake (where the TLS handshake is continued up to its completion), and finally you move to Application, which is where you finally start exchanging data and using QUIC for what you actually need it for. There’s a different encryption level for early data (0-RTT), but that’s a topic for an entirely different blog post (especially considering I haven’t tackled it yet).
Now, for each of those encryption levels, the process is always the same:
- when you send packets, you encrypt the payload with the key of that level first (which also appends an authentication tag at the end), and then protect the header (obfuscating some properties, like the packet number) with the header protection key of that level;
- when you receive packets, you remove the header protection first, and then decrypt the payload (a rough sketch of the receive side is shown below).
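To give an idea of what that means in practice, this is more or less what removing header protection looks like for AES-based ciphers, as I understood it from RFC 9001: a 16-byte sample of the packet, taken 4 bytes after the start of the Packet Number field, is encrypted with the header protection key to produce a mask, which is then XOR-ed with some bits of the first byte and with the packet number itself. The snippet below is just a simplified sketch (the function name is made up, and error handling is minimal), not actual code from my library:

```c
#include <stdint.h>
#include <stddef.h>
#include <openssl/evp.h>

/* Remove QUIC header protection (sketch, AES-128 based ciphers only).
 * 'pkt' is the raw packet, 'pn_offset' is where the Packet Number field
 * starts, 'hp_key' is the 16-byte header protection key for this level.
 * Returns the packet number length (1-4), or -1 on error. */
static int quic_unprotect_header(uint8_t *pkt, size_t pkt_len,
		size_t pn_offset, const uint8_t *hp_key) {
	if(pkt == NULL || pn_offset + 4 + 16 > pkt_len)
		return -1;
	/* 1. Take a 16-byte sample, starting 4 bytes after the PN offset */
	const uint8_t *sample = pkt + pn_offset + 4;
	/* 2. mask = AES-ECB(hp_key, sample) */
	uint8_t mask[32];
	int masklen = 0;
	EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
	EVP_EncryptInit_ex(ctx, EVP_aes_128_ecb(), NULL, hp_key, NULL);
	EVP_EncryptUpdate(ctx, mask, &masklen, sample, 16);
	EVP_CIPHER_CTX_free(ctx);
	/* 3. Unmask the protected bits of the first byte: the lower 4 bits
	 * for long header packets, the lower 5 bits for short ones */
	int long_header = (pkt[0] & 0x80) != 0;
	pkt[0] ^= mask[0] & (long_header ? 0x0f : 0x1f);
	/* 4. Only now can we read the packet number length, encoded in the
	 * two least significant bits of the (unmasked) first byte */
	int pn_len = (pkt[0] & 0x03) + 1;
	/* 5. Unmask the packet number itself */
	for(int i = 0; i < pn_len; i++)
		pkt[pn_offset + i] ^= mask[1 + i];
	return pn_len;
}
```

One of those little details that took me a while to appreciate is that you only learn how long the packet number actually is after unmasking the first byte, which is why the sample is always taken as if the packet number were 4 bytes long.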
The main complexity lies in how you obtain the cryptographic information that you’ll need at each level. This is particularly interesting (and tricky) for the Initial stage, since a TLS session hasn’t even been established yet: we’re just starting! As I anticipated before, the way this works is that you basically derive some initial keys from information that is contained in the first packet, specifically the Destination Connection ID. The exact process is documented in RFC 9001 (so I’ll omit it for brevity), but suffice it to say that this will give you access to some initial secrets that can be used for the task: from those secrets, exactly as you’d do at the other levels (which we’ll talk about later), you can then expand/extract the encryption key, IV and header protection key using HKDF. A detailed and annotated description of the process is also available on this very useful website.
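For reference, this is roughly what the derivation looks like on the client side, as I understood it from RFC 9001: a version-specific salt and the Destination Connection ID are fed to HKDF-Extract, and a series of HKDF-Expand-Label invocations (the same ones you’ll use at the other levels, just starting from a different secret) then produce the key, IV and header protection key. The hkdf_extract()/hkdf_expand_label() helpers below are made-up names (in my case, thin wrappers around OpenSSL’s HKDF support), so treat this as pseudocode more than anything:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helpers wrapping HKDF (e.g., via OpenSSL's EVP_PKEY HKDF):
 * hkdf_extract() returns a 32-byte PRK (SHA-256), hkdf_expand_label()
 * implements the TLS 1.3 HKDF-Expand-Label construct ("tls13 " prefix,
 * empty context), as required by RFC 9001 */
int hkdf_extract(const uint8_t *salt, size_t salt_len,
	const uint8_t *ikm, size_t ikm_len, uint8_t *out);
int hkdf_expand_label(const uint8_t *secret, size_t secret_len,
	const char *label, uint8_t *out, size_t out_len);

/* Salt defined in RFC 9001 for QUIC version 1 Initial packets */
static const uint8_t initial_salt_v1[20] = {
	0x38, 0x76, 0x2c, 0xf7, 0xf5, 0x59, 0x34, 0xb3, 0x4d, 0x17,
	0x9a, 0xe6, 0xa4, 0xc8, 0x0c, 0xad, 0xcc, 0xbb, 0x7f, 0x0a
};

/* Derive the client-side Initial key material from the Destination
 * Connection ID of the very first packet (the server side is the same,
 * just with the "server in" label instead of "client in") */
static void derive_client_initial_keys(const uint8_t *dcid, size_t dcid_len,
		uint8_t key[16], uint8_t iv[12], uint8_t hp[16]) {
	uint8_t initial_secret[32], client_secret[32];
	/* initial_secret = HKDF-Extract(initial_salt, client_dst_connection_id) */
	hkdf_extract(initial_salt_v1, sizeof(initial_salt_v1),
		dcid, dcid_len, initial_secret);
	/* client_initial_secret = HKDF-Expand-Label(initial_secret, "client in", "", 32) */
	hkdf_expand_label(initial_secret, sizeof(initial_secret),
		"client in", client_secret, sizeof(client_secret));
	/* Initial packets always use AES-128-GCM and SHA-256, so sizes are fixed */
	hkdf_expand_label(client_secret, sizeof(client_secret), "quic key", key, 16);
	hkdf_expand_label(client_secret, sizeof(client_secret), "quic iv", iv, 12);
	hkdf_expand_label(client_secret, sizeof(client_secret), "quic hp", hp, 16);
}
```

The key and IV are then used with AES-128-GCM to decrypt the payload of Initial packets, while the hp key is what feeds the header protection step we sketched before.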
Following these steps, I was able to properly parse the first messages I saved to disk from those QUIC captures. In fact, once protection and encryption were dealt with, parsing QUIC frames was mostly a matter of checking the format of the different frames, and implementing support for parsing them.
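Most of that parsing revolves around QUIC’s variable-length integers: the two most significant bits of the first byte tell you how many bytes (1, 2, 4 or 8) the value occupies, and pretty much every field in headers and frames is encoded that way. A minimal decoder (again just a sketch, with a made-up name) looks more or less like this:

```c
#include <stdint.h>
#include <stddef.h>

/* Decode a QUIC variable-length integer (RFC 9000, section 16): the two
 * most significant bits of the first byte encode the length (00=1, 01=2,
 * 10=4, 11=8 bytes), the remaining bits are the value in network order.
 * Returns the number of bytes consumed, or 0 if the buffer is too short. */
static size_t quic_read_varint(const uint8_t *buf, size_t len, uint64_t *value) {
	if(buf == NULL || len < 1)
		return 0;
	size_t vlen = (size_t)1 << (buf[0] >> 6);	/* 1, 2, 4 or 8 */
	if(len < vlen)
		return 0;
	uint64_t v = buf[0] & 0x3f;	/* Drop the two length bits */
	for(size_t i = 1; i < vlen; i++)
		v = (v << 8) | buf[i];
	*value = v;
	return vlen;
}
```

Once you have that, parsing a frame mostly boils down to reading a varint frame type and then interpreting the fields that follow according to RFC 9000.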
That said, after those first Initial packets, any attempt to parse the following packets would fail. In fact, as anticipated, the TLS exchange happens at higher encryption levels (Handshake first, and then Application), which means that those initial secrets we derived are not good anymore. This meant starting to implement an actual, although tiny, UDP server I could use to start exchanging packets with an actual QUIC endpoint, in order to try and perform an actual handshake and figure out the next steps.
First roadblock: what encryption library?
All WebRTC developers are more or less familiar with the different libraries we can take advantage of for encryption. We know, for instance, that although SRTP is used for actually encrypting RTP packets, the cryptographic information to initialize SRTP is exchanged via DTLS first, and that typically requires a library of its own (unless you’re crazy enough to implement your own flavour). Common choices include OpenSSL, BoringSSL, LibreSSL and others.
Now, you may think that, considering OpenSSL is without a doubt the most widely deployed library for the job, that’s what you should go for to implement encryption for QUIC too. Unfortunately, that’s not that easy… In fact, I briefly anticipated before how TLS on QUIC does not work as it does on, for instance, TCP (or how DTLS does on UDP, for that matter). We’ve mentioned how QUIC has some parts that are encrypted, some in the clear, and how the whole thing is protected (and partly obfuscated). This means that QUIC actually “owns” the TLS handshake, acting as a transport for it, and then uses the exchanged secrets to extract the info it will use to encrypt stuff itself. As a consequence, “normal” usage of a TLS session as provided by OpenSSL would not do the trick, as TLS would not be used to encrypt data (just as we don’t use DTLS to encrypt RTP, but only to exchange the keys).
As such, while some tricks exist, with some key limitations (like the one described here), a different API would make it much easier to use the encryption features for QUIC. This is exactly what BoringSSL, the OpenSSL fork Chrome maintains (and that it uses for WebRTC purposes as well), did, adding new APIs specifically for that purpose. I couldn’t find documentation for those APIs that wasn’t just code, but the post I linked to before explains them in quite some detail.
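To give an idea of the shape of that API (which quictls implements as well): instead of letting TLS read and write records on a socket, the QUIC stack registers a set of callbacks, feeds incoming CRYPTO data to the TLS stack itself, and gets notified whenever new traffic secrets become available or new handshake data needs to be sent out. Something along these lines, where the callback bodies are obviously just placeholders:

```c
#include <openssl/ssl.h>

/* Callbacks invoked by the TLS stack: they notify us of new traffic
 * secrets for an encryption level, and hand us handshake data that we
 * have to packetize ourselves in CRYPTO frames at that level */
static int set_read_secret(SSL *ssl, enum ssl_encryption_level_t level,
		const SSL_CIPHER *cipher, const uint8_t *secret, size_t secret_len) {
	/* Derive key/IV/hp for incoming packets at this level (HKDF-Expand-Label) */
	return 1;
}
static int set_write_secret(SSL *ssl, enum ssl_encryption_level_t level,
		const SSL_CIPHER *cipher, const uint8_t *secret, size_t secret_len) {
	/* Same as above, but for outgoing packets */
	return 1;
}
static int add_handshake_data(SSL *ssl, enum ssl_encryption_level_t level,
		const uint8_t *data, size_t len) {
	/* Queue this data to be sent in CRYPTO frames at the right level */
	return 1;
}
static int flush_flight(SSL *ssl) {
	/* Time to actually send what we queued */
	return 1;
}
static int send_alert(SSL *ssl, enum ssl_encryption_level_t level, uint8_t alert) {
	/* Map the TLS alert to a QUIC CONNECTION_CLOSE */
	return 1;
}

static const SSL_QUIC_METHOD quic_method = {
	set_read_secret, set_write_secret,
	add_handshake_data, flush_flight, send_alert
};

/* Rough usage example: in reality SSL_set_quic_method() is called once at
 * setup, while SSL_provide_quic_data() and SSL_do_handshake() are driven
 * by incoming CRYPTO frame data */
void handshake_example(SSL *ssl, const uint8_t *crypto_data, size_t crypto_len) {
	SSL_set_quic_method(ssl, &quic_method);
	SSL_provide_quic_data(ssl, SSL_quic_read_level(ssl), crypto_data, crypto_len);
	SSL_do_handshake(ssl);
}
```

The key point is that TLS never touches the wire here: it just tells you which secrets to use at which encryption level, and hands you the handshake bytes to packetize in CRYPTO frames yourself.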
Now you would have expected OpenSSL to adopt these APIs as well, or something similar. Well, that didn’t happen, unfortunately… someone attempted to contribute support for the BoringSSL QUIC API to OpenSSL, but that was rejected because of a weird decision from OpenSSL itself: they’re planning to write their own QUIC stack instead! If you’re puzzled by this enigmatic decision, you’re not the only one (you can read what Daniel Stenberg, the main author of curl, has to say about that), and in fact this quickly led other libraries to adopt that API instead, as it would give much more flexibility to QUIC developers (especially when integrating cryptographic functionality in an external QUIC library). Even Akamai and Microsoft joined efforts and forked OpenSSL to create quictls instead.
Long story short (I know, too late!), for my effort I initially started using BoringSSL for the job, and then moved to quictls since it made it much easier to integrate my library in other projects (spoiler alert: like Janus!).
Starting to write a basic QUIC server
I won’t bother you with all the details, but now that I had all the basic requirements down, I could finally start writing a basic UDP server I could use to exchange QUIC messages with a client. I initially conceived this as a standalone application, with the idea of transforming that to a library further along the process. This meant not only addressing the handshake part (including the negotiation of QUIC transport parameters), but also crafting different types of QUIC packets, containing different QUIC frames.
As for a client to test this with, I tried a few different options, but the easiest to work with was without a doubt aioquic. In fact, it comes with a couple of client and server implementations that are very easy to get up and running: specifically, I used the DNS-over-QUIC client example as a way to try and get a QUIC connection working in my dumb server.
Properly implementing encryption and protection at the different levels took longer than expected (most of the considerations I provided in the previous sections actually came from these experimentations), but eventually I got them working, and aioquic proved quite helpful here for a couple of different reasons:
- The test client supports saving the exchanged secrets to a keylog file, which allows Wireshark to unprotect and decrypt QUIC traffic as you’re capturing it, making it an invaluable tool to ensure you’re doing things correctly. This was so useful that I added the same feature to my own code (a small sketch of how that looks with OpenSSL-like libraries follows this list).
- The test client can also generate qlog files for all the sessions it handles, i.e., detailed structured logs with info on what happened during a QUIC session. This was particularly helpful to figure out why, for instance, aioquic would reject or ignore some of my messages (was it an encryption error? a wrong tag? a missing field? etc.). This is something that I’d like to implement myself as well, sooner or later, but that can wait for the moment.
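Implementing the keylog part, by the way, turned out to be quite simple with OpenSSL-like libraries, since they expose a callback that is invoked with each line to log, already in the format Wireshark expects (the file handling below is just a trivial example, of course):

```c
#include <stdio.h>
#include <openssl/ssl.h>

/* Append each TLS secret line to a keylog file, in the same NSS-style
 * format you'd get via SSLKEYLOGFILE (and that Wireshark understands) */
static void keylog_cb(const SSL *ssl, const char *line) {
	(void)ssl;
	FILE *f = fopen("/tmp/quic-keylog.txt", "a");
	if(f == NULL)
		return;
	fprintf(f, "%s\n", line);
	fclose(f);
}

/* When creating the SSL context, we just register the callback */
void enable_keylog(SSL_CTX *ctx) {
	SSL_CTX_set_keylog_callback(ctx, keylog_cb);
}
```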
Those two features allowed me to address the mistakes I made when dealing with the TLS handshake process. We’ve seen in the previous section how some initial secrets are derived from the first Initial packet: as you can imagine, the secrets to use at the other levels are not derived that way anymore, but come from the TLS handshake instead. Specifically, thanks to the BoringSSL QUIC API and its callbacks (the set_read_secret/set_write_secret we saw before), as an application we’re notified about the secrets to use (both for incoming and outgoing messages), so that the key, IV and header protection key can be extracted. The right hashing function is discovered as well, since, while it’s always SHA-256 when deriving the initial secrets, the cipher negotiated via TLS for the higher levels may dictate a different one.
That said, without boring you with more details than that, I eventually got to the point where I could finalize the TLS handshake and start exchanging data. The aioquic DoQ client then simply sends a STREAM containing a DNS request, and expects a DNS response back. To keep things simple and experiment with sending data as well, I simply had my implementation act as an echo server, meaning that whatever it got via a STREAM, it would send back on the same STREAM. Surprisingly, after a few tweaks, eureka, this worked!
Now I could try tackling the next step, that is WebTransport support…
Wait, what’s WebTransport now?
In a nutshell, as the documentation says, the “WebTransport API provides a modern update to WebSockets, transmitting data between client and server using HTTP/3”. Unlike WebSockets, since it’s based on HTTP/3 (and so QUIC), it can implement both reliable and unreliable transports (a bit like you can do the same with WebRTC data channels, if you will). Reliable transport is achieved using QUIC streams, while unreliable transport uses UDP-like datagrams.
You may wonder what this has to do with anything, and the answer is quite simple: WebTransport will actually be at the foundation of a great deal of web functionality in the future, including (but not limited to) MOQ itself. As such, this means that if I wanted to start experimenting with MOQ, I’d have to go through the steps of implementing WebTransport first.
When you have a look at the WebTransport draft (of which browsers currently only implement version -04, though), it explains a few important things:
- As we knew, it’s based on HTTP/3, which is only used to “upgrade” the QUIC session to WebTransport, though (a bit like you can upgrade an HTTP session to WebSockets). Specifically, the CONNECT request is used for the task, with specific headers to signal WebTransport support (a sketch of what that request looks like follows the list).
- Once a WebTransport session has been established, you can exchange data unreliably via QUIC datagrams, or reliably via QUIC streams.
- Bidirectional streams can only be created by the client; unidirectional streams can be created by both parties.
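Just to make this a bit more concrete, this is more or less what the extended CONNECT request looks like at the HTTP/3 level; the authority, path and origin values are just examples, and the exact SETTINGS codepoints needed to advertise support depend on the draft version, so double check everything against the spec your client actually implements:

```c
/* The HTTP/3 headers (QPACK-encoded on the wire) of the extended CONNECT
 * request that establishes a WebTransport session; values are examples */
struct h3_header {
	const char *name;
	const char *value;
};
static const struct h3_header webtransport_connect[] = {
	{ ":method",    "CONNECT" },
	{ ":protocol",  "webtransport" },
	{ ":scheme",    "https" },
	{ ":authority", "127.0.0.1:4433" },
	{ ":path",      "/" },
	{ "origin",     "https://webtransport.example" }	/* added by browsers */
};
/* Support for extended CONNECT, HTTP/3 datagrams and WebTransport sessions
 * is advertised beforehand via SETTINGS on the control streams (codepoints
 * depend on the draft version); the server then accepts the session by
 * replying to the CONNECT with a 2xx response on the same bidirectional
 * stream. */
```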
Exchanging data is not very different from how it works in “pure” QUIC (you just need to pay attention to some custom control codes used in WebTransport to signal what a stream will be for), which means that the tricky part is indeed the connection establishment, and so the (albeit very limited) HTTP/3 support needed to make that happen.
To study it in a more hands-on way, I once more relied on aioquic, and more specifically on its HTTP/3 server example, which supports WebTransport too. Besides studying the specification, I used this WebTransport client demo to connect to my local aioquic instance (after setting the --ignore-certificate-errors-spki-list option accordingly to make Chrome happy) and look at the exchanges in practice, which was hugely helpful.
Having a look at HTTP/3
HTTP/3, as the name suggests, is the third major version of the HTTP protocol, and specifically the one that specifies how to map HTTP semantics to the QUIC protocol. I won’t go much into detail on this (especially considering it’s only partly relevant to this blog post), but in a nutshell it revisits the already major refactoring that had been done for HTTP/2 (e.g., in terms of header compression, multiplexing of requests, server push, etc.) and maps that or similar functionality to what QUIC provides out of the box.
Header compression was a big feature of HTTP/2, and is mandatory in HTTP/3. Specifically, a compression format called QPACK is used for the job. While HTTP/3 as a whole is a very complex specification, we only need it to handle a CONNECT request if WebTransport is all we want, which means that we can summarise the initialization in the following steps:
- We set up QUIC as usual, negotiating the h3 ALPN.
- Client and server create three unidirectional streams each: a Control Stream (that will be used to exchange SETTINGS) and two more streams to exchange QPACK encoder and decoder data (a sketch of how these unidirectional streams are told apart follows the list).
- The client creates a bidirectional stream to send an HTTP/3 request, that will be the CONNECT we need to establish the WebTransport session.
- The server replies to that request on the same bidirectional stream with an HTTP/3 success response.
- At this point, data can be exchanged.
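One detail worth mentioning is that, when a new unidirectional stream comes in, its purpose is signalled by a variable-length integer at the very beginning of the stream data: that’s how the control and QPACK streams are told apart, and how WebTransport adds its own stream type on top. A rough classification sketch, reusing the varint helper from the earlier sketch (the WebTransport codepoint is taken from the draft, so double check it against the version you’re targeting):

```c
#include <stdint.h>
#include <stddef.h>

/* Unidirectional stream types we care about: the first three come from the
 * HTTP/3 and QPACK RFCs, the last one is the WebTransport codepoint from
 * the draft implemented by browsers (double check it) */
#define H3_STREAM_CONTROL        0x00
#define H3_STREAM_QPACK_ENCODER  0x02
#define H3_STREAM_QPACK_DECODER  0x03
#define H3_STREAM_WEBTRANSPORT   0x54

/* See the earlier sketch for quic_read_varint() */
size_t quic_read_varint(const uint8_t *buf, size_t len, uint64_t *value);

/* Peek at the beginning of a new unidirectional stream to decide whether
 * its data is for us (HTTP/3 plumbing) or for the application (WebTransport) */
static void handle_new_uni_stream(const uint8_t *data, size_t len) {
	uint64_t type = 0;
	size_t offset = quic_read_varint(data, len, &type);
	if(offset == 0)
		return;	/* We need more data */
	switch(type) {
		case H3_STREAM_CONTROL:
			/* Expect a SETTINGS frame first */
			break;
		case H3_STREAM_QPACK_ENCODER:
		case H3_STREAM_QPACK_DECODER:
			/* Feed the rest to the QPACK library (nghttp3, in my case) */
			break;
		case H3_STREAM_WEBTRANSPORT:
			/* WebTransport unidirectional stream: the next varint is the
			 * session ID, everything after that is application data */
			break;
		default:
			/* Unknown stream types can simply be ignored */
			break;
	}
}
```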
Now, all this looks a bit convoluted, but it’s not that hard to understand. What’s really a nightmare (to me) is QPACK, which after a month I still don’t fully understand. I initially planned to implement QPACK support myself, but after having a look at the specification, Huffman encoding theory and so on, I quickly figured out that, if I wanted something done quickly, implementing it myself was out of the question. As such, to speed up the prototyping process, I decided to leverage the QPACK support in nghttp3 instead, as it comes with some relatively easy-to-use QPACK encoder and decoder implementations. In the future I may change this, but for the time being there really isn’t any need for that.
Habemus WebTransport!
After taking care of QPACK, a few iterations were all I needed to finally get a WebTransport session up and running! Datagrams were what I implemented first, since they were the easiest to address, and then I moved to getting proper streams support working, since all I had done so far was simply echoing back STREAM data like I had done for the DoQ client demo. This included implementing a proper receive buffer that would be aware of offsets and gaps (a sketch of the idea follows below), and at the same time knowing when to process the stream data ourselves (e.g., when handling an HTTP/3 control or QPACK encoder/decoder stream) and when to pass the data to the application itself (WebTransport data).
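The idea behind that buffer is pretty simple: STREAM frames carry an offset and can arrive out of order, so incoming chunks are queued sorted by offset, and data is only delivered to the upper layers once a contiguous run is available starting at the current read offset. A stripped down sketch of the concept (GLib-based, since that’s what I’m used to from Janus; names are made up, and duplicate/overlap handling is left out for brevity):

```c
#include <glib.h>
#include <stdint.h>
#include <stddef.h>

/* A chunk of stream data received at a specific offset */
typedef struct stream_chunk {
	uint64_t offset;
	uint8_t *data;
	size_t length;
} stream_chunk;

/* Per-stream receive state: queued chunks, sorted by offset, plus the
 * offset up to which we already delivered data to the upper layers */
typedef struct stream_buffer {
	GList *chunks;
	uint64_t delivered;
} stream_buffer;

static gint chunk_compare(gconstpointer a, gconstpointer b) {
	const stream_chunk *c1 = a, *c2 = b;
	return (c1->offset < c2->offset) ? -1 : (c1->offset > c2->offset);
}

/* Queue an incoming STREAM chunk (assuming no duplicates/overlaps) */
static void stream_buffer_add(stream_buffer *sb, uint64_t offset,
		const uint8_t *data, size_t length) {
	stream_chunk *chunk = g_malloc(sizeof(stream_chunk));
	chunk->offset = offset;
	chunk->data = g_memdup2(data, length);	/* GLib 2.68+ */
	chunk->length = length;
	sb->chunks = g_list_insert_sorted(sb->chunks, chunk, chunk_compare);
}

/* Deliver as much contiguous data as we have, starting from 'delivered' */
static void stream_buffer_flush(stream_buffer *sb,
		void (*deliver)(const uint8_t *data, size_t length)) {
	while(sb->chunks != NULL) {
		stream_chunk *chunk = (stream_chunk *)sb->chunks->data;
		if(chunk->offset > sb->delivered)
			break;	/* There's a gap, wait for more data */
		deliver(chunk->data, chunk->length);
		if(chunk->offset + chunk->length > sb->delivered)
			sb->delivered = chunk->offset + chunk->length;
		sb->chunks = g_list_delete_link(sb->chunks, sb->chunks);
		g_free(chunk->data);
		g_free(chunk);
	}
}
```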
For this initial test, I still instructed my QUIC (now WebTransport) server to echo back the data it received, meaning it would echo back whatever would be sent by clients as a datagram or a stream (of course only sending the message back on bidirectional streams, and not unidirectional ones). The end result is the demo you can see below:
When I first got it working, as you can guess, I was pretty excited! I had some first building blocks to try and do something cool. The first thing I wanted to try and play with was an integration of this library (because it had stopped being a standalone application, by then) in Janus, and more precisely in a new plugin where I could experiment with some basic WebRTC-to-QUIC communication.
Bringing Janus (and WebRTC) in the picture
A dream demo would of course be something like WebRTC audio/video being translated to MOQ and vice versa, but let’s face it, I’ve just started tinkering with QUIC, so let’s take some baby steps first.
The simplest idea that came to mind was bridging data channels, and so something like the following:
- A WebRTC user creates a data channel-only PeerConnection (no need to play with audio and video yet) with this new plugin.
- The plugin creates a QUIC server associated with that PeerConnection.
- Data that the WebRTC user sends via data channels is relayed as a DATAGRAM on the associated QUIC connection, and incoming DATAGRAM data is relayed back via data channels (a rough sketch of that bridging follows the list).
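In terms of code, the resulting bridge is conceptually as simple as two callbacks, one per direction. The sketch below is intentionally generic (none of these names match the actual Janus plugin API or my library, they’re just placeholders to show the idea):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical glue between the plugin and the QUIC library: every name
 * and signature here is simplified/made up on purpose */
typedef struct quic_connection quic_connection;
quic_connection *lookup_quic_connection(void *session);
void *lookup_plugin_session(quic_connection *qc);
void quic_send_datagram(quic_connection *qc, const uint8_t *data, size_t len);
void relay_via_datachannel(void *session, const char *buf, size_t len);

/* Data channel message from the WebRTC user: forward it as a DATAGRAM
 * on the QUIC/WebTransport connection associated with that PeerConnection */
void plugin_incoming_data(void *session, const char *buf, size_t len) {
	quic_connection *qc = lookup_quic_connection(session);
	if(qc != NULL)
		quic_send_datagram(qc, (const uint8_t *)buf, len);
}

/* DATAGRAM from the WebTransport client: relay it back to the WebRTC user
 * via data channels */
void quic_datagram_incoming(quic_connection *qc, const uint8_t *data, size_t len) {
	void *session = lookup_plugin_session(qc);
	if(session != NULL)
		relay_via_datachannel(session, (const char *)data, len);
}
```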
As a consequence, this meant implementing a new and very simple plugin that only needed to be able to negotiate (and relay) data channels, while at the same time acting as a controlling application for my barebones QUIC/WebTransport library.
This is actually when I switched from BoringSSL (which is the library I had started from, for the cryptographic functionality of my library) to quictls. In fact, whatever dependency I’d use for my library would have needed to be the same as the one used in Janus, or issues would have occurred (OpenSSL doesn’t know anything about the BoringSSL QUIC API, as we pointed out). I initially thought that using BoringSSL would be the “safest” choice, since we support it in Janus already, but in practice, due to the need to have a shared object version of BoringSSL for my library, this only caused problems, mostly due to stuff that BoringSSL apparently stripped out of its OpenSSL parent. Switching to quictls ended up being much easier, actually: although the BoringSSL QUIC API it implements is a bit more outdated, it is a fork of OpenSSL (same features) with only the addition of those APIs. As such, all I needed to do was make the configure process in Janus aware of quictls, so that I could use it for the Janus core too, and I was ready to go.
Implementing the plugin was quite easy as well. As anticipated, I only needed a basic data channel negotiation as far as WebRTC was concerned, and even in terms of plugin APIs, I needed something very simple. I ended up envisaging a single request that a WebRTC user of the plugin can use to specify which port to bind to on the QUIC side, plus the paths to the certificate and key to use, and I was done: I’d then create the QUIC server when negotiating a new PeerConnection, and, using the library methods and callbacks, “bridge” the messages back and forth between the two different protocols. The end result is what you can see below, where I used a basic demo page for the WebRTC part, and the same WebTransport demo we saw before for the QUIC part, in order to showcase them interacting with each other:
This is a very basic demo, but you can see how, when we first try to connect to the WebTransport server at that address, it doesn’t work as there isn’t any. As soon as the PeerConnection is created, a WebTransport server is created as well and associated with that, and we can see data channel and WebTransport exchanging messages via Janus and my simple library. Simple, and yet effective!
Cool! What’s next?
Well, a LOT, actually…
While the QUIC related code I wrote “works”, I did cut a lot of corners to get there as soon as possible. I implemented the building and parsing of pretty much every QUIC message out there, but I’m not properly using or reacting to them as I should, or at least not always. The way I implemented ACKs is sketchy at best, for instance, and I most definitely don’t support any retransmission yet. The whole loss detection and congestion control part (the famous RFC 9002) I haven’t even started tackling yet, and error management is pretty much non-existent as well. Last but not least, in an ideal world I’d like my library to be able to act as a QUIC client as well (even though I’d mostly use it as a server, obviously), but I haven’t written a single line of code to account for that part, if we exclude the parts that servers and clients share.
And that’s the QUIC stuff alone. For the rest, there’s a lot to work on as well, like breaking the “single thread” paradigm I’ve been using so far to keep things simple: that would not only be important for performance, but for reliability and stability as well. I’ll need to start implementing proper thread safety (which I’ve pretty much disregarded in my simple tests), better event loop management, mutexes, reference counting, etc., so pretty much everything that made Janus the robust and efficient tool we all know.
This is why I pointed out, at the beginning, that I’m not ruling out the possibility that this code will never see the light of day. Depending on my next steps, I may decide to focus my efforts on existing and more tested implementations instead: time will tell! That said, I definitely like the idea of having our own QUIC implementation out there, and, in time, working on getting it up there with those excellent alternatives that already exist.
Of course, another important next step is starting to have a look at MOQ as well. It’s still an ongoing specification with a lot of moving parts, and yet there’s a lot of movement around it already: there’s an excellent source of information (and demos) on this website, for instance, and big companies like Meta are experimenting with the protocol a lot too (both on the server and client sides). Being able to come up with a MOQ implementation, no matter how incomplete or rudimentary, would be an excellent way to start experimenting with this new technology, possibly even within the context of IETF Hackathons, where most of the work on this (and other QUIC efforts) often takes place.
That’s all, folks!
I hope I didn’t bore you to death with this very long, and possibly too technical (or maybe not technical enough?), blog post. Despite the featured image taken straight from a horror movie, my first month with QUIC was anything but horrific: very challenging at times, and sometimes a frustrating experience (especially in those initial weeks trying to figure out the encryption part!), but very exciting at the same time, if we consider the opportunities it opens up. QUIC is definitely not an easy protocol to get into, but when you do, it really is a lot of fun, and I plan to keep on having fun in the months that follow as well!
As anticipated at the beginning, I’ll actually talk about all this in more detail (and probably with some important updates) at the upcoming RTC.ON event, which will take place in Krakow in September: I hope to see you all there!