It’s been a few weeks since we went quickly through most of the things cooking in Janus. That was a quick overview of a lot of different changes and enhancements, but I felt it was time to dig a bit deeper into one topic in particular: the effort we’ve spent on getting Janus to support AV1 and H.265.
A new codec war?
If you’re as old as I am (in WebRTC years, at least), you’ll probably remember the “First Codec War”: the one that had VP8 and H.264 face each other in a battle to the death. Eventually, neither died, though… both actually ended up becoming MTI (mandatory-to-implement), which means all WebRTC endpoints are supposed to implement both of them. And, believe it or not, it really happened! Pretty sure very few people expected Apple to actually add support for VP8, for instance (I for sure didn’t), and yet here we are.
That said, while VP8 and H.264 are “friends” now, they’re both “old” codecs. When it comes to new functionality, better efficiency, support for features like SVC and things like that, most people are actually looking at newer codecs instead, and the new boys in town are mainly two: AV1 and H.265 (or HEVC, if you will). And just as with VP8 and H.264, one was conceived to be royalty free (AV1), and the other is a hellish patent-encumbered nightmare (H.265). Wonder which one I prefer?
Needless to say, both codecs became of interest to WebRTC implementers all around the world. I won’t delve into more details about the codecs themselves, on their main features or their status: you can find some excellent information already in a few blog posts, like this overview by Dr. Alex, or this analysis by Tsahi. The focus of this blog post will be on their integration in our Janus WebRTC Server, and what we had to change in order to get them working properly.
Codecs and Janus
Janus is a general purpose WebRTC server, which means that different plugins may actually have different requirements when it comes to different codecs. Some may just need the ability to negotiate a codec and/or route packets, while others may actually need to go deeper than that.
Most of the Janus plugins don’t transcode media packets, but just route them around depending on the logic each plugin implements: in that case, Janus mostly needs to be aware of the codec, which roughly translates to being able to properly negotiate the codec in the SDP and allow the codec to be used in a PeerConnection. Everything else is typically up to plugins themselves (e.g., the AudioBridge actually decoding audio packets, or generating audio packets of its own).
At the same time, though, even when just relaying media a bit more awareness may be needed when it comes to video codecs. More specifically, any time we work on adding support for a new codec we take care of two additional steps:
- some way of detecting a keyframe, by looking at the RTP packets that go through the server;
- a way to extract the media frames from the RTP packets (RTP depacketization).
The former is mostly needed to know when some actions can be performed. If the media source of a PeerConnection is changing, for instance (e.g., because simulcast is in use), we may need to know when a keyframe has arrived, so that we can actually perform the switch, and thus allow the recipient to properly decode the new stream. The latter is only needed for recordings: in Janus, recordings are basically a structured dump of RTP packets, which we can then “post-process” via an RTP depacketization that, without touching the media at all, can transform the recorded packets to a playable media file (e.g., an mp4 file).
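To make the two steps above a bit more concrete: both of them start from the RTP payload, so the first thing any such code has to do is skip the RTP header. What follows is a minimal Python sketch of that step, based on the RTP header layout in RFC 3550. It is not the actual Janus code (Janus is written in C), and the function name `parse_rtp` and the returned dictionary are made up for illustration.

```python
import struct


def parse_rtp(packet: bytes) -> dict:
    """Split an RTP packet (RFC 3550) into a few header fields and its payload."""
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count          # skip the fixed header and any CSRCs
    if b0 & 0x10:                         # X bit: a header extension follows
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words       # extension length is in 32-bit words
    end = len(packet)
    if b0 & 0x20:                         # P bit: last byte counts the padding
        end -= packet[-1]
    if offset > end:
        raise ValueError("malformed packet")
    return {
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[offset:end],    # what codec-specific code looks at
    }
```

The `payload` bytes are what a codec-specific keyframe check or depacketizer would then inspect.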
In a nutshell, both features are actually quite important for a few reasons, and both have their own challenges. This is exactly what we had to deal with when we decided to add support for both AV1 and H.265, which we did in a dedicated branch. While neither codec is actually widespread right now, or even really usable at the moment, we know they’ll be in the future, so we felt it made sense to anticipate that effort and be ready for when they are.
As anticipated, AV1 has been conceived from the outset as a royalty free codec. If you want to learn more about the codec and who’s behind it (big names, as you’ll see!) you can have a look at the Get Started page on the AOMedia website.
What’s relevant for this blog post is how AV1 relates to WebRTC. Like all codecs, in order to be used within the context of WebRTC, AV1 first of all needed a way to be negotiated in the SDP and a set of RTP packetization rules. Normally, this is a process that happens within the context of IETF activities (in the MMUSIC and AVTCORE working groups, specifically). The process has been a bit different for AV1 so far: while a draft that specs both exists, it’s currently being worked on in a separate context; you can follow the development on the related GitHub repo. Still, that info was enough to cover the first requirement, that is making the Janus SDP stack aware of AV1 as a codec, when offered or negotiated.
Of course, a set of rules alone isn’t enough to get AV1 working in WebRTC. You also need those rules to be actually implemented, e.g., in a browser or another endpoint. Luckily for us all, our pals at CoSMo Software, who were involved in the AV1-RTP design process from the get-go, shared a lot of information and code they worked on to help make that happen. Specifically, they shared a preliminary integration of AV1 in libwebrtc, and made that available both in a custom Chromium build, and in the libwebrtc peerconnection examples. Unfortunately, the Chromium build didn’t work for me: luckily enough, though, the precompiled examples did, and as I’ll explain they proved invaluable in testing and prototyping my AV1 related efforts, so kudos to the folks at CoSMo for making that available!
More precisely, I took advantage of the precompiled peerconnection_server examples libwebrtc provides out of the box: in this custom build, these examples were modified by CoSMo to offer AV1 as well. The way these examples work is quite simple: the server acts as a basic signalling server (using HTTP long polling for bidirectional communication), and one or more clients can register to the server; then at any time one of the clients can decide to call another, which will have them exchange SDP and candidates through the server. The end result is a native window on both sides presenting the local and remote videos using a picture-in-picture layout.
Of course, what I wanted to test was getting one of these clients to set up an AV1 PeerConnection with Janus, rather than with another precompiled client: this meant that, in order to leverage these implementations, I had to reverse-engineer the signalling they implemented, to get them to talk to Janus instead. What I ended up doing was creating a “fake” peerconnection_server implementation in node.js: this “fake” server would allow real peerconnection_client instances to register, but would at the same time always only advertise the presence of a single user in the list, also fake, which would be mapped to an EchoTest session in Janus. This way, any time one of the clients decided to start a call with the “fake” user, a Janus API session could be established: SDPs and candidates would be exchanged back and forth, thus allowing the real peerconnection_client instance to create a PeerConnection in Janus.
This allowed me to do a few important things: first of all, it allowed me to confirm the integration was working (as you can see in the screenshot below), but most importantly to also perform an unencrypted capture of a real AV1 session on RTP for further testing.
Having access to actual AV1/RTP packets was fundamental for the next step, that is figuring out how to detect keyframes and, most importantly, convert the captured packets to a playable media file. The latter in particular was quite challenging, due to some ambiguities in the AV1/RTP specification and the cryptic (to me) nature of AV1 Sequence Headers (which is where some metadata, like the resolution of the bitstream, is stored), but eventually I got it working. The screenshot below, for instance, shows the mplayer application playing a media file that was converted using the Janus post-processor tool out of an AV1 recording.
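As an illustration of what keyframe detection on AV1/RTP can look like, here is a minimal Python sketch. It is not the code in the Janus branch, and it relies on my reading of the AV1 RTP payload specification: each payload starts with a one-byte aggregation header (bits Z, Y, W and N), where N marks the first packet of a new coded video sequence, and a Sequence Header is an OBU of type 1. The function names are hypothetical, and since the specification was still evolving, the exact layout is best treated as an assumption to verify against the current draft.

```python
from typing import Optional, Tuple

OBU_SEQUENCE_HEADER = 1  # OBU type value for a Sequence Header in AV1


def parse_aggregation_header(payload: bytes) -> dict:
    """Decode the first payload byte: |Z|Y| W |N|-|-|-| (assumed layout)."""
    if not payload:
        raise ValueError("empty payload")
    b = payload[0]
    return {
        "z": bool(b & 0x80),   # first OBU element continues a previous one
        "y": bool(b & 0x40),   # last OBU element continues in the next packet
        "w": (b >> 4) & 0x03,  # number of OBU elements (0 = all length-prefixed)
        "n": bool(b & 0x08),   # packet starts a new coded video sequence
    }


def read_leb128(data: bytes, offset: int) -> Tuple[int, int]:
    """Read a LEB128 value; return (value, offset just past it)."""
    value = 0
    for i in range(8):
        if offset + i >= len(data):
            raise ValueError("truncated leb128")
        byte = data[offset + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, offset + i + 1
    raise ValueError("leb128 too long")


def first_obu_type(payload: bytes) -> Optional[int]:
    """Return the type of the first OBU in the payload, if it starts here."""
    header = parse_aggregation_header(payload)
    if header["z"]:
        return None            # payload starts mid-OBU, no OBU header to read
    offset = 1
    if header["w"] != 1:       # with W=1 the only element has no length prefix
        _, offset = read_leb128(payload, offset)
    if offset >= len(payload):
        return None
    return (payload[offset] >> 3) & 0x0F


def av1_looks_like_keyframe(payload: bytes) -> bool:
    """Heuristic: N bit set, or the packet opens with a Sequence Header."""
    return (parse_aggregation_header(payload)["n"]
            or first_obu_type(payload) == OBU_SEQUENCE_HEADER)
```

Checking both the N bit and the first OBU type is a deliberately defensive choice for a sketch: an implementation that trusts the sender could rely on the N bit alone.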
Eureka! AV1 in Janus, with support for recordings too! Does this mean we’re done? Well, maybe for now, but actually there will be more work to do in the future… More precisely, the AV1/RTP specification documents a new companion RTP extension that should make it easier for SFUs and other intermediaries to deal with AV1 streams: the existing implementations don’t support it yet, but they will, which means we’ll have to take care of that part too. Besides, one of the key reasons AV1 will matter in WebRTC is its native SVC support: that will be quite a challenge of its own (one that we partly dealt with for VP9, some time ago), and it needs to be implemented on the client side as well before we have to worry about it.
Yeah, yeah, but what about H.265?
If you love patents, you may be interested to learn more about the state of H.265 in WebRTC as well. There’s actually been quite a lot of movement on that too.
As explained in this blog post by Dr. Alex, it looks like Intel actually already had a partial implementation, which was turned into a better integration in libwebrtc as part of a joint effort from CoSMo, Apple and Intel itself during a recent IETF Hackathon (god, I miss those!). As a result, H.265 is now available in WebRTC on Safari, as long as you use the TP (Technology Preview) and you enable it manually in the experimental settings.
When it comes to how that works, it’s worth mentioning that first of all, unlike AV1, the standardization process for the RTP packetization of H.265 actually took place within the IETF, and there’s an RFC available (RFC 7798) documenting it. Besides, negotiating a basic H.265 session in the SDP is actually quite similar to how it works for H.264, so that part was easy enough to take care of.
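To show how close the two negotiations look on the wire, here is a small Python sketch that scans an SDP for the payload types mapped to a given codec. RFC 7798 registers the encoding name H265 with a 90000 Hz clock, which is the same shape as the H264 rtpmap line. The sample SDP, the payload type numbers and the function name `payload_types_for` are all made up for illustration, and this is not how the Janus SDP stack is implemented.

```python
from typing import List

# Hypothetical SDP fragment: payload types 96 and 98 are arbitrary examples.
EXAMPLE_SDP = "\r\n".join([
    "m=video 9 UDP/TLS/RTP/SAVPF 96 98",
    "a=rtpmap:96 H264/90000",
    "a=rtpmap:98 H265/90000",
    "a=rtcp-fb:98 nack pli",
])


def payload_types_for(sdp: str, codec: str) -> List[int]:
    """Return the payload types whose a=rtpmap line names the given codec."""
    found = []
    prefix = "a=rtpmap:"
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # Format: a=rtpmap:<payload type> <encoding name>/<clock rate>[/<params>]
        pt, _, encoding = line[len(prefix):].partition(" ")
        name = encoding.split("/")[0]
        if name.lower() == codec.lower():   # encoding names are case-insensitive
            found.append(int(pt))
    return found
```

With the sample above, `payload_types_for(EXAMPLE_SDP, "H265")` picks out the payload type on the H265 rtpmap line, exactly the way the same call with `"H264"` would for H.264.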
That said, despite an implementation being available, I unfortunately could not test it myself. I’m a Linux user and don’t have access to a macOS machine, and being stuck at home due to the global pandemic we’re all living in right now, I couldn’t get the one we usually use for testing at the office either. Luckily for me, the Janus community came to the rescue, and did some testing for me: someone confirmed the EchoTest worked as expected when using Safari and forcing H.265, and also provided me with an unencrypted capture of the exchanged RTP packets I could dig into.
As with the AV1 sample, this capture was quite precious for experimenting with an example of an actual H.265 RTP packetization, and eventually allowed me to take care of post-processing H.265 recordings as well: I won’t share a snapshot of such a recording because it’s not a video I took myself and the user may disagree, but you can trust me if I say it works! And if you don’t, just compile the branch yourself and give it a try.
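For the curious, here is a rough Python sketch of how the H.265 RTP packetization in RFC 7798 can be inspected to guess whether a packet belongs to a keyframe. It handles single NAL unit packets, aggregation packets (type 48) and fragmentation units (type 49), and it assumes no DONL fields are present. The keyframe heuristic (parameter sets or an IRAP picture) and all function names are my own assumptions for illustration, not what the Janus branch does.

```python
from typing import List

AP_TYPE, FU_TYPE = 48, 49                    # RFC 7798 packet types
VPS, SPS, PPS = 32, 33, 34                   # parameter set NAL unit types
IRAP_TYPES = range(16, 24)                   # BLA, IDR, CRA and reserved IRAP


def nal_type(header_byte: int) -> int:
    """The type is the 6 bits after the forbidden-zero bit: |F| Type |..."""
    return (header_byte >> 1) & 0x3F


def h265_nal_types(payload: bytes) -> List[int]:
    """List the NAL unit types that start in this RTP payload (no DONL)."""
    if len(payload) < 2:
        return []
    kind = nal_type(payload[0])
    if kind == AP_TYPE:                      # several NAL units, each with a size
        types, offset = [], 2
        while offset + 2 <= len(payload):
            size = int.from_bytes(payload[offset:offset + 2], "big")
            offset += 2
            if size < 2 or offset + size > len(payload):
                break                        # malformed aggregation unit
            types.append(nal_type(payload[offset]))
            offset += size
        return types
    if kind == FU_TYPE:                      # one NAL unit split across packets
        if len(payload) < 3:
            return []
        fu_header = payload[2]               # |S|E| FuType |
        return [fu_header & 0x3F] if fu_header & 0x80 else []
    return [kind]                            # single NAL unit packet


def h265_looks_like_keyframe(payload: bytes) -> bool:
    """Heuristic: parameter sets or an IRAP slice start in this packet."""
    return any(t in (VPS, SPS, PPS) or t in IRAP_TYPES
               for t in h265_nal_types(payload))
```

Only the first fragment of a fragmentation unit (S bit set) is reported, since that is the one carrying the start of the NAL unit.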
In case you’re curious (I know you’re not, but I have to share this), exactly as with the AV1 Sequence Header, and the H.264 SPS before that, most of the time was actually spent in trying to make sense of the H.265 SPS, which is where information like the video width and height is stored (which we need for filling in the metadata of the playable files we generate). Not sure who came up with such an abstruse format, or why they keep on using it in different shapes, but I’m pretty sure that whoever that is must hate humanity quite badly…
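Part of what makes the H.264 and H.265 SPS so painful is that many fields, including the picture width and height in H.265, are not stored in fixed-size slots: they are variable-length unsigned Exp-Golomb codes, written ue(v) in the specs, that sit after a number of other fields. The Python sketch below shows just that decoding step. The `BitReader` class is a made-up helper, it is not a full SPS parser, and it assumes the emulation prevention bytes have already been stripped from the NAL unit.

```python
class BitReader:
    """Read bits MSB-first from a byte string (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0                          # position in bits

    def read_bit(self) -> int:
        if self.pos >= len(self.data) * 8:
            raise EOFError("ran out of bits")
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, count: int) -> int:
        value = 0
        for _ in range(count):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self) -> int:
        """Unsigned Exp-Golomb: N zero bits, a 1 bit, then N more bits."""
        zeros = 0
        while self.read_bit() == 0:
            zeros += 1
            if zeros > 32:
                raise ValueError("invalid Exp-Golomb code")
        return (1 << zeros) - 1 + self.read_bits(zeros)


# The bit string 1 010 011 00100 packs the values 0, 1, 2 and 3.
reader = BitReader(bytes([0xA6, 0x40]))
decoded = [reader.read_ue() for _ in range(4)]
```

Since every field has a variable length, there is no way to jump straight to the resolution: everything before it has to be decoded (or at least skipped correctly) first, which is where most of the pain comes from.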
Considering the current status of both codecs in the existing client side implementations, the Janus branch is ready, so I’m planning to merge it soon: we also made sure most, if not all, plugins are aware of them, which means you could potentially use both of them in the VideoRoom SFU, for instance. That said, they very likely won’t be used much for quite some time (if only because of their limited availability in browsers out there), but it will be useful to have them supported for when people start playing with them. Of course, the plan is to keep track of how they evolve: we mentioned the custom extension for AV1, or SVC, and both are more than worth having ready as soon as they’re available.
Until that happens, give it a go, and happy hacking!