Audio redundancy in Janus via RED

June 4, 2021 Lorenzo Miniero

A few months ago, I read a very interesting blog post by Philipp Hancke on how Chrome had added experimental support for audio redundancy via RED, and how it actually helped improve audio quality a lot in lossy networks. That post was followed a few weeks later by another interesting article, written by Boris Grozev from the Jitsi team, on how RED could be used within an SFU context. Both write-ups intrigued me quite a bit, but I only recently found the time to look into this myself. This post tries to summarize my efforts and my experience with it, with an emphasis on where Janus currently stands and where we may be going next. You can find the code this blog post refers to in this pull request on the Janus repo.

Wait, didn’t you talk of RED already?

I did! This dates back to a couple of years ago, when I wrote a blog post talking about WebRTC integration of SIP-based real-time text services. In that context, RED is often used to provide redundancy on real-time text streams, and so it made sense for us to implement it in the SIP plugin for the purpose. The same wasn’t needed on the WebRTC side, instead, as text would flow on data channels, and would as such be transmitted reliably automatically, if configured accordingly.

In this case, instead, we’re talking about how to use RED for audio streams, and so on the WebRTC side as well. As such, even though there were parts of the code I could re-use (e.g., the RED payload parser), the integration effort was considerably different, as it required updates both to the Janus core and to the initial set of plugins that needed support for the feature.

What’s RED?

RED was standardized more than 20 years ago in RFC 2198, and was initially conceived as a simple RTP payload format to implement Redundant Audio Data. Since then, due to its simplicity and flexibility, it ended up being used for way more than just audio: we mentioned text streams already, but it’s also used in Chrome’s ULPFEC implementation for video, for instance. That said, audio is what it was born for, and so it made sense to see if it could still be of help in the WebRTC ecosystem as well.

As a format, it’s quite trivial: it basically allows you to packetize, within the payload of a single RTP packet, multiple frames that you’d normally send in different RTP packets instead. It does so by listing a series of block headers at the beginning, each containing some relevant information (the payload type and, for redundant blocks, the timestamp offset and block length), which are then followed by the actual frame payloads in sequence. The following diagram from the RFC shows this visually:

    0                   1                    2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3  4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X| CC=0  |M|      PT     |   sequence number of primary  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              timestamp  of primary encoding                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1| block PT=7  |  timestamp offset         |   block length    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| block PT=5  |                                               |
   +-+-+-+-+-+-+-+-+                                               +
   |                                                               |
   +                LPC encoded redundant data (PT=7)              +
   |                (14 bytes)                                     |
   +                                               +---------------+
   |                                               |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                DVI4 encoded primary data (PT=5)               |
   +                (84 bytes, not to scale)                       +
   /                                                               /
   +                                                               +
   |                                                               |
   +                                                               +
   |                                                               |
   +                                               +---------------+
   |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

In the diagram above, two frames encoded with different codecs are packed together. The first block in the list is an older frame that was already transmitted previously (the timestamp offset in the related block header tells how old it is, relative to the current packet), while the second and last block contains the current, primary frame instead. Notice how the last block header is a single byte: since the primary data always comes last, its length can be inferred from what’s left of the payload.
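To make the header layout more concrete, here’s a minimal C sketch of how a receiver might walk the block headers of a RED payload. It assumes the RTP header has already been stripped, and `red_block` and `red_parse` are hypothetical names made up for this example, not the actual Janus API. Each non-final header is 4 bytes (F bit, 7-bit payload type, 14-bit timestamp offset, 10-bit block length), while the final one is a single byte (F=0 and the payload type).

```c
#include <stddef.h>
#include <stdint.h>

#define RED_MAX_BLOCKS 8

/* One block found in a RED payload (RFC 2198) */
typedef struct {
	uint8_t pt;          /* payload type of the block (e.g., 111 for Opus) */
	uint16_t ts_offset;  /* how old the block is, in RTP timestamp units */
	const uint8_t *data; /* points inside the RED payload */
	size_t len;          /* length of the block data */
} red_block;

/* Parses a RED payload (RTP header excluded) into blocks. Returns the number
 * of blocks, or -1 if malformed. The last block is always the primary one. */
int red_parse(const uint8_t *buf, size_t len, red_block *blocks, int max_blocks) {
	size_t pos = 0, data_len = 0;
	int count = 0;
	/* First pass: walk the block headers */
	while(1) {
		if(pos >= len || count >= max_blocks)
			return -1;
		int follow = buf[pos] & 0x80;
		blocks[count].pt = buf[pos] & 0x7F;
		if(!follow) {
			/* Final header: one byte, the primary data is whatever is left */
			blocks[count].ts_offset = 0;
			pos++;
			count++;
			break;
		}
		if(pos + 4 > len)
			return -1;
		blocks[count].ts_offset = (uint16_t)((buf[pos+1] << 6) | (buf[pos+2] >> 2));
		blocks[count].len = ((size_t)(buf[pos+2] & 0x03) << 8) | buf[pos+3];
		data_len += blocks[count].len;
		pos += 4;
		count++;
	}
	/* The redundant blocks must fit in what follows the headers */
	if(data_len > len - pos)
		return -1;
	/* Second pass: assign data pointers, the primary gets the remainder */
	for(int i = 0; i < count; i++) {
		blocks[i].data = buf + pos;
		if(i == count - 1)
			blocks[i].len = len - pos;
		pos += blocks[i].len;
	}
	return count;
}
```

Checking that all redundant block lengths fit in the packet before handing out any pointer is what keeps a malformed payload from causing out-of-bounds reads.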

This makes it easy to see why RED is a good fit for providing redundancy on audio streams within WebRTC sessions as well. It allows an endpoint to always attach one or more previously sent audio frames to the one it’s sending now, which lets the recipient recover frames that were lost along the way: if packet 5 was missed, its payload will be available as part of the RED payload of packet 6 as well; if RED packets contain redundancy for more packets, packet 7 and/or later ones may contain the payload of packet 5 too. As such, RED does provide a powerful mechanism to implement redundancy on audio streams, which may be particularly helpful on lossy networks. The previously mentioned blog post on webrtcHacks contains interesting metrics in that regard, specifically when using different distances (i.e., roughly speaking, how many redundant frames you add to each RED payload) and in different conditions.

At the same time, it’s also clear that this comes at a price, in this case additional overhead, and so increased requirements on bandwidth. In particularly problematic networks, the additional overhead introduced by RED could further contribute to the congestion, or “steal” precious bandwidth that could be used for video streams instead. This is why, in this experimental integration Chrome currently provides, the mechanism is negotiated but not enabled by default, thus leaving it to implementors to decide whether to activate it or not. To test this, at the time of writing you do need to launch Chrome with a specific field trial enabled:

--force-fieldtrials=WebRTC-Audio-Red-For-Opus/Enabled/

Notice that you can apparently configure how many blocks to use in RED by changing the Enabled above to something like, e.g., Enabled-3, but I didn’t test this. While Chrome developers are considering sticking to a distance of 1 (a single redundant packet added to each RED payload), these tests were made with Chrome using a distance of 2.
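To get a feeling for the overhead, the size of a RED payload can be derived from its block lengths: every redundant block adds a 4-byte header, while the primary one only adds a single byte. The helper below is a hypothetical back-of-the-envelope function, not something taken from Chrome or Janus.

```c
#include <stddef.h>

/* Size of a RED payload (RTP header excluded) carrying `count` blocks, the
 * last of which is the primary one: 4 header bytes per redundant block,
 * 1 header byte for the primary block, plus all the block data. */
size_t red_payload_size(const size_t *block_lens, size_t count) {
	if(count == 0)
		return 0;
	size_t size = (count - 1) * 4 + 1;
	for(size_t i = 0; i < count; i++)
		size += block_lens[i];
	return size;
}
```

With a distance of 2 and three Opus frames of, say, 63, 73 and 78 bytes, the payload grows from 78 bytes of plain Opus to 223 bytes of RED (156 bytes with a distance of 1), so almost three times as much, before even counting the RTP header.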

Integrating RED in Janus

When it came to figuring out how to implement support for RED in Janus, it soon became obvious that this would require changes to different parts of the code, both in the core and in plugins.

First of all, we needed to add a way to negotiate RED. This did require some changes to our SDP management, as we normally assume that, once both peers agree on a codec, that’s the payload type that will be used: with RED involved it’s not that easy, as while you do negotiate RED as a “codec” in SDP, it needs to be associated with another actual codec, since RED simply “packetizes” frames that need to be properly encoded anyway. For instance, SDP attributes like this:

m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2

would mean “let’s use Opus to talk”, and use payload type 111 for the purpose, while SDP attributes like this:

m=audio 9 UDP/TLS/RTP/SAVPF 96 111
a=rtpmap:96 red/48000/2
a=rtpmap:111 opus/48000/2

would mean something like “let’s use RED to add redundancy to our Opus conversation”, where RTP packets on the wire would have payload type 96 (RED), while blocks in the RED RTP payload would have payload type 111 instead. This is actually quite important, because endpoints may send both RED and Opus packets at the same time, and the recipient would need to know how to process them; at the same time, knowing the Opus payload type would allow the recipient to figure out what to do with a payload they just extracted from a RED payload.
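As an illustration of what the SDP parser needs to figure out, here’s a deliberately naive C sketch that scans the `a=rtpmap` lines of the audio section for the RED and Opus payload types. `find_audio_pts` and `audio_pts` are hypothetical names, and the sketch ignores `a=fmtp` lines and the m-line format list, both of which a real SDP parser would need to take into account.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Payload types found in the audio section of an SDP (-1 if missing) */
typedef struct {
	int red_pt;
	int opus_pt;
} audio_pts;

/* Naive scan of the a=rtpmap lines in the audio m-line section */
audio_pts find_audio_pts(const char *sdp) {
	audio_pts pts = { -1, -1 };
	int in_audio = 0;
	const char *line = sdp;
	while(line != NULL && *line != '\0') {
		int pt = -1;
		char codec[32] = { 0 };
		if(strncmp(line, "m=", 2) == 0) {
			/* A new m-line: only look at rtpmaps in the audio section */
			in_audio = (strncmp(line, "m=audio", 7) == 0);
		} else if(in_audio && sscanf(line, "a=rtpmap:%d %31[^/\r\n]", &pt, codec) == 2) {
			/* Encoding names are case-insensitive */
			if(strcasecmp(codec, "red") == 0)
				pts.red_pt = pt;
			else if(strcasecmp(codec, "opus") == 0)
				pts.opus_pt = pt;
		}
		/* Move to the beginning of the next line */
		line = strchr(line, '\n');
		if(line != NULL)
			line++;
	}
	return pts;
}
```

Restricting the scan to the audio section matters, since browsers may also negotiate a separate `red/90000` for video, with its own payload type.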

As such, we had to update our SDP parser to be sure we’d recognize when RED was negotiated and still keep track of which codec a session ended up on, and update our SDP utilities accordingly to allow applications to offer, or accept, RED when programmatically dealing with negotiations.

Once that was done, I started testing this in the EchoTest plugin, which is always my first stop when it comes to testing new features: it provides an easy way to test new negotiation capabilities (the plugin does use the SDP utilities for shaping what the answer should look like), and it’s easy to add configurable API options to tweak features. In this case, I added an option to accept and enable RED, if offered, to check if everything would work as expected and have a look at the traffic. This seemed to do the trick, and I got to see what Chrome sent when RED was in use, which looked like this:

The first packet was Opus (payload type 111);
The second packet was RED (payload type 96), but “smallish”;
Other packets were RED as well, but all larger and similar in size.

This all made sense, since it basically meant the first audio packet Chrome sent was just a regular Opus frame (no previous frame to add as redundant information), the second would be RED but only contain redundancy for a single additional frame (the only one sent so far), and all others would be RED but each containing redundancy for the two previous frames instead.

Note: at the time of writing, it looks like Chrome will actually change this behaviour in the near future. The changes and the motivations are provided here, and it’s not clear yet if this will also impact the way Chrome will handle incoming packets (e.g., if it will ignore “raw” Opus packets coming in as well).

Taking advantage of this first integration, I extended the Janus recording functionality to be aware of RED as well. This was important, since our janus-pp-rec recordings postprocessor needs to be able to traverse the recorded RTP packets and extract the media frames to the target container. For audio streams, this is normally relatively easy, because it just means extracting the payload from RTP packets and saving them to, e.g., an .opus file; with RED in the picture, though, this meant unpacking RED first, and possibly using the redundant information to fill in the gaps (in case packet loss affected a recording). This allowed me to work on RED parsing first, which (in very verbose mode) led to something like this:

Counted 554 frame packets
Writing .opus file header
  -- Enabling RED decapsulation (pt=96)
  [1] f=1, pt=111, tsoff=960, blen=63
  [2] f=0, pt=111, tsoff=0, blen=TBD.
  >> [1] plen=63
  >> [2] plen=73
  [1] f=1, pt=111, tsoff=1920, blen=63
  [2] f=1, pt=111, tsoff=960, blen=73
  [3] f=0, pt=111, tsoff=0, blen=TBD.
  >> [1] plen=63
  >> [2] plen=73
  >> [3] plen=78
  [1] f=1, pt=111, tsoff=1920, blen=73
  [2] f=1, pt=111, tsoff=960, blen=78
  [3] f=0, pt=111, tsoff=0, blen=TBD.
  >> [1] plen=73
  >> [2] plen=78
  >> [3] plen=73

This confirmed how, apart from the first RTP packet (which was Opus already and so didn’t need unpacking), the first RED packet only contained a single redundant frame (whose timestamp offset was -960), while the others had two (with offsets of -1920 and -960, oldest first). Incidentally, just by looking at the block lengths, it’s easy to see how each packet contains the primary data of the two packets that were sent just before. Anyway, this allowed me to extend the postprocessor code to take RED into account, if present, and to basically unpack RED before passing the actual frames to the output container (and, as anticipated, to use redundant frames in case the original ones were missing).
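The gap-filling logic can be sketched as follows, assuming the blocks have already been extracted from each recorded packet (plain Opus packets simply yield a single primary block) and assuming fixed 20ms Opus frames at 48kHz, i.e., 960 timestamp units per frame. `rec_block` and `red_fill_frames` are hypothetical names, not janus-pp-rec code, and frame data is replaced by an integer id to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_TICKS 960  /* assumption: 20ms Opus frames at 48kHz */

/* A block extracted from a recorded RTP packet (plain Opus or RED) */
typedef struct {
	uint32_t rtp_ts;     /* timestamp of the RTP packet it came from */
	uint16_t ts_offset;  /* RED timestamp offset (0 for primary data) */
	int primary;         /* 1 if primary data, 0 if redundant */
	int id;              /* stand-in for the actual frame data */
} rec_block;

/* Fills out[] (one slot per frame, starting at base_ts) with the id of the
 * block to write for each frame: primary data wins, redundant data is only
 * used to fill gaps, and -1 means the frame could not be recovered. */
void red_fill_frames(const rec_block *blocks, size_t count, uint32_t base_ts,
		int *out, size_t slots) {
	for(size_t i = 0; i < slots; i++)
		out[i] = -1;
	/* Pass 1 places primary blocks, pass 0 fills the holes left */
	for(int pass = 1; pass >= 0; pass--) {
		for(size_t i = 0; i < count; i++) {
			if(blocks[i].primary != pass)
				continue;
			/* Unsigned arithmetic copes with timestamp wraparound */
			uint32_t ts = blocks[i].rtp_ts - blocks[i].ts_offset;
			uint32_t delta = ts - base_ts;
			if(delta % FRAME_TICKS != 0)
				continue;
			size_t slot = delta / FRAME_TICKS;
			if(slot < slots && out[slot] == -1)
				out[slot] = blocks[i].id;
		}
	}
}
```

Placing primary data first ensures a redundant copy never replaces the original frame when both made it into the recording.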

The next step, at this stage, was an integration in the Record&Play plugin. In fact, that plugin is an easy way to test new features before integrating them in the SFU plugin itself, that is the VideoRoom. It makes it easy to create a PeerConnection to record a contribution (which can be seen as a WebRTC publisher, in SFU terms), and create a different PeerConnection to then play back previously recorded messages (which in this case acts as a WebRTC subscriber instead). What I wanted to assess was what would be needed on the “subscriber” side, in case a recording was performed using RED: in that case, in fact, it’s easy to re-stream the recording with redundant audio for subscribers that do support RED, but not so if subscribers don’t. As such, I took advantage of the newly integrated SDP/RED negotiation features in the Janus core to detect remote support for the feature, in order to decide what to do with the media: for viewers that didn’t support RED (e.g., Firefox trying to watch a RED-enabled recording), I re-used the RED parsing code to extract the primary data, and only sent that with the Opus payload type instead. This did the trick, and worked as expected!
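Stripping RED for a RED-unaware viewer boils down to locating the primary block, copying it after the original RTP header and rewriting the payload type. The following C sketch skips CSRCs and header extensions when locating the payload and drops any padding; `red_to_primary` is a hypothetical helper, not the actual Record&Play code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Converts an RTP packet carrying RED into a plain RTP packet carrying only
 * the primary block, with the payload type of that block. out must be at
 * least len bytes. Returns the new length, or -1 if malformed. */
long red_to_primary(const uint8_t *pkt, size_t len, uint8_t *out) {
	if(len < 12 || (pkt[0] >> 6) != 2)
		return -1;
	/* RTP header length: fixed part, CSRCs and (optional) extension */
	size_t hlen = 12 + 4 * (size_t)(pkt[0] & 0x0F);
	if(pkt[0] & 0x10) {
		if(len < hlen + 4)
			return -1;
		hlen += 4 + 4 * (((size_t)pkt[hlen+2] << 8) | pkt[hlen+3]);
	}
	if(hlen > len)
		return -1;
	size_t end = len;
	if(pkt[0] & 0x20) {
		/* Padding: the last byte says how many bytes to ignore */
		if(pkt[len-1] > len - hlen)
			return -1;
		end -= pkt[len-1];
	}
	if(hlen >= end)
		return -1;
	/* Walk the redundant block headers, adding up their lengths */
	size_t pos = hlen, skip = 0;
	while(pkt[pos] & 0x80) {
		if(end - pos <= 4)
			return -1;
		skip += ((size_t)(pkt[pos+2] & 0x03) << 8) | pkt[pos+3];
		pos += 4;
	}
	uint8_t primary_pt = pkt[pos] & 0x7F;
	pos++;  /* the final header is a single byte */
	if(skip > end - pos)
		return -1;
	pos += skip;  /* the primary data comes after all redundant data */
	size_t plen = end - pos;
	memcpy(out, pkt, hlen);
	out[0] = (uint8_t)(out[0] & ~0x20);                /* no padding anymore */
	out[1] = (uint8_t)((pkt[1] & 0x80) | primary_pt);  /* keep the marker bit */
	memcpy(out + hlen, pkt + pos, plen);
	return (long)(hlen + plen);
}
```

Since the primary block always has a timestamp offset of 0, the RTP timestamp and sequence number can be left untouched: only the payload type and the payload itself change.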

At this stage, I had almost all the foundation I needed, except for one piece: the ability to create a RED payload out of a stream of non-RED audio frames. In fact, as you can guess (and as explained in Boris’ aforementioned article on webrtcHacks), when used in an SFU scenario there are different ways participants may be affected by RED:

RED participant sending media to RED subscribers is easy: the packets can be relayed as-is, since subscribers will be able to consume and use the redundant information added by publishers;
non-RED participant sending media to non-RED subscribers is trivial too: that’s basically how Janus operates right now, so nothing to change there!
RED participant sending media to a non-RED subscriber falls back to the scenario we just described: the ability to playback a RED recording to a RED-unaware viewer using Record&Play gave us the tools to do exactly that, by simply extracting the main data from RED and creating a “regular” RTP packet to send out of that;
non-RED participant sending media to a RED subscriber, instead, could be handled in a couple of different ways: we could indeed just send the packet exactly as it is, as the subscriber should be able to process it (we’ve seen how Chrome always sends a non-RED packet as a starter anyway), even though we’d lose the benefit of redundancy in that case; another, slightly more complex, approach might be to create a RED payload ourselves instead, by basically adding packets we sent the subscriber before as redundancy to the packet we’re sending now.
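The four cases above map to a small decision table, which an SFU could evaluate per publisher/subscriber pair. This is a hypothetical C sketch: `red_choose_action` and the `red_action` values are made up for illustration, and the `craft_red` flag stands for the choice between the two approaches described for the last case.

```c
#include <stdbool.h>

/* What to do with a publisher's audio packet for a given subscriber */
typedef enum {
	RED_RELAY_AS_IS,  /* RED->RED and non-RED->non-RED: nothing to do */
	RED_STRIP,        /* RED->non-RED: extract the primary data, rewrite the PT */
	RED_SEND_PLAIN,   /* non-RED->RED: send plain Opus, no redundancy */
	RED_CRAFT         /* non-RED->RED: build RED out of previously sent frames */
} red_action;

red_action red_choose_action(bool pub_red, bool sub_red, bool craft_red) {
	if(pub_red == sub_red)
		return RED_RELAY_AS_IS;
	if(pub_red)
		return RED_STRIP;
	return craft_red ? RED_CRAFT : RED_SEND_PLAIN;
}
```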

Since the ability to create a RED packet might be useful in different contexts, I did create a core function to do exactly that, and decided to test it out in the Streaming plugin. As a simple way to rebroadcast plain RTP to WebRTC, this plugin was the perfect playground for such a functionality, as all I needed to add was the ability for subscribers to ask for RED support, which when enabled would have the plugin itself create RED packets out of the plain RTP stream received from, e.g., GStreamer or FFmpeg.

In order to do that, I added an option to enable optional support for RED to new mountpoints. When enabled, this doesn’t mean the Streaming plugin expects RED packets from external sources (which might be an interesting functionality in the future, especially when RTP forwarders will support it), but that it will be able to create RED payloads in case any subscriber supports it. Negotiation-wise, I followed the same approach already described for the “Play” part of the Record&Play plugin, meaning that in case a mountpoint has RED enabled, we do offer it, and check if the subscriber negotiated it in their SDP answer. Then, any time we receive an audio RTP packet from outside, we create two different buffers: one that just contains the existing RTP packet (as we did before), and another one that’s actually a RED packetization of this new packet and the two (or one) we received before. These two buffers are then iterated over in the Streaming plugin when it comes to broadcasting the packet to viewers, so that the code can choose whether to send the RED packet or the untouched RTP packet via WebRTC, depending on whether or not the subscriber enabled RED in the negotiation.
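The crafting part can be sketched as a small history of previously received frames that gets prepended, as redundant blocks, to each new frame. The C sketch below is a hypothetical illustration (`red_history` and `red_pack` are not Janus names), uses a distance of 2 to mirror the Chrome tests, and shares the naive assumption of an in-order stream without losses. Frames whose offset doesn’t fit in the 14-bit field are simply left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RED_DISTANCE  2     /* redundant frames per packet (hypothetical choice) */
#define RED_MAX_FRAME 1023  /* largest block length the 10-bit field can carry */

typedef struct {
	uint32_t ts;  /* RTP timestamp of the frame */
	size_t len;
	uint8_t data[RED_MAX_FRAME];
} red_frame;

/* The last RED_DISTANCE frames received, oldest first */
typedef struct {
	red_frame frames[RED_DISTANCE];
	int count;
} red_history;

/* Builds a RED payload (RTP header excluded) with the history as redundant
 * blocks and the new frame as primary, then adds the new frame to the
 * history. Returns the payload length, or 0 on error. */
size_t red_pack(red_history *h, uint8_t pt, uint32_t ts,
		const uint8_t *frame, size_t len, uint8_t *out, size_t out_size) {
	if(len > RED_MAX_FRAME)
		return 0;
	int use[RED_DISTANCE];
	size_t needed = 1 + len;
	for(int i = 0; i < h->count; i++) {
		/* Skip frames whose offset doesn't fit in 14 bits */
		uint32_t off = ts - h->frames[i].ts;
		use[i] = (off > 0 && off <= 0x3FFF);
		if(use[i])
			needed += 4 + h->frames[i].len;
	}
	if(needed > out_size)
		return 0;
	size_t pos = 0;
	/* 4-byte headers for the redundant blocks (F=1), oldest first */
	for(int i = 0; i < h->count; i++) {
		if(!use[i])
			continue;
		uint32_t off = ts - h->frames[i].ts;
		size_t blen = h->frames[i].len;
		out[pos++] = (uint8_t)(0x80 | (pt & 0x7F));
		out[pos++] = (uint8_t)(off >> 6);
		out[pos++] = (uint8_t)(((off & 0x3F) << 2) | (blen >> 8));
		out[pos++] = (uint8_t)(blen & 0xFF);
	}
	/* 1-byte header for the primary block (F=0) */
	out[pos++] = (uint8_t)(pt & 0x7F);
	/* Block data, in the same order as the headers */
	for(int i = 0; i < h->count; i++) {
		if(!use[i])
			continue;
		memcpy(out + pos, h->frames[i].data, h->frames[i].len);
		pos += h->frames[i].len;
	}
	memcpy(out + pos, frame, len);
	pos += len;
	/* Update the history, dropping the oldest frame if it's full */
	if(h->count == RED_DISTANCE) {
		memmove(&h->frames[0], &h->frames[1], (RED_DISTANCE - 1) * sizeof(red_frame));
		h->count--;
	}
	h->frames[h->count].ts = ts;
	h->frames[h->count].len = len;
	memcpy(h->frames[h->count].data, frame, len);
	h->count++;
	return pos;
}
```

Notice that the very first call produces a RED payload with the primary block only; a sender might prefer to send that first frame as plain Opus instead, like Chrome did in my tests.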

This worked nicely, which means the packetization code did indeed work as expected! There were a couple of things I ended up thinking about while working on this, though, which should be taken into account in future revisions of this integration, namely:

The RED packetization, in this version, is done by the thread that receives the RTP packets, which means it’s done once per packet for all subscribers that support RED. This was done on purpose, to minimize the impact RED packetization may have in case we need to do it for many subscribers at the same time. Considering a subscriber may join mid-stream, though, the first packet they receive may be a RED packet carrying redundancy for two older packets they never saw: I’m not sure if this can confuse browsers, but it didn’t seem to matter in my tests, so it’s probably irrelevant.
Tied to the above, we support a feature called “mountpoint switching”, which allows you to change subscription from one mountpoint to another, all while remaining on the same PeerConnection. When that happens, RTP headers are updated so that it looks like the same stream even though it’s not: this means that, if RED is enabled on both mountpoints (meaning both have their own RED buffers), the first packet a switching subscriber receives from the new mountpoint would contain redundant information for two previous packets that are NOT the ones the subscriber actually received shortly before (when they were still subscribed to the previous mountpoint). Again, considering timestamps may be rewritten to take into account the switching time, this may or may not be confusing to browsers, and so may need to be handled somehow (e.g., by sending a regular RTP packet when first switching, and only using the RED buffer after that).
The “mountpoint switching” functionality may also be impacted in case the two mountpoints were configured to use different payload types for the same codec. In that case, a subscriber who negotiated their PeerConnection with mountpoint A would expect Opus to have payload type X; switching to mountpoint B, the payload type for Opus might be Y instead. This is not an issue for “regular” mountpoints, as we can simply overwrite the payload type in the RTP header, but the moment the actual payload type is hidden in RED block headers, it becomes more problematic: those block headers may need to be updated as well before they’re sent to a subscriber who expects a different payload type. This should probably be translated to new helper code in the Janus core utilities, to allow plugins to perform this activity seamlessly (a bit like we do for simulcast and SVC already).
More importantly, the RED packetization logic we implemented in the Streaming plugin is, at the moment, a bit naive: it assumes a regular stream coming in, with monotonically increasing sequence numbers and no losses. While this may be true in some cases (e.g., a local GStreamer pipeline feeding a colocated Janus instance), it’s definitely not an assumption we can safely rely on. In case of out-of-order packets, for instance, we may end up attaching to the current packet, as redundant information, packets that are actually supposed to follow it (e.g., because the packet we just received arrived late). A simple approach may be to reset the RED buffer and packetization any time the sequence number we receive is not the one we expect: the alternative would be using a jitter buffer, which we don’t have in the Streaming plugin, and actually don’t want to have, since it’s supposed to be a simple and fast restreamer.
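Two of the points above lend themselves to small helpers, sketched here in C under the same caveats: `red_rewrite_block_pt` rewrites the payload type inside RED block headers (e.g., after a mountpoint switch), while `seq_in_order` implements the simple “reset on unexpected sequence number” approach. Both are hypothetical names, not existing Janus core utilities.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rewrites old_pt to new_pt in the block headers of a RED payload (RTP header
 * excluded), preserving the F bit. Headers are validated first, so a
 * malformed payload is left untouched. */
bool red_rewrite_block_pt(uint8_t *red, size_t len, uint8_t old_pt, uint8_t new_pt) {
	size_t pos = 0;
	/* Find the final (F=0) header, making sure all headers fit */
	while(pos < len && (red[pos] & 0x80))
		pos += 4;
	if(pos >= len)
		return false;
	/* Headers sit at offsets 0, 4, 8, ... up to the final one at pos */
	for(size_t i = 0; i <= pos; i += 4) {
		if((red[i] & 0x7F) == (old_pt & 0x7F))
			red[i] = (uint8_t)((red[i] & 0x80) | (new_pt & 0x7F));
	}
	return true;
}

/* Tracks the next expected RTP sequence number for a mountpoint */
typedef struct {
	bool started;
	uint16_t next_seq;
} seq_tracker;

/* Returns true if the packet is the expected one, so the RED history can be
 * kept; false means the caller should reset its RED history first. */
bool seq_in_order(seq_tracker *t, uint16_t seq) {
	bool in_order = t->started && seq == t->next_seq;
	t->started = true;
	t->next_seq = (uint16_t)(seq + 1);  /* wraps around after 65535 */
	return in_order;
}
```

A caller would clear its RED history whenever `seq_in_order` returns false, effectively starting a fresh RED stream from that packet on.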

That said, while all these considerations are of course important and will need to be addressed sooner or later, the main purpose of this effort was getting a basic integration of RED in Janus in the first place: once we have a better understanding of how the code works, we can work on improving anything that could actually be an issue, and fine tune its behaviour.

What’s next?

As anticipated, all the work done so far has proven useful to lay the foundation of RED integration in Janus. We now have code to negotiate RED via SDP in a configurable and flexible way, and in case RED is enabled, we have code to both parse RED payloads to extract primary and redundant data, and to craft our own RED payloads instead. We’ve seen how all these features came out, and how they were integrated in some of the existing plugins as a way to validate they function as expected. That said, it’s clear that to see an actual benefit from RED, the functionality should be integrated in two of the most commonly used plugins in Janus, that is the AudioBridge (audio MCU) and VideoRoom (audio/video/data SFU).

Integrating RED in the AudioBridge will probably be a bit of a challenge. In fact, while the ability to parse and craft RED payloads will come in handy, the availability of redundant frames in packets we receive will most likely force us to refactor the way we deal with incoming RTP packets in the AudioBridge plugin. In particular, we’ll probably have to update our buffering mechanism in there to take advantage of redundant frames to fill the gaps, while at the same time changing the way we currently decode audio streams before passing raw frames to the mixer. Sending packets, instead, should be relatively trivial, as the utilities to craft RED payloads mean we’ll simply need to keep track of the last few packets we sent to specific participants before getting rid of them for good. It may be worthwhile to support RED in AudioBridge RTP forwarders as well, even though, if used in conjunction with the Streaming plugin, this would require changes to that plugin too, since it currently doesn’t support RED re-broadcasting (as we’ve seen above, we only added plain-RTP to RED rebroadcasting, to test our new RED crafting capabilities).

An integration of RED in the VideoRoom plugin, instead, might be simpler, since as we’ve seen above we mostly need to account for the four different modes of operation that could take place, and the foundation we laid is supposed to help us cover exactly that. As such, the integration will need to mostly take care of configuration aspects (e.g., specify whether or not we’ll allow RED in rooms we create), negotiation (detecting RED support for both publishers and subscribers, in RED-enabled rooms), recording (RED publishers will need to be marked as such, in our MJR recordings) and media relaying (whether or not RED parsing and/or crafting will be needed on a specific path). Of course, the potential issues we discussed when presenting the integration in the Streaming plugin would likely need to be addressed too, in all instances where a non-RED publisher is feeding a RED subscriber. The same considerations made on AudioBridge RTP forwarders apply to the VideoRoom as well, as both plugins implement the same feature pretty much the same way (apart from the content they share).

Finally, since we added new code that touches RTP packets, we definitely need to integrate these new functions (especially the RED parser) into our fuzzing processes. We worked a lot in the past on exactly that, which helped make Janus much more robust to “broken” RTP/RTCP packets, and we definitely wouldn’t want RED to be a weak link that could bring the house down!
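As an example of what this might look like, the following is a minimal libFuzzer harness in C. `LLVMFuzzerTestOneInput` is libFuzzer’s standard entry point, while `red_count_blocks` is a hypothetical, simplified stand-in for the actual RED parser; the fuzzing setup in Janus itself may well be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the blocks in a RED payload, validating lengths as it goes;
 * returns -1 for malformed input. A stand-in for the real parser. */
static int red_count_blocks(const uint8_t *buf, size_t len) {
	size_t pos = 0, data = 0;
	int blocks = 0;
	while(pos < len && (buf[pos] & 0x80)) {
		/* A 4-byte header, plus at least the final header after it */
		if(len - pos <= 4)
			return -1;
		data += ((size_t)(buf[pos+2] & 0x03) << 8) | buf[pos+3];
		pos += 4;
		blocks++;
	}
	/* Need the final header, and room for all redundant data after it */
	if(pos >= len || data > len - pos - 1)
		return -1;
	return blocks + 1;
}

/* libFuzzer entry point: build with clang -fsanitize=fuzzer,address */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
	(void)red_count_blocks(data, size);
	return 0;
}
```

With AddressSanitizer enabled, any out-of-bounds read triggered by a crafted payload shows up as a crash the fuzzer can report and minimize.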

If you’re interested in updates on all this, make sure to keep track of the pull request we created for the purpose (PR #2685), as the idea is to only merge it in master once we have a reasonable coverage in most relevant plugins. Should we decide to merge it as it is, chances are we’ll follow up with a new pull request to address the missing plugins and the required enhancements.

That’s all, folks!

As usual, this ended up being WAY longer than I hoped it would be, but I hope you appreciated the ride anyway. Hopefully this will intrigue you enough to start experimenting with this: in order to ensure this is merged soon, we do need as much feedback as possible, so do play with it (maybe in the worst network you can find :D) and let us know if it does indeed help!

Lorenzo Miniero

I'm getting older but, unlike whisky, I'm not getting any better