Data Channels broadcasting with Janus

November 5, 2018 Lorenzo Miniero

I recently read an announcement from a WebRTC company, celebrating the addition of data channel broadcasting to their solution as an industry first. This made me smile, as Janus has supported data channels almost since day one, which were followed shortly thereafter by integration in most of the plugins for heterogeneous use cases. Anyway, this also made me realize that we probably didn’t do a good enough job at making people aware of this functionality, which is what I’ll try to do in this post. I’ll also try to briefly introduce data channels in the first place, so please indulge me if I start this with a simple question you may already know the answer to…

First of all, what are data channels?

While most associate the concept of WebRTC with a real-time and live exchange of audio/video streams, it is actually not limited to that. Since the very beginning, WebRTC was designed as a framework to exchange audio, video and generic data in real-time. Protocol-wise, for audio and video the standard (even though “on steroids”) Real-Time Transport Protocol (RTP/RTCP) was picked; for generic data the choice was different, and a brand new mechanism was devised instead. This new mechanism is currently being standardized in a dedicated IETF draft.

Thinking about the exchange of generic data, one may wonder why data channels are needed in the first place: in fact, in web development there are many ways of doing that already, e.g., via HTTP (possibly with the aid of long polling or Server-sent events) or WebSockets. Anyway, all those existing solutions are based on TCP, implying a mandatory reliability of the delivery; besides, they need something to bounce the messages between the involved parties, where a peer-to-peer approach might be more advantageous instead. This is exactly where data channels come in: in fact, although the session setup typically requires a server or reference endpoint, WebRTC was envisaged as peer-to-peer for the actual exchange of media after that, and using UDP at its foundation if possible. As such, being able to exchange generic data within the same framework opens the door to many interesting opportunities.

From a technical perspective, data channels can look a bit convoluted. In fact, during the design of the data channels specification, a few key requirements emerged: e.g., secure delivery of the messages, optional reliability if required, congestion control, ability to avoid IP fragmentation, and more. The security aspect was “easily” addressed with Datagram Transport Layer Security (DTLS): in fact, as you know, DTLS is used in WebRTC for the secure exchange of SRTP-related cryptographic information (DTLS-SRTP); as such, the same existing stack could be reused as a transport for these generic data messages as well. All the other requirements, though, called for something more complex and comprehensive, which is why the choice eventually fell on the Stream Control Transmission Protocol (SCTP).

Without delving into too many details (this isn’t a networking lesson, after all!), SCTP is a standard protocol that the IETF originally devised as a way to get the best of the TCP and UDP worlds: message oriented, like UDP; but at the same time potentially reliable and with congestion control capabilities, like TCP. Together with other properties that satisfied additional requirements the specification identified (which we won’t cover for brevity), this basically meant SCTP would be a perfect candidate for the job.

That said, SCTP is a network protocol, but as we anticipated, the idea was to also re-use DTLS as a means for ensuring the security requirements. This meant that, in the design of the data channels specification, the actual protocol stack would end up layered, from top to bottom, as SCTP, then DTLS, then UDP.

What this means is that the application will pretty much interact with an SCTP stack for the purpose of creating data channels and using them. Anyway, the SCTP stack does not have direct access to the network, but is actually encapsulated in DTLS, which basically acts as a “transport”. In turn, DTLS is transported on top of UDP, as usual. This allows data channels to re-use the rest of the WebRTC specification as well, e.g., in terms of connectivity establishment via ICE and secure channel setup (and transport) via DTLS. At the same time, all the properties SCTP has can still be used even in this setup: in fact, while the actual network topology is abstracted out, SCTP packets will eventually still be exchanged between two parties, which will allow all its mechanisms to kick in.

Data channels in Janus

As anticipated, we added support to Janus pretty early in its life. More precisely, it happened around June 2014 (more than 4 years ago!), so about four months after Janus was first released to the world. At the time, the support was quite basic, as it was limited to the EchoTest and VideoCall plugins: little more than a proof-of-concept, if you will. You can learn a bit more about that first step in the dedicated blog post we wrote at the time to introduce the new functionality.

We’ve seen in the previous section how a data channel is basically an SCTP stack on top of a DTLS transport. Considering we already had a DTLS transport in place, to take care of the DTLS-SRTP handshake, all we needed was an SCTP stack that we could use for exchanging messages, and one that wouldn’t try to implement networking directly but could be used in an “abstracted” way instead. We ended up choosing usrsctp for a simple reason: everybody else was using it already! 🙂 In fact, it’s the library stack all browsers (well, all except Edge, which never implemented data channels) use to implement data channels. As such, it was basically a no-brainer. Besides, we chose to support string-based data channels only: in fact, while usrsctp as a library definitely supports binary data as well, we decided to keep development on that part on hold, as it wasn’t needed in most of the use cases we were interested in.

Technical considerations aside, we slowly but steadily started integrating it in other plugins as well. At the time of writing, there are several plugins that can negotiate and use data channels:

the EchoTest plugin will bounce back any message you send, by prefixing it with some custom text;
the VideoCall plugin will relay back and forth any message the two peers in the conversation will send each other;
the VideoRoom plugin (the Janus SFU) allows data channel messages to be relayed from a publisher to all its subscribers as well, alongside audio and video;
the Streaming plugin (basically the broadcasting plugin) does pretty much the same, but with data coming from a generic source that doesn’t need to be WebRTC compliant;
the TextRoom plugin implements the concept of room-based messaging (public and private) over data channels;
both the Lua and Duktape plugins support data channels, meaning you can arbitrarily send and receive messages in a plugin you write there, depending on the custom logic.

This basically means that there are tons of different ways data channels can be used in a Janus-based WebRTC application, depending on which plugin will be used for the job. You may want to add a simple way to do a live chat between some participants; or add some live subtitles or captioning to a stream; or receive live events from a sensor. And this is just about the stock plugins! We briefly touched on how Lua and Duktape give you complete freedom in how you use data channels, but there are many third-party plugins out there that use them in other creative ways: e.g., to communicate with non-WebRTC endpoints, or remotely control or monitor non-WebRTC devices like drones. Just to give you an idea, for a “Dangerous Demo” of a few years ago I used data channels as a distributed remote control for a Janus-based SocialTV application; in another I used them in a custom plugin to implement a web-based terminal shell; in yet another I chose them as a way to send ARI commands to Asterisk via the Lua plugin. The possibilities are endless!

Another interesting plus is that these data channel streams don’t need to be limited to WebRTC: in fact, all those streams can also be relayed externally one way or another, and recorded as well. I’ll give a few concrete examples on the relaying part in the next sections, when introducing how data channels are actually used in some of the plugins. For what concerns recording specifically, instead, Janus natively supports recording data channel streams to its custom format .mjr, which can then be postprocessed to a SubRip subtitle file using the janus-pp-rec tool: while subtitles may make more sense in some contexts than others, they still provide an easy way to envision time-aware messaging (which also makes them super-easy to replay in regular media players!).

Considering the main topic of this blog post is broadcasting, we’ll focus on a subset of those plugins: TextRoom, VideoRoom and Streaming plugin respectively. This will hopefully give a clearer idea on how they differ from each other, and how they can be used in real applications.

TextRoom plugin

The TextRoom plugin, as the very original name suggests, basically implements text-based messaging over data channels. More specifically, it allows you to create and/or join multiple “chatrooms” on the same data channel, and exchange public and/or private messages with the other participants. This all happens using a custom JSON-based protocol with a defined syntax for requests, responses/errors and events. It’s important to point out that this protocol is NOT the same as the Janus API: it is a protocol that only makes sense to the plugin itself.

As it is, it allows you to do pretty much whatever you’d expect from a chat service. As a consequence, it also has all the knobs to implement some form of text broadcasting as well: in fact, assuming all the recipients join the same “group”, sending a public message will automatically ensure all people in the same group will receive it.

As anticipated, private messaging is available as well, which can be used for some more “drill-down” broadcasting: in fact, private messaging is not limited to a single recipient, but can actually address more at the same time.
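To make the protocol a bit more concrete, here is a minimal Python sketch that builds the kind of JSON requests a client would send over the data channel for joining a room and for public or private messaging. The field names (`textroom`, `transaction`, `room`, `username`, `display`, `text`, `to`, `tos`) are assumptions to be checked against the TextRoom plugin documentation; the helper names and room numbers are purely hypothetical.

```python
import json
import uuid


def textroom_request(kind, room, **fields):
    """Build a TextRoom request as a JSON string.

    Assumed layout: a "textroom" field naming the request, a "transaction"
    identifier used to match the response, and the target "room".
    """
    request = {"textroom": kind, "transaction": uuid.uuid4().hex, "room": room}
    request.update(fields)
    return json.dumps(request)


def join(room, username, display=None):
    # Join a room; "display" is an optional human-readable name (assumed field).
    extra = {"username": username}
    if display is not None:
        extra["display"] = display
    return textroom_request("join", room, **extra)


def public_message(room, text):
    # No recipient specified: everyone in the room is expected to get it.
    return textroom_request("message", room, text=text)


def private_message(room, text, recipients):
    # One recipient goes in "to", several in "tos" (assumed field names).
    if len(recipients) == 1:
        return textroom_request("message", room, text=text, to=recipients[0])
    return textroom_request("message", room, text=text, tos=list(recipients))
```

Each returned string would then be sent as-is on the data channel that was negotiated with the plugin; a call like `private_message(1234, "psst", ["alice", "bob"])` is the "drill-down" broadcasting case, where a single message addresses more than one recipient.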

Another interesting property this plugin has is its “forward externally” functionality. Very simply, when properly configured, this plugin can make sure that any message addressed to a specific room is forwarded to an external web server via HTTP. This can be useful for different use cases: e.g., for archiving purposes, where an external component tracks all exchanged messages; or as a simple and effective way of relaying commands from a browser to a non-WebRTC component (e.g., for IoT purposes). At the time of writing, though, there’s no way to go the opposite way, that is injecting external messages into a data channel only conversation.
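As a rough idea of what the receiving end of this forwarding could look like, here is a minimal sketch of an archiving web server using only the Python standard library. It assumes the plugin delivers each message as an HTTP POST with a JSON body, and it deliberately makes no assumption about which fields that JSON contains; the port, the handler and the in-memory `ARCHIVE` list are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

ARCHIVE = []  # in-memory store, just for illustration


def archive_message(body):
    """Parse a forwarded message (assumed to be JSON) and store it."""
    message = json.loads(body.decode("utf-8"))
    ARCHIVE.append(message)
    return message


class ForwardHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            archive_message(body)
            self.send_response(200)
        except ValueError:
            # Not valid UTF-8 or not valid JSON: reject it instead of
            # archiving garbage.
            self.send_response(400)
        self.end_headers()


def serve(port=8000):
    # Blocks forever; the port is a hypothetical choice.
    HTTPServer(("", port), ForwardHandler).serve_forever()
```

Calling `serve()` and pointing the room at that address would be enough to start collecting messages; a real deployment would obviously want persistent storage and some form of authentication.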

That said, depending on the scope of the target application this plugin has, as expected, pros and cons. The fact you can use the same data channel connection for sending and receiving messages to and from multiple groups indeed makes it very compact and lightweight: while we haven’t carried out any specific measurement of this plugin’s performance, we expect it would perform quite well with high volumes as well. Anyway, the lack of a way to inject external messages may limit its scalability to what a single Janus instance can provide: this means that, to really scale, distribution would need to be taken care of at the messaging source, i.e., having the sender of the message send the same message to multiple Janus instances at the same time.

Depending on the use case, another limitation may be seen in what is actually one of the plugin’s strengths, that is, its bidirectional nature: the plugin inherently allows you to send and receive messages on the same channel, which in most cases is a pro. Anyway, if the scope of the application is plain broadcasting, a mono-directional flow might be a better and more optimized approach. This is what the other two plugins we’ll talk about in this post will deal with.

VideoRoom plugin (SFU)

The VideoRoom plugin is what implements the SFU (Selective Forwarding Unit) functionality in Janus. It is based on a publish/subscribe approach: people contributing their media in a room become a feed other users can subscribe to. This means that this plugin is basically a collection of monodirectional PeerConnections: some will only be used for receiving media from publishers, and some only for sending this incoming media to the related subscribers instead. Unsurprisingly enough, this is the most popular and widely used plugin in Janus, as it helps with a huge variety of use cases: conferencing, e-learning, collaboration, broadcasting, etc.

While the vast majority of applications using this plugin only involve it for audio and video streams, the VideoRoom plugin does support data channels as well. The way it works is very much in line with how audio and video streams work: publishers can choose what to publish (audio, video, data), and whatever they’re publishing people can subscribe to. This allows for an easy way to broadcast generic data alongside a live audio/video stream, or even generic data alone, in a mono-directional way: whatever message the publisher will send, all the subscribers will receive, and on the same PeerConnection they’re already using.

Using this functionality is quite straightforward. If you already used the VideoRoom plugin in the past, nothing changes in terms of room creation or management: the only thing you need to do to take advantage of data channels as well, is ensuring that publishers will negotiate them when creating a publisher connection in the first place. If a publisher negotiates data channels, and Janus is built with data channels support, then one will be automatically set up between the two, allowing the publisher to send messages to the plugin. Then, depending on who’s subscribed to the media, the message will be relayed accordingly.

As with the TextRoom plugin, there are pros and cons depending on the target scenario. Unlike what happens in the TextRoom plugin, data channels here are monodirectional and not bidirectional (or to be more precise, they are bidirectional, but they’re only used in one direction): this may be an advantage in some cases, and may be a disadvantage in others (e.g., whenever subscribers are supposed to be able to send something back). In that case, either the TextRoom plugin or an out-of-band mechanism might be a better option. That said, though, scaling is definitely possible using this other approach instead: in fact, using the so-called RTP forwarders (which do more than just handle RTP, but whatever…), it’s quite easy to relay incoming messages externally via UDP. This can be used either to share the load across multiple Janus instances (e.g., with the help of the plugin we’ll see in the next section) or, as was the case with the TextRoom, to give external components access to the messages sent by the publisher for other needs.
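To give an idea of what asking for such a forwarder might involve, here is a small Python sketch that builds the request bodies for relaying a publisher’s data channel messages to one or more remote hosts. The request and property names (`rtp_forward`, `room`, `publisher_id`, `host`, `data_port`, `secret`) are assumptions to be verified against the VideoRoom documentation, and all the identifiers, addresses and ports are made up.

```python
def data_forward_request(room, publisher_id, host, data_port, secret=None):
    """Body of a request asking the VideoRoom plugin to relay a publisher's
    data channel messages to host:data_port over UDP (assumed field names)."""
    body = {
        "request": "rtp_forward",
        "room": room,
        "publisher_id": publisher_id,
        "host": host,
        "data_port": data_port,
    }
    if secret is not None:
        # Rooms protected by a secret would need it here (assumed field).
        body["secret"] = secret
    return body


def forward_to_instances(room, publisher_id, hosts, data_port):
    # One forwarder per remote instance, all fed by the same publisher.
    return [data_forward_request(room, publisher_id, h, data_port) for h in hosts]
```

For instance, `forward_to_instances(1234, 42, ["10.0.0.2", "10.0.0.3"], 5008)` yields two request bodies, one per remote instance; each would then be sent to the plugin through whatever Janus API transport the application already uses.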

Streaming plugin (broadcasting)

The Streaming plugin is a plugin we conceived for broadcasting purposes. More precisely, its main aim is accepting incoming, non-WebRTC streams, and turning them into a WebRTC broadcast. A simple use case is a generic media application generating a plain RTP stream: if this stream is sent to the Streaming plugin, then it will allow multiple subscribers to receive the same stream, but within the context of a WebRTC PeerConnection itself. In its simplicity, this plugin is very powerful and effective. From a functional perspective, it might be seen as a subset of the VideoRoom plugin, where only subscribers are implemented: the publishers part is assumed to be out of scope for the plugin, meaning that the actual media may come from a WebRTC source (e.g., using the RTP forwarders we introduced before) or more than likely not (e.g., ffmpeg, gstreamer, or others).

As for the VideoRoom, the Streaming plugin is not limited to audio and video streams. Data channels are supported as well as part of the streams you can broadcast. Unlike what happens in the VideoRoom plugin, though, support for data channels must be explicitly declared when configuring new mountpoints. A “mountpoint” is the concept the Streaming plugin uses to create a new broadcast: it is identified by a unique identifier, some details about the media that will be broadcast (e.g., codecs and custom SDP attributes), and ports to bind to in order to receive the media externally.

The Streaming plugin configuration happens in the janus.plugin.streaming.cfg file, which is where static mountpoints are created: dynamic mountpoints can be created via API as well, as explained in the documentation. One of the properties that can be configured when creating a new mountpoint is indeed whether or not data channels will need to be negotiated, and what port to listen on for messages to relay. The port is especially important, since as explained the Streaming plugin will only relay something that it received externally: for data channels, plain UDP messages will be used as a source for the broadcasting, which means that each UDP datagram that is received on the port configured for data will be relayed as a data channel message to all the subscribers.
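For the dynamic route, a request along these lines is a plausible sketch. It assumes the `create` request accepts the same property names used in the configuration file (`type`, `id`, `description`, `audio`, `video`, `data`, `dataport`) and that plugin requests travel as the `body` of a Janus API `message`; both should be double-checked in the documentation. The identifier, description and port in the usage note are hypothetical.

```python
import json


def create_data_mountpoint(mountpoint_id, description, dataport):
    """Request body for a data-only mountpoint (assumed property names)."""
    return {
        "request": "create",
        "type": "rtp",  # live source fed from outside, even if no RTP is involved
        "id": mountpoint_id,
        "description": description,
        "audio": False,
        "video": False,
        "data": True,
        "dataport": dataport,  # UDP port the plugin would listen on
    }


def as_janus_message(body, transaction):
    # Assumed envelope: plugin requests are wrapped as the "body" of a
    # "message" addressed to a plugin handle.
    return json.dumps({"janus": "message", "transaction": transaction, "body": body})
```

Something like `as_janus_message(create_data_mountpoint(99, "Sensor events", 5010), "abc123")` would produce the JSON string to send to a Streaming plugin handle.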

Just as an example, let’s try to create a data channel-only mountpoint:

[data-example]
type = rtp
id = 15
description = Datachannel stream from an UDP source
audio = no
video = no
data = yes
dataport = 5008

We’re basically saying the mountpoint won’t have audio or video, only data: the “rtp” type here is a bit counter-intuitive, as we won’t use RTP in this case, but in the Streaming plugin lingo it just means we’ll do a live streaming with data originated externally. We’re binding to port 5008, which means that any message we’ll get on that port we’ll rebroadcast via data channels. The mountpoint will then appear in the list of streams available in the demo:

A simple way to check if this is working is by “feeding” the mountpoint with some messages via UDP, e.g., using the popular netcat tool:

nc -u localhost 5008

Any message we type there will be sent as a datagram to the specified address, which means the Streaming plugin will get it and restream it via data channels to all the WebRTC subscribers:

Of course, there will be more you want to do with the feature, but as a simple example this does the trick.
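If you’d rather feed the mountpoint from an application than from a terminal, the same thing can be done in a few lines of Python. This is just a sketch: the helper names are made up, and the defaults simply mirror the address and port used with netcat above.

```python
import socket
import time


def send_datagram(message, host="localhost", port=5008):
    """Send one text message as a single UDP datagram.

    Returns the number of bytes handed to the network stack.
    """
    payload = message.encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


def send_lines(lines, host="localhost", port=5008, interval=1.0):
    """Send each line as its own datagram, pausing between them
    (e.g., to mimic live captions or periodic sensor readings)."""
    count = 0
    for line in lines:
        send_datagram(line, host, port)
        count += 1
        time.sleep(interval)
    return count
```

Each call to `send_datagram` results in exactly one datagram, which is what the plugin turns into one data channel message for the subscribers.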

For what concerns the pros and cons, they’re basically the same as in the VideoRoom with respect to the directionality of streams: in fact, the Streaming plugin is most definitely a monodirectional kind of plugin, which was conceived to always send and never receive; this means that, while technically users could send messages back along the datachannel used to receive from a mountpoint, the plugin would simply drop and ignore them. Again, this may or may not be an issue, depending on the use case. A huge pro comes from the scalability part: in fact, considering the Streaming plugin does nothing more than relaying something that comes from outside, it’s relatively easy to ensure that multiple instances receive the same data, which means that subscribers can get the same information no matter where they connect to. A common example is a VideoRoom publisher feeding remote Streaming plugin instances with audio, video and/or data:

That said, considering the plugin uses UDP datagrams as a source for messages to relay, this means that the maximum size of each data channel message is limited to the size these UDP datagrams can have. Typical MTUs might indeed limit the size of messages to ~1400 bytes.
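Both the fan-out and the size limit can be handled at the source. The sketch below splits a long text into chunks that stay within a byte budget (without cutting UTF-8 characters in half) and sends every chunk to each instance. Note that the splitting is purely an application-level assumption: the plugin doesn’t reassemble anything, so subscribers would receive one data channel message per chunk and have to put them back together themselves. The 1400-byte budget is the rough figure mentioned above, not a guaranteed safe value.

```python
import socket

MAX_PAYLOAD = 1400  # bytes; rough figure from typical MTUs, tune for your network


def split_text(text, limit=MAX_PAYLOAD):
    """Split text into UTF-8 chunks of at most `limit` bytes each,
    never cutting a character in half."""
    if limit < 4:
        raise ValueError("limit must fit at least one UTF-8 character (4 bytes)")
    chunks, current = [], b""
    for char in text:
        encoded = char.encode("utf-8")
        if current and len(current) + len(encoded) > limit:
            chunks.append(current)
            current = b""
        current += encoded
    if current:
        chunks.append(current)
    return chunks


def fan_out(text, targets, limit=MAX_PAYLOAD):
    """Send the same chunks to every (host, port) target.

    Returns the total number of datagrams sent.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for chunk in split_text(text, limit):
            for target in targets:
                sock.sendto(chunk, target)
                sent += 1
    return sent
```

With two Streaming plugin instances listening on the same data port, `fan_out(message, [("10.0.0.2", 5008), ("10.0.0.3", 5008)])` (hypothetical addresses) would deliver the same sequence of messages to the subscribers of both.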

… but what about QUIC?

While data channels are quite widespread, and used in several different scenarios, they are not universally loved in their current form. Some have complained about the lack of features or control they provide in some cases, while others lamented how complex they allegedly are to implement. While not all agree and there are voices on both sides of these arguments, this has led the standardization community to look for alternatives, which is what QUIC is partly about. That said, I don’t expect the concepts I’ve gone through in this article would change much: it would simply be a different way of exchanging data in real-time, which means that as soon as it ends up in Janus, the same considerations will very likely apply.

Anyway, WebRTC+QUIC is a topic that goes well beyond data channels per se, and is probably a matter for a different post entirely! If you’re interested in the subject, there are several articles out there already that provide a good introduction to what the objectives are.

That’s all folks!

I hope you enjoyed this little overview, and that you’ll have fun experimenting with these apparently less known features. I’m looking forward to your thoughts!

Lorenzo Miniero