After three years of work (the original pull request was first opened in December 2018) we finally merged the multistream branch in Janus! Considering this was a huge refactoring, we chose a special day for the occasion, that is the 8th birthday of Janus itself, whose first commit appeared in GitHub on the 11th of February of 2014 (time flies when you’re having fun!).
That’s all good, but some of you may be wondering: what the heck is this “multistream” thing you’ve been blabbering about anyway?! I made a long presentation about all that a couple of years ago at CommCon, but I thought I’d take advantage of this post to further explain what it is, what changed, how you can still use the previous version of Janus if you don’t want to update yet, and what we may be doing next.
What do we mean by multistream?
To keep it short, by “multistream” we mean the ability to put more than one audio and/or video track in the same PeerConnection.
As you probably know if you’ve used Janus before, in fact, PeerConnections in Janus have been limited since day one to a single media line of the same type: so a maximum of one audio, one video and one data channel per PeerConnection. This was a design choice made at the beginning for a couple of reasons: first of all for the sake of simplicity, but most importantly because at the time there wasn’t a “standard” way of doing this that would work across different browsers. Specifically, you had:
- Plan B, originally implemented by Chrome, where you’d use a single m-line for all streams of the same type (so one audio m-line for 1-N audio streams, and/or one video m-line for 1-N video streams);
- Unified Plan, originally implemented by Firefox, where you’d use a separate m-line for each media stream instead (using custom attributes like mid/rid to identify them on the wire);
- No Plan, a proposed attempt to reduce the number of offers/answers that, to my knowledge, was never implemented.
This confusing landscape (which was actually even more confusing than the summary I made; just check this old webrtcHacks blog post!) meant that, if you wanted to support multiple streams in the same PeerConnection, you had to implement both Plan B and Unified Plan, and be prepared to translate between the two whenever incompatible endpoints had to interact with each other. Needless to say, that was something I had zero willingness to do, especially considering I knew for a fact only one would be adopted eventually, and so I really didn’t want to spend a lot of time and effort on something that would have to be abandoned anyway sooner or later.
As such, we chose to stick to one audio/one video for a long time, which could be considered limiting at times, but never really felt like that to us and all the developers that chose Janus for their applications. Even using different PeerConnections, in fact, you could still implement pretty much any kind of scenario you had in mind, at the expense of some additional overhead in setting up PeerConnections: besides, considering the general purpose nature of Janus and its modular approach to media management plugins, different PeerConnections would often be needed anyway whenever different plugins were to be involved at the same time.
Eventually, Unified Plan was chosen as the way to go, and after some time both Firefox and Chrome had working (and interoperable) implementations, which convinced us it was finally time to start looking into this and get to work. And thanks to a generous sponsorship by Highfive and fellow Italians Zextras, we had the opportunity of focusing on exactly that! Unfortunately it ended up taking us a while, and definitely a bit longer than I hoped. The main reason was that, although we managed to make the changes fairly quickly, they still had to be tested (not only by us, but also and most importantly by Janus users), and the longer the whole process took, the more conflicts would appear between the main branch of Janus (over which we kept a consistent and furious development cycle; that’s what you get for being as agile as we are!) and the several changes on the multistream branch. But eventually we did it, so here we are!
What did this change mean to Janus?
From an architecture point of view, considering the 1-1 relationship between our so-called “handles” and the PeerConnection they own/control, internally a PeerConnection from a “legacy” version of Janus could be abstracted as seen in the picture below: a PeerConnection implementing an ICE agent with a single stream and a single component (media is always bundled in Janus), with the component responsible for and aware of a single audio, video and/or data channel. This strict constraint meant some assumptions could be made with respect to, for instance, data structures that had to be associated with any of the media involved, knowing there would never be more than one of the same kind.
With the multistream support we just added to Janus, the internal architecture changed considerably, since it had to accommodate an undetermined number of media streams of arbitrary kind, and make sure plugins could use and take advantage of any of them. This abstraction can be summarized in the updated diagram depicted below.
As you can see, a PeerConnection is now represented as a collection of “medium” instances, each referencing a specific media stream as negotiated as part of an SDP exchange. Data channels are still represented as an entity of their own, since you can’t really negotiate more than one data channel m-line via SDP anyway (you’d use multiple streams/labels over the same data channel for the purpose), but they still have a corresponding medium.
This obviously required a considerable refactoring in the Janus core, due to the different addressing mechanism required for streams, which had as a consequence a partial impact on plugins as well, due to their need to be aware of how to address streams in the first place. The main areas that required changes were the following:
- SDP parsing and generation utilities in the core, which are used by plugins as well, and so needed a substantial update to be aware of multiple media streams to negotiate effectively;
- support for (sending and receiving) multiple streams in the core, in order to get rid of the hardcoded references to audio and video we had in our data structures before, and allow for a seamless routing and addressing of arbitrary streams instead;
- related to the above, support for (sending and receiving) multiple streams in plugins as well, since each of them might have to be able to identify which stream to operate on specifically;
- client side changes as well, of course (not strictly speaking Janus related, but still relevant, due to the demos we provide as examples).
Without bothering you too much with technical details (you can check the previously mentioned CommCon presentation or slides for some more in-depth information), this eventually meant using the medium index (as in SDP order) and/or mid attributes to uniquely address streams in the Janus core, and in plugins as well accordingly: considering both client and server are aware of both from the negotiation process, this makes it easy to uniquely address and identify a stream.
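To make the addressing idea a bit more concrete, here is a minimal sketch of how an endpoint could derive both identifiers (index in SDP order, and mid) from a session description. The `listMedia` helper and the trimmed sample SDP are made up for illustration, and are not part of Janus or janus.js.

```javascript
// Hypothetical helper (not part of Janus or janus.js): list the m-lines of an
// SDP together with their index (SDP order) and their mid attribute.
function listMedia(sdp) {
  const media = [];
  let current = null;
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("m=")) {
      // "m=<type> <port> <proto> <fmt> ..." opens a new media section
      current = { index: media.length, type: line.slice(2).split(" ")[0], mid: null };
      media.push(current);
    } else if (current && line.startsWith("a=mid:")) {
      // The mid attribute belongs to the media section opened last
      current.mid = line.slice("a=mid:".length).trim();
    }
  }
  return media;
}

// Made-up, heavily trimmed SDP with one audio and two video m-lines
const sampleSdp = [
  "v=0",
  "o=- 1 1 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=mid:0",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=mid:1",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=mid:2",
].join("\r\n");

console.log(listMedia(sampleSdp));
```

Since both sides see the same SDP, both end up with the same list, which is what makes either identifier usable as a stream address.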
What about the impact on plugins?
Obviously updating Janus to support multistream was only the first step: in fact, considering it’s actually plugins that are responsible for any kind of media management (what to send/receive and how), it was important to ensure they’d be aware of this considerable refactoring. This meant making sure they’d use the updated SDP utilities, and support the additional streams that may be negotiated, if possible.
Eventually, while we made sure all plugins would use the updated primitives, not all of them were updated to really take advantage of this new multistream support. This was a choice dictated by the limited scope of some of the plugins: in fact, while it made a lot of sense to support multistream in, e.g., the EchoTest, Streaming and VideoRoom plugins, we chose not to extend this functionality to the other plugins, either because there wasn’t a real need for it (e.g., AudioBridge, TextRoom, VoiceMail), or because it would only provide a small added value (e.g., SIP, NoSIP, Record&Play). That said, we do plan to update some of the other plugins in the future: the Lua and Duktape plugins, for instance, would definitely benefit from having access to this new functionality, especially considering their flexibility in terms of how quickly new plugin logic can be implemented via their scripting capabilities.
Upgrading the plugins that we did update did take some work, but was also quite fun to do. The following pictures, for instance, show how the EchoTest demo could be used to involve multiple video streams at the same time, some of them coming from different sources.
The EchoTest plugin is always my plugin of choice any time I have to prototype something new, but of course other plugins provided more rewarding and less silly results. The integration in the Streaming plugin, for instance, allowed for the configuration of mountpoints with an arbitrary number of media streams, something that would be interesting in the context, for instance, of multicamera setups: the following picture is a snapshot of the Streaming plugin demo on the official website, which has a synchronized dual camera capture of me goofing around, all played in a loop.
The impact on the VideoRoom plugin, instead, is less immediate to present from a visual perspective. In fact, a meeting involving multiple active participants will look the same whether different PeerConnections are used, or a single one is used instead (which is the main reason why Janus didn’t really suffer from the lack of multistream support so far). It is indeed worth spending some time on how the VideoRoom plugin changed internally, though, as this did have an impact on how you talk to the plugin in the first place.
In case you don’t know, VideoRoom is the plugin that implements the SFU behaviour in Janus. As such, it’s meant to provide an easy publish/subscribe mechanism that allows participants to publish their media within the context of a room: this then becomes a stream other participants can subscribe to, if they want. This flexibility on when/how to publish and/or subscribe means it’s one of the most commonly used plugins in existing Janus-based applications, since it’s very easy to use as a foundation for conferencing, meetings, e-learning, webinars and so on, or even just as a very simple way to implement generic WebRTC ingestion, e.g., for WHIP (which thanks to RTP forwarders can then be used to pass WebRTC streams to external applications, e.g., for remote processing of any kind). The way this works in the “legacy” version of Janus can be summarized in the following picture, where each arrow is a separate PeerConnection.
This means that, in a 3-person conference like the one depicted above, Janus is responsible for 9 PeerConnections, 3 for each participant. Specifically, each participant uses one PeerConnection to publish their media (audio, video and/or data), while they use a separate PeerConnection for each of the remote participants they want to subscribe to. As a consequence, depending on the media topology (in terms of how many participants publish their streams, and how many subscribers each participant has), this may result in a high number of PeerConnections: in the “worst” case (everyone publishing, and everyone subscribing to everyone), the number of PeerConnections will grow quadratically with the number of participants (so 9 for 3 participants, 16 for 4 participants, etc.).
The multistream version of the VideoRoom plugin adds support for a different way of publishing and subscribing to streams: in fact, while it still allows using PeerConnections for a single audio/video stream just as before (e.g., to facilitate the migration from a legacy version of Janus), it also allows the grouping of multiple streams over the same PeerConnection instead. More specifically, it allows participants to use a single PeerConnection to send all their contributions, and a different single PeerConnection to receive all contributions from other participants instead; the main reason not to use a single PeerConnection for both sending and receiving is to avoid issues like glare, especially during renegotiations, and besides, keeping incoming and outgoing streams separate helps make it all simpler to handle. As such, when used properly, this means a participant can use up to two PeerConnections independently of how many participants are present. This is explained visually in the following diagrams, which show how this differs from the “legacy” approach instead, where each green box represents a PeerConnection that may actually wrap multiple streams at the same time.
Of course it’s important to point out that, while the number of PeerConnections can indeed be drastically reduced this way (thus helping reduce the overhead in terms of network resources, and speeding up session updates thanks to the pre-existing media channel), this does nothing with respect to the quadratic growth in bandwidth requirements as numbers increase. In fact, if in the previous case for 4 active users we’d have 16 PeerConnections, with each user subscribing to 3 different streams using 3 different PeerConnections, with this new approach we can cut the number of PeerConnections down (still just one for all three subscriptions), but that single PeerConnection will still need to send three different streams to the subscriber, which means the same amount of audio/video data is involved nevertheless.
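The arithmetic behind these numbers can be sketched in a few lines. This is just an illustration of the “worst” case described above (everyone publishing, everyone subscribing to everyone else); the function names are made up for the example.

```javascript
// n = number of participants, all publishing and all subscribing to everyone else

// Legacy approach: one PeerConnection to publish, plus one per remote participant
function legacyPeerConnections(n) {
  const publishing = n;
  const subscribing = n * (n - 1);
  return publishing + subscribing; // n + n(n-1) = n^2
}

// Multistream approach: one PeerConnection to send, one to receive
// (a lone participant has nothing to receive, so only needs the first)
function multistreamPeerConnections(n) {
  return n > 1 ? 2 * n : n;
}

// Publisher feeds Janus has to relay to subscribers: the same in both approaches
function relayedFeeds(n) {
  return n * (n - 1);
}

for (const n of [3, 4, 10]) {
  console.log(
    `${n} participants: legacy=${legacyPeerConnections(n)} PCs, ` +
    `multistream=${multistreamPeerConnections(n)} PCs, relayed feeds=${relayedFeeds(n)}`
  );
}
```

The PeerConnection count goes from quadratic to linear, while the number of relayed feeds (and so the bandwidth) is unchanged.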
Ok, you convinced me, how do I use it?
Switching to the multistream version of Janus should be mostly painless, as where possible we’ve tried to keep backwards compatibility with the previous version. Of course, if you want to use the new features (e.g., the multistream functionality of the VideoRoom), then you will have to learn how to use the new APIs instead, as otherwise the legacy APIs will fall back to the previous approach.
The main changes you may need to be aware of are:
- Plugin API changes (e.g., in Streaming or VideoRoom) to use the new features;
- janus.js changes to address multiple streams (new callbacks and/or signature changes).
As far as the Janus API itself is concerned, in fact, nothing changes so you don’t need to worry about that: you create sessions and handles the same way as before, and the same applies to negotiating and updating media sessions via SDP (where you may now negotiate more streams than before, obviously). The main changes in Janus itself, as a consequence, are in how you talk to a plugin (the plugin-specific APIs, that is), and in some cases how you configure some of the plugin resources in configuration files (e.g., creating Streaming plugin mountpoints). At the same time, since WebRTC clients may have to work with PeerConnections that include multiple streams of the same type, the janus.js library was updated as well, which introduced some changes in the methods and callbacks: some changed name, while others only had a change in signature to address specific streams.
Considering the main purpose of this post is just introducing this new feature from a high level perspective, I won’t go in detail on the changes (you may want to refer to the updated plugins documentation and the updated demos for that), but I’ll summarize some of the key changes.
As anticipated, Streaming plugin mountpoints can now include multiple audio/video streams, rather than just one per type as before. This means that, while the legacy syntax for creating mountpoints (statically or dynamically) is still supported, that one will be limited to the creation of mountpoints of the old kind: in case you want to create a multistream mountpoint, you’ll need to use the new syntax instead, which is based on providing an array of streams you want the mountpoint to serve, with info on each stream in the respective index. As you can see looking at the sample in the configuration file or the plugin documentation, the new format is quite straightforward, and much more flexible than the one we had before. The online demos do come with a multistream mountpoint (called “Multistream test” in the list), if you want to check how it works in practice, and the repo also contains a sample script called test_gstreamer1_multistream.sh to demonstrate how you can feed such a multistream mountpoint from external applications like GStreamer.
Of course, just as in the previous version of Janus, subscribers can choose whether they want to subscribe to all the streams in a mountpoint, or only a subset of them.
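As a rough idea of what “an array of streams” may look like when creating a mountpoint dynamically, here is a sketch that builds such a request body. The `buildMultistreamMountpoint` helper is made up, and the property names (`media`, `type`, `mid`, `label`, `port`, `pt`, `codec`), ports and payload types are assumptions for illustration: please check the Streaming plugin documentation and the sample configuration file for the actual syntax.

```javascript
// Hypothetical helper: builds the body of a "create" request for an RTP
// mountpoint that serves several streams. Property names are assumptions
// to be verified against the Streaming plugin documentation.
function buildMultistreamMountpoint(id, description, streams) {
  return {
    request: "create",
    type: "rtp",
    id,
    description,
    // One entry per stream the mountpoint should serve
    media: streams.map((s) => ({
      type: s.type,   // "audio" or "video"
      mid: s.mid,     // identifier subscribers can use to address the stream
      label: s.label, // human-readable name
      port: s.port,   // local port the RTP feed is expected on
      pt: s.pt,       // RTP payload type
      codec: s.codec,
    })),
  };
}

// Made-up example: one audio stream and two cameras
const mountpointRequest = buildMultistreamMountpoint(123, "Dual camera test", [
  { type: "audio", mid: "a", label: "Audio", port: 5102, pt: 111, codec: "opus" },
  { type: "video", mid: "v1", label: "Camera #1", port: 5104, pt: 100, codec: "vp8" },
  { type: "video", mid: "v2", label: "Camera #2", port: 5106, pt: 100, codec: "vp8" },
]);

console.log(JSON.stringify(mountpointRequest, null, 2));
```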
The VideoRoom plugin is the one that saw the most changes as part of the multistream effort, and that was to be expected, considering it implements the SFU functionality so many developers rely upon. Just like the Streaming plugin, the VideoRoom plugin also preserves the legacy syntax to publish and subscribe in a room, but again, just as in the Streaming plugin, this will result in a limited experience, since it will prevent you from taking advantage of multiple streams per PeerConnection. In order to fully leverage the new functionality, you should learn how to use the new API instead, which was conceived to be much more flexible and verbose as well (e.g., in order to somehow “describe” the different streams a participant may be publishing).
Specifically, since as anticipated the new VideoRoom version starts from the assumption that a single PeerConnection can be used by a participant to send whatever they want, and another single PeerConnection can be used to receive whatever they want, this is reflected in the two different subsets of APIs that are now available:
- publishers can publish media exactly as before (so via “join”, “joinandconfigure”, “configure” and/or “publish”) but with a twist, as they can now also provide a “description” array that contains a verbose description of the streams they’re publishing, indexed by the respective ID: this allows publishers, for instance, to tag the first video stream as “My webcam”, and another one as “Screen” instead, so that interested subscribers are aware of what each stream refers to, and possibly use this information for presentation purposes as well;
- subscribers, instead, now use two new different API requests after a “join”, called “subscribe” and “unsubscribe”: these methods can use any combination of publisher IDs and mid attributes to selectively subscribe to whatever they’re interested in, from a single publisher or more than one; any call to either may or may not result in an updated SDP offer coming from Janus, depending on whether the request resulted in a media change in the first place (to avoid glare, multiple changes could be sent in a single SDP update, e.g., in case several changes were requested in a short time and in between updates); the related events will also include information related to the stream they refer to, if provided by publishers.
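The two sets of requests described above could be sketched as follows. The helpers are made up for the example, and the property names (`descriptions`, `streams`, `feed`, `mid`) are assumptions to verify against the VideoRoom plugin documentation; the request names are the ones mentioned above.

```javascript
// Hypothetical helper: a "publish" request that tags each published stream
// (identified by its mid) with a human-readable description.
function buildPublish(labelsByMid) {
  return {
    request: "publish",
    descriptions: Object.entries(labelsByMid).map(([mid, description]) => ({ mid, description })),
  };
}

// Hypothetical helper: a "subscribe" or "unsubscribe" request. Each item names
// a publisher (feed) and, optionally, one specific stream of theirs (mid).
function buildSubscriptionUpdate(kind, wanted) {
  if (kind !== "subscribe" && kind !== "unsubscribe") {
    throw new Error("kind must be 'subscribe' or 'unsubscribe'");
  }
  return {
    request: kind,
    streams: wanted.map(({ feed, mid }) => (mid === undefined ? { feed } : { feed, mid })),
  };
}

// A publisher sending a webcam and a screen share
const publishRequest = buildPublish({ "0": "My webcam", "1": "Screen" });

// A subscriber asking for everything from publisher 1001,
// but only the stream with mid "1" from publisher 1002
const subscribeRequest = buildSubscriptionUpdate("subscribe", [
  { feed: 1001 },
  { feed: 1002, mid: "1" },
]);

console.log(JSON.stringify(publishRequest));
console.log(JSON.stringify(subscribeRequest));
```

Depending on whether the request actually changes the media, the answer from Janus may or may not carry a new SDP offer, so a client should be ready for both.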
While familiarizing yourself with these new API concepts may take a minute, you’ll soon find out that they’re much more flexible than the ones we had before. Make sure you check the updated documentation, and have a look at the new demo we created (the old one still exists, and they’re interoperable) that implements a simple video conference using multistream.
janus.js and demos
As anticipated, in order to make sure web applications could somehow have access to multiple streams, we had to change janus.js a bit, and as a consequence update all demos accordingly. The changes were mostly cosmetic in most demos, and a bit deeper in others, depending on whether or not they actually did make use of this new multistream functionality in the first place.
The biggest change is that the old onremotestream callback is gone, and has been replaced by onremotetrack instead: it is invoked any time a track is added/removed, and provides the mid that addresses the specific track in the m-line. Other methods and callbacks, instead, simply have a slightly different signature, often with just a single additional mid attribute to reference the exact track they refer to: you may want to check the changes in the existing demos to learn more (they’ll be easy to spot, especially in demos that didn’t see other changes).
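A per-track callback lends itself to keeping a simple map of what is currently being received. The sketch below assumes the callback is passed the track, its mid, and a flag telling whether the track was added or removed, in that order: that argument order and the `createRemoteTrackRegistry` helper are assumptions for illustration, so check the janus.js documentation and the demos for the real signature.

```javascript
// Hypothetical helper: keeps a mid -> track map in sync with track events.
// Assumed callback arguments: (track, mid, on), where "on" is true when the
// track was added and false when it was removed.
function createRemoteTrackRegistry() {
  const tracks = new Map();
  return {
    tracks,
    // Written as a closure (no "this") so it can be handed over as a plain callback
    onremotetrack(track, mid, on) {
      if (on) {
        tracks.set(mid, track);
      } else {
        tracks.delete(mid);
      }
    },
  };
}

// Simulated events, using plain objects in place of real MediaStreamTracks
const registry = createRemoteTrackRegistry();
registry.onremotetrack({ kind: "audio", id: "track-a" }, "0", true);
registry.onremotetrack({ kind: "video", id: "track-v1" }, "1", true);
registry.onremotetrack({ kind: "video", id: "track-v2" }, "2", true);
registry.onremotetrack({ kind: "video", id: "track-v1" }, "1", false);

console.log([...registry.tracks.keys()]);
```

In a real page, `registry.onremotetrack` would be what you pass as the onremotetrack callback, and each add/remove would typically also create or tear down the matching audio/video element.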
One thing that is still missing, though, is the ability to capture multiple streams via janus.js, and/or add/remove them easily via some helper functions. While it’s one of the things we plan to fix next, if you want to do that in the meanwhile you’ll have to mess with transceivers manually yourself, and use the PeerConnection primitives to trigger updates manually as well.
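As a hint of what “messing with transceivers manually” may involve, the sketch below adds one send-only transceiver per extra track, using the standard `addTransceiver` method of `RTCPeerConnection`. The `addSendOnlyTracks` helper is made up, and how you get hold of the PeerConnection and of the tracks is left out on purpose; a stub object stands in for the PeerConnection so the snippet can run outside a browser.

```javascript
// Hypothetical helper: adds one send-only transceiver per track to an existing
// PeerConnection, and returns the transceivers that were created.
function addSendOnlyTracks(pc, tracks) {
  return tracks.map((track) => pc.addTransceiver(track, { direction: "sendonly" }));
}

// Stub standing in for a real RTCPeerConnection, only for this example:
// it records the calls and returns a minimal transceiver-like object.
const stubPc = {
  calls: [],
  addTransceiver(track, init) {
    this.calls.push({ track, init });
    return { sender: { track }, direction: init.direction };
  },
};

// Plain objects in place of real MediaStreamTracks (e.g., a second camera and a screen)
const added = addSendOnlyTracks(stubPc, [
  { kind: "video", label: "Second camera" },
  { kind: "video", label: "Screen" },
]);

console.log(added.map((t) => `${t.sender.track.label}: ${t.direction}`));
```

Adding transceivers changes what needs to be negotiated, so in a real application this would be followed by a new offer (`createOffer` and `setLocalDescription` on the PeerConnection) sent to the plugin as an update.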
What if I still want to use the previous version of Janus?
No problem there! We’re perfectly aware that the 0.x version of Janus is still the one the vast majority of developers, individuals and companies are relying upon for their existing implementations, and so we never even thought of pulling the rug. As such, we decided to keep the previous version of Janus very much alive, but in a separate branch, called 0.x. If you were pulling code from Janus using the master branch before, you’ll want to change your scripts to refer to the 0.x branch instead; if you were using tagged versions, instead, then there’s nothing you need to do, as tagged versions will work pretty much as they always did.
We also plan to keep the 0.x branch updated with respect to bug fixes and occasional enhancements. Please notice, though, that from this day our main focus will be the new code based on multistream, which means that if you’re excited about new features, that’s where they’ll definitely be done first, and that while we may backport them to the 0.x branch as well subsequently, it may not happen right away. As such, I definitely encourage you to start looking into the multistream version as soon as possible as well: if not to migrate right away, at the very least to familiarize yourself with its slightly different APIs, and get a taste of all the enhancements.
Now that the multistream branch has been merged and we don’t have to worry about conflicting branches anymore, there’s a lot of things we plan to do next. Some are changes that will not impact Janus itself, but will make life easier for us; others will provide interesting new functionality instead.
Just off the top of my head, a short list of possible upcoming changes may be the following:
- adding support for capturing multiple tracks in janus.js (which is indeed sorely missing in that regard, at the moment);
- transparently bridge VideoRoom instances on different Janus servers, to expose remote VideoRoom publishers as if they were local ones (which will make SFU distribution across different Janus instances easier than how it works today, where we need to actively orchestrate VideoRoom and Streaming plugin resources, and track them externally);
- improve the existing (and partial) support we recently added for, e.g., RED and AV1-SVC;
- possibly add support for additional functionality like ulpfec (since it’s based on RED we might re-use part of the code we have already, and with FlexFEC still apparently so far away it might give us a temporary if not browser-specific option to add a bit of robustness to video streams);
- rename all of our LOG_XXX defines to something like JANUS_XXX, to avoid conflicts with frameworks (e.g., rsyslog) that we may want to integrate in the future;
- move all the C code to a src folder to clean up the hopelessly cluttered root folder of the repo;
- extend this new multistream support to our Janode SDK as well, where support is currently limited.
These are just a few things I happened to think about, but I’m sure many more things will happen in the Janus future now that we have a more flexible architecture in place. Heck, I’m sure most of the ideas will come from you, our beloved Janus users, in the first place!
That’s all, folks!
I hope you enjoyed this short summary of what was actually a very long journey and a HUGE effort. Hopefully it will encourage you to start tinkering with the new version of Janus for your application, and give you ideas to build even cooler things than you did before!