AI-Powered Speech Recognition and Translation for Live Streaming

Use automated closed captioning and subtitling to improve live streaming experience


Live streaming delivers not only video and audio but also closed captions and subtitles, which are simply transcribed text synchronized with the audio. Typically, captions are generated at the source and delivered to viewers via media servers. However, you may want to generate them on the fly using modern AI technologies. Nimble Streamer now supports speech-to-text for your live streaming scenarios.

Why is closed captioning important?

An estimated 1.5 billion people worldwide have some degree of hearing loss, which makes watching videos challenging for them. Adding captions improves these individuals’ quality of life and lets them enjoy the same content experience as others. That’s why many countries require broadcasters and streaming companies to add closed captions under laws such as the Americans with Disabilities Act (ADA) and the European Accessibility Act (EAA). In many cases, closed captions are therefore a core requirement for broadcasting and streaming.

Besides the humanitarian aspect, captions are just convenient. Many people simply prefer to read captions, such as when watching a video in a crowded environment where they cannot turn up the volume without disturbing others.

Example of closed captions in a movie.

Translation for live streaming subtitles

Another challenge closely related to captions is subtitling in different languages.

There are more than 7,000 languages, and only about 20% of the world’s population speaks English. If you want your content to reach an audience beyond your home country, you need to translate it into the languages your viewers speak. This is why real-time translation helps expand your viewership.

Translation is a relatively well-understood task for VOD content, but adding it to live streaming is quite a challenge.

AI-based speech-to-text and translation with Nimble Streamer

Nimble Streamer can help you solve these problems using the Whisper AI automatic speech recognition (ASR) model. All you need to do is follow these simple steps.

  • Set up live input and output as you normally do for your streams.
  • Enable AI speech-to-text processing for the designated streams.
  • Use a player that can present WebVTT subtitles on your website or in your app.
  • Deliver your content via HLS as usual.

The output HLS stream will carry all the data needed to display closed captions, and your player will pick it up, giving your viewers a great viewing experience.
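To illustrate what a player picks up, here is a minimal Python sketch that lists the subtitle renditions declared in an HLS master playlist. It assumes the captions are announced with standard EXT-X-MEDIA tags of TYPE=SUBTITLES pointing to WebVTT playlists. The sample playlist, its URIs, group ID and track names are hypothetical and are not copied from a real Nimble Streamer output.

```python
import re

# Hypothetical master playlist: URIs, group ID and names are illustrative only,
# not taken from a real Nimble Streamer output.
MASTER_PLAYLIST = """#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",AUTOSELECT=YES,URI="subs_en/playlist.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Spanish",LANGUAGE="es",AUTOSELECT=YES,URI="subs_es/playlist.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=2500000,CODECS="avc1.64001f,mp4a.40.2",SUBTITLES="subs"
video/playlist.m3u8
"""

# Matches KEY=value pairs where the value is either quoted or a bare token.
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def subtitle_tracks(playlist: str) -> list[dict[str, str]]:
    """Return the attributes of every SUBTITLES rendition in a master playlist."""
    tracks = []
    for line in playlist.splitlines():
        if not line.startswith("#EXT-X-MEDIA:"):
            continue
        attribute_text = line.split(":", 1)[1]
        attrs = {key: value.strip('"') for key, value in ATTRIBUTE.findall(attribute_text)}
        if attrs.get("TYPE") == "SUBTITLES":
            tracks.append(attrs)
    return tracks


if __name__ == "__main__":
    for track in subtitle_tracks(MASTER_PLAYLIST):
        print(track["LANGUAGE"], track["NAME"], track["URI"])
```

A check like this can help confirm that a stream actually advertises caption tracks before you start debugging the player itself.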

Closed captions of a live stream from Nimble Streamer as shown in THEOPlayer

Transcription pricing

The pricing model is simple and based on WMSPanel pricing, and the starter price is very affordable.

To get transcription running, you need a WMSPanel account (the basic price is 20 USD per month) with a Nimble Streamer instance (50 USD per month), plus a Nimble Live Transcoder license for 50 USD and an Addenda package license for 50 USD.

This makes a starter price of 170 USD.
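The starter price is simply the sum of the four components listed above. As a small sketch, the arithmetic can be written out in Python. The dictionary keys and the function name are chosen here for illustration only.

```python
# Starter components in USD, as listed in the pricing paragraph above.
# The labels are illustrative and are not official product identifiers.
STARTER_COMPONENTS = {
    "WMSPanel basic subscription": 20,
    "Nimble Streamer instance": 50,
    "Nimble Live Transcoder license": 50,
    "Addenda package license": 50,
}


def starter_price(components: dict[str, int]) -> int:
    """Add up the cost of every component in USD."""
    return sum(components.values())


if __name__ == "__main__":
    for name, price in STARTER_COMPONENTS.items():
        print(f"{name}: {price} USD")
    print(f"Total: {starter_price(STARTER_COMPONENTS)} USD")
```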

Speech-to-Text performance

Speech recognition is a heavy-duty task which requires a lot of computing resources.

At the moment, our ASR implementation, which uses the Whisper base model, works only with NVidia GPUs, which can handle the processing this demanding task requires.

We’ve run some tests to see what the Nimble Streamer engine can achieve. Each of the following GPUs can process the listed number of input streams into HLS output with closed captioning:
  • NVidia GeForce RTX3070 can process 17 input streams.
  • NVidia GeForce RTX4050 can process 10 input streams.
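
For rough capacity planning, the test figures above can be turned into a GPU count estimate. The Python sketch below assumes that capacity scales linearly with the number of identical GPUs, which the tests above do not confirm, so treat the result as a starting point rather than a guarantee. The function and dictionary names are hypothetical.

```python
import math

# Input streams per GPU, taken from the test results listed above.
STREAMS_PER_GPU = {
    "GeForce RTX3070": 17,
    "GeForce RTX4050": 10,
}


def gpus_needed(input_streams: int, gpu_model: str) -> int:
    """Estimate how many GPUs of one model are needed for the given streams.

    Assumption: capacity scales linearly with the number of GPUs.
    """
    if input_streams < 0:
        raise ValueError("input_streams must not be negative")
    return math.ceil(input_streams / STREAMS_PER_GPU[gpu_model])


if __name__ == "__main__":
    for model in STREAMS_PER_GPU:
        print(f"40 streams on {model}: {gpus_needed(40, model)} GPU(s)")
```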

We’ll share more details as we run more tests on other hardware.

Start now

Here are the steps you need to follow to add closed captions and translation to your live streams.

  1. Create a WMSPanel account and subscribe to it.
  2. Install Nimble Streamer on an Ubuntu 24.04 server with an NVidia graphics card.
  3. Create a Live Transcoder license and register it on your Nimble instance via the panel UI.
  4. Create an Addenda license and register it on your Nimble instance via the panel UI.
  5. Follow the setup instructions.
  6. After the setup is done, your designated output streams will have closed captions and/or subtitles in them.

That’s it. You can now use the power of AI to improve your viewers’ experience.

Let us know if you have any feedback or issues when using our recognition features.

Nimble Streamer uses the Whisper.cpp library and the Whisper model, both available under the MIT license.