

AI-Powered Speech Recognition and Translation for Live Streaming

Use automated closed captioning and subtitling to improve the live streaming experience


Live streaming delivers not only video and audio but also closed captions and subtitles, which are simply transcribed text synchronized with the audio. Typically, captions are generated at the source and delivered to viewers via media servers. However, you may want to generate them on the fly using modern AI technologies. Speech-to-text is now supported in Nimble Streamer for your live streaming scenarios.

Voice recognition engine

Why is closed captioning important?

An estimated 1.5 billion people worldwide have some degree of hearing loss, which makes watching videos a challenge for them. Adding captions improves their quality of life and lets them enjoy the same content experience as everyone else. That’s why many countries require broadcasters and streaming companies to add closed captions, through laws such as the Americans with Disabilities Act (ADA), the European Accessibility Act (EAA) and others. In many cases, closed captions are therefore a core requirement for broadcasting and streaming.

Besides the humanitarian aspect, captions are just convenient. Many people simply prefer to read captions, such as when watching a video in a crowded environment where they cannot turn up the volume without disturbing others.

Example of closed captions in a movie.

Translation for live streaming subtitles

Another challenge closely related to captions is subtitling in different languages.

There are more than 7,000 languages, and only about 20% of the world’s population speaks English. If you want your content to reach an audience beyond your home country, you need to translate it into the languages your viewers speak. This is why you need real-time translation to expand your viewership.

Translation is a relatively well-understood task for VOD content, but adding it to live streaming is quite a challenge.

AI-based speech-to-text and translation with Nimble Streamer

Nimble Streamer can help you solve these problems using several approaches:

  • Use the Whisper AI automatic speech recognition (ASR) model with your Nimble Streamer instance.
  • Integrate the Speechmatics STT service into your workflow for transcription and translation.
  • Integrate the KWIKmotion AI Capture service into your workflow for transcription and translation.

All you need to do is follow these simple steps.

  • Set up live input and output as you normally do for your streams.
  • Enable AI speech-to-text processing for the designated streams with the respective ASR engine.
  • If you configure it, automatic translation is performed as well.
  • Use a player which can present WebVTT subtitles in your website or app.
  • Deliver your content via HLS as usual.

The output HLS stream will carry all the data needed to display closed captions, and your player will pick it up, so your viewers will have a great viewing experience.
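One way to confirm that an output stream really advertises captions is to look for subtitle renditions in its HLS multivariant playlist. The sketch below is only an illustration and is not part of Nimble Streamer. It assumes the subtitles are signalled with the standard `#EXT-X-MEDIA:TYPE=SUBTITLES` tag; the sample playlist, its URIs and the function names are hypothetical.

```python
import re

# Hypothetical multivariant playlist; a real one comes from your stream URL.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",DEFAULT=YES,URI="subs_en/chunks.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Spanish",LANGUAGE="es",DEFAULT=NO,URI="subs_es/chunks.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=2000000,CODECS="avc1.64001f,mp4a.40.2",SUBTITLES="subs"
video/chunks.m3u8
"""

# Matches KEY=VALUE pairs where VALUE is either quoted or a bare token.
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attributes(attribute_list):
    """Turn an HLS attribute list into a dict, stripping surrounding quotes."""
    return {key: value.strip('"') for key, value in ATTRIBUTE.findall(attribute_list)}


def find_subtitle_renditions(playlist_text):
    """Return the attributes of every EXT-X-MEDIA tag with TYPE=SUBTITLES."""
    prefix = "#EXT-X-MEDIA:"
    renditions = []
    for line in playlist_text.splitlines():
        if not line.startswith(prefix):
            continue
        attributes = parse_attributes(line[len(prefix):])
        if attributes.get("TYPE") == "SUBTITLES":
            renditions.append(attributes)
    return renditions


if __name__ == "__main__":
    for rendition in find_subtitle_renditions(SAMPLE_PLAYLIST):
        print(rendition.get("LANGUAGE"), rendition.get("NAME"), rendition.get("URI"))
```

If the list comes back empty for your own playlist, the player has no subtitle track to offer, so the speech-to-text settings for that stream are the first thing to check.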

Closed captions of a live stream from Nimble Streamer
processed by Whisper as shown in THEOPlayer

Start now

Follow these steps to add closed captions and translation to your live streams.

  1. Create a WMSPanel account and subscribe to it.
  2. Install Nimble Streamer on an Ubuntu 24.04 server with an NVIDIA graphics card.
  3. Create a Live Transcoder license and register it on your Nimble instance via the panel UI.
  4. Create an Addenda license and register it on your Nimble instance via the panel UI.
  5. Follow the setup instructions for your ASR engine of choice: Whisper, Speechmatics or KWIKmotion AI Capture.
  6. Apply these instructions to enable CEA-708 subtitles if needed.
  7. After the setup is done, your designated output streams will have closed captions and/or subtitles in them.

That’s it, you can now use the power of AI to improve your viewers’ experience.

Transcription pricing

The pricing model is simple: it is based on WMSPanel pricing, and the starting price is very affordable.

In order to get the transcription running in your WMSPanel account (basic price is 20 USD/m) with your Nimble Streamer instance (50 USD/m), you need a Live Transcoder license (50 USD/m) and an Addenda package license (50 USD/m).
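As a rough worked example, the figures above can be added up into a monthly baseline. This is a sketch, not an official quote. It assumes a single Nimble Streamer instance with one license of each kind, and it assumes the listed items simply add up with no bundling or discounts. Third-party ASR service fees are not included.

```python
# Monthly prices in USD, copied from the paragraph above.
MONTHLY_COSTS_USD = {
    "WMSPanel account (basic)": 20,
    "Nimble Streamer instance": 50,
    "Live Transcoder license": 50,
    "Addenda package license": 50,
}


def monthly_total(costs):
    """Sum the line items; assumes they are simply additive."""
    return sum(costs.values())


if __name__ == "__main__":
    for item, price in MONTHLY_COSTS_USD.items():
        print(f"{item}: {price} USD/month")
    print(f"Total: {monthly_total(MONTHLY_COSTS_USD)} USD/month")
```

Under these assumptions the baseline comes to 170 USD per month for one instance.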

If you use the Speechmatics or KWIKmotion services, their respective pricing applies to the transcription process, so please refer to these service providers for exact quotes. Softvelum is not affiliated with either of those companies.

Let us know if you have any feedback or issues when using our recognition features.

Nimble Streamer uses the Whisper.cpp library and model, which are available under the MIT license.