Efficient tools to build your streaming infrastructure


Automated voice recognition setup in Nimble Streamer

The Nimble Streamer team has recently introduced speech recognition and translation for live streaming, powered by the Whisper.cpp AI engine. It’s an important feature set that makes live content more accessible to any audience and improves the overall user experience.

In this article, we’ll describe how to enable and set up automatic voice recognition and translation in Nimble Streamer. The transcription result is delivered as WebVTT subtitles in the HLS output stream.

Note that Nimble Streamer has full support for all types of subtitles and closed captions.

Prerequisites

In order to enable and set up transcription in Nimble Streamer, you need to have the following.

  1. A WMSPanel account with an active subscription.
  2. Nimble Streamer installed on Ubuntu 24.04 and registered in WMSPanel. Other OSes and versions will be supported later.
  3. The server running Nimble Streamer must have an NVIDIA graphics card with proper drivers installed.
  4. Live Transcoder installed, with its license activated and registered on your Nimble Streamer instance. You can do this easily via the panel UI.
  5. An Addenda license activated and registered on your Nimble Streamer instance.
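Before installing, you can sanity-check the OS and GPU prerequisites from the shell. This is a minimal sketch: it only confirms that the NVIDIA driver tooling is present and reports the Ubuntu release, it does not validate the WMSPanel or license prerequisites.

```shell
# Check whether the NVIDIA driver is installed (nvidia-smi ships with it).
gpu_status=$(command -v nvidia-smi >/dev/null 2>&1 && echo yes || echo no)
echo "NVIDIA driver installed: $gpu_status"

# Report the OS release (24.04 is the supported Ubuntu version at the moment).
os_release=$(. /etc/os-release 2>/dev/null && echo "$VERSION_ID")
echo "OS release: ${os_release:-unknown}"
```

If the first line prints “no”, install the NVIDIA drivers before proceeding with the transcriber setup.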

Installation

In order to add the Whisper.cpp transcription engine and model, run the following commands to install the package and restart the Nimble instance.

sudo apt install nimble-transcriber
sudo service nimble restart

Enable recognition for live streams

Once your Nimble instance has the speech recognition package, you may enable transcription for that server as a whole as well as for any particular live stream.

To enable transcription, go to the Nimble Streamer top menu, click Live Streams Settings and select the server where you want to enable transcription.

Select the particular output application where you’d like to enable transcription, or create a new app setting.

Check the Generate WebVTT for audio checkbox and save the settings.

If you’d like to enable transcription at the server level instead, open the Global tab, enable the same option and save the settings.

After you apply the settings, you need to restart the input stream. Once the restarted input is picked up by the Nimble instance, the output HLS stream will have WebVTT subtitles carrying the transcribed closed captions.
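You can confirm the subtitles made it into the output by inspecting the HLS master playlist. The URL in the comment below is a placeholder for your own server, application and stream names; when WebVTT generation is enabled, the playlist advertises a subtitle rendition via a standard EXT-X-MEDIA tag of TYPE=SUBTITLES. Here we grep a saved sample playlist to illustrate what to look for:

```shell
# Fetch your master playlist (placeholder URL, adjust to your setup):
#   curl -s http://your-server:8081/live/mystream/playlist.m3u8
# A sample playlist with a WebVTT subtitle rendition looks like this:
cat > /tmp/sample_playlist.m3u8 <<'EOF'
#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="captions",DEFAULT=YES,URI="subs/playlist.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=2000000,SUBTITLES="subs"
chunks.m3u8
EOF
# Count subtitle renditions advertised in the playlist:
grep -c 'TYPE=SUBTITLES' /tmp/sample_playlist.m3u8   # prints 1
```

If the grep over your real playlist returns 0, double-check that the Generate WebVTT option is enabled for the right application and that the input stream was restarted.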

Enable translation

The Whisper.cpp library allows not only transcribing the stream but also translating it into another language. The server and live app settings are the same as described above. However, you need to add a new parameter to the nimble.conf file:

whisper_language = "<language code>"

A two-letter language code is used, such as “en”, “es” or “fr”.

The language you define in this parameter will be used for transcription and further translation of all streams that have voice recognition enabled. So if your source is in English and you set the language to “en”, only the transcription process will run. If the parameter is different – e.g. “es” – then transcription will be followed by translation. This way, all output streams’ subtitles will be in the respective language, in this case Spanish.
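For example, to have every voice-recognized stream transcribed and then translated into Spanish, the nimble.conf entry would look like this:

```
whisper_language = "es"
```

With an English source and this setting, Whisper.cpp transcribes the audio and then translates the captions into Spanish before they are emitted as WebVTT.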

At the moment, only one language can be set at the server level. Nimble supports all languages supported by Whisper.

Also, don’t forget to restart the Nimble instance to make the parameter work:

sudo service nimble restart

Additional Nimble config file options

Besides the language parameter, you may also define the following settings in the nimble.conf file.

transcriber_stream_limit – defines the maximum number of input streams that Nimble will process with transcription. The default is 10.

whisper_model_path – if you use a model other than the basic one, you may point the Nimble instance to it with this parameter.
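Putting the options together, a nimble.conf fragment might look like the following. The model path is purely illustrative – point it to wherever the model file actually lives on your server:

```
whisper_language = "en"
transcriber_stream_limit = 20
whisper_model_path = "/opt/whisper/models/ggml-medium.bin"
```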

Please restart the Nimble instance once you make any changes to the config.

Visit the nimble.conf file page for more details about other parameters.

Please let us know if you have any questions, issues or suggestions about our voice recognition feature set.