# whisper model parameters
This project uses OpenAI's open-source whisper model for Automatic Speech Recognition (ASR) tasks.
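As a rough illustration (not the project's actual invocation), this is the kind of call whisper's Python API exposes for ASR; the file path and model size below are placeholders:

```python
import whisper

# Load one of the model sizes listed in the table below, e.g. "small".
model = whisper.load_model("small")

# Transcribe an audio file; "audio.wav" is a placeholder path.
result = model.transcribe("audio.wav")
print(result["text"])
```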
## Model information
The basic parameters and download links of each model are listed below; note that your GPU VRAM must be greater than the model's required VRAM:
TIP
If you pursue recognition accuracy, it is recommended to use the `small` model or a larger one.
| Size | Parameters | Multilingual model | Required VRAM |
|---|---|---|---|
| tiny | 39 M | tiny | ~1 GB |
| base | 74 M | base | ~1 GB |
| small | 244 M | small | ~2 GB |
| medium | 769 M | medium | ~5 GB |
| large | 1550 M | large | ~10 GB |
## Calculate VRAM requirements
When an NVIDIA GPU is used to accelerate ffmpeg rendering, each rendering task requires approximately 180 MB of VRAM. The VRAM required by the whisper model itself is shown in the table above. From these two figures you can roughly estimate the total VRAM needed.
For example, using the `small` model:

- If using the `pipeline` mode, since tasks run in parallel, at least 180 + 2620 = 2800 MB of VRAM is required.
- If using the `append` or `merge` mode, at least 2620 MB of VRAM is required.
WARNING
Please ensure that the GPU VRAM is greater than the calculated result; otherwise the VRAM will be exhausted, resulting in `RuntimeError: CUDA out of memory`.
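If you want to verify this before a run, one option (assuming PyTorch is installed, which whisper requires anyway) is to query the free VRAM first; `required_mb` below is just the figure calculated above:

```python
import torch

# (free, total) VRAM in bytes on the first CUDA device.
free_bytes, _total_bytes = torch.cuda.mem_get_info(0)
free_mb = free_bytes / 1024 ** 2

required_mb = 2800  # e.g. the `small` model in `pipeline` mode
if free_mb < required_mb:
    print(f"Only {free_mb:.0f} MB free; expect 'CUDA out of memory'.")
```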
## Changing the model
- Set the `Inference_Model` parameter in the `bilive.toml` file to the corresponding model size name, such as `tiny`, `base`, `small`, `medium`, or `large` (see the sketch after this list).
- Download the corresponding model file and place it in the `src/subtitle/models` folder.
- Re-run the `./scan.sh` script.
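For illustration, the relevant line in `bilive.toml` could look like the snippet below; the exact section the key lives in is not shown here, so treat this as a sketch rather than the file's real layout:

```toml
# bilive.toml — whisper model size used for subtitle inference
Inference_Model = "small"
```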