
Whisper model parameters

This project uses OpenAI's open-source Whisper model for Automatic Speech Recognition (ASR) tasks.

Model information

The basic parameters of each model are as follows. Note that your GPU's VRAM must exceed the listed required VRAM:

TIP

If recognition accuracy is a priority, it is recommended to use the small model or larger.

| Size   | Parameters | Multilingual model | Required VRAM |
|--------|------------|--------------------|---------------|
| tiny   | 39 M       | tiny               | ~1 GB         |
| base   | 74 M       | base               | ~1 GB         |
| small  | 244 M      | small              | ~2 GB         |
| medium | 769 M      | medium             | ~5 GB         |
| large  | 1550 M     | large              | ~10 GB        |
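For reference, the size names in the table match the identifiers used by OpenAI's openai-whisper Python package; whether bilive calls that package directly or another Whisper implementation is not stated here. A minimal usage sketch (the audio path is a placeholder):

```python
import whisper

# Load one of the sizes listed in the table above; "small" is used as an example.
model = whisper.load_model("small")

# Transcribe a placeholder audio file and print the recognized text.
result = model.transcribe("audio.wav")
print(result["text"])
```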

Calculate VRAM requirements

When an NVIDIA GPU is used to accelerate ffmpeg rendering, each rendering task requires approximately 180 MB of VRAM. The VRAM required by the Whisper model is shown in the table above.

Therefore, you can roughly calculate the total VRAM requirement (a short sketch of this calculation follows the example below).

For example, using the small model:

  • If using the pipeline mode, rendering and ASR run in parallel, so at least 180 + 2620 = 2800 MB of VRAM is required (2620 MB being the practical usage of the small model).
  • If using the append or merge mode, at least 2620 MB of VRAM is required.
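As a rough sketch of this arithmetic (the per-model figures come from the table and example above; the function itself is illustrative and not part of bilive):

```python
# Rough VRAM estimator based on the figures in this section.
WHISPER_VRAM_MB = {
    "tiny": 1024,
    "base": 1024,
    "small": 2620,      # practical figure used in the example above
    "medium": 5 * 1024,
    "large": 10 * 1024,
}

FFMPEG_TASK_VRAM_MB = 180  # per rendering task when GPU-accelerated


def estimate_vram_mb(model: str, mode: str, render_tasks: int = 1) -> int:
    """Return a rough lower bound on VRAM (MB) for a given model and mode."""
    whisper_mb = WHISPER_VRAM_MB[model]
    if mode == "pipeline":
        # Rendering and ASR run in parallel, so their usage adds up.
        return render_tasks * FFMPEG_TASK_VRAM_MB + whisper_mb
    # append / merge: only the Whisper model needs to fit at once.
    return whisper_mb


print(estimate_vram_mb("small", "pipeline"))  # 2800
print(estimate_vram_mb("small", "append"))    # 2620
```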

WARNING

Please ensure that the available GPU VRAM exceeds the calculated requirement; otherwise VRAM will be exhausted and the run will fail with RuntimeError: CUDA out of memory.
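If PyTorch is available in your environment, you can check in advance whether the calculated amount fits by querying the free VRAM. A minimal sketch; the 2800 MB threshold is just the pipeline-plus-small example from above:

```python
import torch

# Query free and total VRAM (in bytes) on the default CUDA device.
free_bytes, total_bytes = torch.cuda.mem_get_info()
free_mb = free_bytes / (1024 ** 2)

# 2800 MB is the pipeline-mode + small-model example from this section.
required_mb = 2800
if free_mb < required_mb:
    print(f"Only {free_mb:.0f} MB free; expect 'CUDA out of memory' errors.")
else:
    print(f"{free_mb:.0f} MB free, which should cover ~{required_mb} MB.")
```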

How to change the model

  1. Set the Inference_Model parameter in the bilive.toml file to the desired model size name, e.g. tiny, base, small, medium, or large.
  2. Download the corresponding model file and place it in the src/subtitle/models folder (a download sketch follows this list).
  3. Re-run the ./scan.sh script.
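For step 2, one way to obtain the model file is to let the openai-whisper package download it into the expected folder via its download_root argument. A minimal sketch; that bilive accepts this file layout in src/subtitle/models is an assumption:

```python
import whisper

# Download the chosen model's weights (e.g. small.pt) into the folder from
# step 2 above. Loading on CPU here just triggers the download without
# needing GPU memory.
whisper.load_model("small", device="cpu", download_root="src/subtitle/models")
```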