# Whisper model parameters
This project uses OpenAI's open-source Whisper model for Automatic Speech Recognition (ASR) tasks.
## Model information
The basic parameters of the available models are listed below; note that your GPU's VRAM must be greater than the model's required VRAM (a quick check sketch follows the table):
> **TIP**
> For better recognition accuracy, it is recommended to use the small model or larger.
| Size | Parameters | Multilingual model | Required VRAM |
|---|---|---|---|
| tiny | 39 M | tiny | ~1 GB |
| base | 74 M | base | ~1 GB |
| small | 244 M | small | ~2 GB |
| medium | 769 M | medium | ~5 GB |
| large | 1550 M | large | ~10 GB |
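
If you want to check programmatically whether your GPU meets the requirement, a minimal sketch with PyTorch might look like the following. The mapping simply mirrors the table above, and the helper name is illustrative, not part of this project:

```python
import torch

# Approximate VRAM requirements from the table above, in GB (illustrative mapping).
REQUIRED_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

def has_enough_vram(model_size: str, device: int = 0) -> bool:
    """Return True if the GPU's total VRAM exceeds the model's approximate requirement."""
    total_gb = torch.cuda.get_device_properties(device).total_memory / 1024 ** 3
    return total_gb > REQUIRED_VRAM_GB[model_size]

if torch.cuda.is_available():
    print(has_enough_vram("small"))
```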
## Calculate VRAM requirements
When an NVIDIA GPU is used to accelerate ffmpeg rendering, each rendering task requires approximately 180 MB of VRAM; the VRAM required by the Whisper model itself is shown in the table above.
Therefore, you can roughly estimate the total VRAM required. For example, using the small model (a calculation sketch follows this list):
- If using the `pipeline` mode, since it runs in parallel, at least 180 + 2620 = 2800 MB of VRAM is required.
- If using the `append` or `merge` mode, at least 2620 MB of VRAM is required.
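
A minimal sketch of this estimate, using the figures quoted above (~180 MB per ffmpeg rendering task and ~2620 MB for the small model); the constant and function names are illustrative:

```python
# Rough VRAM estimate in MB, based on the values quoted in this section.
FFMPEG_TASK_MB = 180      # approximate VRAM per ffmpeg rendering task
WHISPER_SMALL_MB = 2620   # approximate VRAM for the small Whisper model

def required_vram_mb(mode: str) -> int:
    """Return a rough lower bound on the VRAM needed for one recording."""
    if mode == "pipeline":
        # Rendering and ASR run in parallel, so both costs apply at once.
        return FFMPEG_TASK_MB + WHISPER_SMALL_MB   # 2800 MB
    # In append or merge mode, only the Whisper model needs to be resident.
    return WHISPER_SMALL_MB                        # 2620 MB

print(required_vram_mb("pipeline"))  # 2800
```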
> **WARNING**
> Please ensure that the GPU VRAM is greater than the calculated result; otherwise the VRAM will be exhausted, resulting in `RuntimeError: CUDA out of memory`.
## How to change the model
- Set the `Inference_Model` parameter in the `bilive.toml` file to the corresponding model size name, such as `tiny`, `base`, `small`, `medium`, or `large` (see the example after this list).
- Download the corresponding model file and place it in the `src/subtitle/models` folder.
- Re-run the `./scan.sh` script.
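
For example, to switch to the small model, the relevant line in `bilive.toml` would look roughly like this; the surrounding sections of the file may differ, so this is only an illustration:

```toml
# Whisper model size used for inference: tiny / base / small / medium / large
Inference_Model = "small"
```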