# whisper model parameters
This project uses OpenAI's open-source whisper model for Automatic Speech Recognition (ASR) tasks.
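As a rough illustration (not the project's actual invocation), this is the kind of call whisper's Python API exposes for ASR; the file path and model size below are placeholders:

```python
import whisper

# Load one of the model sizes listed in the table below, e.g. "small".
model = whisper.load_model("small")

# Transcribe an audio file; "audio.wav" is a placeholder path.
result = model.transcribe("audio.wav")
print(result["text"])
```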
## Model information
The basic parameters and download links of each model are listed below; note that your GPU VRAM must be greater than the model's required VRAM:
TIP
If you pursue recognition accuracy, it is recommended to use the `small` model or a larger one.
| Size | Parameters | Multilingual model | Required VRAM |
|---|---|---|---|
| tiny | 39 M | tiny | ~1 GB |
| base | 74 M | base | ~1 GB |
| small | 244 M | small | ~2 GB |
| medium | 769 M | medium | ~5 GB |
| large | 1550 M | large | ~10 GB |
## Calculate VRAM requirements
When an NVIDIA GPU is used to accelerate ffmpeg rendering, each rendering task requires approximately 180 MB of VRAM. The VRAM required by the whisper model itself is shown in the table above. From these two figures you can roughly estimate the total VRAM needed.
For example, using the `small` model:

- If using the `pipeline` mode, since tasks run in parallel, at least 180 + 2620 = 2800 MB of VRAM is required.
- If using the `append` or `merge` mode, at least 2620 MB of VRAM is required.
WARNING
Please ensure that the GPU VRAM is greater than the calculated result; otherwise the VRAM will be exhausted, resulting in `RuntimeError: CUDA out of memory`.
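If you want to verify this before a run, one option (assuming PyTorch is installed, which whisper requires anyway) is to query the free VRAM first; `required_mb` below is just the figure calculated above:

```python
import torch

# (free, total) VRAM in bytes on the first CUDA device.
free_bytes, _total_bytes = torch.cuda.mem_get_info(0)
free_mb = free_bytes / 1024 ** 2

required_mb = 2800  # e.g. the `small` model in `pipeline` mode
if free_mb < required_mb:
    print(f"Only {free_mb:.0f} MB free; expect 'CUDA out of memory'.")
```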
## Changing the model
- Set the `Inference_Model` parameter in the `bilive.toml` file to the corresponding model size name, such as `tiny`, `base`, `small`, `medium`, or `large` (see the sketch after this list).
- Download the corresponding model file and place it in the `src/subtitle/models` folder.
- Re-run the `./scan.sh` script.
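For illustration, the relevant line in `bilive.toml` could look like the snippet below; the exact section the key lives in is not shown here, so treat this as a sketch rather than the file's real layout:

```toml
# bilive.toml — whisper model size used for subtitle inference
Inference_Model = "small"
```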