Wednesday, March 12, 2025

Model Exploration on HuggingFace Platform for Automatic Speech Recognition


This model is too heavy for my setup.
# "distil-whisper/distil-small.en" Size: +310MB
I need a model that's smaller in size.

If you need a smaller speech-to-text model, you can try the following:

1. Use openai/whisper-tiny (Smallest Whisper Model)

  • Size: ~155MB
  • Pros: Smallest Whisper model, decent accuracy
  • Cons: Limited performance on noisy speech

Code to Load Locally

python
from transformers import pipeline
# Define the local model directory
local_model_path = "./models/whisper-tiny"
# Load the smallest Whisper ASR model
asr = pipeline("automatic-speech-recognition", model=local_model_path)
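The pipeline call above fails if ./models/whisper-tiny has not been saved yet. A minimal sketch of a fallback, assuming two hypothetical helpers (resolve_model_source and load_asr, neither is part of transformers): use the local copy when it looks like a saved model, otherwise pass the Hub ID so the library downloads it into its cache.

```python
from pathlib import Path


def resolve_model_source(local_dir: str, hub_id: str) -> str:
    """Return local_dir if it looks like a saved model, else the Hub ID.

    Hypothetical helper: a directory counts as a saved model when it
    contains config.json, which save_pretrained() writes.
    """
    if (Path(local_dir) / "config.json").is_file():
        return local_dir
    return hub_id


def load_asr(local_dir: str = "./models/whisper-tiny",
             hub_id: str = "openai/whisper-tiny"):
    """Build the ASR pipeline from the local copy or, failing that, the Hub."""
    # Imported here so the path check above works without transformers installed.
    from transformers import pipeline

    source = resolve_model_source(local_dir, hub_id)
    return pipeline("automatic-speech-recognition", model=source)
```

Calling asr = load_asr() then behaves like the snippet above when the directory exists, and still works on a fresh machine.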

Download the Model Locally (If Not Already)

python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
model_id = "openai/whisper-tiny"
save_path = "./models/whisper-tiny"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
model.save_pretrained(save_path)
processor = AutoProcessor.from_pretrained(model_id)
processor.save_pretrained(save_path)
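Since size is the deciding factor here, it can help to measure what the saved model actually occupies on disk rather than rely on a quoted figure. A small sketch using only the standard library; dir_size_mb is a hypothetical helper name, and it reports megabytes as 10**6 bytes.

```python
from pathlib import Path


def dir_size_mb(path: str) -> float:
    """Sum the sizes of all files under path, in megabytes (10**6 bytes)."""
    total_bytes = sum(
        f.stat().st_size for f in Path(path).rglob("*") if f.is_file()
    )
    return total_bytes / 1_000_000
```

For example, print(f"{dir_size_mb('./models/whisper-tiny'):.0f} MB") after running the download code. The result depends on the weight format and precision that were saved, so it may differ somewhat from the sizes listed in this post.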

2. Alternative: facebook/wav2vec2-base-960h (+350MB)

  • Size: +350MB
  • Pros: Lighter than Whisper small and larger checkpoints, works well on short speech
  • Cons: Heavier than whisper-tiny and the distil-small.en model above; English only, no multilingual support

Code for wav2vec2-base

python
from transformers import pipeline
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
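Once asr is built with either model, transcription is a call per audio file. The sketch below wraps that in a hypothetical helper, transcribe_files, which assumes only that asr accepts a file path and returns a dict with a "text" key, as the transformers ASR pipeline does. The file name in the usage note is a placeholder.

```python
from typing import Callable, Dict, Iterable


def transcribe_files(asr: Callable[[str], dict],
                     paths: Iterable[str]) -> Dict[str, str]:
    """Run an ASR callable over audio files and collect the transcripts.

    asr is any callable that takes a file path and returns a dict with a
    "text" key; surrounding whitespace in the transcript is stripped.
    """
    return {path: asr(path)["text"].strip() for path in paths}
```

Usage would look like transcripts = transcribe_files(asr, ["sample.wav"]). Passing file paths to the pipeline relies on ffmpeg being installed for decoding.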

Would you like an even lighter model or real-time transcription support? 🚀
