Speech-to-text (STT) technology has become a game-changer for accessibility and productivity. However, many STT solutions require an internet connection, raising concerns about privacy and latency. In this blog post, we’ll build an offline voice transcription app using OpenAI’s Whisper model and Gradio for a user-friendly interface.
Why Whisper?
OpenAI’s Whisper is an ASR (Automatic Speech Recognition) model trained on roughly 680,000 hours of multilingual audio. Its larger versions require significant compute resources, so to keep the app lightweight we’ll use Whisper-Tiny, the smallest variant (39M parameters, ~155MB).
Setting Up the Project
1. Install Dependencies
First, install the required libraries:
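The exact install command is not preserved here. As an assumption, a setup like this one typically needs `transformers`, `torch`, `gradio`, and `numpy`, which can be installed with `pip install transformers torch gradio numpy`. The small sketch below checks which of those packages are still missing. The package list and the helper name `missing_packages` are illustrative, not taken from the original post.

```python
import importlib.util

# Assumed dependency list for this tutorial; adjust it to your own setup.
REQUIRED = ("transformers", "torch", "gradio", "numpy")


def missing_packages(names=REQUIRED):
    """Return the top-level packages from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All dependencies found.")
```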
2. Load the Model Locally
Instead of downloading Whisper every time, we’ll load it from a local directory:
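A minimal sketch of local loading, assuming the Hugging Face `transformers` implementation of Whisper and a model previously saved to a hypothetical `./whisper-tiny` directory. The `local_files_only=True` flag tells `from_pretrained` not to contact the Hub.

```python
from pathlib import Path

# Hypothetical location of the saved model; change it to match your setup.
MODEL_DIR = Path("./whisper-tiny")


def is_model_available(model_dir=MODEL_DIR):
    """Return True if the directory looks like a saved Hugging Face model."""
    return (Path(model_dir) / "config.json").is_file()


def load_local_whisper(model_dir=MODEL_DIR):
    """Load the Whisper processor and model from disk, without network access."""
    if not is_model_available(model_dir):
        raise FileNotFoundError(f"No saved model found in {model_dir}")

    # Imported lazily so the check above works even before transformers is installed.
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(str(model_dir), local_files_only=True)
    model = WhisperForConditionalGeneration.from_pretrained(
        str(model_dir), local_files_only=True
    )
    model.eval()  # inference only, no training
    return processor, model
```

Calling `processor, model = load_local_whisper()` once at startup keeps the model in memory for all later requests.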
👉 Tip: If you haven’t downloaded the model yet, use:
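One way to do the one-time download, again assuming the Hugging Face `transformers` library and the `openai/whisper-tiny` checkpoint on the Hub. The target directory is a hypothetical path. This step is the only one that needs an internet connection.

```python
def download_whisper(model_id="openai/whisper-tiny", target_dir="./whisper-tiny"):
    """Download Whisper once and save it to `target_dir` for offline use."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperForConditionalGeneration.from_pretrained(model_id)

    # save_pretrained writes the weights, config, and tokenizer files to disk.
    processor.save_pretrained(target_dir)
    model.save_pretrained(target_dir)
    return target_dir
```

Run `download_whisper()` once while online; afterwards the app can load everything from the saved directory.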
3. Process Audio for Transcription
Audio recordings may have multiple channels and varying sample rates. We'll convert the audio to mono, normalize it, and resample it to 16kHz, the sample rate Whisper expects:
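The preprocessing can be sketched as follows. This version assumes the input arrives as a `(sample_rate, numpy_array)` pair, which is what Gradio's audio component returns with `type="numpy"`, and it uses simple linear interpolation for resampling. The original post may have used a dedicated resampler such as `librosa` or `scipy`, which gives better quality. The function names are illustrative.

```python
import numpy as np

TARGET_SR = 16_000  # Whisper expects 16 kHz mono input


def preprocess_audio(sample_rate, data):
    """Convert raw audio to a mono float32 waveform at 16 kHz."""
    audio = np.asarray(data)

    # Integer PCM (e.g. int16) -> float32 in roughly [-1, 1].
    if np.issubdtype(audio.dtype, np.integer):
        audio = audio.astype(np.float32) / np.iinfo(audio.dtype).max
    else:
        audio = audio.astype(np.float32)

    # Multi-channel audio has shape (samples, channels): average to mono.
    if audio.ndim == 2:
        audio = audio.mean(axis=1)

    # Linear-interpolation resampling (a simple stand-in for a real resampler).
    if sample_rate != TARGET_SR and audio.size > 0:
        n_target = int(round(audio.shape[0] / sample_rate * TARGET_SR))
        src_t = np.arange(audio.shape[0]) / sample_rate
        dst_t = np.arange(n_target) / TARGET_SR
        audio = np.interp(dst_t, src_t, audio).astype(np.float32)

    return audio


def transcribe(processor, model, sample_rate, data):
    """Transcribe one clip with a loaded Whisper processor and model."""
    import torch  # imported lazily; only needed at inference time

    audio = preprocess_audio(sample_rate, data)
    inputs = processor(audio, sampling_rate=TARGET_SR, return_tensors="pt")
    with torch.no_grad():
        token_ids = model.generate(inputs.input_features)
    return processor.batch_decode(token_ids, skip_special_tokens=True)[0].strip()
```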
4. Create a User Interface with Gradio
Gradio allows us to quickly build an interactive UI for testing the model. The user can record audio or upload a file, and the app will display the transcribed text.
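A minimal sketch of such an interface, assuming Gradio 4 or later (where `gr.Audio` takes a `sources` list) and a hypothetical `transcribe_fn(sample_rate, data)` callable that returns the transcribed text. The wrapper handles the case where the user submits without recording anything, in which case Gradio passes `None`.

```python
def guard_empty_input(transcribe_fn):
    """Wrap a transcription function so that empty submissions return a hint."""

    def wrapper(audio):
        if audio is None:
            return "No audio received. Record or upload a clip first."
        sample_rate, data = audio  # Gradio passes (sample_rate, numpy_array)
        return transcribe_fn(sample_rate, data)

    return wrapper


def build_demo(transcribe_fn):
    """Build a Gradio interface with microphone and file-upload input."""
    import gradio as gr  # imported lazily; only needed when building the UI

    return gr.Interface(
        fn=guard_empty_input(transcribe_fn),
        inputs=gr.Audio(sources=["microphone", "upload"], type="numpy"),
        outputs=gr.Textbox(label="Transcription"),
        title="Offline Whisper Transcription",
    )
```

With `type="numpy"`, the audio reaches the function as raw samples, so no temporary files need to be read back from disk.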
Running the App
Simply run the script:
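The script would typically be started with `python app.py`, where the file name is a hypothetical choice. The entry point could look like the sketch below, which assumes a Gradio app object named `demo`. Setting `HF_HUB_OFFLINE` before `transformers` is imported asks the Hugging Face libraries not to attempt any network requests, and `share=False` keeps the interface on the local machine.

```python
import os

# Set before importing transformers so that no Hub request is attempted.
os.environ.setdefault("HF_HUB_OFFLINE", "1")


def launch_local(demo, port=7860):
    """Serve a Gradio app on this machine only, without a public share link."""
    return demo.launch(server_name="127.0.0.1", server_port=port, share=False)
```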
Gradio will start a local web server and print its address (http://127.0.0.1:7860 by default). Open it in your browser to record or upload audio and see the transcription.
Conclusion
With OpenAI Whisper-Tiny and Gradio, we’ve built an offline speech-to-text app that is:
✅ Fast & lightweight (~155MB model)
✅ Fully offline once the model is downloaded
✅ Easy to use through a web UI
If you need more accurate transcription, you can explore larger Whisper models or try distil-whisper for efficiency.
🚀 Ready to build your own speech recognition app? Give it a try!