Saturday, March 8, 2025

Hour 1 - Beginning development

To See All Articles About Technology: Index of Lessons in Technology

Step 1: Set up teacher (Ashish) and student (Keshav) accounts on PythonAnywhere

To do this step, follow instructions from this page (pythonanywhere.com/pages/Education).

Step 2: I want to check if this package is already installed: ffmpeg

You can check whether ffmpeg is installed on your system by running a command like ffmpeg -version in your terminal or by using a small Python script to do the same.


Method 1: Using the Terminal

Open your terminal (or Command Prompt on Windows) and run:

sh
ffmpeg -version
  • If installed: You’ll see version information and details about the ffmpeg build.
  • If not installed: You might see an error message like command not found (Linux/Mac) or a similar message on Windows.

Method 2: Using Python

You can use Python’s subprocess module to check if ffmpeg is available:

python
import subprocess

def check_ffmpeg_installed():
    try:
        result = subprocess.run(
            ["ffmpeg", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True
        )
        if result.returncode == 0:
            print("ffmpeg is installed!")
            print(result.stdout)
        else:
            print("ffmpeg appears to be installed but returned an error.")
    except FileNotFoundError:
        print("ffmpeg is not installed or not found in the system PATH.")

check_ffmpeg_installed()
  • Explanation:
    • The script tries to run ffmpeg -version and captures its output.
    • If ffmpeg is not found, a FileNotFoundError is raised.
    • If the command runs successfully (return code 0), it prints the version info.
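A lighter-weight alternative, if you only need to know whether an ffmpeg executable is on the PATH, is Python's shutil.which (a minimal sketch):

python
import shutil

# shutil.which returns the executable's full path, or None if it is not on the PATH
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path:
    print(f"ffmpeg found at: {ffmpeg_path}")
else:
    print("ffmpeg is not on the PATH.")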

Choose the method that best fits your needs. Would you like more details on using or installing ffmpeg?

12:57 ~ $ ffmpeg -version
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil      56. 31.100 / 56. 31.100
libavcodec     58. 54.100 / 58. 54.100
libavformat    58. 29.100 / 58. 29.100
libavdevice    58.  8.100 / 58.  8.100
libavfilter     7. 57.100 /  7. 57.100
libavresample   4.  0.  0 /  4.  0.  0
libswscale      5.  5.100 /  5.  5.100
libswresample   3.  5.100 /  3.  5.100
libpostproc    55.  5.100 / 55.  5.100

Step 3: Check for "flask"

12:58 ~ $ pip show flask
Name: Flask
Version: 2.1.2
Summary: A simple framework for building complex web applications.
Home-page: https://palletsprojects.com/p/flask
Author: Armin Ronacher
Author-email: armin.ronacher@active-4.com
License: BSD-3-Clause
Location: /usr/local/lib/python3.10/site-packages
Requires: click, itsdangerous, Jinja2, Werkzeug
Required-by: dash, Flask-Admin, Flask-Babel, Flask-Bcrypt, Flask-Bootstrap, Flask-Compress, Flask-HTTPAuth, Flask-Login, Flask-RESTful, Flask-SQLAlchemy, Flask-SSLify, Flask-WTF
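If you prefer to check installed packages from Python rather than from the shell, a small sketch using the standard-library importlib.metadata does the same job:

python
from importlib.metadata import version, PackageNotFoundError

# Report the installed version of Flask, or note that it is missing
try:
    print("Flask", version("flask"))
except PackageNotFoundError:
    print("Flask is not installed")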

Step 4: What is the difference between "whisper" and "openai-whisper", and what are their uses?

Both names generally refer to the same underlying technology—a state‐of‐the‐art automatic speech recognition (ASR) model developed by OpenAI—but they’re used in slightly different contexts:


OpenAI Whisper

  • Official Implementation:
    OpenAI Whisper is the original model released by OpenAI. It’s available on GitHub (in the openai/whisper repository) and comes in several sizes (tiny, base, small, medium, large) to balance accuracy and speed.

  • Primary Use Cases:

    • High-Quality Transcription: Converts spoken language into text.
    • Multilingual Recognition: Supports multiple languages and can also identify the language spoken.
    • Subtitling and Captioning: Can be used to generate subtitles or closed captions.
    • Voice-Activated Applications: Powers various voice assistant and dictation systems.
  • How to Use:
    You typically install it via pip as the package openai-whisper:

    bash
    pip install openai-whisper

    Then you can load and run the model directly in Python using its provided API.
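For example, a minimal transcription sketch with the official package (it assumes ffmpeg is installed and that a local file such as audio.wav exists):

python
import whisper

# Load one of the available model sizes and transcribe a local audio file
model = whisper.load_model("tiny")       # "tiny", "base", "small", "medium", "large"
result = model.transcribe("audio.wav")   # ffmpeg is used under the hood to decode the file
print(result["text"])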


"whisper" or "openai-whisper" (as a Package)

  • Naming Convention:
    When you see references to “whisper” or “openai-whisper” (especially on PyPI), they are referring to the official package that implements OpenAI’s Whisper model.

    • The PyPI package is named openai-whisper, which is simply the packaged form of the original OpenAI Whisper implementation.
  • Usage Context:

    • If you’re building an application (for example, a Flask endpoint or an offline speech-to-text app), you would install the package using the PyPI name and then use the provided API to load one of the available model sizes and perform transcription.
    • The package supports both CPU and GPU inference, making it adaptable for different deployment scenarios—from high-end servers to on-device applications after appropriate optimizations (such as quantization).

In Summary

  • They’re Essentially the Same:
    “OpenAI Whisper” is the name of the model and the technology, while openai-whisper is the name you’ll use to install and work with the official implementation in Python.

  • Their Uses Are Aligned:
    Both are designed for accurate and multilingual speech recognition, suitable for applications like transcription, voice assistants, subtitles, and more.

If you plan to build applications that leverage this technology (whether via a Flask endpoint, desktop app, or mobile integration), you’ll be using the official OpenAI Whisper model via the openai-whisper package.

Step 5: Set up "whisper"

13:17 ~ $ cat test_whisper.py 
import whisper
model = whisper.load_model("small")
print(model)
13:17 ~ $ python test_whisper.py 
Traceback (most recent call last):
  File "/home/keshavpawar137/test_whisper.py", line 1, in 
    import whisper
ModuleNotFoundError: No module named 'whisper'
13:17 ~ $ pip install whisper
Defaulting to user installation because normal site-packages is not writeable
Looking in links: /usr/share/pip-wheels
Collecting whisper
  Downloading whisper-1.1.10.tar.gz (42 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.8/42.8 kB 618.8 kB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: six in /usr/local/lib/python3.10/site-packages (from whisper) (1.16.0)
Building wheels for collected packages: whisper
  Building wheel for whisper (setup.py) ... done
  Created wheel for whisper: filename=whisper-1.1.10-py3-none-any.whl size=41138 sha256=f7c359cb3ced55eba4775c1d331167df4286838de0288cf0bce194d38bc48da1
  Stored in directory: /home/keshavpawar137/.cache/pip/wheels/aa/7c/1d/015619716e2facae6631312503baf3c3220e6a9a3508cb14b6
Successfully built whisper
Installing collected packages: whisper
Successfully installed whisper-1.1.10



Get the current disk usage of my account on PythonAnywhere

13:21 ~ $ pwd
/home/keshavpawar137

13:19 ~ $ du -sh .
668K

Step 6: Uninstall "whisper" (if you installed it) and install "openai-whisper"

With "whisper:" 13:22 ~ $ python test_whisper.py Traceback (most recent call last): File "/home/keshavpawar137/test_whisper.py", line 2, in [module] model = whisper.load_model("small") AttributeError: module 'whisper' has no attribute 'load_model'

This error usually indicates that Python isn’t loading the expected Whisper module. Here are a few common reasons and solutions:


1. Naming Conflicts

  • Local File Name Conflict:
    If you have a file named whisper.py in your working directory, Python will import that instead of the official package.
    Solution: Rename your file (and remove any associated whisper.pyc) to avoid conflicts.

  • Module Shadowing:
    Make sure there’s no directory or module named “whisper” in your project that could be interfering with the import.
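A quick way to confirm which module Python is actually loading (a small diagnostic sketch) is to print its file location:

python
import whisper

# If this prints a path inside your project (e.g. ./whisper.py) rather than site-packages,
# a local file or directory is shadowing the installed package.
print(whisper.__file__)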


2. Incorrect Package Installation

  • Verify the Correct Package:
    The official package is called openai-whisper.
    Solution:
    bash
    pip uninstall whisper
    pip install openai-whisper
    Then, ensure you’re importing it as:
    python
    import whisper
    model = whisper.load_model("small")

3. Check Module Contents

You can verify which attributes the module provides with:

python
import whisper
print(dir(whisper))

If you don’t see load_model in the output, it means the wrong module is being loaded.


By ensuring you’re using the official openai-whisper package and avoiding naming conflicts, the load_model function should be available.

13:22 ~ $ pip uninstall whisper
Found existing installation: whisper 1.1.10
Uninstalling whisper-1.1.10:
    Would remove:
    /home/keshavpawar137/.local/bin/find-corrupt-whisper-files.py
    /home/keshavpawar137/.local/bin/rrd2whisper.py
    /home/keshavpawar137/.local/bin/update-storage-times.py
    /home/keshavpawar137/.local/bin/whisper-auto-resize.py
    /home/keshavpawar137/.local/bin/whisper-auto-update.py
    /home/keshavpawar137/.local/bin/whisper-create.py
    /home/keshavpawar137/.local/bin/whisper-diff.py
    /home/keshavpawar137/.local/bin/whisper-dump.py
    /home/keshavpawar137/.local/bin/whisper-fetch.py
    /home/keshavpawar137/.local/bin/whisper-fill.py
    /home/keshavpawar137/.local/bin/whisper-info.py
    /home/keshavpawar137/.local/bin/whisper-merge.py
    /home/keshavpawar137/.local/bin/whisper-resize.py
    /home/keshavpawar137/.local/bin/whisper-set-aggregation-method.py
    /home/keshavpawar137/.local/bin/whisper-set-xfilesfactor.py
    /home/keshavpawar137/.local/bin/whisper-update.py
    /home/keshavpawar137/.local/lib/python3.10/site-packages/whisper-1.1.10.dist-info/*
    /home/keshavpawar137/.local/lib/python3.10/site-packages/whisper.py
Proceed (Y/n)? y
    Successfully uninstalled whisper-1.1.10


13:28 ~ $ pip install openai-whisper
Defaulting to user installation because normal site-packages is not writeable
Looking in links: /usr/share/pip-wheels
Collecting openai-whisper
    Downloading openai-whisper-20240930.tar.gz (800 kB)
        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 800.5/800.5 kB 14.1 MB/s eta 0:00:00
    Installing build dependencies ... done
    Getting requirements to build wheel ... done
    Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: numpy in /usr/local/lib/python3.10/site-packages (from openai-whisper) (1.21.6)
Collecting triton>=2.0.0
    Downloading triton-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (253.1 MB)
        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 253.1/253.1 MB 5.0 MB/s eta 0:00:00
ERROR: Could not install packages due to an OSError: [Errno 122] Disk quota exceeded

13:29 ~ $ du -sh .
83M     .
13:30 ~ $ 

Requirements:
pip install flask openai-whisper soundfile

conda-forge / flask 3.1.0
A simple framework for building complex web applications.

piiq / openai-whisper 20230308
Robust Speech Recognition via Large-Scale Weak Supervision (conda, osx-arm64)

Sheepless / openai-whisper 20231117
Robust Speech Recognition via Large-Scale Weak Supervision (conda, noarch)



Moving away from "openai-whisper" and using Whisper-Tiny via HuggingFace:

Using OpenAI-Whisper-Tiny via HuggingFace for Automatic Speech Recognition app (Research)
16:34 ~ $ pip install transformers
Defaulting to user installation because normal site-packages is not writeable
Looking in links: /usr/share/pip-wheels
Collecting transformers
    Downloading transformers-4.49.0-py3-none-any.whl (10.0 MB)
        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.0/10.0 MB 49.6 MB/s eta 0:00:00
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/site-packages (from transformers) (2021.11.10)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/site-packages (from transformers) (1.21.6)
Collecting tokenizers<0.22,>=0.21
    Downloading tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 42.7 MB/s eta 0:00:00
Collecting safetensors>=0.4.1
    Downloading safetensors-0.5.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (471 kB)
        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 471.6/471.6 kB 8.0 MB/s eta 0:00:00
Requirement already satisfied: requests in /usr/local/lib/python3.10/site-packages (from transformers) (2.28.1)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/site-packages (from transformers) (6.0)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/site-packages (from transformers) (21.3)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/site-packages (from transformers) (4.62.3)
Collecting huggingface-hub<1.0,>=0.26.0
    Downloading huggingface_hub-0.29.2-py3-none-any.whl (468 kB)
        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 468.1/468.1 kB 10.7 MB/s eta 0:00:00
Requirement already satisfied: filelock in /usr/local/lib/python3.10/site-packages (from transformers) (3.4.2)
Collecting fsspec>=2023.5.0
    Downloading fsspec-2025.3.0-py3-none-any.whl (193 kB)
        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 193.6/193.6 kB 4.3 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/site-packages (from huggingface-hub<1.0,>=0.26.0->transformers) (3.10.0.2)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.10/site-packages (from packaging>=20.0->transformers) (2.4.7)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/site-packages (from requests->transformers) (3.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/site-packages (from requests->transformers) (1.26.9)
Requirement already satisfied: charset-normalizer<3,>=2 in /usr/local/lib/python3.10/site-packages (from requests->transformers) (2.1.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/site-packages (from requests->transformers) (2022.6.15)
Installing collected packages: safetensors, fsspec, huggingface-hub, tokenizers, transformers
Successfully installed fsspec-2025.3.0 huggingface-hub-0.29.2 safetensors-0.5.3 tokenizers-0.21.0 transformers-4.49.0
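As a quick smoke test of the new install (a sketch; it assumes a PyTorch or TensorFlow backend is available, that the whisper-tiny weights can be downloaded or are already cached, and that sample.wav is a placeholder audio file):

python
from transformers import pipeline

# Build an automatic-speech-recognition pipeline backed by openai/whisper-tiny
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
print(asr("sample.wav")["text"])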


We are going to use TensorFlow as the backend for the Hugging Face implementation, since it comes preinstalled on PythonAnywhere by default.

16:39 ~ $ pip show tensorflow
Name: tensorflow
Version: 2.9.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /usr/local/lib/python3.10/site-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, keras-preprocessing, libclang, numpy, opt-einsum, packaging, protobuf, setuptools, six, tensorboard, tensorflow-estimator, tensorflow-io-gcs-filesystem, termcolor, typing-extensions, wrapt
Required-by:
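A one-line runtime check (sketch) confirms that this preinstalled TensorFlow actually imports in the console:

python
import tensorflow as tf

# Expect 2.9.0 on the default PythonAnywhere image shown above
print(tf.__version__)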




16:43 ~ $ pip install librosa
Defaulting to user installation because normal site-packages is not writeable
Looking in links: /usr/share/pip-wheels
Collecting librosa
  Downloading librosa-0.10.2.post1-py3-none-any.whl (260 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 260.1/260.1 kB 6.5 MB/s eta 0:00:00
Requirement already satisfied: scikit-learn>=0.20.0 in /usr/local/lib/python3.10/site-packages (from librosa) (1.0.2)
Collecting audioread>=2.1.9
  Downloading audioread-3.0.1-py3-none-any.whl (23 kB)
Collecting pooch>=1.1
  Downloading pooch-1.8.2-py3-none-any.whl (64 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.6/64.6 kB 1.3 MB/s eta 0:00:00
Collecting typing-extensions>=4.1.1
  Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Requirement already satisfied: numpy!=1.22.0,!=1.22.1,!=1.22.2,>=1.20.3 in /usr/local/lib/python3.10/site-packages (from librosa) (1.21.6)
Collecting msgpack>=1.0
  Downloading msgpack-1.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (378 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 378.0/378.0 kB 9.6 MB/s eta 0:00:00
Requirement already satisfied: numba>=0.51.0 in /usr/local/lib/python3.10/site-packages (from librosa) (0.55.1)
Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.10/site-packages (from librosa) (5.1.1)
Collecting lazy-loader>=0.1
  Downloading lazy_loader-0.4-py3-none-any.whl (12 kB)
Collecting soundfile>=0.12.1
  Downloading soundfile-0.13.1-py2.py3-none-manylinux_2_28_x86_64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 25.1 MB/s eta 0:00:00
Collecting soxr>=0.3.2
  Downloading soxr-0.5.0.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (252 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 252.8/252.8 kB 4.7 MB/s eta 0:00:00
Requirement already satisfied: joblib>=0.14 in /usr/local/lib/python3.10/site-packages (from librosa) (1.1.0)
Requirement already satisfied: scipy>=1.2.0 in /usr/local/lib/python3.10/site-packages (from librosa) (1.7.3)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/site-packages (from lazy-loader>=0.1->librosa) (21.3)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in /usr/local/lib/python3.10/site-packages (from numba>=0.51.0->librosa) (0.38.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/site-packages (from numba>=0.51.0->librosa) (60.2.0)
Requirement already satisfied: platformdirs>=2.5.0 in /usr/local/lib/python3.10/site-packages (from pooch>=1.1->librosa) (2.5.2)
Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.10/site-packages (from pooch>=1.1->librosa) (2.28.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/site-packages (from scikit-learn>=0.20.0->librosa) (3.0.0)
Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.10/site-packages (from soundfile>=0.12.1->librosa) (1.15.1)
Requirement already satisfied: pycparser in /usr/local/lib/python3.10/site-packages (from cffi>=1.0->soundfile>=0.12.1->librosa) (2.21)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.10/site-packages (from packaging->lazy-loader>=0.1->librosa) (2.4.7)
Requirement already satisfied: charset-normalizer<3,>=2 in /usr/local/lib/python3.10/site-packages (from requests>=2.19.0->pooch>=1.1->librosa) (2.1.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/site-packages (from requests>=2.19.0->pooch>=1.1->librosa) (1.26.9)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/site-packages (from requests>=2.19.0->pooch>=1.1->librosa) (2022.6.15)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/site-packages (from requests>=2.19.0->pooch>=1.1->librosa) (3.3)
Installing collected packages: typing-extensions, soxr, msgpack, audioread, soundfile, pooch, lazy-loader, librosa
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
arviz 0.11.4 requires typing-extensions<4,>=3.7.4.3, but you have typing-extensions 4.12.2 which is incompatible.
Successfully installed audioread-3.0.1 lazy-loader-0.4 librosa-0.10.2.post1 msgpack-1.1.0 pooch-1.8.2 soundfile-0.13.1 soxr-0.5.0.post1 typing-extensions-4.12.2
16:44 ~ $ du -sh .
209M    .
16:44 ~ $ 
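With librosa in place, loading and resampling audio to the 16 kHz rate that Whisper expects looks like this (a sketch; recording.wav is a placeholder file name):

python
import librosa

# Load an audio file as mono float32 samples resampled to 16 kHz
audio, sr = librosa.load("recording.wav", sr=16000)
print(sr, audio.shape)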

(hf_202412) ashish@ashish-ThinkPad-T440s:~/Desktop/Using OpenAI-Whisper-Tiny via HuggingFace for Automatic Speech Recognition app (Research)$ conda list librosa                                                                                                                                             
# packages in environment at /home/ashish/anaconda3/envs/hf_202412:                                                                                      
#                                                                                                                                                        
# Name                    Version                   Build  Channel                                                                                       
librosa                   0.10.2.post1       pyhd8ed1ab_1    conda-forge  


--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- 

16:56 ~ $ pip install torch
Defaulting to user installation because normal site-packages is not writeable
Looking in links: /usr/share/pip-wheels
Requirement already satisfied: torch in /usr/local/lib/python3.10/site-packages (1.11.0+cpu)
Requirement already satisfied: typing-extensions in ./.local/lib/python3.10/site-packages (from torch) (4.12.2)
16:59 ~ $ 
16:59 ~ $ pip show torch
Name: torch
Version: 1.11.0+cpu
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /usr/local/lib/python3.10/site-packages
Requires: typing-extensions
Required-by: torchaudio, torchvision
17:00 ~ $ 
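A quick check (sketch) that the CPU-only PyTorch build imports and, as expected, reports no GPU:

python
import torch

# Expect something like: 1.11.0+cpu False
print(torch.__version__, torch.cuda.is_available())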



17:22 ~ $ ls
README.txt  app_pt.py  models--openai--whisper-tiny.zip
17:22 ~ $ unzip models--openai--whisper-tiny.zip 
Archive:  models--openai--whisper-tiny.zip
   creating: models--openai--whisper-tiny/
   creating: models--openai--whisper-tiny/blobs/
  inflating: models--openai--whisper-tiny/blobs/c2048dfa9fd94a052e62e908d2c4dfb18534b4d2  
  inflating: models--openai--whisper-tiny/blobs/d13b786c04765fb1a06492b53587752cd67665ea  
  inflating: models--openai--whisper-tiny/blobs/d7016e21da8776c8a9d577d0f559600f09a240eb  
  inflating: models--openai--whisper-tiny/blobs/1e95340ff836fad1b5932e800fb7b8c5e6d78a74  
  inflating: models--openai--whisper-tiny/blobs/6038932a2a1f09a66991b1c2adae0d14066fa29e  
  inflating: models--openai--whisper-tiny/blobs/dd6ae819ad738ac1a546e9f9282ef325c33b9ea0  
  inflating: models--openai--whisper-tiny/blobs/e3d256c988462aa153dcabe2aa38b8e9b436c06f  
  inflating: models--openai--whisper-tiny/blobs/bf69932dca4b3719b59fdd8f6cc1978109509f6c  
  inflating: models--openai--whisper-tiny/blobs/417aa9de49a132dd3eb6a56d3be2718b15f08917  
  inflating: models--openai--whisper-tiny/blobs/7ebd0e69e78190ffe1438491fa05cc1f5c1aa3a4c4db3bc1723adbb551ea2395  
  inflating: models--openai--whisper-tiny/blobs/4b26dd66b8f7bca37d851d259fdc118315cacc62  
   creating: models--openai--whisper-tiny/snapshots/
   creating: models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/
    linking: models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/preprocessor_config.json  -> ../../blobs/c2048dfa9fd94a052e62e908d2c4dfb18534b4d2 
    linking: models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/tokenizer_config.json  -> ../../blobs/d13b786c04765fb1a06492b53587752cd67665ea 
    linking: models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/vocab.json  -> ../../blobs/d7016e21da8776c8a9d577d0f559600f09a240eb 
    linking: models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/tokenizer.json  -> ../../blobs/1e95340ff836fad1b5932e800fb7b8c5e6d78a74 
    linking: models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/merges.txt  -> ../../blobs/6038932a2a1f09a66991b1c2adae0d14066fa29e 
    linking: models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/normalizer.json  -> ../../blobs/dd6ae819ad738ac1a546e9f9282ef325c33b9ea0 
    linking: models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/added_tokens.json  -> ../../blobs/e3d256c988462aa153dcabe2aa38b8e9b436c06f 
    linking: models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/special_tokens_map.json  -> ../../blobs/bf69932dca4b3719b59fdd8f6cc1978109509f6c 
    linking: models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/config.json  -> ../../blobs/417aa9de49a132dd3eb6a56d3be2718b15f08917 
    linking: models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/model.safetensors  -> ../../blobs/7ebd0e69e78190ffe1438491fa05cc1f5c1aa3a4c4db3bc1723adbb551ea2395 
    linking: models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/generation_config.json  -> ../../blobs/4b26dd66b8f7bca37d851d259fdc118315cacc62 
   creating: models--openai--whisper-tiny/refs/
  inflating: models--openai--whisper-tiny/refs/main  
   creating: models--openai--whisper-tiny/.no_exist/
   creating: models--openai--whisper-tiny/.no_exist/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/
  inflating: models--openai--whisper-tiny/.no_exist/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/chat_template.jinja  
  inflating: models--openai--whisper-tiny/.no_exist/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/processor_config.json  
  inflating: models--openai--whisper-tiny/.no_exist/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/chat_template.json  
finishing deferred symbolic links:
  models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/preprocessor_config.json -> ../../blobs/c2048dfa9fd94a052e62e908d2c4dfb18534b4d2
  models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/tokenizer_config.json -> ../../blobs/d13b786c04765fb1a06492b53587752cd67665ea
  models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/vocab.json -> ../../blobs/d7016e21da8776c8a9d577d0f559600f09a240eb
  models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/tokenizer.json -> ../../blobs/1e95340ff836fad1b5932e800fb7b8c5e6d78a74
  models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/merges.txt -> ../../blobs/6038932a2a1f09a66991b1c2adae0d14066fa29e
  models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/normalizer.json -> ../../blobs/dd6ae819ad738ac1a546e9f9282ef325c33b9ea0
  models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/added_tokens.json -> ../../blobs/e3d256c988462aa153dcabe2aa38b8e9b436c06f
  models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/special_tokens_map.json -> ../../blobs/bf69932dca4b3719b59fdd8f6cc1978109509f6c
  models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/config.json -> ../../blobs/417aa9de49a132dd3eb6a56d3be2718b15f08917
  models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/model.safetensors -> ../../blobs/7ebd0e69e78190ffe1438491fa05cc1f5c1aa3a4c4db3bc1723adbb551ea2395
  models--openai--whisper-tiny/snapshots/169d4a4341b33bc18d8881c4b69c2e104e1cc0af/generation_config.json -> ../../blobs/4b26dd66b8f7bca37d851d259fdc118315cacc62
17:22 ~ $ 
17:22 ~ $ 
17:22 ~ $ du -sh .
446M    .
17:22 ~ $ 
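Because the whisper-tiny snapshot was uploaded and unzipped rather than downloaded at runtime, Transformers can be pointed at it with the cache_dir and local_files_only arguments of from_pretrained. A sketch, assuming the models--openai--whisper-tiny folder sits directly in the home directory:

python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# cache_dir must be the directory that *contains* the models--openai--whisper-tiny folder;
# local_files_only=True prevents any attempt to reach the Hugging Face Hub.
cache_dir = "/home/keshavpawar137"
processor = WhisperProcessor.from_pretrained("openai/whisper-tiny",
                                             cache_dir=cache_dir,
                                             local_files_only=True)
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny",
                                                        cache_dir=cache_dir,
                                                        local_files_only=True)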
Tags: Technology,Large Language Models,Cloud

Using OpenAI-Whisper-Tiny via HuggingFace for Automatic Speech Recognition app (Research)

To See All Articles About Technology: Index of Lessons in Technology

Can you show how to create a Flask-based endpoint that transcribes audio recorded in the browser using the above model? Also give the associated frontend code.

Below is a complete example that shows how to build a Flask endpoint using the Hugging Face model "openai/whisper-tiny" to transcribe audio files uploaded from a browser. The Flask app will accept an audio file (recorded via the browser), process it with the Hugging Face Transformers pipeline, and return the transcription as JSON. A simple HTML/JavaScript front end is provided that records audio using the MediaRecorder API and sends it to the endpoint.


Flask Endpoint (app.py)

python
from flask import Flask, request, jsonify
import os
import uuid
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa

app = Flask(__name__)

# Load the processor and model from Hugging Face
model_name = "openai/whisper-tiny"
processor = WhisperProcessor.from_pretrained(model_name)
model = WhisperForConditionalGeneration.from_pretrained(model_name)

@app.route("/transcribe", methods=["POST"])
def transcribe():
    if "audio" not in request.files:
        return jsonify({"error": "No audio file provided."}), 400

    audio_file = request.files["audio"]

    # Save the incoming audio file temporarily
    temp_filename = f"temp_{uuid.uuid4().hex}.wav"
    audio_file.save(temp_filename)

    try:
        # Load the audio using librosa and resample to 16kHz
        audio, sr = librosa.load(temp_filename, sr=16000)

        # Process audio into input features expected by the model
        input_features = processor(audio, sampling_rate=sr, return_tensors="pt").input_features

        # Generate transcription (default generation parameters; adjust if needed)
        predicted_ids = model.generate(input_features)
        transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
    except Exception as e:
        transcription = f"Error processing audio: {str(e)}"
    finally:
        os.remove(temp_filename)

    return jsonify({"transcription": transcription})

if __name__ == "__main__":
    # Run the server on port 5000 and listen on all interfaces.
    app.run(host="0.0.0.0", port=5000, debug=True)

Notes:

  • Make sure you have installed the required packages:
    bash
    pip install flask transformers torch librosa
  • The audio file is saved temporarily then removed after transcription.

Frontend (index.html)

This HTML page uses the MediaRecorder API to capture audio from the user's microphone, creates a Blob from the recorded data, and sends it via a POST request to the /transcribe endpoint.

html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Speech Transcription Demo</title>
  <style>
    body { font-family: Arial, sans-serif; margin: 2em; }
    #recordButton { padding: 1em; font-size: 1.2em; }
    #status { margin-top: 1em; }
    #transcription { margin-top: 1em; font-size: 1.1em; color: #333; }
  </style>
</head>
<body>
  <h1>Speech Transcription Demo</h1>
  <button id="recordButton">Start Recording</button>
  <p id="status"></p>
  <h2>Transcription:</h2>
  <p id="transcription"></p>

  <script>
    let mediaRecorder;
    let audioChunks = [];
    const recordButton = document.getElementById('recordButton');
    const statusDisplay = document.getElementById('status');
    const transcriptionDisplay = document.getElementById('transcription');
    let isRecording = false;

    recordButton.addEventListener('click', async () => {
      if (!isRecording) {
        try {
          // Request access to the microphone
          const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
          mediaRecorder = new MediaRecorder(stream);
          audioChunks = [];

          mediaRecorder.ondataavailable = event => {
            if (event.data && event.data.size > 0) {
              audioChunks.push(event.data);
            }
          };

          mediaRecorder.onstop = async () => {
            // Create a Blob from the recorded audio chunks
            const audioBlob = new Blob(audioChunks, { type: 'audio/wav' });
            const formData = new FormData();
            formData.append('audio', audioBlob, 'recording.wav');
            statusDisplay.textContent = 'Sending audio for transcription...';

            try {
              const response = await fetch('/transcribe', { method: 'POST', body: formData });
              const data = await response.json();
              transcriptionDisplay.textContent = data.transcription || 'No transcription available.';
            } catch (error) {
              transcriptionDisplay.textContent = 'Error: ' + error;
            }
            statusDisplay.textContent = '';
          };

          mediaRecorder.start();
          recordButton.textContent = 'Stop Recording';
          isRecording = true;
          statusDisplay.textContent = 'Recording...';
        } catch (error) {
          statusDisplay.textContent = 'Error accessing microphone: ' + error;
        }
      } else {
        // Stop recording and trigger transcription
        mediaRecorder.stop();
        recordButton.textContent = 'Start Recording';
        isRecording = false;
        statusDisplay.textContent = 'Processing audio...';
      }
    });
  </script>
</body>
</html>

Usage Instructions:

  • Place this index.html file in a directory served by your Flask app (or use a static file server); a minimal serving route is sketched after this list.
  • When you load the page, click Start Recording to record your voice. Click the button again to stop recording and send the audio to the server.
  • The transcription returned from the server will be displayed on the page.
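One way to serve the page from the same Flask app (a sketch, assuming index.html sits next to app.py):

python
from flask import send_from_directory

# Serve the recording page from the application's own directory
@app.route("/")
def index():
    return send_from_directory(".", "index.html")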

This setup gives you an end-to-end solution where your Flask endpoint leverages the Hugging Face "openai/whisper-tiny" model to transcribe audio recorded from the browser. Feel free to adjust generation parameters, error handling, and UI details as needed for your application.




Can you rewrite the Python code to use TensorFlow as the backend for Transformers?

Below is an example of a Flask endpoint that uses the Hugging Face Transformers library with TensorFlow as the backend to load the "openai/whisper-tiny" model for transcription. In this example, we use the TensorFlow‑compatible model loader (via TFAutoModelForConditionalGeneration) and tell the processor to return TensorFlow tensors. The endpoint accepts an uploaded audio file, processes it with the model, and returns the transcription as JSON. The accompanying HTML page uses the MediaRecorder API to record audio in the browser and send it to the endpoint.


Flask Endpoint (app_tf.py)

python
from flask import Flask, request, jsonify
import os
import uuid
import librosa
from transformers import WhisperProcessor, TFAutoModelForConditionalGeneration

app = Flask(__name__)

# Use the TensorFlow version of the model
model_name = "openai/whisper-tiny"
processor = WhisperProcessor.from_pretrained(model_name)
model = TFAutoModelForConditionalGeneration.from_pretrained(model_name)

@app.route("/transcribe", methods=["POST"])
def transcribe():
    if "audio" not in request.files:
        return jsonify({"error": "No audio file provided."}), 400

    audio_file = request.files["audio"]

    # Save the audio file temporarily
    temp_filename = f"temp_{uuid.uuid4().hex}.wav"
    audio_file.save(temp_filename)

    try:
        # Load the audio with librosa and ensure a 16kHz sampling rate
        audio, sr = librosa.load(temp_filename, sr=16000)

        # Process the audio and get input features as a TensorFlow tensor
        inputs = processor(audio, sampling_rate=sr, return_tensors="tf")
        input_features = inputs.input_features

        # Generate transcription using the TF model (generate returns a TF tensor)
        predicted_ids = model.generate(input_features)

        # Convert predicted_ids tensor to numpy before decoding
        transcription = processor.batch_decode(predicted_ids.numpy(), skip_special_tokens=True)[0]
    except Exception as e:
        transcription = f"Error processing audio: {str(e)}"
    finally:
        os.remove(temp_filename)

    return jsonify({"transcription": transcription})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000, debug=True)

Dependencies:
Make sure you install the necessary packages:

bash
pip install flask transformers tensorflow librosa

Frontend (index.html)

The frontend is identical to the index.html shown in the previous answer: it records the user's voice with the browser's MediaRecorder API and posts the resulting audio file to the /transcribe endpoint.


Summary

  • The Flask endpoint (app_tf.py) uses TensorFlow as the backend by loading the model with TFAutoModelForConditionalGeneration and returning TensorFlow tensors (with return_tensors="tf").
  • The Frontend (index.html) records audio using the browser’s MediaRecorder API, sends the recorded audio as a file to the Flask endpoint, and displays the transcription result.

This setup provides an end-to-end solution where a browser-based client records audio and sends it to a TensorFlow‑backed transcription endpoint using the Hugging Face "openai/whisper-tiny" model.




from transformers import WhisperProcessor, TFAutoModelForConditionalGeneration

ImportError: cannot import name 'TFAutoModelForConditionalGeneration' from 'transformers' (/home/ashish/anaconda3/envs/hf_202412/lib/python3.12/site-packages/transformers/__init__.py)

This error indicates that the class you're trying to import—TFAutoModelForConditionalGeneration—is not available in your current Transformers installation. Here are a few points to consider and possible solutions:


1. Version or Availability Issue

  • Transformers Version:
    Make sure you’re using a recent version of the Transformers library. You can upgrade with:

    bash
    pip install --upgrade transformers

    However, even with the latest version, not all model architectures have TensorFlow implementations.

  • TensorFlow Support for Whisper:
    Whisper models (like "openai/whisper-tiny") are primarily implemented and maintained in PyTorch. More to the point, Transformers does not define a class named TFAutoModelForConditionalGeneration at all; where TensorFlow support for Whisper exists, it is exposed through classes such as TFWhisperForConditionalGeneration, and its coverage lags behind the PyTorch implementation.


2. Alternatives and Workarounds

  • Use the PyTorch Version:
    Since the Whisper model is officially available as a PyTorch model, you might consider using the PyTorch backend with:

    python
    from transformers import WhisperProcessor, WhisperForConditionalGeneration
    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

    This approach is currently the recommended path for using Whisper.

  • Check for Experimental TF Support:
    If you specifically need a TensorFlow implementation, you could check the Transformers GitHub repository or its release notes for any experimental support for Whisper on TF. At the time of writing, Whisper’s TensorFlow support is not yet mainstream.

  • Use TFAutoModelForSeq2SeqLM (If Applicable):
    Some sequence-to-sequence models use the class TFAutoModelForSeq2SeqLM for TensorFlow. However, unless the Whisper model is explicitly supported in TensorFlow, this may not work either.
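To see which of these classes your installed Transformers version actually exposes, a quick diagnostic sketch (the candidate names are illustrative; any that do not exist are simply reported as unavailable):

python
import importlib

candidates = [
    "WhisperForConditionalGeneration",      # PyTorch Whisper class
    "TFWhisperForConditionalGeneration",    # TensorFlow Whisper class, if present
    "TFAutoModelForConditionalGeneration",  # the name that raised the ImportError
]
transformers = importlib.import_module("transformers")
for name in candidates:
    try:
        getattr(transformers, name)
        print(f"{name}: available")
    except Exception:
        print(f"{name}: not available in this installation")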


Summary

  • Error Cause: The error is raised because TFAutoModelForConditionalGeneration is not a class provided by the Transformers library, so it cannot be imported for Whisper (or any other model).
  • Recommendation: For now, use the PyTorch implementation (via WhisperForConditionalGeneration, as in the earlier app.py) when working with the "openai/whisper-tiny" model.
  • Future Updates: Keep an eye on the Transformers release notes; TensorFlow support for Whisper might be added in a future release.

If you need further guidance or have additional questions, feel free to ask!

Tags: Technology,Large Language Models