# whisper-transcribe

Transcribes audio and video files to text using OpenAI's Whisper CLI with contextual grounding. Converts audio/video to text, transcribes recordings, and creates transcripts from media files. Use when asked to "whisper transcribe", "transcribe audio", "convert recording to text", or "speech to text". Uses markdown files in the same directory as context to improve transcription accuracy for technical terms, proper nouns, and domain-specific vocabulary.
## SKILL.md

```yaml
name: whisper-transcribe
description: |
  Transcribes audio and video files to text using OpenAI's Whisper CLI with
  contextual grounding. Converts audio/video to text, transcribes recordings,
  and creates transcripts from media files. Use when asked to "whisper
  transcribe", "transcribe audio", "convert recording to text", or "speech to
  text". Uses markdown files in the same directory as context to improve
  transcription accuracy for technical terms, proper nouns, and
  domain-specific vocabulary.
version: 1.0.0
category: media-processing
triggers:
  - whisper
  - transcribe
  - transcription
  - audio to text
  - video to text
  - speech to text
  - convert recording
  - meeting transcript
  - .mp3
  - .wav
  - .m4a
  - .mp4
  - .webm
author: Claude Code
license: MIT
tags:
  - whisper
  - transcription
  - audio
  - video
  - speech-to-text
  - context-grounding
```
# Whisper Transcribe Skill

Transcribe audio and video files to text using OpenAI's Whisper with contextual grounding from markdown files.
## Purpose

Intelligent audio/video transcription that:

- Converts media files to accurate text transcripts
- Uses markdown context files to correct technical terms, names, and jargon
- Handles various audio/video formats (mp3, wav, m4a, mp4, webm, etc.)
## When to Use

- User asks to transcribe an audio or video file
- User wants to convert a recording to text
- User mentions "whisper" in the context of transcription
- User needs meeting notes or interview transcripts
- User has media files with domain-specific terminology
## Installation

### macOS (recommended for MacBook Pro)

```bash
# Install via Homebrew (recommended)
brew install ffmpeg openai-whisper

# Verify installation
whisper --version
```

### Linux (pip)

```bash
# Install ffmpeg first
sudo apt install ffmpeg        # Debian/Ubuntu
# or: sudo dnf install ffmpeg  # Fedora

# Install Whisper
pip install openai-whisper
```

### Verify installation

```bash
whisper --version
ffmpeg -version
```
## Transcription Workflow

### Step 1: Identify the media file and context

- Locate the audio/video file to transcribe
- Check for markdown files in the same directory (context files)
- If no context files exist, optionally create one using `assets/context-template.md`

### Step 2: Run Whisper transcription

Basic transcription:

```bash
whisper "/path/to/audio.mp3" --output_dir "/path/to/output"
```

Model selection (trade-off between speed and accuracy):

```bash
# Fast (less accurate)
whisper "audio.mp3" --model tiny

# Balanced (recommended)
whisper "audio.mp3" --model base

# High quality
whisper "audio.mp3" --model small

# Best quality (slower, requires more RAM)
whisper "audio.mp3" --model medium
whisper "audio.mp3" --model large
```

Language specification:

```bash
whisper "audio.mp3" --language en
```

Output format options:

```bash
whisper "audio.mp3" --output_format txt   # Plain text
whisper "audio.mp3" --output_format srt   # Subtitles
whisper "audio.mp3" --output_format vtt   # Web subtitles
whisper "audio.mp3" --output_format json  # Detailed JSON
whisper "audio.mp3" --output_format all   # All formats
```
### Step 3: Apply context grounding

Use the `scripts/transcribe_with_context.py` script for automated grounding, or apply corrections manually:

```bash
# Automated approach (recommended)
python scripts/transcribe_with_context.py /path/to/audio.mp3
```

For manual grounding:

1. Read the transcript output
2. Read all `.md` files in the media file's directory
3. Extract terminology, names, and technical terms from the context files
4. Search the transcript for likely misrecognitions
5. Apply corrections based on context

Common corrections:

- "cooler net ease" -> "Kubernetes"
- "sequel" -> "SQL"
- "post gress" -> "Postgres"
- Names: match phonetic variations to names in the context files
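These substitutions can be automated. The sketch below is illustrative, not the skill's actual implementation: `CORRECTIONS` is a hypothetical map that in practice would be built from the context files.

```python
import re

# Hypothetical correction map; in practice, derive it from context files.
CORRECTIONS = {
    "cooler net ease": "Kubernetes",
    "sequel": "SQL",
    "post gress": "Postgres",
}

def apply_corrections(transcript: str, corrections: dict[str, str]) -> str:
    """Replace likely misrecognitions with their context-grounded terms.

    Matching is case-insensitive and anchored on word boundaries so that
    substrings inside larger words are left untouched.
    """
    for wrong, right in corrections.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        transcript = pattern.sub(right, transcript)
    return transcript

print(apply_corrections("We deploy on cooler net ease with post gress.", CORRECTIONS))
# We deploy on Kubernetes with Postgres.
```

Word boundaries matter here: a blanket replace of "sequel" would otherwise corrupt words like "sequels".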
### Step 4: Save the corrected transcript

Save the grounded transcript with a clear filename:

```
original_filename_transcript.txt
original_filename_transcript.md
```
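A helper implementing this naming convention might look like the following (a sketch; `transcript_path` is a hypothetical function, not part of the skill's script):

```python
from pathlib import Path

def transcript_path(media_file: str, ext: str = "md") -> str:
    """Derive the grounded-transcript filename from the media filename."""
    p = Path(media_file)
    return str(p.with_name(f"{p.stem}_transcript.{ext}"))

print(transcript_path("interview.mp3"))       # interview_transcript.md
print(transcript_path("standup.mp4", "txt"))  # standup_transcript.txt
```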
## Context Files

Context files are markdown files in the same directory as the media file. They provide grounding information that improves transcription accuracy.

### What to include in context files

- **People**: names of speakers, team members, interviewees
- **Technical terms**: domain-specific vocabulary, product names
- **Acronyms**: abbreviations and their expansions
- **Organizations**: company names, department names
- **Projects**: project codenames, feature names

### Context file example

See `assets/context-template.md` for a complete template.

```markdown
# Meeting Context

## Speakers
- Richard Hightower (host)
- Jane Smith (engineering lead)

## Technical Terms
- Kubernetes (container orchestration)
- FastAPI (Python web framework)
- AlloyDB (Google Cloud database)

## Acronyms
- CI/CD - Continuous Integration/Continuous Deployment
- PR - Pull Request
```
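Pulling grounding vocabulary out of a context file like the one above can be sketched as follows (an illustrative parser under the assumption that terms are bullet items with an optional parenthetical gloss or " - " expansion; the skill's script may parse differently):

```python
import re

def extract_terms(markdown_text: str) -> list[str]:
    """Collect candidate vocabulary from a context file.

    Keeps, for each bullet item, the term before any parenthetical
    gloss or " - " expansion.
    """
    terms = []
    for line in markdown_text.splitlines():
        m = re.match(r"^\s*-\s+(.*)", line)
        if not m:
            continue
        # Split off "(gloss)" or " - expansion" and keep the term itself.
        term = re.split(r"\s*\(|\s+-\s+", m.group(1))[0].strip()
        if term:
            terms.append(term)
    return terms

context = """\
## Technical Terms
- Kubernetes (container orchestration)
- FastAPI (Python web framework)
## Acronyms
- CI/CD - Continuous Integration/Continuous Deployment
"""
print(extract_terms(context))  # ['Kubernetes', 'FastAPI', 'CI/CD']
```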
## Model Selection Guide

Use `base` for general use and `medium` for important recordings. See `references/whisper-options.md` for a full model comparison and all available options.

Quick reference: `tiny` (fastest) < `base` (balanced) < `small` (better) < `medium` (high) < `large` (best accuracy).

For a MacBook Pro with Apple Silicon, the `small` or `medium` models offer the best speed/accuracy balance.
## Troubleshooting

### "whisper: command not found"

```bash
# macOS
brew install openai-whisper

# Linux
pip install openai-whisper
export PATH="$HOME/.local/bin:$PATH"
```

### "ffmpeg not found"

```bash
# macOS
brew install ffmpeg

# Linux
sudo apt install ffmpeg
```

### Out-of-memory errors

Use a smaller model:

```bash
whisper "audio.mp3" --model tiny
```

### Slow transcription

- Use the `tiny` or `base` model for faster results
- Ensure the correct architecture is being used (Apple Silicon vs. Intel)
## Resources

### scripts/

The `scripts/transcribe_with_context.py` script automates the full workflow:

- Finds context files automatically
- Runs Whisper transcription
- Applies context-based corrections
- Saves the final transcript

Usage:

```bash
python scripts/transcribe_with_context.py /path/to/audio.mp3
```
### references/

See `references/whisper-options.md` for the complete CLI reference and advanced options.

### assets/

`assets/context-template.md` provides a template for creating context files that improve transcription accuracy.
## README

# Whisper Transcribe Skill

A Claude Code skill for transcribing audio and video files using OpenAI's Whisper, with context grounding from markdown files.
## Features

- **Audio/video transcription**: convert media files to text using OpenAI Whisper
- **Context grounding**: uses markdown files in the same directory to improve accuracy for technical terms, names, and jargon
- **Multi-format support**: works with mp3, wav, m4a, mp4, webm, and more
- **Cross-platform**: supports macOS (Homebrew) and Linux installations
- **Automated workflow**: a Python script handles the full transcription pipeline
## Installation

### Quick install with Skilz (recommended)

The easiest way to install this skill is with the `skilz` universal installer:

```bash
npx skilz install SpillwaveSolutions_whisper-transcribe/whisper-transcribe
```

This command downloads and configures the skill for Claude Code automatically.

View on the Skilz marketplace: whisper-transcribe

### Manual installation

Clone the repository into your Claude Code skills directory:

```bash
git clone https://github.com/SpillwaveSolutions/whisper-transcribe.git ~/.claude/skills/whisper-transcribe
```

### Prerequisites

After installing the skill, install Whisper and ffmpeg on your system.

macOS (Homebrew):

```bash
brew install ffmpeg openai-whisper
```

Linux:

```bash
# Install ffmpeg
sudo apt install ffmpeg  # Debian/Ubuntu

# Install Whisper
pip install openai-whisper
```

Verify the installation:

```bash
whisper --version
ffmpeg -version
```
## Usage

### Basic transcription

```bash
whisper /path/to/audio.mp3 --output_dir /path/to/output
```

### With the context-grounding script

```bash
python scripts/transcribe_with_context.py /path/to/audio.mp3 --model base --language en
```

The script will:

1. Find markdown context files in the same directory
2. Run the Whisper transcription
3. Apply corrections based on context (technical terms, names)
4. Save both the original and grounded transcripts
## Model Selection

| Model | Speed | Accuracy | RAM required | Best for |
|---|---|---|---|---|
| tiny | Fastest | Lower | ~1 GB | Quick drafts, testing |
| base | Fast | Good | ~1 GB | General use |
| small | Medium | Better | ~2 GB | Important recordings |
| medium | Slower | High | ~5 GB | Professional transcription |
| large | Slowest | Highest | ~10 GB | Critical accuracy needs |

For a MacBook Pro with Apple Silicon, the `small` or `medium` models offer the best speed/accuracy balance.
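One way to apply the table programmatically is to pick the most accurate model that fits in memory. This is a sketch: the RAM figures come from the table above, and the 1.5x headroom factor is an assumption, not a measured requirement.

```python
# Approximate RAM requirements (GB), most accurate first, from the table above.
MODEL_RAM_GB = [("large", 10), ("medium", 5), ("small", 2), ("base", 1)]

def pick_model(available_ram_gb: float, headroom: float = 1.5) -> str:
    """Return the most accurate model whose RAM need (with headroom) fits."""
    for model, ram in MODEL_RAM_GB:
        if available_ram_gb >= ram * headroom:
            return model
    return "tiny"  # always-works fallback

print(pick_model(16))  # large
print(pick_model(4))   # small
```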
## Context Files

Create markdown files in the same directory as your audio to improve transcription accuracy.

### Example context file

```markdown
# Meeting Context

## Speakers
- Richard Hightower (host)
- Jane Smith (engineering lead)

## Technical Terms
- Kubernetes (container orchestration)
- FastAPI (Python web framework)
- AlloyDB (Google Cloud database)

## Acronyms
- CI/CD - Continuous Integration/Continuous Deployment
- PR - Pull Request
```

See `assets/context-template.md` for a complete template.
## Project Structure

```
whisper-transcribe/
├── SKILL.md                          # Skill definition
├── README.md                         # This file
├── scripts/
│   └── transcribe_with_context.py   # Automated transcription script
├── references/
│   └── whisper-options.md           # Complete Whisper CLI reference
└── assets/
    └── context-template.md          # Template for context files
```
## Triggers

This skill activates when users mention:

- whisper, transcribe, transcription
- audio to text, video to text, speech to text
- meeting transcript, convert recording
- File extensions: `.mp3`, `.wav`, `.m4a`, `.mp4`, `.webm`
## Troubleshooting

### "whisper: command not found"

```bash
# macOS
brew install openai-whisper

# Linux
pip install openai-whisper
export PATH="$HOME/.local/bin:$PATH"
```

### "ffmpeg not found"

```bash
# macOS
brew install ffmpeg

# Linux
sudo apt install ffmpeg
```

### Out-of-memory errors

Use a smaller model:

```bash
whisper "audio.mp3" --model tiny
```

### Slow transcription

- Use the `tiny` or `base` model for faster results
- Ensure the correct architecture is being used (Apple Silicon vs. Intel)
## Contributing

Contributions are welcome! Please feel free to submit a pull request.

## License

MIT