# whisper-transcribe

Transcribes audio and video files to text using OpenAI's Whisper CLI with contextual grounding. Converts audio/video to text, transcribes recordings, and creates transcripts from media files. Use when asked to "whisper transcribe", "transcribe audio", "convert recording to text", or "speech to text". Uses markdown files in the same directory as context to improve transcription accuracy for technical terms, proper nouns, and domain-specific vocabulary.
## SKILL.md

```yaml
name: whisper-transcribe
description: |
  Transcribes audio and video files to text using OpenAI's Whisper CLI with
  contextual grounding. Converts audio/video to text, transcribes recordings,
  and creates transcripts from media files. Use when asked to "whisper
  transcribe", "transcribe audio", "convert recording to text", or "speech to
  text". Uses markdown files in the same directory as context to improve
  transcription accuracy for technical terms, proper nouns, and
  domain-specific vocabulary.
version: 1.0.0
category: media-processing
triggers:
  - whisper
  - transcribe
  - transcription
  - audio to text
  - video to text
  - speech to text
  - convert recording
  - meeting transcript
  - .mp3
  - .wav
  - .m4a
  - .mp4
  - .webm
author: Claude Code
license: MIT
tags:
  - whisper
  - transcription
  - audio
  - video
  - speech-to-text
  - context-grounding
```
# Whisper Transcribe Skill

Transcribe audio and video files to text using OpenAI's Whisper with contextual grounding from markdown files.
## Purpose

Intelligent audio/video transcription that:

- Converts media files to accurate text transcripts
- Uses markdown context files to correct technical terms, names, and jargon
- Handles various audio/video formats (mp3, wav, m4a, mp4, webm, etc.)
## When to Use

- User asks to transcribe an audio or video file
- User wants to convert a recording to text
- User mentions "whisper" in the context of transcription
- User needs meeting notes or interview transcripts
- User has media files with domain-specific terminology
## Installation

### macOS (recommended for MacBook Pro)

```bash
# Install via Homebrew (recommended)
brew install ffmpeg openai-whisper

# Verify installation
whisper --version
```

### Linux (pip)

```bash
# Install ffmpeg first
sudo apt install ffmpeg        # Debian/Ubuntu
# or: sudo dnf install ffmpeg  # Fedora

# Install Whisper
pip install openai-whisper
```

### Verify installation

```bash
whisper --version
ffmpeg -version
```
## Transcription Workflow

### Step 1: Identify the media file and context

- Locate the audio/video file to transcribe
- Check for markdown files in the same directory (context files)
- If no context files exist, optionally create one using `assets/context-template.md`

### Step 2: Run Whisper transcription

Basic transcription:

```bash
whisper "/path/to/audio.mp3" --output_dir "/path/to/output"
```

Model selection (trade-off between speed and accuracy):

```bash
# Fast (less accurate)
whisper "audio.mp3" --model tiny

# Balanced (recommended)
whisper "audio.mp3" --model base

# High quality
whisper "audio.mp3" --model small

# Best quality (slower, requires more RAM)
whisper "audio.mp3" --model medium
whisper "audio.mp3" --model large
```

Language specification:

```bash
whisper "audio.mp3" --language en
```

Output format options:

```bash
whisper "audio.mp3" --output_format txt   # Plain text
whisper "audio.mp3" --output_format srt   # Subtitles
whisper "audio.mp3" --output_format vtt   # Web subtitles
whisper "audio.mp3" --output_format json  # Detailed JSON
whisper "audio.mp3" --output_format all   # All formats
```
### Step 3: Apply context grounding

Use the `scripts/transcribe_with_context.py` script for automated grounding, or apply corrections manually:

```bash
# Automated approach (recommended)
python scripts/transcribe_with_context.py /path/to/audio.mp3
```

For manual grounding:

1. Read the transcript output
2. Read all `.md` files in the media file's directory
3. Extract terminology, names, and technical terms from the context files
4. Search the transcript for likely misrecognitions
5. Apply corrections based on context

Common corrections:

- "cooler net ease" -> "Kubernetes"
- "sequel" -> "SQL"
- "post gress" -> "Postgres"
- Names: match phonetic variations to names in the context files
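These substitutions can be automated. The sketch below is illustrative, not the skill's actual implementation: `CORRECTIONS` is a hypothetical map that in practice would be built from the context files.

```python
import re

# Hypothetical correction map; in practice, derive it from context files.
CORRECTIONS = {
    "cooler net ease": "Kubernetes",
    "sequel": "SQL",
    "post gress": "Postgres",
}

def apply_corrections(transcript: str, corrections: dict[str, str]) -> str:
    """Replace likely misrecognitions with their context-grounded terms.

    Matching is case-insensitive and anchored on word boundaries so that
    substrings inside larger words are left untouched.
    """
    for wrong, right in corrections.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        transcript = pattern.sub(right, transcript)
    return transcript

print(apply_corrections("We deploy on cooler net ease with post gress.", CORRECTIONS))
# We deploy on Kubernetes with Postgres.
```

Word boundaries matter here: a blanket replace of "sequel" would otherwise corrupt words like "sequels".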
### Step 4: Save the corrected transcript

Save the grounded transcript with a clear filename:

```
original_filename_transcript.txt
original_filename_transcript.md
```
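A helper implementing this naming convention might look like the following (a sketch; `transcript_path` is a hypothetical function, not part of the skill's script):

```python
from pathlib import Path

def transcript_path(media_file: str, ext: str = "md") -> str:
    """Derive the grounded-transcript filename from the media filename."""
    p = Path(media_file)
    return str(p.with_name(f"{p.stem}_transcript.{ext}"))

print(transcript_path("interview.mp3"))       # interview_transcript.md
print(transcript_path("standup.mp4", "txt"))  # standup_transcript.txt
```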
## Context Files

Context files are markdown files in the same directory as the media file. They provide grounding information that improves transcription accuracy.

### What to include in context files

- **People**: names of speakers, team members, interviewees
- **Technical terms**: domain-specific vocabulary, product names
- **Acronyms**: abbreviations and their expansions
- **Organizations**: company names, department names
- **Projects**: project codenames, feature names

### Context file example

See `assets/context-template.md` for a complete template.

```markdown
# Meeting Context

## Speakers
- Richard Hightower (host)
- Jane Smith (engineering lead)

## Technical Terms
- Kubernetes (container orchestration)
- FastAPI (Python web framework)
- AlloyDB (Google Cloud database)

## Acronyms
- CI/CD - Continuous Integration/Continuous Deployment
- PR - Pull Request
```
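Pulling grounding vocabulary out of a context file like the one above can be sketched as follows (an illustrative parser under the assumption that terms are bullet items with an optional parenthetical gloss or " - " expansion; the skill's script may parse differently):

```python
import re

def extract_terms(markdown_text: str) -> list[str]:
    """Collect candidate vocabulary from a context file.

    Keeps, for each bullet item, the term before any parenthetical
    gloss or " - " expansion.
    """
    terms = []
    for line in markdown_text.splitlines():
        m = re.match(r"^\s*-\s+(.*)", line)
        if not m:
            continue
        # Split off "(gloss)" or " - expansion" and keep the term itself.
        term = re.split(r"\s*\(|\s+-\s+", m.group(1))[0].strip()
        if term:
            terms.append(term)
    return terms

context = """\
## Technical Terms
- Kubernetes (container orchestration)
- FastAPI (Python web framework)
## Acronyms
- CI/CD - Continuous Integration/Continuous Deployment
"""
print(extract_terms(context))  # ['Kubernetes', 'FastAPI', 'CI/CD']
```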
## Model Selection Guide

Use `base` for general use and `medium` for important recordings. See `references/whisper-options.md` for a full model comparison and all available options.

Quick reference: `tiny` (fastest) < `base` (balanced) < `small` (better) < `medium` (high) < `large` (best accuracy).

For a MacBook Pro with Apple Silicon, the `small` or `medium` models offer the best speed/accuracy balance.
## Troubleshooting

### "whisper: command not found"

```bash
# macOS
brew install openai-whisper

# Linux
pip install openai-whisper
export PATH="$HOME/.local/bin:$PATH"
```

### "ffmpeg not found"

```bash
# macOS
brew install ffmpeg

# Linux
sudo apt install ffmpeg
```

### Out-of-memory errors

Use a smaller model:

```bash
whisper "audio.mp3" --model tiny
```

### Slow transcription

- Use the `tiny` or `base` model for faster results
- Ensure the correct architecture is being used (Apple Silicon vs. Intel)
## Resources

### scripts/

The `scripts/transcribe_with_context.py` script automates the full workflow:

- Finds context files automatically
- Runs Whisper transcription
- Applies context-based corrections
- Saves the final transcript

Usage:

```bash
python scripts/transcribe_with_context.py /path/to/audio.mp3
```
### references/

See `references/whisper-options.md` for the complete CLI reference and advanced options.

### assets/

`assets/context-template.md` provides a template for creating context files that improve transcription accuracy.
## README

# Whisper Transcribe Skill

A Claude Code skill for transcribing audio and video files using OpenAI's Whisper, with context grounding from markdown files.
## Features

- **Audio/video transcription**: convert media files to text using OpenAI Whisper
- **Context grounding**: uses markdown files in the same directory to improve accuracy for technical terms, names, and jargon
- **Multi-format support**: works with mp3, wav, m4a, mp4, webm, and more
- **Cross-platform**: supports macOS (Homebrew) and Linux installations
- **Automated workflow**: a Python script handles the full transcription pipeline
## Installation

### Quick install with Skilz (recommended)

The easiest way to install this skill is with the `skilz` universal installer:

```bash
npx skilz install SpillwaveSolutions_whisper-transcribe/whisper-transcribe
```

This command downloads and configures the skill for Claude Code automatically.

View on the Skilz marketplace: whisper-transcribe

### Manual installation

Clone the repository into your Claude Code skills directory:

```bash
git clone https://github.com/SpillwaveSolutions/whisper-transcribe.git ~/.claude/skills/whisper-transcribe
```

### Prerequisites

After installing the skill, install Whisper and ffmpeg on your system.

macOS (Homebrew):

```bash
brew install ffmpeg openai-whisper
```

Linux:

```bash
# Install ffmpeg
sudo apt install ffmpeg  # Debian/Ubuntu

# Install Whisper
pip install openai-whisper
```

Verify the installation:

```bash
whisper --version
ffmpeg -version
```
## Usage

### Basic transcription

```bash
whisper /path/to/audio.mp3 --output_dir /path/to/output
```

### With the context-grounding script

```bash
python scripts/transcribe_with_context.py /path/to/audio.mp3 --model base --language en
```

The script will:

1. Find markdown context files in the same directory
2. Run the Whisper transcription
3. Apply corrections based on context (technical terms, names)
4. Save both the original and grounded transcripts
## Model Selection

| Model | Speed | Accuracy | RAM required | Best for |
|---|---|---|---|---|
| tiny | Fastest | Lower | ~1 GB | Quick drafts, testing |
| base | Fast | Good | ~1 GB | General use |
| small | Medium | Better | ~2 GB | Important recordings |
| medium | Slower | High | ~5 GB | Professional transcription |
| large | Slowest | Highest | ~10 GB | Critical accuracy needs |

For a MacBook Pro with Apple Silicon, the `small` or `medium` models offer the best speed/accuracy balance.
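One way to apply the table programmatically is to pick the most accurate model that fits in memory. This is a sketch: the RAM figures come from the table above, and the 1.5x headroom factor is an assumption, not a measured requirement.

```python
# Approximate RAM requirements (GB), most accurate first, from the table above.
MODEL_RAM_GB = [("large", 10), ("medium", 5), ("small", 2), ("base", 1)]

def pick_model(available_ram_gb: float, headroom: float = 1.5) -> str:
    """Return the most accurate model whose RAM need (with headroom) fits."""
    for model, ram in MODEL_RAM_GB:
        if available_ram_gb >= ram * headroom:
            return model
    return "tiny"  # always-works fallback

print(pick_model(16))  # large
print(pick_model(4))   # small
```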
## Context Files

Create markdown files in the same directory as your audio to improve transcription accuracy.

### Example context file

```markdown
# Meeting Context

## Speakers
- Richard Hightower (host)
- Jane Smith (engineering lead)

## Technical Terms
- Kubernetes (container orchestration)
- FastAPI (Python web framework)
- AlloyDB (Google Cloud database)

## Acronyms
- CI/CD - Continuous Integration/Continuous Deployment
- PR - Pull Request
```

See `assets/context-template.md` for a complete template.
## Project Structure

```
whisper-transcribe/
├── SKILL.md                          # Skill definition
├── README.md                         # This file
├── scripts/
│   └── transcribe_with_context.py   # Automated transcription script
├── references/
│   └── whisper-options.md           # Complete Whisper CLI reference
└── assets/
    └── context-template.md          # Template for context files
```
## Triggers

This skill activates when users mention:

- whisper, transcribe, transcription
- audio to text, video to text, speech to text
- meeting transcript, convert recording
- File extensions: `.mp3`, `.wav`, `.m4a`, `.mp4`, `.webm`
## Troubleshooting

### "whisper: command not found"

```bash
# macOS
brew install openai-whisper

# Linux
pip install openai-whisper
export PATH="$HOME/.local/bin:$PATH"
```

### "ffmpeg not found"

```bash
# macOS
brew install ffmpeg

# Linux
sudo apt install ffmpeg
```

### Out-of-memory errors

Use a smaller model:

```bash
whisper "audio.mp3" --model tiny
```

### Slow transcription

- Use the `tiny` or `base` model for faster results
- Ensure the correct architecture is being used (Apple Silicon vs. Intel)
## Contributing

Contributions are welcome! Please feel free to submit a pull request.

## License

MIT