video-whisper

ylongw/video-whisper

🎙️ Local video/audio transcription on Apple Silicon using MLX Whisper. No API keys, no cloud, no cost.


SKILL.md

Video Whisper — Local Video/Audio Transcription

Transcribe videos and audio locally on Apple Silicon using MLX Whisper. Supports YouTube, Bilibili, Xiaohongshu, Douyin, podcasts, and local files.

Runs entirely on-device. No API keys. No cloud. No cost.

Requirements

Apple Silicon Mac (M1/M2/M3/M4)
Homebrew packages: yt-dlp, ffmpeg
Python venv with mlx-whisper

Installation

# 1. Install system dependencies
brew install yt-dlp ffmpeg

# 2. Create Python venv and install mlx-whisper
python3 -m venv ~/.openclaw/venvs/whisper
~/.openclaw/venvs/whisper/bin/pip install mlx-whisper
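Before the first run, a quick preflight check can confirm that the tools above are on PATH. The sketch below is not part of the repository. The function name `missing_deps` is hypothetical, and the venv path in the usage comments assumes the location from step 2.

```bash
# Hypothetical preflight helper (not part of video-whisper):
# prints every command name that is not found on PATH, one per line.
missing_deps() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Usage sketch, assuming the venv from step 2:
#   missing_deps yt-dlp ffmpeg        # no output means both are installed
#   [ "$(uname -m)" = "arm64" ] || echo "not an Apple Silicon Mac"
#   ~/.openclaw/venvs/whisper/bin/python3 -c "import mlx_whisper" && echo "mlx-whisper OK"
```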

Usage

CLI

bash scripts/transcribe.sh "<URL_or_FILE>" [model]

URL: YouTube, Bilibili, Xiaohongshu, Douyin, or any other site supported by yt-dlp
Local file: /path/to/video.mp4, /path/to/audio.wav, etc.
model (optional): defaults to mlx-community/whisper-medium-mlx
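A minimal sketch of how a wrapper like this could tell a URL from a local file and fall back to the default model. This illustrates the argument rules above under assumptions. It is not the contents of scripts/transcribe.sh, and the names `input_kind` and `pick_model` are hypothetical.

```bash
# Hypothetical sketch; the real scripts/transcribe.sh may differ.
DEFAULT_MODEL="mlx-community/whisper-medium-mlx"

# Classify the first argument: "url" for http(s) links (to be handed to yt-dlp),
# "file" for an existing local path, "missing" otherwise.
input_kind() {
  case "$1" in
    http://*|https://*) echo "url" ;;
    *) if [ -f "$1" ]; then echo "file"; else echo "missing"; fi ;;
  esac
}

# Use the optional second argument as the model, else the default.
pick_model() {
  echo "${1:-$DEFAULT_MODEL}"
}
```

For example, `input_kind ~/Downloads/podcast.mp3` prints `file` when that file exists, and `pick_model` with no argument prints the medium model's repo id.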

Output:

/tmp/whisper_output.txt — plain text transcript
/tmp/whisper_output.json — JSON with timestamps per segment
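To read the per-segment timestamps as `[HH:MM:SS] text` lines, something like the helpers below may work. This rests on an assumption this page does not confirm: that the JSON has a top-level `segments` array whose items carry `start` (seconds), `end`, and `text`, mirroring the dict returned by `mlx_whisper.transcribe`. `print_segments` also needs jq, which is not among the listed requirements. Both function names are hypothetical.

```bash
# Hypothetical helpers for reading /tmp/whisper_output.json.

# Convert seconds (integer or decimal) to HH:MM:SS, truncating fractions.
fmt_ts() {
  local secs="${1%.*}"   # 125.48 -> 125
  secs="${secs:-0}"      # ".5" strips to an empty string; treat it as 0
  printf '%02d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# Print "[HH:MM:SS] text" for each segment; requires jq (brew install jq).
print_segments() {
  local json="${1:-/tmp/whisper_output.json}"
  jq -r '.segments[] | "\(.start)\t\(.text)"' "$json" |
    while IFS=$'\t' read -r start text; do
      printf '[%s] %s\n' "$(fmt_ts "$start")" "$text"
    done
}
```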

Examples

# YouTube video
bash scripts/transcribe.sh "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Bilibili video
bash scripts/transcribe.sh "https://www.bilibili.com/video/BV1xx411c7mD"

# Local file
bash scripts/transcribe.sh ~/Downloads/podcast.mp3

# Use the large-v3 model for better accuracy
bash scripts/transcribe.sh "https://youtu.be/xxx" mlx-community/whisper-large-v3-mlx

Custom Python Path

If mlx-whisper is installed somewhere other than ~/.openclaw/venvs/whisper, set WHISPER_PYTHON to a Python interpreter that can import it:

export WHISPER_PYTHON=/path/to/your/venv/bin/python3
bash scripts/transcribe.sh "<URL>"
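The sketch below shows one way a script could honour WHISPER_PYTHON and then call the mlx_whisper Python API (`mlx_whisper.transcribe(audio, path_or_hf_repo=...)`). The names `resolve_python` and `run_mlx_whisper` are hypothetical. The output paths follow the Output section, but the JSON fields kept here (`text`, `language`, and `start`/`end`/`text` per segment) are an assumption rather than the real script's format.

```bash
# Hypothetical sketch; scripts/transcribe.sh may resolve and use the
# interpreter differently.

# WHISPER_PYTHON wins when set and non-empty; otherwise fall back to the
# venv created in the Installation steps.
resolve_python() {
  echo "${WHISPER_PYTHON:-$HOME/.openclaw/venvs/whisper/bin/python3}"
}

# Transcribe one audio file and write <out>.txt and <out>.json.
run_mlx_whisper() {
  local audio="$1" model="${2:-mlx-community/whisper-medium-mlx}" out="${3:-/tmp/whisper_output}"
  "$(resolve_python)" - "$audio" "$model" "$out" <<'PY'
import json
import sys

import mlx_whisper

audio, model, out = sys.argv[1:4]
result = mlx_whisper.transcribe(audio, path_or_hf_repo=model)

with open(out + ".txt", "w", encoding="utf-8") as f:
    f.write(result["text"].strip() + "\n")

# Keep only JSON-friendly fields from each segment.
segments = [
    {"start": float(s["start"]), "end": float(s["end"]), "text": s["text"]}
    for s in result["segments"]
]
with open(out + ".json", "w", encoding="utf-8") as f:
    json.dump({"text": result["text"], "language": result.get("language"),
               "segments": segments}, f, ensure_ascii=False, indent=2)
PY
}
```

Usage: `WHISPER_PYTHON=/path/to/your/venv/bin/python3 run_mlx_whisper ~/Downloads/podcast.mp3`. Reading the script from stdin with `python3 -` keeps the whole sketch in one shell file.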

Available Models

Model	Size	Time (10-min video)	Best for
`mlx-community/whisper-small-mlx`	~460MB	~20s	Quick drafts, English
`mlx-community/whisper-medium-mlx`	~1.5GB	~60-90s	Recommended: good balance of speed and accuracy
`mlx-community/whisper-large-v3-mlx`	~3GB	~90-120s	Best accuracy, multilingual

The first run downloads the model to ~/.cache/huggingface/hub/; later runs reuse the cached copy.
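To check whether a model is already cached (or how much disk it uses), note that the Hugging Face hub cache stores each model in a folder named `models--<org>--<name>`. The helpers below are hypothetical. As a simplification, they honour `HF_HUB_CACHE` and `HF_HOME` but ignore `XDG_CACHE_HOME`.

```bash
# Hypothetical helper: map a Hugging Face repo id to its local cache folder.
hf_model_dir() {
  local repo="$1"
  local hub="${HF_HUB_CACHE:-${HF_HOME:-$HOME/.cache/huggingface}/hub}"
  echo "$hub/models--${repo//\//--}"   # "org/name" -> "models--org--name"
}

# Prints "cached" if the model folder exists, "not cached" otherwise.
model_status() {
  if [ -d "$(hf_model_dir "$1")" ]; then echo "cached"; else echo "not cached"; fi
}
```

Usage: `model_status mlx-community/whisper-large-v3-mlx`, or `du -sh "$(hf_model_dir mlx-community/whisper-large-v3-mlx)"` to see its size on disk.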

Performance (Mac mini M4, 16GB)

Video length	medium (processing time)	large-v3 (processing time)
5 min	~30-40s	~50-60s
10 min	~60-90s	~90-120s
30 min	~3-4 min	~5-6 min
60 min	~6-8 min	~10-12 min
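Read as a rate, the table works out to roughly 6-9 s of processing per minute of audio for medium and 9-12 s for large-v3 on the Mac mini M4. That is very roughly 7-10x and 5-7x faster than real time, respectively. The hypothetical `estimate_secs` helper below just applies those per-minute ranges. They are derived only from the four rows above and will not transfer to other machines.

```bash
# Rough estimate derived from the Mac mini M4 table; other Macs will differ.
# Prints "<low>-<high>" seconds of processing for a video of N minutes.
estimate_secs() {
  local minutes="$1" model="${2:-medium}"
  local lo hi
  case "$model" in
    medium)   lo=6; hi=9 ;;
    large-v3) lo=9; hi=12 ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  echo "$((minutes * lo))-$((minutes * hi))"
}
```

For example, `estimate_secs 30 large-v3` prints `270-360`, i.e. about 4.5 to 6 minutes.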

OpenClaw Integration

Drop this skill into your OpenClaw workspace:

cp -r video-whisper ~/.openclaw/workspace/skills/

Then ask your agent, for example: "帮我转录这个视频 https://..." ("Transcribe this video for me: https://...")

The agent will run the script, read the output, and summarize or analyze as needed.

Notes

Chinese content: use medium or large-v3; small performs poorly on Chinese
Xiaohongshu/Douyin: downloads may require browser cookies (yt-dlp option --cookies-from-browser chrome)
Long videos (>1h): consider running the script in the background
All temporary files are written to /tmp/ and cleaned up automatically
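If a Xiaohongshu or Douyin link fails without cookies, one workaround is to download the audio yourself with yt-dlp's `--cookies-from-browser`, `-x`, and `--audio-format` options and then pass the local file to the script. The function name `fetch_audio_with_cookies` and the default output base `/tmp/whisper_cookie_audio` are hypothetical. This is not a feature of transcribe.sh, and it assumes you are logged in to the site in the named browser.

```bash
# Hypothetical workaround (not part of video-whisper): download the audio with
# browser cookies first, then transcribe the local file.
fetch_audio_with_cookies() {
  local url="$1" browser="${2:-chrome}" base="${3:-/tmp/whisper_cookie_audio}"
  # Send yt-dlp's progress output to stderr so stdout carries only the path.
  yt-dlp --cookies-from-browser "$browser" \
    -x --audio-format mp3 \
    -o "$base.%(ext)s" \
    "$url" >&2 || return 1
  echo "$base.mp3"   # -x --audio-format mp3 leaves the audio at <base>.mp3
}

# Usage sketch:
#   audio=$(fetch_audio_with_cookies "<XIAOHONGSHU_URL>" chrome) &&
#     bash scripts/transcribe.sh "$audio"
# For long videos, the script can run in the background, e.g.:
#   nohup bash scripts/transcribe.sh "$audio" > /tmp/transcribe.log 2>&1 &
```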

License

MIT

Installation (as an Agent Skill)

Claude Code

Option 1: Use the slash command in Claude Code

/install-skill https://github.com/ylongw/video-whisper

Option 2: Clone to skills directory

# Global (all projects)

git clone https://github.com/ylongw/video-whisper ~/.claude/skills/video-whisper

# Project-specific

git clone https://github.com/ylongw/video-whisper .claude/skills/video-whisper

Cursor

Add the MCP server to .cursor/mcp.json:

{
  "mcpServers": {
    "skillz": {
      "command": "npx",
      "args": ["-y", "skillz-mcp", "https://github.com/ylongw/video-whisper"]
    }
  }
}

Restart Cursor after adding the configuration.

Gemini CLI

Option 1: Use the Gemini CLI command

gemini extensions install https://github.com/ylongw/video-whisper

Option 2: Clone to extensions directory

git clone https://github.com/ylongw/video-whisper ~/.gemini/extensions/video-whisper

Topics

apple-silicon local-ai mlx speech-to-text transcription whisper
