xiaoyuzhou-podcast

zcker/xiaoyuzhou-podcast-skill

Download Xiaoyuzhou FM podcasts with full transcripts via FunASR ASR. Use when user provides xiaoyuzhoufm.com links or requests podcast content analysis.


Python


SKILL.md

name: xiaoyuzhou-podcast
description: Download Xiaoyuzhou FM podcasts with full transcripts via FunASR ASR. Use when user provides xiaoyuzhoufm.com links or requests podcast content analysis.
allowed-tools:
  - bash
  - read
  - glob
  - webfetch

Xiaoyuzhou Podcast Skill

Download podcasts from xiaoyuzhoufm.com and generate full transcripts using FunASR Automatic Speech Recognition (ASR).

Overview

This skill processes Xiaoyuzhou FM podcast links to:

Download audio files and show notes
Generate full transcripts via ASR (FunASR paraformer-zh)
Extract structured metadata
Provide comprehensive content for analysis

When to Use

Activate this skill when:

User provides a xiaoyuzhoufm.com episode link
User asks to "download podcast" or "transcribe podcast"
User needs full text content of a podcast for analysis
User provides a 24-character hex episode ID

Workflow

Step 1: Install Dependencies

First-time setup:

~/.claude/skills/xiaoyuzhou-podcast/scripts/install.sh

This checks and installs:

Python 3.8+
PyTorch with Metal (MPS) acceleration
FunASR and ModelScope
xyz-dl downloader

Step 2: Download Audio and Show Notes

~/.claude/skills/xiaoyuzhou-podcast/scripts/download.sh <URL or Episode ID>

Examples:

# Using full URL
scripts/download.sh https://www.xiaoyuzhoufm.com/episode/6942f3e852d4707aaa1feba3

# Using episode ID only
scripts/download.sh 6942f3e852d4707aaa1feba3

# Custom output directory
scripts/download.sh 6942f3e852d4707aaa1feba3 ~/MyPodcasts

Output structure:

~/Research/Podcast/
├── {id}_{host} - {title}/       # Podcast directory
│   ├── README.md                # Final merged document (Show Notes + transcript)
│   └── .cache/                  # Temporary cache (deleted automatically after processing)
│       ├── *.md                 # Show Notes
│       └── *.m4a                # Audio file

Step 3: Generate Full Transcript (Enhanced)

python3 ~/.claude/skills/xiaoyuzhou-podcast/scripts/transcribe_enhanced.py --audio <audio_path>

New Features:

✅ Speaker Diarization - Automatically identify different speakers
✅ Smart Segmentation - Intelligent paragraph breaks based on context
✅ Dialogue Formatting - Structured conversation format with speaker labels

Options:

--audio: Path to audio file (required)
--output-dir: Custom output directory
--hotword: Space-separated keywords to improve accuracy
--batch-size: Batch size in seconds (default: 300)
--no-diarization: Disable speaker diarization
--no-segmentation: Disable smart segmentation
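
The options above map naturally onto a standard `argparse` parser. The following is a minimal sketch of how such a command line could be declared; it is an assumption for illustration, not the actual source of `transcribe_enhanced.py`, and the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options (illustrative, not the real script)."""
    parser = argparse.ArgumentParser(description="Transcribe a podcast audio file.")
    parser.add_argument("--audio", required=True, help="Path to audio file")
    parser.add_argument("--output-dir", default=None, help="Custom output directory")
    parser.add_argument("--hotword", default="", help="Space-separated keywords")
    parser.add_argument("--batch-size", type=int, default=300, help="Batch size in seconds")
    parser.add_argument("--no-diarization", action="store_true", help="Disable speaker diarization")
    parser.add_argument("--no-segmentation", action="store_true", help="Disable smart segmentation")
    return parser


# Parse a sample command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--audio", "podcast.m4a", "--hotword", "巴菲特 穆迪 投资理念", "--no-diarization"]
)
hotwords = args.hotword.split()  # space-separated string -> list of terms
print(args.audio, args.batch_size, hotwords, args.no_diarization, args.no_segmentation)
```

Both enhancements stay on unless their `--no-...` flag is passed, which matches the "all enhancements by default" behavior described above.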

Example:

# Basic transcription with all enhancements
python3 scripts/transcribe_enhanced.py --audio ~/Research/Podcast/6942f3e852d4707aaa1feba3/.cache/podcast.m4a

# With hotwords
python3 scripts/transcribe_enhanced.py --audio podcast.m4a --hotword "巴菲特 穆迪 投资理念"

# Disable speaker diarization (faster)
python3 scripts/transcribe_enhanced.py --audio podcast.m4a --no-diarization

Output:

~/Research/Podcast/{id}_{host} - {title}/.cache/
├── {id}_{host} - {title}.txt              # Raw transcript
├── {id}_{host} - {title}_formatted.md     # Enhanced version ⭐
└── {id}_{host} - {title}_timestamp.txt    # With timestamps

Formatted Version Includes:

Dialogue Record - Speaker-labeled conversations (when diarization enabled)
Full Text - Smart paragraph segmentation

Step 4: Extract Structured Information

~/.claude/skills/xiaoyuzhou-podcast/scripts/extract-info.sh <Episode ID or Show Notes path>

This outputs:

Basic metadata (title, host, duration, date)
Links (episode URL, audio URL)
File locations
Transcript statistics and preview

Input Format

Accepts either:

Full URL: https://www.xiaoyuzhoufm.com/episode/{24-char-hex-id}
Episode ID: 24-character hexadecimal string (e.g., 6942f3e852d4707aaa1feba3)
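
The two accepted input forms can be normalized to a bare episode ID before any download starts. The sketch below shows one way to do that; the function name `extract_episode_id` and the choice to raise `ValueError` with an `invalid_url` prefix are assumptions for illustration, not the behavior of `download.sh`.

```python
import re
from urllib.parse import urlparse

# A bare episode ID is exactly 24 hexadecimal characters.
EPISODE_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def extract_episode_id(value: str) -> str:
    """Return the episode ID from a full episode URL or a bare ID.

    Raises ValueError when the input matches neither accepted form.
    """
    value = value.strip()
    if EPISODE_ID_RE.fullmatch(value):
        return value

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    # Accept the site with or without the "www." prefix.
    if host not in ("xiaoyuzhoufm.com", "www.xiaoyuzhoufm.com"):
        raise ValueError(f"invalid_url: {value!r}")

    # Expect a path of the form /episode/{id}.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) == 2 and parts[0] == "episode" and EPISODE_ID_RE.fullmatch(parts[1]):
        return parts[1]
    raise ValueError(f"invalid_url: {value!r}")
```

Validating up front means a malformed link fails immediately with a clear message instead of surfacing later as a download error.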

Performance

Expected performance on Mac M1/M2/M3:

Chinese ASR accuracy: > 90%
Processing time: 0.3-0.5x the audio duration (1 hour of audio → 18-30 min of transcription)
Memory usage: ~1.5-2GB (model + audio)
Metal (MPS) acceleration: 2-3x faster than CPU
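
The processing-time figure above gives a quick way to estimate how long a transcription will take. The sketch below works through that arithmetic; the helper names are hypothetical, and it assumes durations are written as `MM:SS`.

```python
def parse_duration_minutes(text: str) -> float:
    """Convert an "MM:SS" string such as "196:20" to minutes (assumed format)."""
    minutes, seconds = text.split(":")
    return int(minutes) + int(seconds) / 60


def estimate_transcription_minutes(audio_minutes: float, low: float = 0.3, high: float = 0.5):
    """Return the (fast, slow) estimate using the 0.3-0.5x range quoted above."""
    return audio_minutes * low, audio_minutes * high


for label in ("60:00", "196:20"):
    fast, slow = estimate_transcription_minutes(parse_duration_minutes(label))
    print(f"{label} of audio -> about {fast:.0f}-{slow:.0f} min of transcription")
```

For one hour of audio this reproduces the 18-30 minute range; a 196:20 episode comes out at roughly 59 to 98 minutes on the same assumptions.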

Technical Details

ASR Engine

FunASR paraformer-zh:

Model: 220M parameters, ~900MB
Training data: 60,000 hours of Chinese Mandarin
Native timestamp support (character-level)
Automatic punctuation restoration
Voice Activity Detection (VAD) for silence removal
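
For orientation, the ASR, VAD, and punctuation components listed above can be combined through FunASR's `AutoModel` interface. This is a minimal sketch under assumptions, not the code of `transcribe_enhanced.py`: the helper names are hypothetical, the model aliases (`paraformer-zh`, `fsmn-vad`, `ct-punc`) follow FunASR's published usage examples, and hotwords are passed as one space-separated string to mirror the `--hotword` option.

```python
from typing import Any, Dict, List


def build_generate_kwargs(audio_path: str, hotwords: List[str], batch_size_s: int = 300) -> Dict[str, Any]:
    """Assemble keyword arguments for AutoModel.generate()."""
    kwargs: Dict[str, Any] = {"input": audio_path, "batch_size_s": batch_size_s}
    if hotwords:
        # One space-separated string, mirroring the --hotword option.
        kwargs["hotword"] = " ".join(hotwords)
    return kwargs


def transcribe(audio_path: str, hotwords: List[str], device: str = "cpu") -> str:
    """Run ASR with VAD and punctuation restoration; returns the transcript text."""
    # Imported lazily so the helper above works without FunASR installed.
    from funasr import AutoModel

    model = AutoModel(
        model="paraformer-zh",   # ASR model
        vad_model="fsmn-vad",    # voice activity detection
        punc_model="ct-punc",    # punctuation restoration
        device=device,
    )
    results = model.generate(**build_generate_kwargs(audio_path, hotwords))
    return results[0]["text"]
```

Calling `transcribe("podcast.m4a", ["巴菲特", "穆迪"])` downloads the models on first use, so expect the first run to take noticeably longer.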

Model Storage

Models are automatically downloaded to:

~/.cache/modelscope/hub/iic/
├── speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/
├── speech_fsmn_vad_zh-cn-16k-common-pytorch/
└── punc_ct-transformer_cn-en-common-vocab471067-large/

Total disk usage: ~2GB (first-time download)

Acceleration

Mac M1/M2/M3: Metal Performance Shaders (MPS)
NVIDIA GPU: CUDA support (optional)
CPU: Fallback mode (slower)
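
The fallback order above (MPS, then CUDA, then CPU) can be expressed as a small device-selection helper. This is a sketch, assuming PyTorch is the backend; the function name `pick_device` is hypothetical, and whether the skill's scripts select the device exactly this way is not confirmed by the source.

```python
def pick_device() -> str:
    """Return "mps", "cuda", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate; fall back to CPU.
        return "cpu"

    # torch.backends.mps is missing on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


print(pick_device())
```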

Success Response Format

[SUCCESS] Podcast processed successfully

**Metadata:**
- Title: {title}
- Host: {host}
- Duration: {duration}
- Published: {date}
- Episode ID: {id}

**Files:**
- Final Document: ~/Research/Podcast/{id}_{host} - {title}/README.md
- (Cache files deleted after processing)

**Transcript Statistics:**
- Word count: {count}
- Processing time: {time}

**Transcript Preview:**
{first 500 characters}...
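
The template above can be filled programmatically once the metadata is known. The following sketch shows one possible rendering; the `EpisodeSummary` dataclass and `render_success` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class EpisodeSummary:
    """Fields needed by the success template (names are illustrative)."""
    title: str
    host: str
    duration: str
    published: str
    episode_id: str
    word_count: int
    processing_time: str
    transcript: str


def render_success(s: EpisodeSummary, root: str = "~/Research/Podcast") -> str:
    """Fill the success template; the preview is the first 500 characters."""
    final_doc = f"{root}/{s.episode_id}_{s.host} - {s.title}/README.md"
    return "\n".join([
        "[SUCCESS] Podcast processed successfully",
        "",
        "**Metadata:**",
        f"- Title: {s.title}",
        f"- Host: {s.host}",
        f"- Duration: {s.duration}",
        f"- Published: {s.published}",
        f"- Episode ID: {s.episode_id}",
        "",
        "**Files:**",
        f"- Final Document: {final_doc}",
        "- (Cache files deleted after processing)",
        "",
        "**Transcript Statistics:**",
        f"- Word count: {s.word_count}",
        f"- Processing time: {s.processing_time}",
        "",
        "**Transcript Preview:**",
        f"{s.transcript[:500]}...",
    ])
```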

Error Handling

| Error Type | Message | Solution |
| --- | --- | --- |
| `invalid_url` | Invalid URL format | Use full URL or 24-char hex ID |
| `not_installed` | Dependencies missing | Run `install.sh` |
| `download_failed` | Audio download failed | Check network, verify URL |
| `transcribe_failed` | ASR transcription failed | Check audio file integrity |
| `not_found` | Episode not found | Verify URL is correct |
| `mps_unavailable` | Metal acceleration unavailable | Will use CPU fallback |
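
The error table lends itself to a simple lookup structure. The sketch below encodes it as a dictionary; the `[ERROR]` output format and the `describe_error` helper are assumptions for illustration, since the source only specifies the success format.

```python
# Error type -> (message, solution), copied from the table above.
ERRORS = {
    "invalid_url": ("Invalid URL format", "Use full URL or 24-char hex ID"),
    "not_installed": ("Dependencies missing", "Run install.sh"),
    "download_failed": ("Audio download failed", "Check network, verify URL"),
    "transcribe_failed": ("ASR transcription failed", "Check audio file integrity"),
    "not_found": ("Episode not found", "Verify URL is correct"),
    "mps_unavailable": ("Metal acceleration unavailable", "Will use CPU fallback"),
}


def describe_error(error_type: str) -> str:
    """Render one error as a single line (hypothetical format)."""
    message, solution = ERRORS[error_type]
    return f"[ERROR] {error_type}: {message}. Solution: {solution}"


print(describe_error("not_installed"))
```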

Important Notes

Usage Restrictions

Personal Use Only: Downloaded content and transcripts are for personal use only
Support Creators: Consider supporting podcast creators through official channels
Platform Terms: Respect Xiaoyuzhou's terms of service
Rate Limiting: Avoid frequent bulk downloads to prevent server overload

Accuracy Considerations

ASR accuracy is typically 90% or higher, but may vary with:
- Accents and dialects
- Background music or noise
- Multiple speakers (especially when run with --no-diarization)
- Technical terminology
Hotwords: Use --hotword parameter to improve specific term recognition
Review Recommended: Proofread critical content manually

Troubleshooting

Common Issues

1. MPS (Metal) not available

Confirm the machine is an Apple Silicon Mac (M1/M2/M3)
Update macOS to the latest version
Install PyTorch 2.0 or later

2. Model download fails

Check internet connection
Verify sufficient disk space (~2GB)
Use ModelScope mirror if in China

3. Slow transcription

Check MPS is enabled: python3 -c "import torch; print(torch.backends.mps.is_available())"
Increase batch size if RAM allows
Close other resource-intensive applications

4. Poor accuracy on specific terms

Add hotwords: --hotword "term1 term2 term3"
Check audio quality

For detailed troubleshooting, see references/troubleshooting.md

Example Usage

User: Please download and transcribe this podcast: https://www.xiaoyuzhoufm.com/episode/6942f3e852d4707aaa1feba3

Assistant: I'll help you download and transcribe this podcast. Let me start by running the installation check, then download and transcribe it.

[Runs install.sh]
[Runs download.sh with URL]
[Runs transcribe_enhanced.py with audio file]
[Runs extract-info.sh to show summary]

[SUCCESS] Podcast downloaded and transcribed!

**Metadata:**
- Title: EP9 深度专访MIT博士"黑色面包"-我为什么重仓Fiserv (FISV)
- Host: 鹅先知 投资、出海和长寿科技
- Duration: 196:20
- Published: 2025-01-15
- Episode ID: 6942f3e852d4707aaa1feba3

**Files:**
- Final Document: ~/Research/Podcast/6942f3e852d4707aaa1feba3_鹅先知.../README.md

**Transcript Preview:**
大家好，欢迎收听本期节目。今天我们邀请了MIT博士...

The transcript is ready for analysis. Would you like me to summarize key points or search for specific topics?


Installation

Option 1: Use slash command in Claude Code

/install-skill https://github.com/zcker/xiaoyuzhou-podcast-skill

Option 2: Clone to skills directory

# Global (all projects)

git clone https://github.com/zcker/xiaoyuzhou-podcast-skill ~/.claude/skills/xiaoyuzhou-podcast

# Project-specific

git clone https://github.com/zcker/xiaoyuzhou-podcast-skill .claude/skills/xiaoyuzhou-podcast-skill

For Cursor, add the MCP server to .cursor/mcp.json:

{
  "mcpServers": {
    "skillz": {
      "command": "npx",
      "args": ["-y", "skillz-mcp", "https://github.com/zcker/xiaoyuzhou-podcast-skill"]
    }
  }
}

Restart Cursor after adding the configuration.

Option 1: Use Gemini CLI command

gemini extensions install https://github.com/zcker/xiaoyuzhou-podcast-skill

Option 2: Clone to extensions directory

git clone https://github.com/zcker/xiaoyuzhou-podcast-skill ~/.gemini/extensions/xiaoyuzhou-podcast-skill
