Audio Processing

357 skills in Content & Media > Audio Processing

marketing-content-generation

Generate content drafts adapted to GTM motion. Use when creating blog posts, case studies, social posts, sales collateral, or app store copy. Requires brand-voice.md and positioning.md.

BellaBe/ideas-os

Updated 6d ago

podcast

Creates audio podcasts from text using browser text-to-speech. Use when user mentions podcast, audio conversation, dialogue, spoken content, voice narration, audio book, or text-to-speech generation. Supports multiple speakers with automatic language detection. Zero cost, no API keys, works in browser.

sgasser/claude-skill-podcast

Updated 6d ago

ai-multimodal

Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), and generating images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens. | Use when: AI, LLM, vision, embedding, image analysis, Gemini API.

wollfoo/setup-factory

Updated 6d ago

openai-api

Build with OpenAI's stateless APIs - Chat Completions (GPT-5, GPT-4o), Embeddings, Images (DALL-E 3), Audio (Whisper + TTS), and Moderation. Includes Node.js SDK and fetch-based approaches for Cloudflare Workers. Use when: implementing chat completions with GPT-5/GPT-4o, streaming responses with SSE, using function calling/tools, creating structured outputs with JSON schemas, generating embeddings for RAG (text-embedding-3-small/large), generating images with DALL-E 3, editing images with GPT-Image-1, transcribing audio with Whisper, synthesizing speech with TTS (11 voices), moderating content (11 safety categories), or troubleshooting rate limits (429), invalid API keys (401), function calling failures, streaming parse errors, embeddings dimension mismatches, or token limit exceeded.

ovachiever/droid-tings

Updated 6d ago

ai-transcript-analyzer

Analyze transcript files using OpenAI API (gpt-5-mini) to extract insights, summaries, key topics, quotes, and action items. This skill should be used when users have transcript files (from WhisperKit, YouTube, podcasts, meetings, etc.) and want AI-powered analysis, summaries, or custom insights extracted from the content. Supports both default comprehensive analysis and custom prompts for specific information extraction.

buddyh/claude-code-skills

Updated 6d ago

ai-multimodal

Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understanding images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handling multiple images), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), generating images (text-to-image with Imagen 4, editing, composition, refinement), and generating videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of the default vision capabilities of Claude; fall back to Claude's vision capabilities only if needed), processing PDF documents, extracting structured data from media, creating images/videos from text pr

tkhieu/peraichi-coding-agent-starter-kit

Updated 6d ago

esphome-box3-builder

Marketplace

This skill should be used when the user asks to "configure esp32-s3-box-3", "set up box-3", "create box-3 voice assistant", "display lambda on box-3", "configure ili9xxx display", "set up gt911 touch", "configure i2s audio", "es7210 microphone", "es8311 speaker", "box-3 audio pipeline", or mentions error messages like "I2S DMA buffer error", "Touch not responding", "Display flicker", "Audio popping", "PSRAM not detected". Provides complete ESP32-S3-BOX-3 hardware templates, display lambda cookbook, touch patterns, and voice assistant configurations.

nodnarbnitram/claude-code-extensions

Updated 6d ago

analysis-logic-trace

Marketplace

Validate inference chains step-by-step by examining whether each logical connection from premise to conclusion is sound, making implicit reasoning steps explicit and checking for gaps or leaps. Use when: (1) asked to validate reasoning steps, trace the logic, or verify if conclusions follow from premises, (2) arguments skip intermediate inferential steps or use 'therefore' without showing the reasoning path, (3) evaluating multi-step proofs, mathematical reasoning, or decision frameworks where each step builds on previous ones, (4) reasoning depends on unstated assumptions being treated as established facts.

synapseradio/thinkies

Updated 6d ago

sound-engineer

Expert in spatial audio, procedural sound design, game audio middleware, and app UX sound design. Specializes in HRTF/Ambisonics, Wwise/FMOD integration, UI sound design, and adaptive music systems. Activate on 'spatial audio', 'HRTF', 'binaural', 'Wwise', 'FMOD', 'procedural sound', 'footstep system', 'adaptive music', 'UI sounds', 'notification audio', 'sonic branding'. NOT for music composition/production (use DAW), audio post-production for film (linear media), voice cloning/TTS (use voice-audio-engineer), podcast editing (use standard audio editors), or hardware design.

erichowens/some_claude_skills

Updated 6d ago

narrative-voice

Marketplace

Find and maintain consistent authorial voice across different contexts. Use when: (1) asked to develop distinctive voice or style, (2) different sections sound like different authors, (3) writing extended content like blogs or documentation, (4) unifying multiple pieces under common identity, (5) professional writing sounds generic.

synapseradio/thinkies

Updated 6d ago

synthesisgrounded-audio-brief

Produce grounded audio briefs by chaining source-scoped input, citation verification, dialogue dramatization, and multi-speaker TTS orchestration. Use for “Audio Overview” style outputs.

Cloudhabil/AGI-Server

Updated 6d ago

biblical-accuracy

Comprehensive biblical accuracy verification for sermons, teachings, and theological content aligned with United Church of God theology. Validates scripture references, quotations, contextual integrity, theological soundness per UCG doctrine, and performs deep linguistic analysis of Greek and Hebrew original language texts to ensure fidelity to biblical meaning. Use when writing or reviewing any biblical, theological, or sermon content.

williacj/claude-skills

Updated 6d ago

hooks-builder

Creates and configures Claude Code hooks for lifecycle automation. Use when implementing PreToolUse validation, PostToolUse formatting, PermissionRequest auto-approve, custom notifications, session management, or deterministic agent control.

bsamiee/Parametric_Portal

Updated 6d ago

voice-memos

Process voice memos with AI transcription and analysis. Multi-language support (EN, HE), speaker identification, action item extraction with priorities, smart summaries, and auto-categorization (meeting, journal, brainstorm, interview). Triggers - "process voice memos", "transcribe", "analyze memo", "show transcripts", "voice inbox", "extract action items", "meeting notes", "transcribe audio".

gsannikov/Exocortex

Updated 6d ago

daw-music

Marketplace

Digital Audio Workstation usage, music composition, interactive music systems, and game audio implementation for immersive soundscapes.

pluginagentmarketplace/custom-plugin-game-developer

Updated 6d ago

audio-converter

Convert audio files between formats (MP3, WAV, FLAC, OGG, M4A) with bitrate and sample rate control. Batch processing supported.

dkyazzentwatwa/chatgpt-skills

Updated 6d ago

transformers

Marketplace

Loading and using pretrained models with Hugging Face Transformers. Use when working with pretrained models from the Hub, running inference with Pipeline API, fine-tuning models with Trainer, or handling text, vision, audio, and multimodal tasks.

itsmostafa/llm-engineering-skills

Updated 6d ago

voice-audio-engineer

Expert in voice synthesis, TTS, voice cloning, podcast production, speech processing, and voice UI design via ElevenLabs integration. Specializes in vocal clarity, loudness standards (LUFS), de-essing, dialogue mixing, and voice transformation. Activate on 'TTS', 'text-to-speech', 'voice clone', 'voice synthesis', 'ElevenLabs', 'podcast', 'voice recording', 'speech-to-speech', 'voice UI', 'audiobook', 'dialogue'. NOT for spatial audio (use sound-engineer), music production (use DAW tools), game audio middleware (use sound-engineer), sound effects generation (use sound-engineer with ElevenLabs SFX), or live concert audio.

erichowens/some_claude_skills

Updated 6d ago

custom-plugin-flutter-skill-accessibility

Marketplace

Production-grade Flutter accessibility mastery - Semantics API, screen readers (VoiceOver/TalkBack), WCAG 2.1 AA/AAA compliance, inclusive design patterns, and automated a11y testing, with comprehensive code examples.

pluginagentmarketplace/custom-plugin-flutter

Updated 6d ago

brand-identity

Create or update a comprehensive brand identity, including strategy, visual design, and voice.

chufeng-huang-sipaway/sip-videogen

Updated 6d ago