ai-llm-inference

Operational patterns for LLM inference (recent advances): vLLM with up to 24x throughput gains, FP8/FP4 quantization (30-50% cost reduction), FlashInfer kernels, advanced kernel fusions, PagedAttention, continuous batching, model compression, speculative decoding, and GPU/CPU scheduling. Emphasizes production-ready performance and cost optimization.
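The listing names the techniques without showing how they surface in practice. As a minimal sketch, assuming vLLM's offline Python API: the engine applies continuous batching and PagedAttention internally, so the caller only submits prompts. The model name, prompts, and sampling values below are illustrative, not taken from this skill.

# Minimal sketch: offline batched generation with vLLM.
# Continuous batching and PagedAttention are handled inside the engine.
# quantization="fp8" assumes an FP8-capable GPU and a compatible model;
# omit the argument to load the model in its native precision.
from vllm import LLM, SamplingParams

prompts = [
    "Explain PagedAttention in one sentence.",
    "Why does continuous batching raise GPU utilization?",
]
sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", quantization="fp8")

# generate() schedules all prompts through the batched engine and
# returns one RequestOutput per prompt.
for out in llm.generate(prompts, sampling):
    print(out.outputs[0].text)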

$ Install

git clone https://github.com/vasilyu1983/AI-Agents-public /tmp/AI-Agents-public && cp -r /tmp/AI-Agents-public/frameworks/claude-code-kit/framework/skills/ai-llm-inference ~/.claude/skills/ai-llm-inference

# tip: Run this command in your terminal to install the skill

Repository

Author: vasilyu1983
Path: vasilyu1983/AI-Agents-public/frameworks/claude-code-kit/framework/skills/ai-llm-inference
Stars: 21
Forks: 6
Updated: 6d ago
Added: 6d ago