ai-llm-inference
Operational patterns for LLM inference, covering recent advances: vLLM (up to 24x throughput gains), FP8/FP4 quantization (30-50% cost reduction), FlashInfer kernels, advanced fusions, PagedAttention, continuous batching, model compression, speculative decoding, and GPU/CPU scheduling. Emphasizes production-ready performance and cost optimization.
$ Install
git clone https://github.com/vasilyu1983/AI-Agents-public /tmp/AI-Agents-public && cp -r /tmp/AI-Agents-public/frameworks/claude-code-kit/framework/skills/ai-llm-inference ~/.claude/skills/AI-Agents-public
Tip: Run this command in your terminal to install the skill.
Repository: vasilyu1983/AI-Agents-public/frameworks/claude-code-kit/framework/skills/ai-llm-inference
Author: vasilyu1983
Stars: 21
Forks: 6
Updated: 6d ago
Added: 6d ago