ai-evaluation-suite
Comprehensive AI/LLM evaluation toolkit for production AI systems. Covers LLM output quality, prompt engineering, RAG evaluation, agent performance, hallucination detection, bias assessment, cost/token optimization, latency metrics, model comparison, and fine-tuning evaluation. Includes BLEU/ROUGE metrics, perplexity, F1 scores, LLM-as-judge patterns, and benchmarks like MMLU and HumanEval.
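The metrics named above are standard ones. As a minimal, hypothetical sketch (not the suite's actual API; the function name and behavior here are illustrative), token-level F1 between a model output and a reference answer can be computed like this:

    # Token-level F1 (SQuAD-style). Illustrative only; the suite's real
    # implementation and function names may differ.
    from collections import Counter

    def token_f1(prediction: str, reference: str) -> float:
        """F1 over the multiset of tokens shared by prediction and reference."""
        pred_tokens = prediction.lower().split()
        ref_tokens = reference.lower().split()
        if not pred_tokens or not ref_tokens:
            # Both empty -> perfect match; only one empty -> no overlap.
            return float(pred_tokens == ref_tokens)
        overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
        if overlap == 0:
            return 0.0
        precision = overlap / len(pred_tokens)
        recall = overlap / len(ref_tokens)
        return 2 * precision * recall / (precision + recall)

    print(token_f1("Paris is the capital of France", "the capital of France is Paris"))  # 1.0

This is the SQuAD-style variant: precision and recall over shared tokens, forgiving of word order but strict about content.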
Install

    $ git clone https://github.com/majiayu000/claude-skill-registry /tmp/claude-skill-registry && cp -r /tmp/claude-skill-registry/skills/data/ai-evaluation-suite ~/.claude/skills/claude-skill-registry/

Tip: Run this command in your terminal to install the skill.
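Once installed, the LLM-as-judge pattern the description mentions generally follows the shape below. Everything in this sketch is an assumption: call_llm stands in for whatever model client you use, and the prompt and score format are illustrative, not the skill's actual interface.

    # LLM-as-judge sketch. `call_llm` is a placeholder for your model client
    # (any callable that takes a prompt string and returns the model's text);
    # it is NOT part of this skill's API.
    import json

    JUDGE_PROMPT = """Rate the ASSISTANT answer for factual accuracy from 1 (wrong) to 5 (fully correct).
    Respond only with JSON: {{"score": <int>, "reason": "<one sentence>"}}

    QUESTION: {question}
    ASSISTANT: {answer}"""

    def judge(question: str, answer: str, call_llm) -> dict:
        """Grade one answer with a judge model and parse its JSON verdict."""
        raw = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
        return json.loads(raw)  # e.g. {"score": 4, "reason": "..."}

Aggregating such scores across a dataset, and spot-checking the judge's reasons against human labels, is the usual way to validate the judge itself.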
Repository: majiayu000/claude-skill-registry/skills/data/ai-evaluation-suite
Author: majiayu000
Stars: 0
Forks: 0
Updated: 1 week ago
Added: 1 week ago