prompt-benchmark

Systematic prompt evaluation framework with MATH, GSM8K, and Game of 24 benchmarks. Use when evaluating prompt effectiveness on standard benchmarks, comparing meta-prompting strategies quantitatively, measuring prompt quality improvements, or validating categorical prompt optimizations against ground truth datasets.

$ Install

git clone https://github.com/manutej/categorical-meta-prompting /tmp/categorical-meta-prompting && cp -r /tmp/categorical-meta-prompting/.claude/skills/prompt-benchmark ~/.claude/skills/prompt-benchmark

// tip: Run this command in your terminal to install the skill
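
// tip: To confirm the install, list the destination directory — a minimal check, assuming the default ~/.claude/skills path used by the command above

ls ~/.claude/skills/prompt-benchmark

// tip: If the listing shows the skill's SKILL.md and supporting files, Claude Code will pick up the skill in new sessions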