cpu-gpu-performance
Monitor and optimize CPU/GPU usage with load measurement and cost-effective validation strategies.
Installation
```bash
git clone https://github.com/athola/claude-night-market /tmp/claude-night-market && \
  cp -r /tmp/claude-night-market/plugins/conserve/skills/performance-monitoring/cpu-gpu-performance ~/.claude/skills/claude-night-market/
```

Tip: Run this command in your terminal to install the skill.
```yaml
name: cpu-gpu-performance
description: |
  Monitor and optimize CPU/GPU usage with load measurement and cost-effective validation strategies.
  Triggers: CPU usage, GPU usage, performance, load monitoring, build performance, training, resource consumption, test suite, compilation
  Use when: session starts (auto-load with token-conservation), planning builds or training that could pin CPUs/GPUs for >1 minute, retrying failed resource-heavy commands
  DO NOT use when: simple operations with no resource impact. DO NOT use when: quick single-file operations.
  Use this skill BEFORE resource-intensive operations. Establish baselines proactively.
location: plugin
token_budget: 400
progressive_loading: true
dependencies:
  hub: [token-conservation]
  modules: []
```
Table of Contents
- When to Use
- Required TodoWrite Items
- Step 1 – Establish Current Baseline (#baseline)
- Step 2 – Narrow the Scope (#scope)
- Step 3 – Instrument Before You Optimize (#instrument)
- Step 4 – Throttle and Sequence Work (#throttle)
- Step 5 – Log Decisions + Next Steps (#log)
- Output Expectations
CPU/GPU Performance Discipline
When to Use
- At the beginning of every session (auto-load alongside `token-conservation`).
- Whenever you plan to build, train, or test anything that could pin CPU cores or GPUs for more than a minute.
- Before retrying a failing command that previously consumed significant resources.
Required TodoWrite Items
- `cpu-gpu-performance:baseline`
- `cpu-gpu-performance:scope`
- `cpu-gpu-performance:instrument`
- `cpu-gpu-performance:throttle`
- `cpu-gpu-performance:log`
Step 1 – Establish Current Baseline (baseline)
- Capture current utilization (see the sketch after this list):
  - `uptime`
  - `ps -eo pcpu,cmd | head`
  - `nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv`

  Note which hosts/GPUs are already busy.
- Record any CI/cluster budgets (time quotas, GPU hours) before launching work.
- Set a per-task CPU-minute / GPU-minute budget that respects those limits.
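A minimal baseline snapshot could be wrapped in a script like the one below. This is a sketch, assuming a Linux host with the procps tools; the log filename is illustrative, and the GPU query runs only when `nvidia-smi` is present.

```bash
#!/usr/bin/env bash
# Capture a point-in-time CPU/GPU baseline before launching heavy work.
LOG="baseline-$(date +%Y%m%d-%H%M%S).log"   # illustrative log name

{
  echo "== load =="
  uptime                                # 1/5/15-minute load averages
  echo "== top CPU consumers =="
  ps -eo pcpu,cmd --sort=-pcpu | head   # processes sorted by CPU%
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "== GPU =="
    nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
  fi
} | tee "$LOG"
```

Keeping the snapshot in a dated file gives later steps something concrete to compare against when judging whether a job stayed within its budget.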
Step 2 – Narrow the Scope (scope)
- Avoid running "whole world" jobs after a small fix. Prefer diff-based or tag-based selective testing (see the sketch after this list):
  - `pytest -k`
  - Bazel target patterns
  - `cargo test <module>`
- Batch low-level fixes so you can validate multiple changes with a single targeted command.
- For GPU jobs, favor unit-scale smoke inputs or lower epoch counts before scheduling the full training/eval sweep.
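One possible shape for diff-based selection is sketched below, assuming a git checkout with tests under `tests/`; the `HEAD~1` range and layout are illustrative, so adjust them for your repository.

```bash
#!/usr/bin/env bash
# Rerun only the test files touched by the latest commit instead of the
# whole suite.
changed=$(git diff --name-only HEAD~1 -- tests/ | grep '\.py$' || true)

if [ -n "$changed" ]; then
  # Word-splitting is intentional: $changed holds one path per line.
  pytest $changed
else
  echo "No test files changed; skipping targeted run."
fi
```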
Step 3 – Instrument Before You Optimize (instrument)
- Pick the right profiler/monitor (a GPU monitoring sketch follows this list):
  - CPU work: `perf`, Intel VTune, `cargo flamegraph`, language-specific profilers
  - GPU work: `nvidia-smi dmon`, `nsys`, `nvprof`, DLProf, framework timeline tracers
- Capture kernel/ops timelines, memory footprints, and data pipeline latency so you have evidence when throttling or parallelizing.
- Record hot paths + I/O bottlenecks in notes so future reruns can jump straight to the culprit.
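For GPU work, a lightweight timeline can be captured with `nvidia-smi dmon`, as sketched below; `run_training_step.sh` is a hypothetical workload, and the sample count and log name are illustrative.

```bash
#!/usr/bin/env bash
# Sample GPU utilization and memory once per second while a workload runs,
# leaving a timeline that later reruns can consult for hot spots.
nvidia-smi dmon -s um -d 1 -c 600 > gpu-timeline.log &   # u=utilization, m=memory
MON=$!

./run_training_step.sh            # hypothetical workload under test

kill "$MON" 2>/dev/null || true   # stop sampling once the workload exits
echo "Timeline written to gpu-timeline.log"
```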
Step 4 – Throttle and Sequence Work (throttle)
- Use `nice`, `ionice`, or Kubernetes/Slurm quotas to prevent starvation of shared nodes.
- Chain heavy tasks with guardrails (see the sketch after this list):
  - Rerun only the failed test/module.
  - Then (optionally) escalate to the next-wider shard.
  - Reserve the full suite for the final gate.
- Stagger GPU kernels (smaller batch sizes or gradient accumulation) when memory pressure risks eviction; prefer checkpoint/restore over restarts.
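A staged chain with throttling might look like the sketch below; the test paths and the `orders` marker are hypothetical, and `set -e` stops the escalation as soon as a stage fails.

```bash
#!/usr/bin/env bash
# Escalate from the narrowest rerun to wider shards only when the previous
# stage passes, at reduced CPU and I/O priority on shared nodes.
set -e
run() { nice -n 10 ionice -c2 -n7 "$@"; }      # deprioritize CPU and disk I/O

run pytest tests/test_orders.py::test_refund   # stage 1: the failed test only
run pytest tests/test_orders.py                # stage 2: its module
run pytest -m orders                           # stage 3 (final gate): wider shard
```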
Step 5 – Log Decisions + Next Steps (log)
Conclude by documenting the commands that were run and their resource cost (duration, CPU%, GPU%), confirming whether they remained within the per-task budget. If a full suite or long training run was necessary, justify why selective or staged approaches were not feasible. Capture any follow-up tasks, such as adding a new test marker or profiling documentation, to streamline future sessions.
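One way to collect those numbers as you go is GNU time, as in the sketch below; it assumes GNU time is installed at `/usr/bin/time` (the `-f` format flag is a GNU extension), and the log path is illustrative.

```bash
#!/usr/bin/env bash
# Record the duration and CPU usage of each heavy command, appending to a
# session log that the Step 5 summary can cite.
/usr/bin/time -f '%C -> %es elapsed, %P CPU' \
  pytest tests/test_orders.py -k test_refund \
  2>> resource-log.txt              # GNU time reports on stderr

tail -n 1 resource-log.txt          # show the cost of the command just run
```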
Output Expectations
- Brief summary covering:
- baseline metrics
- scope chosen
- instrumentation captured
- throttling tactics
- follow-up items
- Concrete example(s) of what ran (e.g.):
- "reran
pytest tests/test_orders.py -k test_refundinstead ofpytest -m slow" - "profiled
nvidia-smi dmonoutput to prove GPU idle time before scaling"
- "reran
Troubleshooting
Common Issues
- Command not found: ensure all dependencies are installed and on PATH.
- Permission errors: check file permissions and run with appropriate privileges.
- Unexpected behavior: enable verbose logging with the `--verbose` flag.
Repository
https://github.com/athola/claude-night-market