Fixing Bugs Systematically
Diagnose and fix bugs through systematic investigation, root cause analysis, and targeted validation. Use when something is broken, errors occur, performance degrades, or unexpected behavior manifests.
$ Install
git clone https://github.com/CaptainCrouton89/.claude /tmp/.claude && cp -r /tmp/.claude/skills.archive/bug-fixing-protocol ~/.claude/skills/-claude
Tip: Run this command in your terminal to install the skill.
Structured protocol for isolating root causes and implementing focused fixes in existing features.
When to Use
- Something is broken and needs diagnosis and repair
- Error messages appear or unexpected behavior occurs
- Performance degradation in existing functionality
- Intermittent or hard-to-reproduce issues
Core Steps
1. Context & Reproduction
Read relevant documentation:
- docs/feature-spec/F-##-*.md for affected feature
- docs/user-stories/US-###-*.md for expected behavior and acceptance criteria
- docs/api-contracts.yaml if API-related
- docs/system-design.md for architecture context
Document the bug:
- Expected behavior (cite story AC or spec)
- Actual behavior (what's broken)
- Reproduction steps
- Feature ID (F-##) and Story ID (US-###) if known
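The four fields above can be captured in a small record so that nothing is lost between reproduction and fix. The following is a minimal sketch, not part of the protocol itself: the class name `BugReport`, its field names, and the example IDs in the comments are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BugReport:
    """Hypothetical container for the bug documentation fields."""
    expected: str                        # cite the story AC or spec section
    actual: str                          # what is broken
    repro_steps: List[str] = field(default_factory=list)
    feature_id: Optional[str] = None     # e.g. "F-03" (illustrative), if known
    story_id: Optional[str] = None       # e.g. "US-101" (illustrative), if known

    def is_reproducible(self) -> bool:
        # Without reproduction steps the fix cannot be verified later.
        return len(self.repro_steps) > 0

    def to_markdown(self) -> str:
        steps = "\n".join(f"{i}. {s}" for i, s in enumerate(self.repro_steps, 1))
        ids = ", ".join(x for x in (self.feature_id, self.story_id) if x) or "unknown"
        return (
            f"Expected: {self.expected}\n"
            f"Actual: {self.actual}\n"
            f"IDs: {ids}\n"
            f"Steps:\n{steps}"
        )
```

Keeping the IDs optional mirrors the "if known" wording: a report is still useful without them, but not without reproduction steps.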
2. Investigation
Simple bugs (obvious entry point)
Use direct investigation:
- Grep to locate error messages or related code
- Read suspected files to examine implementation
- Trace function calls and data transformations
- Check related files for connected logic
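When a grep tool is not at hand, the first step (locating an error message in the code) can be approximated with a short script. This is a rough sketch under stated assumptions: the function name `find_in_tree` and the default file suffixes are illustrative choices, not something the protocol prescribes.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_in_tree(
    root: str, needle: str, suffixes: Tuple[str, ...] = (".py", ".ts", ".js")
) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, stripped_line) for each line containing `needle`."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield str(path), lineno, line.strip()
```

Each hit gives a file and line number to open next, which is the natural starting point for tracing function calls outward from the error site.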
Complex bugs (multiple subsystems or unclear origin)
Delegate to async agents in parallel:
Spawn senior-engineer agents to:
- Trace error flow through specific subsystem
- Analyze related failure patterns
- Investigate runtime conditions
Spawn Explore agents to:
- Map data flow across multiple files
- Find all error handling for specific operation
- Locate configuration and integration points
Example: For an authentication bug, spawn:
- Agent 1: "Trace auth flow from login endpoint to session creation"
- Agent 2: "Find all error handling and validation in auth module"
- Agent 3: "Locate session storage config and related code"
Wait for results using ./agent-responses/await {agent_id}
3. Root Cause Analysis
Generate hypotheses:
- List 3-8 potential root causes from investigation
- Rank by probability (evidence from code) and impact
- Select most likely cause(s)
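One way to make the ranking explicit is to score each hypothesis and sort. The sketch below assumes a simple probability-times-impact score on a 0 to 1 scale; that scoring rule, the `Hypothesis` class, and the `rank` function are illustrative assumptions rather than anything the protocol mandates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    cause: str
    probability: float  # 0..1, judged from evidence in the code
    impact: float       # 0..1, how much of the observed failure it would explain


def rank(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Sort by probability x impact, highest first (an assumed scoring rule)."""
    return sorted(hypotheses, key=lambda h: h.probability * h.impact, reverse=True)
```

For instance, a cause judged at probability 0.7 and impact 0.8 (score 0.56) would be investigated before one at 0.2 and 0.9 (score 0.18). The numbers are estimates, so the value lies in forcing an ordering, not in the exact scores.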
Decision point:
- Fix immediately if root cause is obvious and confirmed
- Add validation (step 4) if multiple plausible causes remain or the behavior is runtime-dependent
4. Validation (if needed)
Add minimal debugging:
- Logging at decision points
- Data inspection at boundaries
- Input/output logging at integration points
Test to confirm root cause before proceeding to fix.
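Input/output logging at an integration point can be added without touching the function body by wrapping it. The following is a minimal sketch: the logger name, the `DEBUG-TMP` tag, the `log_boundary` decorator, and the `parse_token` function are all hypothetical and exist only for illustration.

```python
import functools
import logging

logger = logging.getLogger("bugfix.debug")  # hypothetical logger name


def log_boundary(func):
    """Log inputs and outputs of `func`. Temporary: remove once the cause is confirmed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("DEBUG-TMP %s called with args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Record the failure with its traceback, then let it propagate unchanged.
            logger.exception("DEBUG-TMP %s raised", func.__name__)
            raise
        logger.debug("DEBUG-TMP %s returned %r", func.__name__, result)
        return result
    return wrapper


@log_boundary
def parse_token(raw: str) -> dict:
    # Hypothetical integration-point function used only for illustration.
    user, _, expiry = raw.partition(":")
    return {"user": user, "expiry": int(expiry)}
```

Tagging every temporary message with one searchable marker keeps the later cleanup mechanical: a single search finds everything that has to be removed.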
5. Implementation
Fix the confirmed root cause:
- Keep changes minimal and focused
- Maintain API stability unless approved
- Follow existing patterns in codebase
Update documentation if needed:
- Add note in feature spec or changelog
- Update docs/api-contracts.yaml if contract changed (requires approval)
- For slash commands:
  - /manage-project/update/update-feature to correct spec
  - /manage-project/update/update-story if ACs were ambiguous
  - /manage-project/update/update-api if API changed (with approval)
6. Validation & Testing
Verify fix against acceptance criteria:
- Test all ACs from affected user stories
- Check 1-2 key edge cases and error states
- Run contract tests if API changed
- Verify events in docs/data-plan.md still fire correctly
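A fix is easier to trust when the acceptance criterion and the edge cases are pinned down as regression tests. The sketch below is only an illustration: `session_is_valid`, the wording of the acceptance criterion, and the chosen edge cases are assumptions standing in for whatever the affected user story actually specifies.

```python
import unittest


def session_is_valid(expires_at: float, now: float) -> bool:
    """Hypothetical fixed function: a session is valid strictly before expiry."""
    return now < expires_at


class SessionExpiryRegression(unittest.TestCase):
    def test_acceptance_criterion(self):
        # AC (assumed wording): an unexpired session is accepted.
        self.assertTrue(session_is_valid(expires_at=100.0, now=50.0))

    def test_edge_exact_expiry(self):
        # Edge case: the boundary instant counts as expired.
        self.assertFalse(session_is_valid(expires_at=100.0, now=100.0))

    def test_error_state_already_expired(self):
        # Error state: a session well past its expiry is rejected.
        self.assertFalse(session_is_valid(expires_at=100.0, now=150.0))
```

Saved in a test module, this runs with `python -m unittest`. One test per AC plus one or two boundary cases matches the scope suggested above without growing into a full test campaign.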
7. Cleanup
- Remove all temporary debugging and logging code added during investigation
- Verify no temporary files remain
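Cleanup can be checked mechanically if temporary code was tagged when it was added. This sketch assumes two conventions that are not part of the protocol: temporary statements carry a marker string such as `DEBUG-TMP`, and scratch files end in `.tmp` or `.orig`. The function names are likewise hypothetical.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed conventions; adjust to whatever marker and suffixes the project uses.
MARKER = re.compile(r"DEBUG-TMP")
TEMP_SUFFIXES = {".tmp", ".orig"}


def leftover_debugging(root: str) -> List[Tuple[str, int]]:
    """Return (path, line_number) for every Python line still carrying the marker."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if MARKER.search(line):
                hits.append((str(path), lineno))
    return hits


def leftover_temp_files(root: str) -> List[str]:
    """Return paths of files whose suffix marks them as temporary."""
    return [
        str(p)
        for p in sorted(Path(root).rglob("*"))
        if p.is_file() and p.suffix in TEMP_SUFFIXES
    ]
```

Both functions returning empty lists is a reasonable signal that the cleanup step is complete.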
Investigation Strategy
For direct investigation:
- Use grep and read_file to understand the subsystem
- Trace flows manually through related files
- Focus on specific area where bug manifests
When to validate before fixing:
- Multiple plausible root causes exist
- Runtime-dependent behavior
- Intermittent or hard-to-reproduce issues
For async investigation:
- Each agent investigates independent subsystem
- Run in parallel for speed
- Maximum 6 agents (diminishing returns)
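The fan-out pattern described here (independent tasks, run in parallel, capped at six) can be sketched with a thread pool. This does not show how agents are actually spawned; `investigate` is a hypothetical stand-in for whatever call launches an agent and returns its findings, and `run_investigations` is an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

MAX_AGENTS = 6  # cap taken from the guidance above


def run_investigations(
    tasks: List[str], investigate: Callable[[str], str]
) -> Dict[str, str]:
    """Run one investigation per task in parallel and collect findings by task."""
    if len(tasks) > MAX_AGENTS:
        # More tasks than the cap: merge or drop tasks instead of spawning more agents.
        raise ValueError(f"{len(tasks)} tasks exceed the limit of {MAX_AGENTS} agents")
    if not tasks:
        return {}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        findings = list(pool.map(investigate, tasks))  # map preserves task order
    return dict(zip(tasks, findings))
```

Raising on overflow rather than queueing extra tasks reflects the diminishing-returns point: past six, the better move is usually to consolidate the questions being asked.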
Artifacts
Inputs:
- docs/feature-spec/F-##-*.md: Feature specs
- docs/user-stories/US-###-*.md: Expected behavior and ACs
- docs/api-contracts.yaml: API specs
- docs/system-design.md: Architecture context
Outputs:
- Investigation findings (inline notes or agent reports)
- Updated feature spec with bug resolution notes
- Fixed code with accompanying tests
Quick Reference
| Scenario | Approach |
|---|---|
| Single subsystem, obvious entry | Direct investigation → immediate fix |
| Multiple subsystems, unclear origin | Spawn 2-4 agents in parallel → synthesize findings → fix |
| Runtime-dependent or intermittent | Add targeted logging → reproduce → analyze logs → fix |
| Multiple independent fixes needed | Pass investigation results to fix agents via artifact files |
