accelerator-research-agent

RS42-AI/scout-ai-lite

Research accelerator portfolio companies using Firecrawl and Tavily MCPs. Generates structured CSV and markdown reports with systematic impact scoring. Optimized for token efficiency.


SKILL.md


name: accelerator-research-agent
description: Research accelerator portfolio companies using Firecrawl and Tavily MCPs. Generates structured CSV and markdown reports with systematic impact scoring. Optimized for token efficiency.

Accelerator Research Agent

A token-optimized Claude Desktop skill for researching accelerator portfolio companies with systematic impact analysis.

When to Use This Skill

Activate when the user asks to:

  • "Research companies from [accelerator name]"
  • "Analyze [accelerator] portfolio"
  • "Score companies for impact" or "evaluate mission alignment"
  • Any request that mentions YC, Techstars, Fast Forward, 500 Global, or a16z

Prerequisites

Required MCP Servers (both tested and validated):

  1. Firecrawl MCP - Structured extraction

    • Free tier: 500 credits/month
    • Use firecrawl_extract for JSON extraction
  2. Tavily MCP - AI-optimized search

    • Free tier: 100 RPM (6,000/hour)
    • Use tavily-search for company research
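
The Tavily rate limit above converts directly into a pacing rule for batch research. A minimal Python sketch of that arithmetic, assuming a simple fixed delay between requests; both helper names are hypothetical and not part of either MCP server:

```python
def requests_per_hour(rpm: int) -> int:
    """Convert a requests-per-minute limit into requests per hour."""
    return rpm * 60


def min_interval_seconds(rpm: int) -> float:
    """Smallest delay between requests that stays within the per-minute limit."""
    return 60 / rpm


print(requests_per_hour(100))     # 6000
print(min_interval_seconds(100))  # 0.6
```

At 3-5 companies per batch the limit is unlikely to matter in practice; the calculation is mainly useful when scaling to many portfolios.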

Core Workflow (4 Phases)

Phase 1: Portfolio Extraction

Goal: Get company list from accelerator portfolio page

Tool: firecrawl_extract (PRIMARY - 100% success rate in testing: Y Combinator 20/20, Fast Forward 5/5)

Schema Pattern: See SCHEMA-TEMPLATES.md for tested schemas (YC, Fast Forward, Healthcare, Climate, Fintech)

Quick Schema (customize based on accelerator):

{
  "name": "mcp__MCP_DOCKER__firecrawl_extract",
  "arguments": {
    "urls": ["PORTFOLIO_URL"],
    "prompt": "Extract all portfolio companies including name, website, description, industry",
    "schema": {
      "type": "object",
      "properties": {
        "companies": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "website": {"type": "string"},
              "description": {"type": "string"},
              "industry": {"type": "string"}
            },
            "required": ["name"]
          }
        }
      },
      "required": ["companies"]
    }
  }
}

Token Optimization:

  • Only require "name" field
  • Use string types for all fields (more flexible)
  • Add "maxAge": 604800000 for caching (7 days)
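
The three optimizations above can be applied programmatically. The following is a minimal Python sketch, not the skill's own implementation: `build_extract_call` is a hypothetical helper, and placing `maxAge` directly under `arguments` is an assumption, since the text does not say where the parameter belongs.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in a day


def build_extract_call(portfolio_url: str, cache_days: int = 7) -> dict:
    """Build a firecrawl_extract call that requires only "name" and caches the page."""
    company = {
        "type": "object",
        # Every field is a plain string, which tolerates messy source pages.
        "properties": {
            field: {"type": "string"}
            for field in ("name", "website", "description", "industry")
        },
        "required": ["name"],  # only the essential field is mandatory
    }
    return {
        "name": "mcp__MCP_DOCKER__firecrawl_extract",
        "arguments": {
            "urls": [portfolio_url],
            "prompt": "Extract all portfolio companies including name, website, description, industry",
            "schema": {
                "type": "object",
                "properties": {"companies": {"type": "array", "items": company}},
                "required": ["companies"],
            },
            # Assumption: maxAge sits alongside the other arguments.
            "maxAge": cache_days * MS_PER_DAY,
        },
    }


call = build_extract_call("PORTFOLIO_URL")
print(call["arguments"]["maxAge"])  # 604800000
```

The printed value confirms that 7 days in milliseconds matches the 604800000 quoted above.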

If Extract Fails - Use fallback:

{
  "name": "mcp__MCP_DOCKER__firecrawl_scrape",
  "arguments": {
    "url": "PORTFOLIO_URL",
    "formats": ["markdown"]
  }
}

Then manually parse the markdown.
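
How the markdown is parsed depends on the page. As one possible approach, the Python sketch below assumes the portfolio page renders each company as an inline markdown link, which will not hold for every accelerator. `parse_companies` and the sample text are hypothetical.

```python
import re

# Matches inline links such as [Acme](https://acme.example),
# but skips images, which start with "!".
LINK_PATTERN = re.compile(r"(?<!!)\[([^\]]+)\]\((https?://[^\s)]+)\)")


def parse_companies(markdown: str) -> dict:
    """Return records in the same shape as the extract schema, one per unique URL."""
    companies, seen = [], set()
    for name, url in LINK_PATTERN.findall(markdown):
        if url not in seen:
            seen.add(url)
            companies.append({"name": name.strip(), "website": url})
    return {"companies": companies}


sample = """
## Portfolio
- [Acme Health](https://acme.example) - Clinics for rural patients
- ![logo](https://cdn.example/logo.png)
- [Beta Energy](https://beta.example)
"""
print(parse_companies(sample))
```

Returning the same `{"companies": [...]}` shape as the extract schema lets the later phases run unchanged whichever tool produced the list.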

Phase 2: Company Research

Goal: Research each company using web search

Tool: tavily-search with token-efficient parameters

CRITICAL - Token Optimization:

{
  "name": "mcp__MCP_DOCKER__tavily-search",
  "arguments": {
    "query": "[company name] mission target market",
    "max_results": 3,
    "search_depth": "basic",
    "include_raw_content": false
  }
}

  • max_results: 3, not 10 (saves about 70% of tokens)
  • search_depth: "basic", not "advanced" (faster)
  • include_raw_content: false (critical; saves massive tokens)

Batch Processing (IMPORTANT):

  • Research 3-5 companies at a time (not 10-20)
  • Generate incremental reports to avoid token limits
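
The batching rule above amounts to slicing the company list into small chunks and reporting after each one. A minimal Python sketch, with `batches` as a hypothetical helper and placeholder company names:

```python
from typing import Iterator


def batches(companies: list[str], size: int = 5) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` companies."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(companies), size):
        yield companies[start:start + size]


names = [f"Company {i}" for i in range(1, 11)]  # placeholder names
for number, batch in enumerate(batches(names, size=5), start=1):
    # Research and score this batch, then write a partial report before continuing.
    print(number, batch)
```

With ten companies and a batch size of 5 this yields two batches; dropping the size to 3 yields four, the last holding a single company.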

Research Query Pattern:

"[Company Name] mission target market product"

Extract from Results:

  • Founder names
  • Mission/tagline
  • Target market demographic
  • Product/service description
  • Key metrics (users, funding, team size)
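
The query pattern, the token-efficient parameters, and the fields to extract can be tied together as shown here. This is a minimal Python sketch: `build_search_call` and `empty_research_record` are hypothetical helpers, and the record keys are an assumed naming of the items in the list above.

```python
def build_search_call(company_name: str) -> dict:
    """Build a token-efficient tavily-search call for one company."""
    return {
        "name": "mcp__MCP_DOCKER__tavily-search",
        "arguments": {
            "query": f"{company_name} mission target market product",
            "max_results": 3,              # not 10
            "search_depth": "basic",       # not "advanced"
            "include_raw_content": False,  # raw page content is never requested
        },
    }


def empty_research_record(company_name: str) -> dict:
    """Blank record with one slot per item to extract from the results."""
    return {
        "company": company_name,
        "founders": [],
        "mission": None,
        "target_market": None,
        "product": None,
        "metrics": {"users": None, "funding": None, "team_size": None},
    }


call = build_search_call("Example Co")  # hypothetical company
print(call["arguments"]["query"])  # Example Co mission target market product
```

Leaving unknown fields as `None` rather than guessing keeps gaps visible when the report is reviewed.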

Phase 3: Impact Scoring

Goal: Score companies using 5-tier rubric

5-Tier Impact Rubric (Customizable):

⭐⭐⭐⭐⭐ Tier 1 - Direct Impact

  • Primary target: Underserved populations
  • Core product addresses fundamental challenges
  • Impact central to business model

⭐⭐⭐⭐ Tier 2 - Strong Alignment

  • Significant focus on underserved
  • Clear pathway to reach target communities
  • Impact is key differentiator

⭐⭐⭐ Tier 3 - Moderate Alignment

  • Serves underserved as secondary market
  • Impact through indirect channels
  • Mixed revenue model

⭐⭐ Tier 4 - Weak Alignment

  • Minimal underserved focus
  • Impact is incidental or aspirational
  • Primarily serves mainstream markets

⭐ Tier 5 - Minimal Alignment

  • No focus on underserved
  • Luxury/premium positioning
  • Works against the mission

Customization Examples:

  • Climate Tech: Direct emissions reduction → Greenwashing
  • Healthcare: Medicaid focus → Luxury medicine
  • Fintech: Unbanked → High-net-worth
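
Representing the rubric as data makes it easier to keep star ratings and labels consistent across the CSV and markdown outputs. A minimal Python sketch under that assumption; the `Tier` class and `describe` helper are hypothetical, and a customized thesis would swap in its own labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    number: int  # 1 is the strongest alignment, 5 the weakest
    label: str

    @property
    def stars(self) -> str:
        # Tier 1 shows five stars, Tier 5 shows one.
        return "⭐" * (6 - self.number)


RUBRIC = {
    1: Tier(1, "Direct Impact"),
    2: Tier(2, "Strong Alignment"),
    3: Tier(3, "Moderate Alignment"),
    4: Tier(4, "Weak Alignment"),
    5: Tier(5, "Minimal Alignment"),
}


def describe(tier_number: int) -> str:
    """Render a tier the way the rubric headings do."""
    tier = RUBRIC[tier_number]
    return f"{tier.stars} Tier {tier.number} - {tier.label}"


for number in RUBRIC:
    print(describe(number))
```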

Phase 4: Report Generation

CSV Format (Excel/Sheets compatible):

Company Name,Website,Description,Industry,Impact Tier,Impact Reasoning,Founder,Funding
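
Impact reasoning often contains commas, so the CSV is safer written with a CSV library than by joining strings. A minimal sketch using Python's standard `csv` module; `to_csv` and the sample row are hypothetical.

```python
import csv
import io

COLUMNS = ["Company Name", "Website", "Description", "Industry",
           "Impact Tier", "Impact Reasoning", "Founder", "Funding"]


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


report = to_csv([{
    "Company Name": "Example Co",  # hypothetical company
    "Website": "https://example.com",
    "Impact Tier": 2,
    "Impact Reasoning": "Significant focus on underserved users, per its mission page",
}])
print(report)
```

The writer quotes the reasoning cell automatically because it contains a comma, which keeps the columns aligned when the file is opened in Excel or Sheets.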

Markdown Format:

# [Accelerator] Portfolio Research Report

## Executive Summary
- Total companies researched: X
- Impact distribution: Tier 1 (X), Tier 2 (X), etc.

## High-Impact Companies (Tier 1-2)

### Company Name
- **Website**: [URL]
- **Impact Tier**: ⭐⭐⭐⭐⭐
- **Mission**: [Brief mission]
- **Target Market**: [Demographics]
- **Why High Impact**: [Reasoning]
- **Metrics**: [Users, funding, etc.]

[Repeat for each high-impact company]

## Moderate Impact Companies (Tier 3)
[Summarized list]

## Lower Priority Companies (Tier 4-5)
[Brief list]
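
The Executive Summary counts in the template can be derived from the assigned tiers rather than tallied by hand. A minimal Python sketch, with `executive_summary` as a hypothetical helper and made-up tier values:

```python
from collections import Counter


def executive_summary(tiers: list[int]) -> str:
    """Build the Executive Summary block from one tier number per company."""
    counts = Counter(tiers)
    distribution = ", ".join(f"Tier {t} ({counts[t]})" for t in range(1, 6))
    return (
        "## Executive Summary\n"
        f"- Total companies researched: {len(tiers)}\n"
        f"- Impact distribution: {distribution}"
    )


print(executive_summary([1, 2, 2, 3, 5]))
```

Tiers with no companies are still listed with a count of 0, so the distribution line always covers all five tiers.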

Token Management Best Practices

Critical for Avoiding Limits:

  1. Batch Processing: Research 3-5 companies at a time
  2. Tavily Parameters:
    • max_results: 3 (not 10)
    • search_depth: "basic" (not "advanced")
    • include_raw_content: false (saves massive tokens)
  3. Incremental Reports: Generate partial results, then continue
  4. Schema Efficiency: Only require essential fields
  5. Caching: Use maxAge parameter for portfolio pages

Common Scenarios

Scenario 1: YC Research

User: "Research 10 YC W25 climate tech companies"

Steps:
1. Extract YC W25 companies (firecrawl_extract + YC schema)
2. Filter to climate tech vertical (JSON filtering)
3. Research FIRST 5 companies (tavily-search, max_results=3)
4. Score and generate partial report
5. Research NEXT 5 companies (new batch)
6. Append to report
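
Step 2 (JSON filtering) can be sketched as a keyword match over the extracted records. This is an illustrative Python sketch only: the keyword list, the `filter_by_vertical` helper, and the sample companies are all hypothetical, and a real run may need a broader vocabulary.

```python
CLIMATE_KEYWORDS = ("climate", "carbon", "energy", "emissions")  # illustrative list


def filter_by_vertical(extracted: dict, keywords=CLIMATE_KEYWORDS, limit: int = 10) -> list[dict]:
    """Keep companies whose industry or description mentions any keyword."""
    matches = []
    for company in extracted.get("companies", []):
        # Optional fields may be missing or null, so fall back to empty strings.
        text = f"{company.get('industry') or ''} {company.get('description') or ''}".lower()
        if any(keyword in text for keyword in keywords):
            matches.append(company)
    return matches[:limit]


extracted = {"companies": [
    {"name": "SolarGrid", "industry": "Energy", "description": "Rooftop solar financing"},
    {"name": "PayFlow", "industry": "Fintech", "description": "Payroll tools"},
    {"name": "CarbonLedger", "industry": None, "description": "Carbon accounting software"},
]}
print([c["name"] for c in filter_by_vertical(extracted)])  # ['SolarGrid', 'CarbonLedger']
```

Filtering before research means search calls are spent only on companies in the requested vertical.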

Scenario 2: Fast Forward Impact

User: "Score Fast Forward portfolio for low-income US impact"

Steps:
1. Extract Fast Forward companies (firecrawl_extract)
2. Research in batches of 3 (tavily-search)
3. Apply low-income US impact rubric
4. Generate CSV + markdown report

Scenario 3: Healthcare Medicaid

User: "Find healthcare startups serving Medicaid populations"

Steps:
1. Extract with healthcare vertical schema (see SCHEMA-TEMPLATES.md)
2. Research with query: "[company] Medicaid low-income healthcare"
3. Filter to Medicaid focus
4. Score using healthcare impact rubric

Troubleshooting

Token Limit Hit:

  • Reduce batch size to 3 companies
  • Use search_depth: "basic"
  • Set include_raw_content: false
  • Generate incremental reports

Extract Returns Empty:

  • Check SCHEMA-TEMPLATES.md for validated schemas
  • Improve prompt specificity
  • Try fallback to firecrawl_scrape

Search Returns Poor Results:

  • Refine query: "[company name] mission target market"
  • Reduce max_results to 3
  • Try alternative search: "[company name] about"

Files Reference

  • SCHEMA-TEMPLATES.md: Production-tested extraction schemas
  • README.md: Setup instructions and MCP configuration

Output Deliverables

This skill generates ONLY research outputs:

  • ✅ CSV file with all company data
  • ✅ Markdown report with analysis

This skill does NOT:

  • ❌ Create Linear/project tracking issues
  • ❌ Integrate with CRM systems
  • ❌ Send notifications

Use separate skills for pipeline management if needed.


Version: 2.1 (Token-Optimized) | Testing: Validated on YC, Fast Forward

README

Accelerator Research Agent

Production-ready Claude Desktop skill for researching accelerator portfolio companies using validated MCP tools.

🎯 What This Skill Does

Systematically research and analyze accelerator companies with AI-powered structured extraction:

  • Validated Tech Stack: Firecrawl Extract (100% success rate) + Tavily Search (90%+ enrichment)
  • Structured Extraction: JSON schemas tested on Y Combinator (20/20) and Fast Forward (5/5)
  • Impact Methodology: 5-tier rubric for evaluating mission alignment (customizable for any thesis)
  • Professional Outputs: CSV exports + markdown reports with documented scoring
  • Research-Focused: Pure research workflow - does NOT create tracking issues

🚀 Quick Start

1. Install MCP Servers

This skill requires 2 MCP servers configured in Claude Desktop:

Required (Both validated in live testing):

  • Firecrawl MCP - Structured extraction (firecrawl.dev)

    • Tested: 100% success rate on Y Combinator, Fast Forward
    • Free tier: 500 credits/month
  • Tavily MCP - AI-optimized search (tavily.com)

    • Tested: 90%+ success on company enrichment
    • Free tier: 100 RPM (6,000/hour)

NOT Recommended:

  • ❌ Coresignal - Too expensive, doesn't cover new startups

📖 Setup instructions: Add both MCP servers to ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS config path).

2. Upload Skill to Claude Desktop

  1. Download this repository as ZIP
  2. Extract files
  3. Copy SKILL.md or reference it in your Claude Desktop workflow

3. Try It Out

Test Setup First (copy from SKILL.md):

{
  "name": "mcp__MCP_DOCKER__firecrawl_extract",
  "arguments": {
    "urls": ["https://www.ffwd.org/directory?portfolio=true"],
    "prompt": "Extract first 3 companies",
    "schema": { /* see SKILL.md for schema */ }
  }
}

Then Request Research:

"Research 10 companies from YC W25 focused on climate tech"

Claude will:

  1. Extract YC W25 companies using firecrawl_extract with validated schema
  2. Filter to 10 climate tech companies from JSON output
  3. Research each company using tavily-search (in batches of 3-5, as SKILL.md recommends)
  4. Score using climate tech impact rubric
  5. Generate CSV + markdown report

📊 What You Get

Output Files:

  • CSV: All company data (Excel/Sheets compatible)
  • Markdown: Detailed research report with analysis

No Tracking Issues: This skill does NOT create Linear/project issues. Use a separate Linear skill for pipeline management.

💡 Example Use Cases

Impact Investor

"Research Fast Forward portfolio and score for low-income US impact"

Climate Tech VC

"Find climate tech companies from Techstars 2024 and evaluate carbon impact"

Healthcare Focus

"Research YC healthcare companies serving Medicaid populations"

🔧 Features

Validated MCP Tools

  • Firecrawl Extract (PRIMARY): Structured JSON extraction with 100% success rate
    • Tested on Y Combinator (20/20 companies)
    • Tested on Fast Forward (5/5 companies)
    • Same cost as scrape, better output
  • Tavily Search: AI-optimized company research (90%+ success rate)
  • Fallback Tools: firecrawl_scrape, tavily-extract if needed

Production-Ready Schemas

  • Ready-to-use templates in SCHEMA-TEMPLATES.md
  • Validated patterns for YC, Fast Forward, Techstars
  • Vertical-specific schemas (healthcare, climate, fintech)
  • Comprehensive error handling

Impact Methodology

  • 5-tier rubric (Direct → Minimal alignment)
  • Systematic evaluation framework
  • Customizable for different theses
  • Documented reasoning per company

Output Formats

  • CSV with all research data (Excel/Sheets compatible)
  • Markdown detailed reports
  • Impact distribution analysis

📝 Workflow (3 Phases)

Phase 1: Portfolio Scraping

PRIMARY: Use firecrawl_extract with JSON schema (100% success rate)

  • Returns structured JSON directly (no parsing needed)
  • Validated schemas in SCHEMA-TEMPLATES.md
  • FALLBACK: Use firecrawl_scrape if extract fails

Phase 2: Company Research

Deep research using tavily-search (90%+ success rate)

  • Founder information
  • Mission and target market
  • Key metrics (users, funding, employees)
  • Batch processing (3-5 companies at a time, per SKILL.md)

Phase 3: Impact Scoring & Reports

  • Apply 5-tier rubric with documented reasoning
  • Generate CSV (Excel/Sheets compatible)
  • Generate markdown report with analysis

🎨 Customization

Adapt Impact Rubric

Default: Low-income US impact

Easily adapt for:

  • Climate Tech: Emissions reduction → Greenwashing
  • Healthcare: Medicaid focus → Luxury medicine
  • Financial Inclusion: Unbanked → High-net-worth

See references/impact-scoring.md for guide.

🔒 Prerequisites

Required

  • Claude Desktop installed
  • Docker Desktop running
  • API keys for Tavily + Firecrawl
  • (Optional, not recommended) Coresignal API key

💰 API Costs

Free Tier Research (0-500 companies/month)

  • Firecrawl: 500 credits free
  • Tavily: 100 RPM free (6,000/hour)
  • Cost: $0/month
  • Sufficient for: 10-20 accelerator portfolios

Paid Tier Research (500+ companies/month)

  • Firecrawl Starter: $30/month (5,000 credits)
  • Tavily Pro: $50/month (unlimited within higher rate limits)
  • Cost: $80/month total
  • Sufficient for: 100+ accelerator portfolios

Performance Benchmarks (from live testing):

  • 10 companies: 2-3 minutes, $0 (free tier)
  • 50 companies: 10-15 minutes, $5-10 (paid tier)
  • 100+ companies: 30-45 minutes, $10-20 (paid tier)

📚 Documentation

  • SKILL.md - Complete skill guide (1,145 lines)

    • Phase 1A: firecrawl_extract with validated schemas ⭐ PRIMARY
    • Phase 2: tavily-search research patterns
    • Phase 3: Impact scoring & report generation
    • Tool-specific best practices
    • Comprehensive troubleshooting (8 common issues)
  • SCHEMA-TEMPLATES.md - Ready-to-use extraction schemas

    • Y Combinator schema (tested 20/20)
    • Fast Forward schema (tested 5/5)
    • Healthcare, climate, fintech vertical schemas
    • Schema design guidelines (5 golden rules)
    • Common errors & fixes

🛠️ Troubleshooting

Extract returns empty array {"companies": []}

  1. Improve prompt: "Extract all portfolio companies including name, website, description..."
  2. Simplify schema: Only require "name" field
  3. Add waitFor: 5000 (milliseconds) for JavaScript-rendered pages
  4. Fallback to firecrawl_scrape
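
Before working through the steps above, it helps to detect the empty case reliably, including results where records exist but none has a name. A minimal Python sketch, assuming the result has already been parsed into a dictionary; `is_empty_extract` is a hypothetical helper.

```python
from typing import Optional


def is_empty_extract(result: Optional[dict]) -> bool:
    """True when the extract produced no usable company records."""
    if not result:
        return True
    companies = result.get("companies") or []
    # A record without a name is unusable, because "name" is the only required field.
    return not any(company.get("name") for company in companies)


print(is_empty_extract({"companies": []}))                  # True
print(is_empty_extract({"companies": [{"name": "Acme"}]}))  # False
```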

"MCP server not found"

  • Check ~/Library/Application Support/Claude/claude_desktop_config.json
  • Restart Claude Desktop completely
  • Verify Docker Desktop is running
  • Test with simple query: mcp__MCP_DOCKER__firecrawl_extract on example.com
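
The first check above can be scripted. The Python sketch below lists the servers registered under the `mcpServers` key of the config file. `configured_servers` is a hypothetical helper, and the expectation that a server named `MCP_DOCKER` appears is an assumption inferred from the `mcp__MCP_DOCKER__` tool-name prefix used in this guide.

```python
import json
from pathlib import Path

# Default macOS location named in this guide.
CONFIG_PATH = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"


def configured_servers(config_path: Path = CONFIG_PATH) -> list[str]:
    """Return the server names under "mcpServers", or [] if the file is missing."""
    if not config_path.exists():
        return []
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return sorted(config.get("mcpServers", {}))
```

Calling `configured_servers()` with no arguments reads the default path. An empty list suggests the file is missing or has no servers registered, in which case restarting Claude Desktop will not help until the config is fixed.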

Full troubleshooting guide: See SKILL.md Section 10 (8 common issues with solutions)

📁 File Structure

sourcing-agent-demo-skill/
├── SKILL.md                    # Main skill guide (1,145 lines)
├── SCHEMA-TEMPLATES.md         # Production-tested extraction schemas
└── README.md                   # This file - project overview

Key Sections in SKILL.md:

  • Phase 1A: firecrawl_extract (PRIMARY) - Lines 53-170
  • Phase 2: tavily-search research - Lines 232-284
  • Phase 3: Impact scoring - Lines 286-417
  • Example scenarios with extract - Lines 422-571
  • Troubleshooting (8 issues) - Lines 761-1001
  • Validation checklist - Lines 1034-1089

🔗 Next Steps

After running research:

  1. Review CSV/Markdown reports
  2. Import to your preferred system:
    • Airtable/Google Sheets for database view
    • Use separate Linear skill for tracking (if needed)
    • CRM integration for deal pipeline
  3. Customize impact rubric for your specific thesis
  4. Scale to multiple accelerators using batch processing

🏆 Why This Skill is Production-Ready

Validated by Live Testing:

  • firecrawl_extract: 100% success (Y Combinator 20/20, Fast Forward 5/5)
  • tavily-search: 90%+ success on enrichment queries
  • ✅ Free tier sufficient for 10-20 portfolios/month
  • ✅ Comprehensive error handling for common issues
  • ✅ Tested on real accelerators (not just examples)

Based on Obsidian Testing Notes:

  • "Firecrawl MCP Comprehensive Capability Assessment.md"
  • "Tavily MCP Comprehensive Capability Assessment.md"
  • "MCP Pairing Analysis - Accelerator Scout Optimal Stack.md"

📝 Version History

v2.0 - Production-Ready (Current)

  • ✅ Rewrote Phase 1 to use firecrawl_extract as primary (100% success rate)
  • ✅ Added SCHEMA-TEMPLATES.md with validated patterns
  • ✅ Comprehensive troubleshooting for extract function (8 common issues)
  • ✅ Decision trees for tool selection
  • ✅ Removed Coresignal (too expensive, doesn't cover new startups)

v1.0 - Initial (Deprecated)

  • Used firecrawl_scrape (required manual parsing)
  • No validated schemas
  • Generic error handling


Status: ✅ Production-Ready | Testing: Validated on YC, Fast Forward | Version: 2.0