accelerator-research-agent

RS42-AI/scout-ai-lite

Research accelerator portfolio companies using Firecrawl and Tavily MCPs. Generates structured CSV and markdown reports with systematic impact scoring. Optimized for token efficiency.

0 stars

0 forks

5 views

View on GitHub Add to Favorites

SKILL.md

name: accelerator-research-agent description: Research accelerator portfolio companies using Firecrawl and Tavily MCPs. Generates structured CSV and markdown reports with systematic impact scoring. Optimized for token efficiency.

Accelerator Research Agent

A token-optimized Claude Desktop skill for researching accelerator portfolio companies with systematic impact analysis.

When to Use This Skill

Activate when user asks to:

"Research companies from [accelerator name]"
"Analyze [accelerator] portfolio"
"Score companies for impact" or "evaluate mission alignment"
Mentions: YC, Techstars, Fast Forward, 500 Global, a16z

Prerequisites

Required MCP Servers (both tested and validated):

Firecrawl MCP - Structured extraction
- Free tier: 500 credits/month
- Use firecrawl_extract for JSON extraction
Tavily MCP - AI-optimized search
- Free tier: 100 RPM (6,000/hour)
- Use tavily-search for company research

Core Workflow (3 Phases)

Phase 1: Portfolio Extraction

Goal: Get company list from accelerator portfolio page

Tool: firecrawl_extract (PRIMARY - 100% success rate)

Schema Pattern: See SCHEMA-TEMPLATES.md for tested schemas (YC, Fast Forward, Healthcare, Climate, Fintech)

Quick Schema (customize based on accelerator):

{
  "name": "mcp__MCP_DOCKER__firecrawl_extract",
  "arguments": {
    "urls": ["PORTFOLIO_URL"],
    "prompt": "Extract all portfolio companies including name, website, description, industry",
    "schema": {
      "type": "object",
      "properties": {
        "companies": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "website": {"type": "string"},
              "description": {"type": "string"},
              "industry": {"type": "string"}
            },
            "required": ["name"]
          }
        }
      },
      "required": ["companies"]
    }
  }
}

Token Optimization:

Only require "name" field
Use string types for all fields (more flexible)
Add "maxAge": 604800000 for caching (7 days)

If Extract Fails - Use fallback:

{
  "name": "mcp__MCP_DOCKER__firecrawl_scrape",
  "arguments": {
    "url": "PORTFOLIO_URL",
    "formats": ["markdown"]
  }
}

Then manually parse the markdown.

Phase 2: Company Research

Goal: Research each company using web search

Tool: tavily-search with token-efficient parameters

CRITICAL - Token Optimization:

{
  "name": "mcp__MCP_DOCKER__tavily-search",
  "arguments": {
    "query": "[company name] mission target market",
    "max_results": 3,                    // ✅ NOT 10! Saves 70% tokens
    "search_depth": "basic",             // ✅ NOT "advanced"! Faster
    "include_raw_content": false         // ✅ Critical - saves massive tokens
  }
}

Batch Processing (IMPORTANT):

Research 3-5 companies at a time (not 10-20)
Generate incremental reports to avoid token limits

Research Query Pattern:

"[Company Name] mission target market product"

Extract from Results:

Founder names
Mission/tagline
Target market demographic
Product/service description
Key metrics (users, funding, team size)

Phase 3: Impact Scoring

Goal: Score companies using 5-tier rubric

5-Tier Impact Rubric (Customizable):

⭐⭐⭐⭐⭐ Tier 1 - Direct Impact

Primary target: Underserved populations
Core product addresses fundamental challenges
Impact central to business model

⭐⭐⭐⭐ Tier 2 - Strong Alignment

Significant focus on underserved
Clear pathway to reach target communities
Impact is key differentiator

⭐⭐⭐ Tier 3 - Moderate Alignment

Serves underserved as secondary market
Impact through indirect channels
Mixed revenue model

⭐⭐ Tier 4 - Weak Alignment

Minimal underserved focus
Impact is incidental or aspirational
Primarily serves mainstream markets

⭐ Tier 5 - Minimal Alignment

No focus on underserved
Luxury/premium positioning
Opposite of mission

Customization Examples:

Climate Tech: Direct emissions reduction → Greenwashing
Healthcare: Medicaid focus → Luxury medicine
Fintech: Unbanked → High-net-worth

Phase 4: Report Generation

CSV Format (Excel/Sheets compatible):

Company Name,Website,Description,Industry,Impact Tier,Impact Reasoning,Founder,Funding

Markdown Format:

# [Accelerator] Portfolio Research Report

## Executive Summary
- Total companies researched: X
- Impact distribution: Tier 1 (X), Tier 2 (X), etc.

## High-Impact Companies (Tier 1-2)

### Company Name
- **Website**: [URL]
- **Impact Tier**: ⭐⭐⭐⭐⭐
- **Mission**: [Brief mission]
- **Target Market**: [Demographics]
- **Why High Impact**: [Reasoning]
- **Metrics**: [Users, funding, etc.]

[Repeat for each high-impact company]

## Moderate Impact Companies (Tier 3)
[Summarized list]

## Lower Priority Companies (Tier 4-5)
[Brief list]

Token Management Best Practices

Critical for Avoiding Limits:

Batch Processing: Research 3-5 companies at a time
Tavily Parameters:
- max_results: 3 (not 10)
- search_depth: "basic" (not "advanced")
- include_raw_content: false (saves massive tokens)
Incremental Reports: Generate partial results, then continue
Schema Efficiency: Only require essential fields
Caching: Use maxAge parameter for portfolio pages

Common Scenarios

Scenario 1: YC Research

User: "Research 10 YC W25 climate tech companies"

Steps:
1. Extract YC W25 companies (firecrawl_extract + YC schema)
2. Filter to climate tech vertical (JSON filtering)
3. Research FIRST 5 companies (tavily-search, max_results=3)
4. Score and generate partial report
5. Research NEXT 5 companies (new batch)
6. Append to report

Scenario 2: Fast Forward Impact

User: "Score Fast Forward portfolio for low-income US impact"

Steps:
1. Extract Fast Forward companies (firecrawl_extract)
2. Research in batches of 3 (tavily-search)
3. Apply low-income US impact rubric
4. Generate CSV + markdown report

Scenario 3: Healthcare Medicaid

User: "Find healthcare startups serving Medicaid populations"

Steps:
1. Extract with healthcare vertical schema (see SCHEMA-TEMPLATES.md)
2. Research with query: "[company] Medicaid low-income healthcare"
3. Filter to Medicaid focus
4. Score using healthcare impact rubric

Troubleshooting

Token Limit Hit:

Reduce batch size to 3 companies
Use search_depth: "basic"
Set include_raw_content: false
Generate incremental reports

Extract Returns Empty:

Check SCHEMA-TEMPLATES.md for validated schemas
Improve prompt specificity
Try fallback to firecrawl_scrape

Search Returns Poor Results:

Refine query: "[company name] mission target market"
Reduce max_results to 3
Try alternative search: "[company name] about"

Files Reference

SCHEMA-TEMPLATES.md: Production-tested extraction schemas
README.md: Setup instructions and MCP configuration

Output Deliverables

This skill generates ONLY research outputs:

✅ CSV file with all company data
✅ Markdown report with analysis

This skill does NOT:

❌ Create Linear/project tracking issues
❌ Integrate with CRM systems
❌ Send notifications

Use separate skills for pipeline management if needed.

Version: 2.1 (Token-Optimized) | Testing: Validated on YC, Fast Forward

README

Accelerator Research Agent

Production-ready Claude Desktop skill for researching accelerator portfolio companies using validated MCP tools.

🎯 What This Skill Does

Systematically research and analyze accelerator companies with AI-powered structured extraction:

Validated Tech Stack: Firecrawl Extract (100% success rate) + Tavily Search (90%+ enrichment)
Structured Extraction: JSON schemas tested on Y Combinator (20/20) and Fast Forward (5/5)
Impact Methodology: 5-tier rubric for evaluating mission alignment (customizable for any thesis)
Professional Outputs: CSV exports + markdown reports with documented scoring
Research-Focused: Pure research workflow - does NOT create tracking issues

🚀 Quick Start

1. Install MCP Servers

This skill requires 2 MCP servers configured in Claude Desktop:

Required (Both validated in live testing):

Firecrawl MCP - Structured extraction (firecrawl.dev)
- Tested: 100% success rate on Y Combinator, Fast Forward
- Free tier: 500 credits/month
Tavily MCP - AI-optimized search (tavily.com)
- Tested: 90%+ success on company enrichment
- Free tier: 100 RPM (6,000/hour)

NOT Recommended:

❌ Coresignal - Too expensive, doesn't cover new startups

📖 Setup instructions: Add to ~/Library/Application Support/Claude/claude_desktop_config.json

2. Upload Skill to Claude Desktop

Download this repository as ZIP
Extract files
Copy SKILL.md or reference it in your Claude Desktop workflow

3. Try It Out

Test Setup First (copy from SKILL.md):

{
  "name": "mcp__MCP_DOCKER__firecrawl_extract",
  "arguments": {
    "urls": ["https://www.ffwd.org/directory?portfolio=true"],
    "prompt": "Extract first 3 companies",
    "schema": { /* see SKILL.md for schema */ }
  }
}

Then Request Research:

"Research 10 companies from YC W25 focused on climate tech"

Claude will:

Extract YC W25 companies using firecrawl_extract with validated schema
Filter to 10 climate tech companies from JSON output
Research each company using tavily-search (batch of 10)
Score using climate tech impact rubric
Generate CSV + markdown report

📊 What You Get

Output Files:

CSV: All company data (Excel/Sheets compatible)
Markdown: Detailed research report with analysis

No Tracking Issues: This skill does NOT create Linear/project issues. Use a separate Linear skill for pipeline management.

💡 Example Use Cases

Impact Investor

"Research Fast Forward portfolio and score for low-income US impact"

Climate Tech VC

"Find climate tech companies from Techstars 2024 and evaluate carbon impact"

Healthcare Focus

"Research YC healthcare companies serving Medicaid populations"

🔧 Features

Validated MCP Tools

Firecrawl Extract (PRIMARY): Structured JSON extraction with 100% success rate
- Tested on Y Combinator (20/20 companies)
- Tested on Fast Forward (5/5 companies)
- Same cost as scrape, better output
Tavily Search: AI-optimized company research (90%+ success rate)
Fallback Tools: firecrawl_scrape, tavily-extract if needed

Production-Ready Schemas

Ready-to-use templates in SCHEMA-TEMPLATES.md
Validated patterns for YC, Fast Forward, Techstars
Vertical-specific schemas (healthcare, climate, fintech)
Comprehensive error handling

Impact Methodology

5-tier rubric (Direct → Minimal alignment)
Systematic evaluation framework
Customizable for different theses
Documented reasoning per company

Output Formats

CSV with all research data (Excel/Sheets compatible)
Markdown detailed reports
Impact distribution analysis

📝 Workflow (3 Phases)

Phase 1: Portfolio Scraping

PRIMARY: Use firecrawl_extract with JSON schema (100% success rate)

Returns structured JSON directly (no parsing needed)
Validated schemas in SCHEMA-TEMPLATES.md
FALLBACK: Use firecrawl_scrape if extract fails

Phase 2: Company Research

Deep research using tavily-search (90%+ success rate)

Founder information
Mission and target market
Key metrics (users, funding, employees)
Batch processing (10 companies at a time)

Phase 3: Impact Scoring & Reports

Apply 5-tier rubric with documented reasoning
Generate CSV (Excel/Sheets compatible)
Generate markdown report with analysis

🎨 Customization

Adapt Impact Rubric

Default: Low-income US impact

Easily adapt for:

Climate Tech: Emissions reduction → Greenwashing
Healthcare: Medicaid focus → Luxury medicine
Financial Inclusion: Unbanked → High-net-worth

See references/impact-scoring.md for guide.

🔒 Prerequisites

Required

Claude Desktop installed
Docker Desktop running
API keys for Tavily + Firecrawl
(Optional) Coresignal API key

💰 API Costs

Free Tier Research (0-500 companies/month)

Firecrawl: 500 credits free
Tavily: 100 RPM free (6,000/hour)
Cost: $0/month
Sufficient for: 10-20 accelerator portfolios

Paid Tier Research (500+ companies/month)

Firecrawl Starter: $30/month (5,000 credits)
Tavily Pro: $50/month (unlimited within higher rate limits)
Cost: $80/month total
Sufficient for: 100+ accelerator portfolios

Performance Benchmarks (from live testing):

10 companies: 2-3 minutes, $0 (free tier)
50 companies: 10-15 minutes, $5-10 (paid tier)
100+ companies: 30-45 minutes, $10-20 (paid tier)

📚 Documentation

SKILL.md - Complete skill guide (1,145 lines)
- Phase 1A: firecrawl_extract with validated schemas ⭐ PRIMARY
- Phase 2: tavily-search research patterns
- Phase 3: Impact scoring & report generation
- Tool-specific best practices
- Comprehensive troubleshooting (8 common issues)
SCHEMA-TEMPLATES.md - Ready-to-use extraction schemas
- Y Combinator schema (tested 20/20)
- Fast Forward schema (tested 5/5)
- Healthcare, climate, fintech vertical schemas
- Schema design guidelines (5 golden rules)
- Common errors & fixes

🛠️ Troubleshooting

Extract returns empty array {"companies": []}

Improve prompt: "Extract all portfolio companies including name, website, description..."
Simplify schema: Only require "name" field
Add waitFor: 5000 for JavaScript pages
Fallback to firecrawl_scrape

"MCP server not found"

Check ~/Library/Application Support/Claude/claude_desktop_config.json
Restart Claude Desktop completely
Verify Docker Desktop is running
Test with simple query: mcp__MCP_DOCKER__firecrawl_extract on example.com

Full troubleshooting guide: See SKILL.md Section 10 (8 common issues with solutions)

📁 File Structure

sourcing-agent-demo-skill/
├── SKILL.md                    # Main skill guide (1,145 lines)
├── SCHEMA-TEMPLATES.md         # Production-tested extraction schemas
└── README.md                   # This file - project overview

Key Sections in SKILL.md:

Phase 1A: firecrawl_extract (PRIMARY) - Lines 53-170
Phase 2: tavily-search research - Lines 232-284
Phase 3: Impact scoring - Lines 286-417
Example scenarios with extract - Lines 422-571
Troubleshooting (8 issues) - Lines 761-1001
Validation checklist - Lines 1034-1089

🔗 Next Steps

After running research:

Review CSV/Markdown reports
Import to your preferred system:
- Airtable/Google Sheets for database view
- Use separate Linear skill for tracking (if needed)
- CRM integration for deal pipeline
Customize impact rubric for your specific thesis
Scale to multiple accelerators using batch processing

🏆 Why This Skill is Production-Ready

Validated by Live Testing:

✅ firecrawl_extract: 100% success (Y Combinator 20/20, Fast Forward 5/5)
✅ tavily-search: 90%+ success on enrichment queries
✅ Free tier sufficient for 10-20 portfolios/month
✅ Comprehensive error handling for common issues
✅ Tested on real accelerators (not just examples)

Based on Obsidian Testing Notes:

"Firecrawl MCP Comprehensive Capability Assessment.md"
"Tavily MCP Comprehensive Capability Assessment.md"
"MCP Pairing Analysis - Accelerator Scout Optimal Stack.md"

📝 Version History

v2.0 - Production-Ready (Current)

✅ Rewrote Phase 1 to use firecrawl_extract as primary (100% success rate)
✅ Added SCHEMA-TEMPLATES.md with validated patterns
✅ Comprehensive troubleshooting for extract function (8 common issues)
✅ Decision trees for tool selection
✅ Removed Coresignal (too expensive, doesn't cover new startups)

v1.0 - Initial (Deprecated)

Used firecrawl_scrape (required manual parsing)
No validated schemas
Generic error handling

🔗 Resources

Status: ✅ Production-Ready | Testing: Validated on YC, Fast Forward | Version: 2.0

Installation

Option 1: Use slash command in Claude Code

/install-skill https://github.com/RS42-AI/scout-ai-lite

Option 2: Clone to skills directory

# Global (all projects)

git clone https://github.com/RS42-AI/scout-ai-lite ~/.claude/skills/scout-ai-lite

# Project-specific

git clone https://github.com/RS42-AI/scout-ai-lite .claude/skills/scout-ai-lite

Add MCP server to .cursor/mcp.json:

{
  "mcpServers": {
    "skillz": {
      "command": "npx",
      "args": ["-y", "skillz-mcp", "https://github.com/RS42-AI/scout-ai-lite"]
    }
  }
}

Restart Cursor after adding the configuration.

Option 1: Use Gemini CLI command

gemini extensions install https://github.com/RS42-AI/scout-ai-lite

Option 2: Clone to extensions directory

git clone https://github.com/RS42-AI/scout-ai-lite ~/.gemini/extensions/scout-ai-lite