accelerator-research-agent
RS42-AI/scout-ai-lite
Research accelerator portfolio companies using Firecrawl and Tavily MCPs. Generates structured CSV and markdown reports with systematic impact scoring. Optimized for token efficiency.
SKILL.md
name: accelerator-research-agent
description: Research accelerator portfolio companies using Firecrawl and Tavily MCPs. Generates structured CSV and markdown reports with systematic impact scoring. Optimized for token efficiency.
Accelerator Research Agent
A token-optimized Claude Desktop skill for researching accelerator portfolio companies with systematic impact analysis.
When to Use This Skill
Activate when user asks to:
- "Research companies from [accelerator name]"
- "Analyze [accelerator] portfolio"
- "Score companies for impact" or "evaluate mission alignment"
- Mentions: YC, Techstars, Fast Forward, 500 Global, a16z
Prerequisites
Required MCP Servers (both tested and validated):
- Firecrawl MCP - Structured extraction
  - Free tier: 500 credits/month
  - Use firecrawl_extract for JSON extraction
- Tavily MCP - AI-optimized search
  - Free tier: 100 RPM (6,000/hour)
  - Use tavily-search for company research
Core Workflow (4 Phases)
Phase 1: Portfolio Extraction
Goal: Get company list from accelerator portfolio page
Tool: firecrawl_extract (PRIMARY - 100% success rate)
Schema Pattern: See SCHEMA-TEMPLATES.md for tested schemas (YC, Fast Forward, Healthcare, Climate, Fintech)
Quick Schema (customize based on accelerator):
{
"name": "mcp__MCP_DOCKER__firecrawl_extract",
"arguments": {
"urls": ["PORTFOLIO_URL"],
"prompt": "Extract all portfolio companies including name, website, description, industry",
"schema": {
"type": "object",
"properties": {
"companies": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"website": {"type": "string"},
"description": {"type": "string"},
"industry": {"type": "string"}
},
"required": ["name"]
}
}
},
"required": ["companies"]
}
}
}
Token Optimization:
- Only require the "name" field
- Use string types for all fields (more flexible)
- Add "maxAge": 604800000 for caching (7 days)
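The quick schema and the caching value above can also be built programmatically. The following is a minimal Python sketch, not part of the skill itself: `build_extract_arguments` is a hypothetical helper name, and placing `maxAge` next to `urls` in the arguments object is an assumption that should be checked against the tool schema your Firecrawl MCP server exposes.

```python
import json

# 7 days expressed in milliseconds: 7 * 24 * 60 * 60 * 1000 = 604800000
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000


def build_extract_arguments(portfolio_url: str, max_age_ms: int = SEVEN_DAYS_MS) -> dict:
    """Build the arguments object for a firecrawl_extract call (hypothetical helper)."""
    company = {
        "type": "object",
        # Every field is a string; only "name" is required.
        "properties": {
            field: {"type": "string"}
            for field in ("name", "website", "description", "industry")
        },
        "required": ["name"],
    }
    return {
        "urls": [portfolio_url],
        "prompt": "Extract all portfolio companies including name, website, description, industry",
        "schema": {
            "type": "object",
            "properties": {"companies": {"type": "array", "items": company}},
            "required": ["companies"],
        },
        # Assumption: maxAge sits beside "urls"; verify against your server's tool schema.
        "maxAge": max_age_ms,
    }


if __name__ == "__main__":
    print(json.dumps(build_extract_arguments("PORTFOLIO_URL"), indent=2))
```

Generating the payload from one function keeps the "only name is required" rule in a single place when the schema is customized per accelerator.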
If Extract Fails - Use fallback:
{
"name": "mcp__MCP_DOCKER__firecrawl_scrape",
"arguments": {
"url": "PORTFOLIO_URL",
"formats": ["markdown"]
}
}
Then manually parse the markdown.
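One way to do the manual parsing is sketched below in Python. It assumes the scraped portfolio page lists each company as an inline markdown link, which will not hold for every accelerator site; `parse_company_links` is a hypothetical helper, not something the skill ships.

```python
import re

# Matches inline markdown links such as [Acme](https://acme.example).
# The negative lookbehind skips image syntax like ![logo](https://...).
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\((https?://[^)\s]+)\)")


def parse_company_links(markdown: str) -> list[dict]:
    """Return one {"name", "website"} dict per unique link, in page order."""
    companies = []
    seen = set()
    for name, url in LINK_RE.findall(markdown):
        name = name.strip()
        if not name or url in seen:
            continue  # skip empty labels and repeated links
        seen.add(url)
        companies.append({"name": name, "website": url})
    return companies
```

Navigation and footer links will also match, so the result usually needs a quick review before moving on to Phase 2.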
Phase 2: Company Research
Goal: Research each company using web search
Tool: tavily-search with token-efficient parameters
CRITICAL - Token Optimization:
{
"name": "mcp__MCP_DOCKER__tavily-search",
"arguments": {
"query": "[company name] mission target market",
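"max_results": 3,
"search_depth": "basic",
"include_raw_content": false
}
}
Why these values: max_results of 3 instead of 10 saves about 70% of tokens; search_depth "basic" is faster than "advanced"; include_raw_content false keeps full page text out of the response, which is the largest saving.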
Batch Processing (IMPORTANT):
- Research 3-5 companies at a time (not 10-20)
- Generate incremental reports to avoid token limits
Research Query Pattern:
"[Company Name] mission target market product"
Extract from Results:
- Founder names
- Mission/tagline
- Target market demographic
- Product/service description
- Key metrics (users, funding, team size)
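The batching rule and query pattern above can be sketched in a few lines of Python. The helper names `batches` and `search_arguments` are illustrative assumptions; only the parameter names and values come from this guide.

```python
from typing import Iterator


def batches(items: list[str], size: int = 3) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def search_arguments(company_name: str) -> dict:
    """Build token-efficient tavily-search arguments for one company."""
    return {
        "query": f"{company_name} mission target market product",
        "max_results": 3,
        "search_depth": "basic",
        "include_raw_content": False,
    }


if __name__ == "__main__":
    companies = ["Acme", "Beta Co", "Gamma", "Delta"]
    for group in batches(companies, size=3):
        # Research one group, write a partial report, then continue.
        print([search_arguments(name)["query"] for name in group])
```

Writing a partial report after each group is what keeps a long run from hitting the token limit midway.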
Phase 3: Impact Scoring
Goal: Score companies using 5-tier rubric
5-Tier Impact Rubric (Customizable):
⭐⭐⭐⭐⭐ Tier 1 - Direct Impact
- Primary target: Underserved populations
- Core product addresses fundamental challenges
- Impact central to business model
⭐⭐⭐⭐ Tier 2 - Strong Alignment
- Significant focus on underserved
- Clear pathway to reach target communities
- Impact is key differentiator
⭐⭐⭐ Tier 3 - Moderate Alignment
- Serves underserved as secondary market
- Impact through indirect channels
- Mixed revenue model
⭐⭐ Tier 4 - Weak Alignment
- Minimal underserved focus
- Impact is incidental or aspirational
- Primarily serves mainstream markets
⭐ Tier 5 - Minimal Alignment
- No focus on underserved
- Luxury/premium positioning
- Opposite of mission
Customization Examples:
- Climate Tech: Direct emissions reduction → Greenwashing
- Healthcare: Medicaid focus → Luxury medicine
- Fintech: Unbanked → High-net-worth
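For report generation it can help to hold the rubric as data. The sketch below is one possible Python representation, assuming tiers are stored as integers 1 to 5; the names `TIER_LABELS`, `stars`, and `describe` are hypothetical. Note that the star count runs opposite to the tier number: Tier 1 gets five stars.

```python
TIER_LABELS = {
    1: "Direct Impact",
    2: "Strong Alignment",
    3: "Moderate Alignment",
    4: "Weak Alignment",
    5: "Minimal Alignment",
}


def stars(tier: int) -> str:
    """Tier 1 maps to five stars, Tier 5 to one star."""
    if tier not in TIER_LABELS:
        raise ValueError(f"tier must be 1-5, got {tier!r}")
    return "⭐" * (6 - tier)


def describe(tier: int) -> str:
    """Render a tier the way the rubric headings above are written."""
    return f"{stars(tier)} Tier {tier} - {TIER_LABELS[tier]}"
```

Swapping in a climate or healthcare thesis would then only change the criteria attached to each tier, not the tier numbering or the star mapping.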
Phase 4: Report Generation
CSV Format (Excel/Sheets compatible):
Company Name,Website,Description,Industry,Impact Tier,Impact Reasoning,Founder,Funding
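A minimal Python sketch of writing rows in this column order follows. It uses the standard-library `csv` module; the function name `to_csv` is an assumption, and missing fields are written as empty cells because only the company name is guaranteed by the extraction schema.

```python
import csv
import io

COLUMNS = [
    "Company Name", "Website", "Description", "Industry",
    "Impact Tier", "Impact Reasoning", "Founder", "Funding",
]


def to_csv(rows: list[dict]) -> str:
    """Serialize research rows to CSV text with the header shown above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        restval="",             # missing fields become empty cells
        extrasaction="ignore",  # drop keys that are not report columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Letting the `csv` module handle quoting matters here because descriptions and impact reasoning often contain commas.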
Markdown Format:
# [Accelerator] Portfolio Research Report
## Executive Summary
- Total companies researched: X
- Impact distribution: Tier 1 (X), Tier 2 (X), etc.
## High-Impact Companies (Tier 1-2)
### Company Name
- **Website**: [URL]
- **Impact Tier**: ⭐⭐⭐⭐⭐
- **Mission**: [Brief mission]
- **Target Market**: [Demographics]
- **Why High Impact**: [Reasoning]
- **Metrics**: [Users, funding, etc.]
[Repeat for each high-impact company]
## Moderate Impact Companies (Tier 3)
[Summarized list]
## Lower Priority Companies (Tier 4-5)
[Brief list]
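The Executive Summary counts in the template can be derived directly from the scored tiers. This is a small Python sketch under the assumption that each company's tier is available as an integer from 1 to 5; `executive_summary` is a hypothetical helper name.

```python
from collections import Counter


def executive_summary(accelerator: str, tiers: list[int]) -> str:
    """Render the report title and Executive Summary from a list of tier numbers."""
    counts = Counter(tiers)
    distribution = ", ".join(f"Tier {t} ({counts.get(t, 0)})" for t in range(1, 6))
    return "\n".join([
        f"# {accelerator} Portfolio Research Report",
        "## Executive Summary",
        f"- Total companies researched: {len(tiers)}",
        f"- Impact distribution: {distribution}",
    ])
```

When reports are generated incrementally, recomputing this header from the full list of tiers after each batch keeps the totals consistent with the appended sections.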
Token Management Best Practices
Critical for Avoiding Limits:
- Batch Processing: Research 3-5 companies at a time
- Tavily Parameters:
  - max_results: 3 (not 10)
  - search_depth: "basic" (not "advanced")
  - include_raw_content: false (saves massive tokens)
- Incremental Reports: Generate partial results, then continue
- Schema Efficiency: Only require essential fields
- Caching: Use the maxAge parameter for portfolio pages
Common Scenarios
Scenario 1: YC Research
User: "Research 10 YC W25 climate tech companies"
Steps:
1. Extract YC W25 companies (firecrawl_extract + YC schema)
2. Filter to climate tech vertical (JSON filtering)
3. Research FIRST 5 companies (tavily-search, max_results=3)
4. Score and generate partial report
5. Research NEXT 5 companies (new batch)
6. Append to report
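Step 2, the JSON filtering, might look like the Python sketch below. It assumes the extracted companies carry a free-text "industry" field as in the quick schema; the keyword list and the name `filter_by_industry` are illustrative, not part of the skill.

```python
from typing import Optional


def filter_by_industry(
    companies: list[dict],
    keywords: list[str],
    limit: Optional[int] = None,
) -> list[dict]:
    """Keep companies whose industry text contains any keyword (case-insensitive)."""
    wanted = [keyword.lower() for keyword in keywords]
    matches = [
        company
        for company in companies
        # "industry" is optional in the schema, so treat missing values as empty text.
        if any(keyword in (company.get("industry") or "").lower() for keyword in wanted)
    ]
    return matches if limit is None else matches[:limit]
```

For the request above this would be called with `limit=10`, and the result split into two batches of five for steps 3 to 6.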
Scenario 2: Fast Forward Impact
User: "Score Fast Forward portfolio for low-income US impact"
Steps:
1. Extract Fast Forward companies (firecrawl_extract)
2. Research in batches of 3 (tavily-search)
3. Apply low-income US impact rubric
4. Generate CSV + markdown report
Scenario 3: Healthcare Medicaid
User: "Find healthcare startups serving Medicaid populations"
Steps:
1. Extract with healthcare vertical schema (see SCHEMA-TEMPLATES.md)
2. Research with query: "[company] Medicaid low-income healthcare"
3. Filter to Medicaid focus
4. Score using healthcare impact rubric
Troubleshooting
Token Limit Hit:
- Reduce batch size to 3 companies
- Use search_depth: "basic"
- Set include_raw_content: false
- Generate incremental reports
Extract Returns Empty:
- Check SCHEMA-TEMPLATES.md for validated schemas
- Improve prompt specificity
- Try fallback to firecrawl_scrape
Search Returns Poor Results:
- Refine query: "[company name] mission target market"
- Reduce max_results to 3
- Try alternative search: "[company name] about"
Files Reference
- SCHEMA-TEMPLATES.md: Production-tested extraction schemas
- README.md: Setup instructions and MCP configuration
Output Deliverables
This skill generates ONLY research outputs:
- ✅ CSV file with all company data
- ✅ Markdown report with analysis
This skill does NOT:
- ❌ Create Linear/project tracking issues
- ❌ Integrate with CRM systems
- ❌ Send notifications
Use separate skills for pipeline management if needed.
Version: 2.1 (Token-Optimized) | Testing: Validated on YC, Fast Forward
README
Accelerator Research Agent
Production-ready Claude Desktop skill for researching accelerator portfolio companies using validated MCP tools.
🎯 What This Skill Does
Systematically research and analyze accelerator companies with AI-powered structured extraction:
- Validated Tech Stack: Firecrawl Extract (100% success rate) + Tavily Search (90%+ enrichment)
- Structured Extraction: JSON schemas tested on Y Combinator (20/20) and Fast Forward (5/5)
- Impact Methodology: 5-tier rubric for evaluating mission alignment (customizable for any thesis)
- Professional Outputs: CSV exports + markdown reports with documented scoring
- Research-Focused: Pure research workflow - does NOT create tracking issues
🚀 Quick Start
1. Install MCP Servers
This skill requires 2 MCP servers configured in Claude Desktop:
Required (Both validated in live testing):
- Firecrawl MCP - Structured extraction (firecrawl.dev)
  - Tested: 100% success rate on Y Combinator, Fast Forward
  - Free tier: 500 credits/month
- Tavily MCP - AI-optimized search (tavily.com)
  - Tested: 90%+ success on company enrichment
  - Free tier: 100 RPM (6,000/hour)
NOT Recommended:
- ❌ Coresignal - Too expensive, doesn't cover new startups
📖 Setup instructions: Add to ~/Library/Application Support/Claude/claude_desktop_config.json
2. Upload Skill to Claude Desktop
- Download this repository as ZIP
- Extract files
- Copy SKILL.md or reference it in your Claude Desktop workflow
3. Try It Out
Test Setup First (copy from SKILL.md):
{
"name": "mcp__MCP_DOCKER__firecrawl_extract",
"arguments": {
"urls": ["https://www.ffwd.org/directory?portfolio=true"],
"prompt": "Extract first 3 companies",
"schema": { /* see SKILL.md for schema */ }
}
}
Then Request Research:
"Research 10 companies from YC W25 focused on climate tech"
Claude will:
- Extract YC W25 companies using firecrawl_extract with validated schema
- Filter to 10 climate tech companies from JSON output
- Research each company using tavily-search (in batches of 3-5, as SKILL.md recommends)
- Score using climate tech impact rubric
- Generate CSV + markdown report
📊 What You Get
Output Files:
- CSV: All company data (Excel/Sheets compatible)
- Markdown: Detailed research report with analysis
No Tracking Issues: This skill does NOT create Linear/project issues. Use a separate Linear skill for pipeline management.
💡 Example Use Cases
Impact Investor
"Research Fast Forward portfolio and score for low-income US impact"
Climate Tech VC
"Find climate tech companies from Techstars 2024 and evaluate carbon impact"
Healthcare Focus
"Research YC healthcare companies serving Medicaid populations"
🔧 Features
Validated MCP Tools
- Firecrawl Extract (PRIMARY): Structured JSON extraction with 100% success rate
- Tested on Y Combinator (20/20 companies)
- Tested on Fast Forward (5/5 companies)
- Same cost as scrape, better output
- Tavily Search: AI-optimized company research (90%+ success rate)
- Fallback Tools: firecrawl_scrape, tavily-extract if needed
Production-Ready Schemas
- Ready-to-use templates in SCHEMA-TEMPLATES.md
- Validated patterns for YC, Fast Forward, Techstars
- Vertical-specific schemas (healthcare, climate, fintech)
- Comprehensive error handling
Impact Methodology
- 5-tier rubric (Direct → Minimal alignment)
- Systematic evaluation framework
- Customizable for different theses
- Documented reasoning per company
Output Formats
- CSV with all research data (Excel/Sheets compatible)
- Markdown detailed reports
- Impact distribution analysis
📝 Workflow (3 Phases)
Phase 1: Portfolio Scraping
PRIMARY: Use firecrawl_extract with JSON schema (100% success rate)
- Returns structured JSON directly (no parsing needed)
- Validated schemas in SCHEMA-TEMPLATES.md
- FALLBACK: Use firecrawl_scrape if extract fails
Phase 2: Company Research
Deep research using tavily-search (90%+ success rate)
- Founder information
- Mission and target market
- Key metrics (users, funding, employees)
- Batch processing (3-5 companies at a time, per the token guidance in SKILL.md)
Phase 3: Impact Scoring & Reports
- Apply 5-tier rubric with documented reasoning
- Generate CSV (Excel/Sheets compatible)
- Generate markdown report with analysis
🎨 Customization
Adapt Impact Rubric
Default: Low-income US impact
Easily adapt for:
- Climate Tech: Emissions reduction → Greenwashing
- Healthcare: Medicaid focus → Luxury medicine
- Financial Inclusion: Unbanked → High-net-worth
See references/impact-scoring.md for guide.
🔒 Prerequisites
Required
- Claude Desktop installed
- Docker Desktop running
- API keys for Tavily + Firecrawl
- (Optional) Coresignal API key (not recommended: too expensive and does not cover new startups)
💰 API Costs
Free Tier Research (0-500 companies/month)
- Firecrawl: 500 credits free
- Tavily: 100 RPM free (6,000/hour)
- Cost: $0/month
- Sufficient for: 10-20 accelerator portfolios
Paid Tier Research (500+ companies/month)
- Firecrawl Starter: $30/month (5,000 credits)
- Tavily Pro: $50/month (unlimited within higher rate limits)
- Cost: $80/month total
- Sufficient for: 100+ accelerator portfolios
Performance Benchmarks (from live testing):
- 10 companies: 2-3 minutes, $0 (free tier)
- 50 companies: 10-15 minutes, $5-10 (paid tier)
- 100+ companies: 30-45 minutes, $10-20 (paid tier)
📚 Documentation
- SKILL.md - Complete skill guide (1,145 lines)
  - Phase 1A: firecrawl_extract with validated schemas ⭐ PRIMARY
  - Phase 2: tavily-search research patterns
  - Phase 3: Impact scoring & report generation
  - Tool-specific best practices
  - Comprehensive troubleshooting (8 common issues)
- SCHEMA-TEMPLATES.md - Ready-to-use extraction schemas
  - Y Combinator schema (tested 20/20)
  - Fast Forward schema (tested 5/5)
  - Healthcare, climate, fintech vertical schemas
  - Schema design guidelines (5 golden rules)
  - Common errors & fixes
🛠️ Troubleshooting
Extract returns empty array {"companies": []}
- Improve prompt: "Extract all portfolio companies including name, website, description..."
- Simplify schema: Only require the "name" field
- Add waitFor: 5000 for JavaScript pages
- Fallback to firecrawl_scrape
"MCP server not found"
- Check ~/Library/Application Support/Claude/claude_desktop_config.json
- Restart Claude Desktop completely
- Verify Docker Desktop is running
- Test with simple query: mcp__MCP_DOCKER__firecrawl_extract on example.com
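The first check can be scripted. The Python sketch below lists the MCP servers declared in the config file. It assumes servers are registered under a top-level "mcpServers" key, and that the server is named MCP_DOCKER, which is inferred here from the `mcp__MCP_DOCKER__` tool prefix rather than stated in this guide; `configured_servers` is a hypothetical helper.

```python
import json
from pathlib import Path

# macOS location given in this README.
CONFIG_PATH = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"


def configured_servers(config_text: str) -> list[str]:
    """Return the sorted server names declared under "mcpServers" (assumed key)."""
    config = json.loads(config_text)
    return sorted(config.get("mcpServers", {}))


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        names = configured_servers(CONFIG_PATH.read_text(encoding="utf-8"))
        print("Configured MCP servers:", names or "none")
        # Assumption: the tool prefix mcp__MCP_DOCKER__ implies a server named MCP_DOCKER.
        if "MCP_DOCKER" not in names:
            print("MCP_DOCKER is not declared; add it and restart Claude Desktop.")
    else:
        print(f"Config file not found at {CONFIG_PATH}")
```

A JSON syntax error in the config surfaces here as a `json.JSONDecodeError`, which is itself a common cause of the "MCP server not found" message.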
Full troubleshooting guide: See SKILL.md Section 10 (8 common issues with solutions)
📁 File Structure
sourcing-agent-demo-skill/
├── SKILL.md # Main skill guide (1,145 lines)
├── SCHEMA-TEMPLATES.md # Production-tested extraction schemas
└── README.md # This file - project overview
Key Sections in SKILL.md:
- Phase 1A: firecrawl_extract (PRIMARY) - Lines 53-170
- Phase 2: tavily-search research - Lines 232-284
- Phase 3: Impact scoring - Lines 286-417
- Example scenarios with extract - Lines 422-571
- Troubleshooting (8 issues) - Lines 761-1001
- Validation checklist - Lines 1034-1089
🔗 Next Steps
After running research:
- Review CSV/Markdown reports
- Import to your preferred system:
- Airtable/Google Sheets for database view
- Use separate Linear skill for tracking (if needed)
- CRM integration for deal pipeline
- Customize impact rubric for your specific thesis
- Scale to multiple accelerators using batch processing
🏆 Why This Skill is Production-Ready
Validated by Live Testing:
- ✅ firecrawl_extract: 100% success (Y Combinator 20/20, Fast Forward 5/5)
- ✅ tavily-search: 90%+ success on enrichment queries
- ✅ Free tier sufficient for 10-20 portfolios/month
- ✅ Comprehensive error handling for common issues
- ✅ Tested on real accelerators (not just examples)
Based on Obsidian Testing Notes:
- "Firecrawl MCP Comprehensive Capability Assessment.md"
- "Tavily MCP Comprehensive Capability Assessment.md"
- "MCP Pairing Analysis - Accelerator Scout Optimal Stack.md"
📝 Version History
v2.0 - Production-Ready (Current)
- ✅ Rewrote Phase 1 to use firecrawl_extract as primary (100% success rate)
- ✅ Added SCHEMA-TEMPLATES.md with validated patterns
- ✅ Comprehensive troubleshooting for extract function (8 common issues)
- ✅ Decision trees for tool selection
- ✅ Removed Coresignal (too expensive, doesn't cover new startups)
v1.0 - Initial (Deprecated)
- Used firecrawl_scrape (required manual parsing)
- No validated schemas
- Generic error handling
Status: ✅ Production-Ready | Testing: Validated on YC, Fast Forward | Version: 2.0