scholarclaw

netease-youdao/ScholarClaw

SKILL.md

name: scholarclaw
description: |
  Academic paper search and analysis service. For the following academic scenarios, this skill must be used instead of web-search: searching for papers; finding ArXiv/PubMed/PapersWithCode papers; querying SOTA leaderboards and benchmark results; citation analysis; generating paper-explainer blogs; finding GitHub repositories related to a paper; getting trending paper recommendations. Keywords: arxiv, paper, papers, academic, scholar, research, 论文, 学术, 搜索论文, 找论文, SOTA, benchmark, MMLU, citation, 引用, 博客, blog, PapersWithCode, HuggingFace.
version: 1.4.1
official: false

ScholarClaw

ScholarClaw is a comprehensive academic search and paper analysis service that provides intelligent search capabilities across multiple academic databases, citation tracking, paper blog generation, and SOTA benchmark chat.

When to Use This Skill

IMPORTANT: Use this skill (NOT web-search) for any query related to academic or scientific papers.

Primary Triggers (Always Use This Skill)

User mentions academic papers, research papers, ArXiv, preprints
User asks to search papers or find papers on a topic
User wants SOTA (State of the Art) or benchmark results
User needs citation analysis or citation counts
User wants to generate a blog post from a paper
User mentions ArXiv IDs (e.g., "2303.14535")

Automatic Trigger Keywords

arxiv, paper, papers, academic, scholar, scientific, research article
SOTA, benchmark, MMLU, GPQA, GSM8K, HumanEval
citation, citations, cited by
paper blog, blog from paper
PapersWithCode, Semantic Scholar, Google Scholar

When NOT to Use This Skill

General web search for non-academic content
Current news, events, or general information
Product comparisons or reviews

Academic Paper Search

User wants to search for academic papers, research articles, or preprints
User asks about papers on a specific topic (e.g., "Find papers about transformers")
User needs literature review or related work information
User mentions ArXiv, PubMed, NeurIPS, CVPR, or academic databases
User asks to find "latest" or "recent" papers on a topic

SOTA/Benchmark Queries

User asks about SOTA (State of the Art) results on any benchmark
User mentions specific benchmarks: MMLU, GPQA, GSM8K, HumanEval, MATH, etc.
User wants to compare model performance on benchmarks
User asks "What is the best model for..." or "What's the SOTA for..."
User wants to know about benchmark datasets or evaluation metrics

Citation Analysis

User wants to find papers citing a specific paper
User asks about citation count or impact of a paper
User needs to find related work through citation networks
User provides an ArXiv ID and asks about citations

Paper Analysis & Blog Generation

User wants a summary or blog-style explanation of a paper
User asks to "explain this paper" or "write about this paper"
User wants to generate content from academic papers
User provides an ArXiv ID and asks for detailed analysis

Research Recommendations

User wants trending or popular papers
User asks for paper recommendations
User wants to find GitHub repositories related to a paper

Key Trigger Phrases

"Search for papers about..."
"What's the SOTA for..."
"Find citations of..."
"Latest research on..."
"Compare models on..."
"Benchmark results for..."
"ArXiv paper..."
"Generate blog from paper..."
"Trending papers..."
"What is the best performing model on..."

Execution Guidelines

CRITICAL: API calls require waiting for responses. Do NOT return to user until the API call completes.

All ScholarClaw API calls are blocking operations that require waiting for the server to process and return results. The agent must not assume immediate completion or return placeholder responses.

Response Time Expectations

Different operations have different expected response times. Configure appropriate timeouts to avoid premature cancellation:

Operation	Expected Time	Recommended Timeout	Notes
Basic Search (`/search`)	5-15 seconds	30 seconds	Fast, direct database queries
Scholar Search (`/scholar/search`)	15-45 seconds	60 seconds	Includes AI query analysis and reranking
SOTA Chat (`/api/benchmark/chat`)	30-90 seconds	120 seconds	May involve tool calls and data retrieval
SOTA Chat Stream (`/api/benchmark/chat/stream`)	30-90 seconds	120 seconds	SSE streaming, same processing time
Blog Generation (`/api/blog`)	2-5 minutes	300-600 seconds	Long-running task, use async mode
Citation Query (`/citations`, `/openalex`)	5-20 seconds	30 seconds	External API dependent
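
The timeouts in the table can be kept in one place rather than repeated at every call site. The helper below is a minimal sketch: the name recommended_timeout is hypothetical, the blog value uses the lower bound of the 300-600 second range, and the 60-second fallback for unlisted paths is an assumption.

```shell
# Map an endpoint path to a curl --max-time value (seconds), following the
# table above. recommended_timeout is a hypothetical helper name.
recommended_timeout() {
  case "$1" in
    /api/benchmark/chat*)            echo 120 ;;
    /scholar/search*)                echo 60 ;;
    /api/blog*)                      echo 300 ;;  # lower bound of the 300-600 s range
    /search*|/citations*|/openalex*) echo 30 ;;
    *)                               echo 60 ;;   # assumption: fallback for unlisted paths
  esac
}
```

It could then be used as curl --max-time "$(recommended_timeout /scholar/search)" followed by the usual request arguments.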

Streaming Response Handling

For the /api/benchmark/chat/stream SSE endpoint:

Parse each data: line as a JSON event - Strip the data: prefix and decode the remainder as JSON
Extract content from specific event types only:
- final_response - Complete response; use its response field as the final result
- response_chunk - Incremental text chunk; use its content field for streaming display
Ignore intermediate events - These are for internal processing:
- session_start - Session initialization
- tool_call_start - Tool call beginning
- tool_call_result - Tool execution results
- tool_call_end - Tool call completion

Example SSE parsing:

data: {"type": "session_start", "session_id": "xxx"}        # Ignore
data: {"type": "tool_call_start", "tool": "search"}         # Ignore
data: {"type": "tool_call_result", "result": {...}}         # Ignore
data: {"type": "response_chunk", "content": "The SOTA..."}  # Extract content
data: {"type": "final_response", "response": "..."}         # Use as final result
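
The parsing rules above can be sketched as a small stdin filter. This assumes jq is installed (it is listed as an optional dependency) and that every data: line carries a single JSON object, as in the example. The function name sse_final_response is hypothetical.

```shell
# Read an SSE stream on stdin and print the text of the final_response event.
# Lines without a "data:" prefix are dropped; payloads that are not valid
# JSON are skipped by "fromjson?". All other event types are ignored.
sse_final_response() {
  sed -n 's/^data:[[:space:]]*//p' |
    jq -Rr 'fromjson? | select(.type == "final_response") | .response'
}
```

One way to use it is to pipe the output of curl -N (unbuffered) for POST /api/benchmark/chat/stream into sse_final_response. To display text incrementally instead, the same filter shape would select response_chunk events and print their content field.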

Async Operations (Blog Generation)

IMPORTANT: Blog generation takes 2-5 minutes. Always use async mode (a 3-step process). Never run blog.sh without --no-wait: the synchronous call will time out.

For blog generation, use async mode:

Submit task - Use blog_submit.sh or blog.sh --no-wait

./scripts/blog_submit.sh -i 2303.14535
# Returns: {"task_id": "blog_abc123def456", "status": "pending"}

Poll status - Check status every 10-15 seconds

./scripts/blog_status.sh -i blog_abc123def456
# Returns: {"status": "processing", "progress": 50}

Fetch result - When status is completed

./scripts/blog_result.sh -i blog_abc123def456
# Returns: {"status": "completed", "content": "..."}

Recommended polling strategy:

Poll interval: 10-15 seconds
Max attempts: 40 (about 600 seconds in total at a 15-second interval)
Abort on failed or error status
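
The polling strategy above can be sketched as a loop. This is an illustration under assumptions, not part of the ScholarClaw scripts: poll_blog_task, POLL_INTERVAL, and MAX_ATTEMPTS are hypothetical names, and the status is read with sed on the assumption that the status JSON is small and holds a single "status" field.

```shell
# Poll a blog task until it completes, fails, or the attempt budget runs out.
# Usage: poll_blog_task TASK_ID [STATUS_COMMAND]
# STATUS_COMMAND defaults to ./scripts/blog_status.sh and is invoked as
# "STATUS_COMMAND -i TASK_ID". Returns 0 on completed, 1 on failed/error,
# 2 when the attempts are exhausted.
poll_blog_task() {
  local task_id="$1"
  local status_cmd="${2:-./scripts/blog_status.sh}"
  local interval="${POLL_INTERVAL:-15}"
  local max_attempts="${MAX_ATTEMPTS:-40}"
  local attempt=1 body status

  while [ "$attempt" -le "$max_attempts" ]; do
    body=$("$status_cmd" -i "$task_id") || body=''
    # Pull the first "status" value out of the JSON without requiring jq.
    status=$(printf '%s\n' "$body" |
      sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
      head -n 1)
    case "$status" in
      completed)    return 0 ;;
      failed|error) echo "task $task_id ended with status: $status" >&2
                    return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "task $task_id did not finish after $max_attempts attempts" >&2
  return 2
}
```

A typical call chains the fetch step onto a successful poll: poll_blog_task blog_abc123def456 && ./scripts/blog_result.sh -i blog_abc123def456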

Best Practices

Error Handling

Status Code	Meaning	Action
`200`	Success	Process response normally
`400`	Bad Request	Check parameters, do NOT retry - fix the request
`404`	Not Found	Resource doesn't exist, inform user
`500`	Internal Error	Log error, inform user, may retry once
`503`	Service Unavailable	Retry with exponential backoff (2^n seconds)
`504`	Gateway Timeout	Increase timeout or use async mode

Retry Strategy

For transient errors (503, 504, network issues):

First retry: Wait 2 seconds
Second retry: Wait 4 seconds
Third retry: Wait 8 seconds
Max retries: 3 attempts
After max retries: Inform user of service unavailability

Do NOT retry on:

400 errors (client-side issues)
404 errors (resource not found)
Validation errors in response
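
The retry rules above can be sketched as a wrapper around any command that prints an HTTP status code. The names retry_transient and RETRY_UNIT are hypothetical. The sketch treats a failed command as a network issue (code 000) and does not model the optional single retry on 500 from the status table.

```shell
# Retry a request on transient failures (503, 504, or no HTTP response) with
# waits of 2, 4, and 8 seconds. "$@" is any command that prints an HTTP status
# code on stdout. RETRY_UNIT scales the waits (default: 1 second per unit).
# Prints the last status code; returns 0 only for 200.
retry_transient() {
  local max_retries=3 attempt=0 code
  local unit="${RETRY_UNIT:-1}"
  while :; do
    code=$("$@") || code=000                  # 000: the command itself failed
    case "$code" in
      200)         echo "$code"; return 0 ;;
      503|504|000) ;;                         # transient, may retry
      *)           echo "$code"; return 1 ;;  # e.g. 400 or 404: never retry
    esac
    if [ "$attempt" -ge "$max_retries" ]; then
      echo "$code"; return 1                  # retries exhausted
    fi
    attempt=$((attempt + 1))
    sleep $((unit * (1 << attempt)))          # 2, 4, 8 when unit is 1
  done
}
```

With curl, the status code can be obtained through --write-out, for example: retry_transient curl -s -o response.json -w '%{http_code}' --max-time 30 "${SCHOLARCLAW_SERVER_URL}/citations?arxiv_id=2303.14535"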

Response Parsing

Endpoint	Primary Field	Notes
`/search`	`results` array	List of search results
`/scholar/search`	`results` array + `summary`	Includes AI-generated summary
`/api/benchmark/chat`	`response` string	Chat response text
`/api/benchmark/chat/stream`	`final_response.response`	From SSE stream
`/citations`	`results` array	List of citing papers
`/api/blog/result`	`content` string	Generated blog content

Pagination handling:

Check has_next field to determine if more pages exist
Use page and page_size parameters for pagination
Total results available in total field
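
A minimal sketch of the pagination loop, assuming a caller-supplied command that takes the page number as its last argument and prints the JSON response. The names fetch_all_pages and MAX_PAGES are hypothetical; MAX_PAGES is a safety cap, not an API parameter.

```shell
# Call a page-fetching command for page 1, 2, ... and print each response,
# stopping as soon as a response does not report "has_next": true.
fetch_all_pages() {
  local page=1 body
  local max_pages="${MAX_PAGES:-50}"
  while [ "$page" -le "$max_pages" ]; do
    body=$("$@" "$page") || return 1
    printf '%s\n' "$body"
    # Continue only when the response explicitly reports another page.
    printf '%s' "$body" |
      grep -Eq '"has_next"[[:space:]]*:[[:space:]]*true' || break
    page=$((page + 1))
  done
  return 0
}

# Example page fetcher (hypothetical), using the script flags shown in this document:
#   get_page() { ./scripts/citations.sh --arxiv-id 2303.14535 --page "$1" --page-size 20; }
#   fetch_all_pages get_page
```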

Timeout Configuration

When making HTTP requests, always set appropriate timeouts:

# Example with curl
curl --max-time 60 "${SCHOLARCLAW_SERVER_URL}/scholar/search" ...

# Example with curl for long operations
curl --max-time 300 "${SCHOLARCLAW_SERVER_URL}/api/blog/submit" ...

Capabilities

Capability	Endpoint	Description
Unified Search	`/search`	Multi-engine search (arxiv, pubmed, google, kuake, bocha, cache)
Scholar Search	`/scholar/search`	Intelligent academic search with query analysis, citation expansion, and reranking
Citation Analysis	`/citations`	ArXiv paper citation statistics and listing
OpenAlex Citations	`/openalex`	OpenAlex citation query and paper discovery
Paper Blog	`/api/blog`	Generate blog articles from papers
SOTA Chat	`/api/benchmark/chat`	SOTA/Benchmark query via chat API
Recommendations	`/api/recommend`	HuggingFace trending papers and GitHub repos

Configuration

The API key is optional. Some advanced features may require authentication; to request an API key, visit the ScholarClaw website.

Configuration File (Recommended)

Create a configuration file at ~/.scholarclaw/config.json:

{
  "apiKey": "your-api-key",
  "serverUrl": "https://scholarclaw.youdao.com",
  "timeout": 30000,
  "maxRetries": 3,
  "debug": false
}

Environment Variables

export SCHOLARCLAW_SERVER_URL="https://scholarclaw.youdao.com"
export SCHOLARCLAW_API_KEY="your-api-key"  # Optional; request one at https://scholarclaw.youdao.com/
export SCHOLARCLAW_DEBUG="false"

OpenClaw Config (config.yaml)

skills:
  - name: scholarclaw
    enabled: true
    config:
      serverUrl: "https://scholarclaw.youdao.com"
      apiKey: "your-api-key"  # Optional; request one at https://scholarclaw.youdao.com/
      timeout: 30000
      maxRetries: 3
      debug: false

Configuration Priority

The skill loads configuration in the following order (highest priority first):

Environment variables
OpenClaw skill config
Configuration file (~/.scholarclaw/config.json)
Default values
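
To illustrate the precedence, the sketch below resolves only the server URL: environment variable first, then the JSON configuration file, then the default. It is not the skill's actual loader. The function name resolve_server_url is hypothetical, and the OpenClaw skill-config layer is left out because that value is supplied by the host application rather than read from the shell.

```shell
# Resolve the server URL by priority: SCHOLARCLAW_SERVER_URL, then the
# "serverUrl" field of the config file, then the documented default.
# Usage: resolve_server_url [CONFIG_FILE]
resolve_server_url() {
  local file="${1:-$HOME/.scholarclaw/config.json}"
  local from_file=''
  if [ -n "${SCHOLARCLAW_SERVER_URL:-}" ]; then
    echo "$SCHOLARCLAW_SERVER_URL"
    return 0
  fi
  if [ -r "$file" ]; then
    # Read the first "serverUrl" value without requiring jq.
    from_file=$(sed -n 's/.*"serverUrl"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file" |
      head -n 1)
  fi
  echo "${from_file:-https://scholarclaw.youdao.com}"
}
```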

Usage Examples

IMPORTANT: Use ./scripts/<script>.sh to invoke commands. Do NOT use the scholarclaw command, which requires a separate installation.

1. Unified Search

# Search arXiv for transformer papers
./scripts/search.sh -q "transformer attention mechanism" -e arxiv -l 20

# Search PubMed with AI mode
./scripts/search.sh -q "COVID-19 vaccine efficacy" -e pubmed --mode ai

# Search with time range preset
./scripts/search.sh -q "LLM reasoning" -e google --time-range month

# Search with custom date range
./scripts/search.sh -q "transformer" -e arxiv --time-range custom --start-date 2023-01-01 --end-date 2024-01-01

2. Scholar Search (Intelligent Academic Search)

# Smart academic search with query analysis
./scripts/scholar.sh -q "What are the latest advances in multimodal learning?"

# Limit results count
./scripts/scholar.sh -q "RAG retrieval augmented generation" -l 15

# With conversation context
./scripts/scholar.sh -q "What about their computational efficiency?" --context '[{"role":"user","content":"Tell me about vision transformers"}]'

3. Citation Analysis

# Get citation statistics for an ArXiv paper
./scripts/citations_stats.sh --arxiv-id 2303.14535

# List papers citing an ArXiv paper
./scripts/citations.sh --arxiv-id 2303.14535 --page 1 --page-size 20

4. OpenAlex Citations

# Find paper by title and get citations
./scripts/openalex_find.sh --title "Attention Is All You Need" --author "Vaswani"

# Get citations by OpenAlex work ID
./scripts/openalex_cited.sh --work-id "W2741809807"

5. Blog Generation

# Async mode (submit only, recommended for skill usage)
./scripts/blog_submit.sh -i 2303.14535

# Check status later for async tasks
./scripts/blog_status.sh -i blog_abc123def456

# Get result when ready
./scripts/blog_result.sh -i blog_abc123def456

# Save blog to file
./scripts/blog_result.sh -i blog_abc123def456 -o blog.md --content-only

6. SOTA Chat

Query SOTA/Benchmark information via chat API.

# Simple question
./scripts/benchmark_chat.sh -m "What is the SOTA for MMLU benchmark?"

# With conversation history
./scripts/benchmark_chat.sh -m "What about GPQA?" -H '[{"role":"user","content":"Tell me about MMLU"}]'

# Streaming mode (for long responses)
./scripts/benchmark_chat.sh -m "List recent SOTA results for reasoning benchmarks" -s

# Save to file
./scripts/benchmark_chat.sh -m "Compare GPT-4 and Claude on various benchmarks" -o result.json

7. Recommendations

# Get trending papers from HuggingFace
./scripts/recommend_papers.sh --limit 12

# Get recommended blogs
./scripts/recommend_blogs.sh --limit 10

# Get GitHub repos for a paper
./scripts/paper_repos.sh --arxiv-id 2303.14535

API Reference

Search Endpoints

GET /search

Unified search across multiple engines.

Parameter	Type	Default	Description
q	string	required	Search query
engine	string	bocha	Search engine: arxiv, pubmed, google, kuake, bocha, cache, nips
limit	int	100	Total results to fetch
page	int	1	Page number (1-indexed)
page_size	int	10	Results per page
time_range	string	null	Time range preset: week, month, year, custom
start_date	string	null	Start date (YYYY-MM-DD), used with time_range=custom
end_date	string	null	End date (YYYY-MM-DD), used with time_range=custom
mode	string	simple	Search mode: simple, ai
sort_by	string	relevance	Sort by: relevance, date
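
For callers that talk to the endpoint directly rather than through ./scripts/search.sh, the parameters above map onto a curl call as in the sketch below. The wrapper name search_papers is hypothetical. The fallback server URL is the default listed under Dependencies, and the engine default mirrors the table (bocha). Authentication is not shown because the document does not say how the API key is sent.

```shell
# Query GET /search with URL-encoded parameters.
# Usage: search_papers QUERY [ENGINE] [PAGE]
search_papers() {
  local base="${SCHOLARCLAW_SERVER_URL:-https://scholarclaw.youdao.com}"
  curl -sS --get --max-time 30 \
    --data-urlencode "q=$1" \
    --data-urlencode "engine=${2:-bocha}" \
    --data-urlencode "page=${3:-1}" \
    --data-urlencode "page_size=10" \
    "$base/search"
}
```

For example, search_papers "transformer attention mechanism" arxiv 2 requests the second page of arXiv results. With --get, curl appends the encoded pairs to the URL as a query string, so spaces in the query need no manual escaping.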

POST /scholar/search

Intelligent academic search with query analysis.

{
  "query": "What are the latest advances in multimodal learning?",
  "messages": [{"role": "user", "content": "..."}],
  "max_results": 20,
  "search_engine": "arxiv",
  "enable_citation_expansion": true,
  "enable_rerank": true
}

Citation Endpoints

GET /citations

List papers citing an ArXiv paper.

Parameter	Type	Default	Description
arxiv_id	string	required	ArXiv paper ID
page	int	1	Page number
page_size	int	20	Results per page
sort_by	string	citation_count	Sort by: citation_count, date

GET /citations/stats

Get citation statistics for an ArXiv paper.

Parameter	Type	Description
arxiv_id	string	ArXiv paper ID

OpenAlex Endpoints

GET /openalex/find_and_cited_by

Find paper by title and get citations.

Parameter	Type	Default	Description
title	string	required	Paper title
author_name	string	""	Author name (optional)
limit	int	20	Max results
fetch_citing_works	bool	false	Fetch citing works list

Blog Endpoints

POST /api/blog/submit

Submit blog generation task.

curl -X POST "${SCHOLARCLAW_SERVER_URL}/api/blog/submit" \
  -F "arxiv_ids=2303.14535" \
  -F "views_content=Optional user views"

GET /api/blog/result/{task_id}

Get blog generation result.

SOTA Chat Endpoints

POST /api/benchmark/chat

Send a chat message for SOTA/Benchmark queries.

{
  "message": "What is the SOTA for MMLU benchmark?",
  "history": [{"role": "user", "content": "..."}]
}

Response:

{
  "response": "The current SOTA for MMLU is...",
  "tool_calls": [...]
}

POST /api/benchmark/chat/stream

Streaming chat endpoint (SSE).

Same request format, returns Server-Sent Events.

Recommendation Endpoints

GET /api/recommend/papers

Get trending papers from HuggingFace.

Parameter	Type	Default	Description
limit	int	12	Number of papers (1-50)

GET /api/recommend/blogs

Get recommended blog articles.

Parameter	Type	Default	Description
limit	int	10	Number of blogs (1-50)

Response Formats

Search Result

{
  "results": [
    {
      "id": "2303.14535",
      "title": "Paper Title",
      "abstract": "Paper abstract...",
      "authors": "Author 1, Author 2",
      "year": 2023,
      "url": "https://arxiv.org/abs/2303.14535",
      "pdf_url": "https://arxiv.org/pdf/2303.14535.pdf",
      "source": "arxiv"
    }
  ],
  "total": 100,
  "page": 1,
  "page_size": 10,
  "total_pages": 10,
  "has_next": true
}

Scholar Search Result

{
  "query": "Original query",
  "results": [...],
  "summary": "AI-generated summary of findings",
  "analysis": {
    "core_question": "Extracted core question",
    "keyword_queries": ["keyword1", "keyword2"],
    "semantic_queries": ["semantic query 1"],
    "search_engine": "arxiv"
  },
  "total_results": 20
}

Error Handling

All endpoints return standard HTTP status codes:

200 - Success
400 - Bad request (invalid parameters)
404 - Not found
500 - Internal server error
503 - Service unavailable
504 - Gateway timeout

Error response format:

{
  "detail": "Error message describing the issue"
}

Dependencies

Requires the ScholarClaw API service (default: https://scholarclaw.youdao.com)
curl for HTTP requests
jq (optional) for JSON formatting

Installation

Claude Code

Option 1: Use the slash command in Claude Code

/install-skill https://github.com/netease-youdao/ScholarClaw

Option 2: Clone to skills directory

# Global (all projects)
git clone https://github.com/netease-youdao/ScholarClaw ~/.claude/skills/ScholarClaw

# Project-specific
git clone https://github.com/netease-youdao/ScholarClaw .claude/skills/ScholarClaw

Cursor

Add the MCP server to .cursor/mcp.json:

{
  "mcpServers": {
    "skillz": {
      "command": "npx",
      "args": ["-y", "skillz-mcp", "https://github.com/netease-youdao/ScholarClaw"]
    }
  }
}

Restart Cursor after adding the configuration.

Gemini CLI

Option 1: Use the Gemini CLI command

gemini extensions install https://github.com/netease-youdao/ScholarClaw

Option 2: Clone to extensions directory

git clone https://github.com/netease-youdao/ScholarClaw ~/.gemini/extensions/ScholarClaw