---
name: switchailocal
description: Unified LLM proxy for AI agents. Route all model requests through http://localhost:18080/v1. Provides FREE access to Gemini CLI, Claude CLI, Codex, and Vibe via your existing subscriptions. Use when: (1) making LLM calls using provider prefixes, (2) switching between CLI/Local/Cloud providers, (3) needing to attach local files/folders to prompts via CLI, (4) requiring intelligent routing between models, or (5) needing to monitor provider health and analytics.
---
# switchAILocal Proxy
Unified LLM proxy for AI agents. Always use http://localhost:18080/v1 as your base URL.
The killer feature: Use your paid CLI subscriptions (Gemini Pro, Claude Pro, etc.) via the API - it's FREE because you already pay for the subscription!
## Quick Start

### 1. Make a request (FREE with CLI)

```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "geminicli:",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
### 2. Configure a Python Client

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="sk-test-123")
response = client.chat.completions.create(
    model="geminicli:",
    messages=[{"role": "user", "content": "Hi!"}],
)
```
## 🗺️ Skill Files
| File | Description |
|---|---|
| SKILL.md (this file) | Core workflow and endpoint reference |
| references/routing.md | Intelligent routing and matrix setup |
| references/multimodal.md | Vision and image processing |
| references/examples.md | Real-world agentic use cases |
| references/management-api.md | Full Monitoring & Operations API |
| references/steering.md | Conditional routing rules |
| references/hooks.md | Automation and event hooks |
| references/memory.md | Analytics and history |
## ⚠️ Critical: Model Format

NEVER use bare model names. The format is ALWAYS `provider:` or `provider:model`.

| ❌ Wrong | ✅ Correct | Why |
|---|---|---|
| `gemini-2.5-pro` | `geminicli:gemini-2.5-pro` | Needs provider prefix |
| `claude-3-5-sonnet` | `claudecli:` | `claudecli:` uses the provider's default model |
| `llama3` | `ollama:llama3` | Needs provider prefix |
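The prefix rule can also be enforced programmatically before a request leaves your agent. The helper below is a hypothetical sketch, not part of switchAILocal itself; it treats the bare `auto` router (listed in the provider table) as the one allowed exception.

```python
def has_provider_prefix(model: str) -> bool:
    """Check that a model string uses the provider: or provider:model format.

    The bare "auto" router is the one exception that needs no prefix.
    """
    if model == "auto":
        return True
    provider, sep, _ = model.partition(":")
    # Valid iff a colon is present and a non-empty provider name precedes it.
    return bool(sep) and bool(provider)
```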
## 🏗️ Provider Reference

### 1. CLI Providers (FREE!)

Uses your human's CLI subscriptions. Best for agents.

| Prefix | CLI | Subscription Required |
|---|---|---|
| `geminicli:` | `gemini` | Google AI Premium/Pro |
| `claudecli:` | `claude` | Claude Pro/Max |
| `codex:` | `codex` | OpenAI Plus |
| `vibe:` | `vibe` | Mistral Le Chat |
### 2. Local & Cloud

| Prefix | Source | Cost |
|---|---|---|
| `ollama:` | Local Ollama | FREE |
| `auto` | Local Cortex | FREE (requires plugin) |
| `switchai:` | Traylinx Cloud | Per-token |
| `groq:` | Groq Cloud | Per-token |
## 🚀 Core Features

### CLI Attachments & Flags

Pass local context and control autonomy via CLI extensions.

```json
{
  "model": "geminicli:",
  "messages": [{"role": "user", "content": "Fix this code"}],
  "extra_body": {
    "cli": {
      "attachments": [{"type": "folder", "path": "./src"}],
      "flags": {"auto_approve": true, "yolo": true}
    }
  }
}
```
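Building that body by hand is error-prone. As a sketch (the `build_cli_request` helper is hypothetical; only the payload shape comes from this skill), it could be assembled like this:

```python
def build_cli_request(model, prompt, attachments=None, flags=None):
    """Assemble a chat-completions body with optional CLI attachments/flags."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    cli = {}
    if attachments:
        cli["attachments"] = attachments
    if flags:
        cli["flags"] = flags
    if cli:
        # Only include extra_body when there is CLI context to send.
        body["extra_body"] = {"cli": cli}
    return body

payload = build_cli_request(
    "geminicli:",
    "Fix this code",
    attachments=[{"type": "folder", "path": "./src"}],
    flags={"auto_approve": True, "yolo": True},
)
```

POST the resulting dict as JSON to `/v1/chat/completions` as in the Quick Start example.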
### Streaming

Add `"stream": true` to any request for SSE token streaming.
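Assuming the proxy follows the standard OpenAI-compatible wire format for streaming (`data: {json}` chunk lines terminated by a `data: [DONE]` sentinel), each SSE line can be reduced to its token delta with a small parser like this sketch:

```python
import json

def parse_sse_line(line: str):
    """Return the token delta carried by one SSE line, or None for non-token lines."""
    if not line.startswith("data: "):
        return None  # blank keep-alives, comments, etc.
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None  # end-of-stream sentinel
    chunk = json.loads(payload)
    # OpenAI-style chunks put incremental text under choices[0].delta.content.
    return chunk["choices"][0]["delta"].get("content")
```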
## 🌲 Decision Tree

```
What do you need?
├─ FREE + Powerful + Files
│  └─ CLI Providers (geminicli:, claudecli:)
├─ FREE + Private + Fast
│  └─ Local Ollama (ollama:llama3.2)
├─ Ultra-Fast Production
│  └─ Groq Cloud (groq:llama-3.3-70b)
└─ I don't know, you pick
   └─ Intelligent Routing (auto)
```
## 🛠️ Troubleshooting & Best Practices
| Problem | Fix |
|---|---|
| Connection error | Check if server is running on port 18080 |
| Model not found | Ensure you used the provider: prefix |
| 401 Unauthorized | Check API key in config.yaml |
### Best Practices

- **Prefer CLI providers**: They are free and support file attachments.
- **Check status**: Use `GET /v1/providers` to see what is active.
- **Use `auto`**: For simple tasks, let the router pick the best model.
- **Local for privacy**: Use `ollama:` for confidential data.
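The "check status" advice can be scripted with the standard library alone. A minimal sketch: the response shape of `GET /v1/providers` is not specified in this skill, so the parsed JSON is passed through as-is.

```python
import json
import urllib.error
import urllib.request

def check_providers(base_url="http://localhost:18080", timeout=2.0):
    """Fetch GET {base_url}/v1/providers; return parsed JSON, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/providers", timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None  # server down, connection refused, or non-JSON reply
```

A `None` result maps to the "Connection error" row in the troubleshooting table: check that the server is running on port 18080.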
Route wisely. Save tokens. Use CLI. 🚀