---
name: switchailocal
description: Unified LLM proxy for AI agents. Route all model requests through http://localhost:18080/v1. Provides FREE access to Gemini CLI, Claude CLI, Codex, and Vibe via your existing subscriptions. Includes Intelligent Auto-Routing (Cortex) for autonomous model selection, a self-optimizing Lab, token conservation, and a management dashboard. Use when: (1) making LLM calls, (2) switching between CLI/Local/Cloud providers, (3) attaching local files to prompts, (4) needing intelligent routing, or (5) auto-setting up the best available models.
metadata:
  openclaw:
    emoji: "🔀"
    requires:
      bins: ["ail", "go"]
    install:
      - id: switchailocal-setup
        kind: bash
        script: ./ail.sh setup
        label: Run unified switchAILocal setup (deps, build, service)
---
# switchAILocal Skill

Unified LLM proxy and intelligent router for AI agents. Always use `http://localhost:18080/v1` as your base URL.

**The killer feature:** use your paid CLI subscriptions (Gemini Pro, Claude Pro, etc.) via the API — it's FREE because you already pay for the subscription!
## When to Use

✅ **USE** this skill when:

- An agent needs to make an LLM call to Gemini, Claude, Ollama, OpenAI, etc.
- You need to attach entire file directories to an LLM context (via `geminicli` and `claudecli`).
- You need to start, stop, or manage the local switchAILocal server/bridge.
- You want to monitor provider health, analytics, or status.
- You want to auto-detect available models and configure optimal routing.

❌ **DON'T** use this skill when:

- The user asks about their calendar, emails, or non-LLM operations.
- You are interacting with APIs that aren't OpenAI-compatible.
## 🤖 Agent Auto-Setup (Zero-Config Bootstrap)

Follow these steps IN ORDER to bootstrap switchAILocal from scratch on a user's machine.

### Step 1 — Install & Start

Option A (npx — recommended, no git clone needed):

```bash
npx @traylinx/switchailocal
```

Option B (from source):

```bash
git clone https://github.com/traylinx/switchAILocal.git
cd switchAILocal
./ail.sh setup
source ~/.zshrc 2>/dev/null || source ~/.bashrc 2>/dev/null  # Reload PATH
```
### Step 2 — Detect Available Providers

Run these commands to discover what the user has installed:

```bash
# CLI Providers (FREE!)
which gemini && echo "✅ Gemini CLI found"
which claude && echo "✅ Claude CLI found"
which codex && echo "✅ Codex CLI found"

# Local Providers
curl -s http://localhost:11434/api/tags 2>/dev/null && echo "✅ Ollama running"

# Cloud API Keys (check environment)
[ -n "$OPENAI_API_KEY" ] && echo "✅ OpenAI key found"
[ -n "$ANTHROPIC_API_KEY" ] && echo "✅ Anthropic key found"
[ -n "$GEMINI_API_KEY" ] && echo "✅ Google AI key found"
```
### Step 3 — Generate config.yaml

Based on the detected providers, generate a minimal config:

```yaml
host: ""
port: 18080

# Enable any detected CLI providers:
# geminicli: (uses `gemini` CLI — FREE with Google AI Premium)
# claudecli: (uses `claude` CLI — FREE with Claude Pro)
# codex: (uses `codex` CLI — FREE with OpenAI Plus)

# Enable Ollama if detected:
ollama:
  enabled: true
  base-url: "http://localhost:11434"
  auto-discover: true

# Enable Intelligent Auto-Routing:
auto-routing:
  enabled: true
  weights:
    availability: 0.35
    quota: 0.25
    latency: 0.2
    success-rate: 0.2
  discovery:
    enabled: true
    probe-on-startup: true
  conservation:
    enabled: true
    simple-threshold-tokens: 500
  lab:
    enabled: true
    adaptation-interval: 24h
    max-weight-drift: 0.1
```
### Step 4 — Start & Verify

```bash
ail start

# Verify it's running:
curl -s http://localhost:18080/v1/models | head -c 200
```

You should see a JSON response listing all available models/providers.
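If you want to script the verification instead of eyeballing curl output, a stdlib-only Python sketch like the one below works; the helper names and the placeholder `sk-test-123` key are illustrative, not part of switchAILocal:

```python
# Minimal sketch: hit /v1/models on the local proxy and pull out the IDs.
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str = "http://localhost:18080/v1") -> list[str]:
    """Query the running proxy; raises URLError if the server is down."""
    req = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": "Bearer sk-test-123"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return parse_model_ids(json.load(resp))


if __name__ == "__main__":
    print(fetch_model_ids())
```

An empty list from `parse_model_ids` means the server is up but no providers are configured, which is a different failure mode than a connection error.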
## ⚠️ Critical: Model Format

NEVER use bare model names. The format is ALWAYS `provider:` or `provider:model`.

| ❌ Wrong | ✅ Correct | Why |
|---|---|---|
| `gemini-2.5-pro` | `geminicli:gemini-2.5-pro` | Needs provider prefix |
| `claude-3-5-sonnet` | `claudecli:` | `claudecli:` uses the default model |
| `llama3` | `ollama:llama3` | Needs provider prefix |
| `auto route me` | `auto` or `auto:coding` | Use the `auto` prefix only |
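Since the bare-name mistake is easy to make, an agent can guard outgoing requests with a small check. This helper is not part of switchAILocal; the prefix list is assembled from the provider tables in this document:

```python
# Illustrative guard: reject model names that lack a provider prefix.
KNOWN_PREFIXES = {
    "geminicli", "claudecli", "codex", "vibe",
    "ollama", "switchai", "minimax", "auto",
}


def has_provider_prefix(model: str) -> bool:
    """True for 'provider:', 'provider:model', bare 'auto', or 'auto:intent'."""
    if model == "auto":
        return True  # the one bare name the proxy accepts
    head, sep, _ = model.partition(":")
    return bool(sep) and head in KNOWN_PREFIXES
```

Call it before every request, e.g. `assert has_provider_prefix(model), "use provider:model format"`.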
## 🏗️ Provider Reference

### 1. CLI Providers (FREE!)

Uses your human's CLI subscriptions. Best for agents.

| Prefix | CLI | Subscription Required |
|---|---|---|
| `geminicli:` | `gemini` | Google AI Premium/Pro |
| `claudecli:` | `claude` | Claude Pro/Max |
| `codex:` | `codex` | OpenAI Plus |
| `vibe:` | `vibe` | Mistral Le Chat |
### 2. Local & Cloud

| Prefix | Source | Cost |
|---|---|---|
| `ollama:` | Local Ollama | FREE |
| `auto` | Cortex Router | FREE (auto-selects) |
| `switchai:` | Traylinx Cloud | Per-token |
### 3. switchAI Cloud Aliases

| Alias | Upstream Model | Best For |
|---|---|---|
| `switchai-fast` | openai/gpt-oss-20b | Fast tasks |
| `switchai-chat` | openai/gpt-oss-20b | Conversation |
| `switchai-reasoner` | deepseek-reasoner | Deep thinking |
## 🧠 Intelligent Auto-Routing (Cortex)

When the model is `auto` or `auto:<intent>`, the Cortex Router automatically selects the best available model using a composite scoring algorithm.

### Basic Auto-Routing

```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "Hello!"}]}'
```
### Intent-Based Routing

```bash
# Route to coding-optimized models
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "auto:coding", "messages": [{"role": "user", "content": "Write a Go sorting algorithm"}]}'
```

Supported intents: `coding`, `reasoning`, `creative`, `fast`, `secure`, `vision`, `audio`, `web_search`, `research`, `chat`, `cli`, `long_ctx`. Specialized slots — `image_gen`, `transcription`, `speech`, `embedding` — are used by their dedicated endpoints (see the sections below), not by `auto` chat routing.
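An agent can choose between `auto` and `auto:<intent>` with a trivial dispatcher. The intent names below come from the list above, but the keyword map is purely an illustrative assumption — tune it to your workload:

```python
# Hypothetical intent picker: map task keywords to an auto:<intent> model
# name, falling back to plain "auto" when nothing matches.
INTENT_KEYWORDS = {
    "coding": ("code", "bug", "refactor", "function"),
    "reasoning": ("prove", "analyze", "why"),
    "creative": ("story", "poem", "brainstorm"),
}


def pick_model(task: str) -> str:
    lowered = task.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return f"auto:{intent}"
    return "auto"  # no hint: let Cortex decide on its own
```

Pass the result straight into the `model` field of the chat request.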
### How Scoring Works

Each model is scored:

FinalScore = (W_a×Availability + W_q×Quota + W_l×Latency + W_s×SuccessRate + TierBoost + PreferenceBoost) × ConservationMultiplier

The model with the highest score wins. The Lab continuously optimizes the weights.

For deep architecture details, see the local `docs-site/intelligent-systems/` directory or the online docs site.
## 📊 Management Dashboard & API

### Dashboard UI

Open `http://localhost:18080/management` for real-time visualization of provider health, auto-routing weights, Lab experiments, and the live routing journal.

### Telemetry API

```bash
# Get current Lab status + live weights
curl http://localhost:18080/v0/management/autoroute/status

# Get the recent routing decisions journal
curl http://localhost:18080/v0/management/autoroute/journal
```

For the full Management API reference, see `references/management-api.md`.
## 🚀 Quick API Usage

### curl (simplest)

```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "geminicli:", "messages": [{"role": "user", "content": "Hello!"}]}'
```

### Python

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="sk-test-123")
response = client.chat.completions.create(
    model="geminicli:",
    messages=[{"role": "user", "content": "Hi!"}],
)
```

### Node.js

```javascript
import OpenAI from 'openai';

const client = new OpenAI({ baseURL: 'http://localhost:18080/v1', apiKey: 'sk-test-123' });
const response = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Hello!' }],
});
```
### CLI Attachments & Flags

Pass local context and control autonomy via CLI extensions:

```json
{
  "model": "geminicli:",
  "messages": [{"role": "user", "content": "Fix this code"}],
  "extra_body": {
    "cli": {
      "attachments": [{"type": "folder", "path": "./src"}],
      "flags": {"auto_approve": true, "yolo": true}
    }
  }
}
```
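In the Python SDK these extensions ride along via the `extra_body` keyword argument of `client.chat.completions.create`. A tiny builder (hypothetical helper, not part of switchAILocal) keeps the nested dict out of call sites:

```python
# Illustrative builder for the `cli` extension payload shown above.
def cli_extras(paths: list[str], auto_approve: bool = False,
               yolo: bool = False) -> dict:
    """Build the extra_body value for folder attachments and CLI flags."""
    return {
        "cli": {
            "attachments": [{"type": "folder", "path": p} for p in paths],
            "flags": {"auto_approve": auto_approve, "yolo": yolo},
        }
    }
```

Usage: `client.chat.completions.create(model="geminicli:", messages=msgs, extra_body=cli_extras(["./src"], auto_approve=True))`.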
### Streaming

Add `"stream": true` to any request for SSE token streaming.
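As a stdlib-only sketch (no SDK dependency), streaming means reading `data:` lines from the response and concatenating the per-token deltas; the helper names here are illustrative:

```python
# Minimal SSE consumer for the /v1/chat/completions streaming format.
import json
import urllib.request


def extract_delta(sse_line: str) -> str:
    """Pull the token text out of one 'data: {...}' SSE line."""
    if not sse_line.startswith("data: ") or sse_line == "data: [DONE]":
        return ""
    payload = json.loads(sse_line[len("data: "):])
    return payload["choices"][0]["delta"].get("content") or ""


def stream_chat(model: str, prompt: str,
                base_url: str = "http://localhost:18080/v1") -> str:
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # switch the proxy to SSE token streaming
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions", data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer sk-test-123"})
    parts = []
    with urllib.request.urlopen(req) as resp:
        for raw_line in resp:  # HTTPResponse iterates line by line
            parts.append(extract_delta(raw_line.decode().strip()))
    return "".join(parts)
```

The `data: [DONE]` sentinel and the `choices[0].delta.content` shape follow the standard OpenAI streaming convention that the proxy exposes.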
## 🎨 Image Generation

Generate images via the `/v1/images/generations` endpoint:

```bash
curl --location 'http://localhost:18080/v1/images/generations' \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax:image-01",
    "prompt": "A dog wearing a space suit on Mars, photorealistic",
    "aspect_ratio": "16:9",
    "response_format": "url"
  }'
```
Parameters:

- `model` — always use `minimax:image-01`
- `prompt` — text description of the desired image
- `aspect_ratio` — `1:1`, `16:9`, `9:16`, `4:3`, `3:4`
- `response_format` — `url` (returns an HTTP URL) or `base64` (returns a base64-encoded image)
Example Python:

```python
response = client.images.generate(
    model="minimax:image-01",
    prompt="A serene Japanese garden with cherry blossoms",
    aspect_ratio="16:9",
    response_format="url",
)
image_url = response.data[0].url
```
## 🔊 Text-to-Speech

Generate audio from text via `/v1/audio/speech`. switchAILocal ships a MiniMax T2A adapter that translates the OpenAI-shaped request to MiniMax's native `/v1/t2a_pro` API behind the scenes.

```bash
curl http://localhost:18080/v1/audio/speech \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax:speech-02-hd",
    "input": "Hello from switchAILocal",
    "voice": "male-qn-qingse",
    "response_format": "mp3"
  }' --output hello.mp3
```
Voice IDs — use MiniMax-native voice names, NOT OpenAI ones:

| Voice ID | Description |
|---|---|
| `male-qn-qingse` | Male, calm |
| `female-shaonv` | Female, young |
| `audiobook_male_2` | Male, narrator |
| `presenter_male` | Male, anchor-style |
| `clever_boy` | Male, youthful |

Formats: `mp3` (default), `pcm`, `flac`, `wav`. Override `bitrate`, `audio_sample_rate`, and `channel` at the top level if needed.

Rate limit: the Plus plan allows roughly 1–5 RPM (very strict). MiniMax error 1002 maps to HTTP 429 (`ClassRateLimit` in the failover taxonomy). Plan throughput accordingly.
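Given such a strict limit, wrapping speech calls in a retry helper is worth the few lines. The retry schedule below is an assumption, not a documented MiniMax contract, and the `RuntimeError` check is a stand-in for whatever 429 error type your HTTP client raises:

```python
# Hedged backoff wrapper for rate-limited TTS calls (assumed schedule:
# 15s, 30s, 60s between attempts by default).
import time


def with_backoff(call, max_attempts: int = 4, base_delay: float = 15.0):
    """Retry `call()` when it raises an error mentioning HTTP 429."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError as exc:  # substitute your client's 429 error type
            if "429" not in str(exc) or attempt == max_attempts - 1:
                raise  # non-rate-limit error, or out of attempts
            time.sleep(base_delay * (2 ** attempt))
```

Usage: `audio = with_backoff(lambda: make_speech_request(text))`.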
## 🎵 Music & Lyrics Generation

MiniMax-powered music suite — three capabilities under `/v1/music/*`.

### Lyrics (fast, ~2–5s)

```bash
curl http://localhost:18080/v1/music/lyrics \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{"mode":"write_full_song","prompt":"a happy pop song about sunshine"}'
```

Returns `{song_title, style_tags, lyrics}` with the structure tags `[Intro]`, `[Verse]`, `[Chorus]`, `[Bridge]`, `[Outro]`, `[Inst]`.

Modes: `write_full_song` (default) or `edit` (extend existing lyrics).
### Music generation (text-to-music, ~30–90s sync or ~20s TTFB streaming)

Generate a real song from lyrics. Plan daily quota: 100 songs. Two modes:

Sync (default): blocks until complete, returns JSON with base64 audio.

```bash
curl http://localhost:18080/v1/music/generations \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax:music-2.6",
    "lyrics": "[Verse]\nCode flows through the night\n[Chorus]\nDebugging makes it right"
  }' | jq -r .data.audio | base64 -d > song.mp3
```

Returns `{data: {audio: <base64>, format, size_bytes, duration_ms, sample_rate, channels, bitrate}, model, trace_id}`. The adapter decodes MiniMax's hex-encoded audio server-side and hands base64 to clients (the same convention as OpenAI's `b64_json`).

Streaming (`stream: true`): returns raw `audio/mpeg` bytes as MiniMax generates them — first bytes in ~20s, and ~50% less data on the wire (the adapter drops the duplicated terminal frame MiniMax emits).

```bash
curl -N http://localhost:18080/v1/music/generations \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{"model":"minimax:music-2.6","stream":true,"lyrics":"[Verse]\nHello world"}' > song.mp3
```

Use streaming when UX matters (web player, agent progress signal). Use sync when you need the metadata block (`duration_ms`, `sample_rate`, `bitrate`) upfront — those fields are logged server-side during streaming but not exposed to the client.
### Music cover (reference-audio style transfer)

Generate a cover of an existing track in a different style. Same endpoint, different model.

```bash
curl http://localhost:18080/v1/music/generations \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "minimax:music-cover",
    "prompt": "upbeat jazz cover with saxophone solo",
    "audio_url": "https://example.com/reference.mp3"
  }' | jq -r .data.audio | base64 -d > cover.mp3
```

Reference audio must be 6s–6min and ≤50MB; formats: mp3/wav/flac. Use `audio_url` OR `audio_base64` (mutually exclusive).

Typical agent workflow: generate lyrics first (`/v1/music/lyrics`), then synthesize music from them (`/v1/music/generations`). Both calls share the same MiniMax daily quota bucket (100/day each).
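The lyrics-then-music workflow chains naturally in a few lines of stdlib Python. The request and response shapes follow the curl examples in this section; the function names and the output filename are illustrative:

```python
# Sketch of the two-step workflow: /v1/music/lyrics -> /v1/music/generations.
import base64
import json
import urllib.request

BASE = "http://localhost:18080/v1"
HEADERS = {"Content-Type": "application/json",
           "Authorization": "Bearer sk-test-123"}


def _post(path: str, body: dict) -> dict:
    req = urllib.request.Request(f"{BASE}{path}",
                                 data=json.dumps(body).encode(),
                                 headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def decode_audio(b64_audio: str) -> bytes:
    """Sync responses carry base64 audio; decode before writing to disk."""
    return base64.b64decode(b64_audio)


def make_song(theme: str, out_path: str = "song.mp3") -> dict:
    lyrics = _post("/music/lyrics",
                   {"mode": "write_full_song", "prompt": theme})
    song = _post("/music/generations",
                 {"model": "minimax:music-2.6", "lyrics": lyrics["lyrics"]})
    with open(out_path, "wb") as f:
        f.write(decode_audio(song["data"]["audio"]))
    return song["data"]  # duration_ms, sample_rate, etc. (sync mode)
```

Remember that each of the two calls consumes MiniMax daily quota, so batch or cache lyrics when iterating on a song.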
## 🎤 Audio Transcription (ASR)

Transcribe audio to text via `/v1/audio/transcriptions` using Groq-hosted Whisper:

```bash
curl http://localhost:18080/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-test-123" \
  -F model=whisper-large-v3 \
  -F file=@audio.mp3
```

Returns `{text: "..."}`. Supports `whisper-large-v3` (accurate) and `whisper-large-v3-turbo` (fast).
## 🌐 Web Search (Built-in Tool)

MiniMax M2.7 supports live web search as a native chat tool. Enable it via the `tools` field:

```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax:MiniMax-M2.7",
    "messages": [{"role":"user","content":"What happened today in tech news?"}],
    "max_tokens": 2000,
    "tools": [{"type": "web_search"}]
  }'
```

Critical: set `max_tokens >= 2000`. Web search inflates the prompt context to 6k–13k tokens (search results are folded in), so low budgets produce empty responses.

Response shape: the model does the search internally and returns the synthesized answer in `choices[0].message.content`. There are no `tool_calls` for the client to execute — MiniMax handles everything server-side.
## ail CLI Reference

```bash
ail start      # Start the local server
ail stop       # Stop the local server
ail restart    # Restart
ail status     # Check status of server and bridge
ail logs -f    # Follow server logs in real time
ail update     # Pull latest + rebuild
```
## 🌲 Decision Tree

```text
What do you need?
├─ FREE + Powerful + Files
│  └─ CLI Providers (geminicli:, claudecli:)
├─ FREE + Private + Fast
│  └─ Local Ollama (ollama:llama3.2)
├─ Ultra-Fast Production
│  └─ Cloud Provider (switchai:switchai-fast)
└─ I don't know, you pick
   └─ Intelligent Routing (auto)
```
## 🗺️ Skill References
| File | Description |
|---|---|
| SKILL.md (this file) | Core workflow and quick reference |
| docs-site/ | Comprehensive Manual: AI agents should crawl this directory for complete CLI, API, architecture, and SDK docs. |
| references/routing.md | Auto-routing config, intent matrix, Lab |
| references/management-api.md | Full Management & Telemetry API |
| references/examples.md | Real-world agentic use cases |
| references/multimodal.md | Vision and image processing |
| references/steering.md | Conditional routing rules |
| references/hooks.md | Automation and event hooks |
| references/memory.md | Analytics and history |
For full human-readable documentation: ail.traylinx.com
## 🛠️ Troubleshooting

| Problem | Fix |
|---|---|
| Connection error | `ail status` → `ail start` if not running |
| Model not found | Ensure you used the `provider:` prefix |
| 401 Unauthorized | Check the API key in config.yaml |
| 403 Access Denied | Likely a WAF block; the proxy auto-retries |
| `auth_unavailable` | `ail restart` |
| No models listed | Check `ail logs -f` for provider errors |
## Cross-Cutting Rules (ALL AGENTS MUST FOLLOW)

- ALWAYS use the `provider:model` format — never bare model names.
- Prefer CLI Providers — they are free and support file attachments.
- Use `auto` for simple tasks — let the Cortex Router pick the best model.
- Use `ollama:` for privacy — local models never send data externally.
- Check `/v1/models` before routing — verify the model exists.
- Handle errors gracefully — 503 = provider down; use the fallback chain.
Route wisely. Save tokens. Use CLI. 🚀