---
name: switchailocal
description: >-
  Unified LLM proxy for AI agents. Route all model requests through
  http://localhost:18080/v1. Provides FREE access to Gemini CLI, Claude CLI,
  Codex, and Vibe via your existing subscriptions. Includes Intelligent
  Auto-Routing (Cortex) for autonomous model selection, a self-optimizing Lab,
  token conservation, and a management dashboard. Use when: (1) making LLM
  calls, (2) switching between CLI/Local/Cloud providers, (3) attaching local
  files to prompts, (4) needing intelligent routing, or (5) auto-setting up
  the best available models.
metadata:
  openclaw:
    emoji: "🔀"
    requires:
      bins: ["ail", "go"]
    install:
      - id: switchailocal-setup
        kind: bash
        script: ./ail.sh setup
        label: Run unified switchAILocal setup (deps, build, service)
---

switchAILocal Skill

Unified LLM proxy and intelligent router for AI agents. Always use http://localhost:18080/v1 as your base URL.

The killer feature: Use your paid CLI subscriptions (Gemini Pro, Claude Pro, etc.) via the API — it's FREE because you already pay for the subscription!

When to Use

USE this skill when:

  • An agent needs to make an LLM call to Gemini, Claude, Ollama, OpenAI, etc.
  • You need to attach entire file directories to an LLM context (via geminicli and claudecli).
  • You need to start, stop, or manage the local switchAILocal server/bridge.
  • You want to monitor provider health, analytics, or status.
  • You want to auto-detect available models and configure optimal routing.

DON'T use this skill when:

  • The user asks about their calendar, emails, or non-LLM operations.
  • Interacting with APIs that aren't OpenAI-compatible.

🤖 Agent Auto-Setup (Zero-Config Bootstrap)

Follow these steps IN ORDER to bootstrap switchAILocal from scratch on a user's machine.

Step 1 — Install & Start

Option A (npx — recommended, no git clone needed):

npx @traylinx/switchailocal

Option B (from source):

git clone https://github.com/traylinx/switchAILocal.git
cd switchAILocal
./ail.sh setup
source ~/.zshrc 2>/dev/null || source ~/.bashrc 2>/dev/null  # Reload PATH

Step 2 — Detect Available Providers

Run these commands to discover what the user has installed:

# CLI Providers (FREE!)
which gemini  && echo "✅ Gemini CLI found"
which claude  && echo "✅ Claude CLI found"
which codex   && echo "✅ Codex CLI found"

# Local Providers
curl -s http://localhost:11434/api/tags 2>/dev/null && echo "✅ Ollama running"

# Cloud API Keys (check environment)
[ -n "$OPENAI_API_KEY" ]    && echo "✅ OpenAI key found"
[ -n "$ANTHROPIC_API_KEY" ] && echo "✅ Anthropic key found"
[ -n "$GEMINI_API_KEY" ]    && echo "✅ Google AI key found"

Step 3 — Generate config.yaml

Based on detected providers, generate a minimal config:

host: ""
port: 18080

# Enable any detected CLI providers:
# geminicli: (uses `gemini` CLI — FREE with Google AI Premium)
# claudecli: (uses `claude` CLI — FREE with Claude Pro)
# codex: (uses `codex` CLI — FREE with OpenAI Plus)

# Enable Ollama if detected:
ollama:
  enabled: true
  base-url: "http://localhost:11434"
  auto-discover: true

# Enable Intelligent Auto-Routing:
auto-routing:
  enabled: true
  weights:
    availability: 0.35
    quota: 0.25
    latency: 0.2
    success-rate: 0.2
  discovery:
    enabled: true
    probe-on-startup: true
  conservation:
    enabled: true
    simple-threshold-tokens: 500
  lab:
    enabled: true
    adaptation-interval: 24h
    max-weight-drift: 0.1

Step 4 — Start & Verify

ail start
# Verify it's running:
curl -s http://localhost:18080/v1/models | head -c 200

You should see a JSON response listing all available models/providers.


⚠️ Critical: Model Format

NEVER use bare model names. Format is ALWAYS provider: or provider:model.

| ❌ Wrong | ✅ Correct | Why |
|---|---|---|
| `gemini-2.5-pro` | `geminicli:gemini-2.5-pro` | Needs provider prefix |
| `claude-3-5-sonnet` | `claudecli:` | `claudecli:` uses the default model |
| `llama3` | `ollama:llama3` | Needs provider prefix |
| `auto route me` | `auto` or `auto:coding` | Use the `auto` prefix only |
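As a sanity check, an agent can validate model strings before sending a request. This is a hypothetical client-side helper, not part of switchAILocal; the prefix set is collected from the provider tables and examples in this file:

```python
# Hypothetical guard: reject bare model names before a request is sent.
# KNOWN_PREFIXES is assembled from the provider tables in this document.
KNOWN_PREFIXES = {"geminicli", "claudecli", "codex", "vibe",
                  "ollama", "switchai", "minimax", "auto"}

def is_valid_model(model: str) -> bool:
    """True for auto, auto:<intent>, provider:, or provider:model forms."""
    if model == "auto" or model.startswith("auto:"):
        return True
    prefix, sep, _rest = model.partition(":")
    return bool(sep) and prefix in KNOWN_PREFIXES
```

A bare name like `llama3` fails the check; `ollama:llama3` passes.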

🏗️ Provider Reference

1. CLI Providers (FREE!)

Uses your human's CLI subscriptions. Best for agents.

| Prefix | CLI | Subscription Required |
|---|---|---|
| `geminicli:` | `gemini` | Google AI Premium/Pro |
| `claudecli:` | `claude` | Claude Pro/Max |
| `codex:` | `codex` | OpenAI Plus |
| `vibe:` | `vibe` | Mistral Le Chat |

2. Local & Cloud

| Prefix | Source | Cost |
|---|---|---|
| `ollama:` | Local Ollama | FREE |
| `auto` | Cortex Router | FREE (auto-selects) |
| `switchai:` | Traylinx Cloud | Per-token |

3. switchAI Cloud Aliases

| Alias | Upstream Model | Best For |
|---|---|---|
| `switchai-fast` | openai/gpt-oss-20b | Fast tasks |
| `switchai-chat` | openai/gpt-oss-20b | Conversation |
| `switchai-reasoner` | deepseek-reasoner | Deep thinking |

🧠 Intelligent Auto-Routing (Cortex)

When the model is auto or auto:<intent>, the Cortex Router automatically selects the best available model using a composite scoring algorithm.

Basic Auto-Routing

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "Hello!"}]}'

Intent-Based Routing

# Route to coding-optimized models
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "auto:coding", "messages": [{"role": "user", "content": "Write a Go sorting algorithm"}]}'

Supported intents: coding, reasoning, creative, fast, secure, vision, audio, web_search, research, chat, cli, long_ctx. Specialized slots — image_gen, transcription, speech, embedding — are used by their dedicated endpoints (see sections below), not by auto chat routing.

How Scoring Works

Each model is scored: FinalScore = (W_a×Availability + W_q×Quota + W_l×Latency + W_s×SuccessRate + TierBoost + PreferenceBoost) × ConservationMultiplier

The model with the highest score wins. The Lab continuously optimizes the weights.
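As a rough sketch, the composite score can be computed like this. Default weights come from the config.yaml example above; the boost and conservation values are placeholders for Lab-tuned state that lives server-side:

```python
# Sketch of the published scoring formula. Weight keys match the
# auto-routing.weights block in the config example; boosts and the
# conservation multiplier are placeholder inputs.
DEFAULT_WEIGHTS = {"availability": 0.35, "quota": 0.25,
                   "latency": 0.20, "success-rate": 0.20}

def final_score(availability, quota, latency, success_rate,
                weights=DEFAULT_WEIGHTS, tier_boost=0.0,
                preference_boost=0.0, conservation_multiplier=1.0):
    """FinalScore = (W_a*A + W_q*Q + W_l*L + W_s*S + boosts) * conservation."""
    base = (weights["availability"] * availability
            + weights["quota"] * quota
            + weights["latency"] * latency
            + weights["success-rate"] * success_rate)
    return (base + tier_boost + preference_boost) * conservation_multiplier
```

With the default weights (which sum to 1.0), a model that is fully available, under quota, fast, and reliable scores 1.0 before boosts.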

For deep architecture details, see the local docs-site/intelligent-systems/ or the online docs-site.


📊 Management Dashboard & API

Dashboard UI

http://localhost:18080/management

Provides real-time visualization of provider health, auto-routing weights, Lab experiments, and the live routing journal.

Telemetry API

# Get current Lab status + live weights
curl http://localhost:18080/v0/management/autoroute/status

# Get recent routing decisions journal
curl http://localhost:18080/v0/management/autoroute/journal

For the full Management API reference, see references/management-api.md.


🚀 Quick API Usage

curl (simplest)

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "geminicli:", "messages": [{"role": "user", "content": "Hello!"}]}'

Python

from openai import OpenAI
client = OpenAI(base_url="http://localhost:18080/v1", api_key="sk-test-123")
response = client.chat.completions.create(
    model="geminicli:", 
    messages=[{"role": "user", "content": "Hi!"}]
)

Node.js

import OpenAI from 'openai';
const client = new OpenAI({ baseURL: 'http://localhost:18080/v1', apiKey: 'sk-test-123' });
const response = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Hello!' }],
});

CLI Attachments & Flags

Pass local context and control autonomy via CLI extensions:

{
  "model": "geminicli:",
  "messages": [{"role": "user", "content": "Fix this code"}],
  "extra_body": {
    "cli": {
      "attachments": [{"type": "folder", "path": "./src"}],
      "flags": {"auto_approve": true, "yolo": true}
    }
  }
}

Streaming

Add "stream": true to any request for SSE token streaming.


🎨 Image Generation

Generate images via the /v1/images/generations endpoint:

curl --location 'http://localhost:18080/v1/images/generations' \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax:image-01",
    "prompt": "A dog wearing a space suit on Mars, photorealistic",
    "aspect_ratio": "16:9",
    "response_format": "url"
  }'

Parameters:

  • model — Always use minimax:image-01
  • prompt — Text description of the desired image
  • aspect_ratio — one of 1:1, 16:9, 9:16, 4:3, 3:4
  • response_format — url (returns an HTTP URL) or base64 (returns a base64-encoded image)

Example Python:

response = client.images.generate(
    model="minimax:image-01",
    prompt="A serene Japanese garden with cherry blossoms",
    aspect_ratio="16:9",
    response_format="url"
)
image_url = response.data[0].url

🔊 Text-to-Speech

Generate audio from text via /v1/audio/speech. switchAILocal ships a MiniMax T2A adapter that translates the OpenAI-shape request to MiniMax's native /v1/t2a_pro API behind the scenes.

curl http://localhost:18080/v1/audio/speech \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax:speech-02-hd",
    "input": "Hello from switchAILocal",
    "voice": "male-qn-qingse",
    "response_format": "mp3"
  }' --output hello.mp3

Voice IDs — use MiniMax-native, NOT OpenAI voice names:

| Voice ID | Description |
|---|---|
| `male-qn-qingse` | Male, calm |
| `female-shaonv` | Female, young |
| `audiobook_male_2` | Male, narrator |
| `presenter_male` | Male, anchor-style |
| `clever_boy` | Male, youthful |

Formats: mp3 (default), pcm, flac, wav. Override bitrate, audio_sample_rate, channel at the top level if needed.

Rate limit: Plus plan = ~1–5 RPM (very strict). MiniMax error 1002 → HTTP 429 → ClassRateLimit in failover taxonomy. Plan throughput accordingly.
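One way to respect that limit client-side is a backoff wrapper. The retry schedule below is an assumption sized for ~1–5 RPM, not a documented server policy:

```python
import time

def with_rate_limit_retry(send, delays=(15, 30, 60)):
    """Call send(); on a 429-style error, sleep and retry with backoff.
    The 15s/30s/60s schedule is an assumption, tuned for very low RPM."""
    for delay in delays:
        try:
            return send()
        except RuntimeError as err:  # stand-in for an HTTP 429 from the proxy
            if "429" not in str(err):
                raise
            time.sleep(delay)
    return send()  # final attempt; any error propagates to the caller
```

Here `send` would be any zero-argument callable that performs the `/v1/audio/speech` request and raises on a 429.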


🎵 Music & Lyrics Generation

MiniMax-powered music suite — three capabilities under /v1/music/*.

Lyrics (fast, ~2-5s)

curl http://localhost:18080/v1/music/lyrics \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{"mode":"write_full_song","prompt":"a happy pop song about sunshine"}'

Returns {song_title, style_tags, lyrics} with structure tags [Intro], [Verse], [Chorus], [Bridge], [Outro], [Inst].

Modes: write_full_song (default) or edit (extend existing lyrics).

Music generation (text-to-music, ~30-90s sync or ~20s TTFB streaming)

Generate a real song from lyrics. Plan daily quota: 100 songs. Two modes:

Sync (default): blocks until complete, returns JSON with base64 audio.

curl http://localhost:18080/v1/music/generations \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax:music-2.6",
    "lyrics": "[Verse]\nCode flows through the night\n[Chorus]\nDebugging makes it right"
  }' | jq -r .data.audio | base64 -d > song.mp3

Returns {data: {audio: <base64>, format, size_bytes, duration_ms, sample_rate, channels, bitrate}, model, trace_id}. The adapter decodes MiniMax's hex-encoded audio server-side and hands base64 to clients (same convention as OpenAI b64_json).
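A sketch of consuming the sync response in Python; field names follow the response shape above, and `save_song` is a hypothetical helper:

```python
import base64

def save_song(response_json: dict, path: str) -> int:
    """Decode the base64 data.audio field and write it; returns bytes written."""
    audio = base64.b64decode(response_json["data"]["audio"])
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```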

Streaming (stream: true): returns raw audio/mpeg bytes as MiniMax generates them — first bytes arrive in ~20s, with roughly 50% less data on the wire (the adapter drops the duplicated terminal frame MiniMax emits).

curl -N http://localhost:18080/v1/music/generations \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{"model":"minimax:music-2.6","stream":true,"lyrics":"[Verse]\nHello world"}' > song.mp3

Use streaming when UX matters (web player, agent progress signal). Use sync when you need the metadata block (duration_ms, sample_rate, bitrate) upfront — those fields are logged server-side during streaming but not exposed to the client.

Music cover (reference-audio style transfer)

Generate a cover of an existing track in a different style. Same endpoint, different model.

curl http://localhost:18080/v1/music/generations \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "minimax:music-cover",
    "prompt": "upbeat jazz cover with saxophone solo",
    "audio_url": "https://example.com/reference.mp3"
  }' | jq -r .data.audio | base64 -d > cover.mp3

Reference audio must be 6s–6min, ≤50MB, formats: mp3/wav/flac. Use audio_url OR audio_base64 (mutually exclusive).

Typical agent workflow: generate lyrics first (/v1/music/lyrics), then synthesize music from them (/v1/music/generations). Both calls share the same MiniMax daily quota bucket (100/day each).
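The two-step workflow can be sketched in Python. Request bodies mirror the curl examples above; `lyrics_then_song` is a hypothetical helper name:

```python
import json
import urllib.request

BASE = "http://localhost:18080"
HEADERS = {"Content-Type": "application/json",
           "Authorization": "Bearer sk-test-123"}

def lyrics_payload(prompt: str) -> dict:
    """Body for /v1/music/lyrics (mirrors the curl example above)."""
    return {"mode": "write_full_song", "prompt": prompt}

def song_payload(lyrics: str) -> dict:
    """Body for /v1/music/generations in sync mode."""
    return {"model": "minimax:music-2.6", "lyrics": lyrics}

def _post(path: str, body: dict) -> dict:
    req = urllib.request.Request(BASE + path,
                                 data=json.dumps(body).encode(),
                                 headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def lyrics_then_song(prompt: str) -> dict:
    """Step 1: write lyrics. Step 2: synthesize a song from them."""
    lyrics = _post("/v1/music/lyrics", lyrics_payload(prompt))["lyrics"]
    return _post("/v1/music/generations", song_payload(lyrics))
```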


🎤 Audio Transcription (ASR)

Transcribe audio to text via /v1/audio/transcriptions using Groq-hosted Whisper:

curl http://localhost:18080/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-test-123" \
  -F model=whisper-large-v3 \
  -F file=@audio.mp3

Returns {text: "..."}. Supports whisper-large-v3 (accurate) and whisper-large-v3-turbo (fast).


🌐 Web Search (Built-in Tool)

MiniMax M2.7 supports live web search as a native chat tool. Enable via the tools field:

curl http://localhost:18080/v1/chat/completions \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax:MiniMax-M2.7",
    "messages": [{"role":"user","content":"What happened today in tech news?"}],
    "max_tokens": 2000,
    "tools": [{"type": "web_search"}]
  }'

Critical: set max_tokens >= 2000. Web search inflates prompt context to 6k–13k tokens (search results are folded in), so low budgets produce empty responses.

Response shape: the model does the search internally and returns the synthesized answer in choices[0].message.content. There are no tool_calls for the client to execute — MiniMax handles everything server-side.
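A small request builder that enforces the max_tokens floor client-side (a hypothetical guard, not a server-imposed rule):

```python
def web_search_payload(question: str, max_tokens: int = 2000) -> dict:
    """Body for a MiniMax web_search chat request; enforces the 2000-token floor."""
    if max_tokens < 2000:
        raise ValueError("web_search needs max_tokens >= 2000: search results "
                         "inflate the prompt context to 6k-13k tokens")
    return {
        "model": "minimax:MiniMax-M2.7",
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
        "tools": [{"type": "web_search"}],
    }
```

POST the returned dict to /v1/chat/completions and read the synthesized answer from choices[0].message.content.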


ail CLI Reference

ail start      # Start the local server
ail stop       # Stop the local server
ail restart    # Restart
ail status     # Check status of server and bridge
ail logs -f    # Follow server logs in real-time
ail update     # Pull latest + rebuild

🌲 Decision Tree

What do you need?
├─ FREE + Powerful + Files
│   └─ CLI Providers (geminicli:, claudecli:)
├─ FREE + Private + Fast
│   └─ Local Ollama (ollama:llama3.2)
├─ Ultra-Fast Production
│   └─ Cloud Provider (switchai:switchai-fast)
└─ I don't know, you pick
    └─ Intelligent Routing (auto)

🗺️ Skill References

| File | Description |
|---|---|
| SKILL.md (this file) | Core workflow and quick reference |
| docs-site/ | Comprehensive manual: AI agents should crawl this directory for complete CLI, API, architecture, and SDK docs |
| references/routing.md | Auto-routing config, intent matrix, Lab |
| references/management-api.md | Full Management & Telemetry API |
| references/examples.md | Real-world agentic use cases |
| references/multimodal.md | Vision and image processing |
| references/steering.md | Conditional routing rules |
| references/hooks.md | Automation and event hooks |
| references/memory.md | Analytics and history |

For full human-readable documentation: ail.traylinx.com


🛠️ Troubleshooting

| Problem | Fix |
|---|---|
| Connection error | `ail status`, then `ail start` if not running |
| Model not found | Ensure you used the `provider:` prefix |
| 401 Unauthorized | Check the API key in config.yaml |
| 403 Access Denied | Likely a WAF block; the proxy auto-retries |
| `auth_unavailable` | `ail restart` |
| No models listed | Check `ail logs -f` for provider errors |

Cross-Cutting Rules (ALL AGENTS MUST FOLLOW)

  1. ALWAYS use provider:model format — never bare model names.
  2. Prefer CLI Providers — they are free and support file attachments.
  3. Use auto for simple tasks — let the Cortex Router pick the best model.
  4. Use ollama: for privacy — local models never send data externally.
  5. Check /v1/models before routing — verify the model exists.
  6. Handle errors gracefully — 503 = provider down, use fallback chain.
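Rule 6 can be sketched as a simple fallback walker, where `send` is any callable that raises when a provider returns 503 (names here are illustrative, not part of switchAILocal):

```python
class ProviderDown(RuntimeError):
    """Stand-in for an HTTP 503 (provider down) from the proxy."""

def call_with_fallback(models, send):
    """Try each model in order; return the first success, else re-raise."""
    last_err = None
    for model in models:
        try:
            return send(model)
        except ProviderDown as err:
            last_err = err
    raise last_err
```

For example, `call_with_fallback(["geminicli:", "claudecli:", "ollama:llama3"], send)` walks the free providers before giving up.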

Route wisely. Save tokens. Use CLI. 🚀