browser-agent-server

niveshdandyan/happycapy-browser-agent

Deploy a full-stack multi-model Browser Agent system with FastAPI server, real-time dashboard, VNC streaming, and LLM Council mode. Use when the user asks to set up browser automation, build a browser agent, deploy an AI web agent, create a browser-use server, or needs multi-model browser automation with strategies like council, consensus, fallback chain, or planner-executor.


SKILL.md

name: browser-agent-server
description: Deploy a full-stack multi-model Browser Agent system with FastAPI server, real-time dashboard, VNC streaming, and LLM Council mode. Use when the user asks to set up browser automation, build a browser agent, deploy an AI web agent, create a browser-use server, or needs multi-model browser automation with strategies like council, consensus, fallback chain, or planner-executor.

Browser Agent Server

Full-stack AI browser automation system: FastAPI backend + real-time dashboard + Xvfb/VNC display + multi-model LLM strategies.

Architecture

agent_server.py  (FastAPI + WebSocket + browser-use 0.11.9)
dashboard.html   (single-file SPA: dark UI, live logs, VNC embed, model picker)
start.sh         (startup script with prerequisite checks)

5 Strategies: single, fallback_chain, planner_executor, consensus (per-step judge), council (multi-model failure recovery with loop detection)

Display: Xvfb :98 (configurable) -> x11vnc :5999 -> noVNC/websockify :6080

Live Screen: The dashboard uses WebSocket-streamed screenshots (0.5s intervals) as the primary embedded view. VNC is available via pop-out for interactive control. The server manages its own Xvfb, x11vnc, and noVNC processes automatically on startup.

Setup

1. Install system dependencies

sudo apt-get update && sudo apt-get install -y xvfb x11vnc x11-apps imagemagick novnc

2. Create Python venv and install packages

python3 -m venv /home/node/browser-agent-venv
/home/node/browser-agent-venv/bin/pip install browser-use==0.11.9 fastapi "uvicorn[standard]" websockets websockify
/home/node/browser-agent-venv/bin/python3 -m playwright install chromium

IMPORTANT: websockify must be installed in the venv (or available system-wide). The server auto-detects it from the venv's bin/ directory first, then falls back to system PATH.
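The lookup order described above (venv bin/ first, then system PATH) can be sketched in a few lines. This is a minimal illustration, not the code from agent_server.py; the function name `find_websockify` and its `venv_dir` parameter are hypothetical.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


def find_websockify(venv_dir: Optional[str] = None) -> Optional[str]:
    """Return a path to websockify, preferring the venv's bin/ directory.

    Hypothetical helper: the real server may resolve the path differently.
    """
    # sys.prefix points at the active venv when the server runs inside one.
    candidate = Path(venv_dir or sys.prefix) / "bin" / "websockify"
    if candidate.is_file():
        return str(candidate)
    # Fall back to the system PATH; returns None if websockify is not installed.
    return shutil.which("websockify")
```

A `None` result corresponds to the "websockify not found" case covered under Troubleshooting.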

2b. CRITICAL: Install Chromium shared library dependencies

Without this step, Chromium will fail with libatk-1.0.so.0: cannot open shared object file or similar errors, causing a 30-second timeout on browser launch.

/home/node/browser-agent-venv/bin/python3 -m playwright install-deps chromium

This installs ~40 system libraries (libatk, libasound, libxkbcommon, fonts, etc.) that Chromium requires at runtime. It is a separate step from playwright install chromium, which only downloads the browser binary.

2c. Fix broken venv symlinks (if needed)

If the venv's python3 symlink is broken (e.g., after system upgrades), fix it:

ln -sf /usr/bin/python3 /home/node/browser-agent-venv/bin/python3

3. Deploy application files

Copy bundled scripts to a project directory:

DEST="./outputs/browser-agent"
mkdir -p "$DEST"
cp ~/.claude/skills/happycapy-browser-agent/scripts/agent_server.py "$DEST/"
cp ~/.claude/skills/happycapy-browser-agent/scripts/dashboard.html "$DEST/"
cp ~/.claude/skills/happycapy-browser-agent/scripts/start.sh "$DEST/"
chmod +x "$DEST/start.sh"

4. Configure environment

# Required: LLM API key (OpenAI-compatible gateway)
export AI_GATEWAY_API_KEY="your-key"

# Optional: custom port (default 8888)
export AGENT_PORT=8888

# Optional: display number (default 98, avoids conflict with system Xvfb on :99)
export DISPLAY_NUM=98

# Optional: virtual display resolution (default 1280x1024)
export SCREEN_WIDTH=1280
export SCREEN_HEIGHT=1024

# REQUIRED for sandbox environments: set the public noVNC URL for dashboard VNC pop-out
# Replace with the actual exported URL from step 6
export NOVNC_PUBLIC_URL="https://YOUR-NOVNC-URL/vnc.html?host=YOUR-HOST&port=443&encrypt=1&autoconnect=true&resize=scale&scaleViewport=true"
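On the Python side, these variables and their documented defaults could be read roughly as follows. This is a sketch under assumptions: `AgentConfig` and `load_config` are illustrative names, and the actual parsing in agent_server.py may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container for the settings listed above."""
    api_key: str
    port: int = 8888
    display_num: int = 98
    screen_width: int = 1280
    screen_height: int = 1024
    novnc_public_url: Optional[str] = None

    @property
    def display(self) -> str:
        # X11 display string, e.g. ":98"
        return f":{self.display_num}"


def load_config(env: Mapping[str, str] = os.environ) -> AgentConfig:
    """Read settings from the environment, applying the documented defaults."""
    api_key = env.get("AI_GATEWAY_API_KEY", "")
    if not api_key:
        raise RuntimeError("AI_GATEWAY_API_KEY is required")
    return AgentConfig(
        api_key=api_key,
        port=int(env.get("AGENT_PORT", "8888")),
        display_num=int(env.get("DISPLAY_NUM", "98")),
        screen_width=int(env.get("SCREEN_WIDTH", "1280")),
        screen_height=int(env.get("SCREEN_HEIGHT", "1024")),
        novnc_public_url=env.get("NOVNC_PUBLIC_URL") or None,
    )
```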

5. Start

cd "$DEST"
/home/node/browser-agent-venv/bin/python3 agent_server.py

The server automatically starts Xvfb, x11vnc, and noVNC. If an Xvfb is already running on the target display, it reuses it instead of failing.

6. Export ports (sandbox environments)

/app/export-port.sh "${AGENT_PORT:-8888}"   # Dashboard (default 8888)
/app/export-port.sh 6080          # noVNC (for VNC pop-out, set NOVNC_PUBLIC_URL with exported URL)

Note: Port 3001 is reserved. Do not use it. If port 8888 is already in use, set AGENT_PORT to another value (e.g., 9222).
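Once the noVNC port is exported, the NOVNC_PUBLIC_URL value can be assembled from the exported hostname. The sketch below mirrors the query parameters of the template in step 4; the function name `build_novnc_url` is hypothetical, and it assumes the exported URL is served over HTTPS on port 443 and that the page host and the `host` parameter are the same unless stated otherwise.

```python
from typing import Optional
from urllib.parse import urlencode


def build_novnc_url(exported_host: str, ws_host: Optional[str] = None) -> str:
    """Build a noVNC pop-out URL for an exported host (illustrative helper)."""
    params = {
        "host": ws_host or exported_host,  # where noVNC opens its WebSocket
        "port": 443,
        "encrypt": 1,
        "autoconnect": "true",
        "resize": "scale",
        "scaleViewport": "true",
    }
    return f"https://{exported_host}/vnc.html?{urlencode(params)}"
```

The result is what would be exported as NOVNC_PUBLIC_URL before starting the server.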

API Reference

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Dashboard HTML |
| `/api/models` | GET | Available models + strategies |
| `/api/agent/start` | POST | Start task (JSON body below) |
| `/api/agent/stop` | POST | Stop running task |
| `/api/agent/status` | GET | Current status, action_log, result |
| `/ws` | WebSocket | Real-time updates (step, status, screenshot, judge_verdict, council_verdict) |

Start task body

{
  "task": "Go to google.com and search for AI",
  "max_steps": 50,
  "model_config_data": {
    "strategy": "council",
    "primary_model": "openai/gpt-4o",
    "secondary_model": "",
    "council_members": ["moonshotai/kimi-k2.5", "google/gemini-2.5-flash", "google/gemini-2.5-pro"]
  }
}
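A client could submit this body with the standard library alone. The following is a minimal sketch: `build_start_request` and `start_task` are hypothetical helper names, the base URL assumes the default port 8888 on localhost, and the response is assumed (not confirmed by this document) to be JSON.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_start_request(
    base_url: str,
    task: str,
    strategy: str = "single",
    primary_model: str = "openai/gpt-4o",
    secondary_model: str = "",
    council_members: Optional[List[str]] = None,
    max_steps: int = 50,
) -> urllib.request.Request:
    """Build the POST request for /api/agent/start without sending it."""
    body: Dict[str, Any] = {
        "task": task,
        "max_steps": max_steps,
        "model_config_data": {
            "strategy": strategy,
            "primary_model": primary_model,
            "secondary_model": secondary_model,
            "council_members": council_members or [],
        },
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/agent/start",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_task(base_url: str, task: str, **kwargs: Any) -> Any:
    """Send the request; assumes the server replies with a JSON document."""
    request = build_start_request(base_url, task, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

For example, `start_task("http://localhost:8888", "Go to google.com and search for AI", strategy="council", council_members=["google/gemini-2.5-pro"])` would start a council run against a local server.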

WebSocket start (from dashboard)

{
  "type": "start_task",
  "task": "...",
  "max_steps": 50,
  "model_config": { "strategy": "council", "primary_model": "openai/gpt-4o", "council_members": [...] }
}
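Note that the two entry points name the model settings differently: the REST body uses `model_config_data`, while the WebSocket message uses `model_config` and adds a `type` field. A small adapter, sketched here with the hypothetical name `rest_body_to_ws_message`, makes the mapping explicit.

```python
from typing import Any, Dict


def rest_body_to_ws_message(body: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a /api/agent/start body into a start_task WebSocket message."""
    return {
        "type": "start_task",
        "task": body["task"],
        "max_steps": body["max_steps"],
        # Same settings, different key name on the WebSocket path.
        "model_config": dict(body["model_config_data"]),
    }
```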

Strategy Details

| Strategy | How it works | When to use |
|---|---|---|
| `single` | One model, all steps | Simple tasks, cost-sensitive |
| `fallback_chain` | Primary runs; switches to secondary on error/rate-limit | Reliability |
| `planner_executor` | Strong model plans first; fast model executes | Complex multi-step |
| `consensus` | Primary acts; judge model validates every step in real-time | Quality-critical |
| `council` | Primary runs; on repeated failure/loop/stall, ALL council models convene to diagnose, advise, and replan | Hard tasks, anti-stall |
Council Mode Details

Failure trigger: consecutive_failures >= 2
Loop trigger (3-tier): Strict fingerprint match (3 repeats), loose action-type match (4 repeats), same-URL stall with no progress (5 repeats)
Stall trigger: Single step running > 60 seconds
Feedback injection: Council verdict injected via ActionResult.long_term_memory (agent sees it next step)
Replan: Council can replace agent.state.plan with revised steps
Cooldown: 3 steps between loop-triggered councils to prevent meta-loops
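The three loop tiers and the cooldown can be sketched as a small detector. This is one possible reading of the thresholds above, not the code in agent_server.py: the `Step` fields, the `progressed` flag used to express "no progress", and the exact cooldown counting are all assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Hashable, Optional


@dataclass(frozen=True)
class Step:
    action_type: str          # e.g. "click"
    detail: str               # e.g. element index or typed text
    url: str
    progressed: bool = False  # hypothetical: did the page state change?


class LoopDetector:
    STRICT_REPEATS = 3   # identical fingerprint
    LOOSE_REPEATS = 4    # same action type
    STALL_REPEATS = 5    # same URL, no progress
    COOLDOWN_STEPS = 3   # steps skipped after a loop-triggered council

    def __init__(self) -> None:
        self._history: Deque[Step] = deque(maxlen=self.STALL_REPEATS)
        self._since_council: Optional[int] = None

    def _tail_same(self, n: int, key: Callable[[Step], Hashable]) -> bool:
        """True if the last n steps all share the same key."""
        if len(self._history) < n:
            return False
        return len({key(s) for s in list(self._history)[-n:]}) == 1

    def observe(self, step: Step) -> Optional[str]:
        """Record a step; return the tier that fired, or None."""
        self._history.append(step)
        if self._since_council is not None:
            self._since_council += 1
            if self._since_council <= self.COOLDOWN_STEPS:
                return None  # still cooling down
        if self._tail_same(self.STRICT_REPEATS, lambda s: (s.action_type, s.detail, s.url)):
            reason = "strict"
        elif self._tail_same(self.LOOSE_REPEATS, lambda s: s.action_type):
            reason = "loose"
        elif self._tail_same(self.STALL_REPEATS, lambda s: s.url) and not any(
            s.progressed for s in self._history
        ):
            reason = "stall"
        else:
            return None
        self._since_council = 0
        return reason
```

A caller would invoke `observe` once per agent step and convene the council whenever it returns a tier name.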

Available Models (AI Gateway)

Configure in AVAILABLE_MODELS list in agent_server.py:

AVAILABLE_MODELS = [
    {"id": "openai/gpt-4o", "name": "GPT-4o", "tier": "fast", "vision": True},
    {"id": "moonshotai/kimi-k2.5", "name": "Kimi K2.5", "tier": "fast", "vision": True},
    {"id": "google/gemini-2.5-flash", "name": "Gemini 2.5 Flash", "tier": "fast", "vision": True},
    {"id": "google/gemini-2.5-pro", "name": "Gemini 2.5 Pro", "tier": "reasoning", "vision": True},
]

To add a model, append an entry to this list. It then appears in the dashboard dropdown and can be selected as a council member.
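Because strategies reference models by `id`, it can help to validate a request against this list before starting a task. The helpers below are an illustrative sketch; `unknown_models` and `models_by_tier` are hypothetical names, and the list is copied from the snippet above.

```python
from typing import Any, Dict, Iterable, List

AVAILABLE_MODELS: List[Dict[str, Any]] = [
    {"id": "openai/gpt-4o", "name": "GPT-4o", "tier": "fast", "vision": True},
    {"id": "moonshotai/kimi-k2.5", "name": "Kimi K2.5", "tier": "fast", "vision": True},
    {"id": "google/gemini-2.5-flash", "name": "Gemini 2.5 Flash", "tier": "fast", "vision": True},
    {"id": "google/gemini-2.5-pro", "name": "Gemini 2.5 Pro", "tier": "reasoning", "vision": True},
]


def unknown_models(requested: Iterable[str]) -> List[str]:
    """Return requested model ids that are not in AVAILABLE_MODELS."""
    known = {m["id"] for m in AVAILABLE_MODELS}
    return [model_id for model_id in requested if model_id not in known]


def models_by_tier(tier: str) -> List[str]:
    """Return model ids for a tier, e.g. 'reasoning' for a planner role."""
    return [m["id"] for m in AVAILABLE_MODELS if m["tier"] == tier]
```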

Troubleshooting

Browser launch timeout (`BrowserStartEvent timed out after 30.0s`)

Chromium is missing shared libraries. Fix:

/home/node/browser-agent-venv/bin/python3 -m playwright install-deps chromium

This installs libatk, libasound, libxkbcommon, fonts, etc. Must run after playwright install chromium.

Verify Chromium works

DISPLAY=:98 /home/node/.cache/ms-playwright/chromium-*/chrome-linux64/chrome --version

If it prints a version, it's working. If it errors with cannot open shared object file, run install-deps above.
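The same check can be scripted. This sketch locates Chromium binaries under the Playwright cache and asks one for its version; the helper names are hypothetical, and the `chrome-linux*` glob is an assumption to tolerate directory-name differences between Playwright releases.

```python
import os
import subprocess
from pathlib import Path
from typing import List, Optional


def find_chromium_binaries(cache_dir: Optional[str] = None) -> List[Path]:
    """List Chromium executables found in the Playwright browser cache."""
    root = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "ms-playwright"
    return sorted(p for p in root.glob("chromium-*/chrome-linux*/chrome") if p.is_file())


def chromium_version(binary: Path, display: str = ":98") -> str:
    """Run `chrome --version`; raise with stderr if it fails to start."""
    env = {**os.environ, "DISPLAY": display}
    result = subprocess.run(
        [str(binary), "--version"],
        capture_output=True, text=True, env=env, timeout=30,
    )
    if result.returncode != 0:
        # A missing shared library typically shows up here in stderr.
        raise RuntimeError(result.stderr.strip() or "chromium exited with an error")
    return result.stdout.strip()
```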

Xvfb lock file error (`Server is already active for display :98`)

The server now auto-detects and reuses existing Xvfb processes. If you still get lock file errors and no Xvfb process is running on that display, remove the stale lock file:

rm -f /tmp/.X98-lock

To use a different display number:

export DISPLAY_NUM=97   # or any unused display number
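Before deleting a lock file, it is worth checking whether the process that owns it is still alive. The sketch below assumes a Linux host and that the lock file holds the owning PID as ASCII text, which is how Xorg-family servers write it; the function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def lock_owner_pid(display_num: int, tmp_dir: str = "/tmp") -> Optional[int]:
    """Read the PID stored in /tmp/.X<N>-lock, or None if absent or unreadable."""
    lock = Path(tmp_dir) / f".X{display_num}-lock"
    try:
        return int(lock.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def pid_alive(pid: int) -> bool:
    """Signal 0 checks for existence without affecting the process."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def is_stale_lock(display_num: int, tmp_dir: str = "/tmp") -> bool:
    """True only when a lock file exists and its owner is gone."""
    pid = lock_owner_pid(display_num, tmp_dir)
    return pid is not None and not pid_alive(pid)
```

If `is_stale_lock(98)` is false while the lock exists, an X server is probably still running on :98 and switching DISPLAY_NUM is the safer option.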

websockify not found

The server auto-detects websockify from the Python venv first, then falls back to system PATH. Ensure it's installed:

/home/node/browser-agent-venv/bin/pip install websockify

Broken Python venv symlinks

If python3 in the venv is a broken symlink:

ln -sf /usr/bin/python3 /home/node/browser-agent-venv/bin/python3

Port already in use

export AGENT_PORT=9222   # or any free port (avoid 3001 - reserved)
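A quick way to choose a value for AGENT_PORT is to test candidate ports by trying to bind them. This is a sketch with hypothetical helper names; it treats 3001 as reserved per the note in step 6.

```python
import socket
from typing import Iterable, Optional

RESERVED_PORTS = {3001}


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Ignore lingering TIME_WAIT sockets; a live listener still blocks bind.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(candidates: Iterable[int], host: str = "0.0.0.0") -> Optional[int]:
    """Return the first candidate that is neither reserved nor in use."""
    for port in candidates:
        if port not in RESERVED_PORTS and port_is_free(port, host):
            return port
    return None
```

For instance, `first_free_port([8888, 9222])` returns 8888 when it is available and falls back to 9222 otherwise.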

noVNC not loading

Ensure the novnc system package is installed (/usr/share/novnc/ must exist):

sudo apt-get install -y novnc

Dashboard screen not showing / tiny / wrong size

The dashboard uses WebSocket-streamed screenshots as the primary live view. If the screen appears wrong:

Hard-refresh the browser (Ctrl+Shift+R) to clear cached CSS
Adjust the Xvfb resolution and restart the server, e.g. export SCREEN_WIDTH=1920 SCREEN_HEIGHT=1080 (the default is 1280x1024)
The default display :98 avoids conflicts with system-managed Xvfb on :99

Key Implementation Notes

browser-use ChatOpenAI returns ChatInvokeCompletion with .completion field (NOT .content)
agent.state.plan is mutable from on_step_end hook -- changes affect next step
ActionResult.long_term_memory gets injected into next step's context via MessageManager
agent.state.consecutive_failures tracks errors; reset on success
The on_step_end hook signature: AgentHookFunc = Callable[['Agent'], Awaitable[None]]
Dashboard is a single HTML file with inline CSS/JS (no build step)
noVNC served from system install at /usr/share/novnc/
Dashboard live screen uses object-fit: fill with absolute positioning for full panel coverage
Body uses flexbox layout (display: flex; flex-direction: column) to prevent viewport overflow
CSS Grid cells use min-width: 0 to prevent grid blowout from oversized content
Screenshots are streamed at native Xvfb resolution (no server-side resize) for best quality
The server reuses existing Xvfb if one is already running on the target display
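The hook-related notes above can be tied together in a short sketch: an async function matching the `Callable[['Agent'], Awaitable[None]]` shape that reads `agent.state.consecutive_failures` and calls a council routine at the documented threshold of 2. The factory `make_failure_hook` and the `convene` callback are hypothetical, and the agent is typed as `Any` so the example does not depend on browser-use being importable.

```python
import asyncio
from typing import Any, Awaitable, Callable

FAILURE_THRESHOLD = 2  # matches "consecutive_failures >= 2" in Council Mode Details

AgentHook = Callable[[Any], Awaitable[None]]


def make_failure_hook(convene: AgentHook) -> AgentHook:
    """Build an on_step_end-style hook that convenes the council on repeated failure."""

    async def on_step_end(agent: Any) -> None:
        # consecutive_failures is reset by the agent after a successful step.
        if agent.state.consecutive_failures >= FAILURE_THRESHOLD:
            await convene(agent)

    return on_step_end
```

With browser-use, a hook like this would typically be passed as the `on_step_end` argument when running the agent; check the installed version for the exact call.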

Installation

Option 1: Use slash command in Claude Code

/install-skill https://github.com/niveshdandyan/happycapy-browser-agent

Option 2: Clone to skills directory

# Global (all projects)

git clone https://github.com/niveshdandyan/happycapy-browser-agent ~/.claude/skills/happycapy-browser-agent

# Project-specific

git clone https://github.com/niveshdandyan/happycapy-browser-agent .claude/skills/happycapy-browser-agent

For Cursor, add the MCP server to .cursor/mcp.json:

{
  "mcpServers": {
    "skillz": {
      "command": "npx",
      "args": ["-y", "skillz-mcp", "https://github.com/niveshdandyan/happycapy-browser-agent"]
    }
  }
}

Restart Cursor after adding the configuration.

Gemini CLI, option 1: Use the Gemini CLI command

gemini extensions install https://github.com/niveshdandyan/happycapy-browser-agent

Gemini CLI, option 2: Clone to the extensions directory

git clone https://github.com/niveshdandyan/happycapy-browser-agent ~/.gemini/extensions/happycapy-browser-agent