browser-harness
yangchuansheng/browser-harness-rustDirect browser control via CDP. Use when the user wants to automate, scrape, test, or interact with web pages. Connects to the user's already-running Chrome.
SKILL.md
name: browser-harness description: Direct browser control via CDP. Use when the user wants to automate, scrape, test, or interact with web pages. Connects to the user's already-running Chrome.
browser-harness
Easiest and most powerful way to interact with the browser. Read this file in full before using or editing the harness — it has to be in context.
Fast start
Read install.md first for first-time install or reconnect/bootstrap.
Rust now owns the runtime/control plane. The preferred entrypoint is the installed Rust-native CLI:
browser-harness --help
browser-harness ensure-daemon
browser-harness page-info <<'JSON'
{"daemon_name":"default"}
JSON
If the global command is not installed yet, use the repo-local fallback:
cd rust
cargo run --quiet --bin browser-harness -- --help
- For repo-local Rust-native use, invoke via
cargo run --quiet --bin browser-harness -- .... - First navigation is
new_tab(url), notgoto(url)—gotoruns in the user's active tab and clobbers their work.
The code is the doc.
Available interaction skills:
interaction-skills/connection.md— startup sequence, tab visibility, omnibox popup fixinteraction-skills/screenshots.md— viewport/full screenshots andmax_diminteraction-skills/network-requests.md— request/response waits and network-idle waitsinteraction-skills/dropdowns.md— native selects, overlays, and comboboxes
Domain skills are opt-in. Set BH_DOMAIN_SKILLS=1 for site-specific work;
then search domains/ and read every file in the matching domains/<site>/
directory before inventing selectors, flows, or private API calls.
Tool call shape
browser-harness page-info <<'JSON'
{"daemon_name":"default"}
JSON
The Rust-native CLI is the canonical interface. If you need to call it from Python, keep that as a thin subprocess wrapper outside the runtime layer.
Core operations now include upstream helper parity for SPA and framework-heavy pages:
browser-harness wait-for-element <<'JSON'
{"daemon_name":"default","selector":"[data-testid=submit]","timeout":10.0,"visible":true}
JSON
browser-harness fill-input <<'JSON'
{"daemon_name":"default","selector":"input[name=q]","text":"browser harness","clear_first":true,"timeout":5.0}
JSON
browser-harness wait-for-network-idle <<'JSON'
{"daemon_name":"default","timeout":10.0,"idle_ms":500}
JSON
browser-harness screenshot <<'JSON'
{"daemon_name":"default","full":true,"max_dim":1800}
JSON
browser-harness close-tab <<'JSON'
{"daemon_name":"default"}
JSON
bhrun exposes the same operation names for WASM guests through
bh_guest_sdk::{wait_for_element, fill_input, wait_for_network_idle, screenshot_with_max_dim, close_tab}.
Remote browsers
Use remote for parallel sub-agents (each gets its own isolated browser via a distinct BU_NAME) or on a headless server. BROWSER_USE_API_KEY must be set.
browser-harness create-browser <<'JSON'
{"timeout":120}
JSON
BU_NAME=work browser-harness ensure-daemon <<'JSON'
{"name":"work","env":{"BU_CDP_WS":"<cdpWsUrl-from-create-browser>","BU_BROWSER_ID":"<browser-id>"}}
JSON
BU_NAME=work browser-harness new-tab <<'JSON'
{"daemon_name":"work","url":"https://example.com"}
JSON
create-browser returns id, cdpWsUrl, and liveUrl. Share liveUrl with
the user if they need to watch the session. When the daemon is stopped with the
matching BU_BROWSER_ID, it PATCHes the cloud browser to stop, which ends
billing and persists profile state.
Profiles (cookies-only login state) live in interaction-skills/profile-sync.md
and map to:
browser-harness list-cloud-profilesbrowser-harness resolve-profile-name <name>browser-harness list-local-profilesbrowser-harness sync-local-profile
If you intentionally want Python around the Rust CLI, use the small
subprocess wrappers in docs/python-integration.md.
Search first
Domain skills are dormant unless BH_DOMAIN_SKILLS=1 is set. When enabled and
the task is site-specific, search domains/ first and read every file in the
matching domains/<site>/ directory before inventing a new approach.
Examples in domains/ use helper-style operation names such as http_get(),
goto(), new_tab(), wait(), page_info(), and js().
Treat those as conceptual Browser Harness operations that can be mapped to
browser-harness, bhrun, or a guest. They are not a requirement to use
Python, and they do not imply that every domain should have a Rust guest.
Only if you start struggling with a specific mechanic while navigating, look in interaction-skills/ for helpers. The available interaction skills are:
cookies.mdcross-origin-iframes.mddialogs.mddownloads.mddrag-and-drop.mddropdowns.mdiframes.mdnetwork-requests.mdprint-as-pdf.mdprofile-sync.mdscreenshots.mdscrolling.mdshadow-dom.mdtabs.mduploads.mdviewport.md
Useful commands:
rg --files domains
rg -n "tiktok|upload" domains
Always contribute back
If you learned anything non-obvious about how a site works, open a PR to domains/<site>/skill.md before you finish. Default to contributing. The harness gets better only because agents file what they learn. If figuring something out cost you a few steps, the next run should not pay the same tax.
Examples of what's worth a PR:
- A private API the page calls (XHR/fetch endpoint, request shape, auth) — often 10× faster than DOM scraping.
- A stable selector that beats the obvious one, or an obfuscated CSS-module class to avoid.
- A framework quirk — "the dropdown is a React combobox that only commits on Escape", "this Vue list only renders rows inside its own scroll container, so
scrollIntoViewon the row doesn't work — you have to scroll the container". - A URL pattern — direct route, required query params (
?lang=en,?th=1), a variant that skips a loader. - A wait that
wait_for_load()misses, with the reason. - A trap — stale drafts, legacy IDs that now return null, unicode quirks, beforeunload dialogs, CAPTCHA surfaces.
What a domain skill should capture
The durable shape of the site — the map, not the diary. Focus on what the next agent on this site needs to know before it starts:
- URL patterns and query params.
- Private APIs and their payload shape.
- Stable selectors (
data-*,aria-*,role, semantic classes). - Site structure — containers, items per page, framework, where state lives.
- Framework/interaction quirks unique to this site.
- Waits and the reasons they're needed.
- Traps and the selectors that don't work.
Do not write
- Raw pixel coordinates. They break on viewport, zoom, and layout changes. Describe how to locate the target (selector,
scrollIntoView,aria-label, visible text) — never where it happened to be on your screen. - Run narration or step-by-step of the specific task you just did.
- Secrets, cookies, session tokens, user-specific state.
domains/is shared and public.
What actually works
- Screenshots first: use
screenshot()to understand the current page quickly, find visible targets, and decide whether you need a click, a selector, or more navigation. - Clicking:
screenshot()→ look →click(x, y)→screenshot()again to verify the result. Coordinate clicks pass through iframes/shadow/cross-origin at the compositor level. - Bulk HTTP:
browser-harness http-getor a thin subprocess wrapper +ThreadPoolExecutor. No browser for static pages (249 Netflix pages in 2.8s). - After goto:
wait_for_load(). - Wrong/stale tab:
ensure_real_tab(). Use it when the current tab is stale or internal; the daemon also auto-recovers from stale sessions on the next call. - Verification:
print(page_info())is the simplest "is this alive?" check, but screenshots are the default way to verify whether a visible action actually worked. - DOM reads: use
js(...)for inspection and extraction when the screenshot shows that coordinates are the wrong tool. - Iframe sites (Azure blades, Salesforce):
click(x, y)passes through; only drop to iframe DOM work when coordinate clicks are the wrong tool. - Auth wall: redirected to login → stop and ask the user. Don't type credentials from screenshots.
- Raw CDP for typed gaps:
browser-harness cdp-raw <<'JSON' ... JSONorbh_guest_sdk::cdp_raw(...).
Design constraints
- Coordinate clicks default.
Input.dispatchMouseEventgoes through iframes/shadow/cross-origin at the compositor level. - Connect to the user's running Chrome. Don't launch your own browser.
cdp-useis only forCDPClient.send_raw. Prefer raw CDP strings over typed wrappers.- The Rust CLI stays thin.
browser-harnessis only a facade overbhctlandbhrun. - Use the Rust CLI directly. Prefer
browser-harness,bhrun,bhctl, or thin subprocess wrappers fromdocs/python-integration.md. - Don't add a manager layer. No retries framework, session manager, daemon supervisor, config system, or logging framework.
Architecture
Chrome / Browser Use cloud -> CDP WS -> bhd -> /tmp/bu-<NAME>.sock -> bhrun / bhctl -> browser-harness
- Protocol is one JSON line each way.
- Requests are
{method, params, session_id}for CDP or{meta: ...}for daemon control. - Responses are
{result}/{error}/{events}/{session_id}. BU_NAMEnamespaces daemon identity.BH_RUNTIME_DIRcontrols socket/pid placement;BH_TMP_DIRcontrols logs and screenshots.BU_CDP_WSoverrides local Chrome discovery for remote browsers.BU_CDP_URLpoints at a DevTools HTTP endpoint such ashttp://127.0.0.1:9222.BU_BROWSER_ID+BROWSER_USE_API_KEYlets the daemon stop a Browser Use cloud browser on shutdown.
Gotchas (field-tested)
- Chrome 144+
chrome://inspect/#remote-debuggingdoes NOT always serve/json/version. The Rust discovery path falls back toDevToolsActivePortwhen/json/versionreturns 404. - Try attaching before asking for setup. If
browser-harness page-infoalready works, skip the remote-debugging instructions entirely. Decide what to escalate from the harness's error message, not from whether Chrome is visibly running. - The remote-debugging checkbox is per-profile sticky in Chrome. Once ticked on a profile, every future Chrome launch auto-enables CDP — only navigate to
chrome://inspect/#remote-debuggingwhenDevToolsActivePortis genuinely missing on a fresh profile. - The first connect may block on Chrome's Allow dialog. If setup hangs, explicitly tell the user to click
Allowin Chrome if it appears, then keep polling for up to 30 seconds instead of treating follow-on errors as a new failure. DevToolsActivePortcan exist before the port is actually listening. Treat connection refused as "still enabling" and keep polling for up to 30 seconds.- Chrome may open the profile picker before any real tab exists. If Chrome opens both a profile picker and the remote-debugging page, tell the user to choose their normal profile first, then tick the checkbox and click
Allowif shown. - On macOS, if Chrome is already running, prefer AppleScript
open locationoveropen -a ... URL. It reuses the current profile and avoids creating an extra startup path through the profile picker. - Omnibox popups are fake
pagetargets. Filterchrome://omnibox-popup...and other internals when you need a real tab. - CDP target order != Chrome's visible tab-strip order. Use UI automation when the user means "the first/second tab I can see";
Target.activateTargetonly shows a known target. - Default daemon sessions can go stale.
ensure_real_tab()re-attaches to a real page. no close frame received or sentusually means a stale daemon / websocket. Restart the daemon once with:cd rust && cargo run --quiet --bin browser-harness -- restart-daemonbefore assuming setup is wrong.- If
restart_daemon()also hangs, kill Chrome entirely (pkill -9 -f "Google Chrome"), clean sockets (rm -f /tmp/bu-default.sock /tmp/bu-default.pid), reopen Chrome (open -a "Google Chrome"), wait 5s, then reconnect. This resets all CDP state. - Browser Use API is camelCase on the wire.
cdpUrl,proxyCountryCode, etc. - Remote
cdpUrlis HTTPS, not ws. Set it asBU_CDP_URLand resolve the websocket URL via/json/version, or set the already-resolved websocket asBU_CDP_WS. - Local discovery covers Chrome/Chromium, Chrome Canary, Edge channels, Brave, Arc, Dia, Comet, Windows Chrome SxS, and Flatpak profile directories. If the profile stores
DevToolsActivePort, the daemon can attach without hard-coded browser paths. - Use
BH_RUNTIME_DIRon systems with short Unix socket limits. Keep socket and pid files in a short path, whileBH_TMP_DIRcan point at a longer persistent log/screenshot directory. - Stop cloud browsers with
PATCH /browsers/{id}+{\"action\":\"stop\"}. - After every meaningful action, re-screenshot before assuming it worked. Use the image to verify changed state, open menus, navigation, visible errors, and whether the page is in the state you expected.
- Use screenshots to drive exploration. They are often the fastest way to find the next click target, notice hidden blockers, and decide if a selector is even worth writing.
- Prefer compositor-level actions over framework hacks. Try screenshots, coordinate clicks, and raw key input before adding DOM-specific workarounds.
- If you need framework-specific DOM tricks, check
interaction-skills/first. That is where dropdown, dialog, iframe, shadow DOM, and form-specific guidance belongs.
Interaction notes
interaction-skills/holds reusable UI mechanics such as dialogs, tabs, dropdowns, iframes, and uploads.domains/is the active home for site-specific workflows.