An offhand “why not?” became a Playwright for native iOS — written almost entirely by Claude Code.
1. Why I built it
I was building an iOS app with Claude Code, and the workflow came with a friction I’d just gotten used to: Claude changes a screen, then I pick up my phone, tap through the app, look at the result, and try to describe what I saw back to Claude. Claude was coding blind. It could write Swift all day, but it had no eyes on the running app and no hands to touch it.
The idea came from watching Claude work on the web. With Playwright connected, it drives a browser by itself — open a page, click, screenshot, read the page, fix, repeat — and checks the result automatically, with no human in the loop. So the “why not?” was simple: let Claude do the same thing to my iOS app. Take a screenshot, see the screen, tap a button, read the internal state. Do that, and the whole loop closes. No more me passing messages between phone and model.
So I brought the idea to Claude Code and we talked it through. The web has Playwright; iOS has XCUITest, but that’s a heavy, separate test runner — not something an agent can reach for in the middle of a chat.
That’s the key point: what I wanted was Playwright without the test framework. No test target, no XCTest setup, no build-and-run-the-suite cycle — just a live bridge into the debug build already running on my phone, that Claude can call any time. That’s what we built: a tiny HTTP server inside the app (debug builds only), an MCP server on the Mac that turns it into tools, and a USB tunnel between them. The agent gets eyes and hands; the production app carries none of it.
It also quietly fixed the other half of the problem: collecting debug data. Screenshots, a live state snapshot, even a recorded video of a flow — Claude grabs them itself, on demand, instead of me capturing each one and pasting it into the chat. The data it needs to reason about a bug now comes straight from the source.
2. How it was designed
The big picture
The tool is a chain of four small parts. Claude Code calls MCP tools; an MCP server on the Mac turns each call into an HTTP request; iproxy carries it over USB to the phone; and a tiny HTTP server inside the app does the actual work and answers back.
```mermaid
flowchart LR
    CC["Claude Code"]
    MCP["MCP server<br/>app-debug-mcp.py<br/>(Mac)"]
    Proxy["iproxy<br/>USB tunnel"]
    App["iOS app<br/>AppDebugServer<br/>(#if DEBUG)"]
    CC <-->|"MCP, stdio"| MCP
    MCP <-->|"HTTP 127.0.0.1:9876"| Proxy
    Proxy <-->|"USB"| App
```
Each link is deliberately simple — a standard protocol with nothing custom in between:
| Part | Where | Job |
|---|---|---|
| Claude Code | the chat | Decides what to do; calls MCP tools |
| `app-debug-mcp.py` | on the Mac | A FastMCP server; wraps each HTTP route as an MCP tool |
| `iproxy` | terminal | Forwards Mac port 9876 → device port 9876 over USB |
| `AppDebugServer` | in the app (`#if DEBUG`) | HTTP server that renders screenshots, runs actions, reports state, records video |
Why a USB tunnel at all? The app's NWListener is bound to loopback (127.0.0.1), so the server isn't reachable over the network. iproxy bridges that gap by mapping a Mac-side port straight to the device.
Inside the app: the debug server
AppDebugServer is the only piece that matters on the device, and it stays small. An NWListener accepts a connection, the request is parsed by hand, a router dispatches on method and path, and each route reaches one capability.
```mermaid
flowchart TB
    Listener["NWListener · 127.0.0.1:9876<br/>(parse HTTP by hand)"]
    Router["Router (method + path)"]
    Listener --> Router
    Router --> R1["GET /screenshot"]
    Router --> R2["GET /actions"]
    Router --> R3["POST /activate"]
    Router --> R4["GET /state"]
    Router --> R5["POST /record/start · stop"]
    R1 --> Window["Key window<br/>UIGraphicsImageRenderer → PNG"]
    R2 --> Registry["AppDebugActionRegistry"]
    R3 --> Registry
    R4 --> Provider["State provider<br/>Encodable → JSON"]
    R5 --> Replay["ReplayKit + AVAssetWriter → MP4"]
    Views["SwiftUI views · .debugAction"] -. "register on appear<br/>unregister on disappear" .-> Registry
```
Six routes, one MCP tool each:
| MCP tool | Route | Returns |
|---|---|---|
| `screenshot` | `GET /screenshot` | PNG of the current screen |
| `list_actions` | `GET /actions` | Identifiers registered on the current screen |
| `activate` | `POST /activate` | Runs a registered closure by identifier |
| `app_state` | `GET /state` | JSON snapshot of the app’s live state |
| `record_start` | `POST /record/start` | Starts a ReplayKit recording |
| `record_stop` | `POST /record/stop` | Finishes and returns the MP4 |
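The table above is small enough to express as data. The following is a minimal sketch of how a bridge could turn a tool name into an HTTP request, not the actual `app-debug-mcp.py`: it uses only the standard library rather than FastMCP and `requests`, and the names `ROUTES`, `build_request`, and `call`, as well as the JSON request body, are assumptions made for illustration.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9876"  # Mac side of the iproxy tunnel

# MCP tool name -> (HTTP method, path), copied from the table above.
ROUTES = {
    "screenshot": ("GET", "/screenshot"),
    "list_actions": ("GET", "/actions"),
    "activate": ("POST", "/activate"),
    "app_state": ("GET", "/state"),
    "record_start": ("POST", "/record/start"),
    "record_stop": ("POST", "/record/stop"),
}


def build_request(tool, payload=None):
    """Build (but do not send) the HTTP request for one tool call."""
    method, path = ROUTES[tool]
    body = None
    headers = {}
    if method == "POST":
        # Assumption: POST routes accept a JSON body.
        body = json.dumps(payload or {}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=headers, method=method
    )


def call(tool, payload=None, timeout=10.0):
    """Send the request through the tunnel and return the raw response bytes."""
    request = build_request(tool, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With the tunnel up, `call("activate", {"identifier": "header.settings"})` would send the `POST /activate` request; `build_request` alone is enough to inspect what would go over the wire.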
A few notes on the harder corners:
- The server is minimal. It parses HTTP by hand: read to `\r\n\r\n`, pull out method, path, and `Content-Length`, then read the body. No HTTP library, just `Network.framework`. It all runs on `@MainActor`, because screenshots and SwiftUI state reads both need the main thread.
- Screenshots render the current window with `UIGraphicsImageRenderer`; the Mac bridge base64-encodes the PNG so Claude can actually see it. State comes from the app’s provider, which flattens any non-`Codable` types into clean JSON.
- Video is the one tricky piece, for a Swift 6 concurrency reason. ReplayKit fires its handlers on a background queue, while the server lives on `@MainActor`. If those closures inherited main-actor isolation, the runtime would crash. So the handler factories are `nonisolated`, each frame rides back to the main actor inside an `@unchecked Sendable` box, and an `AVAssetWriter` stitches the frames into H.264.
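The hand-parsing steps in the first note can be sketched in a few lines. The app's real parser is Swift on top of `Network.framework`; the version below shows the same steps in Python purely for illustration, and the function name `parse_http_request` and its "return `None` until complete" convention are assumptions.

```python
def parse_http_request(buffer: bytes):
    """Parse one HTTP/1.1 request from a byte buffer.

    Returns (method, path, body), or None if more bytes are still needed.
    """
    # Step 1: read up to the blank line that ends the header section.
    head, separator, rest = buffer.partition(b"\r\n\r\n")
    if not separator:
        return None  # header terminator not received yet

    # Step 2: the request line carries the method and the path.
    lines = head.decode("iso-8859-1").split("\r\n")
    method, path, _version = lines[0].split(" ", 2)

    # Step 3: Content-Length (if present) says how long the body is.
    content_length = 0
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            content_length = int(value.strip())

    # Step 4: read the body, waiting if it has not fully arrived.
    if len(rest) < content_length:
        return None
    return method, path, rest[:content_length]
```

Returning `None` for an incomplete buffer lets the caller keep appending received bytes and retry, which suits a connection that delivers data in chunks.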
The split is the point: this in-app code is app-agnostic — it imports only system frameworks (Network, UIKit, AVFoundation, ReplayKit). The only app-specific pieces are the action closures your views register and a small state provider that returns an Encodable snapshot. So the whole thing drops into any iOS app.
Registry, not coordinates
How does activate actually tap something? The obvious first idea was tapping at (x, y) — window.hitTest plus accessibilityActivate() — and it’s kept as a fallback. But it’s flaky: SwiftUI gesture views without accessibility traits don’t respond, and coordinates shift with device and layout.
The registry is what holds up. Each screen calls .debugAction("header.settings") { ... } when it appears and removes it when it disappears. activate looks up the identifier and calls the closure directly — no hit-testing, no localization, no fragile geometry. Because registration follows the view lifecycle, the available set changes as screens come and go — which is why the rhythm is always list_actions → activate: ask what’s there now, then act. Those identifiers are the contract for automation; user-facing labels are localized and never used as keys.
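To make the lifecycle concrete, here is a rough Python model of a dictionary-backed registry. The real `AppDebugActionRegistry` is Swift and runs its closures on the main actor; the class name `ActionRegistry`, its method names, and the shape of the error response are assumptions for this sketch (only `{"ok": true}` appears in the original flow).

```python
class ActionRegistry:
    """Dictionary-backed registry: identifier -> zero-argument callable."""

    def __init__(self):
        self._actions = {}

    def register(self, identifier, action):
        # Called when a view appears.
        self._actions[identifier] = action

    def unregister(self, identifier):
        # Called when the view disappears; a missing key is not an error.
        self._actions.pop(identifier, None)

    def list_actions(self):
        # Only what is registered right now, i.e. what is on screen.
        return sorted(self._actions)

    def activate(self, identifier):
        action = self._actions.get(identifier)
        if action is None:
            # Assumed error shape, for illustration only.
            return {"ok": False, "error": f"unknown identifier: {identifier}"}
        action()
        return {"ok": True}
```

Because an identifier vanishes as soon as its view unregisters, activating a stale identifier fails cleanly instead of tapping whatever now occupies that spot on screen.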
This is the whole loop in one step:
```mermaid
sequenceDiagram
    participant CC as Claude Code
    participant MCP as app-debug (Mac)
    participant Proxy as iproxy (USB)
    participant App as iOS app (AppDebugServer)
    CC->>MCP: list_actions()
    MCP->>Proxy: GET /actions
    Proxy->>App: GET /actions
    App-->>CC: ["header.settings", "tab.history", ...]
    CC->>MCP: activate("header.settings")
    MCP->>Proxy: POST /activate {identifier}
    Proxy->>App: POST /activate
    App->>App: AppDebugActionRegistry → runs @MainActor closure
    App-->>CC: {"ok": true}
    CC->>MCP: screenshot()
    MCP->>Proxy: GET /screenshot
    Proxy->>App: GET /screenshot
    App-->>CC: PNG bytes (Claude sees the image)
```
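The last step, where PNG bytes become something Claude can see, happens on the Mac side by base64-encoding the response. A minimal sketch of that conversion follows; the function name `png_to_image_content` is hypothetical, and the dictionary mirrors the MCP image content shape as I understand it, so the actual bridge may well use a FastMCP helper instead.

```python
import base64

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # the 8-byte signature every PNG starts with


def png_to_image_content(png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes as a base64 image payload."""
    if not png_bytes.startswith(PNG_MAGIC):
        # Guards against passing an error page or JSON body along as an image.
        raise ValueError("response is not a PNG")
    return {
        "type": "image",
        "mimeType": "image/png",
        "data": base64.b64encode(png_bytes).decode("ascii"),
    }
```

Checking the signature first is an optional safeguard: it turns a failed screenshot into a clear error rather than an unreadable image.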
Implementation: small pieces, handed off
None of these parts is large on its own — a hand-rolled HTTP loop, a dictionary-backed registry, a view modifier, a recording pipeline, a Python bridge. That’s by design. Once the architecture was settled, each piece was small and well-scoped enough to hand to Claude and build end to end, one at a time. What that handoff actually looked like — and what stayed my job — is the subject of the last section.
3. How to use it
Debug only
Every line on the iOS side sits inside #if DEBUG — the server, the provider, and the start() call in the app’s entry point:
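```swift
#if DEBUG
// Debug builds only: create the state provider and the in-app server.
let provider = MyDebugStateProvider(/* your coordinators */)
let server = AppDebugServer(port: 9876, stateProvider: provider)
// Start the server on the main actor, where it runs.
Task { @MainActor in server.start() }
#endif
```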
Release builds compile none of it: no server, no open port, nothing left behind. (The listener is bound to loopback only, which is also why the USB tunnel isn’t optional.)
MCP config
Project-scoped, in .mcp.json:
```json
{
  "mcpServers": {
    "app-debug": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "--script", "tools/app-debug-mcp.py"],
      "env": { "APP_DEBUG_OUTPUT": "debug-output" }
    }
  }
}
```
Dependencies are declared inline, so uv pulls mcp and requests into an isolated venv on first run — no pip install. Output lands in debug-output/.
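For readers who haven't used inline dependencies, the top of such a script might look like the sketch below. The comment header is the standard inline script metadata format that `uv run --script` reads; the `requires-python` value, the `output_dir` helper, and the fallback to `debug-output` when the variable is unset are assumptions, not taken from the actual `app-debug-mcp.py`.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = ["mcp", "requests"]
# ///
import os
from pathlib import Path


def output_dir() -> Path:
    """Return the directory where screenshots and videos should be saved."""
    # APP_DEBUG_OUTPUT comes from the "env" block in .mcp.json.
    return Path(os.environ.get("APP_DEBUG_OUTPUT", "debug-output"))
```

Keeping the path in an environment variable means the same script can serve several projects, each with its own `.mcp.json`.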
Each session
- Run the Debug build on a USB-connected, trusted device.
- `iproxy 9876 9876` in a terminal (from `libimobiledevice`).
- Ask Claude to drive it: `list_actions` → `activate`, then `screenshot` or `app_state` to see the result. Wrap flows worth keeping in `record_start` … `record_stop` (wait for `{"ok": true}`; iOS shows the recording consent prompt first).
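If a session fails on the first call, a quick check that the tunnel from the second step is listening can save some guessing. This is an optional sketch, not part of the tool; the helper name `tunnel_is_up` is hypothetical. It only confirms that something accepts TCP connections on the Mac-side port, not that the app's server is answering behind it.

```python
import socket


def tunnel_is_up(host="127.0.0.1", port=9876, timeout=1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False
```

A `False` result usually means `iproxy` isn't running; a `True` result followed by failing tool calls points at the device side instead, such as the app not being launched or being paused at a breakpoint.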
A breakpoint in Xcode pauses the process, so in-flight calls time out until you resume. Expected and harmless.
4. Last words
This whole tool came from one “why not?”, and Claude wrote almost all of it — the HTTP server, the registry, the ReplayKit pipeline, concurrency traps and all — while I mostly read diffs. A working debugging tool, in an afternoon. So with the same idea, you can build your own — for your app, in your stack — without much effort.
But don’t mistake “Claude wrote it” for “Claude did it alone.” It wrote the code; I did the rest, and the rest is what made it good:
- The idea. Claude never proposed giving itself eyes and hands — it built them the moment I pointed it at the goal. Seeing the possibility was mine.
- The decisions. Registry over coordinate tapping, debug-only over a test target, a thin HTTP bridge over a heavier framework. Claude laid out the trade-offs; the calls were mine.
- The review. I read the design, challenged the parts that felt wrong, and pushed back until it held up. Claude is fast and capable, not infallible — someone has to keep it honest.
That’s the shape of vibe coding when it works: not handing off the thinking, but doing more of it — about what to build, which way to go, and whether the result is actually right. Claude handles the code now. The judgment is still yours, and it matters more than ever.