I Vibe-Coded an iOS Debug Tool That Lets Claude Code Drive My App

foxgem

An offhand “why not?” became a Playwright for native iOS — written almost entirely by Claude Code.

1. Why I built it

I was building an iOS app with Claude Code, and the workflow came with a friction I’d just gotten used to: Claude changes a screen, then I pick up my phone, tap through the app, look at the result, and try to describe what I saw back to Claude. Claude was coding blind. It could write Swift all day, but it had no eyes on the running app and no hands to touch it.

The idea came from watching Claude work on the web. With Playwright connected, it drives a browser by itself — open a page, click, screenshot, read the page, fix, repeat — and checks the result automatically, with no human in the loop. So the “why not?” was simple: let Claude do the same thing to my iOS app. Take a screenshot, see the screen, tap a button, read the internal state. Do that, and the whole loop closes. No more me passing messages between phone and model.

So I brought the idea to Claude Code and we talked it through. The web has Playwright; iOS has XCUITest, but that’s a heavy, separate test runner — not something an agent can reach for in the middle of a chat.

That’s the key point: what I wanted was Playwright without the test framework. No test target, no XCTest setup, no build-and-run-the-suite cycle; just a live bridge into the debug build already running on my phone, one that Claude can call at any time. That’s what we built: a tiny HTTP server inside the app (debug builds only), an MCP server on the Mac that turns it into tools, and a USB tunnel between them. The agent gets eyes and hands; the production app carries none of it.

It also quietly fixed the other half of the problem: collecting debug data. Screenshots, a live state snapshot, even a recorded video of a flow — Claude grabs them itself, on demand, instead of me capturing each one and pasting it into the chat. The data it needs to reason about a bug now comes straight from the source.

2. How it was designed

The big picture

The tool is a chain of four small parts. Claude Code calls MCP tools; an MCP server on the Mac turns each call into an HTTP request; iproxy carries it over USB to the phone; and a tiny HTTP server inside the app does the actual work and answers back.

flowchart LR
    CC["Claude Code"]
    MCP["MCP server<br/>app-debug-mcp.py<br/>(Mac)"]
    Proxy["iproxy<br/>USB tunnel"]
    App["iOS app<br/>AppDebugServer<br/>(#if DEBUG)"]

    CC <-->|"MCP, stdio"| MCP
    MCP <-->|"HTTP 127.0.0.1:9876"| Proxy
    Proxy <-->|"USB"| App

Each link is deliberately simple — a standard protocol with nothing custom in between:

| Part | Where | Job |
|---|---|---|
| Claude Code | the chat | Decides what to do; calls MCP tools |
| app-debug-mcp.py | on the Mac | A FastMCP server; wraps each HTTP route as an MCP tool |
| iproxy | terminal | Forwards Mac port 9876 → device port 9876 over USB |
| AppDebugServer | in the app (#if DEBUG) | HTTP server that renders screenshots, runs actions, reports state, records video |

Why a USB tunnel at all? The app’s NWListener binds only to loopback (127.0.0.1), so its server isn’t reachable over the network. iproxy bridges that gap by mapping a Mac-side port straight to that port on the device.

Inside the app: the debug server

AppDebugServer is the only piece that matters on the device, and it stays small. An NWListener accepts a connection, the request is parsed by hand, a router dispatches on method and path, and each route reaches one capability.

flowchart TB
    Listener["NWListener · 127.0.0.1:9876<br/>(parse HTTP by hand)"]
    Router["Router (method + path)"]
    Listener --> Router

    Router --> R1["GET /screenshot"]
    Router --> R2["GET /actions"]
    Router --> R3["POST /activate"]
    Router --> R4["GET /state"]
    Router --> R5["POST /record/start · stop"]

    R1 --> Window["Key window<br/>UIGraphicsImageRenderer → PNG"]
    R2 --> Registry["AppDebugActionRegistry"]
    R3 --> Registry
    R4 --> Provider["State provider<br/>Encodable → JSON"]
    R5 --> Replay["ReplayKit + AVAssetWriter → MP4"]

    Views["SwiftUI views · .debugAction"] -. "register on appear<br/>unregister on disappear" .-> Registry

Six routes, one MCP tool each:

| MCP tool | Route | Returns |
|---|---|---|
| screenshot | GET /screenshot | PNG of the current screen |
| list_actions | GET /actions | Identifiers registered on the current screen |
| activate | POST /activate | Runs a registered closure by identifier |
| app_state | GET /state | JSON snapshot of the app’s live state |
| record_start | POST /record/start | Starts a ReplayKit recording |
| record_stop | POST /record/stop | Finishes and returns the MP4 |
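To make “a router dispatches on method and path” concrete, here is a minimal sketch of a dispatch table keyed on the method/path pairs from the table above. The Route, Response, and Router types and the placeholder handlers are hypothetical, not the actual AppDebugServer code; in the app, the handlers would call into the key window, the action registry, the state provider, and the recorder, all on @MainActor.

```swift
import Foundation

// Hypothetical response type: status code, content type, raw body bytes.
struct Response {
    let status: Int
    let contentType: String
    let body: Data

    static func json(_ text: String, status: Int = 200) -> Response {
        Response(status: status, contentType: "application/json", body: Data(text.utf8))
    }
}

// A route is identified by HTTP method plus path.
struct Route: Hashable {
    let method: String
    let path: String
}

// Dispatch table: each (method, path) maps to one handler closure.
// Handlers here are placeholders; real ones would render a PNG, query the
// action registry, encode the state snapshot, or drive ReplayKit.
struct Router {
    private var handlers: [Route: (Data) -> Response] = [:]

    mutating func on(_ method: String, _ path: String, _ handler: @escaping (Data) -> Response) {
        handlers[Route(method: method, path: path)] = handler
    }

    func dispatch(method: String, path: String, body: Data) -> Response {
        guard let handler = handlers[Route(method: method, path: path)] else {
            return .json(#"{"error": "not found"}"#, status: 404)
        }
        return handler(body)
    }
}

var router = Router()
router.on("GET", "/actions") { _ in .json(#"["header.settings", "tab.history"]"#) }
router.on("POST", "/activate") { _ in .json(#"{"ok": true}"#) }
```

Keeping dispatch as a plain lookup means each new capability is one more entry, and an unknown method/path pair falls through to a 404 instead of being guessed at.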

A few notes on the harder corners:

  • The server is minimal. It parses HTTP by hand — read to \r\n\r\n, pull out method, path, and Content-Length, then read the body. No HTTP library, just Network.framework. It all runs on @MainActor, because screenshots and SwiftUI state reads both need the main thread.
  • Screenshots render the current window with UIGraphicsImageRenderer; the Mac bridge base64-encodes the PNG so Claude can actually see it. State comes from the app’s provider, which flattens any non-Codable types into clean JSON.
  • Video is the one tricky piece, for a Swift 6 concurrency reason. ReplayKit fires its handlers on a background queue, while the server lives on @MainActor. If those closures inherited main-actor isolation, the runtime would crash. So the handler factories are nonisolated, each frame rides back to the main actor inside an @unchecked Sendable box, and an AVAssetWriter stitches the frames into H.264.
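The hand-rolled parsing from the first note fits in a few lines. Below is a minimal sketch over a byte buffer: find the blank line, split the request line, read Content-Length, and report whether the body has fully arrived. ParsedRequest and parseRequest are hypothetical names; the assumption is that the real server accumulates bytes from its NWConnection’s receive callbacks and runs something like this after each chunk.

```swift
import Foundation

// Hypothetical result of parsing one HTTP/1.1 request from raw bytes.
struct ParsedRequest {
    let method: String
    let path: String
    let body: Data
}

// Returns nil until the buffer holds the full header block plus
// Content-Length bytes of body; the caller keeps reading until then.
func parseRequest(_ buffer: Data) -> ParsedRequest? {
    // Headers end at the first blank line.
    guard let headerEnd = buffer.range(of: Data("\r\n\r\n".utf8)) else { return nil }

    let headerText = String(decoding: buffer[buffer.startIndex..<headerEnd.lowerBound], as: UTF8.self)
    let lines = headerText.components(separatedBy: "\r\n")

    // Request line: "METHOD /path HTTP/1.1"
    let parts = lines[0].split(separator: " ")
    guard parts.count >= 2 else { return nil }

    // Header names are case-insensitive, so compare lowercased.
    var contentLength = 0
    for line in lines.dropFirst() {
        let pair = line.split(separator: ":", maxSplits: 1)
        if pair.count == 2,
           pair[0].trimmingCharacters(in: .whitespaces).lowercased() == "content-length" {
            contentLength = max(0, Int(pair[1].trimmingCharacters(in: .whitespaces)) ?? 0)
        }
    }

    // Wait for the whole body before handing the request to the router.
    let bodyStart = headerEnd.upperBound
    guard buffer.endIndex - bodyStart >= contentLength else { return nil }
    let body = buffer[bodyStart..<(bodyStart + contentLength)]

    return ParsedRequest(method: String(parts[0]), path: String(parts[1]), body: Data(body))
}
```

Because TCP can split a request anywhere, returning nil until both the headers and the declared body have arrived lets the connection handler simply append each received chunk and try again.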

The split is the point: this in-app code is app-agnostic — it imports only system frameworks (Network, UIKit, AVFoundation, ReplayKit). The only app-specific pieces are the action closures your views register and a small state provider that returns an Encodable snapshot. So the whole thing drops into any iOS app.
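A sketch of the app-specific half, assuming the provider is a protocol with a single method that returns an Encodable snapshot. The protocol name, the snapshot fields, and the flattening choices (enum to its case name, Date to an ISO 8601 string) are hypothetical; MyDebugStateProvider borrows the name from the entry-point snippet in section 3, but its body here is invented. The point is that the generic server only ever sees Encodable and hands it to JSONEncoder.

```swift
import Foundation

// Hypothetical protocol the generic server depends on; the app supplies the conformer.
protocol DebugStateProviding {
    func snapshot() -> any Encodable
}

// App-specific model types, not Codable themselves.
enum Tab { case home, history, settings }

final class SessionCoordinator {
    var selectedTab: Tab = .history
    var lastSync: Date? = Date(timeIntervalSince1970: 0)
    var itemCount = 3
}

// Flattened, Codable-only view of the live state.
struct AppSnapshot: Encodable {
    let selectedTab: String
    let lastSync: String?
    let itemCount: Int
}

struct MyDebugStateProvider: DebugStateProviding {
    let session: SessionCoordinator

    func snapshot() -> any Encodable {
        AppSnapshot(
            selectedTab: "\(session.selectedTab)",  // enum case name, e.g. "history"
            lastSync: session.lastSync.map { ISO8601DateFormatter().string(from: $0) },
            itemCount: session.itemCount
        )
    }
}

// What GET /state would do with whatever the provider returns.
func stateJSON(from provider: any DebugStateProviding) throws -> Data {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    return try encoder.encode(provider.snapshot())
}
```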

Registry, not coordinates

How does activate actually tap something? The obvious first idea was tapping at (x, y): window.hitTest plus accessibilityActivate(), which is still kept as a fallback. But it’s flaky: SwiftUI gesture views without accessibility traits don’t respond, and coordinates shift with device and layout.

The registry is what holds up. Each screen calls .debugAction("header.settings") { ... } when it appears and removes it when it disappears. activate looks up the identifier and calls the closure directly: no hit-testing, no dependence on localized text, no fragile geometry. Because registration follows the view lifecycle, the available set changes as screens come and go, which is why the rhythm is always list_actions → activate: ask what’s there now, then act. Those identifiers are the contract for automation; user-facing labels are localized and never used as keys.
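A minimal sketch of the registry idea, assuming a dictionary from identifier to closure. ActionRegistry and its method names are hypothetical; the real AppDebugActionRegistry is @MainActor and fed by the .debugAction modifier’s appear/disappear hooks. Here those hooks are simulated by direct calls so the logic runs without SwiftUI.

```swift
import Foundation

// Hypothetical registry: identifier -> closure. In the app this type would be
// @MainActor, and a .debugAction modifier would do roughly (not compiled here):
//   .onAppear    { registry.register("header.settings") { showSettings = true } }
//   .onDisappear { registry.unregister("header.settings") }
final class ActionRegistry {
    private var actions: [String: () -> Void] = [:]

    func register(_ identifier: String, action: @escaping () -> Void) {
        actions[identifier] = action
    }

    func unregister(_ identifier: String) {
        actions[identifier] = nil
    }

    // What list_actions returns: whatever is registered right now.
    var identifiers: [String] { actions.keys.sorted() }

    // What activate does: look up the closure and call it; no hit-testing.
    @discardableResult
    func activate(_ identifier: String) -> Bool {
        guard let action = actions[identifier] else { return false }
        action()
        return true
    }
}
```

One wrinkle the plain dictionary ignores: during a navigation transition the incoming screen’s onAppear can run before the outgoing screen’s onDisappear, so if both use the same identifier, the late unregister would remove the fresh closure. Tagging each registration with a token and only removing a matching token would avoid that.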

This is the whole loop in one step:

sequenceDiagram
    participant CC as Claude Code
    participant MCP as app-debug (Mac)
    participant Proxy as iproxy (USB)
    participant App as iOS app (AppDebugServer)

    CC->>MCP: list_actions()
    MCP->>Proxy: GET /actions
    Proxy->>App: GET /actions
    App-->>CC: ["header.settings", "tab.history", ...]

    CC->>MCP: activate("header.settings")
    MCP->>Proxy: POST /activate {identifier}
    Proxy->>App: POST /activate
    App->>App: AppDebugActionRegistry → runs @MainActor closure
    App-->>CC: {"ok": true}

    CC->>MCP: screenshot()
    MCP->>Proxy: GET /screenshot
    Proxy->>App: GET /screenshot
    App-->>CC: PNG bytes (Claude sees the image)
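The POST /activate exchange in the diagram is plain JSON in both directions. A sketch of the two payloads with Codable, assuming the body is {"identifier": ...} and the reply carries ok plus an optional error message; the error field, the type names, and handleActivate are assumptions, not taken from the real server.

```swift
import Foundation

// Request body from the MCP bridge: {"identifier": "header.settings"}
struct ActivateRequest: Codable {
    let identifier: String
}

// Reply: {"ok": true}, or {"ok": false, "error": "..."} (error field assumed).
// A nil error is omitted from the encoded JSON.
struct ActivateReply: Codable {
    let ok: Bool
    let error: String?
}

// Decode the body, run the lookup, encode the reply.
// `lookup` stands in for the action registry: it returns true if it ran a closure.
func handleActivate(body: Data, lookup: (String) -> Bool) -> Data {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    let reply: ActivateReply
    if let request = try? JSONDecoder().decode(ActivateRequest.self, from: body) {
        reply = lookup(request.identifier)
            ? ActivateReply(ok: true, error: nil)
            : ActivateReply(ok: false, error: "unknown identifier: \(request.identifier)")
    } else {
        reply = ActivateReply(ok: false, error: "body must be {\"identifier\": ...}")
    }
    // Encoding a Bool and an optional String does not fail in practice.
    return (try? encoder.encode(reply)) ?? Data()
}
```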

Implementation: small pieces, handed off

None of these parts is large on its own — a hand-rolled HTTP loop, a dictionary-backed registry, a view modifier, a recording pipeline, a Python bridge. That’s by design. Once the architecture was settled, each piece was small and well-scoped enough to hand to Claude and build end to end, one at a time. What that handoff actually looked like — and what stayed my job — is the subject of the last section.

3. How to use it

Debug only

Every line on the iOS side sits inside #if DEBUG — the server, the provider, and the start() call in the app’s entry point:

#if DEBUG
let provider = MyDebugStateProvider(/* your coordinators */)
let server = AppDebugServer(port: 9876, stateProvider: provider)
Task { @MainActor in server.start() }
#endif

Release builds compile none of it: no server, no open port, nothing left behind. (The listener binds only to loopback, which is also why the USB tunnel isn’t optional.)

MCP config

Project-scoped, in .mcp.json:

{
  "mcpServers": {
    "app-debug": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "--script", "tools/app-debug-mcp.py"],
      "env": { "APP_DEBUG_OUTPUT": "debug-output" }
    }
  }
}

Dependencies are declared inline in the script (PEP 723 metadata), so on first run uv installs mcp and requests into an isolated environment; no pip install needed. Output lands in debug-output/, the directory set by APP_DEBUG_OUTPUT.

Each session

  1. Run the Debug build on a USB-connected, trusted device.
  2. Run iproxy 9876 9876 in a terminal (from libimobiledevice).
  3. Ask Claude to drive it: list_actions → activate, then screenshot or app_state to see the result. Wrap flows worth keeping in record_start → record_stop (wait for record_start to return {"ok": true}; iOS shows the recording consent prompt first).

A breakpoint in Xcode pauses the process, so in-flight calls time out until you resume. Expected and harmless.

4. Last words

This whole tool came from one “why not?”, and Claude wrote almost all of it — the HTTP server, the registry, the ReplayKit pipeline, concurrency traps and all — while I mostly read diffs. A working debugging tool, in an afternoon. So with the same idea, you can build your own — for your app, in your stack — without much effort.

But don’t mistake “Claude wrote it” for “Claude did it alone.” It wrote the code; I did the rest, and the rest is what made it good:

  • The idea. Claude never proposed giving itself eyes and hands — it built them the moment I pointed it at the goal. Seeing the possibility was mine.
  • The decisions. Registry over coordinate tapping, debug-only over a test target, a thin HTTP bridge over a heavier framework. Claude laid out the trade-offs; the calls were mine.
  • The review. I read the design, challenged the parts that felt wrong, and pushed back until it held up. Claude is fast and capable, not infallible — someone has to keep it honest.

That’s the shape of vibe coding when it works: not handing off the thinking, but doing more of it — about what to build, which way to go, and whether the result is actually right. Claude handles the code now. The judgment is still yours, and it matters more than ever.