    Startups & Tech · 1 Jun 2026 · 10 min read

    Maestro + Claude Code: your app tested in the simulator like Playwright tests the web

    You can have Claude navigate and test your app in the simulator like Playwright does on the web. With Maestro: one YAML, iOS and Android, zero instrumentation.

    TL;DR

    Claude Code already navigates your site on its own through Playwright: it clicks, fills, validates regressions. For mobile apps you can do the same thing, but nobody really explains how. I went digging. The answer is Maestro, an open source mobile E2E testing framework with flows written in YAML, plugged into Claude Code. A single test file runs the same on iOS and Android, on top of the compiled binary, without instrumenting the app. React Native, native, or Flutter, doesn’t matter. Claude inspects the screen, writes the flow, runs it, and fixes what breaks. And no: the right path is NOT “giving Claude access to the screen”. Screenshot by coordinate is the last resort, not the first.

    This post is the setup I put together to close on mobile the gap that Playwright already closed for the web. Here at Nextside it’s not a production pipeline yet. It’s the path I’m adopting, with the engineering broken down, the commands in hand, and the numbers from people who already walked it.

    It’s not “giving access to the screen”. It’s reading the tree.

    When I bring this up, the same question always comes up, and I asked it myself at first: “can’t Claude just look at the screen and tap, like a human?” It can. It’s called Computer Use: Claude controls the interface through screenshots and clicks on pixel coordinates. It launched in Claude Code in March 2026, it drives the simulator, and it works for a demo.

    But it’s the wrong way for testing.

    The Playwright you already use never looked at a single pixel. It reads the accessibility tree, the structured tree that describes “button labeled Sign in, here”. It acts by element, not by coordinate. That’s why it’s fast and doesn’t hallucinate where to click.

    The difference is measurable in tokens: the accessibility tree of one screen comes in at around 10 tokens, and a screenshot of the same screen costs 1,600 to 6,300. On a twenty-screen test, even a single screenshot per screen adds up to 32,000 to 126,000 tokens, and a real test takes more than one step per screen. That is why vision doesn’t scale in a QA loop.

    In the end there are three ways to make Claude touch the app, from best to worst:

    Anthropic itself orders it this way. The Claude Code tool hierarchy is MCP first, then shell, then Chrome, and it only falls to screen control when nothing else reaches: “native apps, simulators, and tools without an API”.

    Screenshot is the last resort, not the first.

    Maestro: one YAML, iOS and Android, zero instrumentation

    If the right way is to read the tree, I need a tool that exposes the simulator’s tree to Claude. There are several. I settled on Maestro, and for anyone maintaining React Native and native apps, it wins on three concrete counts:

    A Maestro flow starts simple like this:

    appId: com.yourcompany.app
    ---
    - launchApp
    - tapOn: { id: "login_button" }
    - inputText: "user@nextside.tech"
    - tapOn: "Sign in"
    - assertVisible: "Welcome"

    appId, three dashes, and the commands in almost natural language: launchApp, tapOn, inputText, assertVisible. Anyone who’s never seen it gets it in ten seconds.

    Where it gets serious is reuse. The login repeats in every test, so you pull it out once and call it with runFlow:

    # flows/login.yaml
    appId: com.yourcompany.app
    ---
    - launchApp: { clearState: true }
    - tapOn: { id: "login_button" }
    - inputText: "user@nextside.tech"
    - tapOn: "Sign in"

    # flows/checkout.yaml
    appId: com.yourcompany.app
    ---
    - runFlow: login.yaml          # reuses the whole login
    - tapOn: { id: "product_42" }
    - scrollUntilVisible:
        element: { text: "Checkout" }
    - tapOn: "Checkout"
    - assertVisible: "Order confirmed"

    Change the login rule in one place and it holds across the twenty tests that call it. Notice the scrollUntilVisible and the clearState: true: Maestro has commands to scroll until an element is found, clear app state, change permissions, and set the location. It also waits for the element to show up on its own, without you scattering sleep calls across the test. Sleep is a smell of a badly written test; here you don’t need it.
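
    The state, permission, and location commands mentioned above can be sketched in a single flow. This is a minimal example, not taken from a real project: the file name, the "Stores near you" text, the coordinates, and the 10-second timeout are all placeholders.

    ```yaml
    # flows/nearby_stores.yaml (hypothetical flow; names and values are placeholders)
    appId: com.yourcompany.app
    ---
    - launchApp:
        clearState: true            # wipe app data before launching
        permissions:
          location: allow           # grant location up front, no system dialog to tap
    - setLocation:                  # mock the device's GPS position
        latitude: -23.5505
        longitude: -46.6333
    - extendedWaitUntil:            # explicit wait for screens slower than the default
        visible: "Stores near you"
        timeout: 10000              # milliseconds
    - assertVisible: "Stores near you"
    ```

    The extendedWaitUntil step is still not a sleep: it returns as soon as the element appears and only fails once the timeout runs out.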

    Same file. iOS and Android. Without touching the app’s code.

    From zero to the first test

    The real “how to use it” starts before Claude. You need three things on the machine:

    With that in place, install Maestro in one command:

    curl -fsSL "https://get.maestro.mobile.dev" | bash
    # or, on macOS, via Homebrew:
    # brew install mobile-dev-inc/tap/maestro
    maestro --help   # confirms it's alive

    Boot a simulator (or emulator), install your app on it, and run the flow:

    maestro test flows/checkout.yaml     # one flow
    maestro test flows/                  # the whole folder

    That alone already gives you E2E tests running locally, with no AI at all. AI comes in so you stop writing these YAMLs by hand.
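
    One detail when running the whole folder: as far as I can tell, maestro test flows/ picks up every flow file at the top level of the folder, so the reusable login.yaml would also run as a test of its own. A hedged sketch of one way around that is to tag the helper flow. The tag name util is my own choice here, not a Maestro convention.

    ```yaml
    # flows/login.yaml: same helper as before, now tagged so it can be filtered out
    appId: com.yourcompany.app
    tags:
      - util                        # hypothetical tag name
    ---
    - launchApp: { clearState: true }
    - tapOn: { id: "login_button" }
    - inputText: "user@nextside.tech"
    - tapOn: "Sign in"
    ```

    With the tag in place, maestro test --exclude-tags=util flows/ should skip the helper as a standalone test while checkout.yaml still calls it through runFlow.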

    The loop in practice: Claude writes the test looking at the app

    Connect Maestro to Claude Code in one command:

    claude mcp add maestro -- maestro mcp

    That hands Claude a handful of tools: inspect_screen (grabs the screen’s view hierarchy as compact JSON), run (executes a flow), and open_maestro_viewer (embeds the simulator in a window where you watch each command run in real time).

    The loop this unlocks changes the game:

    1. Claude inspects the screen live. It reads the tree, it doesn’t guess.
    2. Claude writes the flow YAML, without you hunting for element IDs by hand.
    3. Claude runs it on the simulator.
    4. Claude diagnoses what failed by looking at the hierarchy, and fixes the test itself.

    Step 4 is the one that saves the most sanity. When a tapOn: "Sign in" breaks because the button became “Log in” in a refactor, the manual flow is: test fails in CI, someone opens it, finds out, fixes the selector, pushes again. With the loop, Claude rereads the hierarchy, sees the label changed, switches to the stable id, and shows you the diff. You approve it or not. Maestro calls this self-healing. It’s test maintenance, the most tedious part of QA, coming off your back.

    In React Native, what makes this loop reliable is the testID. The one you already put on your components becomes Maestro’s id directly:

    <Button title="Sign in" testID="login_button" onPress={onLogin} />

    Prefer testID over text, always. Text changes with translation and with copy revisions. The testID only changes if you change it on purpose. And when you don’t know which selector exists on a screen, maestro studio opens a visual inspector in the browser: you click the element, it shows the available selectors and generates the YAML for the step. That’s how you teach Claude to aim at the right places in your app.
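
    As a sketch of what "prefer testID" looks like in the flow itself, here is the login flow with every selector switched from text to id. Only login_button comes from the component above; email_input, submit_button, and welcome_banner are hypothetical testIDs you would have to add to your own components.

    ```yaml
    # flows/login.yaml, selecting by testID instead of visible text
    # (email_input, submit_button, welcome_banner are hypothetical testIDs)
    appId: com.yourcompany.app
    ---
    - launchApp: { clearState: true }
    - tapOn: { id: "login_button" }
    - tapOn: { id: "email_input" }          # focus the field before typing
    - inputText: "user@nextside.tech"
    - tapOn: { id: "submit_button" }        # was: tapOn: "Sign in"
    - assertVisible: { id: "welcome_banner" }
    ```

    A copy change from “Sign in” to “Log in”, or a translated build, leaves this version untouched.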

    MCP or Skill+CLI: which to use?

    Both work. The choice is about context. The MCP is plug-and-play: one command and Claude has the tools. The price is that every MCP server loads its tools’ schemas into the model’s context, and that eats tokens every session.

    The alternative is a Skill that teaches Claude to run maestro test flow.yaml straight in the terminal. Leaner, because you don’t pay the server overhead. The community itself is migrating from MCP to Skill+CLI for this reason. My rule: I start on the MCP to explore and prototype fast. Once the flow becomes routine, I wrap it in a Skill with the CLI and drop the server.

    The iOS toll (the part nobody posts)

    Now the honest part, because selling this as magic is a disservice.

    First: AI-generated tests get it right 70 to 80% of the time on the first pass. Claude picks the wrong selector or forgets a wait. The flow that works is letting the AI generate v1, running it once to validate, and handing maintenance back to it. It’s not “send it and forget it”.

    Second, and heavy for anyone in mobile: iOS charges a toll. A dev documented setting up the same QA on both platforms. Android took 90 minutes, iOS went past six hours. His line sums up the whole decade of mobile automation: “Android hands you a WebSocket and says: here’s the app, do whatever you want. iOS hands you a locked door and a note asking you to use Xcode.”

    The good news is that Maestro abstracts away a good chunk of that toll: it’s the same tapOn on both. But there are two stones you’ll still step on in React Native:

    “You’re going to let a bot write and run the app’s tests? This is going to go wrong.”

    It’ll go wrong if you treat the generated test as truth and walk away. It won’t if you treat it as a draft the senior reviews, just like you already do (or should do) with code the AI writes. Maestro still hands you the versioned YAML: you can read it in the PR, disagree, fix it. The test is still yours. Claude just stopped making you type it from scratch.

    From a loose test to a routine

    A test you run by hand when you remember isn’t a safety net. It’s theater. The real gain shows up when the flow becomes an automatic routine. Since Maestro is just a command-line binary, it goes anywhere that runs a shell:

    maestro test flows/    # runs the whole suite; exits with an error code if it breaks

    That maestro test flows/ is the same line you run locally, in GitHub Actions on every PR, or in a nightly cron. The dev from the iOS-versus-Android case above left the suite running as a scheduled task every morning at 8:47: it boots both simulators, sweeps the screens, analyzes, and files a report on whatever looks broken. The dev wakes up with QA already done.
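
    For the GitHub Actions case, a workflow could look roughly like the sketch below. It is Android only and rests on several assumptions: that an earlier step or job has produced an app.apk at the repository root, that the third-party reactivecircus/android-emulator-runner action is acceptable in your pipeline, and that API level 34 suits your app. The schedule mirrors the 8:47 from the example, in UTC.

    ```yaml
    # .github/workflows/e2e.yml (sketch; Android only, values are placeholders)
    name: e2e
    on:
      pull_request:
      schedule:
        - cron: "47 8 * * *"        # 08:47 UTC; GitHub cron schedules run in UTC
    jobs:
      maestro:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          # assumption: a previous step or job has built ./app.apk
          - name: Install Maestro
            run: |
              curl -fsSL "https://get.maestro.mobile.dev" | bash
              echo "$HOME/.maestro/bin" >> "$GITHUB_PATH"
          - name: Run the suite on an emulator
            uses: reactivecircus/android-emulator-runner@v2   # third-party action
            with:
              api-level: 34
              arch: x86_64
              script: |
                adb install app.apk
                maestro test flows/
    ```

    Because maestro test exits with an error code when a flow breaks, the job fails and the PR check goes red with no extra wiring. The iOS side needs a macOS runner with a booted simulator, which this sketch does not cover.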

    The cycle closes here. Claude writes the flow looking at the app, the flow becomes a versioned file, the file runs in CI. The AI builds the net, the machine pulls it every night.

    The AI writes the code and the test. You still decide what “works” means.

    We already talked here about the AI reviewing code but not testing software. Still true, with a new asterisk: now it TESTS, in the simulator, navigating the app like a user would. What it doesn’t do is decide what counts as “worked”.

    That judgment is yours. The acceptance criteria are yours. Maestro and Claude take the tedious part off your hands: booting the simulator, hunting for the button’s ID, typing the flow, running on both systems, fixing the selector that changed. They give back the time for the one thing the machine doesn’t do: looking at the app and deciding if it’s good.

    A good tool doesn’t replace judgment. It just removes the excuse of not having tested.