Your codebase is the new prompt: the MVP that scales (or turns to junk)

TL;DR

Your codebase is the new prompt. In an MVP built with an AI agent, what decides whether it scales in phases or turns into disposable junk isn’t the stack you picked. It’s whether the agent can still find its way around your repo six months from now. And you solve that in how the code is organized: code by feature, frontend and backend in the same monorepo, decisions recorded in ADRs. Not in the cleverness of the prompt.

The number that anchors this: in a study of coding agent trajectories on real bugs, the attempts that solved the problem touched the same file as the correct patch 93.6% of the time. The ones that failed, 62.7%. Locating the right code is half the game, and being locatable is a property of your architecture, not of the model.

Architecture stopped being the tax you pay to go slow. It became the thing that keeps AI fast.

Junk isn’t what was built fast. It’s what was built blind.

Every technical founder who comes to me shows up with the same fear, and it’s legit: “I need to ship in weeks, but I don’t want to rewrite everything three months from now.” Then comes the belief I want to kill here:

“Architecture is a luxury for people with time. Ship now, fix it later.”

I hear this every week. And I used to agree, on some level, until AI changed the math. Because “fix it later” assumes a forced choice that no longer exists: either you ship fast, or you deliver something well-architected. To accept that choice is to accept that the MVP is born a disposable prototype, and that the “real” version comes later, from scratch.

Hold on. That dichotomy is dead, and AI is what killed it.

Before, good architecture cost time. You drew boundaries, separated responsibilities, wrote docs. Every hour of that was an hour that didn’t become a feature on screen. In an MVP with a deadline of weeks, cutting architecture looked like the rational trade-off. It was. Not anymore.

What changed: the code you generate today, for the most part, no longer goes straight from your head to the editor. It comes out of a coding agent: an AI agent that reads, edits, and runs your repository on its own, operating inside a harness, the platform that plugs the model into the code tooling. Claude Code and Cursor are two harnesses. And that agent has a trait that changes the whole calculation: it’s only as fast as your repo lets it be.

Vibe coding (the whole “ask, accept, deploy” without understanding what came out) is great for a weekend prototype. The problem is the bill, which isn’t linear. A 2025 paper formalized this as the flow-debt trade-off: the fluidity of generating code masks the debt piling up in parallel. Architectural inconsistency, a dependency nobody evaluated, the same problem solved five different ways. Around the sixth month, the cost of undoing the debt overtakes the value of what was built.

It turns into a ball and chain. And the cruel detail: the ball and chain doesn’t just slow down your team. It slows down the very agent that created it. The signals it relies on to find its way (consistent naming, predictable patterns, low coupling) were destroyed by the careless generation itself.

An MVP that turns to junk isn’t the one built fast. It’s the one built BLIND, leaving no trail for the AI or for the human who has to work on it later.

AI reads your repository, not your prompt

There’s a line from Matt Pocock that captures the shift: “your codebase, not your prompt, decides the quality of the AI’s output.” Sounds like an exaggeration. It isn’t.

Look at how Claude Code finds code in a large repository. It doesn’t use semantic search, and it has no magic embeddings index. It does what a senior dev would do: navigates the filesystem, reads a file, and runs grep, the terminal’s old literal text search, to find exactly what it needs. Anthropic chose grep on purpose: embeddings go stale, the repo changes all the time, and a stale index lies.

The consequence is physical, not philosophical: “grep finds strings, not intent.” If the function that matters is called validateToken, the agent finds it on the first try. If the logic is scattered across five files loosely tied by imports, with generic names like handler or process, it digs around, loads too many files, and burns context before it even starts the work.
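To make the grep point concrete, here is a minimal sketch, assuming a hypothetical five-file repo held in memory. The paths and contents are invented for illustration, and the search is plain substring matching: it approximates what a literal grep does, and is not a claim about how any particular agent implements its search.

```python
# Hypothetical mini-repo: paths and contents are invented for illustration.
REPO = {
    "auth/tokens.py": "def validateToken(token):\n    return bool(token)\n",
    "orders/api.py": "def handler(request):\n    return process(request)\n",
    "payments/api.py": "def handler(request):\n    return process(request)\n",
    "refunds/api.py": "def handler(request):\n    return process(request)\n",
    "users/api.py": "def handler(request):\n    return process(request)\n",
}


def literal_search(repo, needle):
    """Return the sorted paths whose text contains `needle`, as a literal grep would."""
    return sorted(path for path, text in repo.items() if needle in text)


specific_hits = literal_search(REPO, "validateToken")
generic_hits = literal_search(REPO, "def handler")

print(specific_hits)      # one candidate file
print(len(generic_hits))  # four candidates the agent must open and read
```

The distinctive name narrows the search to a single file. The generic one returns every file that happens to use the same word, and each of those costs context to rule out.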

And this is where the number that opens this post comes from. Researchers looked at coding agent trajectories on real SWE-bench bugs. The attempts that fixed the bug touched the same file as the correct patch 93.6% of the time. The ones that failed, 62.7%. Translating: the agent’s bottleneck is almost never “knowing how to code.” It’s finding the right snippet. Locating well is what separates the PR that merges from the one that rots.

Organizing by technical layer sabotages exactly that. When everything is controllers/, services/, models/, to touch checkout the agent opens every one of those folders and loads files from another twelve features that live in them. The context window becomes, in the words of an article I read about this, “a junkyard of irrelevant stuff.”
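A rough way to see the cost, as a sketch under loud assumptions: a hypothetical project with three layers and four features, one file per pair, and an agent that ends up reading every file in each folder it has to open. That last assumption is an upper bound, not a measurement of any real agent.

```python
from pathlib import PurePosixPath

# Hypothetical layout: 3 layers x 4 features, one file per pair.
FEATURES = ["checkout", "orders", "payments", "refunds"]
LAYERS = ["controllers", "services", "models"]

by_layer = [f"{layer}/{feature}.py" for layer in LAYERS for feature in FEATURES]
by_feature = [f"{feature}/{layer}.py" for feature in FEATURES for layer in LAYERS]


def files_in_touched_folders(paths, is_relevant):
    """Count every file sitting in a folder that holds at least one relevant file."""
    touched = {PurePosixPath(p).parent for p in paths if is_relevant(p)}
    return sum(1 for p in paths if PurePosixPath(p).parent in touched)


layer_cost = files_in_touched_folders(by_layer, lambda p: p.endswith("/checkout.py"))
feature_cost = files_in_touched_folders(by_feature, lambda p: p.startswith("checkout/"))

print(layer_cost, feature_cost)  # 12 3
```

Same twelve files, same three relevant ones. Only the grouping changed, and with it the amount of unrelated code sitting next to what the task needs.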

And it’s not just the AI that suffers. Technical layering is an old violation of the Single Responsibility Principle (SRP), the first principle of SOLID, which Uncle Bob redefined as “gather together the things that change for the same reasons, and separate those things that change for different reasons.” Organizing by layer does the opposite: it shatters the feature (which changes together) across several folders, and piles into each folder code whose only thing in common is being “a controller.” The fix has a name, and it’s the subject of the next section.

Shouldn’t AI be smart enough to find it on its own?

It’s the question every CTO asks, and the honest answer is: it is, up to a point, and that makes your complacency worse. The agent does find it. It reads 25 files to answer about 3 functions, because without structure it didn’t know which 3 they were. It works, and it charges you in tokens, in time, and in hallucination when the context fills up with noise.

And here I have to be honest, because the simplistic version of this idea (“a bad codebase blocks the AI”) is overblown. It’s not that human and agent get stuck the same way. They have opposite strengths. The AI can brute-force its way through a chaotic repo: business rules scattered across twenty files, it burns a million tokens of context and finds it anyway. A human, in the same repo, would take days, or give up. In that case the AI is better than you.

It’s just that the human has a weapon the AI doesn’t have natively: the IDE. You fire an event with ApplicationEventPublisher in Spring, and IntelliJ shows you every @EventListener that listens to that event, in order, in one click. It’s a semantic index of the entire codebase, for free. The AI doesn’t have that: it falls back on a bunch of greps and on loading file after file into context, and that’s where context rot hits, the degradation of model quality as the window fills.

So the right framing isn’t “AI exposes bad architecture.” It’s: bad architecture charges a different toll from each one. From the human, in time and in IDE dependence. From the AI, in tokens and in context rot. An organized repo lowers the toll for both at once. That’s why the codebase is the new prompt: it is, literally, the context the agent reads before each task, and the cleaner it is, the less it pays to understand you.

Organize by feature, not by layer (and forget the architecture’s name)

The fix is more boring than it sounds, and it’s free: organize code by feature, not by technical layer.

Instead of controllers/, services/, repositories/ (where each feature is shattered across every one of those folders), you make one folder per business capability: orders/, payments/, refunds/, each with its own controller, service, and data access inside. The name for this, in the literature, is vertical slice: a slice that runs from the edge (the request) to the bottom (the database), whole, in the same place. Jimmy Bogard nailed the golden rule: “minimize coupling between slices, and maximize coupling in a slice.”
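The regrouping itself is mechanical. A minimal sketch, assuming hypothetical files named "<feature>_<role>.py" inside layer folders (that naming convention is my assumption, not a rule of vertical slices):

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical layer-first paths; the "<feature>_<role>.py" naming is an assumption.
LAYERED = [
    "controllers/orders_controller.py",
    "services/orders_service.py",
    "repositories/orders_repository.py",
    "controllers/refunds_controller.py",
    "services/refunds_service.py",
    "repositories/refunds_repository.py",
]


def to_feature_path(path):
    """Map 'controllers/orders_controller.py' to 'orders/controller.py'."""
    stem = PurePosixPath(path).stem      # 'orders_controller'
    feature, role = stem.rsplit("_", 1)  # ('orders', 'controller')
    return f"{feature}/{role}.py"


slices = defaultdict(list)
for old_path in LAYERED:
    new_path = to_feature_path(old_path)
    slices[new_path.split("/")[0]].append(new_path)

for feature, files in sorted(slices.items()):
    print(feature, files)
```

Each slice ends up holding its own controller, service, and repository, so a task about refunds starts and usually ends in one folder.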

For the AI, this is attention routing. The agent reads the folder name before opening any file, and infers the scope of the task right away. “Touch the refund” already sends it to refunds/, and everything that matters is placed right there together. Uncle Bob called this Screaming Architecture over ten years ago: the folder structure should scream what the system does, not which framework it uses. In 2011 it was aesthetics. Today it’s performance for whoever’s going to code. And whoever’s going to code is an agent.

Here’s a disarming bit of honesty. In the briefing for this post, someone on the team wrote “use NGC architecture or whatever fits.” I went to look up what “NGC architecture” is. It doesn’t exist. It’s not an established pattern; it’s probably a typo for N-tier, or just an acronym that slipped out. And you know what that proves? That the name matters less than you think. Clean, hexagonal, onion, N-tier: deep down they’re the same idea (business rules at the center, framework and database at the edge) with different vocabulary. What decides whether the agent, and your team, will be able to evolve the code isn’t the architecture’s badge. It’s the discipline of boundaries.

That said, don’t fall into the opposite extreme. Clean Architecture with four layers of abstraction in an MVP is over-engineering; someone compared it to playing Dark Souls: too many rules, too much ceremony, for a product nobody may even want yet. The point isn’t the purest architecture. It’s the most navigable one.

And there’s a trade-off, of course. Organizing by feature generates duplication: two slices validate something similar, three features hit the same table. The instinct is to abstract it all into a shared/, and then shared/ becomes the trash can that couples everyone together again. Sandi Metz has the best rule for this: “duplication is far cheaper than the wrong abstraction.” In an MVP, accepting a bit of copy-paste to keep slices independent almost always beats religious DRY. Shared only for real infra: database client, logging, auth. Never for business rules.
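The “never for business rules” part can be checked by a machine. A minimal sketch, assuming each top-level package is a slice and the slice names below are hypothetical. It uses Python’s standard ast module to flag imports that reach from one slice into another, while ignoring the standard library and shared/:

```python
import ast

# Hypothetical slice names; anything outside this set (stdlib, shared/) is ignored.
SLICES = {"orders", "payments", "refunds"}


def cross_slice_imports(slice_name, source):
    """Return the other slices that `source` imports from, sorted by name."""
    offenders = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            roots = [node.module.split(".")[0]]
        else:
            continue
        offenders.update(root for root in roots if root in SLICES and root != slice_name)
    return sorted(offenders)


# Invented file contents for a module living in refunds/.
refunds_service = (
    "import json\n"
    "from shared.db import client\n"
    "from refunds.rules import max_refund\n"
    "from payments.rules import fee_for\n"
)

print(cross_slice_imports("refunds", refunds_service))  # ['payments']
```

Run over every file in CI, a check like this turns the boundary from a convention into something that fails loudly. Whether a given cross-slice import deserves an exception is still a human call.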

Monorepo and ADRs: stop making the AI (and your team) guess

Organizing inside the project solves half. The other half is what’s between the projects, and that’s where the monorepo comes in.

The idea: frontend and backend in the same repository. Together with the docs folder, the ADRs, the conventions. One history. There’s a line from Francis Dortort that closes the argument: “a repository boundary is a context wall. Every wall degrades the quality of AI-generated output.”

Think about the concrete case. You ask “add a field to the signup form.” In a setup with two separate repos, the agent needs two conversations with no memory of each other, and the contract between frontend and backend drifts along the way. In a monorepo, it’s a single transaction: it adds the column in the database, updates the API, adjusts the UI and the test, in a single context, in a single commit. DB, API, and UI without switching windows. It’s exactly the kind of cross-cutting change an MVP makes all the time.
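One way to picture “the contract doesn’t drift” is a single definition that both sides read. This is a sketch of the idea, not of a stack: the SignupRequest type and its field names are hypothetical, and in a real monorepo the shared contract would live in whatever language or schema format both frontend and backend can consume.

```python
from dataclasses import dataclass, fields


# Hypothetical contract for the signup example; field names are invented.
@dataclass
class SignupRequest:
    email: str
    password: str
    referral_code: str  # the newly added field


def form_field_names():
    """Fields the form should render, derived from the one shared contract."""
    return [f.name for f in fields(SignupRequest)]


def parse_signup(payload):
    """Build the request from a payload; raises TypeError on a missing or unknown field."""
    return SignupRequest(**payload)


payload = {name: "x" for name in form_field_names()}
print(form_field_names())
print(parse_signup(payload))
```

Adding the field in one place changes what the form renders and what the API accepts in the same commit. With two repos, each side keeps its own copy and nothing forces them to agree.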

Tooling? Start simple: pnpm workspaces with Turborepo handles most MVPs with very low friction. Nx when the scaling pain shows up, not before. And the honest trade-off: a monorepo without selective build tooling gives you a slow CI. If every commit rebuilds everything, the bill explodes. It’s a solvable problem, but it’s a problem you take on deliberately.
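“Selective build” boils down to one question: given the files that changed, which packages actually need rebuilding? Here is a minimal sketch of that idea, assuming a hypothetical workspace where files live under packages/<name>/ and the dependency graph is written by hand. It is not how Turborepo or Nx implement it, just the shape of the problem they solve for you.

```python
# Hypothetical workspace: package -> packages it depends on.
DEPENDS_ON = {
    "web": ["ui-kit", "api-client"],
    "api": ["db"],
    "api-client": [],
    "ui-kit": [],
    "db": [],
}


def owning_package(path):
    """Assume files live under packages/<name>/...; return <name> or None."""
    parts = path.split("/")
    return parts[1] if len(parts) > 2 and parts[0] == "packages" else None


def affected_packages(changed_files):
    """Changed packages plus everything that depends on them, directly or transitively."""
    affected = {owning_package(p) for p in changed_files} - {None}
    grew = True
    while grew:
        grew = False
        for package, deps in DEPENDS_ON.items():
            if package not in affected and affected.intersection(deps):
                affected.add(package)
                grew = True
    return sorted(affected)


print(affected_packages(["packages/ui-kit/src/button.tsx"]))  # ['ui-kit', 'web']
print(affected_packages(["packages/db/schema.sql"]))          # ['api', 'db']
```

A change to the button rebuilds the UI kit and the web app, and leaves the API alone. Without this filtering, every commit pays for all five packages.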

The ADR is the other piece, and the most underrated. I already explained ADRs (Architecture Decision Records) in another post: a short, dated record of a technical decision and the why behind it. What changed with AI is the use. Without the ADRs in context, the agent ends up, in the words of an article, “deprived of architectural intent”: it sees the implementation, but not the reasoning. It knows you use Postgres. It doesn’t know why you ruled out Mongo, so it might “improve” your code by reintroducing exactly what you rejected. The ADR, together with a CLAUDE.md or AGENTS.md in the repo, is how you hand over intent on a silver platter, instead of praying it guesses.
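To show how small an ADR can be, here is a sketch of one as a data record rendered to plain text. The field names, the numbering format, the date, and the example wording are all hypothetical. The one field I would insist on is the rejected alternative, since that is exactly what the code cannot tell the agent.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ADR:
    number: int
    title: str
    decided_on: date
    context: str
    decision: str
    rejected: str  # what was ruled out and why: the part code cannot show


def render(adr):
    """Render the record as a short plain-text file an agent can read in one pass."""
    return "\n".join([
        f"ADR-{adr.number:04d}: {adr.title}",
        f"Date: {adr.decided_on.isoformat()}",
        f"Context: {adr.context}",
        f"Decision: {adr.decision}",
        f"Rejected: {adr.rejected}",
    ])


# Placeholder content: fill in your team's actual reasoning.
example = ADR(
    number=1,
    title="Use Postgres as the primary datastore",
    decided_on=date(2025, 1, 15),
    context="<what forced the decision>",
    decision="Postgres for all persistent data.",
    rejected="MongoDB: <the reason your team actually had>",
)

print(render(example))
```

Five lines per decision. If a record needs pages to explain itself, it is probably carrying more than one decision.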

Now the counterweight, because I’m not selling miracles. None of this is magic, and more documents isn’t always better. An ETH Zurich study tested context files and found that an auto-generated AGENTS.md WORSENED the success rate in several scenarios and raised inference cost by more than 20%. And METR, in a controlled study, measured experienced developers getting 19% slower with AI while believing they were faster.

What that tells you: the gain doesn’t come from stuffing the repo with markdown. It comes from recording the non-obvious well: the counterintuitive decision, the gotcha you can’t infer from the code. ADRs and conventions are a scalpel, not a flood. Good context, in Anthropic’s own words, is “the smallest possible set of high-signal tokens,” not the largest pile of tokens.

The MVP that scales is the one AI still understands tomorrow

Put it all together and the picture is simple. The MVP that scales doesn’t have a more expensive stack or a more sophisticated architecture than the MVP that turns to junk. It has boundaries. Code by feature, frontend and backend in the same place, decisions recorded. Three cheap disciplines that, added up, keep an AI agent productive in phase 2, phase 3, phase 4, instead of stuck at month six.

This doesn’t mean building everything. It means cutting the right thing, and what to cut and what to keep in an MVP became a post of its own. Martin Fowler has a technical debt quadrant every founder should know: debt can be deliberate and prudent (“we need to ship now and deal with the consequence later”) or reckless and blind (“we don’t have time for design”). The first is a legitimate business decision. The second is the prototype that’s going to blow up. Junk isn’t having debt. It’s not knowing you have it.

And what to cut first? Premature scale. Startup Genome looked at more than three thousand startups and found that 74% of the ones that died did so from scaling too early: optimization, microservices, distributed infra for a load that didn’t exist. Microservices in an MVP is the perfect example of reckless debt disguised as good engineering. Start with a modular monolith with clean boundaries. The boundary is what makes the next phase an extraction, not a demolition.

It’s the same pattern I wrote about when code review became the bottleneck: AI sped up the individual, and the part that didn’t keep up became the brake. With architecture it’s the same, only earlier: the disorganized repo is the bottleneck you plant on day one and only feel on day one hundred and eighty.

Your MVP doesn’t need to be perfect to scale. It needs to be readable. The code AI still understands six months from now is the code that doesn’t turn to junk. The rest is a rewrite waiting for its date.