The 200-Line Rule That Changed How I Ship Code with AI
When I first let AI agents write code autonomously, the pull requests were enormous.
Eight hundred lines. Twelve hundred lines. One hit two thousand.
Reviews became theater. Nobody reads a two-thousand-line diff carefully. You scroll, you skim, you approve. Bugs hide in the noise. Merge conflicts compound. And the whole promise of AI-accelerated development starts to feel like AI-accelerated chaos.
So I added one constraint. Nothing over two hundred lines.
That single rule changed everything about how my team ships software.
The Problem With Big PRs
There is a well-documented relationship between pull request size and review quality. The larger the diff, the less attention each line receives. At two hundred lines, a reviewer can hold the full context in their head. At eight hundred, they are pattern-matching. At two thousand, they are pretending.
This is true when humans write the code. It becomes dangerous when agents do.
An AI agent can produce a thousand lines of coherent, well-structured code in minutes. It looks right. It passes linting. It might even have tests. But the subtle issues, the missed edge cases, and the assumptions baked into the architecture only surface under careful review. And careful review does not happen on massive diffs.
The worst part is the false confidence. A clean CI run on a large PR feels like validation. It is not. It is a green checkmark on a surface-level scan.
How the 200-Line Constraint Works
My agent orchestrator sits between the task management system and the coding agents. When a feature comes in, the orchestrator decomposes it into sub-issues. Each is scoped to a single concern, and each gets its own agent run in a sandboxed git worktree.
A feature that would have been one massive PR becomes a stack of five to eight focused pull requests. Each under two hundred lines. Each reviewable in the time it takes to drink a coffee.
The stack lands in dependency order using Graphite. No manual rebasing. No merge queue headaches. The first PR in the stack is the foundation, and each subsequent one builds on it cleanly.
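The decomposition can be modeled as a small dependency graph, and landing order falls out of a topological sort. Here is a minimal sketch; the `SubIssue` shape and `stackOrder` helper are illustrative, not the actual orchestrator's API:

```typescript
// Hypothetical record the orchestrator produces for each sub-issue.
interface SubIssue {
  id: string;
  title: string;
  dependsOn: string[]; // ids that must merge first
}

// Return sub-issue ids in dependency order (a simple topological sort),
// which is the order their PRs land in the stack.
function stackOrder(issues: SubIssue[]): string[] {
  const done = new Set<string>();
  const order: string[] = [];
  while (order.length < issues.length) {
    const ready = issues.find(
      (i) => !done.has(i.id) && i.dependsOn.every((d) => done.has(d))
    );
    if (!ready) throw new Error("dependency cycle in decomposition");
    done.add(ready.id);
    order.push(ready.id);
  }
  return order;
}
```

The foundation PR has no dependencies, so it always sorts first; a cycle means the decomposition itself is wrong and needs rethinking before any agent runs.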
Here is what a typical decomposition looks like for a feature like adding event registration to a portal.
The first PR creates the API client functions and TypeScript types. The second adds the React hooks that consume those functions. The third wires the hooks into the existing components. The fourth adds the registration button with optimistic UI updates. The fifth handles error states and edge cases.
Each PR is independently reviewable. Each is independently revertible. If the error handling PR introduces a regression, you revert that one PR without touching the foundation.
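To make the first PR in that stack concrete, here is a hedged sketch of what the API client slice might contain. Every name here, the endpoint path, the `EventRegistration` fields, is a hypothetical illustration, not code from the actual portal:

```typescript
// Hypothetical types for PR #1: the API surface and nothing else.
interface EventRegistration {
  eventId: string;
  userId: string;
  status: "registered" | "waitlisted" | "cancelled";
}

interface HttpRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Build the request as plain data so it stays unit-testable without a
// network; PR #2's hooks would hand this to fetch.
function registerRequest(eventId: string, userId: string): HttpRequest {
  return {
    url: `/api/events/${eventId}/registrations`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ userId }),
  };
}
```

A slice this thin reviews in minutes, and everything above it in the stack types against these definitions.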
The Constraint Forces Better Planning
This was the unexpected benefit.
You cannot hand an agent a vague ticket and expect a clean two-hundred-line PR for a new feature. The math does not work. A vague ticket produces vague code, and vague code sprawls.
The decomposition step is where the actual engineering happens. Breaking a feature into five sub-issues that each stand alone, that each have clear inputs and outputs, that each fit within two hundred lines. That requires understanding the architecture. That requires knowing which abstractions exist and which need to be created. That requires thinking before coding.
The irony is that the most valuable part of AI-assisted development is not the coding. It is the planning that the coding constraint forces you to do.
What Changes When PRs Are Small
Review quality goes up dramatically. A reviewer can actually read every line. They catch naming inconsistencies, missing error handling, and architectural drift that would vanish in a large diff.
Merge conflicts nearly disappear. When five agents are working in parallel on different concerns in separate worktrees, their changes rarely overlap. When they do, the conflict is small enough to resolve in seconds.
Deployment risk drops. Each PR is a small, reversible change. If something breaks in production, you know exactly which of the five PRs caused it. Rollback is surgical, not scorched earth.
And the agents themselves perform better. A tightly scoped task with clear acceptance criteria produces better output than an open-ended feature request. The agent does not need to make architectural decisions. Those were made during decomposition. It just needs to execute within a well-defined box.
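What that well-defined box might look like as data, sketched as a hypothetical task brief rather than the author's actual schema:

```typescript
// Hypothetical shape of the brief handed to an agent. The architectural
// decisions were made during decomposition; the agent only fills the box.
interface AgentTask {
  title: string;
  filesInScope: string[]; // the agent may only touch these
  acceptanceCriteria: string[]; // each must be checkable in review
  maxChangedLines: number; // the 200-line ceiling
}

// A task is ready to hand off only when its scope is fully pinned down.
function isWellScoped(t: AgentTask): boolean {
  return (
    t.filesInScope.length > 0 &&
    t.acceptanceCriteria.length > 0 &&
    t.maxChangedLines <= 200
  );
}
```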
The Rule Applies to Humans Too
I started this constraint for AI agents, but I now enforce it for my human engineers as well.
The conversations are different. Instead of reviewing a PR and saying "this is too big, can you split it," we now plan the splits before any code is written. The Linear issues are scoped to PR-sized chunks from the start.
Engineers who resisted at first now prefer it. The feedback loop is faster. PRs get reviewed in hours instead of days. The dopamine hit of merging comes more frequently. And the codebase stays healthier because every change is small enough to understand.
How to Start
If you are working with AI coding agents, add the constraint today. Two hundred lines is not a hard ceiling for every situation, but it is the right default.
Start with your task decomposition. Look at your next feature and ask how you would break it into pieces that each fit in two hundred lines. If you cannot, the feature is not well enough understood to hand to an agent.
Then look at your merge strategy. Graphite handles stacked PRs beautifully. If you are not using it, you are rebasing manually, and that tax adds up fast.
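The ceiling itself can be enforced mechanically in CI. A minimal sketch: parse the output of `git diff --numstat` (tab-separated added, deleted, path, with `-` for binary files) and fail any build over 200 changed lines. The threshold and base branch in the comment are assumptions to adjust:

```typescript
// Total added + deleted lines from `git diff --numstat` output.
// Binary files report "-" for both counts and are skipped.
function countChangedLines(numstat: string): number {
  return numstat
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .reduce((total, line) => {
      const [added, deleted] = line.split("\t");
      const a = added === "-" ? 0 : parseInt(added, 10);
      const d = deleted === "-" ? 0 : parseInt(deleted, 10);
      return total + a + d;
    }, 0);
}

// In a CI step (sketch; the base branch is an assumption):
//   import { execSync } from "node:child_process";
//   const diff = execSync("git diff --numstat origin/main...HEAD").toString();
//   if (countChangedLines(diff) > 200) {
//     console.error("PR exceeds the 200-line rule");
//     process.exit(1);
//   }
```

Counting both additions and deletions keeps the rule honest; a 150-line addition that also rewrites 150 existing lines is not a small PR.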
The agents do not need to be brilliant. They need clear, small scope.
Turns out that is true for every engineer on your team too.

