AI Agents Are an Infinite Queue of Fresh CS Grads with Eidetic Recall of Stack Overflow That You Shoot in the Head After They Complete Their First Ticket

Imagine running a dev shop where every morning, a new hire walks through the door. You sit them at a terminal with your codebase, documentation, and a cup of coffee. They've never seen your code before, but they've memorized every Stack Overflow answer ever written. They start typing immediately, generating solutions with absolute confidence. The code gets more elaborate as they go, factory patterns multiplying, abstractions nesting deeper, their performance degrading with each sip and each line of code. When the coffee runs out or they finish their task (whichever comes first), you shoot them in the head, wheel the chair out, and bring in the next one. Fresh coffee, fresh grad, same terminal; you don't even clean the blood off the keyboard anymore.

This is essentially how agentic LLM development works. LLMs don't function like traditional code generators -- they're better thought of as an infinitely scalable team of first-week junior developers: eager, educated, zero context, and summarily executed before they can learn from their mistakes.

I've been writing software in one form or another for 15+ years, and I'll be the first to admit that diving headlong into greenfield projects with AI tools has helped me overcome the zero-start problem -- getting from nothing to something, to the point where iteration becomes fun. It's brought back some of the joy I felt writing games and robot controllers when I first started coding at twelve, something that's been missing from the last several years of my professional practice.

But it's also had me making mistakes I know better than to make, getting lost down rabbit holes I'd never explore if I actually had to write the code myself.

Seconds to generate, years to maintain

Code generation isn't new -- we've been writing generators, templates, and macros for decades. What's different now is the speed, scope, and accessibility. You don't need to learn template syntax or wrestle with configuration. You just type. Generators went from a tool with real learning curves and setup costs to something every junior can use on day one.

You can generate an entire service architecture in minutes without understanding what you're building or why.

GitClear's 2025 analysis of 211 million changed lines of code found an eightfold increase in duplicated code blocks during 2024, as AI assistant use climbed[1] -- a pattern that is likely still directionally true even as models continue to evolve. More concerning: "moved" code (a signal of refactoring and consolidation) dropped from 25% of changed lines in 2021 to under 10% in 2024. When you can create a new implementation in seconds, why spend minutes understanding the old one?

The METR study drives this home[2]: experienced open-source developers working on million-line codebases were 19% slower when using AI tools. These developers averaged 5 years of experience and 1,500 commits on their repositories. They know their codebases intimately -- the AI doesn't.

Google's 2024 DORA report found something even more troubling: a 25% increase in AI adoption correlates with an estimated 7.2% decrease in delivery stability and a 1.5% decrease in delivery throughput[3]. The report actually found AI makes all four key metrics worse -- the first technology they've studied to achieve this dubious honor. Lead time increases, deployment frequency stagnates, recovery time lengthens, change failure rate (CFR) rises. Yet more than three-quarters of developers rely on AI daily, even though nearly 40% report little or no trust in its output.

Why do we keep adopting something that empirically makes things worse? Because writing code feels productive. Shipping features looks fast. The costs -- debugging, stability, maintenance -- come later, after the quarterly review. We're optimizing for the visible (code written) over the valuable (working systems). Classic productivity theater, now with AI.

Cargo culting at 100 tokens per second

We've always had copy-paste programming. The difference is that AI-generated code looks good. Proper error handling, consistent style, appropriate patterns. It passes the sniff test in code review.

But it's still copy-paste. The explosion in duplicate code isn't a bug -- it's what happens when you optimize for generation speed over understanding. AI can't tell the difference between essential and accidental complexity. It reproduces Factory patterns and Service Layers because they're statistically common in training data, not because your problem needs them.

And since the generated code looks professional, it gets committed. Then it becomes training data. Then the next generation of models learns that this is what code should look like.

Junior developers are learning the wrong lessons. They see AI generate these complex architectures and think that's what professional code looks like. So they prompt for "enterprise-grade" solutions to problems that could be solved in 20 lines. The AI happily obliges. The cycle continues. Harness's State of Software Delivery 2025 report found that most developers surveyed now spend more time debugging AI-generated code[4].

What Actually Works

Be explicit about simplicity. Don't ask for "production-ready." Ask for "the simplest possible implementation that does X." You can add complexity when you actually need it (you won't).

Design in types, implement with AI. Write your contracts first, then generate implementations. When the types are locked down, AI can't smuggle in architectural decisions through the back door.
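A minimal sketch of what "types first" might look like in TypeScript, the language the post names later on. The human writes and owns the contract; the AI is only asked to fill in a function that satisfies it. The names `RateLimiter`, `RateLimitDecision`, and `createFixedWindowLimiter` are hypothetical, chosen for illustration.

```typescript
// Contract: written and owned by a human. Changes here need senior review.
export interface RateLimitDecision {
  allowed: boolean;
  remaining: number; // requests left in the current window
}

export interface RateLimiter {
  // Decide whether `key` may make a request at time `nowMs` (milliseconds).
  check(key: string, nowMs: number): RateLimitDecision;
}

// Implementation: the part you might hand to an AI. It can only "win" by
// satisfying the interface above; it cannot change what callers see.
export function createFixedWindowLimiter(limit: number, windowMs: number): RateLimiter {
  // key -> start time of the current window and requests counted in it
  const windows = new Map<string, { start: number; count: number }>();
  return {
    check(key, nowMs) {
      let w = windows.get(key);
      if (w === undefined || nowMs - w.start >= windowMs) {
        // No window yet, or the old one expired: start a fresh window.
        w = { start: nowMs, count: 0 };
        windows.set(key, w);
      }
      if (w.count >= limit) {
        return { allowed: false, remaining: 0 };
      }
      w.count += 1;
      return { allowed: true, remaining: limit - w.count };
    },
  };
}
```

If a generated change wants a different return shape, it has to edit `RateLimitDecision`, and that diff lands in the contract rather than hiding inside an implementation file.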

Review for necessity, not just bugs. Every abstraction should solve a problem you have today, not one you might have tomorrow. If someone says "future flexibility," delete it.

Keep AI on a short leash. Narrow, specific tasks like "parse this JSON according to this schema" work well. "Design a data processing system" is asking for trouble -- that's architecture work, not implementation. I use automation to maintain context files with current task details and interface definitions, ensuring the AI always works within established boundaries.
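The post doesn't show its context automation, so here is a hypothetical sketch of the idea: collect exported interface and type declarations from contract files and put the current task on top, producing one document to hand the AI at the start of each session. `extractContracts`, `buildContext`, and the heading layout are assumptions. The regex-and-brace-counting extraction only handles single-line `export type` aliases and brace-delimited `export interface` blocks; a real tool would more likely use the TypeScript compiler API.

```typescript
// Hypothetical context-file builder: current task + contract declarations.
// Naive extraction (not a TypeScript parser): single-line `export type`
// aliases and brace-balanced `export interface` blocks.
function extractContracts(source: string): string[] {
  const lines = source.split("\n");
  const found: string[] = [];
  for (let i = 0; i < lines.length; i++) {
    if (/^export type \w+/.test(lines[i])) {
      found.push(lines[i].trim());
    } else if (/^export interface \w+/.test(lines[i])) {
      const block: string[] = [];
      let depth = 0;
      let opened = false;
      let j = i;
      for (; j < lines.length; j++) {
        block.push(lines[j]);
        // Count braces so nested object types stay inside the block.
        for (let k = 0; k < lines[j].length; k++) {
          if (lines[j][k] === "{") {
            depth++;
            opened = true;
          } else if (lines[j][k] === "}") {
            depth--;
          }
        }
        if (opened && depth === 0) break;
      }
      found.push(block.join("\n"));
      i = j; // resume scanning after the interface block
    }
  }
  return found;
}

// Builds the document handed to the AI at the start of a session.
function buildContext(task: string, files: Record<string, string>): string {
  const sections: string[] = [`# Current task\n\n${task}\n`, "# Contracts (do not modify)\n"];
  const paths = Object.keys(files).sort();
  for (const path of paths) {
    const contracts = extractContracts(files[path]);
    if (contracts.length === 0) continue; // skip files that declare no contracts
    sections.push(`## ${path}\n\n${contracts.join("\n\n")}\n`);
  }
  return sections.join("\n");
}
```

Reading the files, writing the result somewhere like a `CONTEXT.md`, and triggering it from a task-runner recipe are left out; those names are placeholders for whatever your project uses.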

Enforce standards with tools, not prompts. Don't waste time on elaborate prompt guardrails -- they don't work. Configure your build system to detect when AI tries to modify interfaces versus implementations. The compiler, linter, and pre-commit hooks are your real defenses. Enforcement beats instruction.

The Rule of Three still applies. Don't abstract until you've seen the pattern three times. AI will reach for abstractions because that's what it's seen. That doesn't mean you need them.

Lessons from the Last Generation (of Generators)

Developers who've been using code generators for years (things like Rails scaffolding, T4 templates, even Lisp macros) learned some expensive lessons that we're ignoring with AI.

The successful pattern was always clear boundaries. Either never touch generated code (regenerate from source) or generate once and own it forever. The disasters came from mixing. With AI, we're constantly blending generated and handwritten code with no system for tracking which is which.

But here's what's different: AI generation is stochastic. You can't reliably regenerate the same code twice. There's no template to fix, no source to regenerate from -- just a prompt that might produce completely different code tomorrow. This makes every piece of AI-generated code a one-way door. The moment it's merged, it becomes legacy code you own forever, complete with whatever architectural fever dream the model had that day.

They also learned to generate the boring, not the interesting. Rails scaffolding gives you CRUD, not your domain logic. ORMs generate data access, not business rules. We're often doing the opposite -- asking AI to architect our systems while we manually tweak formatting.

But there's a deeper problem with applying these lessons to AI: we're not really dealing with a code generator at all.

AI agents are an infinite stack of junior devs, not code generators

This stochastic nature forced me to reconsider my mental model -- AI agents aren't really like code generators at all. They're more like an arbitrarily scalable team of first-week junior developers: eager, educated, zero context, and they are summarily executed after their first PR.

This mental model has been useful because it explains the failure modes. The unnecessary abstractions and enterprise patterns look like "good code" from a CS curriculum. The massive code duplication happens because each junior can't remember what their colleague (also them, five minutes ago) already wrote.

We're in the Uber-for-code phase. Just as rideshare companies burned billions of VC dollars subsidizing rides to capture market share, AI companies are absorbing compute costs to make code generation feel free. This distortion makes complexity cheap at the point of creation. When the subsidies end and true costs emerge, teams will be stuck with the technical debt they accumulated during the gold rush.

Traditional organizations already know how to manage thousands of juniors who don't talk to each other -- they enforce narrow task boundaries. "Implement validateEmail according to this spec." Never "design the authentication system." They define specifications first -- seniors write contracts, juniors implement. Sound familiar? It's exactly why type-driven development works with AI.
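One way to hand a junior (human or model) a narrow ticket like `validateEmail` is to ship the spec as a function type plus an executable table of cases, so the implementation is the only open question. The rules encoded below are hypothetical and deliberately simple (they are not RFC 5322), and `EMAIL_SPEC` and `checkAgainstSpec` are illustrative names.

```typescript
// Spec, written by a senior: signature + cases the implementation must pass.
// Hypothetical rules: exactly one "@", non-empty local part, a domain with at
// least one dot and no empty labels, and no whitespace anywhere.
type ValidateEmail = (input: string) => boolean;

const EMAIL_SPEC: ReadonlyArray<[string, boolean]> = [
  ["ada@example.com", true],
  ["a.b+tag@sub.example.org", true],
  ["no-at-sign.example.com", false],
  ["two@@example.com", false],
  ["@example.com", false],
  ["user@localhost", false],
  ["user@example..com", false],
  ["us er@example.com", false],
];

// Implementation, the junior's ticket: anything that passes the spec is acceptable.
const validateEmail: ValidateEmail = (input) => {
  if (/\s/.test(input)) return false;
  const parts = input.split("@");
  if (parts.length !== 2) return false; // zero or multiple "@"
  const [local, domain] = parts;
  if (local.length === 0) return false;
  const labels = domain.split(".");
  return labels.length >= 2 && labels.every((label) => label.length > 0);
};

// Returns the inputs whose result disagrees with the spec (empty = conforms).
function checkAgainstSpec(impl: ValidateEmail): string[] {
  return EMAIL_SPEC.filter(([input, expected]) => impl(input) !== expected).map(([input]) => input);
}
```

Because the cases are data, the same table can run as a unit test in CI, and a generated implementation that drifts from the spec fails it no matter how plausible the code looks.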

The crucial difference from real juniors: they learn from code review. AI never learns. Every conversation is a new junior who needs the same corrections. This is why prompt engineering fails where lint rules succeed -- you're not teaching, you're enforcing.

So we adapt the boundary patterns from code generation, but differently. Instead of "never edit generated code," it's "detect when AI edits interfaces." Your build system becomes an audit system:

  • Core interface changes trigger escalated review
  • Type modifications fail builds without explicit approval
  • Implementation details are fair game
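The three rules above could be sketched as a small gate that a pre-commit hook or CI step runs over the list of changed files. The path patterns (`src/contracts/`, `src/types/`, `*.d.ts`), the tier names, and the `approved` flag are all hypothetical; how approval gets recorded (a PR label, a commit trailer, an environment variable) depends on your process.

```typescript
// Hypothetical audit gate: classify changed paths and decide whether to block.
type Tier = "core-interface" | "type-definition" | "implementation";

// Assumed repository layout; first match wins, so order matters.
const TIER_RULES: ReadonlyArray<[RegExp, Tier]> = [
  [/^src\/contracts\//, "core-interface"],
  [/\.d\.ts$/, "type-definition"],
  [/^src\/types\//, "type-definition"],
];

function classify(path: string): Tier {
  for (const [pattern, tier] of TIER_RULES) {
    if (pattern.test(path)) return tier;
  }
  return "implementation"; // everything else is fair game
}

interface AuditResult {
  fail: boolean;      // true when type changes lack explicit approval
  escalate: string[]; // core interface changes: route to escalated review
  blocked: string[];  // type changes that fail the build without approval
}

// `diffNameOnly` is the text output of `git diff --cached --name-only`.
function audit(diffNameOnly: string, approved: boolean): AuditResult {
  const paths = diffNameOnly
    .split("\n")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const escalate = paths.filter((p) => classify(p) === "core-interface");
  const blocked = approved ? [] : paths.filter((p) => classify(p) === "type-definition");
  return { fail: blocked.length > 0, escalate, blocked };
}
```

A hook script would feed it the staged file list, print `escalate` for reviewers, and exit non-zero when `fail` is true.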

Google's billion-line migration succeeded using this principle[5] -- they used AI to generate changes that then passed through their existing static analysis infrastructure (Kythe, Code Search, ClangMR) before landing. They didn't let AI architect the migration; they used it as a tireless junior developer working within their already-established safety rails.

What I've been trying:

  • Track which parts of your system are contracts versus implementation details -- you need different review standards for each (and hooks to update docs when interfaces change)
  • Write interfaces and type definitions first -- let AI implement the boring parts
  • Use your tests and compiler as your first line of defense against AI creativity -- tests catch behavior changes, types catch contract violations
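"Types catch contract violations" can be made concrete with an exhaustiveness check: if a contract is a discriminated union and every handler ends in an `assertNever` call, then adding a variant to the union, or a generated handler that forgets one, becomes a compile error instead of a silent fallthrough. `PaymentEvent` and `describeEvent` are hypothetical names for illustration.

```typescript
// Contract: a discriminated union owned by a human.
type PaymentEvent =
  | { kind: "authorized"; amountCents: number }
  | { kind: "captured"; amountCents: number }
  | { kind: "refunded"; amountCents: number; reason: string };

// Only type-checks when called with `never`, i.e. when every case is handled.
// The runtime throw covers data that bypassed the types (e.g. unvalidated JSON).
function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(value)}`);
}

// Handler (possibly generated). If someone adds { kind: "disputed" } to
// PaymentEvent, `event` is no longer `never` in the default branch and the
// compiler rejects this function until the new case is handled.
function describeEvent(event: PaymentEvent): string {
  switch (event.kind) {
    case "authorized":
      return `authorized ${event.amountCents}c`;
    case "captured":
      return `captured ${event.amountCents}c`;
    case "refunded":
      return `refunded ${event.amountCents}c (${event.reason})`;
    default:
      return assertNever(event);
  }
}
```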

You can see this approach in practice at github.com/trrad/tyche -- interfaces defined upfront, AI handling implementation details, clear boundaries enforced by the type system, and automated context management via justfile keeping the AI aligned with current architecture.

The compiler-as-defense principle scales with the problem. AI can generate complex code faster than you can review it, but it can't generate code that violates your type system faster than TypeScript can reject it. Since AI repeats the same antipatterns, you can encode your standards as rules. Configure dependency-cruiser or other AST-based tools to build custom rules specific to your architecture -- enforce whatever boundaries make sense for your interfaces and monitor for architectural drift. Add Git hooks that flag when AI modifies your type and interface definitions, triggering escalated review and documentation sync. These binary decisions (can this file import that file? did an interface change?) are enforceable and automated.
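To show what "can this file import that file?" boils down to, here is a hand-rolled sketch of an import-boundary rule. It is not dependency-cruiser's API, just an illustration of the same binary check; the rule names, layer paths, and the `violations` helper are hypothetical, and the import edges are passed in directly rather than parsed from source.

```typescript
// Hand-rolled illustration of import-boundary rules (not dependency-cruiser's API).
interface ImportEdge {
  from: string; // importing file
  to: string;   // imported file
}

interface BoundaryRule {
  name: string;
  from: RegExp; // files this rule applies to
  to: RegExp;   // imports these files may NOT make
}

// Hypothetical architecture: contracts depend only on other contracts,
// and UI code never reaches into storage directly.
const BOUNDARY_RULES: BoundaryRule[] = [
  { name: "contracts-are-leaf", from: /^src\/contracts\//, to: /^src\/(?!contracts\/)/ },
  { name: "ui-not-to-storage", from: /^src\/ui\//, to: /^src\/storage\// },
];

// Returns one message per (edge, rule) pair that breaks a boundary.
function violations(edges: ImportEdge[], rules: BoundaryRule[] = BOUNDARY_RULES): string[] {
  const out: string[] = [];
  for (const edge of edges) {
    for (const rule of rules) {
      if (rule.from.test(edge.from) && rule.to.test(edge.to)) {
        out.push(`${rule.name}: ${edge.from} -> ${edge.to}`);
      }
    }
  }
  return out;
}
```

In practice dependency-cruiser extracts the edges by parsing imports and lets you declare comparable rules in the `forbidden` section of its config, so a version like this is mostly useful for seeing how small and decidable each rule is.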

The speed that makes AI dangerous -- generating entire architectures in minutes -- becomes manageable when your tooling can analyze those architectures just as fast. Your build system becomes an audit log for when AI tries to modify core contracts.

I'm still figuring this out. But rushing headlong into the latest technology while ignoring decades of code generation experience feels like exactly the kind of mistake our industry loves to make.

The Abstraction Arbitrage

Here's what's really happening: AI makes complex code cheap to produce but not proportionally cheap to maintain. And it's about to get worse -- when the VC subsidies dry up and these services start charging real costs, teams will be stuck paying to maintain all the complexity they generated during the gold rush. It's an arbitrage opportunity for teams that resist the urge to write more code with less understanding.

Use AI for the genuinely boring stuff -- writing tests from specs, generating boilerplate, implementing interfaces you've already designed. Keep your core architecture simple and human-designed (though AI can be a useful sounding board when you're working through design decisions alone). Let other teams drown in their generated complexity while you ship features they can't even find in their codebases anymore.

Every line of code is a liability. AI makes it easier to take on these liabilities, but doesn't appear to make them significantly easier to pay off. The teams that understand this will eat the ones that don't.

The real skill isn't prompt engineering. It's knowing when not to prompt at all.


  1. GitClear. "AI Copilot Code Quality: 2025 Data Suggests 4x Growth in Code Clones" https://www.gitclear.com/ai_assistant_code_quality_2025_research

  2. METR. "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity" https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

  3. Google Cloud (DORA). "Accelerate State of DevOps Report 2024" https://cloud.google.com/devops/state-of-devops

  4. Harness. "State of Software Delivery 2025" https://www.harness.io/state-of-software-delivery

  5. Google Research. "Accelerating code migrations with AI" https://research.google/blog/accelerating-code-migrations-with-ai/