Agents don't architect reliably: why system design still matters
AI agents can generate a feature quickly. They still usually can't tell you whether you should build it, where it should live, or how it should fit the system. Here's how I think about the architectural work that remains stubbornly human.
There’s a specific failure mode I’ve seen repeatedly with agentic coding tools, and it’s always the same shape.
A developer asks an agent to add a feature. The agent produces clean, compilable, well-tested code. The developer reviews it, the tests pass, the PR gets merged. Two weeks later, someone notices the feature duplicates logic that already existed in a different module. Or it introduces a second source of truth for data that should have one. Or it uses a synchronous pattern in a codebase that’s otherwise async-first, creating a subtle consistency problem that surfaces under load.
The agent may have executed the local task well, but the mistake was upstream: nobody asked the right question first. Should this feature exist in this form, in this place, at this level of abstraction?
That question is architecture. Current agents can support it, but they do not answer it reliably for you.
That distinction matters more than it might seem. As implementation gets cheaper, the expensive failures shift upward: duplicated sources of truth, boundaries that calcify in the wrong place, migrations sequenced badly, performance constraints discovered too late. Teams with strong architectural judgement get faster with agents. Teams without it can accumulate technical debt at machine speed.
Why Architecture Is Hard for Agents
Let me be precise about what I mean by “architecture” here, because the word is overloaded. I don’t mean choosing between MVVM and TCA. I don’t mean drawing box-and-arrow diagrams. I mean the set of decisions that determine:
- Where code lives — which module, which layer, which feature boundary
- How components communicate — direct dependency, protocol abstraction, event bus, composition
- What the invariants are — “every network request goes through the authenticated client,” “state mutations only happen in reducers,” “no module imports the app target directly”
- What you deliberately don’t build — the features you reject, the abstractions you defer, the complexity you avoid
An agent can execute on all four of these if you specify them. What it still struggles to do reliably, in my experience, is derive them from the problem space. And this is the critical gap.
Consider a real example. My team was building a social feature — user profiles, follow/unfollow, activity feed. The agent produced brilliant isolated features. A ProfileFeature reducer. A FollowService protocol with clean async methods. An ActivityFeedFeature with pagination.
But it built them as independent features with their own data stores. The ProfileFeature fetched the user’s follow count from the API on every display. The ActivityFeedFeature fetched the same data independently. When a user unfollowed someone on the profile screen, the feed didn’t update until a manual refresh.
This isn’t a code quality problem. Each feature, viewed in isolation, was excellent. It’s a system design problem. The correct architecture was a shared SocialGraphStore that all features observed, with follow/unfollow actions propagating through a single source of truth. That’s a decision about system topology — about how data flows through the app — and in my experience, current agents still don’t make that kind of decision reliably enough to trust by default, especially when it requires understanding the relationships between features that don’t exist yet.
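To make the topology concrete, here is a minimal, framework-free sketch of the single-source-of-truth shape. The names `SocialGraphStore` and `FollowState` are illustrative, and a real TCA version would express this through shared state and dependencies rather than a plain observable class:

```swift
import Foundation

// A minimal single source of truth for the social graph.
// Names are illustrative, not from the real codebase.
final class SocialGraphStore {
    struct FollowState {
        var following: Set<String> = []          // user IDs the current user follows
        var followerCounts: [String: Int] = [:]  // cached counts, keyed by user ID
    }

    private(set) var state = FollowState()
    private var observers: [(FollowState) -> Void] = []

    // Features (profile, activity feed) subscribe here
    // instead of fetching the same data independently.
    func observe(_ onChange: @escaping (FollowState) -> Void) {
        observers.append(onChange)
        onChange(state)
    }

    // All mutations flow through the store, so every
    // observing feature sees the update immediately.
    func follow(userID: String) {
        state.following.insert(userID)
        state.followerCounts[userID, default: 0] += 1
        notify()
    }

    func unfollow(userID: String) {
        state.following.remove(userID)
        state.followerCounts[userID, default: 1] -= 1
        notify()
    }

    private func notify() {
        observers.forEach { $0(state) }
    }
}
```

With this shape, the unfollow action on the profile screen and the stale feed can no longer disagree: both observe the same `FollowState`, so the feed updates without a manual refresh.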
The Three Things Agents Often Miss
After extended day-to-day use of agentic tools, I’ve identified three categories of decision that consistently require human judgement.
1. Cross-Feature Invariants
Many mature codebases have rules that span multiple features but aren’t written down clearly in any one place. “Authentication state is always checked before accessing protected resources.” “Navigation never pushes more than three levels deep.” “Error states always provide a retry action.”
Some of these invariants live only in the team’s collective understanding, even when fragments of them are documented elsewhere. They’re often enforced by code review, not by the compiler. An agent, working on a single feature in isolation, has limited access to these invariants unless they’re documented somewhere it can read or you explicitly provide that context.
I’ve started maintaining what I call an architecture context file — a markdown document that lives in the repo root and describes the system’s invariants:
```markdown
# Architecture Invariants

## Data Flow

- All persistent state is managed through TCA reducers
- No feature may directly access UserDefaults; use the AppSettings dependency
- Network responses are always decoded into domain models, never used as raw DTOs in views

## Navigation

- Tab-level navigation uses NavigationPath, owned by AppNavigationState
- Modal presentations use the .sheet/.fullScreenCover pattern, never manual UIKit presentation
- Deep links are handled exclusively by the DeepLinkHandler in the App reducer

## Dependencies

- All external services are abstracted behind protocols in the Dependencies module
- Live implementations live in the app target; test implementations live in test targets
- No feature module may import another feature module directly; compose through the parent reducer
```
When I include this file in agent context, the output quality jumps noticeably. The agent respects the invariants it can read. But writing this document — and more importantly, deciding which invariants are worth enforcing — still requires human judgement. It requires understanding why each rule exists, what failure mode it prevents, and what tradeoff it accepts.
2. Evolutionary Design Decisions
Some decisions only make sense in the context of where the codebase is heading, not where it is today.
Six months ago, I chose to keep authentication as a simple singleton rather than a TCA dependency. Not because singletons are good architecture — but because we were mid-migration from UIKit to SwiftUI, and the authentication layer touched both worlds. Making it a TCA dependency would have required wrapping every UIKit view controller in a bridge that could receive the auth state, which was more disruption than the migration could absorb.
That’s a decision about sequencing — about what to compromise now to enable a better architecture later. An agent, asked to “add authentication,” will often produce a locally clean solution. It doesn’t know that the most elegant solution may be wrong for this moment in the codebase’s evolution.
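Stripped to its shape, the compromise looked roughly like this. The name `AuthSession` and its surface are illustrative; the point is that a singleton is reachable from both UIKit view controllers and SwiftUI views without any bridging layer:

```swift
import Foundation

// Deliberately a singleton for now: mid-migration, both UIKit
// controllers and SwiftUI views can reach it directly, with no
// bridge required. A TCA dependency is the intended end state.
// Names and surface are illustrative.
final class AuthSession {
    static let shared = AuthSession()
    private init() {}

    private(set) var token: String?
    var isAuthenticated: Bool { token != nil }

    func signIn(token: String) { self.token = token }
    func signOut() { token = nil }
}
```

The locally “worse” design buys sequencing freedom: once the UIKit surface shrinks, the singleton can be wrapped in a proper dependency without touching call sites twice.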
I now think of myself as a codebase’s evolutionary biologist. I understand its history, its current constraints, and its trajectory. The agent is a talented surgeon who can operate on any organ but doesn’t know the patient’s medical history.
3. Constraint Identification
The most valuable architectural skill isn’t choosing between options. It’s identifying constraints that others haven’t noticed.
“This feature needs offline support” is a product requirement. “Offline support means we need a local persistence layer, which means our data models need to be Codable, which means our use of enums with associated values throughout the domain layer is going to be a migration headache, which means we should introduce a DTO layer now before the offline work starts” — that’s constraint identification.
Agents struggle with this because it requires traversing implications across architectural boundaries. An agent can build an excellent offline sync system. But it usually won’t tell you, unprompted, that building it will force changes in six other subsystems, and that you should restructure those subsystems first.
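To make the DTO step of that chain concrete, here is a hedged sketch: a domain enum with associated values next to a flat, explicitly versionable DTO for the persistence layer. The types and field names are invented for illustration; the real migration headache is that the synthesized `Codable` encoding of an associated-value enum is an implementation detail, fragile to persist across app versions:

```swift
import Foundation

// Domain model: expressive, but its synthesized Codable shape
// is an implementation detail you don't want written to disk.
enum Attachment {
    case image(url: URL, width: Int, height: Int)
    case link(url: URL, title: String)
}

// DTO: a flat, explicit shape the persistence layer owns,
// so the domain enum can evolve without breaking stored data.
// Field names are illustrative.
struct AttachmentDTO: Codable {
    var kind: String   // "image" | "link"
    var url: URL
    var width: Int?
    var height: Int?
    var title: String?

    init(_ attachment: Attachment) {
        switch attachment {
        case let .image(url, w, h):
            kind = "image"; self.url = url
            width = w; height = h; title = nil
        case let .link(url, t):
            kind = "link"; self.url = url
            width = nil; height = nil; title = t
        }
    }
}
```

Introducing this layer before the offline work starts is exactly the kind of restructuring an agent won’t propose unprompted, because the payoff only appears two subsystems away.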
The Prompt-Architecture Gap
There’s a concept I’ve been thinking about that I call the prompt-architecture gap. It’s the distance between what you can express in a prompt and what you need the agent to understand about your system.
For small, isolated tasks, the gap is narrow. “Write a function that formats a Date as a relative string” — there’s almost no system context needed. The agent produces correct code immediately.
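For scale, that task really is self-contained. Here is a hand-rolled sketch with no framework dependencies (the thresholds and strings are arbitrary choices, not a spec):

```swift
import Foundation

// The kind of isolated task with a narrow prompt-architecture gap:
// nothing about the system is needed beyond the standard library.
func relativeString(from date: Date, to reference: Date = Date()) -> String {
    let seconds = Int(reference.timeIntervalSince(date))
    switch seconds {
    case ..<60:     return "just now"
    case ..<3600:   return "\(seconds / 60)m ago"
    case ..<86_400: return "\(seconds / 3600)h ago"
    default:        return "\(seconds / 86_400)d ago"
    }
}
```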
For feature-level work, the gap is wide. The agent needs to understand the dependency graph, the existing patterns, the team’s conventions, the deployment constraints, the performance characteristics of the target devices. You can bridge some of this gap by providing context files, pointing the agent at reference implementations, and constraining the output format. But there’s always a residual gap that only human understanding of the system can close.
The developers who are most effective with agentic tools are the ones who’ve learned to identify and bridge this gap before they start prompting. They front-load the architectural thinking, then use the agent for execution. The developers who struggle are the ones who expect the agent to handle both architecture and execution — and end up with code that’s locally correct but systemically wrong.
How I Use Agents Without Letting Them Design the System
The workflow that has held up best for me is simple:
- Decide the shape before the prompt — module boundaries, ownership, invariants, and the failure modes that matter
- Provide reference constraints — architecture notes, existing feature examples, and explicit “do not introduce” rules
- Let the agent execute a narrow slice — one feature path, one migration step, one refactor boundary at a time
- Review for system effects, not just code quality — duplicated logic, state ownership, dependency drift, and lifecycle mistakes
- Codify the good decisions afterward — if the prompt needed hidden context, that context probably belongs in the repo
This is the part that often gets missed in discussions about AI-assisted development. The gain is not “the agent writes everything.” The gain is that a senior engineer can spend more time on topology, sequencing, and tradeoffs while delegating more of the mechanical implementation.
What This Means for Teams
The practical implication for engineering teams is this: agentic tools amplify existing architectural discipline. They don’t create it.
If your team already has clear architectural conventions, an invariants document, good test coverage, and a culture of thoughtful code review, agents will make you faster. Dramatically faster. The conventions guide the agent’s output, the tests catch behavioral and integration regressions, and the reviews catch the system-level problems that agents miss.
If your team doesn’t have those things — if your codebase has no clear patterns, if features are structured inconsistently, if code review is a rubber stamp — agents can make your problems worse. They’ll produce more code, faster, with less consistency, and the resulting codebase can be harder to maintain than if the code had been written slowly by hand.
I’ve seen both outcomes. The difference isn’t the tool. It’s the team.
If you’re leading a team, the practical question is no longer “should we use agents?” It’s “what system of architectural guardrails do we have in place before we accelerate implementation?” That is now a management and senior-engineering concern, not just an individual productivity preference.
The Senior Engineer’s New Job Description
When I started my career, being a senior engineer meant you could build complex features independently. You knew the frameworks, you understood the patterns, and you could ship reliably.
That part of the job is getting cheaper fast in many contexts. With good context, an agent can assemble complex features independently and navigate a surprising amount of public framework knowledge.
What it still can’t do consistently, in my experience, is the work that was always the real job of senior engineering:
- Saying no to features that add complexity without proportional value
- Identifying systemic problems before they manifest as bugs
- Making decisions that are deliberately suboptimal locally to enable better global outcomes
- Maintaining architectural coherence across a growing codebase over months and years
- Teaching other engineers why the architecture is the way it is — not just what it is
These are still far less automatable than feature implementation. If anything, they’re more important now, because the volume of code being produced seems to be increasing faster than the volume of architectural thinking.
The developers I’m least worried about are the ones who spend most of their time in architecture discussions, design reviews, and mentoring. The developers I’m most worried about are the ones who viewed “I write code fast” as their primary differentiator.
I think the ability to produce code quickly is becoming commoditized. Good judgement isn’t.
The teams that will benefit most from agentic development are not the ones that remove senior judgement from the loop. They’re the ones that make that judgement more explicit, more teachable, and easier to apply consistently. That’s the leverage point I keep coming back to: not faster typing, but better system decisions.