The AI-Scalable Startup

When I joined my first startup after leaving a big tech company, I was excited to move fast. No bureaucracy, no committees, just ship code and grow.

On my second day, I pushed a change meant to unblock fast experimentation for an upcoming growth sprint: it let us safely roll out new React Native builds via CodePush to a subset of users, without waiting on full App Store releases.

The response came back from a senior engineer: “This is going to orphan old App Store installs. Did you learn about our release processes?”

I looked for the relevant docs. There weren’t any.

They continued: “There were a lot of decisions made to get us to this point, it’s not a vanilla CodePush setup.”

I asked where those decisions were documented. Slack threads? A design doc? Pull requests? There was nothing, just institutional memory.

“We’re a startup,” they said. “We don’t have time to write everything down. You need to learn how the system works first.”

So I did. I asked questions, traced through code paths, learned the unwritten rules. I scoped my changes smaller and deferred decisions to more tenured engineers. After a couple of months, I finally felt productive.

Then I tried to use AI to help with a refactor. I was working on part of our authentication and session-handling logic, code that had accreted over time and interacted with half a dozen other systems. I asked an AI tool to help restructure it to make a new experiment easier to run.

The code it produced was clean and readable. It followed modern best practices. But it also violated several invisible assumptions baked into the system: ordering guarantees, side effects relied on elsewhere, and implicit contracts that existed nowhere except in people’s heads.

I caught some of it in planning, but in a refactor of that size I inevitably missed other parts. After a few hours of chasing down issues, I abandoned the refactor and rewrote it manually.

When I mentioned this experience, the reaction was predictable: “See? AI isn’t ready for real codebases.” A surprising take for a supposed “AI-native” company.

A few months later, I left.

I joined an even smaller company, maybe two dozen engineers. I assumed it would be worse: less structure, more chaos, even fewer docs. But it wasn’t.

On my first day, I asked where the documentation lived. “It’s pretty sparse,” the CTO admitted, “but the tests are comprehensive. Read the tests, they’re basically executable documentation.”

I pulled up a payment processing module. The tests were clear and behavioral. The module boundaries were obvious: PaymentGateway, FraudCheck, Receipt. Each did exactly what it claimed to do, and nothing else.
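
To make that concrete, here is a minimal sketch of what those behavioral tests read like, assuming a Vitest-style runner. The module name echoes the ones above, but the functions, fixtures, and file paths are hypothetical stand-ins, not the company’s actual code.

    // Sketch of a behavioral test that doubles as documentation (hypothetical names).
    import { describe, it, expect } from "vitest";
    import { FraudCheck } from "./FraudCheck"; // hypothetical module in the payments code

    describe("FraudCheck", () => {
      const check = new FraudCheck({ singleChargeLimitCents: 500_00 });

      it("flags charges above the single-charge limit", async () => {
        const result = await check.evaluate({ amountCents: 750_00, customerId: "cus_123" });
        expect(result.flagged).toBe(true);
        expect(result.reason).toBe("single_charge_limit_exceeded");
      });

      it("lets ordinary charges through untouched", async () => {
        const result = await check.evaluate({ amountCents: 20_00, customerId: "cus_123" });
        expect(result.flagged).toBe(false);
      });
    });

Tests written at this level spell out the contract in their names and assertions, which is what lets them work as documentation.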

On my third day, I shipped a PR. It passed CI. A senior engineer approved it in about twenty minutes with a single comment: “Nice catch on that edge case. Why haven’t we done this before?”

A week later, I tried the same AI experiment. I pointed Claude at our user authentication module and asked it to add support for OAuth providers. It followed the existing abstractions, generated the necessary code and tests, and respected the boundaries of the system. CI caught two small issues. I fixed them, pushed again, and merged. The whole thing took about ninety minutes instead of the two days I’d budgeted to account for the learning I expected to need.

At this point it clicked for me. The change in my speed was drastic, and the difference wasn’t company size or engineering talent. There was a deeper property, something about the company and its people, that let it absorb new leverage, especially AI-assisted tooling, without falling apart. Some organizations are structured to compound new capabilities. Others resist them, often despite “company policy”, through their engineering architecture and, more insidiously, their culture.

Pillars of AI Scalability

These two startups exemplify a larger pattern I’ve witnessed over the last few years. Through my consulting work, I’ve been inside engineering teams at a range of companies: fintech, healthtech, SaaS, marketplaces. Only a rare few truly move fast with AI.

Looking back, I realized the difference usually came down to three things. They run from mostly technical to mostly behavioral: Risk Elasticity, Semantic Surface Area, and Socio-technical Friction. Let me walk through each one.

1. Risk Elasticity

At that first startup, every deploy was an event. Things broke regularly, and each deploy felt like a gamble. The system was flawed; there was too much surface area and test coverage was thin. Some tests were flaky enough that people ignored failures. The staging environment was perpetually out of sync because half-finished experiments lingered for weeks. You never quite knew whether a failure was your fault or leftover debris from someone else’s work.

In an environment like that, AI-generated changes are terrifying. AI is very good at producing lots of changes, and that’s a feature only if you have the safety net to support it. If an AI proposes a change touching dozens of lines across multiple modules, you’re left with two bad options: merge on faith, or manually reason through every edge case. Either way, the promised speed evaporates.

Contrast that with the second company. There, the rule of thumb was simple: if CI passes, you can merge. CI was fast, about four minutes, and trustworthy. It caught the failures that mattered, so engineers felt comfortable shipping quickly, knowing small bugs could be fixed and bigger problems reverted without meaningfully affecting the user experience.

I watched a junior engineer use Claude Code to build an entire new API endpoint: routing, validation, database access, and tests. It took her about forty-five minutes. CI caught a missed null check and a type mismatch. She fixed them, pushed again, green. Merged.

This is what I mean by Risk Elasticity: the ability to absorb a high volume of changes, especially AI-assisted ones, without destabilizing the system. AI-generated code isn’t uniformly bad. It’s randomly bad. If you can’t cheaply catch the 20% of cases where it’s wrong, you lose all the upside.

Teams with high Risk Elasticity have tests that validate behavior, not just coverage. They have types and linters that act as solid guardrails. They have fast, reliable CI that gives engineers the psychological safety to move fast. They have deterministic builds and the ability to cleanly reproduce, audit, and roll back as needed.
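
As one illustration of the behavior-versus-coverage distinction, here is a small self-contained sketch; isSessionValid and its tests are hypothetical, not taken from either codebase.

    import { describe, it, expect } from "vitest";

    type Session = { userId: string; expiresAt: number };

    // Hypothetical helper: a session is valid only if it has not expired yet.
    function isSessionValid(session: Session, now: number): boolean {
      return session.expiresAt > now;
    }

    describe("isSessionValid", () => {
      // Coverage-style: exercises the code, but would still pass if the comparison flipped.
      it("returns a boolean", () => {
        expect(typeof isSessionValid({ userId: "u1", expiresAt: 0 }, 0)).toBe("boolean");
      });

      // Behavioral: pins down the contract, including the boundary case.
      it("treats a session expiring exactly now as invalid", () => {
        expect(isSessionValid({ userId: "u1", expiresAt: 1_000 }, 1_000)).toBe(false);
      });

      it("accepts a session that expires in the future", () => {
        expect(isSessionValid({ userId: "u1", expiresAt: 2_000 }, 1_000)).toBe(true);
      });
    });

The first test bumps coverage; only the last two would catch an AI-generated change that quietly inverts the expiry check.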

Teams without Risk Elasticity don’t move faster with AI, they just get more anxious.

2. Semantic Surface Area

At the first startup, I once had to modify how we calculated shipping costs. I found a function called processOrder(). It was roughly 800 lines long. It validated input, calculated tax, computed shipping, updated inventory, sent emails, and logged analytics events.

The actual shipping logic lived halfway down the file. It called a helper in utils/helpers.ts, which delegated to common/shared.ts, which in turn called a function in legacy/migration_temp.ts, a file that had been “temporary” for over a year. I eventually found the real logic. It was about thirty lines of straightforward code.

Could an AI navigate this? Possibly, with enough context and careful prompting. But every layer of indirection and generic naming made the task harder, for AI and humans alike. Most of the context window was wasted just figuring out where meaning lived.

At the second company, I needed to add a new discount type. I opened the pricing/ directory and found discounts.ts (all discount types and logic), calculations.ts (price calculation engine), rules.ts (business rules for when discounts apply), and their respective test files. Each discount was implemented as a class with a clear interface. I asked Claude to add a new one. It followed the existing pattern and generated a correct implementation almost immediately.
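
The structure looked roughly like the sketch below; the interfaces and class here approximate the pattern, they are not the company’s actual code.

    // pricing/discounts.ts (illustrative shape, hypothetical names)
    export interface Order {
      subtotalCents: number;
      itemCount: number;
    }

    export interface Discount {
      // Whether this discount applies to the given order at all.
      applies(order: Order): boolean;
      // The amount to subtract, in cents; never exceeds the subtotal.
      amountCents(order: Order): number;
    }

    export class BulkOrderDiscount implements Discount {
      constructor(
        private readonly minItems: number,
        private readonly percentOff: number,
      ) {}

      applies(order: Order): boolean {
        return order.itemCount >= this.minItems;
      }

      amountCents(order: Order): number {
        const raw = Math.round(order.subtotalCents * (this.percentOff / 100));
        return Math.min(order.subtotalCents, raw);
      }
    }

Adding a new discount type meant adding one class and its tests; nothing else had to change, which is exactly the kind of local, repeatable pattern an LLM can latch onto.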

This is what I think of as Semantic Surface Area: how much meaning is visible from local context. Good Semantic Surface Area means names that reflect thoughtful domain concepts, not ad-hoc schemas in the database. It means locality of behavior, where opening a file shows you most of what you need to know. It means clear module boundaries and lightweight documentation for non-obvious decisions.

Poorly designed Semantic Surface Area looks like utils/ directories full of vague helpers, business logic smeared across layers, and no canonical source of truth. Large language models are very good at following semantic gradients. They infer patterns from naming, comments, local examples, and file structure. When your codebase is well-modeled, each token carries more meaning. When it isn’t, both humans and AI burn cycles just reconstructing intent.

Onboarding a new AI agent isn’t that different from onboarding a new engineer. If your system is legible, you get useful output quickly. If it isn’t, you don’t.

3. Socio-technical Friction

This was the hardest pattern to see, because it’s not really technical at all.

At the first startup, the biggest resistance to change wasn’t code review latency or process overhead, but rather ownership. Certain systems were “owned” emotionally by senior engineers. They’d been there early. They’d made hard calls under pressure. Over time, those decisions hardened into identity. I once proposed replacing a complex singleton structure running the Aho-Corasick algorithm with a lightweight custom string-matching implementation. The change reduced complexity and eliminated a class of bugs we’d hit repeatedly.

The response wasn’t about performance or correctness. “We’ve invested a lot in this system. It works.” The implication was clear: changing the system meant questioning the people who built it. It was the same reason I had to abandon my React Native CodePush changes.

At the second company, I got a different reaction to a fundamental modeling change I proposed in my second week: “Oh thank god, that change is overdue. Can you handle the migration?”

That difference has nothing to do with intelligence. It has everything to do with ego, trust, and how tightly people tie their identity to existing systems. You can have strong Risk Elasticity and clean Semantic Surface Area, but if changes are socially expensive, AI won’t help you. The bottleneck isn’t code generation, it’s human gatekeeping.

Low-friction environments aren’t reckless. The guardrails are just encoded in software and norms, not in personal veto power. At the second company, we experimented with having AI generate tests. At first the results were mediocre, but over time we introduced test templates and agent instructions. Within a month, AI-generated tests were common in any notable pull request.

At the first company, a similar idea was shut down immediately: “What if the tests are wrong?” A committee discussion was proposed, and nothing shipped.

The biggest difference was our willingness to experiment with the new tools.

The Uncomfortable Truth

If your codebase is brittle, your abstractions are leaky, and every meaningful change feels socially expensive, it doesn’t matter how good the models get. You’ll still be stuck. This is a systemic problem, and it often requires top-down change. But there’s also a more direct lever, and it’s not your architecture. It’s your attitude.

I’ve watched senior engineers, smart and accomplished people, reject AI tools because they “aren’t polished” or “make too many mistakes.” These same engineers work with junior developers every day. Juniors make mistakes constantly, and we don’t reject them. We build systems that make their mistakes cheap and their learning fast.

AI is the same. It’s a junior developer that works 100x faster and costs $20 a month. If you can’t extract value from that, the problem isn’t the AI.

The best engineers I know have already adapted, removing themselves as bottlenecks. They’ve accepted a bit of messiness. They’ve given up the illusion of total control. They trust their systems to catch mistakes.

They’ve learned to be comfortable being a little uncomfortable.

Are We The Bottleneck?

Historically, software development has been limited by CPU speed, memory, and even organizational complexity.

In 2026, teams are further limited by their capacity to absorb AI tooling. How do you sculpt an engineering organization that gets faster each time a new model is launched?

Everyone has access to the same models, so the differentiator is whether your organization is able, and willing, to use them, which is why Socio-technical Friction is so important. Teams that figure this out will compound their advantages. Teams that don’t will keep asking why AI “isn’t working for them.”

Don’t ask whether you’re AI-native. That’s a product question. Companies can be AI-native and still be AI-fragile. They can still fall over as new waves of AI tooling change the very ground they’re building on.

Ask instead whether you’re AI-scalable.

Is your team able to absorb, amplify, and accelerate alongside AI capabilities, without collapsing into chaos?


Thanks to Victoria, Kevin, Jim, Mitchell, Jiang, and Grace for feedback on this piece.

