The AI-Scalable Startup

When I joined my first startup after leaving a big tech company, I was excited to move fast. No bureaucracy, no committees, just ship code and grow. On my second day, I pushed a PR that updated our React Native app using CodePush—a hot update that would fix a critical crash affecting 15% of users.

The response came back from a senior engineer: “This completely breaks our feature flag system. Did you even read the architecture docs?”

I looked for the architecture docs. There weren’t any.

“Also, this doesn’t follow our error handling patterns.”

What patterns? I grep’d the codebase. Every file handled errors differently.

As a staff-level hire, I’d expected some autonomy. Instead, I found myself trying to extract information from engineers who seemed to view me as competition rather than a collaborator. “You need to figure this out,” they told me. “We don’t have time to document everything. We’re a startup.”

OK. I figured it out. I learned the unwritten rules by reading every PR from the last six months. I shipped smaller changes. After two months, I finally felt productive.

Then I tried to use AI to help me build a greenfield feature—a new analytics dashboard. The AI had no idea about any of the unwritten rules. It produced clean, well-structured code that followed industry best practices. When I opened the PR, the pushback was immediate: “This isn’t how we do things here. Why are you using AI for this? It doesn’t understand our architecture.”

I tried to explain that the architecture wasn’t documented anywhere. The response: “Exactly. Which is why AI can’t help. These tools aren’t ready for real codebases.”

Six months later, I left. I joined an even smaller company—barely 10 people, pre-seed funding. I assumed it would be worse: less structure, more chaos, even fewer docs.

It wasn’t.

On my first day, I asked where the docs were. “They’re pretty sparse,” the CTO admitted, “but the tests are comprehensive. Just read the tests—they’re basically executable documentation.”

I pulled up a payment processing module. The tests were clear, complete, and actually tested behavior, not just that functions returned non-null. The module boundaries were obvious: PaymentGateway, FraudCheck, Receipt. Each one did exactly what it said and nothing else.

On day three, I needed to refactor some Temporal workflow activities—complex async orchestration code. I shipped the PR. It passed CI. A senior engineer approved it in 20 minutes with one comment: “Nice catch on that edge case. Why haven’t we done this before?”

After a week, I tried the same experiment: could I use AI to build something meaningful? I pointed Claude at our user authentication module and asked it to add support for OAuth providers.

It worked, nearly perfectly: the tests caught the two small issues it did have. I fixed them, pushed again, and it merged. The whole thing took 90 minutes instead of the two days I’d budgeted.

That’s when I realized: the difference between these two companies wasn’t size, or funding, or even talent.

It was whether their organization itself could absorb AI capabilities.

The first company couldn’t. The second could. And after spending the next couple of years exposed to dozens of engineering teams across my consulting work, I kept seeing the same pattern. Some organizations are structured to leverage AI. Others actively resist it, not through policy, but through their very culture.


Patterns Across Engineering Orgs

Through my consulting, I’ve worked closely with engineering teams at companies ranging from early-stage startups to established tech companies. Fintech, healthtech, SaaS, marketplaces. Some teams could move fast with AI. Most couldn’t.

The ones that could had three things in common, and I came to think of them as three pillars, running from mostly technical to mostly behavioral: Risk Elasticity, Semantic Surface Area, and Socio-technical Friction. Let me walk through each one.

1. Risk Elasticity: Can you ship without fear?

At that first startup, every deploy was an event. We deployed Tuesday and Thursday afternoons, never near the weekend. Each deploy had maybe a 30% chance of breaking something. We’d gather on Slack, watch the monitoring dashboards, and hope nothing caught fire.

The problem wasn’t the engineers. It was the system. We had almost no test coverage. The few tests we had were flaky—they’d fail randomly, so everyone ignored them. Our staging environment was always broken because someone had deployed half-finished work three weeks ago and forgotten about it.

When you propose using AI to generate code in an environment like that, here’s what happens: The AI writes a perfectly reasonable change—maybe 50 lines, maybe 500. No human could possibly review that thoroughly by reading it. So either you merge it on faith (terrifying) or you spend hours manually testing every edge case (which defeats the point).

Compare that to the second company. They had a simple rule: if CI passes, you can merge. CI was fast (4 minutes), comprehensive, and reliable. The test suite actually caught bugs. Type errors were impossible—they used TypeScript strictly, with no any types.

I watched a junior engineer there use GitHub Copilot to build an entire new API endpoint: route handler, validation, database query, tests. The whole thing was AI-generated. It took her 45 minutes. When she pushed it, CI caught two issues (a missed null check and a type mismatch). She fixed them, pushed again, green. Merged.
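To make the “CI catches it” part concrete, here’s a minimal sketch of the validation style that makes this possible. The endpoint and names are invented for illustration, not her actual code: under strict TypeScript, the request body arrives as `unknown`, so a missed null or type check becomes a compile or test failure instead of a production bug.

```typescript
// Hypothetical sketch (invented endpoint, not the real one): strict mode
// forces us to narrow `unknown` before touching any field.
type CreateWidgetRequest = { name: string; quantity: number };

function parseCreateWidget(body: unknown): CreateWidgetRequest {
  // Narrow away null/non-object first; the compiler won't let us skip this.
  if (typeof body !== "object" || body === null) {
    throw new Error("body must be an object");
  }
  const b = body as Record<string, unknown>;
  if (typeof b.name !== "string" || b.name.length === 0) {
    throw new Error("name is required");
  }
  if (
    typeof b.quantity !== "number" ||
    !Number.isInteger(b.quantity) ||
    b.quantity < 1
  ) {
    throw new Error("quantity must be a positive integer");
  }
  return { name: b.name, quantity: b.quantity };
}
```

The point isn’t this particular parser; it’s that the contract is mechanical. A reviewer doesn’t need to eyeball 500 lines for a missing null check, because the type system and a handful of tests do it for free.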

That’s Risk Elasticity: the ability to absorb a high volume of changes, especially AI-generated ones, without everything breaking.

It’s not about being perfect. It’s about having systems that catch problems cheaply and automatically. Because the thing about AI-generated code is that it’s not uniformly bad; it’s randomly bad. It might be perfect 80% of the time and completely wrong 20% of the time, and you can’t predict which. If you can’t catch that 20% automatically, you’ve lost most of the value you gained.

The companies that can move fast with AI have:

- Fast, comprehensive, reliable CI that engineers actually trust
- Test suites that catch real bugs, not just non-null returns
- Strict typing that makes whole classes of errors impossible
- A simple rule: if CI is green, you can merge

The companies that can’t have:

- Flaky tests that everyone has learned to ignore
- Staging environments that are perpetually broken
- Deploys treated as scheduled events with a real chance of breakage
- Review-by-reading as the only safety net

In those environments, AI doesn’t make you faster. It makes you more scared.

2. Semantic Surface Area: Can anyone understand what’s going on?

At the first startup, I was trying to understand how we calculated shipping costs. I found a function called processOrder(). It was 800 lines long. It did everything: validated the order, calculated tax, calculated shipping, updated inventory, sent confirmation emails, logged analytics events.

Hidden on line 492 was the shipping calculation. It called a function in a file called utils/helpers.ts. That function called another function in common/shared.ts. That function called a function in legacy/migration_temp.ts (which had been there for 18 months).

I spent two days tracing through this. When I finally found the actual logic, it was 30 lines of straightforward code.

Sure, modern AI with large context windows could probably trace this path and make the change. But it would need to load all four files into context, understand the implicit contracts between them, and hope that legacy/migration_temp.ts doesn’t have side effects that break something elsewhere. Even if it succeeds, you’re burning hundreds of tokens on indirection that shouldn’t exist. You’re also praying the AI doesn’t hallucinate which of the four layers is actually the source of truth.

Compare this to the second company. I needed to add a new discount type to our pricing system. I opened the pricing/ directory. There were five files:

pricing/
  discounts.ts      // All discount types and logic
  calculations.ts   // Price calculation engine  
  rules.ts          // Business rules for when discounts apply
  discounts.test.ts
  calculations.test.ts

I opened discounts.ts. Every discount type was a class with a clear interface. I asked Claude to add a new discount type. It looked at the existing classes, understood the pattern, and generated a perfect implementation in 30 seconds.
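The pattern looked roughly like this. This is a sketch with invented class and method names, not the actual contents of discounts.ts: one narrow interface, and each discount type as a small class that implements it.

```typescript
// Hypothetical sketch of the discount pattern: a narrow interface,
// one small class per discount type. All amounts are integer cents.
interface Discount {
  readonly name: string;
  applies(subtotalCents: number): boolean;
  apply(subtotalCents: number): number; // returns the discounted subtotal
}

class PercentageDiscount implements Discount {
  constructor(readonly name: string, private pct: number) {}
  applies(_subtotalCents: number): boolean {
    return true; // unconditional
  }
  apply(subtotalCents: number): number {
    return Math.round(subtotalCents * (1 - this.pct / 100));
  }
}

// A new discount type only has to follow the local pattern, which is
// exactly the kind of change an AI can infer from one nearby example.
class BulkDiscount implements Discount {
  constructor(
    readonly name: string,
    private thresholdCents: number,
    private pct: number
  ) {}
  applies(subtotalCents: number): boolean {
    return subtotalCents >= this.thresholdCents;
  }
  apply(subtotalCents: number): number {
    return this.applies(subtotalCents)
      ? Math.round(subtotalCents * (1 - this.pct / 100))
      : subtotalCents;
  }
}
```

With this shape, “add a new discount type” is a local change: one new class, one new test file entry, no edits to the calculation engine.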

This is what I mean by Semantic Surface Area: how easy is it—for a human or an AI—to infer the meaning and structure of your code from local context.

High Semantic Surface Area looks like:

- Small modules with obvious, single-purpose boundaries (PaymentGateway, FraudCheck, Receipt)
- Names that say exactly what the code does, at every level
- Logic that lives where you’d expect to find it
- Tests colocated with the code they describe

Low Semantic Surface Area looks like:

- 800-line god functions that do everything
- Generic dumping grounds like utils/helpers.ts and common/shared.ts
- Layers of indirection hiding 30 lines of actual logic
- “Temporary” files that have survived 18 months

Here’s why this matters for AI: Large language models are incredibly good at following semantic gradients. They infer patterns from naming, comments, local examples, and file structure. When the codebase is well-modeled, AI can propose much more coherent changes that respect your domain model. Context windows are used efficiently because each token carries more meaning.

When your codebase is a maze of indirection and generic names, AI produces code that’s technically correct but semantically nonsense.

Spinning up new AI agents is a lot like repeatedly onboarding new engineers. If your onboarding process is straightforward and self-serve, you’ll need far fewer iterations to get the PRs you want.

3. Socio-technical Friction: Will anyone let this merge?

This is the one that surprised me the most and took the most reflection to arrive at, because it’s not really technical.

At the first startup, code review wasn’t the bottleneck—ego was. Reviews happened fast enough. But the culture was defensive. Every suggestion to improve something was met with resistance.

I remember trying to replace a homegrown string matching implementation with the Aho-Corasick algorithm from a well-tested library. Our implementation was O(n×m) for every search. The library was O(n+m) and battle-tested across millions of deployments. The performance improvement would have been 10-100x for our use case.
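To make the complexity gap concrete, here’s my own illustration of the idea (not our code and not the library we proposed): Aho-Corasick builds one automaton from all the patterns and scans the text a single time, instead of rescanning the text once per pattern.

```typescript
// Sketch of Aho-Corasick multi-pattern search. Naive search rescans the
// text for each pattern (O(n × m) total); this scans it once (O(n + m),
// plus the matches emitted).
interface AcNode {
  next: Map<string, AcNode>;
  fail: AcNode | null; // longest proper suffix that is also a trie path
  out: string[];       // patterns matched when we reach this node
}

function newNode(): AcNode {
  return { next: new Map(), fail: null, out: [] };
}

function buildAutomaton(patterns: string[]): AcNode {
  const root = newNode();
  // Phase 1: a plain trie of all patterns.
  for (const p of patterns) {
    let cur = root;
    for (let i = 0; i < p.length; i++) {
      const ch = p[i];
      if (!cur.next.has(ch)) cur.next.set(ch, newNode());
      cur = cur.next.get(ch)!;
    }
    cur.out.push(p);
  }
  // Phase 2: breadth-first pass to wire up failure links.
  const queue: AcNode[] = [];
  root.next.forEach((child) => {
    child.fail = root;
    queue.push(child);
  });
  while (queue.length > 0) {
    const node = queue.shift()!;
    node.next.forEach((child, ch) => {
      let f = node.fail;
      while (f !== null && !f.next.has(ch)) f = f.fail;
      child.fail = f === null ? root : f.next.get(ch)!;
      child.out.push(...child.fail.out); // inherit suffix matches
      queue.push(child);
    });
  }
  return root;
}

function searchAll(text: string, root: AcNode): string[] {
  const hits: string[] = [];
  let cur = root;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    // Follow failure links until a transition exists (or we're at root).
    while (cur !== root && !cur.next.has(ch)) cur = cur.fail!;
    cur = cur.next.get(ch) ?? root;
    hits.push(...cur.out); // every pattern ending at position i
  }
  return hits;
}
```

The homegrown version effectively re-ran the inner scan for every pattern on every search; the automaton pays the construction cost once and amortizes it across all searches.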

The senior engineer who’d written the original code blocked the PR: “I don’t see why we need this. Our version works fine. Adding dependencies adds risk.”

I pointed out that our version had already caused two production incidents where searches timed out. “Those were edge cases. We fixed them. Besides, I understand our code. I don’t understand this library.”

That’s when I understood what was happening. It wasn’t about the code. It was about identity. The senior engineers had built this system. Replacing their work felt like a referendum on their competence. They’d rather defend a suboptimal solution than admit something could be better.

When I later suggested using AI to help generate test cases for a new feature, the response was immediate pushback: “We can’t have AI writing tests. They’re clearly going to be flaky. What if they’re wrong?” (As if human-written tests are never wrong.) Someone demanded we establish a formal policy on AI tool usage before proceeding. The experiment died in committee.

The second company was different. Most PRs, if they passed CI, got approved within an hour or two. Reviews were collaborative: “Have you considered X?” not “This is wrong because Y.” When I suggested replacing some janky internal code with a modern external library, the response was: “Oh thank god, I’ve been wanting to do that for months. Can you handle the migration?”

More importantly, when we started experimenting with AI-generated tests, nobody panicked. “Let’s try it and see,” the CTO said. We did. At first, the tests were maybe 70% useful. We edited them, merged them, moved on. We did this feedback loop dozens of times. The prompts got better. The patterns got clearer. Within a month, AI-generated tests were maybe 90% useful.

Culture eats strategy for breakfast, and culture definitely eats AI adoption for breakfast.

You can have perfect Risk Elasticity and beautiful Semantic Surface Area, but if every change requires navigating fragile egos and defensive territoriality, AI isn’t going to help you. The bottleneck isn’t code generation; it’s human gatekeeping.

To be clear, low-friction environments don’t mean “YOLO to prod.” They mean the guardrails are encoded in software (tests, types, CI) and in clearly defined processes, rather than in slow, arbitrary, brittle human bottlenecks.

The heart of AI-scalable companies isn’t a technical factor, but rather a cultural one.

It’s trust, low ego, and willingness to try things that might not work.


The Uncomfortable Truth

If your codebase is brittle, your abstractions are leaky, and your process is ruled by defensive ego, it doesn’t matter how good the models get. You’ll still be stuck.

This sounds like a systemic problem requiring top-down organizational change. And it is. But if you want a single, more direct lever to guide you and your team toward an AI-scalable way of working, consider this:

It’s not your architecture, it’s your attitude.

I’ve watched staff engineers—smart, accomplished people with long FAANG tenures—reject AI tools because “they’re not ready yet” or “they make too many mistakes.” Yet these same engineers work with junior developers all the time, and junior developers make mistakes constantly. We don’t reject them. We build systems (code review, tests, mentorship) to make their mistakes cheap and their learning fast.

AI is the same: a junior developer that works 100x faster and costs $20/month. If you can’t figure out how to extract value from that, the problem isn’t the AI.

The engineers I respect most have already adjusted their mindset. They’ve accepted that things will be messier, that they won’t have complete mental control, that some code will be generated by machines and that’s fine as long as the systems catch bad outputs.

They’ve learned to be comfortable being a little uncomfortable.

And that removes them as the bottleneck in the equation.

The Software Development Bottleneck is You

We’re in the middle of a bottleneck shift, the same kind we’ve seen every 10-20 years in software.

In the 1980s, memory was tiny and CPUs were slow. We wrote in assembly or very low-level C because every cycle counted. The scarce resource was compute, and no other trade-off was reasonable.

In the 1990s and 2000s, hardware got dramatically faster and cheaper. High-level languages like Python, Ruby, and Java took off, trading execution speed for development speed. That trade-off became viable for the first time because faster processors made most workloads fast enough. The scarce resource became human developer time, not CPU cycles.

In the 2010s, with the rise of the internet giants, systems became distributed, products became SaaS, teams became huge. We invented microservices, strongly typed APIs, and a cottage industry of “platform engineering” to cope with the Mythical Man-Month problem: adding people adds communication overhead non-linearly. The scarce resource shifted to coordination and cognitive load.

In 2025, the bottleneck is AI absorption. Everyone has access to the same models. The differentiator is: can your organization actually use them?

The companies that figure this out will compound their advantages. Better systems → easier AI integration → faster shipping → better systems. The companies that don’t will wonder why AI “isn’t working for them.”

I think about those two startups a lot. Same industry, similar problems, access to the same AI tools. But one accelerated with AI, the other stagnated. The difference wasn’t the technology. It was whether the organization was built to grow alongside AI.

Don’t ask “Are you AI-native?” because that’s merely a product question.

Instead ask “Are you AI-scalable?” which is the organizational question.

Can your team absorb, amplify, and survive accelerating AI capabilities without collapsing into chaos?

Because if the answer is no, it doesn’t matter how good your product is. You’re optimizing for the wrong bottleneck, and your trajectory will falter.

And someone else will eat your lunch.

