The AI-Scalable Startup
When I joined my first startup after leaving a big tech company, I was excited to move fast. No bureaucracy, no committees, just ship code and grow. On my second day, I pushed a PR that added a new payment method to our checkout flow.
The response came back from a senior engineer: “This completely breaks our fraud detection system. Did you even read the architecture docs?”
I looked for the architecture docs. There weren’t any.
“Also, this doesn’t follow our error handling patterns.”
What patterns? I grepped the codebase. Every file handled errors differently.
“We’re a startup,” they told me. “We don’t have time to write docs. You need to shadow someone for a few weeks to learn how things work.”
OK. I shadowed. I learned the unwritten rules. I shipped smaller PRs. I asked more questions. After two months, I finally felt productive.
Then I tried to use AI to help me refactor some of our authentication code. The AI had no idea about any of the unwritten rules. It produced clean, well-tested code that would have silently broken three different parts of the system. I spent four hours debugging it before giving up and rewriting it manually.
“See?” said the senior engineer when I mentioned it. “AI isn’t ready for real codebases.”
Six months later, I left. I joined an even smaller company—barely 10 people, pre-seed funding. I assumed it would be worse: less structure, more chaos, even fewer docs.
It wasn’t.
On my first day, I asked where the docs were. “They’re pretty sparse,” the CTO admitted, “but the tests are comprehensive. Just read the tests—they’re basically executable documentation.”
I pulled up a payment processing module. The tests were clear, complete, and actually tested behavior, not just that functions returned non-null. The module boundaries were obvious: PaymentGateway, FraudCheck, Receipt. Each one did exactly what it said and nothing else.
On day three, I shipped a PR. It passed CI. A senior engineer approved it in 20 minutes with one comment: “Nice catch on that edge case. Why haven’t we done this before?”
After a week, I tried the same experiment: could I use AI to refactor something meaningful? I pointed Claude at our user authentication module and asked it to add support for OAuth providers.
It worked. Nearly perfectly. The tests caught the two small issues. I fixed them, pushed again, and it merged. The whole thing took 90 minutes instead of the two days I’d budgeted.
That’s when I realized: the difference between these two companies wasn’t size, or funding, or even talent. It was whether their organization itself could absorb AI capabilities.
The first company couldn’t. The second could. And after spending the next few years exposed to dozens of engineering teams, I kept seeing the same pattern. Some organizations are structured to leverage AI. Others actively resist it, not through policy, but through their very architecture.
In 2024, nearly a third of all global venture capital went into AI companies. VC expectations have never been higher. At that level of competition, you need every lever working in your favor. The question isn’t “do you use AI?” anymore. Everyone does. The question is:
Can your organization absorb, amplify, and survive accelerating AI capabilities without collapsing into chaos?
Let me tell you what I learned about why some companies can and some can’t.
The Pattern Emerges
After I left that second startup, I started paying attention. I joined a consulting company that put me inside engineering teams at a dozen different companies. Fintech, healthtech, SaaS, marketplaces. Some teams could move fast with AI. Most couldn’t.
The ones that could had three things in common. I started calling them Risk Elasticity, Semantic Surface Area, and Socio-technical Friction. Let me explain what I mean by walking through each one.
Risk Elasticity: Can you ship without fear?
Back at that first startup, every deploy was an event. We deployed on Fridays at 4pm because “if something breaks, the team is around to fix it.” (This should have been a red flag.) We deployed maybe twice a week, and each deploy had a 30% chance of something breaking.
The problem wasn’t the engineers. It was the system. We had almost no test coverage. The few tests we had were flaky—they’d fail randomly, so everyone ignored them. Our staging environment was always broken because someone had deployed half-finished work three weeks ago and forgotten about it.
When you propose using AI to generate code in an environment like that, here’s what happens: The AI writes a perfectly reasonable 200-line change. No human could possibly review that thoroughly by reading it. So either you merge it on faith (terrifying) or you spend three days manually testing every edge case (which defeats the point).
Compare that to the second company. They had a simple rule: if CI passes, you can merge. CI was fast (4 minutes), comprehensive, and reliable. The test suite actually caught bugs. Type errors were impossible—they used TypeScript strictly, with no `any` types.
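That kind of strictness is mostly configuration, not heroics. A minimal sketch of a `tsconfig.json` that enforces it (which exact options a given team enables will vary):

```jsonc
{
  "compilerOptions": {
    "strict": true,                   // turns on noImplicitAny, strictNullChecks, and friends
    "noUncheckedIndexedAccess": true, // array/object indexing yields T | undefined
    "noFallthroughCasesInSwitch": true,
    "noEmit": true                    // type-check only; leave bundling to other tools
  }
}
```

With `strict` on, `any` can only enter the codebase deliberately, which is exactly the guardrail AI-generated changes need.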
I watched a junior engineer there use GitHub Copilot to build an entire new API endpoint: route handler, validation, database query, tests. The whole thing was AI-generated. It took her 45 minutes. When she pushed it, CI caught two issues (a missed null check and an off-by-one error). She fixed them, pushed again, green. Merged.
That’s Risk Elasticity: the ability to absorb a high volume of changes—especially AI-generated ones—without everything breaking.
It’s not about being perfect. It’s about having systems that catch problems cheaply and automatically. Because here’s the thing about AI-generated code: it’s not uniformly bad, it’s randomly bad. It might be perfect 80% of the time and completely wrong 20% of the time, and you can’t predict which. If you can’t catch that 20% automatically, you’re dead.
The companies that can move fast with AI have:
- Tests that actually test behavior, not just coverage numbers
- Types and linters that act as guardrails
- Small, well-bounded modules where changes don’t cascade unpredictably
- Fast, reliable CI that you actually trust
The companies that can’t have:
- Monoliths where touching billing can break auth
- “Test environments” that are permanently red
- Feature flags glued in ad-hoc ways that make behavior impossible to reason about
In those environments, AI doesn’t make you faster. It makes you more scared.
Semantic Surface Area: Can anyone understand what’s going on?
Here’s a specific thing that happened at the first startup. I was trying to understand how we calculated shipping costs. I found a function called processOrder(). It was 800 lines long. It did everything: validated the order, calculated tax, calculated shipping, updated inventory, sent confirmation emails, logged analytics events.
Hidden on line 492 was the shipping calculation. It called a function in a file called utils/helpers.ts. That function called another function in common/shared.ts. That function called a function in legacy/migration_temp.ts (which had been there for 18 months).
I spent two days tracing through this. When I finally found the actual logic, it was 30 lines of straightforward code.
Now imagine pointing an AI at that and saying “update the shipping calculation to include Saturday delivery.” The AI would need to understand that entire 800-line function, trace through four layers of indirection, and somehow know that legacy/migration_temp.ts is the actual source of truth. It can’t. I couldn’t, and I’m human.
Compare this to the second company. I needed to add a new discount type to our pricing system. I opened the pricing/ directory. There were five files:
```
pricing/
  discounts.ts         // All discount types and logic
  calculations.ts      // Price calculation engine
  rules.ts             // Business rules for when discounts apply
  discounts.test.ts
  calculations.test.ts
```
I opened discounts.ts. Every discount type was a class with a clear interface. I asked Claude to add a new discount type. It looked at the existing classes, understood the pattern, and generated a perfect implementation in 30 seconds.
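The pattern looked roughly like this (a hypothetical reconstruction; the actual interface and class names are assumptions on my part):

```typescript
// Hypothetical reconstruction of discounts.ts; the real names may differ.
interface Order {
  subtotalCents: number;
  itemCount: number;
}

interface Discount {
  readonly code: string;
  applies(order: Order): boolean;       // business rule: is this discount valid here?
  amountOffCents(order: Order): number; // how much it takes off the subtotal
}

class PercentageDiscount implements Discount {
  constructor(readonly code: string, private readonly percent: number) {}
  applies(order: Order): boolean {
    return order.subtotalCents > 0;
  }
  amountOffCents(order: Order): number {
    return Math.round((order.subtotalCents * this.percent) / 100);
  }
}

class BulkDiscount implements Discount {
  readonly code = "BULK10";
  applies(order: Order): boolean {
    return order.itemCount >= 10; // only kicks in on large orders
  }
  amountOffCents(order: Order): number {
    return Math.round(order.subtotalCents * 0.1);
  }
}
```

Given two classes like these, a model has everything it needs to add a third: the interface names the contract, and the existing implementations show the conventions.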
This is what I mean by Semantic Surface Area: how easy is it—for a human or an AI—to infer the meaning and structure of your code from local context.
High Semantic Surface Area looks like:
- Names that reflect concepts, not implementation accidents (`Order`, `Shipment`, `Invoice` instead of `CoreUtil`, `Helper`, `Manager`)
- Locality of behavior—when you open a file, most related behavior is visible or directly referenced
- Clear domain boundaries—if you’re working on pricing, you know where pricing lives
- Some documentation for non-obvious decisions
Low Semantic Surface Area looks like:
- The `utils/` directory with 40 files named variations of `helpers.ts`
- Business logic smeared across frontend, backend, and database triggers
- No canonical source of truth (every team defines “user” differently)
Here’s why this matters for AI: Large language models are incredibly good at following semantic gradients. They pattern-match on naming, comments, local examples, and file structure. When your codebase is well-modeled, AI can propose coherent changes that respect your domain model.
When your codebase is a maze of indirection and generic names, AI produces code that’s technically correct but semantically nonsense.
I realized: spinning up a new AI agent is like onboarding a new junior engineer. If your onboarding story is “shadow Sarah for six months and pray,” your AI story will look the same.
Socio-technical Friction: Will anyone let this merge?
This is the one that surprised me the most, because it’s barely technical at all.
At the first startup, here’s how code review worked: You opened a PR. A senior engineer would get to it… eventually. Maybe in two days, maybe in a week. They’d leave 15 comments, most of them nitpicks about formatting or asking you to justify architectural decisions you didn’t make (because there were no docs about the architecture).
You’d address the comments. They’d come back with more comments. This would go on for 3-4 rounds. Finally, they’d approve—but now you needed a second approval from another senior engineer. That would take another few days.
One time I opened a PR to replace our homegrown date parsing logic with a standard library. The code was clearly better—less buggy, well-tested, maintained by professionals. A senior engineer blocked it: “We’ve been using our date parser for two years. It works fine. Why add a dependency?”
I tried to explain that our date parser had three known bugs and we’d spent probably 40 hours debugging issues related to it. “Yeah, but those are fixed now. And besides, this is our code. We understand it.”
That’s when I understood what was happening. It wasn’t about the code. It was about identity. The senior engineers had built this system. Criticizing the system felt like criticizing them. They’d rather defend a bad abstraction than admit something could be better.
The second company was different. Most PRs, if they passed CI, got approved within an hour or two. Reviews were collaborative: “Have you considered X?” not “This is wrong because Y.” When I suggested replacing some janky internal code with a modern external library, the response was: “Oh thank god, I’ve been wanting to do that for months. Can you handle the migration?”
Culture eats strategy for breakfast, and culture definitely eats AI adoption for breakfast.
You can have perfect Risk Elasticity and beautiful Semantic Surface Area, but if every change requires three approvals and two architecture review meetings, AI isn’t going to help you. The bottleneck isn’t code generation—it’s human gatekeeping.
Low friction environments don’t mean “YOLO to prod.” They mean the guardrails are encoded in software (tests, types, CI) instead of slow, brittle human bottlenecks.
Here’s a concrete example of how this plays out: At the second company, we started experimenting with having AI write tests. Not production code—just tests. We’d write a new feature, then ask Claude to generate comprehensive test cases.
At first, the tests were like 70% useful. We’d edit them, merge them, move on. But we did this feedback loop dozens of times. The prompts got better. The patterns got clearer. Within a month, AI-generated tests were maybe 90% useful.
Now imagine trying that at the first company. Someone would have objected: “We can’t have AI writing tests, what if they’re wrong?” (As if human-written tests are never wrong.) It would have turned into a committee discussion. Someone would have wanted a policy. The experiment would have died before it started.
The difference isn’t technical. It’s cultural. It’s trust and ego and willingness to try things that might not work.
The Uncomfortable Truth
After seeing enough teams, the pattern became clear. The teams that move fast with AI are the same teams that moved fast before AI. AI doesn’t fix a broken organization—it amplifies what’s already there.
If your codebase is brittle, your abstractions are leaky, and your process requires three senior approvals for every minor change, it doesn’t matter how good the models get. You’ll still be stuck.
This sounds like a systemic problem requiring top-down organizational change. And it is. But there’s also a more direct lever:
It’s not your architecture, it’s your attitude.
I’ve watched senior engineers—smart, accomplished people—reject AI tools because “they’re not ready yet” or “they make too many mistakes.” These same engineers work with junior developers all the time. Junior developers make mistakes constantly. But we don’t reject them. We build systems (code review, tests, mentorship) to make their mistakes cheap and their learning fast.
AI is the same. It’s a junior developer that works 100x faster and costs $20/month. If you can’t figure out how to extract value from that, the problem isn’t the AI.
The best engineers I know have adjusted their mindset. They’ve accepted that things will be messier, that they won’t have complete mental control, that some code will be generated by machines and that’s fine as long as the systems catch bad outputs.
They’ve learned to be comfortable being a little uncomfortable.
Because here’s what I think is actually happening: We’re in the middle of a bottleneck shift, the same kind we’ve seen every 10-20 years in software.
In the 1980s, the bottleneck was hardware. Memory was tiny, CPUs were slow. You wrote in assembly because every cycle counted.
In the 1990s and 2000s, the bottleneck became developer time. Hardware got cheap. High-level languages took off. We traded execution speed for development speed.
In the 2010s, the bottleneck became organizational complexity. Systems became distributed, teams became huge. We invented microservices and platform engineering to cope with coordination costs.
Now the bottleneck is AI absorption. Everyone has access to the same models. The differentiator is: can your organization actually use them?
The companies that figure this out will compound their advantages. Better systems → easier AI integration → faster shipping → better systems. The companies that don’t will wonder why AI “isn’t working for them.”
I think about those two startups a lot. Same industry, similar problems, similar funding, access to the same AI tools. But one could move fast with AI and one couldn’t. The difference wasn’t the technology. It was whether the organization was built to absorb it.
In 2024, nearly a third of all global venture capital went into AI companies. The survivors won’t be the ones who just build AI products. They’ll be the ones who consume AI products most effectively.
The question for your startup isn’t “are you AI-native?”
It’s: Are you AI-scalable?
Can your organization absorb, amplify, and survive accelerating AI capabilities without collapsing into chaos?
Because if the answer is no, it doesn’t matter how good your product is. You’re optimizing for the wrong bottleneck.
And someone else will eat your lunch.