Spec-Driven Development: Everything Old Is New Again

April 28, 2026

Software engineering as we know it is being replaced. Not by AI. By specs.

“Given… When… Then… | claude” is a workflow, but it’s also a forcing function: you have to actually mean what you write.
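Concretely, the workflow looks something like this - a hypothetical feature file piped into the claude CLI’s non-interactive print mode (the file name and scenario are invented for illustration):

```gherkin
# login.feature - the spec, written before any code exists
Feature: Account login
  Scenario: Wrong password is rejected
    Given a registered user with email "ada@example.com"
    When she logs in with the wrong password
    Then she sees "Invalid credentials"
    And no session is created
```

```sh
# Hand the spec to the model; -p runs claude non-interactively.
cat login.feature | claude -p "Implement this behaviour. Ask before filling any gap the spec leaves open."
```

If the scenario is vague, the output will be too - which is the forcing function at work.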

The thing nobody talks about

Getting specs right is hard. Harder than code.

Code is honest - it runs or it doesn’t. A vague spec just produces confident-sounding garbage, and now the garbage compiles.

What’s actually hard about it:

  • knowing the why before the how
  • writing down the failure cases, not just the happy path
  • treating it as a live document, not a kickoff artifact you forget about a week later

And the thing people keep confusing: a spec is not documentation. Documentation describes what already exists. A spec describes what must be true. Everything else - docs, tests, the code itself - can be regenerated from it. If it can be regenerated, it’s not the asset. The spec is.

Codegen has been spec-driven for decades

Before we even get to TDD, there’s a whole lineage of tools that did exactly this: you write a spec, the code is generated from it, and you never touch the output.

It started with IDLs - interface definition languages. CORBA in the early 90s. You wrote a .idl file describing your interfaces, data types, and operations in a language-neutral format. Then you ran a compiler and got Java, C++, or whatever you needed on the other end. The IDL was the truth. The generated code was an artifact you threw away and regenerated whenever the spec changed.

WSDL did the same thing for the SOAP era. Painful to write by hand, but the idea was sound: describe your web service in XML, generate the client and server stubs, ship it. The spec lived in the .wsdl file. The code was output.

Google built Protocol Buffers internally around 2001 and open-sourced it in 2008. Define your messages and services in .proto files, run protoc, get code in whatever language you need. gRPC sits on top of that today and the model hasn’t changed. Facebook had the same problem and built Thrift around 2007 - same pattern, write a .thrift schema, generate code in a dozen languages, never edit the generated files.
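A minimal sketch of what that looks like (message and service names invented for illustration):

```proto
// auth.proto - the spec. The generated code is disposable output.
syntax = "proto3";

package auth;

message LoginRequest {
  string email = 1;
  string password = 2;
}

message LoginResponse {
  string session_token = 1;
}

service AuthService {
  rpc Login(LoginRequest) returns (LoginResponse);
}
```

Run protoc --java_out=gen/ auth.proto and you get Java; swap the flag for --python_out or --cpp_out and you get Python or C++. The .proto file is the only thing you ever edit.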

OpenAPI brought it to HTTP APIs. Swagger/OpenAPI specs let you describe an entire REST API in YAML, generate server stubs, generate clients, generate documentation. Tools like openapi-generator spit out thousands of lines you’d never write by hand.
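The same pattern, sketched as a fragment (endpoint and titles invented):

```yaml
# sessions.yaml - one endpoint of the contract
openapi: 3.0.3
info:
  title: Sessions API
  version: 1.0.0
paths:
  /sessions:
    post:
      summary: Log in and create a session
      responses:
        "201":
          description: Session created
```

Point openapi-generator at it - openapi-generator-cli generate -i sessions.yaml -g typescript-fetch -o gen/client - and out comes a typed client nobody hand-wrote and nobody should hand-edit.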

GraphQL took a different angle but the same principle: the schema is the contract. You define the types and operations in the schema language, and a whole ecosystem of tools generates typed clients, mocks, and validators from it.

The pattern is identical every time. Define the contract in a language designed for specification, not implementation. Generate everything else. Treat the generated code as read-only.

There’s a warning buried in this history though. Autoconf and automake - the GNU build tools from the early 90s - were supposed to solve portability through specification. You wrote a configure.ac describing what your build needed, autoconf turned it into a configure script, and that script generated Makefiles for whatever platform it ran on. In theory: spec-driven, clean. In practice: people started generating the configure.ac too. Then scripts to generate the scripts. The abstraction became so leaky and the mental model so lost that most projects treated the whole thing as a black box nobody touched. It worked, in context, but the understanding had long since left the building.

The same thing is starting to happen with AI specs. “The spec is wrong” becomes “let’s generate the spec from the existing code” becomes “let’s have the AI maintain the spec” becomes nobody actually knows what the system is supposed to do anymore. The spec is downstream of the implementation again, just with more steps in between. All that meta-activity is a tell that the hard part - actually thinking it through - got skipped.

The classic codegen era worked because the scope was contained. IDLs describe interfaces. Proto files describe messages. OpenAPI specs describe HTTP endpoints. The spec had a clear boundary and the codegen had a clear job.

What’s different now is that the scope is everything. Not just the interface layer - the whole system. That’s what makes it hard, and that’s what makes the discipline of writing good specs matter so much more.

We’ve been here before

None of this is new. That’s the point.

In the early 2000s, Kent Beck pushed test-driven development as a design discipline. Write the test first. Let the failing test tell you what to build. The test is a spec. The code is the answer to the spec. Simple.

Then Dan North took it further. Around 2003 he noticed that teams struggled with TDD not because writing tests was hard, but because they didn’t know what to test. The framing was wrong. “Test” implied verification after the fact. What he wanted was specification before the fact.

So he invented BDD - behaviour-driven development. And with it came a language: Given… When… Then. A format any stakeholder could read. A format that forced you to describe behaviour in terms of observable outcomes, not implementation details.

Aslak Hellesøy turned it into a tool. Cucumber, first released in 2008, let you write those Given/When/Then scenarios in plain English and run them as automated tests. Suddenly the spec was the test. The same document that the product manager wrote was the thing that turned red or green.

For a while it caught on. Teams wrote feature files. Business analysts collaborated with developers. Everyone was excited.

Then it quietly died in most places. Not because the idea was wrong - because the discipline was hard. Writing good Gherkin is genuinely difficult. Keeping it in sync with reality is harder. Most teams ended up with hundreds of brittle scenarios that nobody read and everybody feared changing. The spec became documentation again - dead the moment it stopped driving anything.

So we went back to writing code and hoping the tests would catch up.

Fast forward to now. The Given/When/Then format is back - except instead of running against hand-written step definitions, it’s running against Claude. The spec isn’t compiled into a test harness. It’s interpreted by a model that can fill in the gaps.

Which sounds like progress, and it is. It also drags back every problem the BDD crowd hit, plus a few new ones.

Infrastructure figured this out in 2013

Chad Fowler - who made the original immutable infrastructure argument back in 2013 - returned to the theme this February. His point: in-place mutation is the enemy of understanding. We learned that with servers. Snowflake servers. Config drift. Hand-applied fixes no one remembered.

So we stopped patching and started replacing. The server wasn’t the thing. The capability to regenerate was the thing.

He’s now saying the same is true for code. Editing AI-generated code in place is the software equivalent of SSHing into prod and tweaking a config file. You’re creating legacy systems in days instead of years.

The new rule: never upgrade code in place if you can regenerate it instead.

So now what runs in production?

This is where it gets messy and nobody has clean answers yet.

Which version is actually running? When was the spec last changed? When was the image built? Do those match?

We’re back to golden image thinking. Containers solved this for infrastructure - you shipped the whole thing, immutable, tagged. Specs + codegen might need the same treatment. The spec is the Dockerfile. The generated code is the image. You ship the image, not the edits.
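One possible shape, as a sketch - every name below is hypothetical:

```sh
# The spec plays the role of the Dockerfile; the generated code is the image.
# Tag the image with the spec's content hash so "which spec is running in
# prod?" has a mechanical answer.
SPEC_HASH=$(sha256sum service.spec.md | cut -c1-12)

# 1. Regenerate the code from the spec - agent, codegen tool, whatever.
#    It writes into ./gen and nowhere else; ./gen is never edited by hand.
your-generator --spec service.spec.md --out ./gen

# 2. Bake and ship the immutable artifact.
docker build -t registry.example.com/service:spec-$SPEC_HASH ./gen
docker push registry.example.com/service:spec-$SPEC_HASH
```

An edit to ./gen that bypasses the spec is exactly the SSH-into-prod move the last section warned about.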

But the tooling isn’t there yet. Right now it’s people improvising and seeing what sticks. That’s fine - the tools will catch up fast. The thinking has to come first.

What this means for teams

The job isn’t writing code anymore. It’s writing things that are true.

That’s a different muscle. Most teams haven’t built it. And here’s the uncomfortable bit - they should have. Writing clear requirements, thinking through failure cases, separating what from how: that’s just good engineering. If a team never developed that discipline, they weren’t particularly good at their job to begin with. AI didn’t create that gap. It just made it visible.

That’s why a lot of people are about to get swamped. Not because the technology is hard, but because the hard part was always the thinking and they’d been quietly outsourcing it to the code for years.

The same pattern shows up in how people talk about agents. Laypeople hear “autonomous AI agent” and imagine something between a robot and magic. A seasoned distributed systems engineer looks at it and thinks: cron job. Maybe a queue trigger. An event hook. A worker process that runs when something happens and stops when it’s done. The hype is a wrapper around concepts that have existed for decades. The engineer shrugs. So what.

That gap - between the mystified and the shrug - is where the real sorting will happen. Not AI literacy in the abstract, but whether you have enough foundation to see through the framing to the actual problem underneath.

Human time, computer time, garden time

Joe Beda, one of the people who built Kubernetes, put it simply: what used to happen in software engineering in “human time” can now happen in “computer time”. That’s the real shift. Not that AI writes code - it’s that the iteration loop collapsed. Hypothesis, experiment, result, adjust: that cycle used to take days or weeks because humans were in every step. Now it takes seconds.

That changes what makes sense. Command-and-control management of software feels off when the cost of trying something dropped to near zero. So you experiment instead. You garden.

But gardening has a trap. Peter Seibel, who ran Engineering Effectiveness at Twitter, wrote an essay called “Let a 1,000 flowers bloom. Then rip 999 of them out by the roots.” His point wasn’t that you should let everything grow. It’s that Twitter did - Scala written by people who wished it was Ruby, Scala written by people who wished it was Java, Scala written by people who wished it was Haskell, multiple monorepos, build systems that couldn’t talk to each other - and nobody’s job was to stop it. The garden became a jungle.

The second half of the title is the whole argument. Yes, run experiments. Yes, let a thousand flowers bloom. But then you have to do the hard thing: pick one, kill the rest, and make it official. That requires exactly the discipline the AI era keeps threatening to erode - someone who knows what the system is supposed to do and is willing to write it down and defend it.

Which brings it back to the spec.

Reimplementations are now a matter of days and tokens

Antirez - the person who built Redis - stopped resisting and wrote about it. A C BERT inference library in five minutes. His own Redis Streams redesign reproduced from a design document in twenty minutes. His conclusion: writing code yourself is no longer sensible for most projects. The fun moved. It’s now in knowing what to build.

The follow-up is the sharper point. Stallman and the GNU project built an entire UNIX userspace by reimplementation - same behaviour, fresh code, different license. Linus did the same with the kernel. Nobody calls that theft. Reimplementations were always legally and ethically fine. What AI changed is the cost. A reimplementation that used to take a team and months now takes an agent and a few days. The barrier was economic. That barrier is gone.

Which means: if your value is in code that can be described and re-derived from a description, that value is fragile. The spec, the design, the judgment about what to build - those are harder to replicate. The code itself is not.

It’s not about productivity. It’s about what counts as a moat. The answer is shifting toward: the spec.

Everything old is new again

Immutable infrastructure. Immutable code. Immutable specs.

The unit of deployment is changing. The thing that survives is the thing that was true enough to be written down.

Where this leaves you

Two camps are forming, fast.

If your team already wrote down behaviour before code, treated specs as live documents, argued about what before how - you’re going to have a great few years. The leverage just went up by an order of magnitude, and the muscle you spent years building is suddenly the most valuable thing in the room.

If your team never built that muscle, the gap is going to widen quickly. Not because the AI part is hard - the AI part is the easy part. The hard part is the discipline most teams skipped. That’s the part that now matters most, and it’s not something you pick up in a week.

If you’re reading this and recognising your team in the second camp, this is the moment to do something about it - not in six months. We’ve spent years working with teams on exactly this: requirements, contracts, specs, the discipline of writing things down so they stay true. Get in touch and let’s figure out where you actually stand and what to do about it.


Crafted with Hermes Agent and Claude Code.


Jan is Co-founder and Cloud Consultant at superluminar and AWS Certified Solutions Architect Professional. He writes here about AWS-specific topics. He can be found under the name @bracki on Twitter.