The Sprint Is a Beautiful Artifact

This post continues a thread I've been pulling on for the past few weeks — how AI is reshaping not just how we write code, but everything we've built around it. First it was the tools. Then it was the role. Then it was the entire vendor ecosystem. This week I want to talk about something that cuts across all of those: measurement. Specifically: we're still measuring things that no longer matter. And the instruments we're using to do it are starting to actively mislead us.

Dead Reckoning

I'll be honest — I learned about dead reckoning from a Tom Cruise movie, not a maritime history class. In Mission: Impossible – Dead Reckoning Part One, the concept anchors the entire plot. The fact that a film about a sentient AI consuming and outmaneuvering everything in its path gave me the central metaphor for a piece about AI-driven development is not lost on me. Ethan Hunt would appreciate the irony. Maybe.

Here's the actual concept, for those who — like me — are learning navigation history through action cinema: before GPS, sailors calculated position from proxies. Last known location, speed, heading, time elapsed. The math was sound. The problem was that it optimized for the variables available, not the variables that actually mattered.
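The proxy math is simple enough to sketch. A minimal dead-reckoning step, assuming a flat-plane approximation and constant speed and heading (the function and variable names are mine, not nautical convention):

```python
import math

def dead_reckon(x, y, speed_knots, heading_deg, hours_elapsed):
    """Estimate a new position from the last known fix.

    Flat-plane approximation: heading measured clockwise from north,
    distance = speed * time. Every input is a proxy; errors compound
    with each successive estimate.
    """
    distance = speed_knots * hours_elapsed      # nautical miles traveled
    rad = math.radians(heading_deg)
    return (x + distance * math.sin(rad),       # east-west displacement
            y + distance * math.cos(rad))       # north-south displacement

# Last fix at the origin, 10 knots due east for 3 hours:
x, y = dead_reckon(0.0, 0.0, 10.0, 90.0, 3.0)
# → roughly (30.0, 0.0)
```

Every run of this calculation feeds its output back in as the next "last known location" — which is exactly why the errors accumulate, and exactly why GPS made the whole exercise obsolete.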

GPS didn't make dead reckoning faster. It made it irrelevant, because the constraint — knowing your position — moved. You no longer needed to calculate from proxies. You could just know.

The sprint is our dead reckoning system. Story points, velocity, burndown charts — these are instruments designed to estimate position from proxies. We couldn't know how long software would take to build, because building was hard and uncertain. So we built a system to make our best guess, track progress against it, and adjust over time. The math was right. The methodology was thoughtful.

The problem is that the constraint — human coding speed and capacity — is no longer the limiting variable.

And yet most teams are still navigating by dead reckoning.

The Evidence Is Already In

A team at METR, an AI safety research organization, ran one of the most controlled studies of AI-assisted development published this year. They recruited sixteen experienced developers working on their own open-source repositories — people who knew their codebases deeply — and randomized which tasks they could use AI for and which they couldn't.

The result: developers using AI tools took 19% longer than without them. Not faster. Slower.

Here's the part that matters most: those same developers predicted, before the study, that AI would make them 24% faster. After experiencing the slowdown firsthand, they still believed AI had sped them up by 20%.

The instruments were lying. Or more precisely: the developers had no reliable instruments at all. They were navigating by feel, and their gut was confidently wrong by roughly 40 points.

This isn't an argument against AI tools. It's an argument for measurement that actually reflects what's happening. If experienced engineers can't tell whether their primary tool is helping or hurting — while actively using it — then the question of how we measure velocity in an AI-assisted world isn't academic. It's urgent.

Renaming Isn't Rethinking

AWS recently published a framework called the AI-Driven Development Lifecycle. It proposes replacing sprints with "bolts" — shorter cycles measured in hours or days rather than weeks. It's worth reading.

But my honest reaction: it's renaming the furniture. Bolts instead of sprints is still batch-mode thinking applied to a world where batch-mode delivery isn't the constraint. You can make your units of work smaller and faster and still not address the fundamental question of what to measure and why.

Thoughtworks has gone deeper. They've put Spec-Driven Development on their Technology Radar as an emerging practice — define the specification with enough precision that an AI agent can execute on it, with serious requirements analysis and human governance baked in. They've named the thing that actually matters: specification quality. And they've started building a discipline around it.

There's a startup called Tessl, sitting on $125 million in funding, whose thesis goes further still: the specification is the only human-maintained artifact. Code is entirely machine output. Humans never touch it. If they're right, we're not just talking about a new measurement framework. We're talking about a different definition of what software development is.

The field is moving. But it's moving at the tool layer and the process layer. Nobody has seriously answered the measurement question: what do we actually track now?

Where Risk Actually Lives

Here's something about sprints and estimates that doesn't get said enough: they were never really about measurement. They were about risk management.

The estimate was how you said "we're not sure this is possible in the time you want, and here's what we know about the uncertainty." Velocity was how you made that uncertainty visible and manageable over time. The sprint was a container for absorbing risk in a controlled cadence.

That risk hasn't disappeared. It's relocated.

Risk used to live in the execution layer — can we build this, how long will it take, what will we hit along the way? AI has compressed that dramatically for well-understood problems. But it's expanded something else: the specification layer. When you can build the wrong thing perfectly in an afternoon, the cost of a mis-specified feature isn't a slow slog toward something mediocre. It's a fast sprint toward something useless.

"Is what we built actually right?" didn't used to need its own risk management framework, because building was slow enough that misunderstandings surfaced along the way. Now it does.

The sprint ceremonies that have survived into AI-assisted teams often feel like ritual without insight. The cadence persists. The vocabulary persists. But the thing they were designed to surface — uncertainty about whether humans could build the thing — is no longer where the uncertainty lives. Teams are holding retrospectives about execution when the real question is specification. They're tracking velocity when what they should be tracking is alignment.

A Different Scoreboard

The new metrics that matter aren't exotic. They're logical extensions of what we already care about, recalibrated for where the constraint actually lives.

Spec-to-ship cycle time is the new velocity. Not lines of code, not story points, not tasks closed. The interval from a human committing a specification to working code in production. This is a system-level metric, not an individual one — which is appropriate, because the system is increasingly the unit of production.

First-pass acceptance rate is the new quality signal. What percentage of AI-generated outputs pass human review without requiring the specification to be revised? High first-pass rate means the human-AI translation is working. Low means there's ambiguity, missing context, or the spec wasn't precise enough. Track this by feature, by team, by individual over time. You'll learn more about where friction actually lives than any burndown chart ever told you.

Rework ratio tells you whether you built the right thing. How often does production code require correction not from a deployment failure, but from misalignment between what was specified and what users actually needed? DORA quietly added a new metric this year — Rework Rate — which is the closest the mainstream methodology community has come to acknowledging that this is where the bodies are buried now.
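None of these three metrics require new tooling to start tracking. A minimal sketch, assuming you log one record per shipped change — the field names here are mine, not from any particular tool; adapt them to whatever your delivery pipeline already emits:

```python
from datetime import datetime

# Hypothetical delivery log: one record per shipped change.
changes = [
    {"spec_committed": datetime(2025, 3, 1, 9, 0),
     "in_production": datetime(2025, 3, 1, 16, 30),
     "passed_first_review": True,  "reworked_after_ship": False},
    {"spec_committed": datetime(2025, 3, 2, 10, 0),
     "in_production": datetime(2025, 3, 4, 11, 0),
     "passed_first_review": False, "reworked_after_ship": True},
    {"spec_committed": datetime(2025, 3, 3, 14, 0),
     "in_production": datetime(2025, 3, 3, 18, 0),
     "passed_first_review": True,  "reworked_after_ship": False},
]

# Spec-to-ship cycle time: spec commit to production, averaged in hours.
cycle_hours = [(c["in_production"] - c["spec_committed"]).total_seconds() / 3600
               for c in changes]
avg_cycle = sum(cycle_hours) / len(cycle_hours)

# First-pass acceptance rate: share of outputs accepted without spec revision.
first_pass = sum(c["passed_first_review"] for c in changes) / len(changes)

# Rework ratio: share of shipped changes later corrected for misalignment.
rework = sum(c["reworked_after_ship"] for c in changes) / len(changes)

print(f"avg spec-to-ship: {avg_cycle:.1f}h, "
      f"first-pass: {first_pass:.0%}, rework: {rework:.0%}")
```

The point of the sketch: all three are ratios and intervals over events you're almost certainly already recording. The hard part isn't computation; it's agreeing on what counts as a spec commit and what counts as rework.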

Customer outcome metrics should ultimately arbitrate everything else. Did the thing we shipped move the number it was supposed to move? In a world where the cost of building drops dramatically, the question of whether we built the right things becomes the primary accountability mechanism. Velocity metrics made sense as a proxy for value when value was hard to deliver quickly. The proxy is no longer necessary.

DORA's four core metrics — deployment frequency, lead time, change failure rate, mean time to recovery — still matter. If anything, they matter more in an AI-accelerated environment, because the cost of shipping the wrong thing quickly is higher than the cost of shipping the right thing slowly. But they're the floor now, not the ceiling. They tell you how healthy your delivery pipeline is. They don't tell you if you're building anything worth delivering.

The One-Shot Rate

Here's the metric nobody's talking about yet. I think it's going to become the most revealing signal in the stack.

What percentage of specifications an engineer writes produce correct, production-ready code on the first pass — without requiring the spec to be revised, the prompt rewritten, or the output substantially reworked?

Call it the one-shot rate. I think it's going to become the defining measure of engineering skill in the next five years.

The engineer who can write a specification for a feature, a module, a system — one that requires minimal iteration to produce correct, maintainable, appropriately scoped code — is doing something genuinely difficult. It requires knowing the problem space, the domain constraints, the existing system context, the edge cases, and the definition of done well enough to express all of it precisely before a single line of code is written. That's not a prompt engineering trick. That's engineering judgment made legible.

We've always had ways of distinguishing engineers who understood deeply from engineers who executed competently. Specification clarity is just the new surface where that distinction shows up. If you're hiring right now, I'd start asking about this explicitly: "Walk me through a spec you wrote that needed significant revision to produce the right output. What did you learn?" The answer will tell you more than a LeetCode score.
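Tracking the one-shot rate is trivial once you tag whether a spec shipped without revision. A sketch, with an entirely hypothetical log format — the engineers and spec names are invented for illustration:

```python
from collections import defaultdict

# Hypothetical log: (engineer, spec_id, shipped_without_revision)
spec_attempts = [
    ("ana", "auth-reset", True),
    ("ana", "rate-limit", True),
    ("ana", "audit-log",  False),
    ("ben", "auth-reset", False),
    ("ben", "webhooks",   True),
]

# Tally one-shots and total attempts per engineer.
totals = defaultdict(lambda: [0, 0])    # engineer -> [one_shots, attempts]
for engineer, _, one_shot in spec_attempts:
    totals[engineer][0] += one_shot
    totals[engineer][1] += 1

one_shot_rate = {eng: hits / n for eng, (hits, n) in totals.items()}
# ana: 2/3, ben: 1/2
```

The per-engineer grouping is the interesting part: unlike first-pass acceptance rate, which is a system signal, the one-shot rate tracked over time is a skill signal — it shows you who is getting better at making their judgment legible.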

What Coding Standards Mean Now

One more thing that needs rethinking, and I'll keep it short: if AI is writing most of the code, what are coding standards actually for?

Historically, standards existed to make human-authored code legible to other humans. Style guides. Linting rules. Pattern conventions. That purpose doesn't disappear — but it shifts. In a world where AI generates most of the code and humans are reviewing it, standards become contracts more than conventions. The question isn't "how should we format this?" It's "what shape does any output need to conform to before it's acceptable?"

Security requirements, integration constraints, performance envelopes, architectural boundaries — these are the standards that matter in the new model, because they're what the AI needs to operate within and what the human reviewer needs to verify against. The line-by-line style review is becoming less relevant. The contract-level compliance check is becoming more important.

Most teams' standards documentation still reads like it was written for a human author. It should probably read more like a spec.
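What a contract-level standard might look like in practice — a sketch that is entirely my own invention, not any existing tool's API, showing standards expressed as executable checks rather than prose conventions. The specific rules here (forbidden imports, public-surface limits, timeout requirements) are illustrative placeholders:

```python
# Hypothetical contract checks: constraints the AI must operate within
# and the reviewer verifies against, rather than style rules for humans.

FORBIDDEN_IMPORTS = {"pickle", "telnetlib"}   # security requirement
MAX_PUBLIC_FUNCTIONS = 10                     # architectural boundary
TIMEOUT_MARKER = "timeout="                   # integration constraint

def check_contract(source: str) -> list[str]:
    """Return contract violations for a generated module. Illustrative only."""
    violations = []
    for mod in FORBIDDEN_IMPORTS:
        if f"import {mod}" in source:
            violations.append(f"forbidden import: {mod}")
    public_defs = [line for line in source.splitlines()
                   if line.startswith("def ") and not line.startswith("def _")]
    if len(public_defs) > MAX_PUBLIC_FUNCTIONS:
        violations.append("public surface too large")
    if "requests.get(" in source and TIMEOUT_MARKER not in source:
        violations.append("network call without explicit timeout")
    return violations

generated = ("import pickle\n"
             "def load(path):\n"
             "    return pickle.load(open(path, 'rb'))\n")
violations = check_contract(generated)
# → ["forbidden import: pickle"]
```

A real version would live in CI and lean on proper static analysis rather than string matching — the point is only that the standard becomes something you run against any output, human or machine, instead of something you cite in a review comment.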

The Instruments We Need

Here's where I keep landing on all of this.

Every system we built around software development — the sprint, the velocity chart, the story point, the burndown — was optimized for a specific constraint. Human coding speed and capacity was the bottleneck, and we built thoughtful, functional instruments for measuring and managing it.

The constraint moved. The instruments haven't.

The METR study is what you get when instruments lie: experienced engineers, confidently wrong about whether a major change to their workflow is helping or hurting. Sprint ceremonies that have become ritual without insight. Velocity charts that measure commitment and completion without touching the question of whether we built the right thing at all.

The organizations that lead the next decade won't be the ones that adopt AI the fastest. They'll be the ones that do the harder work: retire the dead reckoning system, figure out what they actually need to know, and build measurement frameworks calibrated for where the constraint lives now.

The frameworks are being sketched. The white papers are starting to appear. The field is naming the problems.

But the measurement question — what we actually track, and why — is still largely unanswered.

That feels like an opportunity to me.