Chapter 07

Process Mastery

The Spec-Driven Sprint: Six Stages from Stakeholder Pain to Agent-Built Product

March 30, 2026 · 10 min read

Part 7 of the Agentic-Oriented Development series

The Chief Risk Officer wasn't angry. He was exhausted.

I was interviewing him at a large investment company, trying to understand how his team shared risk data across the organization. The answer surprised me. Once a month. That was it. Twenty risk analysts relied on a single Access database running on a computer under a desk. The process took days to compile. Any failure during that compilation meant starting the entire report over from scratch. And the data was already 30 days stale by the time anyone saw it.

His frustration was obvious. But here's what I didn't do. I didn't open a tool and start building.

Instead, I ran a process. I interviewed the CRO to understand the actual pain, not the assumed solution. Then I wrote a business proposal. Not a requirements document. A specification. It defined what success looked like before anyone wrote a line of code.

The spec was specific:

Success criteria: Risk data visible within 24 hours of market close. System resilient to single-point failures. Backed up and recoverable without manual intervention.
Inputs: The existing Access database schema, the Azure cloud environment, the CRO's reporting requirements.
Validation gates: A data integrity audit comparing old outputs to new, CRO sign-off on the dashboard UX, and an IT security review of the cloud configuration.

I walked the proposal through budgeting gates to secure corporate funding. Then I assigned my lead developer, UX designer, and full stack engineer to build out a plan. Not to build the product. To build the plan.

In less than three months, we moved that team of 20 analysts from monthly stale reports onto a mobile application with a cloud-native backend. The business went from a 30-day delay on risk data to 24 hours. The system was now resilient and backed up. No one's desktop computer was a single point of failure anymore.

The CRO never asked for a mobile app. He described pain. The process translated pain into a spec, and the spec into a product. Today, when you work with AI coding agents, the same discipline determines whether you ship or spin.

Specifications That Agents Can Execute

The distinction between a spec and documentation is where most developers get stuck. Chapter 6 introduced the shift from prompt thinking to workflow thinking. Now I want to make it precise.

Documentation tells humans what was built. Specs tell agents what to build. A spec constrains decisions before they happen, which makes it fundamentally different from a design doc that a human interprets after the fact. The spec IS the instruction set.

What makes a spec executable? Three properties:

Inputs are explicit. The agent knows what it's working with. No guessing about the tech stack, existing schemas, or environmental constraints.
Success criteria are testable. Every criterion can become a validation gate or automated test. If you can't write a test for it, it isn't a criterion. It's a wish.
Handoffs are defined. The spec says which agent (or human) owns which piece and where the boundaries are.
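To make the three properties concrete, here is a minimal Python sketch that treats a spec as a data structure. The names `Spec`, `Criterion`, `Handoff`, and `missing_properties` are hypothetical, not part of any tool. The point is that each property becomes a field a script can check before an agent ever starts.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Criterion:
    description: str
    # A criterion with no check is a wish, not a criterion.
    check: Optional[Callable[[], bool]] = None


@dataclass
class Handoff:
    owner: str           # which agent (or human) owns this piece
    responsibility: str  # where its boundary is


@dataclass
class Spec:
    inputs: list[str] = field(default_factory=list)
    criteria: list[Criterion] = field(default_factory=list)
    handoffs: list[Handoff] = field(default_factory=list)


def missing_properties(spec: Spec) -> list[str]:
    """List which of the three executability properties the spec lacks."""
    problems = []
    if not spec.inputs:
        problems.append("inputs are not explicit")
    if not spec.criteria:
        problems.append("no success criteria")
    untestable = [c.description for c in spec.criteria if c.check is None]
    if untestable:
        problems.append("untestable criteria: " + ", ".join(untestable))
    if not spec.handoffs:
        problems.append("handoffs are not defined")
    return problems


spec = Spec(
    inputs=["Existing auth module", "Current token schema"],
    criteria=[Criterion("Improve token handling")],  # no check attached
    handoffs=[Handoff("Backend agent", "Implements refresh logic")],
)
print(missing_properties(spec))  # ['untestable criteria: Improve token handling']
```

The vague criterion is flagged because nothing can verify it, which is exactly the "wish" test from the list above.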

Here's a side-by-side comparison that makes this concrete.

Documentation (descriptive, for humans): "Authentication uses JWT tokens with 1-hour expiry."

Specification (prescriptive, for agents):

Input: Existing auth module, current token schema

Success criteria:
1. Token refresh endpoint returns new token within 500ms
2. Expired tokens trigger re-login flow with user-visible message
3. Security agent validates token storage location, expiry
   handling, and secrets exposure before merge

Handoffs:
- Backend agent implements refresh logic
- Frontend agent handles re-login UX
- Security agent reviews before merge

An agent reading the documentation knows WHAT exists. An agent reading the spec knows WHAT to build, HOW to validate it, and WHERE its boundaries are. That difference is everything.
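Criterion 1 in the spec above is testable as written. A minimal sketch, assuming a hypothetical `refresh_token` function that stands in for the real endpoint call; in a real suite you would call the endpoint over HTTP and keep the same two assertions.

```python
import time


def refresh_token(old_token: str) -> str:
    """Hypothetical stand-in for a call to the token refresh endpoint."""
    return old_token + ".refreshed"


def test_refresh_returns_new_token_within_500ms() -> None:
    start = time.perf_counter()
    new_token = refresh_token("expired-token")
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Criterion 1: the endpoint returns a *new* token...
    assert new_token and new_token != "expired-token"
    # ...within 500 ms.
    assert elapsed_ms <= 500, f"refresh took {elapsed_ms:.0f} ms"


test_refresh_returns_new_token_within_500ms()
```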

This is what I call intent preservation. Specs don't just instruct agents on what to build. They anchor the original goal, the why behind the work, so it survives context resets and agent handoffs. When a context window clears or a new agent picks up the task, the spec carries the intent forward. Without that anchor, each new session starts from "what does the code do now?" instead of "what were we trying to achieve?"
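Concretely, intent preservation can mean that every new session or agent begins by reading the spec's goal and criteria before it reads any code. A minimal sketch: the `intent_preamble` helper and the dictionary layout are hypothetical, and how the text actually reaches the agent (system prompt, instructions file) depends on your tooling. The goal and criteria are taken from the Investment Risk spec.

```python
def intent_preamble(spec: dict) -> str:
    """Render the spec's goal and criteria as the opening context of a session."""
    lines = [f"Goal: {spec['goal']}", "Success criteria:"]
    lines += [f"  {i}. {c}" for i, c in enumerate(spec["criteria"], start=1)]
    return "\n".join(lines)


# Goal and criteria from the Investment Risk project.
risk_spec = {
    "goal": "Timely risk data that survives a hardware failure",
    "criteria": [
        "Risk data visible within 24 hours of market close",
        "System resilient to single-point failures",
        "Backed up and recoverable without manual intervention",
    ],
}
print(intent_preamble(risk_spec))
```

A fresh session that starts from this text starts from "what were we trying to achieve?" rather than "what does the code do now?"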

Think back to Chapter 2. Specs prevent context pollution (low-value information crowding out what the agent actually needs). Without a spec, the agent fills its context window with assumptions, clarifying questions, and false starts. With a spec, the agent's context is scoped to exactly what it needs.

Specs are context hygiene.

And remember Chapter 5. Specs prevent the Swiss Army Agent anti-pattern. When you define handoffs in the spec, you naturally decompose work into specialist roles instead of dumping everything into one agent.

The vibe coding contrast makes this visceral. Imagine vibe coding that same authentication feature. "Add JWT token refresh." The agent asks eight clarifying questions. You answer five. The agent implements one interpretation. You iterate three times. The final version works, but nobody documented what "works" means. Six months later, a different agent refactors the module and breaks token expiry handling because there was no spec to validate against.

At small scale, vibe coding looks fast. At any real scale, it collapses.

Building Your First Spec-Driven Workflow in a Single Sprint

So where does the discipline come from?

Toyota formalized this kind of thinking decades ago. Their A3 problem-solving method requires teams to define the problem, root cause, and proposed countermeasure on a single page before taking action. It forces clarity before commitment. Spec-driven development applies that same lean discipline to agentic workflows. The difference is not in the thinking but in who executes. Structured specs reduce the interpretation surface so agents can act on them with far less ambiguity than humans navigating a design doc.

The six stages below map directly to what I did with the Investment Risk team, and to how you should approach any agentic project. I call this the Agentic Development Lifecycle (ADLC): Discover, Define, Plan, Build, Deliver, Document. It is one lifecycle replacing the traditional two (the product lifecycle and the software development lifecycle). Six stages, each a verb, because each requires deliberate action rather than passive documentation.

Step 1: Discover (Day 1)

In the Investment Risk story, I interviewed the CRO. I learned the real problem. He didn't need a faster Access database. He needed timely risk data that could survive a hardware failure. The stated problem and the actual problem were different.

In agentic development, the same principle holds. Before prompting, interview your stakeholder. Even if the stakeholder is you. What is the actual pain? What outcome matters?

I have found that agents can already help at this stage. Imagine prompting a PM agent before writing the spec for the Investment Risk project: "The CRO wants monthly risk reports to update daily. Is that the right problem?" A good PM agent pushes back: "Risk of solving the wrong problem: High. Daily updates may not matter if data quality is poor or reports aren't actionable. Recommended: interview the CRO about decision latency, not just data freshness." That single reframe would shift the spec from "make it faster" to "make it useful." Chapter 8 takes the idea of agents as stakeholder proxies much further when we cover Orchestration.

This is NOT user story grooming. In traditional Agile, you groom stories with humans. Here, you interview stakeholders AND agents. The agent brings a different perspective. It catches assumptions you didn't know you were making.

Step 2: Define (Day 1-2)

In the Investment Risk story, the business proposal had clear outcomes. Thirty-day delay reduced to 24 hours. Resilient and backed up. No single point of failure.

I have found that the hardest part of agentic development isn't the technology. It's getting teams to write success criteria before they touch their agent. It feels like overhead until the first time a vague goal sends an agent down a three-hour rabbit hole. After that, everyone writes criteria.

Not paragraphs. Testable statements. "Users can see risk data within 24 hours of market close" beats "improve data freshness." The first is testable. The second is a vibe.
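The first criterion translates directly into a check an agent can run. A minimal sketch, assuming timestamps are timezone-aware datetimes and that "visible" means the time the data first appeared on the dashboard; `is_fresh` and the example market-close time are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_DELAY = timedelta(hours=24)  # "within 24 hours of market close"


def is_fresh(market_close: datetime, data_visible_at: datetime) -> bool:
    """True if data for this close became visible within 24 hours of it."""
    delay = data_visible_at - market_close
    return timedelta(0) <= delay <= MAX_DELAY


close = datetime(2026, 3, 27, 20, 0, tzinfo=timezone.utc)  # illustrative close
print(is_fresh(close, close + timedelta(hours=6)))   # True
print(is_fresh(close, close + timedelta(days=30)))  # False: the old monthly report
```

There is no equivalent function for "improve data freshness," which is the practical difference between a criterion and a vibe.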

The difference between these criteria and traditional acceptance criteria matters. Traditional acceptance criteria describe what a human QA team will check after the build. These success criteria are written so agents can validate autonomously during the build.

Step 3: Plan (Day 2-3)

In the Investment Risk story, I assigned a lead developer, UX designer, and full stack engineer with clear roles. Nobody wondered whose job was whose.

In agentic development, decide which specialists own which pieces. Frontend agent handles UI. Backend agent handles API. Security agent reviews auth.

Chapter 3 showed what happens when skilled teams build without a framework. The Investment Risk story shows what happens when process comes first.

The Governance Triad we introduced in Chapter 2 oversees this step.

PM validates value. "Is this the right problem to solve?"
Architect validates architecture. "Does this fit the system?"
Team Lead validates implementation. "Can we build this in the sprint?"

Each triad member has veto authority within their domain. This is not bureaucracy. It is the separation of concerns applied to process.

I have seen this play out both ways. At one point, an architect vetoed what looked like a simple database change because it would have exposed sensitive data through a new API surface. The PM wanted it shipped by end of sprint. The architect said no. The team added the missing access control and shipped a few days later. But I have also seen a PM override an architect, ship fast, and spend the next sprint patching the security hole. The governance exists because individual pressure to ship will always outweigh collective caution unless you give caution a voice.
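The veto rule itself is simple enough to write down. A minimal sketch: the three roles and their questions come from the triad above, while `triad_decision` is a hypothetical helper. Any single veto blocks the work, which is what gives caution a voice.

```python
TRIAD = {
    "PM": "Is this the right problem to solve?",
    "Architect": "Does this fit the system?",
    "Team Lead": "Can we build this in the sprint?",
}


def triad_decision(votes: dict[str, bool]) -> tuple[bool, list[str]]:
    """Approve only if every triad role approves; also return who vetoed."""
    missing = [role for role in TRIAD if role not in votes]
    if missing:
        raise ValueError(f"no decision yet from: {missing}")
    vetoes = [role for role in TRIAD if not votes[role]]
    return (not vetoes, vetoes)


# The database change from the story: the PM wants it shipped, the Architect vetoes.
print(triad_decision({"PM": True, "Architect": False, "Team Lead": True}))
# (False, ['Architect'])
```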

In traditional sprints, you assign tickets to humans who interpret them. Here, you assign scoped context to specialist agents with defined boundaries. That's the difference.
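Scoping context can be as simple as filtering the spec by owner, so each specialist agent sees only its own slice. A minimal sketch: the items reuse the handoffs from the authentication spec earlier in this chapter, and `context_for` is a hypothetical helper.

```python
SPEC_ITEMS = [
    {"owner": "Backend agent", "item": "Implement token refresh logic"},
    {"owner": "Frontend agent", "item": "Handle re-login UX for expired tokens"},
    {"owner": "Security agent",
     "item": "Review token storage, expiry handling, and secrets exposure"},
]


def context_for(agent: str, items: list[dict]) -> list[str]:
    """Return only the spec items this agent owns: its scoped context."""
    return [entry["item"] for entry in items if entry["owner"] == agent]


print(context_for("Frontend agent", SPEC_ITEMS))
# ['Handle re-login UX for expired tokens']
```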

Step 4: Build (Day 3-8)

Agents execute against the spec, not against a loose prompt. This is where the spec earns its keep. I get it: this feels like overhead during planning. But watch what happens when agents have real constraints to work within.

Consider how an agent reads and acts on a specification:

Spec excerpt:
  Success criterion: Token refresh fails gracefully
  with user-facing error message.
  Constraint: No silent auth failures.

The agent reads this and implements a try/catch with a custom error message, logs the failure for observability, and prevents the silent auth failure that would have gone unnoticed without the spec. Without it, the agent might have implemented a silent retry, technically functional but invisible to the user and impossible to debug in production.
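A minimal sketch of what that implementation might look like in Python. `refresh_session`, `AuthRefreshError`, and the injected `refresh` callable are hypothetical names. The shape is what matters: the failure is logged for operators and surfaced to the user, never swallowed by a silent retry.

```python
import logging

logger = logging.getLogger("auth")


class AuthRefreshError(Exception):
    """Carries a message that is safe to show to the user."""


def refresh_session(refresh, token: str) -> str:
    """Call a token-refresh function without ever failing silently."""
    try:
        return refresh(token)
    except Exception as exc:
        # Observability: log the full traceback for production debugging.
        logger.exception("token refresh failed")
        # Constraint "No silent auth failures": surface a user-facing error
        # instead of quietly retrying.
        raise AuthRefreshError(
            "We couldn't refresh your session. Please sign in again."
        ) from exc


def unavailable_refresh(token: str) -> str:
    raise ConnectionError("auth service unavailable")


try:
    refresh_session(unavailable_refresh, "abc")
except AuthRefreshError as err:
    print(err)  # We couldn't refresh your session. Please sign in again.
```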

And specs aren't just happy-path documents. In a source code migration at a large financial services company, we didn't just define what success looked like. We defined what failure recovery looked like. The rollback section of the spec said: "If >5% of repos fail to migrate: (1) Restore from hidden backup org, (2) Notify stakeholders via automated alert, (3) Revert DNS to old source code repository within 15 minutes." We built recovery before we needed it. That migration hit 100% success: zero lost repositories and zero downtime. We attribute that to the spec making failure recovery a first-class section rather than an afterthought.
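That rollback trigger is precise enough to encode. A minimal sketch of the decision logic only: the threshold and the three recovery steps come from the spec quoted above, while `rollback_plan` is a hypothetical helper. The actual restore, alert, and DNS operations would be separate automation.

```python
ROLLBACK_STEPS = [
    "Restore from hidden backup org",
    "Notify stakeholders via automated alert",
    "Revert DNS to old source code repository within 15 minutes",
]


def rollback_plan(total_repos: int, failed_repos: int) -> list[str]:
    """Return the recovery steps to run, or [] if the migration can proceed."""
    if total_repos <= 0:
        raise ValueError("total_repos must be positive")
    # Trigger from the spec: "If >5% of repos fail to migrate".
    # Integer arithmetic avoids floating-point edge cases at exactly 5%.
    if failed_repos * 100 > 5 * total_repos:
        return list(ROLLBACK_STEPS)
    return []


print(rollback_plan(1000, 50))       # [] (exactly 5% does not exceed the threshold)
print(len(rollback_plan(1000, 51)))  # 3
```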

I'll be honest. Spec-driven development has friction. On fast-moving prototypes where you're still discovering what the product even is, writing a full spec feels premature. But even there, a three-line list of success criteria beats a blank prompt. The discipline scales. The absence of it doesn't.

Step 5: Deliver (Day 8-10)

Walk each success criterion. Did we hit it? Risk data visible within 24 hours? Check. Resilient to single-point failures? Check. Backed up and recoverable? Check.

Failed criteria don't mean start over. They become the input for the next iteration. In my experience, a missed criterion is almost always a specification gap, not a project failure. The spec was incomplete, not the team.
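A minimal sketch of the Deliver walk: each criterion is a named check, and anything that fails is returned as input for the next iteration rather than treated as a verdict on the project. The criterion names come from the Investment Risk spec; the lambdas are placeholders for real checks, with one deliberately failing to show the flow.

```python
from typing import Callable


def deliver(criteria: dict[str, Callable[[], bool]]) -> list[str]:
    """Walk every criterion; return the failed ones as next-iteration input."""
    next_iteration = []
    for name, check in criteria.items():
        passed = check()
        print(f"{'PASS' if passed else 'FAIL'}  {name}")
        if not passed:
            next_iteration.append(name)
    return next_iteration


results = {
    "Risk data visible within 24 hours of market close": lambda: True,
    "Resilient to single-point failures": lambda: True,
    "Backed up and recoverable": lambda: False,  # placeholder failing check
}
print(deliver(results))  # ['Backed up and recoverable']
```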

This is where most teams stop. Chapter 9 shows how to make this systematic with evaluation loops that improve automatically.

Step 6: Document

The lifecycle doesn't end at delivery. What was built, why it was built that way, what alternatives were rejected, what constraints shaped the decisions. All of this is volatile knowledge trapped in a context window that will reset the moment the session ends.

Document is the stage that closes the loop. You serialize what the team learned into persistent, AI-readable artifacts so the next session, the next agent, or the next developer starts where this one left off instead of starting from scratch. Without it, every new session pays a context reconstruction tax, spending its first chunk of time re-discovering decisions that were already made.
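A minimal sketch of serializing a decision into a persistent, machine-readable artifact. The JSON Lines format, the file name, and the `record_decision`/`load_decisions` helpers are assumptions; the fields mirror the questions above: what was built, why, what was rejected, and which constraints shaped it.

```python
import json
import tempfile
from pathlib import Path


def record_decision(path: Path, what: str, why: str,
                    rejected: list[str], constraints: list[str]) -> None:
    """Append one decision record so future sessions can read it back."""
    entry = {"what": what, "why": why,
             "rejected": rejected, "constraints": constraints}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def load_decisions(path: Path) -> list[dict]:
    """What the next session reads first instead of re-discovering decisions."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "decisions.jsonl"
    record_decision(
        log,
        what="Cloud-native backend for risk reporting",
        why="No desktop computer may be a single point of failure",
        rejected=["Faster Access database"],
        constraints=["Azure cloud environment", "IT security review"],
    )
    print(load_decisions(log)[0]["why"])
```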

Chapter 11 goes deep into Knowledge Engineering, the full framework for converting volatile session context into persistent knowledge that compounds across sessions.

Six stages, and every one of them feeds what comes next. Delivery without criteria is guesswork. Criteria without a plan create gaps. Plans without discovery solve the wrong problem. Documentation without delivery captures nothing worth preserving. The ADLC holds together because each stage constrains the next.

The Skill the Other Three Depend On

Chapter 6 called Process Mastery foundational. Here's what that looks like when it's missing.

Without process, orchestration is chaos. You spin up a frontend agent and a backend agent with no defined handoff. Both assume the other handles auth token validation. The result is a security gap that neither agent flagged because neither agent knew it was their job.
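A handoff check in the spec catches this gap before any agent runs. A minimal sketch, assuming the spec lists responsibilities separately from who owns them; the responsibility names and the `unowned` helper are hypothetical. Listing responsibilities independently of owners is what makes the gap visible.

```python
RESPONSIBILITIES = ["render login form", "issue tokens", "validate auth tokens"]

OWNERSHIP = {
    "Frontend agent": ["render login form"],
    "Backend agent": ["issue tokens"],
    # Nobody claimed "validate auth tokens": each agent assumed the other did.
}


def unowned(responsibilities: list[str], ownership: dict[str, list[str]]) -> list[str]:
    """Return responsibilities that no agent owns."""
    claimed = {r for owned in ownership.values() for r in owned}
    return [r for r in responsibilities if r not in claimed]


print(unowned(RESPONSIBILITIES, OWNERSHIP))  # ['validate auth tokens']
```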

Without process, evals have no baseline. You run an eval. Response time is 200ms. Is that success or failure? Without a success criterion defined in the spec, you literally cannot answer.
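A minimal sketch of why the eval needs the spec: a measurement only becomes a verdict when compared with a threshold the spec defined. The 500ms threshold reuses the token refresh criterion from earlier in this chapter; `verdict` is a hypothetical helper.

```python
from typing import Optional


def verdict(measured_ms: float, threshold_ms: Optional[float]) -> str:
    """Turn a latency measurement into a verdict, if the spec allows one."""
    if threshold_ms is None:
        return "undefined: no success criterion in the spec"
    return "pass" if measured_ms <= threshold_ms else "fail"


print(verdict(200, None))  # undefined: no success criterion in the spec
print(verdict(200, 500))   # pass
```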

Without process, TDD has no specification. You write a test for "user login works." But what does "works" mean? Token persistence? Session expiry? Graceful failure on bad credentials? The spec provides the definition that the test encodes.
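Once the spec answers "what does works mean?", each answer becomes a test. A minimal sketch against a hypothetical in-memory `AuthService` (not a real library), with one test per meaning: token persistence (read here as "the token stays valid after login", an assumption), session expiry using the 1-hour expiry from the documentation example, and graceful failure on bad credentials.

```python
import time
from typing import Optional


class AuthService:
    """Hypothetical in-memory auth service, just enough to run the tests."""

    def __init__(self, users: dict[str, str], ttl_seconds: float):
        self.users = users
        self.ttl = ttl_seconds
        self.sessions: dict[str, float] = {}  # token -> expiry timestamp

    def login(self, user: str, password: str) -> str:
        if self.users.get(user) != password:
            # Graceful failure: a clear error, and no session is created.
            raise PermissionError("Invalid username or password")
        token = f"tok-{user}-{len(self.sessions)}"
        self.sessions[token] = time.time() + self.ttl
        return token

    def is_valid(self, token: str, now: Optional[float] = None) -> bool:
        expiry = self.sessions.get(token)
        current = time.time() if now is None else now
        return expiry is not None and current < expiry


def test_token_persists_after_login() -> None:
    auth = AuthService({"ana": "pw"}, ttl_seconds=3600)
    token = auth.login("ana", "pw")
    assert auth.is_valid(token)


def test_session_expires_after_ttl() -> None:
    auth = AuthService({"ana": "pw"}, ttl_seconds=3600)
    token = auth.login("ana", "pw")
    assert not auth.is_valid(token, now=time.time() + 3601)


def test_bad_credentials_fail_gracefully() -> None:
    auth = AuthService({"ana": "pw"}, ttl_seconds=3600)
    try:
        auth.login("ana", "wrong")
    except PermissionError as err:
        assert "Invalid" in str(err)
        assert not auth.sessions  # no session was created
    else:
        raise AssertionError("bad credentials must be rejected")


for test in (test_token_persists_after_login,
             test_session_expires_after_ttl,
             test_bad_credentials_fail_gracefully):
    test()
print("all three login criteria pass")
```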

The best engineering leaders I've worked with share a common trait, the one from Jim Collins that I referenced in Chapter 6: a culture of discipline. Disciplined people, disciplined thought, disciplined action. Process is the first discipline. Without it, disciplined orchestration and disciplined evaluation have nothing to anchor to.

And there's a security dimension. Validation gates in the spec become security checkpoints. "Security agent must review auth token handling before merge" is a gate in the workflow. Not a policy document collecting dust in a shared drive.

What's Next

That CRO never asked me to build a mobile app. He told me about 20 analysts waiting 30 days for stale data compiled on a computer under someone's desk. The process found the product. Not the other way around.

Process defines the spec. But who executes it? Chapter 8 covers Orchestration, and the difference between "agent" and "subagent" isn't semantic. It's structural.

Developers who define success before prompting will outpace those who prompt and pray.