Part 6 of the Agentic-Oriented Development series
The first time I transformed a team's identity, it started with a standard corporate reorg and ended with me transforming my own identity.
I was at a large financial services company and I got called into an office. I was told my DevOps team was being moved from Cloud Platform to Cybersecurity. My task was to transform the DevOps team into a DevSecOps team. Our customers remained the same: 4,000 application developers across the enterprise. The difference now was more scope, more responsibility. We needed to enable developers to deliver code safely to production, not just quickly.
Over the course of years, we focused on making CI/CD secure by default. This meant providing pipelines, tools, and processes that let developers deploy their code while passing all of their security audits. Adding SAST, DAST, IaC code scanning, and SCA was just the beginning. Tools alone do not create cultural change. I worked with our internal training team to build an Application Security curriculum and to require all developers to complete the course. Security was not an afterthought. It became a prerequisite.
Finally, we needed to measure whether we were making a difference. There were many metrics, but driving Critical vulnerability exposure from an average of 120 days to under 30 was the most memorable. Was it perfect? No, but it was a huge improvement at the time. We achieved results not just by adding tools like security scanning, but by adding processes that were frictionless for the developer. We made the right behavior the easy behavior. These changes fundamentally changed how 4,000 engineers integrated security into their DevOps pipelines, not because we gave them new tools, but because we made the process easy (good developers are inherently lazy).
The same transformation is happening now. Prompt engineering got you here. It will not get you there.
That DevSecOps transformation required layers of change that match the five skills I will share here: new workflows, new orchestration, new metrics, new testing approaches, and new ways of preserving what we learned. We documented what configurations worked, why certain thresholds were chosen, and what failures taught us. That institutional knowledge is what let new team members contribute immediately instead of re-learning every lesson. The shift from prompt engineering to agentic engineering requires the same.
The Agentic Maturity Ladder
Not everyone starts at the same place. Before covering the skills, you need to know where you are and where this series takes you.
I think of agentic capability as a ladder with four levels:

Level 1: Vibe Coder. No framework. Hopes the AI figures it out. This is the anti-pattern we introduced in Chapter 1. If you are reading this series, you have already moved past it.
Level 2: Prompt Engineer. Most developers start here when they pick up AI coding tools. They craft prompts carefully, work in single interactions, and iterate manually until something works.
Level 3: Context Engineer. This is where Part I of this series has taken you. You manage context windows as finite resources. You understand memory patterns like context pollution and context exhaustion (Chapter 2). You design context inheritance chains (Chapter 4). You know when to delegate to sub-agents to preserve context isolation.
Level 4: Agentic Engineer. Workflow thinking replaces prompt thinking. You orchestrate specialists instead of overloading one model. You build systematic evaluation loops. You use tests as specification, not verification. Part II takes you from Level 3 to Level 4.
What separates Level 3 from Level 4? Five skills.
How the Skills Build on Each Other
These five skills are not independent. They form a dependency chain:

Process Mastery is foundational. Without defining what success looks like, you cannot orchestrate agents toward a goal or write tests that verify outcomes. Start here. The other four skills depend on it.
Skill 1: Process Mastery
The first skill is the shift from prompt thinking to workflow thinking.
Prompt thinking focuses on single interactions. You craft the perfect prompt. You iterate until it works. You cross your fingers for next time. Every session feels like starting from scratch because, in a sense, it is.
Workflow thinking designs the process before you touch a prompt. You define inputs, outputs, handoffs, and validation points. You know what success looks like before you start.

What workflow thinking looks like in practice:
Consider this task: "Add user authentication to this app." A prompt thinker opens their AI tool and types: "Add user authentication with login and logout." They iterate until something works, then move on.
A workflow thinker starts differently. Before touching the AI, they write:
AUTHENTICATION WORKFLOW SPEC
Inputs:
- Existing app structure (React frontend, Node backend)
- User table schema
Success criteria:
1. Users can register with email/password
2. Users can log in and receive a session token
3. Protected routes reject unauthenticated requests
4. Sessions expire after 24 hours
Validation gates:
- [ ] Registration creates user in database
- [ ] Login returns valid JWT
- [ ] Protected endpoint returns 401 without token
- [ ] Token rejected after 24 hours
Handoffs:
- Frontend agent: Login/register forms
- Backend agent: Auth endpoints and middleware
- Security review: Token handling, password storage
Now each agent knows exactly what to build. Each validation gate can become a test. The workflow thinker spent 10 minutes defining success; the prompt thinker will spend hours debugging ambiguity.
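One way to keep a spec like this from drifting is to hold it as data that agents and test runners can both read. The sketch below is only an illustration under assumptions: the class names `WorkflowSpec` and `ValidationGate` and their methods are hypothetical, not part of any framework discussed in this series.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ValidationGate:
    """One checkbox from the spec; flips to passed when its test goes green."""
    description: str
    passed: bool = False


@dataclass
class WorkflowSpec:
    """Hypothetical container mirroring the sections of the written spec."""
    name: str
    inputs: list[str]
    success_criteria: list[str]
    gates: list[ValidationGate] = field(default_factory=list)
    handoffs: dict[str, str] = field(default_factory=dict)

    def open_gates(self) -> list[str]:
        """Descriptions of gates that have not passed yet."""
        return [g.description for g in self.gates if not g.passed]

    def is_done(self) -> bool:
        """Done only if gates exist and every one of them has passed."""
        return bool(self.gates) and not self.open_gates()


auth_spec = WorkflowSpec(
    name="Authentication",
    inputs=[
        "Existing app structure (React frontend, Node backend)",
        "User table schema",
    ],
    success_criteria=[
        "Users can register with email/password",
        "Users can log in and receive a session token",
        "Protected routes reject unauthenticated requests",
        "Sessions expire after 24 hours",
    ],
    gates=[
        ValidationGate("Registration creates user in database"),
        ValidationGate("Login returns valid JWT"),
        ValidationGate("Protected endpoint returns 401 without token"),
        ValidationGate("Token rejected after 24 hours"),
    ],
    handoffs={
        "Frontend agent": "Login/register forms",
        "Backend agent": "Auth endpoints and middleware",
        "Security review": "Token handling, password storage",
    },
)
```

Calling `auth_spec.open_gates()` at any point lists what still has to be proven, which is the question a prompt thinker usually cannot answer.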
In my DevSecOps transformation, we did not just add security scanning. We created a process where every pull request triggers SAST, every dependency change triggers SCA, every deployment triggers DAST. The process ran without developer intervention. That is workflow thinking applied to security.
The same principle applies to agentic development. Spec-driven development requires you to define before you implement. The Governance Triad we introduced in Chapter 3 (PM Agent, Architect Agent, Team Lead Agent) is process governance. The PM validates value. The Architect validates architecture. The Team Lead validates implementation. Process is not wasted time or tokens. It is how you scale from prompting to orchestrating.
Security through process, not audits. In DevSecOps, we did not just review code for vulnerabilities after release. We built security into the development process. Agentic workflows need the same. Security is part of the orchestration, not something to be run at the end.
Skill 2: Orchestration
The second skill is moving from one model doing everything to coordinating specialists.
Prompt engineers talk to one model with one prompt. They try to make the model do everything. This is the Swiss Army Agent anti-pattern we identified in Chapter 5 (overloading one agent with too many responsibilities until it performs none of them well). You end up with an overloaded context window, conflicting optimization goals, and degraded quality on all tasks.
Agentic engineers think like conductors, not performers. They coordinate specialists. They know when to delegate and when to handle something directly. The four factors from Chapter 5 tell you when to specialize: distinct expertise needed, different context required, conflicting optimization goals, or high-frequency tasks.

From my DevSecOps experience, we did not ask one tool to do everything. SAST handled static analysis. SCA handled dependencies. DAST handled runtime testing. Each specialized. Each reported to a unified dashboard. Orchestration of the best tool for the job.
I have found that orchestration beats gigantic all-encompassing prompts every time. Workflow orchestration is cheaper, faster, and more reliable than pure AI autonomy. At the end of the day, nobody cares if it is an AI agent or a simple script, as long as it works.
The Agent Composition Model
Understanding orchestration requires understanding what you are orchestrating. An agent system is not a single thing. It is a composition of three distinct layers:

Instructions define what the agent should do. System prompts, task definitions, behavioral rules. This is the layer most prompt engineers focus on.
Guardrails determine what the agent is allowed to do. Pre-action checks, permission gates, post-action validation. Here is the critical insight: guardrails are executable code, not prompts. You cannot prompt your way to security. You enforce it through code that runs before and after every agent action.
Tools define what the agent can do. Function calling, API integrations, external systems. This is the capability layer. When designing tools, the VOICE principles from Chapter 5 still apply: Visible, Outcome-oriented, Isolated, Composable, Error-aware.
The execution boundary separates reasoning from side effects. Everything above it is reversible. Everything below it affects the real world.
Orchestration design IS security design. The orchestrator is your security control plane. Each agent handoff is a trust boundary. Each agent gets scoped permissions based on its role. Planner agents do not need execution rights. Executor agents do not need access to secrets. Reviewer agents do not need write access. Context scoping matters too: sensitive data should not flow to agents that do not need it.
In prompt engineering, security is reactive. You review what the model produced. In agentic engineering, security must be proactive. You design the boundaries before agents run. The composition model makes this concrete. Instructions tell the agent what it should do. Guardrails enforce what it is allowed to do. Tools define what it can do. Security lives in the guardrails layer.
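To make "guardrails are executable code" concrete, here is a minimal sketch of a pre-action check. Everything in it is an assumption made for illustration: the role names, the permission strings, and the `check_action` and `run_with_guardrail` helpers are hypothetical, and a real system would add post-action validation and audit logging on top.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical scoping that follows the roles described above:
# planners cannot execute, executors cannot read secrets,
# reviewers cannot write.
ROLE_PERMISSIONS = {
    "planner": frozenset({"read"}),
    "executor": frozenset({"read", "write", "execute"}),
    "reviewer": frozenset({"read"}),
}


class GuardrailViolation(Exception):
    """Raised when an agent attempts an action outside its role's scope."""


@dataclass(frozen=True)
class Action:
    role: str        # which agent is asking
    permission: str  # e.g. "read", "write", "execute", "read_secret"
    target: str      # what the action would touch


def check_action(action: Action) -> None:
    """Pre-action check: deny by default, allow only what the role lists."""
    allowed = ROLE_PERMISSIONS.get(action.role, frozenset())
    if action.permission not in allowed:
        raise GuardrailViolation(
            f"{action.role!r} may not {action.permission!r} {action.target!r}"
        )


def run_with_guardrail(action: Action, tool: Callable[[str], str]) -> str:
    """Run the tool only after the guardrail has approved the action."""
    check_action(action)
    return tool(action.target)
```

The check runs in ordinary code before the tool is called, so no wording in a prompt can talk an agent past it. Unknown roles get an empty permission set, which keeps the default closed.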

In DevSecOps, we learned that security is not a phase. It is a property of every phase. The orchestrator determines what each agent can see, do, and affect. Get that wrong, and no amount of post-hoc review will save you.
Skill 3: Evals and Loops
The third skill is moving from subjective "Did it work?" to systematic measurement.
Prompt engineers evaluate results subjectively. They run something, eyeball the output, decide if it is good enough. No memory of what failed. No data on patterns.
Agentic engineers build measurement into the system. They evaluate quality, debug issues, change behavior, and repeat. This is the AI Evals Flywheel:

From my DevSecOps work, we tracked one metric religiously. Average Critical vulnerability exposure. We started at 120 days. By the end of the year, we got it under 30. We did not hope it would improve. We measured it, debugged the bottlenecks, changed the process, and measured again. That is the flywheel.
What an eval actually looks like:
Evals do not have to be complex. Start simple. Here is a concrete example for an agent that generates API endpoints:
EVAL: API Generation Quality
Metric: Percentage of generated endpoints that pass validation
Test cases (run after each generation):
1. Does the endpoint return valid JSON?
2. Does error handling return appropriate status codes?
3. Are required fields validated before processing?
4. Does the endpoint respect rate limiting?
Logging (capture for each run):
- Timestamp
- Prompt used
- Pass/fail for each test case
- Time to generate
- Audit trail (in regulated industries, every agent action needs an immutable audit trail, not just a log file)
Weekly review:
- Which test cases fail most often? (Pattern detection)
- What prompts correlate with failures? (Root cause)
- What changed in prompts that improved results? (Learning)
After two weeks, you might discover that 60% of failures come from missing input validation. That pattern is invisible without systematic tracking. Now you know to add "validate all input fields" to your prompt template, or to create a dedicated validation agent.
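The weekly review can start as a few lines of code over the run log. The sketch below assumes a hypothetical log shape (one dict per generation run, with a pass/fail flag per test case). The sample entries are made up purely to show the mechanics and are not real results.

```python
from collections import Counter

# Made-up sample log entries for illustration only.
runs = [
    {"prompt": "v1", "results": {"valid_json": True, "status_codes": True,
                                 "input_validation": False, "rate_limiting": True}},
    {"prompt": "v1", "results": {"valid_json": True, "status_codes": True,
                                 "input_validation": True, "rate_limiting": True}},
    {"prompt": "v2", "results": {"valid_json": True, "status_codes": False,
                                 "input_validation": False, "rate_limiting": True}},
    {"prompt": "v2", "results": {"valid_json": True, "status_codes": True,
                                 "input_validation": True, "rate_limiting": True}},
]


def pass_rate(runs: list) -> float:
    """Fraction of runs in which every test case passed (the headline metric)."""
    if not runs:
        return 0.0
    fully_passing = sum(1 for run in runs if all(run["results"].values()))
    return fully_passing / len(runs)


def failure_counts(runs: list) -> Counter:
    """How often each test case failed, for pattern detection."""
    counts = Counter()
    for run in runs:
        for case, passed in run["results"].items():
            if not passed:
                counts[case] += 1
    return counts
```

Sorting `failure_counts(runs)` with `most_common()` answers the first review question directly. Grouping the same counts by the `prompt` field would be the natural next step for the root-cause question.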
Three levels of evaluation structure this work:
Level 1: Unit tests and assertions. Fast, cheap, runs on every change.
Level 2: Model and human evaluation. Requires logging traces. Use binary ratings (good/bad) over complex scales. One case study found three issues accounted for 60% of all failures. You cannot find that pattern without systematic data.
Level 3: A/B testing. For mature products in production. Standard validation after significant changes.
The bottom-up approach matters. Do not start with generic metrics. Examine actual failures. Let metrics emerge from real problems.
Skill 4: Test-Driven Development
The fourth skill is moving from manual verification to tests as specification.
Prompt engineers verify manually. They generate code, check if it works, copy-paste into a test file, run it once, ship it. Every verification is a one-time event that leaves no trace.
Agentic engineers write tests first. Tests define the contract. Agents generate code against your tests as specification, not based on what looks right. The verification is automatic, repeatable, and cumulative.

Why does TDD matter more with agents? Three reasons.
First, agents generate code faster than you can review it. Without automated tests, you are trusting without verifying. Tests are your safety net.
Second, tests become your specification language. When you write a test before prompting, you are telling the agent exactly what success looks like. The test is unambiguous in a way that natural language prompts never are.
Third, tests accumulate institutional knowledge. Every edge case you catch becomes a permanent part of your test suite. The system learns even when context windows reset.
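As a small sketch of tests as specification, the tests below are written first and pin down the "sessions expire after 24 hours" criterion from the earlier authentication spec. The function name `is_session_valid` and the decision to reject a session at exactly 24 hours are assumptions chosen for illustration. The implementation at the bottom stands in for what an agent would be asked to generate against the tests.

```python
from datetime import datetime, timedelta, timezone

SESSION_LIFETIME = timedelta(hours=24)
ISSUED = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)


# Step 1: the tests, written before any implementation exists.
def test_session_accepted_just_before_expiry():
    now = ISSUED + timedelta(hours=23, minutes=59)
    assert is_session_valid(ISSUED, now)


def test_session_rejected_at_24_hours():
    # Assumption: the boundary itself counts as expired.
    assert not is_session_valid(ISSUED, ISSUED + SESSION_LIFETIME)


def test_session_issued_in_the_future_is_rejected():
    assert not is_session_valid(ISSUED, ISSUED - timedelta(seconds=1))


# Step 2: the implementation the agent produces to satisfy the tests.
def is_session_valid(issued_at: datetime, now: datetime) -> bool:
    """A session is valid from issue time up to, but not including, 24 hours."""
    age = now - issued_at
    return timedelta(0) <= age < SESSION_LIFETIME
```

The boundary test is the point of the exercise. "Expire after 24 hours" leaves the exact cutoff open, and the test closes that gap before any code is generated.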
In DevSecOps, every pull request triggered automated security scans. If SAST found a vulnerability, the build failed. No exceptions. The same principle applies to agent-generated code. If tests fail, the code does not merge. Agents do not get special treatment.
Security tests as specification, not policy documents. "Agent must not access files outside project directory" becomes a test, not a hope. The shift is from "trust the agent followed the rules" to "verify the agent followed the rules." This is the same shift we made in DevSecOps when we stopped relying on developer training and started enforcing security through automated gates.
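The file-access rule can be written as a check plus tests in a few lines. This is a sketch under assumptions: `is_inside_project` is a hypothetical helper, and it only decides whether a requested path stays inside the project root. Wiring it into an agent's file tool is left out.

```python
from pathlib import Path


def is_inside_project(project_root: Path, requested: str) -> bool:
    """True only if the requested path resolves to a location under the root.

    Resolving first collapses ".." segments and follows existing symlinks,
    so traversal tricks are judged by where they actually land.
    """
    root = project_root.resolve()
    target = (root / requested).resolve()
    return target == root or root in target.parents


# The rule as tests. The root path is a hypothetical example location.
ROOT = Path("/tmp/example-project")


def test_file_inside_project_is_allowed():
    assert is_inside_project(ROOT, "src/app.py")


def test_parent_traversal_is_rejected():
    assert not is_inside_project(ROOT, "../secrets.txt")


def test_absolute_path_outside_project_is_rejected():
    assert not is_inside_project(ROOT, "/etc/passwd")
```

Once these tests sit in the suite, the rule is enforced on every run. It no longer depends on the agent remembering an instruction.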
Skill 5: Knowledge Engineering
The fifth skill is moving from "code and forget" to "code and persist."
Prompt engineers produce code. When the session ends, the context vanishes. The next session starts blind. The agent does not know what was built yesterday, what decisions were made, or what traps to avoid. Every new conversation is a new hire with no onboarding.
Agentic engineers produce code AND knowledge artifacts. They capture not just what was built, but why it was built that way. They convert volatile session context into persistent, AI-readable knowledge that future agents and humans can consume.
I think of it as two distinct things. Context is what the agent knows right now. It is volatile, session-scoped, and dies when the window resets. Knowledge is what the codebase knows permanently. It is persistent, repo-scoped, and survives across sessions, agents, and humans. The gap between them is where institutional understanding goes to die.
The OOP parallel is object serialization. When you serialize an object, you persist its state beyond the process lifetime. When you deserialize it, you rehydrate that state in a new process. Knowledge Engineering does the same thing for agent understanding. You serialize decisions, rationale, and architectural context into repo artifacts (specs, ADRs, CLAUDE.md files). The next agent session deserializes those artifacts back into working context. I call this The Knowledge Loop: Build, Serialize, Persist, Deserialize, Build. It is the feedback mechanism that makes the ADLC circular rather than linear.
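The serialize and deserialize halves of the loop can be sketched in a few lines. The `Decision` record, the JSON file format, and the helper names are all assumptions for illustration. In practice the artifacts would more likely be specs, ADRs, or CLAUDE.md files, as described above.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Decision:
    """Hypothetical record of one decision and the reason behind it."""
    title: str
    rationale: str


def serialize(decisions: list, path: Path) -> None:
    """Persist decisions to a repo artifact at the end of a session."""
    payload = [asdict(d) for d in decisions]
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def deserialize(path: Path) -> list:
    """Rehydrate decisions at the start of the next session."""
    payload = json.loads(path.read_text(encoding="utf-8"))
    return [Decision(**item) for item in payload]


def to_context(decisions: list) -> str:
    """Render decisions as plain text an agent can take as starting context."""
    return "\n".join(f"- {d.title}: {d.rationale}" for d in decisions)


decisions = [
    Decision("SAST thresholds are lower for legacy apps", "technical debt"),
    Decision("Certain SCA exceptions exist", "vendor lock-in"),
]

# A temporary directory stands in for the repository here.
with tempfile.TemporaryDirectory() as tmp:
    knowledge_file = Path(tmp) / "decisions.json"  # hypothetical file name
    serialize(decisions, knowledge_file)
    restored = deserialize(knowledge_file)
```

The format matters less than the round trip. What one session writes down, the next session can load before it does anything else.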
Skip this step and you pay the context reconstruction tax. That is the cost of every new session spending its first 15 minutes figuring out what already exists, why certain patterns were chosen, and what constraints apply. Multiply that across a team of developers, each running multiple agent sessions per day, and you are burning hours reconstructing understanding that should have been written down once.
From my DevSecOps transformation, we did not just build pipelines and run scans. We documented what configurations worked, why certain thresholds were chosen, and what failures taught us. When a new engineer joined the team, they did not have to rediscover that our SAST thresholds were set lower for legacy apps because of technical debt, or that certain SCA exceptions existed because of vendor lock-in. That knowledge was written down where people could find it. The same applies to agents. Without knowledge persistence, every new session is a new hire with no onboarding materials.
The Identity Shift
These five skills reinforce each other. Process enables orchestration by defining what each agent should do. Orchestration generates data by producing traceable outputs. Evals improve process by revealing what works. Tests validate output by catching what does not. Knowledge closes the loop by persisting what was learned for the next cycle.
The best engineering leaders I have worked with share a common trait. A culture of discipline. Disciplined people, disciplined thought, disciplined action. The same framework applies to agentic engineering. Disciplined process. Choose the right approach before you start. Disciplined orchestration. Coordinate specialists with clear boundaries. Disciplined evaluation. Measure, debug, improve, repeat.
Common transition mistakes:
- Jumping to orchestration before defining process. This is the most common mistake. You spin up multiple agents with no defined success criteria, and they coordinate toward nothing. Always start with Process Mastery.
- Starting with tools before process puts the cart before the horse.
- Orchestrating without measuring is flying blind.
- Evaluating without changing is analysis paralysis.
- Testing after instead of before is reactive, not proactive.
- Building without persisting knowledge means every session starts from zero.
In DevSecOps, the hardest lesson was that security could not be bolted on. It had to be designed in. The same is true for agentic systems. Security is not a sixth skill. It is a thread through all five.
Skill Proficiency Markers
How do you know when you have internalized each skill? Here are the markers I look for:

You do not need to master all five simultaneously. Process Mastery comes first. Once you consistently define success before starting, the other four skills accelerate naturally.
What's Next
Prompt engineering is not going away. You will still need to craft good prompts. But crafting prompts without process, orchestration, evals, tests, and knowledge persistence is like writing code without version control, CI/CD, or deployment pipelines. You can do it. You just cannot scale it.
Part I gave you the principles: encapsulation, abstraction, inheritance, polymorphism. Part II gives you the practices. These five skills are the bridge.
Chapter 7 goes deep into Process Mastery: how to build your first spec-driven workflow in a single sprint. We will explore why specifications that agents can execute beat documentation that humans interpret. And Chapter 11 takes Knowledge Engineering further, introducing the full framework for converting volatile context into persistent, AI-readable knowledge that compounds across sessions.
The DevSecOps transformation took a year because we started late. Organizations that wait on agentic engineering will spend twice as long catching up. Part II gives you the tools to start now.