Lead Time to Value.
Measuring the full pipeline in the agentic era.
DORA, Flow Time, and DX dashboards all stop at different clocks. This talk is about what's missing in the middle. Measuring it just got more important.
"How fast does Monday's decision become value our customer sees?"
- Deploy frequency
- Cycle time on a Jira ticket
- PR review latency
- Story points per sprint
- How long from decision to live in front of users?
- Where does that time actually go?
- Are agents making it faster, or just making the typing faster?
Every dashboard you've bought almost answers this question. None of them measures the whole thing.
Four "lead times". They don't measure the same thing.
Every framework is right about its own clock. None of them measures the whole pipeline. Each row below is a canonical definition; the bar shows the segment of the pipeline it actually covers.
Reinertsen named it before software did: "fuzziness consumes time." Software adopted Lean and DevOps but largely skipped instrumenting the front end of its own value stream.
Lead Time to Value, defined.
"The elapsed time from when work is requested until it delivers measurable value to customers or the business."
Lead Time to Ship
Decision made → change running in production. Bounded by events that can be timestamped if we choose to instrument them.
Lead Time to Realize
In production → outcome signal (revenue, retention, NPS delta). Lagging, noisy, confounded. A different measurement problem.
"Every day a feature waits to deliver value is a day your organisation pays for work it hasn't yet benefited from." — Farr, 2025. The bottleneck moved into Lead Time to Ship, which is why this deck focuses there. Lead Time to Realize matters too, but it can't be the only clock you steer by.
DORA's clock starts at git commit. By design, not by mistake.
Intent Lead Time: the time from a product decision being made to the first commit implementing it. The pre-commit half of Lead Time to Ship.
"The elapsed time from a code change being committed to the same change running in production." — dora.dev
Two collapses. One hold-out.
The gap that didn't matter to measure is now the gap that matters most.
Yesterday
Today
- DORA 2025: "AI accelerates software development, but that acceleration can expose weaknesses downstream… an increase in change volume leads to instability."
- InfoQ / Agoda 2026: "Human authority is migrating upward in the abstraction stack — from writing code to defining and governing intent."
Where the time actually lives.
Lead Time to Ship = Intent Lead Time + DORA Lead Time for Changes. They dovetail at the commit. No overlap. No gap.
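The split at the commit boundary is simple arithmetic once the three timestamps exist. A minimal sketch (the dates are invented for illustration):

```python
from datetime import datetime

def lead_time_to_ship(decision: datetime, first_commit: datetime,
                      deployed: datetime) -> dict:
    """Split Lead Time to Ship at the commit boundary.

    ILT and DORA Lead Time for Changes dovetail at first_commit:
    no overlap, no gap, so the two segments sum to the whole.
    """
    ilt = first_commit - decision    # Intent Lead Time (pre-commit)
    dora = deployed - first_commit   # DORA Lead Time for Changes
    return {"ilt": ilt, "dora": dora, "ship": ilt + dora}

# Decided Monday 09:00, first commit Wednesday, deployed Friday.
t = lead_time_to_ship(datetime(2025, 6, 2, 9),
                      datetime(2025, 6, 4, 9),
                      datetime(2025, 6, 6, 9))
```

The identity `ship == ilt + dora` is the whole point of the dovetail: you can adopt ILT without touching your existing DORA instrumentation.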
Reused from the Intent Lead Time guide. The fade across the ILT segments isn't decorative. It tracks how much each sub-latency still matters in an intent-driven workflow (slide 8).
Naming the four sub-latencies of ILT.
| Sub-metric | Clock | What slows it | Survives an agent flow? |
|---|---|---|---|
| Capture | decision made → recorded | "we'll document it later" | the only one |
| Sequencing | recorded → ticket | triage queue, sprint cadence | collapses to zero |
| Pickup | ticket → assigned | backlog prioritization | collapses to zero |
| Activation | assigned → first commit | spec ambiguity | collapses to zero |
Tickets are a project-management technology designed for queueing work to humans. When the picker is an agent, the spec is the assignment. Sequencing, pickup, and activation are scaffolding from the assembly-line era. In a truly intent-driven workflow, three of the four collapse to zero. Only Capture remains.
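Each sub-latency is the delta between two adjacent boundary timestamps. A sketch with an invented event log for one feature:

```python
from datetime import datetime

# Hypothetical event log: one timestamp per ILT boundary.
events = {
    "decided":      datetime(2025, 3, 3, 10),   # decision made in a meeting
    "recorded":     datetime(2025, 3, 5, 16),   # written into a PRD
    "ticketed":     datetime(2025, 3, 10, 9),   # tracked work item created
    "assigned":     datetime(2025, 3, 12, 9),   # picked up in planning
    "first_commit": datetime(2025, 3, 13, 14),  # implementation starts
}

# The four sub-latencies: each is the gap between adjacent boundaries,
# so they sum exactly to ILT (first_commit - decided).
sub_latencies = {
    "capture":    events["recorded"] - events["decided"],
    "sequencing": events["ticketed"] - events["recorded"],
    "pickup":     events["assigned"] - events["ticketed"],
    "activation": events["first_commit"] - events["assigned"],
}

# In an intent-driven flow the lower three boundaries coincide
# (recorded == ticketed == assigned == first_commit), so only
# capture stays nonzero.
```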
Where teams currently land.
First-principles estimates, not benchmark data. No tool currently measures this.
Reused from the Intent Lead Time guide. Caveat the guide already names: "These bands are first-principles estimates, not benchmarked data. No tool currently measures ILT; these numbers are starting points for the community to refute, refine, or replace." That caveat is the deck (slide 10). But you don't need a tool to find your band — hand-sample 3 features and you have a baseline (slide 14).
Why this is a new hard measurement problem.
DORA was easy because Git is an event log. ILT is hard because product decisions aren't.
No native event
Commits and deploys have timestamps. "We decided" doesn't.
No schema
Decisions live in Slack threads, transcripts, doc revisions, head-nods. None of those is structured data.
No agreed origin
Was the decision made when it was spoken, agreed, or captured? Three different timestamps.
Fuzzy causality
Mapping a deployed line of code back to the decision that birthed it requires intent tracing the toolchain wasn't built for.
Pre-agents, this didn't matter. Implementation was the long pole and dominated the equation. Post-agents, implementation collapsed, and these four obstacles became the dominant terms. The measurement problem became newly urgent at the exact moment it became newly hard. Hard ≠ helpless: slide 14 names what to do this week without any of the missing primitives.
What today's dashboards actually measure.
Every vendor positions its data sources as "the value stream." Most measure post-Jira time. The pre-Jira window remains uninstrumented.
| Vendor / framework | Clock starts | Clock stops | Sees decision time? | Category |
|---|---|---|---|---|
| Planview / Tasktop · Flow Time | value-stream entry (first active state) | delivered to customer | partial | VSM platform |
| GitLab · Value Stream Analytics | issue created | merged | no | VSM (devops-side) |
| Jellyfish, LinearB, Faros, DX, Swarmia | ticket "in progress" or first commit | merged / deployed | no | SEI (engineering intelligence) |
| DORA | commit | production | no | delivery performance |
| What's missing | decision made | first commit (via captured → ticketed) | unmeasured | no category yet |
Forrester named Planview a Leader in the VSM Wave Q2 2025. Gartner now publishes a separate Software Engineering Intelligence (SEI) category. The bifurcation matters: even the VSM tools that span the longest window can't see the time before a request becomes a tracked work item.
The pushback, steelmanned.
"Value is unmeasurable. Outcomes are lagging and noisy."
True for outcome value. Not true for ship value. The honest move is to call this what it is: a flow-time proxy from decision to running in production, not a value-realization metric. Ship value is measurable. Outcome value is for a different deck and a different time horizon.
"DORA is sufficient. We don't need more dashboards."
DORA's strength is its narrowness. Four numbers, every team can compute them, a decade of benchmarks. The fix isn't ten new dashboards. It's one companion metric on the side DORA was never built to see.
"VSM is vendor-speak. It's just Jira rollups in Lean clothing."
Often, yes. The Flow Framework is genuinely useful; most VSM dashboards are repackaged ALM. Argue the metric on its merits, not its label. Pre-Jira measurement is missing whatever vocabulary you choose.
"AI makes us faster. Why measure?"
METR (July 2025) RCT: experienced developers were ~19% slower with AI than without, while believing they were ~24% faster. Velocity intuition is unreliable. Measure or kid yourself.
The measurement that doesn't exist yet.
Slide 11 showed what today's tools do measure. This slide names what they'd have to start collecting to close the gap. Four pieces of missing schema:
Decision events
A first-class timestamp for "we decided X", with provenance: who decided, where, what was decided. Closest analog today: design-doc creation time, which conflates decision with documentation.
Capture latency
The delta between t(decision) and t(decision-recorded-as-machine-readable). Currently inferred by hand from meeting transcripts and doc revision history.
Intent → commit linkage
A traceable edge from a deployed line of code back to the decision that birthed it. Today's "branch name in PR title" convention is the loosest possible version of this.
Decision-graph topology
Decisions don't fire and forget. They amend, supersede, or branch each other. No mainstream PM tool models this as a graph.
This is the measurement frontier. Naming the missing schema is the precondition for anyone (vendor, framework author, or in-house team) to measure Lead Time to Value end-to-end.
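What would the missing schema look like? A hypothetical sketch, not any vendor's API: every field name here is an assumption, chosen to make the four primitives concrete.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class DecisionEvent:
    """A first-class 'we decided X' record with provenance."""
    decision_id: str
    decided_at: datetime                 # missing primitive 1: decision event
    recorded_at: datetime                # when it became machine-readable
    decided_by: list[str]                # provenance: who
    source: str                          # provenance: where (meeting, doc, thread)
    summary: str                         # what was decided
    # Missing primitive 4: decision-graph topology. Decisions amend,
    # supersede, or branch each other; modeled here as edge lists.
    amends: list[str] = field(default_factory=list)
    supersedes: list[str] = field(default_factory=list)

    @property
    def capture_latency(self) -> timedelta:
        # Missing primitive 2: the delta between decision and record.
        return self.recorded_at - self.decided_at

@dataclass
class IntentEdge:
    """Missing primitive 3: a traceable intent → commit linkage."""
    decision_id: str
    commit_sha: str
```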
You don't need the missing schema to start.
Three stages. None require a vendor. Each one is finished before the next becomes worth doing.
Crawl
- Add a Decided on: field to your PRD / RFC template. Doc revision history becomes your decision log.
- Hand-sample 3 features. Compute t(first commit) − t(first PRD draft). That's your baseline.
Cost: a meeting and an afternoon. Output: a number you didn't have last week.
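The hand-sampled baseline is a few lines once you have the dates. A sketch with invented samples; the median keeps one outlier from dominating an n=3 sample:

```python
from datetime import date
from statistics import median

# Hand-sampled (hypothetical): (first PRD draft, first commit) per feature.
samples = [
    (date(2025, 1, 6),  date(2025, 1, 20)),  # feature A
    (date(2025, 2, 3),  date(2025, 2, 12)),  # feature B
    (date(2025, 3, 10), date(2025, 4, 1)),   # feature C
]

# Baseline pre-commit lead time, in days.
gaps_days = [(commit - draft).days for draft, commit in samples]
baseline = median(gaps_days)
```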
Walk
- Pilot one team. Greenfield beats brownfield. Don't roll out org-wide.
- Encode the spec doc ID in the branch or PR title. Loose, but it's your intent → commit edge.
- Reorganize the review gates (slide 15).
Cost: workflow change on one team. Output: Capture and linkage become observable.
Run
- Treat the spec as the assignment. Sequencing, Pickup, Activation collapse to zero (slide 8).
- Capture latency becomes your one ILT number. Pair with DORA at the commit boundary.
- Watch the delegation envelope — agents went from 30-second nudges to multi-hour tasks in a year. Expansion is the leading indicator.
Cost: real workflow rework. Output: Lead Time to Ship as a single number.
None of these steps requires the four missing primitives from slide 13. They just require an honest baseline. Crawl with what you have; refine the schema once the workflow change makes the gap obvious.
Reorganize the gates, not just the dashboard.
ILT's lower three sub-latencies don't shrink because you measured them. They shrink because the workflow that produced them disappeared. Decide which gate is human and which is agent — the queue collapses on its own.
Mechanical, structural, repetitive
- Unit test authoring & maintenance
- Structural code review — style, patterns, type fit
- Security scans, dependency hygiene
- PR mechanics: drafts, descriptions, follow-ups
These collapse to minutes. They're the reason Implementation went from days to hours on slide 6.
Intent, judgment, irreversible decisions
- Intent & spec authorship — the source of truth
- Integration, smoke, and manual validation
- Architectural decisions with cross-system blast radius
- Release decisions: what ships, when, to whom
"You can no longer review code line by line." Human authority migrates upward — to the spec.
Tools roll out org-wide. Workflows pilot one team at a time, ideally greenfield. Top-down workflow mandates without controlled pilots produce enablement burden, not adoption.
Practitioner field notes: gregce.github.io/ai-product-development/blackbaud.html · "Most of their time writing tickets and reviewing the output of agents. Very little time on the code itself."
Definitions cheat sheet.
Every metric named in this deck, in one place.
- Lead Time to Value
- "The elapsed time from when work is requested until it delivers measurable value to customers or the business" (Farr, 2025). Five stages (Discovery, Development, Deployment, Adoption, Value Realisation) that this deck collapses into two halves: Lead Time to Ship and Lead Time to Realize.
- Lead Time to Ship
- Decision made → change running in production. Bounded by events that can be timestamped if instrumented. The focus of this deck.
- Lead Time to Realize
- In production → outcome signal (revenue, retention, NPS delta). Lagging, noisy, confounded; useful as a separate clock.
- Flow Time (Kersten / Tasktop)
- "From the point that work is accepted into the value stream — first active state — to when it's available to the customer." Closest commercial analog to Lead Time to Ship.
- DORA Lead Time for Changes
- "From a code change being committed to the repository to the same change running in production." Clock starts at git commit.
- Cycle Time (Jira / SEI tools)
- Typically: ticket "in progress" → merged. A subset of Lead Time to Ship that excludes capture, sequencing, and pickup.
- Intent Lead Time (ILT)
- t(first commit) − t(product decision captured). The pre-commit half of Lead Time to Ship. Dovetails with DORA at the commit boundary.
- Capture latency
- ILT sub-component: decision made → decision recorded as artifact. The only sub-latency that survives an intent-driven workflow.
- Sequencing latency
- ILT sub-component: artifact recorded → ticket created. Vestigial in agent-driven flows.
- Pickup latency
- ILT sub-component: ticket created → assigned. Vestigial in agent-driven flows.
- Activation latency
- ILT sub-component: assigned → first commit. Vestigial in agent-driven flows.
- Fuzzy front end (Reinertsen / Cooper)
- Pre-software lineage for the same idea: the period from opportunity identification to project commitment. "Fuzziness consumes time."
Receipts.
Every claim in the deck traces to one of these. Vendor sources flagged.
Lead Time to Value · canonical
- Farr · Lead Time to Value: The Metric That Actually Matters (Oct 2025) — verbatim definition and 5-stage breakdown used on slide 4
- Flow Time vs. Lead Time — Tasktop / Planview
- Planview · Flow Framework
- Mik Kersten, Project to Product (2018)
- Forrester Wave: VSM Solutions, Q2 2025
DORA
Agents shifted the bottleneck (2025–2026)
Counter-evidence (non-vendor)
Fuzzy front end · pre-software
- Smith & Reinertsen, Developing Products in Half the Time (1991)
- Reinertsen, Principles of Product Development Flow
- Black Swan Farming · Fuzzy Front End
- MIT Sloan · Integrating the Fuzzy Front End
- Robert G. Cooper · Stage-Gate framework