Lead Time to Value.
Measuring the full pipeline in the agentic era.
DORA, Flow Time, and DX dashboards all stop at different clocks. This talk is about what's missing in the middle. Measuring it just got more important.
"How fast does Monday's decision become value our customer sees?"
- Deploy frequency
- Cycle time on a Jira ticket
- PR review latency
- Story points per sprint
- How long from decision to live in front of users?
- Where does that time actually go?
- Are agents making it faster, or just making the typing faster?
Every dashboard you've bought almost answers this question. None of them measures the whole thing.
Four "lead times". They don't measure the same thing.
Every framework is right about its own clock. None of them measures the whole pipeline. Each row below is a canonical definition; the bar shows the segment of the pipeline it actually covers.
Reinertsen named it before software did: "fuzziness consumes time." Software adopted Lean and DevOps but largely skipped instrumenting the front end of its own value stream.
Lead Time to Value, defined.
"The elapsed time from when work is requested until it delivers measurable value to customers or the business."
Lead Time to Ship
Decision made → change running in production. Bounded by events that can be timestamped if we choose to instrument them.
Lead Time to Realize
In production → outcome signal (revenue, retention, NPS delta). Lagging, noisy, confounded. A different measurement problem.
"Every day a feature waits to deliver value is a day your organisation pays for work it hasn't yet benefited from." — Farr, 2025. The bottleneck moved into Lead Time to Ship, which is why this deck focuses there. Lead Time to Realize matters too, but it can't be the only clock you steer by.
DORA's clock starts at git commit. By design, not by mistake.
Intent Lead Time: the time from a product decision being made to the first commit implementing it. The pre-commit half of Lead Time to Ship.
"The elapsed time from a code change being committed to the same change running in production." — dora.dev
Two collapses. One hold-out.
The gap that didn't matter to measure is now the gap that matters most.
Yesterday
Today
- DORA 2025: "AI accelerates software development, but that acceleration can expose weaknesses downstream… an increase in change volume leads to instability."
- InfoQ / Agoda 2026: "Human authority is migrating upward in the abstraction stack — from writing code to defining and governing intent."
Where the time actually lives.
Lead Time to Ship = Intent Lead Time + DORA Lead Time for Changes. They dovetail at the commit. No overlap. No gap.
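The split at the commit boundary is simple arithmetic once the three timestamps exist. A minimal sketch (the dates are invented for illustration):

```python
from datetime import datetime

def lead_time_to_ship(decision: datetime, first_commit: datetime,
                      deployed: datetime) -> dict:
    """Split Lead Time to Ship at the commit boundary.

    ILT and DORA Lead Time for Changes dovetail at first_commit:
    no overlap, no gap, so the two segments sum to the whole.
    """
    ilt = first_commit - decision    # Intent Lead Time (pre-commit)
    dora = deployed - first_commit   # DORA Lead Time for Changes
    return {"ilt": ilt, "dora": dora, "ship": ilt + dora}

# Decided Monday 09:00, first commit Wednesday, deployed Friday.
t = lead_time_to_ship(datetime(2025, 6, 2, 9),
                      datetime(2025, 6, 4, 9),
                      datetime(2025, 6, 6, 9))
```

The identity `ship == ilt + dora` is the whole point of the dovetail: you can adopt ILT without touching your existing DORA instrumentation.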
Reused from the Intent Lead Time guide. The fade across the ILT segments isn't decorative. It tracks how much each sub-latency still matters in an intent-driven workflow (slide 8).
Naming the four sub-latencies of ILT.
| Sub-metric | Clock | What slows it | Survives an agent flow? |
|---|---|---|---|
| Capture | decision made → recorded | "we'll document it later" | the only one |
| Sequencing | recorded → ticket | triage queue, sprint cadence | collapses to zero |
| Pickup | ticket → assigned | backlog prioritization | collapses to zero |
| Activation | assigned → first commit | spec ambiguity | collapses to zero |
Tickets are a project-management technology designed for queueing work to humans. When the picker is an agent, the spec is the assignment. Sequencing, pickup, and activation are scaffolding from the assembly-line era. In a truly intent-driven workflow, three of the four collapse to zero. Only Capture remains.
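Each sub-latency is the delta between two adjacent boundary timestamps. A sketch with an invented event log for one feature:

```python
from datetime import datetime

# Hypothetical event log: one timestamp per ILT boundary.
events = {
    "decided":      datetime(2025, 3, 3, 10),   # decision made in a meeting
    "recorded":     datetime(2025, 3, 5, 16),   # written into a PRD
    "ticketed":     datetime(2025, 3, 10, 9),   # tracked work item created
    "assigned":     datetime(2025, 3, 12, 9),   # picked up in planning
    "first_commit": datetime(2025, 3, 13, 14),  # implementation starts
}

# The four sub-latencies: each is the gap between adjacent boundaries,
# so they sum exactly to ILT (first_commit - decided).
sub_latencies = {
    "capture":    events["recorded"] - events["decided"],
    "sequencing": events["ticketed"] - events["recorded"],
    "pickup":     events["assigned"] - events["ticketed"],
    "activation": events["first_commit"] - events["assigned"],
}

# In an intent-driven flow the lower three boundaries coincide
# (recorded == ticketed == assigned == first_commit), so only
# capture stays nonzero.
```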
Where teams currently land.
First-principles estimates, not benchmark data. No tool currently measures this.
Reused from the Intent Lead Time guide. Caveat the guide already names: "These bands are first-principles estimates, not benchmarked data. No tool currently measures ILT; these numbers are starting points for the community to refute, refine, or replace." That caveat is the deck (slide 10). But you don't need a tool to find your band — hand-sample 3 features and you have a baseline (slide 14).
Why this is a new hard measurement problem.
DORA was easy because Git is an event log. ILT is hard because product decisions aren't.
No native event
Commits and deploys have timestamps. "We decided" doesn't.
No schema
Decisions live in Slack threads, transcripts, doc revisions, head-nods. None of those is structured data.
No agreed origin
Was the decision made when it was spoken, agreed, or captured? Three different timestamps.
Fuzzy causality
Mapping a deployed line of code back to the decision that birthed it requires intent tracing the toolchain wasn't built for.
Pre-agents, this didn't matter. Implementation was the long pole and dominated the equation. Post-agents, implementation collapsed, and these four obstacles became the dominant terms. The measurement problem became newly urgent at the exact moment it became newly hard. Hard ≠ helpless: slide 14 names what to do this week without any of the missing primitives.
What today's dashboards actually measure.
Every vendor positions its data sources as "the value stream." Most measure post-Jira time. The pre-Jira window remains uninstrumented.
| Vendor / framework | Clock starts | Clock stops | Sees decision time? | Category |
|---|---|---|---|---|
| Planview / Tasktop · Flow Time | value-stream entry (first active state) | delivered to customer | partial | VSM platform |
| GitLab · Value Stream Analytics | issue created | merged | no | VSM (devops-side) |
| Jellyfish, LinearB, Faros, DX, Swarmia | ticket "in progress" or first commit | merged / deployed | no | SEI (engineering intelligence) |
| DORA | commit | production | no | delivery performance |
| What's missing | decision made | first commit (via captured → ticketed) | unmeasured | no category yet |
Forrester named Planview a Leader in the VSM Wave Q2 2025. Gartner now publishes a separate Software Engineering Intelligence (SEI) category. The bifurcation matters: even the VSM tools that span the longest window can't see the time before a request becomes a tracked work item.
The pushback, steelmanned.
"Value is unmeasurable. Outcomes are lagging and noisy."
True for outcome value. Not true for ship value. The honest move is to call this what it is: a flow-time proxy from decision to running in production, not a value-realization metric. Ship value is measurable. Outcome value is for a different deck and a different time horizon.
"DORA is sufficient. We don't need more dashboards."
DORA's strength is its narrowness. Four numbers, every team can compute them, a decade of benchmarks. The fix isn't ten new dashboards. It's one companion metric on the side DORA was never built to see.
"VSM is vendor-speak. It's just Jira rollups in Lean clothing."
Often, yes. The Flow Framework is genuinely useful; most VSM dashboards are repackaged ALM. Argue the metric on its merits, not its label. Pre-Jira measurement is missing whatever vocabulary you choose.
"AI makes us faster. Why measure?"
METR (July 2025) RCT: experienced developers were ~19% slower with AI than without, while believing they were ~24% faster. Velocity intuition is unreliable. Measure or kid yourself.
The measurement that doesn't exist yet.
Slide 11 showed what today's tools do measure. This slide names what they'd have to start collecting to close the gap. Four pieces of missing schema:
Decision events
A first-class timestamp for "we decided X", with provenance: who decided, where, what was decided. Closest analog today: design-doc creation time, which conflates decision with documentation.
Capture latency
The delta between t(decision) and t(decision-recorded-as-machine-readable). Currently inferred by hand from meeting transcripts and doc revision history.
Intent → commit linkage
A traceable edge from a deployed line of code back to the decision that birthed it. Today's "branch name in PR title" convention is the loosest possible version of this.
Decision-graph topology
Decisions don't fire and forget. They amend, supersede, or branch each other. No mainstream PM tool models this as a graph.
This is the measurement frontier. Naming the missing schema is the precondition for anyone (vendor, framework author, or in-house team) to measure Lead Time to Value end-to-end.
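What would the missing schema look like? A hypothetical sketch, not any vendor's API: every field name here is an assumption, chosen to make the four primitives concrete.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class DecisionEvent:
    """A first-class 'we decided X' record with provenance."""
    decision_id: str
    decided_at: datetime                 # missing primitive 1: decision event
    recorded_at: datetime                # when it became machine-readable
    decided_by: list[str]                # provenance: who
    source: str                          # provenance: where (meeting, doc, thread)
    summary: str                         # what was decided
    # Missing primitive 4: decision-graph topology. Decisions amend,
    # supersede, or branch each other; modeled here as edge lists.
    amends: list[str] = field(default_factory=list)
    supersedes: list[str] = field(default_factory=list)

    @property
    def capture_latency(self) -> timedelta:
        # Missing primitive 2: the delta between decision and record.
        return self.recorded_at - self.decided_at

@dataclass
class IntentEdge:
    """Missing primitive 3: a traceable intent → commit linkage."""
    decision_id: str
    commit_sha: str
```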
You don't need the missing schema to start.
Three stages. None require a vendor. Each one is finished before the next becomes worth doing.
Crawl
- Add a Decided on: field to your PRD / RFC template. Doc revision history becomes your decision log.
- Hand-sample 3 features. Compute t(first commit) − t(first PRD draft). That's your baseline.
Cost: a meeting and an afternoon. Output: a number you didn't have last week.
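The hand-sampled baseline is a few lines once you have the dates. A sketch with invented samples; the median keeps one outlier from dominating an n=3 sample:

```python
from datetime import date
from statistics import median

# Hand-sampled (hypothetical): (first PRD draft, first commit) per feature.
samples = [
    (date(2025, 1, 6),  date(2025, 1, 20)),  # feature A
    (date(2025, 2, 3),  date(2025, 2, 12)),  # feature B
    (date(2025, 3, 10), date(2025, 4, 1)),   # feature C
]

# Baseline pre-commit lead time, in days.
gaps_days = [(commit - draft).days for draft, commit in samples]
baseline = median(gaps_days)
```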
Walk
- Pilot one team. Greenfield beats brownfield. Don't roll out org-wide.
- Encode the spec doc ID in the branch or PR title. Loose, but it's your intent → commit edge.
- Reorganize the review gates (slide 15).
Cost: workflow change on one team. Output: Capture and linkage become observable.
Run
- Treat the spec as the assignment. Sequencing, Pickup, Activation collapse to zero (slide 8).
- Capture latency becomes your one ILT number. Pair with DORA at the commit boundary.
- Watch the delegation envelope — agents went from 30-second nudges to multi-hour tasks in a year. Expansion is the leading indicator.
Cost: real workflow rework. Output: Lead Time to Ship as a single number.
None of these steps requires the four missing primitives from slide 13. They just require an honest baseline. Crawl with what you have; refine the schema once the workflow change makes the gap obvious.
Reorganize the gates, not just the dashboard.
ILT's lower three sub-latencies don't shrink because you measured them. They shrink because the workflow that produced them disappeared. Decide which gate is human and which is agent — the queue collapses on its own.
Mechanical, structural, repetitive
- Unit test authoring & maintenance
- Structural code review — style, patterns, type fit
- Security scans, dependency hygiene
- PR mechanics: drafts, descriptions, follow-ups
These collapse to minutes. They're the reason Implementation went from days to hours on slide 6.
Intent, judgment, irreversible decisions
- Intent & spec authorship — the source of truth
- Integration, smoke, and manual validation
- Architectural decisions with cross-system blast radius
- Release decisions: what ships, when, to whom
"You can no longer review code line by line." Human authority migrates upward — to the spec.
Tools roll out org-wide. Workflows pilot one team at a time, ideally greenfield. Top-down workflow mandates without controlled pilots produce enablement burden, not adoption.
Practitioner field notes: gregce.github.io/ai-product-development/blackbaud.html · "Most of their time writing tickets and reviewing the output of agents. Very little time on the code itself."
Definitions cheat sheet.
Every metric named in this deck, in one place.
- Lead Time to Value
- "The elapsed time from when work is requested until it delivers measurable value to customers or the business" (Farr, 2025). Five stages (Discovery, Development, Deployment, Adoption, Value Realisation) that this deck collapses into two halves: Lead Time to Ship and Lead Time to Realize.
- Lead Time to Ship
- Decision made → change running in production. Bounded by events that can be timestamped if instrumented. The focus of this deck.
- Lead Time to Realize
- In production → outcome signal (revenue, retention, NPS delta). Lagging, noisy, confounded; useful as a separate clock.
- Flow Time (Kersten / Tasktop)
- "From the point that work is accepted into the value stream — first active state — to when it's available to the customer." Closest commercial analog to Lead Time to Ship.
- DORA Lead Time for Changes
- "From a code change being committed to the repository to the same change running in production." Clock starts at git commit.
- Cycle Time (Jira / SEI tools)
- Typically: ticket "in progress" → merged. A subset of Lead Time to Ship that excludes capture, sequencing, and pickup.
- Intent Lead Time (ILT)
- t(first commit) − t(product decision captured). The pre-commit half of Lead Time to Ship. Dovetails with DORA at the commit boundary.
- Capture latency
- ILT sub-component: decision made → decision recorded as artifact. The only sub-latency that survives an intent-driven workflow.
- Sequencing latency
- ILT sub-component: artifact recorded → ticket created. Vestigial in agent-driven flows.
- Pickup latency
- ILT sub-component: ticket created → assigned. Vestigial in agent-driven flows.
- Activation latency
- ILT sub-component: assigned → first commit. Vestigial in agent-driven flows.
- Fuzzy front end (Reinertsen / Cooper)
- Pre-software lineage for the same idea: the period from opportunity identification to project commitment. "Fuzziness consumes time."
Receipts.
Every claim in the deck traces to one of these. Vendor sources flagged.
Lead Time to Value · canonical
- Farr · Lead Time to Value: The Metric That Actually Matters (Oct 2025) — verbatim definition and 5-stage breakdown used on slide 4
- Flow Time vs. Lead Time — Tasktop / Planview
- Planview · Flow Framework
- Mik Kersten, Project to Product (2018)
- Forrester Wave: VSM Solutions, Q2 2025
DORA
Agents shifted the bottleneck (2025–2026)
Counter-evidence (non-vendor)
Fuzzy front end · pre-software
- Smith & Reinertsen, Developing Products in Half the Time (1991)
- Reinertsen, Principles of Product Development Flow
- Black Swan Farming · Fuzzy Front End
- MIT Sloan · Integrating the Fuzzy Front End
- Robert G. Cooper · Stage-Gate framework