Cost-per-outcome accounting for an AI-run agency | AI Automation & Growth Insights

June 6, 2026

The number that lies to you

The first dashboard we built for our agents tracked cost per task. Tokens consumed per agent run, rolled up daily. It was easy to build and almost useless. It told us the enrichment agent was cheap and the proposal agent was expensive, which we already knew, and it told us nothing about whether any of that spend produced anything worth paying for.

An agent can run a thousand cheap tasks and ship zero outcomes. Another can run one expensive task that closes a deal. Cost per task ranks them backwards. So we rebuilt our accounting around a different unit. Not cost per task. Cost per shipped outcome. That single change is what made our AI agency unit economics legible, and it is the number we now run the business on.

Defining an outcome

The hard part is not the math. It is deciding what counts as an outcome. We settled on a rule. An outcome is a unit of value a client would actually pay for, not a step in the process of producing it.

A drafted email is not an outcome. A reply from a prospect is. A classified inbound is not an outcome. A positive reply that becomes a deal in the pipeline is. A generated proposal is not an outcome. A signed client is. The test we apply is simple. If we billed for this line item, would the client see value, or would they ask why they are paying for our internal plumbing?

This pushed us to define outcomes per division. In outreach, the ladder is reply, then positive reply, then deal, then closed-won. In content, it is a published article that ranks, then a piece of inbound it drove. In each case the outcome sits at the end of a chain of many cheap tasks, and the cost we care about is the full cost of that chain divided by the outcomes it produced.

Defining the ladder, rather than a single outcome, turned out to matter. A single bottom-line number, cost per closed client, is too far downstream to act on quickly. Deals close on a lag of weeks, so if that were the only number you watched, you would be steering on data a month stale. The intermediate rungs, cost per reply and cost per positive reply, move within days and act as leading indicators. When cost per positive reply rises, you know the closed-client cost will follow, and you can intervene before the lagging number confirms it. The ladder gives you both the honest bottom line and an early warning system on the way down to it.

Attributing cost to the chain

To get cost per outcome you have to add up every token across the chain that led to it. That requires real tracking rather than estimation.

We stamp every agent invocation with the model used, the token count in and out, and the job it belonged to. Those events flow into a usage table. When a cold reply turns positive and creates a deal, we walk the chain backwards. The sourcing call that found the lead. The verification cost. The enrichment tokens. Every sequence step the writer drafted. The classifier run that tagged the reply. The handoff that opened the deal. We sum all of it, plus the per-lead third-party costs, Apollo credits and the email-verification fee, and that sum is the cost of that one positive reply.
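The stamping and the backwards walk can be sketched as follows. This is a minimal illustration, not our production schema: the `UsageEvent` fields, the model names, and the per-million-token prices are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical prices in dollars per million tokens: (input, output).
# Real prices depend on the provider and the model.
PRICE_PER_MILLION = {
    "small-model": (0.50, 1.50),
    "large-model": (3.00, 15.00),
}

@dataclass(frozen=True)
class UsageEvent:
    job_id: str      # the lead or job this invocation belonged to
    step: str        # e.g. sourcing, enrichment, classification
    model: str
    tokens_in: int
    tokens_out: int

def token_cost(event: UsageEvent) -> float:
    """Dollar cost of one invocation under the assumed price table."""
    price_in, price_out = PRICE_PER_MILLION[event.model]
    return (event.tokens_in * price_in + event.tokens_out * price_out) / 1_000_000

def chain_cost(events, job_id: str, third_party_cost: float = 0.0) -> float:
    """Sum every invocation stamped with job_id, plus per-lead third-party fees."""
    return third_party_cost + sum(token_cost(e) for e in events if e.job_id == job_id)

events = [
    UsageEvent("lead-17", "sourcing", "small-model", 2_000, 400),
    UsageEvent("lead-17", "enrichment", "large-model", 6_000, 1_200),
    UsageEvent("lead-17", "sequence_draft", "large-model", 3_000, 900),
    UsageEvent("lead-17", "classification", "small-model", 800, 50),
    UsageEvent("lead-42", "sourcing", "small-model", 2_000, 400),
]

# Direct cost of the chain behind one positive reply, with 5 cents of
# third-party cost (credits plus verification) for that lead.
cost_17 = chain_cost(events, "lead-17", third_party_cost=0.05)
```

The job identifier is what makes the walk possible. Without it, each event is only a per-task cost with nothing to attach it to.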

The ratios fall out once the attribution is in place. Cost per reply. Cost per positive reply. Cost per deal. Cost per closed-won client. We track each as a rolling number per campaign and per channel, so we can see not just the average but where it is rising.
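A rolling ratio per campaign and per channel might look like the sketch below. The row layout, the seven-day window, and the campaign and channel names are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

def rolling_cost_per_outcome(rows, as_of: date, window_days: int = 7):
    """rows: iterable of (day, campaign, channel, spend, outcomes).

    Returns {(campaign, channel): spend / outcomes} over the trailing window,
    or None where the window holds spend but no outcomes yet.
    """
    start = as_of - timedelta(days=window_days - 1)
    spend = defaultdict(float)
    outcomes = defaultdict(int)
    for day, campaign, channel, cost, count in rows:
        if start <= day <= as_of:
            spend[(campaign, channel)] += cost
            outcomes[(campaign, channel)] += count
    return {
        key: (spend[key] / outcomes[key] if outcomes[key] else None)
        for key in spend
    }

rows = [
    (date(2026, 6, 1), "q2-saas", "email", 10.0, 2),
    (date(2026, 6, 2), "q2-saas", "email", 12.0, 2),
    (date(2026, 6, 2), "q2-saas", "linkedin", 8.0, 0),
    (date(2026, 5, 20), "q2-saas", "email", 50.0, 1),  # outside the window
]
result = rolling_cost_per_outcome(rows, as_of=date(2026, 6, 3))
```

Returning `None` for a key with spend and no outcomes is a deliberate choice in this sketch. A zero would read as free, and a division error would hide the channel that most needs attention.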

The attribution has one honest complication worth naming. Some costs are shared across many outcomes and some are wasted on outcomes that never happened. In the example campaign below, 1,116 of the 1,200 leads never replied, and they still cost money to source, verify, and sequence. We do not pretend that spend vanishes. It loads onto the outcomes that did happen, which is exactly why cost per outcome is higher than naive per-task math suggests, and exactly why it is the honest number. A campaign that produces few outcomes carries its entire wasted spend on those few, and the per-outcome cost reflects that weight. That is the point. The metric refuses to let a low conversion rate hide behind cheap individual tasks.
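The gap between the direct chain cost and the loaded cost is easy to see in a few lines. The sketch below borrows the totals from the example campaign in the next section (100 dollars, 1,200 leads, 19 positive replies) and assumes, for simplicity, that spend is spread evenly across leads.

```python
def direct_cost_per_lead(total_spend: float, leads: int) -> float:
    """What one lead's own chain cost, assuming spend is spread evenly."""
    return total_spend / leads

def loaded_cost_per_outcome(total_spend: float, outcomes: int) -> float:
    """All spend, including spend on leads that went nowhere, per outcome."""
    if outcomes == 0:
        raise ValueError("no outcomes yet; loaded cost is undefined")
    return total_spend / outcomes

direct = direct_cost_per_lead(100.0, 1200)    # roughly 8 cents
loaded = loaded_cost_per_outcome(100.0, 19)   # roughly 5.26 dollars
multiple = loaded / direct                    # equals leads / outcomes
```

Under the even-spread assumption the multiple is simply leads divided by outcomes, so a falling conversion rate shows up directly in the loaded number even when the direct cost does not move.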

A concrete example

Here is the shape of the math from one outreach campaign. The numbers are illustrative but the structure is exactly what we run.

Over a month the campaign sourced 1,200 leads. Apollo credits and verification ran roughly 5 cents a lead in third-party cost, call it 60 dollars. The agents (sourcing, enrichment, sequence drafting, classification) ran a few cents of tokens per lead across all 1,200, call it another 40 dollars. Total spend, about 100 dollars.

That 1,200 produced 84 replies, of which 19 were positive, of which 4 became deals, of which 2 closed. Run the per-task math and you get a comfortable fraction of a cent per agent task, which tells you nothing. Run the per-outcome math and the picture sharpens. About 1.20 dollars per reply. About 5.30 dollars per positive reply. About 25 dollars per deal entered. About 50 dollars per closed client.
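The arithmetic above, spelled out. The figures come from the example. The variable names are ours for this sketch.

```python
leads = 1200
third_party = leads * 5 / 100   # 5 cents a lead -> 60 dollars
agent_tokens = 40.0             # token spend across all leads, in dollars
total_spend = third_party + agent_tokens

# Outcome counts at each rung of the outreach ladder.
ladder = {"reply": 84, "positive_reply": 19, "deal": 4, "closed_won": 2}

# Fully loaded cost per outcome: all spend divided by outcomes at that rung.
cost_per = {rung: total_spend / count for rung, count in ladder.items()}

for rung, cost in cost_per.items():
    print(f"cost per {rung}: {cost:.2f} dollars")
```

The exact ratios are 1.19 and 5.26 dollars for the first two rungs, which the text rounds to about 1.20 and 5.30.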

Now the number means something. Fifty dollars of fully-loaded agent and tooling cost to land a client is a unit cost you can build a business on. And the moment cost per positive reply climbs from 5 to 15, we know to look, even if cost per task held flat, because that is the ratio that ties to revenue.

What changed once we tracked it

Three decisions got easier the moment we had cost per outcome instead of cost per task.

We started killing campaigns on the right signal. A campaign with a low cost per task and a sky-high cost per deal is a campaign burning cheap tasks that never convert. The old dashboard called it efficient. The new one calls it what it is.

We started spending more, deliberately, where it paid. Enrichment costs more tokens per task than almost anything else, and the per-task view made us want to trim it. The per-outcome view showed that better enrichment lifted positive-reply rate enough to cut cost per deal. So we spent more per task on purpose, because it lowered the only cost that mattered.

And we got an honest answer to the question every AI-run agency has to answer for itself. Are the agents actually cheaper than the alternative, all in? Cost per outcome is the only frame in which that question has a real answer, because the alternative was never priced per task either. It was priced per client landed.

Where the accounting feeds back in

The point of measuring cost per outcome is not the dashboard. It is the loop it closes. Once the number is trustworthy, it becomes an input to the agents themselves, not just a report a human reads.

We feed the ratios back into how campaigns are run. A channel whose cost per positive reply drifts above a ceiling gets throttled automatically, and the spend reallocates toward channels performing under it. The sourcing agent's ICP filters get tuned against which segments produce the cheapest deals, not just the most replies, because a segment that replies often and never closes is expensive in the only frame that counts. The enrichment depth gets dialed per segment, deeper where it lowers cost per deal and lighter where it does not move the number.
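The throttling rule can be sketched as below. This is one possible policy, not a description of our exact controller: the 10-dollar ceiling, the 50 percent throttle, the proportional reallocation, and the channel names are all assumptions for illustration.

```python
def reallocate(budgets, cost_per_positive_reply, ceiling, throttle_factor=0.5):
    """Throttle channels above the ceiling and move the freed budget to the rest.

    budgets and cost_per_positive_reply are dicts keyed by channel.
    Returns (new_budgets, held_back), where held_back is freed budget that
    could not be placed because no channel was under the ceiling.
    """
    over = [c for c in budgets if cost_per_positive_reply[c] > ceiling]
    under = [c for c in budgets if cost_per_positive_reply[c] <= ceiling]

    new_budgets = dict(budgets)
    freed = 0.0
    for channel in over:
        cut = budgets[channel] * (1 - throttle_factor)
        new_budgets[channel] = budgets[channel] - cut
        freed += cut

    under_total = sum(budgets[c] for c in under)
    if under_total == 0:
        return new_budgets, freed  # nowhere sensible to put the money

    # Reallocate in proportion to each healthy channel's current budget.
    for channel in under:
        new_budgets[channel] += freed * budgets[channel] / under_total
    return new_budgets, 0.0

budgets = {"email": 100.0, "linkedin": 60.0, "sms": 40.0}
costs = {"email": 5.0, "linkedin": 15.0, "sms": 8.0}
new_budgets, held_back = reallocate(budgets, costs, ceiling=10.0)
```

In this run the channel at 15 dollars per positive reply loses half its budget, and the 30 dollars freed is split between the two channels still under the ceiling. Total spend is unchanged.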

This is the part that surprised us. Per-outcome accounting started as a way to understand the business and became a control signal the system uses to steer itself. The same usage events that let a human ask whether the agents are worth it let the agents ask, on every campaign, which spend is producing outcomes and which is producing only tasks. The metric stopped being a rear-view mirror and became part of the steering.

If your agent dashboard reports cost per task and calls it unit economics, it is measuring the cheapest thing instead of the thing that matters. Define an outcome a client would pay for, attribute the full chain of cost to it, and run the business on cost per reply, per deal, per client. That is the accounting that tells you the truth about an AI-run agency.
