Cost per outcome: measuring the real economics of AI workflows

2 points by deborahjacob 82 days ago | 3 comments

Hi HN, I’m the technical founder of botanu (https://www.botanu.ai).

I started building this after repeatedly running into the same problem on AI teams: we could see total LLM spend, but we couldn’t answer a simple question:

“What did one successful outcome actually cost?”

In real systems, a single business event often takes several attempts before it succeeds: retries, fallbacks, tool calls, escalations, async workers, and so on. Most tooling measures individual model calls, or at best a single workflow run, which hides the real cost.

The unit that matters to the business is the outcome, not the individual call.

The approach I’m exploring in botanu:

An event_id represents the business intent (e.g., resolve support ticket, generate report)

Each attempt is a run with its own run_id

All runs share the same event_id

A final outcome is emitted for the event (success / failure / partial)

Cost per outcome = sum of the costs of all runs for that event, including failed attempts
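
To make the model above concrete, here is a minimal Python sketch of the aggregation. It is not botanu's implementation: the Run and OutcomeCost types, the cost_usd field, and the sample numbers are all assumptions made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    event_id: str    # business intent shared by all attempts
    run_id: str      # unique per attempt
    cost_usd: float  # hypothetical field: total spend attributed to this attempt


@dataclass(frozen=True)
class OutcomeCost:
    outcome: str          # "success", "failure", or "partial"
    attempts: int         # number of runs recorded for the event
    total_cost_usd: float


def cost_per_outcome(runs, outcomes):
    """Sum the cost of every run per event, failed attempts included."""
    totals = defaultdict(float)
    attempts = defaultdict(int)
    for run in runs:
        totals[run.event_id] += run.cost_usd
        attempts[run.event_id] += 1
    return {
        event_id: OutcomeCost(outcome, attempts[event_id], totals[event_id])
        for event_id, outcome in outcomes.items()
    }


# Illustrative data only.
runs = [
    Run("ticket-42", "run-1", 0.02),  # first attempt, failed
    Run("ticket-42", "run-2", 0.02),  # retry, failed
    Run("ticket-42", "run-3", 0.05),  # fallback attempt, succeeded
    Run("report-7", "run-4", 0.03),   # single attempt, failed
]
outcomes = {"ticket-42": "success", "report-7": "failure"}

for event_id, result in cost_per_outcome(runs, outcomes).items():
    print(event_id, result.outcome, result.attempts, round(result.total_cost_usd, 4))
```

In this made-up data, the successful ticket costs about $0.09 across three runs, even though the final successful call alone was $0.05. That gap is what per-call metrics hide.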

Run context propagates across services using W3C Baggage (OpenTelemetry) so the event can be traced across distributed systems.
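
As a rough illustration of what travels between services, here is a hand-rolled sketch of the W3C "baggage" HTTP header, which carries comma-separated key=value pairs with percent-encoded values. This is not the OpenTelemetry SDK and not botanu's code; in practice an OpenTelemetry propagator would handle it. The key names event_id and run_id are assumptions, and keys are assumed to be simple tokens that need no encoding.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries: dict[str, str]) -> str:
    """Build a baggage header value; only values are percent-encoded."""
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def decode_baggage(header: str) -> dict[str, str]:
    """Parse a baggage header value, ignoring optional ';' properties."""
    entries = {}
    for member in header.split(","):
        key_value = member.split(";", 1)[0]  # drop properties such as ";ttl=..."
        key, sep, value = key_value.partition("=")
        if not sep or not key.strip():
            continue  # skip empty or malformed members
        entries[key.strip()] = unquote(value.strip())
    return entries


# The upstream service attaches the context to its outgoing request headers.
outgoing_headers = {
    "baggage": encode_baggage({"event_id": "ticket 42", "run_id": "run-2"})
}

# The downstream service recovers the same event_id and tags its own costs with it.
context = decode_baggage(outgoing_headers["baggage"])
print(outgoing_headers["baggage"])
print(context["event_id"], context["run_id"])
```

Because every hop forwards the same event_id, costs recorded by different services and async workers can later be grouped under one event.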

The idea is to make AI economics measurable at the outcome level, not just tokens or model calls.

On the engineering side, teams can use this to:

experiment with models and workflows in a dev playground

compare architectures and retries

optimize the cost of producing a successful outcome

On the business side, it helps teams understand:

unit economics of AI features

cost per customer action

how to support outcome-based pricing models.

I’m curious how others here are thinking about AI unit economics and measuring outcomes in production systems.

Happy to answer technical questions or get critical feedback.

Deborah (deborah [at] botanu dot ai)

alexbuiko 82 days ago

Focusing on 'Cost per Outcome' rather than 'Cost per Token' is a vital shift for AI reliability. At SDAG [https://github.com/alexbuiko-sketch/SDAG-Standard], we’ve been looking at the same problem from the opposite end of the stack: the hardware-inference interface.

In a distributed system using OpenTelemetry, a 'successful outcome' often hides a lot of silent technical debt. If an event requires 4 retries, it’s not just a billing issue—it’s a signal of high routing entropy. We’ve found that failed attempts or long CoT (Chain of Thought) loops often correlate with specific hardware stress patterns and memory controller 'redlining.'

Integrating SDAG signals into something like your event_id tracking could be powerful. It would allow teams to see not just how much a success cost, but whether the 'path to success' was physically efficient or if it was stressing the cluster due to poor routing logic. Have you considered adding hardware-level telemetry (like jitter or entropy metrics) to your outcome tracking to predict which 'runs' are likely to fail before they even finish?

deborahjacob 81 days ago

That's a great idea. I'm only doing application-level tracking right now, but I agree hardware-level telemetry would be super helpful. Would love to learn more about how you think about it. Here's my email: deborah [at] botanu dot ai

alexbuiko 81 days ago

sent e-mail to you