Token Economics: Four Ways to Scale Enabling AI Experiences in Customer Communications

May 12, 2026

How regulated institutions will manage the cost of every AI interaction they put in front of a customer.

AI isn’t free at scale. That’s the first thing every organization learns when it puts AI in front of customers. Once rollouts begin, it becomes clear that the cost of answering a question is not a constant.

Two customer-facing AI experiences with identical behavior can have unit costs that differ by a factor of 10. The customer will not know which one they are on until the institution quietly retires the more expensive one.

This is what token economics means: the compounding effect of every decision behind every customer interaction. Experiences that scale token economics own those decisions. Experiences that don’t accumulate cost faster than they accumulate customers.

What Happens Behind One Customer AI Moment? 

From the customer’s perspective, an AI moment appears as a single event. A question goes in, and an answer comes out. It’s seamless.  

What customers don’t see, however, is what goes on inside the system: six cost events run every time. 

One customer. One AI moment. Six cost events.

What actually happens behind a single AI experience, and where the unit economics get decided

Customer Experience

A customer opens an AI surface and asks a question. A response appears in seconds. From the customer's perspective, this is a single interaction. From the cost ledger, it is six events.

  1. Context assembled: user history, system prompt, and retrieved policy, all packed in (~62% of cost)
  2. Request routed: the routing layer picks the right model for this question (~3% of cost)
  3. Model inferred: tokens consumed by the model, the visible cost (~21% of cost)
  4. Output shaped: the response is structured to fit the surface and channel (~6% of cost)
  5. Trace recorded: inputs, outputs, latency, and evaluation are captured (~5% of cost)
  6. Lineage stored: governance writes the audit trail for the moment (~3% of cost)
The Pattern

Model inference is roughly one fifth of the cost. The other four fifths are choices made by the institution building the experience.

Context, routing, output shape, trace, lineage. Every one of them gets touched on every customer interaction. Every one of them compounds at scale.


Experiences that scale token economics own these six events.

They design context once. They route deliberately. They shape output by design. They trace by default. They govern in the pipeline.
Experiences that do not watch each event grow without bound as customer volume rises.

Cost shares are illustrative averages across observed implementations; they vary by surface, domain, and deployment maturity.

Figure 1. The six cost events behind a single customer interaction with an AI surface. The customer sees one. The system runs six.

Three things matter in Figure 1.

  1. Inference is roughly 21% of the unit cost. The model call is the visible cost and is the smaller portion. The other 79% sits in context, routing, output shape, trace, and lineage.
  2. The highest cost is the one that the organization controls the most. Context assembly averages 62% of unit cost across observed deployments. Each prompt template, retrieval strategy, and grounding choice impacts that number.
  3. The cost curve is not the value curve. Customers see latency and quality, not the unit cost behind them.
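To make the split in Figure 1 concrete, the sketch below allocates one interaction's cost across the six events using the figure's illustrative shares. The $0.02 unit cost is a hypothetical value chosen only for illustration. The event names are labels made up for this example.

```python
# Illustrative cost shares from Figure 1 (averages; real values vary by surface).
COST_SHARES = {
    "context_assembled": 0.62,
    "request_routed": 0.03,
    "model_inferred": 0.21,
    "output_shaped": 0.06,
    "trace_recorded": 0.05,
    "lineage_stored": 0.03,
}


def split_unit_cost(unit_cost: float) -> dict[str, float]:
    """Allocate one interaction's cost across the six cost events."""
    total_share = sum(COST_SHARES.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"shares must sum to 1.0, got {total_share}")
    return {event: unit_cost * share for event, share in COST_SHARES.items()}


def non_inference_share() -> float:
    """Fraction of unit cost that sits outside the model call."""
    return 1.0 - COST_SHARES["model_inferred"]


# Hypothetical unit cost of $0.02 per interaction, for illustration only.
breakdown = split_unit_cost(0.02)
for event, cost in sorted(breakdown.items(), key=lambda kv: -kv[1]):
    print(f"{event:18s} ${cost:.4f}")
print(f"outside inference: {non_inference_share():.0%}")  # 79%
```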

What Happens Next: Two Trajectories

Two AI experience trajectories. One scales. One does not.

Modeled unit cost per interaction across customer growth, based on observed scaling patterns

[Chart: unit cost per interaction (indexed, day one = 1.0x) against monthly active customers on the AI experience, from 1k to 10M+. Unmanaged experience: unit cost rises 3.6x as users scale 100x. Managed experience: unit cost flattens at 1.15x as users scale 100x. Unit cost gap at scale: 3.1x.]
Read this chart

Unmanaged experiences get more expensive per user as they gain users. The cost curve diverges from the value curve.

Managed experiences converge. Unit cost flattens. Every new feature lands on a cost base that has already been amortized.


Curves are illustrative based on observed scaling patterns. Specific values vary by surface, domain, and architecture.

Figure 2. Unit cost across customer growth. Unmanaged experiences rise 3.6x. Managed experiences flatten at 1.15x. 

By the time both experiences reach 10M monthly active customers, the unit cost gap is roughly 3x. That gap produces different economics, different conversations at the board table, and different roadmaps for years afterward.
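The gap follows directly from the indexed endpoints in Figure 2. The sketch below computes it and then translates it into monthly spend. The $0.01 day-one unit cost and the 30M monthly interactions are hypothetical inputs, not figures from the article.

```python
# Indexed unit costs at scale, read from Figure 2 (day one = 1.0x).
UNMANAGED_INDEX_AT_SCALE = 3.6
MANAGED_INDEX_AT_SCALE = 1.15


def unit_cost_gap(unmanaged: float, managed: float) -> float:
    """Ratio of unmanaged to managed unit cost at the same scale."""
    return unmanaged / managed


def monthly_spend(day_one_unit_cost: float, index: float, interactions: int) -> float:
    """Monthly spend given a day-one unit cost, a cost index, and volume."""
    return day_one_unit_cost * index * interactions


gap = unit_cost_gap(UNMANAGED_INDEX_AT_SCALE, MANAGED_INDEX_AT_SCALE)
print(f"gap at scale: {gap:.1f}x")  # 3.6 / 1.15 is about 3.13

# Hypothetical inputs: $0.01 day-one unit cost, 30M interactions per month.
for label, index in [("unmanaged", UNMANAGED_INDEX_AT_SCALE),
                     ("managed", MANAGED_INDEX_AT_SCALE)]:
    print(label, f"${monthly_spend(0.01, index, 30_000_000):,.0f}")
```

At those assumed volumes the same traffic costs about $1.08M a month on the unmanaged trajectory and about $345k on the managed one.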

In a regulated environment, the unmanaged trajectory also carries added compliance overhead, audit findings, and rework cycles. The trajectories diverge on more than cost.

The unit cost curve is an experience capability, not a financial artifact. The shape of the curve is decided by the team building the experience.

Where Does the Cost Actually Go?

Where the cost goes inside a single AI moment

Six components, ranked by share of unit cost and by your team's degree of control

  • Context assembly: 62% of unit cost; 92% degree of control
  • Model inference: 21% of unit cost; 38% degree of control
  • Response shaping: 6% of unit cost; 84% degree of control
  • Trace and evaluation: 5% of unit cost; 70% degree of control
  • Routing layer: 3% of unit cost; 88% degree of control
  • Governance and lineage: 3% of unit cost; 78% degree of control
The Insight

The largest cost component is the one you control most. The component you control least, model inference, is the one most teams spend their energy debating.

Scaling token economics means inverting where that energy goes: high-cost, high-control components first.

Figure 3. Cost share and degree of control across the six components. The largest components are also the ones the institution controls most.

Context assembly accounts for 62% of unit cost and is 92% under the institution’s control. Model inference accounts for 21% of unit cost, and the institution controls only 38% of it.

The energy ratio for most teams is inverted. Scaling token economics means flipping it.
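One way to operationalize "high-cost, high-control first" is to rank components by the share of cost a team can actually move. The sketch below multiplies cost share by degree of control from Figure 3. This "controllable cost" score is a hypothetical heuristic introduced for illustration; the article does not define it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    cost_share: float  # fraction of unit cost (Figure 3)
    control: float     # fraction under the institution's control (Figure 3)

    @property
    def controllable_cost(self) -> float:
        # Hypothetical priority score: the slice of unit cost the team can move.
        return self.cost_share * self.control


COMPONENTS = [
    Component("Context assembly", 0.62, 0.92),
    Component("Model inference", 0.21, 0.38),
    Component("Response shaping", 0.06, 0.84),
    Component("Trace and evaluation", 0.05, 0.70),
    Component("Routing layer", 0.03, 0.88),
    Component("Governance and lineage", 0.03, 0.78),
]


def prioritize(components: list[Component]) -> list[Component]:
    """Order components by controllable cost, highest first."""
    return sorted(components, key=lambda c: c.controllable_cost, reverse=True)


for c in prioritize(COMPONENTS):
    print(f"{c.name:24s} {c.controllable_cost:.3f}")
```

Under this scoring, context assembly scores about 0.57 and model inference about 0.08. That is roughly a sevenfold difference in the cost a team can act on.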

Four Strategies that Scale Token Economics

Four strategies move unit cost on any customer-facing AI surface, whether it is a statement explainer, an onboarding assistant, a disclosure walkthrough, or any other customer communication that has become an AI experience. Each is tied to a measurable metric.

Strategy One: Context Discipline

Context discipline is the largest lever in any AI experience and the first place a regulated institution should look. It begins with setting a token ceiling for each AI surface and enforcing it in the experience itself. Customer history, brand voice, regulatory guardrails, and retrieved policy do not need to be shipped fresh on every interaction. Once a surface has a budgeted context window, the observed unit cost reduction is 30%-55%.
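A per-surface token ceiling can be enforced where the context is assembled. The sketch below is a minimal example with several assumptions. It counts tokens by whitespace, which only stands in for a real tokenizer. The surface names, ceilings, and context parts are all hypothetical. Parts are packed in priority order, and any part that would break the ceiling is dropped and reported.

```python
from dataclasses import dataclass


@dataclass
class ContextPart:
    name: str
    text: str
    priority: int  # lower number = more important


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def assemble_context(parts: list[ContextPart], ceiling: int) -> tuple[str, list[str]]:
    """Pack parts in priority order without exceeding the surface's token ceiling.

    Returns the assembled context and the names of parts that were dropped.
    """
    used = 0
    kept: list[str] = []
    dropped: list[str] = []
    for part in sorted(parts, key=lambda p: p.priority):
        cost = count_tokens(part.text)
        if used + cost <= ceiling:
            kept.append(part.text)
            used += cost
        else:
            dropped.append(part.name)
    return "\n\n".join(kept), dropped


# Hypothetical per-surface ceilings, enforced in the experience itself.
SURFACE_CEILINGS = {"statement_explainer": 1600, "onboarding_assistant": 2400}

parts = [
    ContextPart("guardrails", "Do not give investment advice. " * 20, priority=0),
    ContextPart("policy", "Fee policy text " * 300, priority=1),
    ContextPart("history", "Prior message " * 1000, priority=2),
]
context, dropped = assemble_context(parts, SURFACE_CEILINGS["statement_explainer"])
print(count_tokens(context), dropped)  # 1000 ['history']
```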

Caching shared context compounds on top of that. System prompts, retrieved knowledge, response templates, and evaluator outputs all benefit: each is computed once and reused across every customer, every session, and every interaction. Observed reduction: 25%-40%. Cache hit rates above 70% are normal in mature deployments and continue to rise as the corpus stabilizes.
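The "compute once, reuse everywhere" pattern can be sketched as a cache keyed by a hash of the content. A changed source then yields a new entry rather than a stale hit. The class name and the `render_system_prompt` step are hypothetical, and the in-process dictionary only illustrates the idea. A deployment would more likely use a shared store, or a model provider's prompt-caching feature where one is offered.

```python
import hashlib
from typing import Callable


class SharedContextCache:
    """Compute shared context once, then reuse it across customers and sessions.

    Keys are content hashes, so any change to the underlying source text
    produces a new entry instead of serving a stale one.
    """

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(kind: str, source: str) -> str:
        return hashlib.sha256(f"{kind}:{source}".encode("utf-8")).hexdigest()

    def get_or_compute(self, kind: str, source: str, compute: Callable[[str], str]) -> str:
        key = self._key(kind, source)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compute(source)
        return self._store[key]

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Hypothetical stand-in for an expensive step, such as rendering a system prompt.
def render_system_prompt(source: str) -> str:
    return source.strip().upper()


cache = SharedContextCache()
for _ in range(10):  # ten customer interactions share the same system prompt
    cache.get_or_compute("system_prompt", "  explain fees plainly  ", render_system_prompt)
print(f"hit rate: {cache.hit_rate:.0%}")  # 9 hits, 1 miss -> 90%
```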

Retrieval compression closes the loop:

  • Retrieve fewer chunks
  • Re-rank harder
  • Send the model only what changes the answer

On retrieval-heavy surfaces typical of regulated communications, where policies, disclosures, and product documentation all compete for context, the observed reduction is 15%-30%.
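In code, retrieval compression can be as small as a re-rank followed by a threshold and a hard cap. The sketch below assumes the chunks already carry a re-ranker relevance score between 0 and 1. The scores, `k`, and `min_score` values are hypothetical, and tuning them per surface is where the real work lies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    score: float  # relevance from a re-ranker, assumed to lie in [0, 1]


def compress_retrieval(candidates: list[Chunk], k: int = 3, min_score: float = 0.5) -> list[Chunk]:
    """Keep only the few chunks most likely to change the answer.

    Re-rank candidates by score (best first), drop anything below min_score,
    and send at most k chunks to the model.
    """
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    return [c for c in ranked if c.score >= min_score][:k]


# Hypothetical candidates from a first-pass retriever.
candidates = [
    Chunk("fee_policy", "The monthly fee is waived above the minimum balance.", 0.91),
    Chunk("fee_policy", "Fee schedule for standard checking accounts.", 0.88),
    Chunk("disclosure_7", "Overdraft disclosure for consumer accounts.", 0.74),
    Chunk("product_faq", "How to request a replacement card.", 0.42),
    Chunk("rates", "Current savings rate tiers.", 0.63),
]
kept = compress_retrieval(candidates)
print([c.doc_id for c in kept])  # ['fee_policy', 'fee_policy', 'disclosure_7']
```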

Together, the three context tactics routinely halve the cost of the largest component on the chart.
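A rough consistency check on "halve" is possible if we assume the three tactics are independent and each removes a fraction of whatever cost remains. Under that assumption the reductions compound multiplicatively: the combined reduction is 1 minus the product of (1 - r) over the tactics. The source does not say how the tactics combine. It also quotes the first two ranges against unit cost rather than against the context component alone, so treat this as an approximation.

```python
from math import prod

# Observed reduction ranges quoted above, as (low, high) fractions.
TACTICS = {
    "token budget": (0.30, 0.55),
    "shared-context caching": (0.25, 0.40),
    "retrieval compression": (0.15, 0.30),
}


def combined_reduction(reductions: list[float]) -> float:
    """Total reduction when each tactic cuts a fraction of what remains.

    Assumes the tactics are independent and compound multiplicatively.
    """
    return 1.0 - prod(1.0 - r for r in reductions)


low = combined_reduction([lo for lo, _ in TACTICS.values()])
high = combined_reduction([hi for _, hi in TACTICS.values()])
print(f"combined reduction: {low:.0%} to {high:.0%}")  # 55% to 81%
```

Even with every tactic at the low end of its range, the combined figure lands just above half. The upper end assumes no overlap between tactics, which may not hold in practice.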

Strategy Two: Model Routing

Different questions deserve different models. Routing is a layer inside the experience, not a vendor selection. For example, a simple lookup goes to a small, fast model. A reasoning-heavy explanation goes to a mid-tier model. A genuinely hard case escalates to the most capable model available.

The routing policy lives inside the institution and is expressed as a function of question type, customer segment, surface, and required latency. Observed inference cost reduction at constant quality, measured against a fixed evaluation suite: 40%-70%.
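A routing policy expressed as a function of question type, customer segment, surface, and latency might look like the sketch below. The tier names, the `"hard_case"` label, the surface and segment sets, and the 1,500 ms threshold are all hypothetical placeholders for an institution's own policy.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    SMALL = "small-fast"    # hypothetical tier names
    MID = "mid-tier"
    FRONTIER = "most-capable"


@dataclass(frozen=True)
class Request:
    question_type: str      # e.g. "lookup", "explanation", "hard_case"
    segment: str            # e.g. "retail", "private_banking"
    surface: str            # e.g. "statement_explainer"
    latency_budget_ms: int


# Hypothetical policy inputs owned by the institution, not the vendor.
HIGH_STAKES_SURFACES = {"disclosure_walkthrough"}
ESCALATION_SEGMENTS = {"private_banking"}
FRONTIER_MIN_LATENCY_MS = 1500


def route(req: Request) -> Tier:
    """Pick a model tier for one request (hypothetical policy)."""
    # Simple lookups go to the small, fast model.
    if req.question_type == "lookup":
        return Tier.SMALL
    # Escalation candidates: explicitly hard questions, high-stakes surfaces,
    # or segments the institution chooses to serve with its most capable model.
    escalate = (
        req.question_type == "hard_case"
        or req.surface in HIGH_STAKES_SURFACES
        or req.segment in ESCALATION_SEGMENTS
    )
    # Escalate only when the latency budget leaves room for the slower tier.
    if escalate and req.latency_budget_ms >= FRONTIER_MIN_LATENCY_MS:
        return Tier.FRONTIER
    # Everything else, including reasoning-heavy explanations, goes to mid-tier.
    return Tier.MID


print(route(Request("lookup", "retail", "statement_explainer", 500)))          # Tier.SMALL
print(route(Request("explanation", "retail", "disclosure_walkthrough", 3000))) # Tier.FRONTIER
```

Keeping the policy in plain data and one function makes it easy to change. Any change can then be run against the same fixed evaluation suite used to measure the cost reduction.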


Strategy Three: Output Shape

The structure of a customer response is a design decision, not a model default. Templates and structured outputs are faster to generate, easier to validate against the underlying source, and use fewer tokens than free-form prose. Common approaches include:

  • JSON schemas
  • Response templates
  • Constrained generation

Observed unit cost reduction against free-form baselines: 20%-35%.

In a regulated context, structured output also makes the response easier to inspect, which is the entire point of governance.
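Structured output is what makes checking the response mechanical. The sketch below works through a hypothetical statement-explainer reply. It parses the reply as JSON, checks the required fields and their types, and confirms the stated fee against the system of record. The schema, field names, and the half-cent tolerance are assumptions made for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical response schema for a statement-explainer surface.
REQUIRED_FIELDS = {"summary": str, "fee_amount": float, "disclosure_id": str}


@dataclass(frozen=True)
class StatementReply:
    summary: str
    fee_amount: float
    disclosure_id: str


def _type_ok(value: object, expected: type) -> bool:
    # JSON numbers may arrive as int or float; booleans are not amounts.
    if expected is float:
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, expected)


def parse_reply(raw: str, source_fee: float) -> StatementReply:
    """Parse a model reply and validate it against the schema and the source record."""
    data = json.loads(raw)
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not _type_ok(data[field], expected):
            raise ValueError(f"{field} should be {expected.__name__}")
    reply = StatementReply(
        summary=data["summary"],
        fee_amount=float(data["fee_amount"]),
        disclosure_id=data["disclosure_id"],
    )
    # Grounding check: the stated fee must match the system of record.
    if abs(reply.fee_amount - source_fee) > 0.005:
        raise ValueError("fee_amount does not match the source record")
    return reply


raw = ('{"summary": "A monthly fee applied because the balance fell below the minimum.", '
       '"fee_amount": 12.5, "disclosure_id": "D-104"}')
reply = parse_reply(raw, source_fee=12.5)
print(reply.disclosure_id, reply.fee_amount)
```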

 

Strategy Four: Substitutable Architecture

The model market resets every 9 to 14 months

Substitutable experiences adopt each new frontier. Locked-in experiences do not.

[Chart: effective capability per dollar over time, from today through five model-release cycles. Frontier capability resets every cycle. A substitutable experience adopts each new frontier; a locked-in experience grows on its incumbent's timeline. The capability gap widens 8 to 12x over four cycles.]
Why this is an architectural property

An experience wired to a single model on day one cannot adopt the better model when it arrives. An experience built behind a substitutable layer adopts each release.

Over four cycles, the capability-per-dollar gap widens by roughly an order of magnitude. The team that loses access to the frontier ships an experience that is structurally weaker than what users now expect.

Figure 4. The model market resets every 9 to 14 months. The capability-per-dollar gap between substitutable and locked-in experiences widens 8 to 12x over four cycles.

Substitutability, the ability to replace system components with minimal cost or disruption, is the strategy that protects the cost line on horizons longer than any single planning cycle. It’s the strategy that decides whether a regulated institution can keep pace with the frontier or watch as it pulls away.

Model layer abstraction places every customer-facing surface behind a stable internal API, so models can be swapped out when better ones appear. Over four release cycles, the capability-per-dollar gap between an institution built for substitutability and one that is not runs 5 to 10 times.
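A stable internal API can be as simple as a structural interface that every surface codes against, plus a registry that maps logical roles to adapters. The sketch below shows that shape. `ModelClient`, the registry, and both adapters are hypothetical. A real adapter would wrap a vendor SDK, and the character slicing here only stands in for a token limit.

```python
from typing import Protocol


class ModelClient(Protocol):
    """Stable internal interface every customer-facing surface codes against."""

    def complete(self, prompt: str, max_tokens: int) -> str: ...


class IncumbentModel:
    # Hypothetical adapter; slicing characters stands in for a token limit.
    def complete(self, prompt: str, max_tokens: int) -> str:
        return f"[incumbent] {prompt[:max_tokens]}"


class CandidateModel:
    # Hypothetical adapter for a newer model release.
    def complete(self, prompt: str, max_tokens: int) -> str:
        return f"[candidate] {prompt[:max_tokens]}"


class ModelRegistry:
    """Maps logical roles to adapters; swapping a model is one registration."""

    def __init__(self) -> None:
        self._models: dict[str, ModelClient] = {}

    def register(self, role: str, client: ModelClient) -> None:
        self._models[role] = client

    def get(self, role: str) -> ModelClient:
        return self._models[role]


def explain_statement(registry: ModelRegistry, question: str) -> str:
    # The surface depends only on the role name, never on a vendor.
    return registry.get("explainer").complete(question, max_tokens=200)


registry = ModelRegistry()
registry.register("explainer", IncumbentModel())
before = explain_statement(registry, "Why was I charged a fee?")
registry.register("explainer", CandidateModel())  # the swap: no surface code changes
after = explain_statement(registry, "Why was I charged a fee?")
print(before)
print(after)
```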

Eval-driven swaps make the property operational. Every model candidate runs against the institution’s evaluation suite before it ships into a customer surface. Swap decisions become data-driven, with regression tests against accuracy, tone, compliance markers, and unit cost.
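The swap decision itself can be a small, auditable function over evaluation results. The sketch below approves a candidate only under three conditions. Accuracy and tone must not regress beyond a small tolerance. Compliance markers must not regress at all. Unit cost must not rise. The metric set mirrors the paragraph above, but the thresholds and the strict compliance rule are hypothetical design choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    accuracy: float    # fraction of suite cases answered correctly
    tone: float        # fraction passing tone review
    compliance: float  # fraction with all required compliance markers
    unit_cost: float   # average cost per interaction, in dollars


def should_swap(incumbent: EvalResult, candidate: EvalResult, tolerance: float = 0.01) -> bool:
    """Approve a swap only if quality holds and the candidate is no more expensive."""
    quality_ok = (
        candidate.accuracy >= incumbent.accuracy - tolerance
        and candidate.tone >= incumbent.tone - tolerance
        # Compliance gets no tolerance: any regression blocks the swap.
        and candidate.compliance >= incumbent.compliance
    )
    return quality_ok and candidate.unit_cost <= incumbent.unit_cost


incumbent = EvalResult(accuracy=0.92, tone=0.95, compliance=0.99, unit_cost=0.011)
cheaper_and_better = EvalResult(accuracy=0.94, tone=0.95, compliance=0.99, unit_cost=0.007)
compliance_regression = EvalResult(accuracy=0.95, tone=0.96, compliance=0.97, unit_cost=0.005)
print(should_swap(incumbent, cheaper_and_better))     # True
print(should_swap(incumbent, compliance_regression))  # False
```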

Prompt portability finishes the work. Prompts written to work across model families, rather than tuned to one model’s quirks, cut time-to-swap from quarters to weeks. The institution stops being hostage to its incumbent vendor’s release schedule.

What This Looks Like for the Customer

Same customer. Same intent. Different experience.

What changes step by step when token economics is enabled

Unmanaged experience

  1. Opens surface: cold start with a 3.4s wait (3.4s measured)
  2. Asks question: full context resent (8.2k tokens measured)
  3. Frontier model: most expensive tier ($0.082 measured)
  4. Free-form reply: verbose and hard to validate (1,240 tokens measured)
  5. Customer asks again: refines the question, and cost doubles (+$0.078 measured)

Managed experience

  1. Opens surface: cache hit, instant (0.2s measured)
  2. Asks question: context budgeted and reused (1.6k tokens measured)
  3. Right-sized model: routed to mid-tier ($0.011 measured)
  4. Structured reply: templated and validated (320 tokens measured)
  5. Follow-up cached: cost drops further (+$0.004 measured)

Figure 5. Same customer, same five steps, two experiences. The managed one runs at ~1/12th the cost and ~17x the speed.

Unmanaged: 3.4 second cold start, 8.2k tokens on the first question, $0.082 for the frontier model call, 1240 tokens of free-form reply, cost roughly doubles on the follow-up.

Managed: 0.2 second cache hit, 1.6k tokens because context is budgeted and reused, $0.011 routed to mid-tier, 320 tokens of structured reply, follow-up mostly cached.
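The ratios in Figure 5 can be checked against its measured values. The sketch below counts only the two priced line items, the first model call and the follow-up. On that basis the managed session is about 11x cheaper, which is close to the caption's ~1/12th. The caption may also include costs the figure does not itemize. The 17x speed figure matches the open-surface latency.

```python
# Measured values from Figure 5.
unmanaged = {"first_call_usd": 0.082, "follow_up_usd": 0.078, "open_s": 3.4, "context_tokens": 8_200}
managed = {"first_call_usd": 0.011, "follow_up_usd": 0.004, "open_s": 0.2, "context_tokens": 1_600}


def session_cost(exp: dict[str, float]) -> float:
    """Priced steps only: the first model call plus the follow-up."""
    return exp["first_call_usd"] + exp["follow_up_usd"]


cost_ratio = session_cost(unmanaged) / session_cost(managed)
speed_ratio = unmanaged["open_s"] / managed["open_s"]
token_ratio = unmanaged["context_tokens"] / managed["context_tokens"]
print(f"cost {cost_ratio:.1f}x, open latency {speed_ratio:.0f}x, context tokens {token_ratio:.1f}x")
# cost 10.7x, open latency 17x, context tokens 5.1x
```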

Same customer. Same intent. The customer notices the latency. The institution sees the rest.

What This Enables for the Future of Customer Communications

Token economics is not the goal. It is the precondition.

The institutions that scale token economics get to do things their peers cannot:

  • They add new AI surfaces without renegotiating the budget every cycle.
  • They keep pace with each new model release.
  • They ship customer experiences that respond in 200 milliseconds (instead of three seconds) with the same regulatory posture they had before.
  • They put AI in front of every customer they serve, not just the ones who happen to land on the most-funded surfaces.

Four Strategies. One Outcome.

Token economics is not a budget conversation. It’s the operating discipline that decides which AI experiences a regulated institution can put in front of its customers, and which ones it cannot.

  1. Context discipline
  2. Model routing
  3. Output shape
  4. Substitutable architecture

Each of the four strategies moves unit cost by a measurable amount. Together, they decide whether an AI experience scales token economics or stalls.

Across observed deployments, regulated institutions that have invested in all four sit at 8-15 times lower unit cost than those that have invested in none. The gap shows up in which customer experiences survive the next planning cycle, and in how many new ones the institution can stand up beside them.

That’s what enabling AI experiences actually looks like.


Shah Javed
Vice President, Product – CEM at Doxim
Shah joined Doxim in 2019 and is the head of strategy and product management for Doxim's Customer Engagement Management (CEM) platform.
Shah has held product, management consulting and executive roles in multiple companies.
Shah holds an MBA from the University of Edinburgh and a PGDip in Information Technology Governance from Edinburgh Napier University, and completed an executive program in strategic innovation at Saïd Business School, University of Oxford.
