Prompt Caching: Cut Agent Cost Without Breaking Quality

Why did the same agent get more expensive after a harmless prompt edit?

The tempting response is to shop for a cheaper model before checking whether the edit changed the context layout and broke the cache. A cheaper model can lower the bill, but it does not explain why the cost moved. A likelier cause is that the edit changed the stable prompt prefix, so turns that used to read from the cache now reprocess the full context.

Generated hand-drawn illustration of stable prompt context separated from volatile agent turn data.

Direct answer

Prompt caching reuses stable prompt prefixes so repeated agent turns can avoid reprocessing the same context. For agents, it is not just a pricing trick. It is a context-layout discipline: stable rules and tools stay early, volatile turn data stays late, and evals protect quality.

When this matters

Agents carry long system instructions, tool definitions, memory, policies, or source packs.
Cost per turn rises after prompt, tool, or memory changes.
You need to reduce cost without changing task success or safety results.

Failure modes this page should catch

Timestamps or scratch notes enter the stable prefix and break cache reuse.
Tool definitions reorder every turn.
A cache improvement hides a quality regression.
Teams optimize cost per turn without tracking eval pass rate.
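
The first two failure modes come down to how the context is assembled. The sketch below is one minimal way to keep the prefix byte-identical across turns, assuming a flat string prompt. The function names (`build_stable_prefix`, `build_turn_suffix`, `build_prompt`) and the tool dictionary shape are hypothetical, not taken from any provider SDK.

```python
import json
from datetime import datetime, timezone


def build_stable_prefix(system_rules: str, tools: list) -> str:
    """Serialize rules and tools deterministically (hypothetical helper)."""
    # Sort tools by name so a reordered tool list cannot change the prefix.
    ordered_tools = sorted(tools, key=lambda tool: tool["name"])
    # sort_keys and fixed separators keep the JSON text identical across runs.
    tools_blob = json.dumps(ordered_tools, sort_keys=True, separators=(",", ":"))
    return f"{system_rules}\n\nTOOLS:\n{tools_blob}"


def build_turn_suffix(user_text: str, now: datetime) -> str:
    """Volatile data (timestamp, current user turn) lives only in the suffix."""
    return f"[{now.isoformat()}] {user_text}"


def build_prompt(system_rules: str, tools: list, user_text: str, now: datetime) -> str:
    """Stable prefix first, volatile suffix last."""
    prefix = build_stable_prefix(system_rules, tools)
    return prefix + "\n\n" + build_turn_suffix(user_text, now)


if __name__ == "__main__":
    tools = [
        {"name": "run_tests", "description": "Run the test suite"},
        {"name": "read_file", "description": "Read a file from the repo"},
    ]
    prompt = build_prompt(
        "You are a coding agent.", tools, "Fix the failing test.",
        datetime.now(timezone.utc),
    )
    print(prompt)
```

The design choice is that nothing time-dependent or order-dependent can reach the prefix: the timestamp is an argument only to the suffix builder, and tool order is normalized before serialization.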

Prompt-cache decision table

Gate	Signal	Action
Stable prefix	system rules, policy, tool definitions	Keep deterministic
Volatile suffix	current user turn, live data, timestamps	Move late
Cache metrics	read tokens, write tokens, miss rate	Track per turn
Quality guardrail	eval pass rate	Do not count cost win if quality drops
Regression check	prompt diff and cache read drop	Bisect context builder
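
The cache-metrics and quality-guardrail rows can be sketched as a small check. This is an illustration under assumptions: the `TurnUsage` field names are hypothetical (providers report cached tokens under different names), a "miss" is taken to mean a turn with zero cache-read tokens, and the tolerance is a placeholder you would set yourself.

```python
from dataclasses import dataclass


@dataclass
class TurnUsage:
    """Per-turn token counts (hypothetical field names)."""
    cache_read_tokens: int
    cache_write_tokens: int
    uncached_input_tokens: int


def cache_read_ratio(turns: list) -> float:
    """Share of all input tokens that were served from the cache."""
    total_read = sum(t.cache_read_tokens for t in turns)
    total_input = sum(
        t.cache_read_tokens + t.cache_write_tokens + t.uncached_input_tokens
        for t in turns
    )
    return total_read / total_input if total_input else 0.0


def miss_rate(turns: list) -> float:
    """Share of turns that read nothing from the cache (assumed definition)."""
    if not turns:
        return 0.0
    misses = sum(1 for t in turns if t.cache_read_tokens == 0)
    return misses / len(turns)


def cost_win_counts(
    baseline_cost: float,
    candidate_cost: float,
    baseline_pass_rate: float,
    candidate_pass_rate: float,
    tolerance: float = 0.0,
) -> bool:
    """A cost reduction counts only if the eval pass rate did not drop."""
    cheaper = candidate_cost < baseline_cost
    quality_held = candidate_pass_rate >= baseline_pass_rate - tolerance
    return cheaper and quality_held
```

Keeping the guardrail in the same function as the cost comparison makes it hard to report a cost win without also supplying the eval pass rates.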

Running example

A coding agent’s cost jumps after a prompt cleanup. The monitor shows that cache-read tokens dropped. The prompt diff reveals a timestamp added near the top of the system block. Moving it into the turn message restores cache reads, and the eval suite results stay the same.
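
One way to locate this kind of regression is to compare the rendered prompts of two consecutive turns and find where they first differ. The helper below is a sketch with a hypothetical name (`first_divergence`) and made-up prompt strings. It works on characters rather than tokens, so it only approximates where a provider's cache would stop matching.

```python
from typing import Optional


def first_divergence(previous: str, current: str) -> Optional[int]:
    """Return the index of the first differing character, or None if identical."""
    shared = min(len(previous), len(current))
    for i in range(shared):
        if previous[i] != current[i]:
            return i
    # One prompt is a strict prefix of the other, or they are equal.
    return None if len(previous) == len(current) else shared


if __name__ == "__main__":
    # Illustrative prompts: the timestamp sits near the top of the system block.
    turn_1 = "RULES v1\nNow: 2024-05-01T10:00\nTOOLS: read_file, run_tests\nUser: fix bug"
    turn_2 = "RULES v1\nNow: 2024-05-01T10:05\nTOOLS: read_file, run_tests\nUser: add test"

    idx = first_divergence(turn_1, turn_2)
    if idx is not None:
        # Everything after idx cannot be reused as a shared prefix.
        print(f"prompts diverge at character {idx} of {len(turn_2)}")
        print("context around divergence:", repr(turn_2[max(0, idx - 20):idx + 10]))
```

If the divergence index lands inside the system block rather than in the turn message, the context builder is putting volatile data into the prefix.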

Copy the working template

Use the prompt-cache decision table above as the v1 artifact for this page. Adapt its rows to your own agent names, tools, risk classes, and thresholds, then link the result back into your monitoring, tracing, security, and evaluation gates.

Frequently Asked Questions

What is prompt caching?

Prompt caching lets repeated requests reuse stable prompt content instead of reprocessing the same tokens every turn. It is most useful when long instructions, tools, or context stay consistent.

How does prompt caching help agents?

Agents often repeat large policies, tool definitions, and project context. Caching those stable sections can reduce cost and latency, but only if volatile data does not break the prefix.

What is the quality risk?

A cheaper cached prompt can still be worse if context layout hides fresh evidence or changes tool behavior. Treat eval pass rate as the guardrail metric.

The Takeaway

Prompt caching belongs in the control layer because cost reduction only counts when correctness stays intact.