Engineering

A collection of 1 post

Agent memory placement can cut your token bill up to 2x

Agent memory has to refresh every turn, but putting it in the system prompt invalidates the cached prompt prefix, so the whole conversation is re-billed each turn. Here is the one-message fix and the token savings it produced in an experiment.
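
To make the caching argument concrete, here is a minimal sketch that compares how much of the previous request can be reused under two placements of the memory text. It assumes the "one-message fix" means carrying memory in a single trailing message instead of the system prompt, which is one plausible reading of the summary above and not a confirmed detail of the post. The function names, the `[memory]` tag, and the sample messages are hypothetical, and comparison is done per message for simplicity, whereas real prompt caches typically match on token prefixes.

```python
# Toy model: a request is a list of (role, text) messages, and a cache can
# reuse only the leading messages that are identical to the previous request.

SYSTEM = "You are a helpful agent."  # hypothetical static instructions


def shared_prefix_len(prev, curr):
    """Count the leading messages that are identical in both requests."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


def memory_in_system(history, memory):
    # Memory lives in the first message, so any memory update changes
    # the very start of the prompt.
    return [("system", f"{SYSTEM}\nMemory: {memory}")] + history


def memory_as_last_message(history, memory):
    # Memory lives in one trailing message; the system prompt and the
    # earlier history stay byte-identical from turn to turn.
    return [("system", SYSTEM)] + history + [("user", f"[memory] {memory}")]


turn1 = [("user", "hi"), ("assistant", "hello")]
turn2 = turn1 + [("user", "book a flight"), ("assistant", "done")]

# Memory is refreshed between turns ("m1" -> "m2").
reuse_system = shared_prefix_len(
    memory_in_system(turn1, "m1"), memory_in_system(turn2, "m2")
)
reuse_trailing = shared_prefix_len(
    memory_as_last_message(turn1, "m1"), memory_as_last_message(turn2, "m2")
)

print("reusable messages, memory in system prompt:", reuse_system)
print("reusable messages, memory as last message:", reuse_trailing)
```

In this toy model, the system-prompt placement leaves no reusable prefix after a memory update, while the trailing-message placement keeps the system prompt and all earlier history reusable. Actual savings depend on the provider's cache rules and pricing.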