Context Templates: Context Engineering Made Simple

You're tuning an agent. Retrieve too little context and it hallucinates. Too much and you're burning tokens on irrelevant facts while adding latency. Every agent builder faces this decision.

Zep maintains a knowledge graph of user conversations and business data. When your agent needs context, it queries this graph, and the same tradeoff among context size, accuracy, and latency applies. Getting it right is critical for building an effective agent for your domain.

Zep's context templates make it easier than ever to engineer your agent's context and find your sweet spot, without complex retrieval code.

The Retrieval Tradeoff

Our retrieval tradeoff research shows that accuracy follows a diminishing-returns curve as context grows. Moving from minimal to medium context significantly improves accuracy, but beyond a certain point, adding more context yields only marginal gains while increasing token costs and latency.
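
One way to locate that point from your own measurements is sketched below. This is a minimal illustration and not Zep's methodology: `pick_limit` and `min_gain` are hypothetical names, and the (limit, accuracy) pairs are assumed to come from your own evaluation runs.

```python
from typing import Sequence, Tuple


def pick_limit(measurements: Sequence[Tuple[int, float]], min_gain: float = 0.01) -> int:
    """Return the smallest retrieval limit after which the next step up
    improves accuracy by less than `min_gain`.

    `measurements` holds (limit, accuracy) pairs from your own evals.
    """
    if not measurements:
        raise ValueError("need at least one (limit, accuracy) measurement")
    ordered = sorted(measurements)  # sort by limit, ascending
    # Walk adjacent pairs and stop at the first step whose gain is too small.
    for (limit, accuracy), (_, next_accuracy) in zip(ordered, ordered[1:]):
        if next_accuracy - accuracy < min_gain:
            return limit
    # Accuracy was still improving at the largest limit measured.
    return ordered[-1][0]
```

If the function returns the largest limit you measured, the curve has not flattened yet, and it may be worth measuring a higher limit before settling.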

Modern LLMs are good at filtering irrelevant context in their input; they're much worse at inferring facts that aren't there. This is why Zep optimizes for recall over precision, while giving you the tools to tune retrieval limits for your specific domain.

Control Retrieval Without Writing Retrieval Code

Without templates, you'd write retrieval logic in code: sequential API calls and orchestration logic that are hard to tune and even harder to reproduce in evals. Context templates solve this with a simple abstraction. You declare what you want:

template = """
# USER SUMMARY
%{user_summary}

# REQUIREMENTS AND PREFERENCES
%{edges limit=4 types=[HAS_REQUIREMENT,PREFERS_NEIGHBORHOOD]}

# KEY ENTITIES
%{entities limit=3}

# EPISODES
%{episodes limit=2}
"""

In Zep's graph: entities are people, places, and things; edges are relationships between them; episodes are the raw data you sent to Zep—messages, JSON, or unstructured text. This template specifies which of these to retrieve and how many.

You focus on what to retrieve and how much. Zep handles the rest.
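
If you plan to experiment with different limits, it can help to generate the template string rather than edit it by hand. The sketch below is plain Python string formatting that uses only the placeholder syntax shown above; `build_template` is a hypothetical helper, not part of the Zep SDK.

```python
from typing import Optional, Sequence


def build_template(
    edge_limit: int,
    entity_limit: int,
    episode_limit: int,
    edge_types: Optional[Sequence[str]] = None,
) -> str:
    """Render a context template string with the given retrieval limits."""
    edge_args = f"limit={edge_limit}"
    if edge_types:
        # Types are comma-separated inside square brackets, as in the example above.
        edge_args += f" types=[{','.join(edge_types)}]"
    lines = [
        "# USER SUMMARY",
        "%{user_summary}",
        "",
        "# REQUIREMENTS AND PREFERENCES",
        f"%{{edges {edge_args}}}",  # doubled braces emit literal { and }
        "",
        "# KEY ENTITIES",
        f"%{{entities limit={entity_limit}}}",
        "",
        "# EPISODES",
        f"%{{episodes limit={episode_limit}}}",
    ]
    return "\n".join(lines)


template = build_template(4, 3, 2, ["HAS_REQUIREMENT", "PREFERS_NEIGHBORHOOD"])
```

Called with these arguments, the helper reproduces the same placeholders as the hand-written template, so changing a limit becomes a one-argument change.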

Saving and Using Templates

Save the template once, then reference it by ID in agent calls. This makes your context configuration version-controlled and reproducible:

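import os

from zep_cloud.client import Zep

# Read the API key from the environment rather than hard-coding it.
zep_client = Zep(api_key=os.getenv("ZEP_API_KEY"))

# Register the template once under a stable ID.
zep_client.context.create_context_template(
    template_id="requirements-and-preferences-1",
    template=template
)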

In your agent implementation, reference the template when retrieving context:

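# Assumes self.zep_client is an async client (AsyncZep); with the sync Zep client, drop `await`.
# Await the call first, then read .context from the response.
response = await self.zep_client.thread.get_user_context(
    thread_id=thread_id,
    template_id="requirements-and-preferences-1"
)
context_block = response.context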

Instead of Zep's default context format, you receive a context block structured exactly according to your template.
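
What you do with the block is up to your agent. One common pattern, sketched here on the assumption that your agent takes a single system prompt string, is to append the block to your base instructions. `build_system_prompt` and the `<CONTEXT>` tag are illustrative choices, not Zep conventions.

```python
from typing import Optional


def build_system_prompt(instructions: str, context_block: Optional[str]) -> str:
    """Append retrieved context to the agent's base instructions."""
    context = (context_block or "").strip()
    if not context:
        # Nothing was retrieved: return the instructions unchanged rather than
        # an empty context section the model might try to interpret.
        return instructions
    return f"{instructions}\n\n<CONTEXT>\n{context}\n</CONTEXT>"
```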

Finding Your Sweet Spot

Context templates help you navigate the retrieval tradeoff in practice:

Tuning for your workload. Our research shows the optimal context configuration varies by task complexity. Multi-hop reasoning and questions spanning long conversation history benefit from higher retrieval limits. Simple, direct lookups can use lower limits to reduce latency and cost. Templates let you dial in the right balance.

Domain-specific filtering. Different domains have different context needs. A real estate agent needs property requirements and neighborhood preferences. A healthcare assistant needs medical history and care preferences. Zep's custom ontology feature lets you adapt Zep's graph to your domain, and templates let you filter to those domain-specific types.

Reproducible evaluations. When you're measuring agent performance, you need consistent context retrieval. Templates ensure you're comparing apples to apples, making it easier to isolate whether changes in accuracy come from your model, your prompts, or your context configuration. Pair templates with Zep's evaluation harness to measure what's working.
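
As a rough sketch of how this could look in practice, the snippet below generates a small grid of template variants, each under a deterministic ID, so that every evaluation run can name exactly which configuration it used. The function name, ID scheme, and section layout are assumptions made for illustration. Registering each variant would use the `create_context_template` call shown earlier.

```python
from itertools import product
from typing import Dict, Iterable


def template_variants(
    edge_limits: Iterable[int],
    entity_limits: Iterable[int],
    episode_limits: Iterable[int],
    prefix: str = "eval",
) -> Dict[str, str]:
    """Map a deterministic template ID to a template string for every
    combination of the given limits."""
    variants: Dict[str, str] = {}
    for edges, entities, episodes in product(edge_limits, entity_limits, episode_limits):
        # The ID encodes the limits, so results can be traced back to a config.
        template_id = f"{prefix}-edges{edges}-entities{entities}-episodes{episodes}"
        variants[template_id] = "\n".join([
            "# FACTS",
            f"%{{edges limit={edges}}}",
            "",
            "# KEY ENTITIES",
            f"%{{entities limit={entities}}}",
            "",
            "# EPISODES",
            f"%{{episodes limit={episodes}}}",
        ])
    return variants


# Four variants: two edge limits x one entity limit x two episode limits.
variants = template_variants([2, 4], [3], [1, 2])
```

Because the IDs are derived from the limits, rerunning the sweep later produces the same IDs, which keeps results comparable across runs.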
