Introducing Zep: Long-term Memory Storage and Enrichment for AI Apps

Chat history storage is an infrastructure challenge that developers and enterprises face as they move LLM and AI chat applications from prototype to production and aim to provide rich, intimate experiences to users.

Zep allows developers to focus on developing their AI apps, rather than building memory persistence, search, and enrichment infrastructure.

Rapidly prototyping LLM-based chat and agent applications has never been easier. Langchain, LlamaIndex, and other frameworks are fantastic building blocks for doing so.

However, taking software to production is often hard, and LLM-based apps are no different. I recently developed a mortgage advisor agent and was struck by how much I would need to build myself to persist and manage chat and agent histories over the long term.

Standing up and managing low-latency, asynchronous infrastructure to store, manage, and enrich memories is non-trivial. Both Sharath and I have built large enterprise systems, and are excited to announce Zep, a platform for long-term memory storage and enrichment.

Want to get started using Zep?

Follow the Zep Quick Start Guide.

Why persist memories over the long term?

Long-term memory persistence enables a variety of capabilities for LLM apps, including:

  • Personalized re-engagement of users based on their chat history. If an agent can't recall context from previous sessions with a user, the experience will be poor.
  • Prompt evaluation based on historical data. It's valuable to understand how conversations evolve longitudinally, where an agent gets stuck, and how prompts could potentially be improved.
  • Analysis of historical data to understand user behavior and preferences. Insights into how users interact with an agent are a powerful driver for feature roadmap planning.


Yet persisting memories with today's tooling is difficult:

  • Most LLM chat history or memory implementations run in-memory, and are not designed for stateless deployments or long-term persistence.
  • Summarization, entity extraction, and metadata enrichment capabilities are still very primitive and are executed synchronously by agents, which adds latency that users experience directly.
  • When storing messages long-term, developers are exposed to privacy and regulatory obligations around retention and deletion of user data (such as CCPA and GDPR), and need to develop their own solutions to ensure compliance.
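To make the latency and deletion points concrete, here is a minimal sketch of the general pattern: persist a message right away, run enrichment in the background, and support deleting a session's data on request. This is not Zep's implementation or API. `MemoryStore`, `add_message`, `drain`, and `delete_session` are hypothetical names, the store is an in-process list, and a short sleep stands in for a slow LLM or embedding call.

```python
import asyncio


class MemoryStore:
    """Hypothetical in-process store; a real deployment would use a database."""

    def __init__(self):
        self.messages = []
        self._tasks = set()

    async def add_message(self, session_id, role, content):
        record = {
            "session_id": session_id,
            "role": role,
            "content": content,
            "metadata": {},
        }
        # Persist first so the caller is not blocked by enrichment.
        self.messages.append(record)
        # Schedule enrichment in the background and keep a reference to the task.
        task = asyncio.create_task(self._enrich(record))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return record

    async def _enrich(self, record):
        # Stand-in for a slow call (summarization, entity extraction, embedding).
        await asyncio.sleep(0.01)
        record["metadata"]["word_count"] = len(record["content"].split())

    async def drain(self):
        # Wait for all outstanding enrichment tasks to finish.
        await asyncio.gather(*list(self._tasks))

    def delete_session(self, session_id):
        # Remove every message for a session, e.g. for a deletion request.
        before = len(self.messages)
        self.messages = [m for m in self.messages if m["session_id"] != session_id]
        return before - len(self.messages)


async def demo():
    store = MemoryStore()
    record = await store.add_message("s1", "human", "What rate can I get?")
    enriched_immediately = "word_count" in record["metadata"]
    await store.drain()
    word_count = record["metadata"]["word_count"]
    deleted = store.delete_session("s1")
    return enriched_immediately, word_count, deleted


result = asyncio.run(demo())
```

In this sketch the caller gets the stored record back before enrichment has run, so the user-facing path never waits on the slow step.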

The Zep platform

We've only just started on the Zep journey, but today Zep already includes:

  • Long-term memory persistence: Access historical messages independently of the chosen summarization strategy.
  • Auto-summarization: Messages are summarized based on a configurable message window, and the resulting series of summaries is stored, allowing for flexibility in future summarization strategies.
  • Vector search: Search over memories with messages automatically embedded on creation.
  • Auto-token counting: Finer-grained control over prompt assembly using memory and summary token counts.
  • Python and JavaScript SDKs: Easily integrate Zep into your development environment.
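As a rough illustration of how a message window, stored summaries, and token counts can work together during prompt assembly, consider the sketch below. It rests on stated assumptions rather than Zep's actual behavior: `Session`, `summarize`, `count_tokens`, and `build_prompt` are hypothetical names, tokens are estimated by whitespace splitting, the "summary" is a simple truncation instead of an LLM call, and the policy of summarizing the oldest half of the window is chosen only for the example.

```python
from dataclasses import dataclass, field


def count_tokens(text):
    """Crude whitespace estimate; real tokenizers give different counts."""
    return len(text.split())


def summarize(chunk):
    """Placeholder summarizer; a real system would call an LLM here."""
    text = " / ".join(m["content"][:20] for m in chunk)
    return {"content": text, "tokens": count_tokens(text)}


@dataclass
class Session:
    window: int = 4
    messages: list = field(default_factory=list)   # full history is always kept
    summaries: list = field(default_factory=list)  # series of stored summaries
    summarized_upto: int = 0                       # index of first unsummarized message

    def add(self, role, content):
        self.messages.append(
            {"role": role, "content": content, "tokens": count_tokens(content)}
        )
        unsummarized = len(self.messages) - self.summarized_upto
        if unsummarized > self.window:
            # Assumed policy: summarize the oldest half of the window.
            cut = self.summarized_upto + self.window // 2
            self.summaries.append(summarize(self.messages[self.summarized_upto:cut]))
            self.summarized_upto = cut


def build_prompt(session, budget):
    """Fit the latest summary plus the most recent messages into a token budget."""
    parts, used = [], 0
    if session.summaries and session.summaries[-1]["tokens"] <= budget:
        parts.append(session.summaries[-1]["content"])
        used += session.summaries[-1]["tokens"]
    recent = []
    for m in reversed(session.messages[session.summarized_upto:]):
        if used + m["tokens"] > budget:
            break
        recent.append(m["content"])
        used += m["tokens"]
    return parts + recent[::-1], used


session = Session(window=4)
for i in range(6):
    session.add("human", f"message number {i}")

prompt, used = build_prompt(session, budget=13)
```

The full message list is never truncated here, which mirrors the idea of accessing historical messages independently of the summarization strategy.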

Zep ❤️ Langchain

Today's release of Langchain includes support for Zep:

  • A ZepChatHistory class, which you can use with Langchain memory classes to natively persist and retrieve chain and agent memory.
  • A ZepRetriever class, which enables contextual vector search over a user's long-term memory, allowing you to provide agents with historical context.

Read more here.
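For readers new to the idea of vector search over memory, the toy sketch below shows the general technique: embed each message when it is stored, then rank stored messages by similarity to a query. It does not use the Zep or Langchain classes mentioned above. `ToyMemoryIndex`, `embed`, and `cosine` are hypothetical names, and the bag-of-words "embedding" is an assumption made to keep the example self-contained; a real system would use a learned embedding model.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyMemoryIndex:
    def __init__(self):
        self.entries = []

    def add(self, content):
        # Embed on creation so search never has to re-embed stored messages.
        self.entries.append((content, embed(content)))

    def search(self, query, k=2):
        q = embed(query)
        scored = sorted(
            ((cosine(q, vec), content) for content, vec in self.entries),
            reverse=True,
        )
        # Return up to k messages that share at least one term with the query.
        return [content for score, content in scored[:k] if score > 0]


index = ToyMemoryIndex()
index.add("I want a fixed rate mortgage")
index.add("My budget is 400k")
index.add("I prefer a 30 year term")

hits = index.search("which mortgage rate", k=1)
```

An agent could pass the returned messages into its prompt as historical context for the current turn.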

Getting Zep

Zep is open source and available on GitHub:

Quick start documentation:

We hope that Zep allows you to focus on developing awesome AI chat and agent apps, rather than dealing with the complexities of memory persistence, search, and enrichment infrastructure.

Daniel & Sharath
