Building Voice Agents with Memory: Zep x LiveKit

Create personalized voice agents with long-term memory and minimal added latency

The Zep/LiveKit integration combines Zep's context engineering platform with LiveKit's voice agent framework, letting you build reliable, personalized voice agents that remember user preferences and conversation history across sessions with low latency.

⚡ Concerned memory will slow your agent down? Zep delivers P95 retrieval latency under 250ms, ensuring your voice agents maintain real-time responsiveness while accessing rich contextual memory.

A complete code example is available on GitHub to get you started quickly.

The integration provides a pre-built agent class with Zep already wired in, so you can make your agents more personalized immediately.


How It Works

The integration works by providing a predefined voice agent - ZepUserAgent - that has Zep already implemented for you as its long-term memory store.

The ZepUserAgent is a normal voice agent, but it:

  • Stores all messages in Zep's knowledge graph
  • Retrieves relevant context in the form of Zep's Context Block before each response
  • Updates the system prompt with the Context Block on each turn to improve the quality of the response

Under the hood, ZepUserAgent uses Zep's thread.add_messages() and thread.get_user_context() methods to add messages to the user's graph and to retrieve Zep's Context Block, respectively:

# The agent uses this Zep method to add messages to the user's graph (and the given thread)
response = zep_client.thread.add_messages(thread_id, messages=messages)

# The agent uses this Zep method to get the Zep Context Block for the user
memory = zep_client.thread.get_user_context(thread_id=thread_id, mode="basic")
context_block = memory.context

The thread.get_user_context() method uses the last few messages of the provided thread as a search query over the user's knowledge graph; the results of that search make up the Context Block.

The Context Block (when mode="basic") looks like this:

FACTS, ENTITIES, and EPISODES represent relevant context to the current conversation.
# These are the most relevant facts and their valid date ranges
<FACTS>
    - John's favorite song is "Viva La Vida" by Coldplay (valid: 2024-01-15 to present)
    - User prefers morning meetings over afternoon ones (valid: 2024-02-01 to present)
</FACTS>
# These are the most relevant entities  
<ENTITIES>
    - John: Software engineer, works at tech startup, enjoys indie rock music
    - Spotify: Music streaming service user frequently mentions
</ENTITIES>
# These are the most relevant episodes
<EPISODES>
    - "My favorite song is Viva la Vida by Coldplay. And Coldplay is my favorite band"
    - "Can you make me a playlist of indie rock songs?"
</EPISODES>

This block is dynamically inserted into the system prompt before the agent responds so the agent response can take this context into account.
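To illustrate, here is a minimal sketch of that per-turn injection. The function name and prompt layout are hypothetical; ZepUserAgent's actual internals may differ:

```python
BASE_INSTRUCTIONS = "You are a helpful voice assistant."

def build_system_prompt(base_instructions: str, context_block: str) -> str:
    """Combine the static instructions with this turn's Context Block."""
    # Hypothetical layout: the real ZepUserAgent prompt template may differ.
    return f"{base_instructions}\n\nRelevant user context:\n{context_block}"

# On each turn, the agent fetches a fresh Context Block and rebuilds the prompt:
context_block = "<FACTS>\n    - John's favorite band is Coldplay\n</FACTS>"
prompt = build_system_prompt(BASE_INSTRUCTIONS, context_block)
```

Because the prompt is rebuilt every turn, the agent always responds against the most recently retrieved context rather than a stale snapshot.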

Using the Integration

Implementation requires three steps:

Step 1: Creating Zep User and Thread

Set up the user and thread in Zep:

import os
import uuid
from zep_cloud.client import AsyncZep

# Initialize Zep client
zep_client = AsyncZep(api_key=os.getenv("ZEP_API_KEY"))

# User constants
USER_ID = "John-1234"
THREAD_ID = f"conversation-{uuid.uuid4().hex[:8]}"
USER_FIRST_NAME = "John"
USER_LAST_NAME = "Doe"
USER_EMAIL = "[email protected]"

# Create or get user
try:
    await zep_client.user.get(user_id=USER_ID)
except Exception:
    await zep_client.user.add(
        user_id=USER_ID,
        first_name=USER_FIRST_NAME,
        last_name=USER_LAST_NAME,
        email=USER_EMAIL,
    )

# Create new thread for this session
await zep_client.thread.create(
    thread_id=THREAD_ID,
    user_id=USER_ID
)

Step 2: Setting up LiveKit Session/Room

Create the LiveKit session:

from livekit import agents
from livekit.plugins import openai, silero

async def entrypoint(ctx: agents.JobContext):
    # Connect to LiveKit room
    await ctx.connect()
    
    # Create agent session with OpenAI components
    session = agents.AgentSession(
        stt=openai.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=openai.TTS(voice="alloy"),
        vad=silero.VAD.load(),
    )

Step 3: Defining Voice Agent with Memory

Initialize the agent:

from zep_livekit import ZepUserAgent

# Create the memory-enabled agent
agent = ZepUserAgent(
    zep_client=zep_client,
    user_id=USER_ID,
    thread_id=THREAD_ID,
    user_message_name=USER_FIRST_NAME,
    assistant_message_name="Assistant",
    instructions="You are a helpful assistant who responds concisely in at most 1 sentence for each response."
)

# Start the session
await session.start(agent=agent, room=ctx.room)

The ZepUserAgent handles memory operations automatically.

Environment Setup

Make sure to set these API keys:

OPENAI_API_KEY=your_openai_api_key_here
ZEP_API_KEY=your_zep_cloud_api_key_here
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_livekit_api_key_here
LIVEKIT_API_SECRET=your_livekit_api_secret_here
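A quick sanity check at startup can catch missing keys before the agent connects. This is a small sketch; the helper name is ours, not part of the integration:

```python
import os

REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "ZEP_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
]

def missing_keys(env: dict) -> list:
    """Return the names of required variables that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]

# Example with only one key set: the other four are reported as missing.
missing = missing_keys({"OPENAI_API_KEY": "sk-..."})
```

In a real script you would call `missing_keys(dict(os.environ))` and raise before starting the worker if anything is absent.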

Running the Voice Agent

Once configured, run:

python zep_voice_agent.py dev

Then test the agent on the LiveKit Agents Playground.
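The `dev` subcommand is provided by LiveKit's agent CLI, so the script needs a main guard that registers the `entrypoint` function from Step 2 with the worker runner:

```python
from livekit import agents

# Registers `entrypoint` (defined in Step 2) with LiveKit's worker CLI,
# which supplies the `dev` and `start` subcommands.
if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```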

Key Features

The integration provides memory management with minimal configuration:

  • Automatic conversation storage
  • Context-aware response generation
  • Cross-session conversation continuity
  • User identity and preference tracking

Use Cases

The integration enables voice applications that require persistent context:

  • Personal Assistants: Maintaining user preferences and interaction history
  • Customer Service: Access to complete conversation history across sessions
  • Educational Tools: Tracking learning progress and adapting to individual needs
  • Healthcare Applications: Preserving patient interaction history and preferences

Extra: Add External User Data

Add additional user data in the form of JSON or unstructured text to a user's knowledge graph using graph.add():

import json
import os
from zep_cloud.client import Zep

client = Zep(api_key=os.getenv("ZEP_API_KEY"))

# Add user profile data
profile_data = {
    "favorite_music_genre": "indie rock",
    "preferred_meeting_times": ["9:00 AM", "2:00 PM"],
    "dietary_restrictions": ["vegetarian"],
    "location": "San Francisco",
    "job_title": "Software Engineer"
}

json_string = json.dumps(profile_data)

client.graph.add(
    user_id=USER_ID,
    type="json",
    data=json_string,
)

This data is stored in the knowledge graph and retrieved as context when relevant, alongside the data sent to the graph from user-agent conversations. If the JSON and conversation messages ever refer to the same entity, those entities are automatically de-duplicated in the graph.
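graph.add() also accepts unstructured text. A short sketch, reusing the `client` and `USER_ID` from the snippet above (the note itself is purely illustrative):

```python
# Add a free-text observation to the same user's knowledge graph.
client.graph.add(
    user_id=USER_ID,
    type="text",
    data="John mentioned he is training for a half-marathon in October.",
)
```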

Extra: More Customizable Memory

Use ZepGraphAgent for more control over memory retrieval and search parameters:

from zep_livekit import ZepGraphAgent
from zep_cloud import SearchFilters

# Configure search filters for specific entity types
search_filters = SearchFilters(
    node_labels=["Restaurant", "Location"],
    edge_labels=["VISITED", "TRAVELED_TO"],
)

# Create graph agent with full customization
agent = ZepGraphAgent(
    zep_client=zep_client,
    graph_id="custom-graph-id",
    user_name=USER_FIRST_NAME,
    facts_limit=15,
    entity_limit=10,
    episode_limit=8,
    search_filters=search_filters,
    reranker="rrf",
    instructions="Your custom instructions here..."
)

ZepGraphAgent allows customization of context limits and filtering by custom entity and edge types. Behind the scenes these parameters are passed to Zep's graph.search method; see the Zep docs for details.
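If you need still lower-level control, you can issue the same kind of search directly. A hedged sketch, assuming the sync `client` from the earlier Extra section and the `search_filters` defined above:

```python
# Search the user's graph with the same style of parameters
# that ZepGraphAgent passes under the hood.
results = client.graph.search(
    user_id=USER_ID,
    query="restaurants the user has visited",
    scope="edges",          # search relationships (facts)
    reranker="rrf",
    limit=15,
    search_filters=search_filters,
)
```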

Getting Started

The integration requires API keys for OpenAI, Zep Cloud, and LiveKit. Once configured, the agent handles memory management automatically while you focus on building voice interaction logic.

For complete documentation, setup instructions, and advanced configuration options, see the complete working example on GitHub and the Zep LiveKit Integration documentation.