Introducing Zep Hybrid Search and Custom Metadata
Zep now supports both vector search over message text and filtering on message metadata, including system metadata such as Named Entities and creation dates.
Zep automatically embeds chat histories and makes them available for semantic vector search via Zep's client libraries or a Langchain Retriever. These searches can now be combined with filters on message metadata, covering both custom metadata you attach and system metadata such as Named Entities and creation dates.
With custom metadata and hybrid search, developers can:
- associate business context with messages. For example, attaching a support case ID lets you retrieve the conversations in long-term memory associated with that case when needed.
- search for specific classes of Named Entities, such as people, places, numbers, dates and more, associated with a topic. For example, people's names in a conversation about family, or property valuations in a conversation about selling a home.
Custom Message Metadata
Alongside hybrid search, developers can now associate metadata with messages. The JSON structures persisted alongside messages may be arbitrarily deep and may contain any JSON type.
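# Assumes zep_client is a connected ZepClient and session_id is defined.
from zep_python import Memory, Message

zep_client.add_memory(
    session_id=session_id,
    memory_messages=Memory(
        messages=[
            Message(
                role="human",
                content="I've read many books written by Octavia Butler.",
                metadata={"foo": "bar"},
            )
        ]
    ),
)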
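Because metadata is persisted as JSON, some common Python values (datetime objects, for instance) can't be stored as-is. The sketch below is a hypothetical helper, not part of Zep's SDK, that normalizes a metadata dict by round-tripping it through the json module, converting dates and datetimes to ISO 8601 strings and failing early on anything else that isn't JSON-serializable. The support-case fields are illustrative only.

```python
import json
from datetime import date, datetime, timezone
from typing import Any


def _to_json_value(value: Any) -> Any:
    # json.dumps calls this only for objects it can't serialize natively.
    # datetime is a subclass of date, so both are handled here.
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    raise TypeError(f"metadata value of type {type(value).__name__} is not JSON-serializable")


def normalize_metadata(metadata: dict) -> dict:
    """Return a JSON-safe copy of metadata (hypothetical helper)."""
    return json.loads(json.dumps(metadata, default=_to_json_value))


# Illustrative nested metadata for a support conversation.
metadata = normalize_metadata({
    "support_case": {
        "id": "CASE-1234",
        "priority": 2,
        "opened_at": datetime(2023, 6, 3, 22, 0, tzinfo=timezone.utc),
        "tags": ["billing", "refund"],
    }
})
print(metadata["support_case"]["opened_at"])  # 2023-06-03T22:00:00+00:00
```

The normalized dict can then be passed as the metadata argument of Message, as in the example above.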
Zep supports jsonpath queries over metadata using the Postgres jsonb_path_exists function, offering a powerful query language for retrieving messages.
from zep_python import MemorySearchPayload

zep_client.search_memory(
    session_id=session_id,
    search_payload=MemorySearchPayload(
        query="I enjoy reading science fiction.",
        metadata={
            "where": {"jsonpath": '$[*] ? (@.foo == "bar")'},
        },
    ),
)
{
  "dist": 0.7170433826192629,
  "message": {
    "content": "I've read many books written by Octavia Butler.",
    "created_at": "2023-06-03T22:00:43.034056Z",
    "metadata": {
      "foo": "bar",
      "system": {
        "entities": [
          {
            "Label": "PERSON",
            "Matches": [
              {
                "End": 46,
                "Start": 32,
                "Text": "Octavia Butler"
              }
            ],
            "Name": "Octavia Butler"
          }
        ]
      }
    },
    "role": "human",
    "token_count": 13,
    "uuid": "8f3a06dd-0625-41da-a2af-b549f2056b3f"
  },
  "metadata": null,
  "summary": null
}
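System metadata sits under metadata.system.entities, and each match carries Start and End character offsets into the message content. In the result above, content[32:46] is exactly "Octavia Butler", so the offsets appear to be end-exclusive. A minimal sketch, assuming the result has been converted to a plain dict shaped like the JSON above (iter_entities is a hypothetical name), that walks the entities and slices the matched text back out of the content:

```python
from typing import Iterator, Tuple


def iter_entities(result: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (label, name, matched_text) for each entity match in a search result dict."""
    message = result["message"]
    content = message["content"]
    # metadata may be null, and not every message has system entities.
    system = (message.get("metadata") or {}).get("system", {})
    for entity in system.get("entities", []):
        for match in entity.get("Matches", []):
            # Assumes end-exclusive offsets, as the example above suggests.
            span = content[match["Start"]:match["End"]]
            yield entity["Label"], entity["Name"], span


result = {
    "dist": 0.7170433826192629,
    "message": {
        "content": "I've read many books written by Octavia Butler.",
        "metadata": {
            "foo": "bar",
            "system": {"entities": [{
                "Label": "PERSON",
                "Matches": [{"End": 46, "Start": 32, "Text": "Octavia Butler"}],
                "Name": "Octavia Butler",
            }]},
        },
        "role": "human",
    },
}

for label, name, span in iter_entities(result):
    print(label, name, span)  # PERSON Octavia Butler Octavia Butler
```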
Zep's Langchain Retriever can also execute hybrid searches, though you'll need a custom Chain to take advantage of this capability.
retriever.get_relevant_documents(
    "Famous sci-fi authors",
    metadata={"where": {"jsonpath": '$[*] ? (@.foo == "bar")'}},
)
[
  {
    "page_content": "I've read many books written by Octavia Butler.",
    "metadata": {
      "score": 0.8346713396773939,
      "uuid": "8f3a06dd-0625-41da-a2af-b549f2056b3f",
      "created_at": "2023-06-03T22:00:43.034056Z",
      "role": "human",
      "metadata": {
        "foo": "bar",
        "system": {
          "entities": [
            {
              "Label": "PERSON",
              "Matches": [
                {
                  "End": 46,
                  "Start": 32,
                  "Text": "Octavia Butler"
                }
              ],
              "Name": "Octavia Butler"
            }
          ]
        }
      },
      "token_count": 13
    }
  }
]
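One step a custom Chain might perform is turning the retrieved documents into prompt context. A minimal sketch, assuming documents shaped like the output above and assuming a higher score means a closer match; build_context and min_score are hypothetical names. Doc is a stand-in dataclass with the same attribute names as LangChain's Document (page_content, metadata), so the function should also accept real Documents.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Stand-in with the same attribute names as LangChain's Document.
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_context(docs, min_score: float = 0.0) -> str:
    """Format retrieved messages as prompt lines, best match first, dropping low scores."""
    kept = [d for d in docs if d.metadata.get("score", 0.0) >= min_score]
    kept.sort(key=lambda d: d.metadata.get("score", 0.0), reverse=True)
    lines = [
        f"[{d.metadata.get('created_at', '?')}] {d.metadata.get('role', '?')}: {d.page_content}"
        for d in kept
    ]
    return "\n".join(lines)


docs = [
    Doc(
        "I've read many books written by Octavia Butler.",
        {
            "score": 0.8346713396773939,
            "role": "human",
            "created_at": "2023-06-03T22:00:43.034056Z",
        },
    ),
]
print(build_context(docs, min_score=0.5))
```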
Metadata Search Deep Dive
As mentioned above, Zep uses the Postgres jsonpath query language implementation to filter messages. This allows us to write sophisticated queries that traverse the JSON structure.
{"where": {"jsonpath": "$.system.entities[*] ? (@.Label == \"PERSON\")"}}
The above would match the Octavia Butler named entity in the JSON metadata of the message above.
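When filter values come from application data or user input, assembling the jsonpath string by hand invites quoting mistakes. A small sketch of a hypothetical helper (not part of Zep's SDK) that builds the filter above, using json.dumps to produce a double-quoted, escaped string literal; we assume this is compatible with Postgres jsonpath string literals for ordinary values.

```python
import json


def entity_filter(field: str, value: str) -> dict:
    """Build a where leaf matching messages that have any named entity
    whose field (e.g. "Label" or "Name") equals value."""
    # json.dumps quotes the value and escapes embedded quotes and backslashes.
    # field is interpolated as-is, so only pass trusted key names.
    literal = json.dumps(value)
    return {"jsonpath": f"$.system.entities[*] ? (@.{field} == {literal})"}


person_query = {"where": entity_filter("Label", "PERSON")}
print(person_query)
```

The resulting dict can be passed as the metadata argument of MemorySearchPayload.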
Composing boolean searches
While it's possible to implement complex boolean searches using jsonpath itself, Zep offers a simpler approach to composing boolean queries.
{
  "where": {
    "and": [
      {
        "jsonpath": "$.system.entities[*] ? (@.Label == \"GPE\")"
      },
      {
        "jsonpath": "$.system.entities[*] ? (@.Label == \"ORG\")"
      },
      {
        "or": [
          {
            "jsonpath": "$.system.entities[*] ? (@.Name == \"Iceland\")"
          },
          {
            "jsonpath": "$.system.entities[*] ? (@.Name == \"Canada\")"
          }
        ]
      }
    ]
  }
}
The above would match a message that has at least one entity labeled GPE (geopolitical entity), at least one entity labeled ORG, and at least one entity named either Iceland or Canada. Because each jsonpath condition is checked independently, the Iceland or Canada match can be one of those same entities or a different one. These query structures can be arbitrarily deep.
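To make the and/or semantics concrete, here is a local, in-memory sketch that evaluates a where tree against a single message's metadata. It is not how Zep executes queries, since Zep delegates each jsonpath leaf to Postgres. For illustration, the leaf evaluator only understands the $.system.entities[*] ? (@.Field == "value") shape, and treats a leaf as true if any entity matches, mirroring how jsonb_path_exists returns true when the path yields at least one item. The sample query has the same shape as the one above (GPE, ORG, Iceland or Canada); the sample entities are made up.

```python
import json
import re
from typing import Callable

# Only handles leaves like: $.system.entities[*] ? (@.Label == "GPE")
_ENTITY_LEAF = re.compile(r'^\$\.system\.entities\[\*\] \? \(@\.(\w+) == ("(?:[^"\\]|\\.)*")\)$')


def entity_leaf(jsonpath: str, metadata: dict) -> bool:
    m = _ENTITY_LEAF.match(jsonpath)
    if m is None:
        raise ValueError(f"unsupported jsonpath in this sketch: {jsonpath}")
    field, value = m.group(1), json.loads(m.group(2))
    entities = metadata.get("system", {}).get("entities", [])
    # True if at least one entity matches the filter.
    return any(e.get(field) == value for e in entities)


def evaluate(where: dict, metadata: dict,
             leaf: Callable[[str, dict], bool] = entity_leaf) -> bool:
    """Recursively evaluate an and/or tree whose leaves are jsonpath filters."""
    if "and" in where:
        return all(evaluate(w, metadata, leaf) for w in where["and"])
    if "or" in where:
        return any(evaluate(w, metadata, leaf) for w in where["or"])
    return leaf(where["jsonpath"], metadata)


query = {
    "and": [
        {"jsonpath": '$.system.entities[*] ? (@.Label == "GPE")'},
        {"jsonpath": '$.system.entities[*] ? (@.Label == "ORG")'},
        {"or": [
            {"jsonpath": '$.system.entities[*] ? (@.Name == "Iceland")'},
            {"jsonpath": '$.system.entities[*] ? (@.Name == "Canada")'},
        ]},
    ]
}

metadata = {"system": {"entities": [
    {"Label": "GPE", "Name": "Iceland"},
    {"Label": "ORG", "Name": "Icelandair"},
]}}
print(evaluate(query, metadata))  # True
```

Passing a different leaf callable lets the same recursion drive other filter shapes, which is also why nesting depth doesn't matter.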
Searching by Creation Date
We can also search by message creation date. The following example composes a query that runs a semantic similarity search on message contents and filters by both creation date and metadata contents.
zep_client.search_memory(
    session_id=session_id,
    search_payload=MemorySearchPayload(
        query="Famous sci-fi authors",
        metadata={
            "start_date": "2023-06-02",
            "end_date": "2023-06-04",
            "where": {"jsonpath": '$[*] ? (@.foo == "bar")'},
        },
    ),
)
The date values should be in ISO 8601 format and may include a time and timezone.
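Since both bounds accept ISO 8601 strings, Python's datetime.isoformat() is a convenient way to produce them. A sketch of a hypothetical helper that builds start_date and end_date for the last N days in UTC and merges them with a where clause; passing an explicit now keeps the output deterministic.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def last_n_days(days: int, now: Optional[datetime] = None) -> dict:
    """Return start_date/end_date bounds (ISO 8601, UTC) covering the last `days` days."""
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    return {"start_date": start.isoformat(), "end_date": end.isoformat()}


window = last_n_days(2, now=datetime(2023, 6, 4, tzinfo=timezone.utc))
# Combine the date bounds with a metadata filter for MemorySearchPayload(metadata=...).
search_metadata = {**window, "where": {"jsonpath": '$[*] ? (@.foo == "bar")'}}
print(window)
# {'start_date': '2023-06-02T00:00:00+00:00', 'end_date': '2023-06-04T00:00:00+00:00'}
```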
We have a Python notebook exploring the above in our Zep By Example repo.
This functionality is available in Zep's Python SDK and Langchain. TypeScript/JavaScript support is coming soon.
Next steps
- Follow the Zep Quick Start Guide for installation and SDK instructions.