Introducing Zep Hybrid Search and Custom Metadata

Zep now supports both vector search over message text and filtering on message metadata, including system metadata such as Named Entities and creation dates.

Zep automatically embeds chat histories and makes them available for semantic vector search via Zep's client libraries or a Langchain Retriever. Zep now also supports search across both the text and the metadata of messages, including system metadata such as Named Entities and creation dates.

With custom metadata and hybrid search, developers can:

  • associate business context, such as a support case ID, with messages, so that conversations in long-term memory related to the case can be retrieved when needed.
  • search for specific classes of Named Entities, such as people, places, numbers, and dates, associated with a topic. For example, people's names in a conversation about family, or property valuations in a conversation about selling a home.
💡
Want to get started using Zep?

Follow the Zep Quick Start Guide.

Custom Message Metadata

Alongside hybrid search, developers can now associate metadata with messages. The JSON structures persisted alongside messages may be arbitrarily deep and may use any JSON type.

zep_client.add_memory(
    session_id=session_id,
    memory_messages=Memory(
        messages=[
            Message(
                role="human",
                content="I've read many books written by Octavia Butler.",
                metadata={"foo": "bar"},
            )
        ]
    ),
)
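
To make the "arbitrarily deep" point concrete, here is a minimal sketch of a nested metadata payload carrying the support-case context mentioned earlier. The key names (`support_case`, `customer`, and so on) are invented for illustration, and the round trip through `json` only checks that the structure is valid JSON; it does not call Zep.

```python
import json

# Hypothetical business context; every key name here is illustrative only.
metadata = {
    "support_case": {
        "id": "CASE-1042",
        "priority": 2,
        "escalated": False,
        "tags": ["billing", "refund"],
        "customer": {"tier": "gold", "region": None},
    }
}

# Round-trip through JSON to confirm the structure uses only JSON types
# (objects, arrays, strings, numbers, booleans, null).
encoded = json.dumps(metadata)
assert json.loads(encoded) == metadata

# A jsonpath filter, in Postgres syntax, that targets the nested case ID.
where = {"jsonpath": '$.support_case ? (@.id == "CASE-1042")'}
```

A dictionary like `metadata` would be passed as the `metadata` argument of `Message`, in the same way as `{"foo": "bar"}` above.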

Zep supports jsonpath queries over metadata using Postgres jsonb_path_exists, offering a powerful query language for retrieving messages.

zep_client.search_memory(
    session_id=session_id,
    search_payload=MemorySearchPayload(
        query="I enjoy reading science fiction.",
        metadata={
            "where": {"jsonpath": '$[*] ? (@.foo == "bar")'},
        },
    ),
)

A matching result looks like this:

{
  "dist": 0.7170433826192629,
  "message": {
    "content": "I've read many books written by Octavia Butler.",
    "created_at": "2023-06-03T22:00:43.034056Z",
    "metadata": {
      "foo": "bar",
      "system": {
        "entities": [
          {
            "Label": "PERSON",
            "Matches": [
              {
                "End": 46,
                "Start": 32,
                "Text": "Octavia Butler"
              }
            ],
            "Name": "Octavia Butler"
          }
        ]
      }
    },
    "role": "human",
    "token_count": 13,
    "uuid": "8f3a06dd-0625-41da-a2af-b549f2056b3f"
  },
  "metadata": null,
  "summary": null
}
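
Hand-writing jsonpath strings inside Python strings makes quoting mistakes easy. The following is a small sketch of a helper that builds the equality filter used above. `jsonpath_equals` is a hypothetical name, not part of the Zep SDK, and it assumes that JSON-style literals (as produced by `json.dumps`) are acceptable jsonpath literals, which holds for strings, numbers, booleans, and null in Postgres.

```python
import json


def jsonpath_equals(path: str, field: str, value) -> dict:
    """Build a "where" clause testing `field == value` on the items at `path`.

    Hypothetical helper for illustration; not part of the Zep SDK.
    """
    # json.dumps double-quotes and escapes strings, and renders numbers,
    # booleans, and None as JSON literals (42, true, null).
    literal = json.dumps(value)
    return {"where": {"jsonpath": f"{path} ? (@.{field} == {literal})"}}


filter_foo = jsonpath_equals("$[*]", "foo", "bar")
print(filter_foo)
# prints {'where': {'jsonpath': '$[*] ? (@.foo == "bar")'}}
```

The returned dictionary has the same shape as the `metadata` argument passed to `MemorySearchPayload` above.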

Zep's Langchain Retriever can also execute hybrid searches. Taking advantage of this capability requires a custom Chain.

retriever.get_relevant_documents(
    "Famous sci-fi authors", 
    metadata={"where": {"jsonpath": '$[*] ? (@.foo == "bar")'}}
)

The retriever returns each match as a document:

[
  {
    "page_content": "I've read many books written by Octavia Butler.",
    "metadata": {
      "score": 0.8346713396773939,
      "uuid": "8f3a06dd-0625-41da-a2af-b549f2056b3f",
      "created_at": "2023-06-03T22:00:43.034056Z",
      "role": "human",
      "metadata": {
        "foo": "bar",
        "system": {
          "entities": [
            {
              "Label": "PERSON",
              "Matches": [
                {
                  "End": 46,
                  "Start": 32,
                  "Text": "Octavia Butler"
                }
              ],
              "Name": "Octavia Butler"
            }
          ]
        }
      },
      "token_count": 13
    }
  }
]

Metadata Search Deep Dive

As mentioned above, Zep uses the Postgres implementation of the jsonpath query language to filter messages. This allows us to write sophisticated queries that traverse the JSON structure.

{"where": {"jsonpath": "$.system.entities[*] ? (@.Label == \"PERSON\")"}}

The above would match on the Octavia Butler named entity in the JSON metadata of the message shown earlier.
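
For readers less familiar with jsonpath filters, the sketch below approximates in plain Python what this query selects. It is not how Postgres evaluates the expression; `entities_where` is a hypothetical helper, the metadata is copied by hand from the search result shown earlier, and reading `Start` and `End` as character offsets into the message content is an assumption.

```python
# Copied by hand from the search result above; not fetched from a Zep server.
content = "I've read many books written by Octavia Butler."
metadata = {
    "foo": "bar",
    "system": {
        "entities": [
            {
                "Label": "PERSON",
                "Matches": [{"End": 46, "Start": 32, "Text": "Octavia Butler"}],
                "Name": "Octavia Butler",
            }
        ]
    },
}


def entities_where(metadata: dict, field: str, value: str) -> list:
    """Approximate `$.system.entities[*] ? (@.<field> == "<value>")`."""
    entities = metadata.get("system", {}).get("entities", [])
    return [entity for entity in entities if entity.get(field) == value]


people = entities_where(metadata, "Label", "PERSON")
print([entity["Name"] for entity in people])  # prints ['Octavia Butler']

# Assumption: Start and End are character offsets into the message content.
span = people[0]["Matches"][0]
print(content[span["Start"]:span["End"]])  # prints Octavia Butler
```

A message is returned by the search when this kind of selection is non-empty, which is the behaviour `jsonb_path_exists` provides.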

Composing boolean searches

While it's possible to implement complex boolean searches using jsonpath itself, Zep offers a simpler approach to composing boolean queries.

{
  "where": {
    "and": [
      {
        "jsonpath": "$.system.entities[*] ? (@.Label == \"GPE\")"
      },
      {
        "jsonpath": "$.system.entities[*] ? (@.Label == \"ORG\")"
      },
      {
        "or": [
          {
            "jsonpath": "$.system.entities[*] ? (@.Name == \"Iceland\")"
          },
          {
            "jsonpath": "$.system.entities[*] ? (@.Name == \"Canada\")"
          }
        ]
      }
    ]
  }
}

The above matches a message whose Named Entities include both the GPE and ORG label types, and where these or other entities are named Iceland or Canada. These query structures can be arbitrarily deep.
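
The nesting rules can be sketched in a few lines of Python. The helpers below (`entity`, `all_of`, `any_of`, `matches`) are hypothetical and not part of the Zep SDK; `matches` is a toy evaluator that understands only the simple equality form used in this post, whereas Zep evaluates real queries in Postgres. The sample metadata and its label values are hand-written for illustration.

```python
import re

# Recognizes only: $.system.entities[*] ? (@.Field == "value")
_LEAF = re.compile(r'^\$\.system\.entities\[\*\] \? \(@\.(\w+) == "([^"]*)"\)$')


def entity(field: str, value: str) -> dict:
    """Leaf clause: some entity has `field` equal to `value`."""
    return {"jsonpath": f'$.system.entities[*] ? (@.{field} == "{value}")'}


def all_of(*clauses: dict) -> dict:
    return {"and": list(clauses)}


def any_of(*clauses: dict) -> dict:
    return {"or": list(clauses)}


def matches(clause: dict, metadata: dict) -> bool:
    """Toy evaluator that recurses through "and" / "or" down to leaf clauses."""
    if "and" in clause:
        return all(matches(child, metadata) for child in clause["and"])
    if "or" in clause:
        return any(matches(child, metadata) for child in clause["or"])
    parsed = _LEAF.match(clause["jsonpath"])
    if parsed is None:
        raise ValueError(f"unsupported jsonpath: {clause['jsonpath']}")
    field, value = parsed.groups()
    entities = metadata.get("system", {}).get("entities", [])
    return any(item.get(field) == value for item in entities)


# An ORG entity, plus some entity named Iceland or Canada.
where = {
    "where": all_of(
        entity("Label", "ORG"),
        any_of(entity("Name", "Iceland"), entity("Name", "Canada")),
    )
}

# Hand-written sample metadata; label values are illustrative.
sample = {
    "system": {
        "entities": [
            {"Label": "ORG", "Name": "Acme"},
            {"Label": "GPE", "Name": "Canada"},
        ]
    }
}
print(matches(where["where"], sample))  # prints True
```

Because `all_of` and `any_of` accept other `all_of` and `any_of` results as arguments, the same three helpers can build queries of any depth.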

Searching by Creation Date

We can also search by message creation date. In the following example, we've composed a query that runs a semantic similarity search over message contents and filters the results by both creation date and metadata contents.

zep_client.search_memory(
    session_id=session_id,
    search_payload=MemorySearchPayload(
        query="Famous sci-fi authors",
        metadata={
            "start_date": "2023-06-02",
            "end_date": "2023-06-04",
            "where": {"jsonpath": '$[*] ? (@.foo == "bar")'},
        },
    ),
)

The date values should be in ISO 8601 format and may include a time and timezone.
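
Python's standard library can produce these strings directly. The sketch below builds a date window with `datetime.isoformat()`; `date_window` is a hypothetical helper, and only the `start_date` and `end_date` keys come from the example above.

```python
from datetime import datetime, timedelta, timezone


def date_window(end: datetime, days: int) -> dict:
    """Return start_date/end_date strings covering the `days` days before `end`.

    Hypothetical helper for illustration; not part of the Zep SDK.
    """
    start = end - timedelta(days=days)
    # isoformat() on a timezone-aware datetime emits ISO 8601 with an offset.
    return {"start_date": start.isoformat(), "end_date": end.isoformat()}


end = datetime(2023, 6, 4, 12, 0, tzinfo=timezone.utc)
search_metadata = {
    **date_window(end, days=2),
    "where": {"jsonpath": '$[*] ? (@.foo == "bar")'},
}
print(search_metadata["start_date"])  # prints 2023-06-02T12:00:00+00:00
print(search_metadata["end_date"])    # prints 2023-06-04T12:00:00+00:00
```

Using timezone-aware datetimes avoids ambiguity about which offset a bare date or time refers to.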

We have a Python notebook exploring the above in our Zep By Example repo.

This functionality is available in Zep's Python SDK and Langchain. TypeScript/JavaScript support is coming soon.

Next steps