Introducing Open Source Embeddings! 🔥

Zep now has support for open source embedding models. Search your LLM App's chat history faster and more cheaply.

As a fast follow to our recent survey of embedding models, we're introducing an experimental local, open source embedding option in today's Zep Memory Store 0.6.5 release.

Using Zep's open source embedding feature has the following benefits:

  • 🚀 Fast! The model we're using is fast on a CPU and far faster than a call to the OpenAI embedding API. This is particularly important for vector search over message histories, where we've seen latency drop by up to 70%.
  • 💸 Free! You're running Zep already, why not use it for embeddings?
  • 🥑 Lower storage requirements! Our selected embedding model produces 768-element vectors versus OpenAI's 1,536 elements, translating to a ~50% savings on vector storage.
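
The storage figure follows from simple arithmetic. Here is a minimal sketch, assuming each vector element is stored as a 4-byte float (an assumption about the storage backend, not something stated in this post) and ignoring per-row and index overhead:

```python
# Rough per-message vector storage, assuming 4-byte (float32) elements.
# BYTES_PER_ELEMENT is an assumption; per-row and index overhead are ignored.
BYTES_PER_ELEMENT = 4


def vector_bytes(dimensions: int) -> int:
    """Raw bytes needed to store one embedding vector."""
    return dimensions * BYTES_PER_ELEMENT


openai_bytes = vector_bytes(1536)  # OpenAI embedding width
local_bytes = vector_bytes(768)    # local model embedding width
savings = 1 - local_bytes / openai_bytes

print(f"OpenAI: {openai_bytes} B, local: {local_bytes} B, savings: {savings:.0%}")
# prints "OpenAI: 6144 B, local: 3072 B, savings: 50%"
```

The ratio is the same for any fixed element size, so the ~50% figure does not depend on the 4-byte assumption.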

Why is embedding latency important?

While documents in a vector store are usually embedded before prompt creation, search term embeddings must be created on the fly when the search is executed. This puts the embedding speed on the critical path to generating an LLM result.

Zep stores an LLM app's chat history and offers vector search over this memory, enabling long-term context to be added to a prompt. Using a local embedding model dramatically reduces round-trip time to search for historical memories.
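
To see how much of a search request is spent on query embedding, it can help to time that step in isolation. The sketch below is a generic timing harness, not part of Zep. The embed_fn parameter is a hypothetical stand-in for whichever embedding call you use (a local model or a remote API), and fake_embed is a placeholder so the example runs without a model.

```python
import statistics
import time
from typing import Callable, List, Sequence


def median_latency_ms(
    embed_fn: Callable[[str], Sequence[float]],
    query: str,
    runs: int = 20,
) -> float:
    """Median wall-clock time, in milliseconds, of embedding one query."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        embed_fn(query)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def fake_embed(text: str) -> List[float]:
    """Placeholder embedder (hypothetical); swap in a real embedding call."""
    return [float(len(text))] * 768


latency = median_latency_ms(fake_embed, "What did we discuss last week?")
print(f"median embedding latency: {latency:.3f} ms")
```

Running the same harness against a remote API call and a local model gives a like-for-like comparison of the time each adds before the search itself can start.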

How to enable Local Embeddings

Local embeddings will not be available if you're upgrading an existing implementation. This functionality is currently only available with new installations.

Step 1

If you're running Zep locally with docker compose, edit the docker-compose.yml file and set the nlp service's ENABLE_EMBEDDINGS environment variable to true.

  nlp:
    image: ghcr.io/getzep/zep-nlp-server:latest
    container_name: zep-nlp
    environment:
        - ENABLE_EMBEDDINGS=true

Alternatively, add the following to the zep-nlp container's environment:

ENABLE_EMBEDDINGS=true

Step 2

Before running Zep for the first time, edit your config.yaml file: uncomment the lines for the local model type and 768 dimensions, and comment out the lines referring to the OpenAI AdaEmbeddingV2 model. The embeddings section of the file should look as follows:

  embeddings:
    enabled: true
#    dimensions: 1536
#    model: "AdaEmbeddingV2"
    dimensions: 768
    model: "local"

Alternatively, add the following environment variables to the Zep container's environment or your .env file:

ZEP_EXTRACTORS_EMBEDDINGS_MODEL="local"
ZEP_EXTRACTORS_EMBEDDINGS_DIMENSIONS=768
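
A mismatch between the model and dimensions settings is an easy mistake to make when switching models. The following is a hypothetical pre-flight check, not part of Zep. It reads the two environment variables shown above from a mapping and compares them with the dimensions stated in this post (768 for local, 1536 for AdaEmbeddingV2). It only knows about those two model names.

```python
from typing import Mapping

# Dimensions per model, as stated in this post. Other model names are unknown here.
EXPECTED_DIMENSIONS = {"local": 768, "AdaEmbeddingV2": 1536}


def check_embedding_env(env: Mapping[str, str]) -> None:
    """Raise ValueError if the model and dimensions settings disagree."""
    # Strip surrounding quotes in case they were copied verbatim from a .env file.
    model = env.get("ZEP_EXTRACTORS_EMBEDDINGS_MODEL", "").strip('"')
    raw_dims = env.get("ZEP_EXTRACTORS_EMBEDDINGS_DIMENSIONS", "").strip('"')
    if model not in EXPECTED_DIMENSIONS:
        raise ValueError(f"unrecognized embedding model: {model!r}")
    expected = EXPECTED_DIMENSIONS[model]
    if not raw_dims.isdigit() or int(raw_dims) != expected:
        raise ValueError(
            f"model {model!r} expects {expected} dimensions, got {raw_dims!r}"
        )


# Example with the values from this step.
check_embedding_env({
    "ZEP_EXTRACTORS_EMBEDDINGS_MODEL": "local",
    "ZEP_EXTRACTORS_EMBEDDINGS_DIMENSIONS": "768",
})
print("embedding settings look consistent")
```

To check a real environment, pass os.environ instead of the literal dictionary.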

Why are Local Embeddings "experimental"?

While relatively stable, this feature is not yet fully baked:

  • We've not developed a migration path for existing users. Existing messages are not embedded with the new model, and the existing index remains in use.
  • We've not yet found a way to optionally deploy Zep with embeddings enabled to Render or other platforms with our default configurations. This is still possible, but DIY. See the environment variable requirements above. You will also need to ensure you have 1GB of memory available for the zep-nlp service container.
  • We'd like to understand our users' experience with the model we've selected versus OpenAI's embedding API before we promote this feature to stable.

Which model is shipped with Zep?

We use the Sentence Transformers multi-qa-mpnet-base-dot-v1 model for embedding messages. This model has been specifically trained for semantic search on a large and diverse set of (question, answer) pairs. We selected the model for its accuracy, performance, and relatively low memory footprint.
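
To get a feel for the model outside of Zep, you can load it directly with the sentence-transformers Python package. This is a minimal sketch, not a description of how Zep's NLP server is implemented. It assumes sentence-transformers is installed and that the model can be downloaded on first use. The function name search_messages is made up for illustration.

```python
from typing import List, Tuple

MODEL_NAME = "multi-qa-mpnet-base-dot-v1"


def search_messages(query: str, messages: List[str]) -> List[Tuple[float, str]]:
    """Rank messages against a query by dot-product score, highest first."""
    # Imported lazily: the package is large and the model downloads on first use.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer(MODEL_NAME)
    query_vec = model.encode(query, convert_to_tensor=True)
    message_vecs = model.encode(messages, convert_to_tensor=True)
    # dot_score returns a 1 x len(messages) tensor of inner products.
    scores = util.dot_score(query_vec, message_vecs)[0].tolist()
    return sorted(zip(scores, messages), reverse=True)
```

Calling search_messages with a question and a list of earlier chat messages returns (score, message) pairs with the best match first. Loading the model inside the function keeps the sketch short; a real service would load it once and reuse it.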

Other important changes in this release

  • We're now using the dot product (also called the inner product) to calculate search distance. This may affect distance values (or scores if you're using LangChain). If you filter by distance value, please experiment with the new distances before updating.
  • When embeddings are enabled, the Zep NLP server uses ~420MiB more memory and requires a model download on first run. Please ensure that your deployment environment has approx. 1GB of memory for this container and allows outbound internet access.
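
On the dot product change: thresholds may need revisiting because a dot-product score depends on vector length as well as direction, so unlike cosine similarity it is not confined to the range -1 to 1. Here is a small sketch with made-up two-element vectors (pure Python, no Zep APIs involved):

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot (inner) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: the dot product of the length-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [3.0, 4.0]
message = [6.0, 8.0]  # same direction as the query, twice the length

print(dot(query, message))     # 50.0
print(cosine(query, message))  # 1.0
```

A filter tuned for scores between -1 and 1 would behave very differently when applied to values like 50.0, which is why existing cutoffs are worth re-testing.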

Next steps