Semantic Similarity as an Intent Router for LLM Apps

Ensuring your LLM app understands user intent is crucial to offering a great experience. We build an intent router using Langchain that automatically selects the prompt best suited to the task at hand.


Understanding a user’s intent is key to offering personalized and relevant experiences. One user’s intention for using a chatbot agent may not be the same as another's. A useful framework for thinking about intent is to consider task specificity and task specialization:

  • Task specificity: What task is the user hoping to accomplish by engaging with the agent? For example, a chatbot on a website may be tasked with both sales and customer support. An enterprise HR application may have an agent interface that allows employees to manage their 401K, request leave, or sign up for workplace activities.  Which of these does the user need?
  • Task specialization: Beyond the specific task, are there attributes to the user’s intent that might inform how the agent should interact with the user? Consider an example shoe sales and support bot: we may need to understand not only whether the user would like to buy shoes but also the shoe type. Is it a running shoe? And if so, does the agent need to ask whether the user pronates or supinates to make a purchase recommendation?

Task specificity and specialization dictate which prompts our agents should use and how we construct those prompts:

  • A sales prompt would instruct the LLM to ask a different set of questions to complete the task than a support-specific prompt. 
  • The shoe bot, in support mode, would need to be primed with runner-specific advice if engaging with a user who previously purchased running shoes.

Conceptually, we want to develop specific “routes” for a user to take through their journey with the LLM, with these routes being composed of different prompts. 

How Zep’s Intent Extractor works and what it’s good for

We launched Zep’s Intent Extractor last week and demonstrated how it can be a powerful tool for post hoc analysis of a user’s intent. For example, developers can extract a history of intents from user conversations and cluster them to understand the users’ needs better.

Human: I'm looking for a good face moisturizer.

// The message in the Zep Memory Store
{
    "uuid": "f8ec855b-fd59-4084-a3d1-2b81857d5dcd",
    "created_at": "2023-06-21T02:19:45.577522Z",
    "role": "human",
    "content": "I'm looking for a good face moisturizer.",
    "metadata": {
        "system": {
            "intent": "The subject is searching for a specific type of facial skincare product."
        }
    },
    "token_count": 12
}
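
For example, the stored intents can be read back for offline analysis with the zep-python client. The sketch below is illustrative: it assumes a Zep server running locally and a placeholder session ID, and the client method names reflect the zep-python SDK at the time of writing, so check the current docs if they differ.

from zep_python import ZepClient

ZEP_API_URL = "http://localhost:8000"  # assumption: a locally running Zep server

client = ZepClient(ZEP_API_URL)
memory = client.get_memory("some-session-id")  # placeholder session ID

# Collect the intents Zep's extractor has attached to each message.
intents = [
    message.metadata["system"]["intent"]
    for message in memory.messages
    if message.metadata and "intent" in message.metadata.get("system", {})
]
print(intents)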

The extractor uses an LLM to identify a user’s intent and persists this intent to the Zep Memory Store. We have seen unpredictable latencies when calling LLM endpoints, such as those offered by OpenAI, sometimes requiring multiple retries. Zep runs extractors asynchronously to the chat loop to ensure this doesn’t impact the user experience, but it’s hard to rely on an LLM for routing prompts in the critical path of an application.

So, I started to think about a way to implement low-latency routing that didn’t rely on LLMs. Below I share an approach to implementing real-time intent routing.

The Intent Router concept

An “Intent Router” selects a prompt from a collection of prompts based on the context of the conversation. Using an LLM to decide which prompt to use is entirely feasible. Still, as mentioned above, LLM API access and completion latency can leave a user waiting and result in a poor user experience.

Semantic Similarity offers a very useful alternative to LLMs. Rather than waiting for an LLM to identify the user’s intent before we can determine which prompt to use, we define a set of expected intents for the application up front and use vector search to determine which of these intents is closest to the user’s chat message.
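
As a minimal sketch of the idea (assuming an OpenAI API key is configured; any embedding model would do), we can pre-embed a short description of each expected intent and pick the one closest to an incoming message by cosine similarity. The intent names here mirror the example we build later in this post.

import numpy as np
from langchain.embeddings import OpenAIEmbeddings

embedder = OpenAIEmbeddings()

# Pre-embed one short description per expected intent.
intent_descriptions = {
    "purchase a widget": "the human would like to make a purchase",
    "needs customer support": "the human has a support query",
}
intent_vectors = {
    name: np.array(embedder.embed_query(description))
    for name, description in intent_descriptions.items()
}


def route(message: str) -> str:
    """Return the intent whose description is most similar to the message."""
    v = np.array(embedder.embed_query(message))

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    return max(intent_vectors, key=lambda name: cosine(v, intent_vectors[name]))


print(route("My widget arrived broken."))  # expected: "needs customer support"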

Using Langchain and Semantic Similarity to Build an Intent Router

We can build a fast and effective intent router using a simple in-memory vector store, OpenAI Embeddings, and Langchain’s EmbeddingRouterChain class. 

Conceptually, we'll do the following:

  1. Identify the potential user intents for our application (in the example below, we have a sales intention and a customer support intention).
  2. Create separate task-specific prompts for each intention.
  3. Create chains using the above prompts for each intent, and associate them with the intent text.
  4. Embed the set of intents we identified, adding these to our vector store.
  5. When a user sends the agent a message, we embed the message and run a simple nearest-neighbors or similar algorithm to find the embedded intent that is nearest in the vector space to the user’s message. That is, we find the intent that is most semantically similar to the user’s message.
  6. With this intent, we can select the correct chain and prompts.

There are many other embedding models, and OpenAI’s might not be the best option for a fast router, but it’s used here because it is simple to implement.
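
For example, a local model via Langchain’s HuggingFaceEmbeddings (backed by the sentence-transformers package) could be dropped in wherever an embedding model is expected below; the model name is just an illustrative choice.

from langchain.embeddings import HuggingFaceEmbeddings

# Runs locally, avoiding a network round trip on every routed message.
# The model name is illustrative; any sentence-transformers model will do.
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

This embedding_model could then be passed to the router chain we build below in place of OpenAIEmbeddings().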

The Intent Router Setup

Our simple Intent Router comprises an IntentModel Python data model and an IntentRouterChain that subclasses Langchain's MultiRouteChain.

💡

See the Zep By Example GitHub Repo for the Intent Router and example usage code.

Our IntentModel is a data structure that neatly organizes everything associated with an intent, including the intent description and prompt.

from typing import NamedTuple


class IntentModel(NamedTuple):
    """A model for an intent that a human may have."""

    intent: str
    description: str
    prompt: str
    default: bool = False  # is this the default or fallback intent?

Our IntentRouterChain class is where the magic happens.

from typing import List, Mapping, Optional

# Langchain imports (module paths as of the mid-2023 releases used here)
from langchain.chains import LLMChain
from langchain.chains.router.base import MultiRouteChain, RouterChain
from langchain.chains.router.embedding_router import EmbeddingRouterChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.base import Embeddings
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain.vectorstores import DocArrayInMemorySearch


class IntentRouterChain(MultiRouteChain):
    """Chain for routing inputs to different chains based on intent."""

    router_chain: RouterChain
    destination_chains: Mapping[str, LLMChain]
    default_chain: LLMChain

    @property
    def output_keys(self) -> List[str]:
        return ["text"]

    @classmethod
    def from_intent_models(
        cls,
        intent_models: List[IntentModel],
        llm: ChatOpenAI,
        embedding_model: Optional[Embeddings],
        memory: Optional[ConversationBufferMemory] = None,
        verbose: bool = False,
    ) -> "IntentRouterChain":
        """Create a new IntentRouterChain from a list of intent models."""

        names_and_descriptions = [(i.intent, [i.description]) for i in intent_models]

        router_chain = EmbeddingRouterChain.from_names_and_descriptions(
            names_and_descriptions,
            DocArrayInMemorySearch,
            embedding_model,
            routing_keys=["input"],
            verbose=verbose,
        )

        default_chain: Optional[LLMChain] = None
        destination_chains = {}
        for i in intent_models:
            destination_chains[i.intent] = LLMChain(
                llm=llm,
                prompt=PromptTemplate(
                    template=i.prompt, input_variables=["input", "chat_history"]
                ),
                memory=memory,
            )
            if i.default:
                default_chain = destination_chains[i.intent]

        if not default_chain:
            raise ValueError("No default chain was specified.")

        return cls(
            router_chain=router_chain,
            destination_chains=destination_chains,
            default_chain=default_chain,
            verbose=verbose,
        )

First, Langchain ships with a MultiRouteChain, which takes as input a RouterChain and a collection of destination_chains. The RouterChain is responsible for determining which of the destination_chains should be used to respond to the user's message.

Our router chain uses the Langchain EmbeddingRouterChain to do this routing. As the name implies, the embedding router takes the most recent user message, embeds it, and compares the message to vectors in the provided vector store to find the most relevant route.

The vector store we're using, DocArrayInMemorySearch, is a simple, lightweight store that is ideal for this use case. We're unlikely to have more than a dozen intents even in a large application, so a full vector database would be overkill here.
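
To make the routing step concrete, here is the EmbeddingRouterChain used on its own with the two intents we define below (a minimal sketch, assuming an OpenAI API key is configured).

from langchain.chains.router.embedding_router import EmbeddingRouterChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import DocArrayInMemorySearch

router = EmbeddingRouterChain.from_names_and_descriptions(
    [
        ("purchase a widget", ["the human would like to make a purchase"]),
        ("needs customer support", ["the human has a support query"]),
    ],
    DocArrayInMemorySearch,
    OpenAIEmbeddings(),
    routing_keys=["input"],
)

result = router({"input": "My widget arrived broken."})
print(result["destination"])  # expected: "needs customer support"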

Let's start off by defining our intents and populating our list of models. We have two intents, "purchase a widget" and "needs customer support", each with a task-specific prompt.

sales_template = (
    "You are widgets sales rep and your job is to assist humans with completing the "
    "purchase of a widget.\n"
    "In order to close a widget sale, you need to know how many widgets the human would "
    "like to purchase. Don't be pushy about making the sale,\n"
    "but remember that your job is dependent on achieving your sales quota.\n"
    "\n"
   ...
    "\n"
    "Here are the prior messages in this conversation:\n"
    "{chat_history}\n"
    "\n"
    "Here is a question: {input}\n"
)

support_template = (
    "You are support agent for a widget producer and your job is to assist humans with "
    "issues they may have with using the widgets\n"
    "they purchased from us.\n"
    "To assist a human, you need to know when the widget was purchased, what color it "
    "is, and what the user's support issue is.\n"
    "\n"
   ...
    "\n"
    "Here are the prior messages in this conversation:\n"
    "{chat_history}\n"
    "\n"
    "Here is a question: {input}\n"
)

Our models look as follows.

intent_models = [
    IntentModel(
        intent="purchase a widget",
        description="the human would like to make a purchase",
        prompt=sales_template,
        default=True,
    ),
    IntentModel(
        intent="needs customer support",
        description="the human has a support query",
        prompt=support_template,
    ),
]

The IntentRouterChain takes this list of IntentModels, embeds the description fields, and creates a chain for each of the prompts.

from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ZepChatMessageHistory

zep_chat_history = ZepChatMessageHistory(session_id="test_user", url=ZEP_API_URL)
memory = ConversationBufferMemory(
    chat_memory=zep_chat_history, memory_key="chat_history"
)

llm = ChatOpenAI(model_name="gpt-3.5-turbo")

chain = IntentRouterChain.from_intent_models(
    intent_models=intent_models,
    llm=llm,
    embedding_model=OpenAIEmbeddings(),
    memory=memory,
    verbose=True,
)

We also instantiate the Zep-backed memory and pass it to the router, alongside the OpenAI LLM and embedding models.

When we run the example, we see the correct chain and prompt being used.

print(chain.run({"input": "I'm upset my widget doesn't work!"}))

> Entering new  chain...
Prompt after formatting:
You are a support agent for a widget producer and your job is to assist humans with issues they may have with using the widgets
they purchased from us.
In order to assist a human, you need to know when the widget was purchased, what color it is, and what the user's support issue is.

Important notes about working with humans:
- Always be friendly! They may be upset if their widget doesn't work. Being friendly will help you get the answers you need.
- Humans can only answer 1 question at a time.

Important support information:
- Widgets have a warranty of 1 year.
- Sparkling gold widgets tend to flake the sparkling paint. We are happy to replace these as long as the widget is under warranty.
- Many humans forget to power on their widgets before attempting use. This should be your first line of questioning.

Today's date is 06/27/2023.

Here are the prior messages in this conversation:


Here is a question: I'm upset my widget doesn't work!


> Finished chain.

Hello! I'm sorry to hear that your widget isn't working. I'd be happy to assist you with that. Before we proceed, could you please let me know when you purchased the widget?

When asking a sales question, we see...

print(
    chain.run(
        {"input": "I'd like to purchase 10 widgets. What color do they come in?"}
    )
)

> Entering new  chain...

> Finished chain.
purchase a widget: {'input': "I'd like to purchase 10 widgets. What color do they come in?"}

> Entering new  chain...
Prompt after formatting:
You are a widget sales rep and your job is to assist humans with completing the purchase of a widget.
In order to close a widget sale, you need to know how many widgets the human would like to purchase. Don't be pushy about making the sale,
but remember that your job is dependent on achieving your sales quota.

Important sales information:
- The current price of widgets is $499.
- If a customer wants to purchase 10 or more widgets, you can offer a 20% discount. There are no discounts available for smaller purchases.
- They come in blue, black, hot pink, and sparkling gold colors.
- Widgets have a warranty of 1 year.

Here are the prior messages in this conversation:
Human: I'm upset my widget doesn't work!
AI: Hello! I'm sorry to hear that your widget isn't working. I'd be happy to assist you with that. Before we proceed, could you please let me know when you purchased the widget?

Here is a question: I'd like to purchase 10 widgets. What color do they come in?


> Finished chain.
Thank you for your interest in purchasing widgets! I'm happy to assist you with your order. Widgets come in blue, black, hot pink, and sparkling gold colors. Which color would you prefer for your 10 widgets?

Inspecting Zep's memory for one of the above messages, we see that Zep has identified the user's intent. Since we're not providing the LLM with the list of intents we identified above, the Zep intent text is different but semantically similar, and could potentially be passed to the router, too.

  Message(
      role="human",
      content="I'm upset my widget doesn't work!",
      uuid="749923b9-4dbf-4bd6-b755-eb1d725bab41",
      created_at="2023-06-28T18:11:06.850246Z",
      token_count=11,
      metadata={
          "system": {
              "intent": (
                  "The subject is expressing dissatisfaction and frustration"
                  " regarding the malfunctioning of their widget."
              ),
          }
      },
  )

The above is a very simple example with only two intents. Provided the intents are not too nuanced, this approach should scale well to many more of them.

There are several improvements that could be made to the router:

  • The router only evaluates the most recent message for a user's intent. This is problematic when the most recent user response lacks the context of the conversation. For example, a user responding "size 9" when asked by the bot for their shoe size does not convey intent. A user's intent would be more accurately determined by looking across the intents identified in prior messages using Zep’s Intent Extraction (a rough sketch of this follows the list below).
  • Another layer of routing may be required to determine task specialization. We know the user has a complaint or concern about a pair of shoes, but not how we might want to respond specifically given the user has previously purchased running shoes from us. Zep's Entity Extraction might be useful here in determining how to populate the prompt with the shoe type or user segmentation.
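
As a rough sketch of the first idea, the routing input could combine the latest message with intents Zep has already extracted for the session. The zep-python call and metadata layout mirror the message examples above, but treat the exact method names as assumptions to check against the current SDK.

from zep_python import ZepClient


def routing_input(session_id: str, latest_message: str, n_messages: int = 3) -> str:
    """Combine the latest message with recently extracted intents from Zep,
    giving the embedding router more conversational context to match against."""
    client = ZepClient(ZEP_API_URL)  # ZEP_API_URL as defined earlier
    memory = client.get_memory(session_id)
    recent_intents = [
        m.metadata["system"]["intent"]
        for m in memory.messages[-n_messages:]
        if m.metadata and "intent" in m.metadata.get("system", {})
    ]
    return " ".join(recent_intents + [latest_message])


# e.g. chain.run({"input": routing_input("test_user", "size 9")})
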
Visit Zep on GitHub!