Enhancing Chatbots with Memory: Promoting better customer experience

The power of memory management in chatbots and building personalized interactions using LangChain

What makes a good chatbot in your opinion?

A good chatbot is as engaging as an insightful human conversation. Its ability to remember past exchanges, draw context, and deliver relevant responses is pivotal.

But how does one enable such memory and contextual understanding in a machine, specifically in large language models like OpenAI’s GPT-3.5 and GPT-4?

The answer is conversational memory: a chatbot’s ability to handle successive inquiries in a dialogue format. It enables seamless and meaningful interactions; without it, every question would be processed as a completely separate input, ignoring all previous exchanges.

This article explores how to expand and condense conversational memory in large language models with the LangChain library, demonstrating each technique with Python code snippets.

The Mechanics of Memory in AI

Artificial Intelligence memory is not akin to biological memory.

Instead, memory in language models, such as those offered by OpenAI, is a crucial component in understanding the context and the flow of conversation.

By default, LLMs (large language models) are stateless, meaning each request from the user (the prompt) is processed independently. This is why LangChain offers different types of memory.

Several options exist, and all of them can be plugged into a ConversationChain:

ConversationBufferMemory
ConversationBufferWindowMemory
ConversationSummaryMemory and ConversationSummaryBufferMemory
ConversationKGMemory (knowledge graph memory)

What options do I have for Conversational Memory?

ConversationBufferMemory

The simplest form of conversational memory in LangChain is the ConversationBufferMemory. It injects the previous interactions between the human and the AI, verbatim, into the {history} parameter of the prompt.

In Python, only a few lines of code are needed:

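from langchain.chains import ConversationChain
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory

# The LLM that answers the questions (expects OPENAI_API_KEY to be set)
llm = OpenAI(temperature=0)

conversation_buffer_memory = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory()  # keeps the full conversation history
)

conversation_buffer_memory.predict(input="My name is Christophe, can you be my AI Assistant?")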

From now on, every interaction is stored in the buffer memory, so the prompt keeps growing until it eventually exceeds the model’s context window. In return, the large language model takes the questions and answers from previous interactions into account when generating each new answer.

To inspect the past interactions, you can access the buffer directly:

print(conversation_buffer_memory.memory.buffer)
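To make the {history} mechanism concrete, here is a minimal plain-Python sketch of what buffer memory does conceptually; it is not LangChain’s implementation. BufferMemorySketch, build_prompt and the shortened PROMPT_TEMPLATE (loosely modeled on ConversationChain’s default prompt) are hypothetical names chosen for illustration, and the AI reply is a placeholder rather than real model output.

```python
from typing import List, Tuple

# Shortened, hypothetical template with the same two slots as ConversationChain's prompt.
PROMPT_TEMPLATE = (
    "The following is a friendly conversation between a human and an AI.\n\n"
    "Current conversation:\n{history}\nHuman: {input}\nAI:"
)


class BufferMemorySketch:
    """Keeps every exchange and renders it as 'Human: ... / AI: ...' lines."""

    def __init__(self) -> None:
        self.exchanges: List[Tuple[str, str]] = []

    def save(self, human: str, ai: str) -> None:
        self.exchanges.append((human, ai))

    @property
    def buffer(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.exchanges)


def build_prompt(memory: BufferMemorySketch, user_input: str) -> str:
    # The whole stored history is injected into {history} on every call.
    return PROMPT_TEMPLATE.format(history=memory.buffer, input=user_input)


memory = BufferMemorySketch()
memory.save("My name is Christophe, can you be my AI Assistant?",
            "Sure, Christophe! (placeholder reply)")
prompt = build_prompt(memory, "What is my name?")
print(prompt)
```

Because the full buffer is re-sent on every turn, the prompt grows with each exchange, which is exactly why this memory eventually runs into the context window.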

ConversationBufferWindowMemory

Second, LangChain offers a buffer window memory. It is quite similar to the first one, but instead of taking the whole history into account, it only considers the last k interactions (for example, the last 3) when generating the most recent response.

To use ConversationBufferWindowMemory, the following code is enough:

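from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory

# llm: the same LLM instance as before, e.g. OpenAI(temperature=0)
conversation_buffer_window_memory = ConversationChain(
    llm=llm,
    memory=ConversationBufferWindowMemory(k=3)  # keep only the last 3 exchanges
)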

The parameter k sets the size of the window: the number of most recent exchanges (a human message plus the AI’s reply) kept in the history. Older exchanges are no longer passed to the prompt.
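A plain-Python sketch of the windowing idea, assuming an exchange means one human message plus one AI reply. WindowMemorySketch is a hypothetical class, not LangChain’s code; it relies on a deque with maxlen=k, which discards the oldest exchange automatically once the window is full.

```python
from collections import deque


class WindowMemorySketch:
    """Keeps only the last k exchanges (one human message + one AI reply each)."""

    def __init__(self, k: int = 3) -> None:
        # A deque with maxlen silently drops the oldest item when a new one is added.
        self.exchanges: deque = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.exchanges.append((human, ai))

    @property
    def buffer(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.exchanges)


memory = WindowMemorySketch(k=3)
for i in range(1, 6):
    memory.save(f"question {i}", f"answer {i}")

print(memory.buffer)  # only exchanges 3, 4 and 5 remain
```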

Why would we deliberately drop knowledge from the history by setting k to a small number?

With models like OpenAI’s GPT-3.5 or GPT-4, fees are based on the number of tokens processed, both in the prompt and in the generated answer, and the injected history is part of the prompt. The larger the prompt, the higher the fees. One way to reduce costs is to keep only the most recent exchanges in the history. This approach has drawbacks, however: information from earlier in the conversation may be relevant to the current question. How can this be addressed while still reducing fees?
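The cost difference can be estimated with a little arithmetic. The token sizes below are hypothetical round numbers, not measurements; the point is only the shape of the growth. With a full buffer the prompt grows by one exchange per turn, while with a window of k = 3 it stops growing after the fourth turn.

```python
from typing import Optional

# Hypothetical sizes in tokens, chosen only to illustrate the trend.
TEMPLATE_TOKENS = 50    # fixed instructions in the prompt
QUESTION_TOKENS = 20    # the new user message
EXCHANGE_TOKENS = 100   # one stored human message plus the AI's reply


def prompt_tokens(turn: int, window: Optional[int] = None) -> int:
    """Prompt size at a 1-based turn; window=None means the full buffer is sent."""
    past_exchanges = turn - 1
    kept = past_exchanges if window is None else min(past_exchanges, window)
    return TEMPLATE_TOKENS + kept * EXCHANGE_TOKENS + QUESTION_TOKENS


for turn in (1, 5, 20, 50):
    print(turn, prompt_tokens(turn), prompt_tokens(turn, window=3))
# 1 70 70
# 5 470 370
# 20 1970 370
# 50 4970 370
```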

ConversationSummaryBufferMemory may be the right next step. It summarizes the older part of the conversation history, preserving the essential information while reducing the overall token count. This keeps the context of the entire conversation available while also helping to manage costs, striking a balance between limiting the prompt size and keeping valuable information from earlier interactions accessible.
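The hybrid idea behind ConversationSummaryBufferMemory (recent exchanges kept verbatim, older ones folded into a summary once a token budget is exceeded) can be sketched in plain Python. This is an illustration under stated assumptions, not LangChain’s code: tokens are approximated by counting words, and toy_summarize is a hypothetical stand-in for the LLM call that would write the real summary. In LangChain itself, the budget is set through the max_token_limit parameter of ConversationSummaryBufferMemory.

```python
from typing import Callable, List, Tuple

Exchange = Tuple[str, str]  # (human message, AI reply)


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def render(exchanges: List[Exchange]) -> str:
    return "\n".join(f"Human: {h}\nAI: {a}" for h, a in exchanges)


class SummaryBufferSketch:
    """Recent exchanges stay verbatim; older ones are folded into a summary."""

    def __init__(self, summarize: Callable[[str, str], str], max_token_limit: int) -> None:
        self.summarize = summarize  # (current_summary, evicted_text) -> new_summary
        self.max_token_limit = max_token_limit
        self.summary = ""
        self.recent: List[Exchange] = []

    def save(self, human: str, ai: str) -> None:
        self.recent.append((human, ai))
        evicted: List[Exchange] = []
        # Move the oldest exchanges out until the verbatim part fits the budget
        # (the newest exchange is always kept).
        while len(self.recent) > 1 and count_tokens(render(self.recent)) > self.max_token_limit:
            evicted.append(self.recent.pop(0))
        if evicted:
            self.summary = self.summarize(self.summary, render(evicted))

    @property
    def history(self) -> str:
        """What would be injected into {history}: summary first, then recent turns."""
        parts = [f"Summary: {self.summary}"] if self.summary else []
        if self.recent:
            parts.append(render(self.recent))
        return "\n".join(parts)


def toy_summarize(current_summary: str, evicted_text: str) -> str:
    # Hypothetical placeholder for the LLM call: keeps only the human turns.
    human_turns = [line[len("Human: "):] for line in evicted_text.splitlines()
                   if line.startswith("Human: ")]
    return ", ".join(part for part in [current_summary, *human_turns] if part)


memory = SummaryBufferSketch(toy_summarize, max_token_limit=20)
for i in range(1, 5):
    memory.save(f"question number {i}", f"this is answer number {i}")

print(memory.history)
```

With a 20-word budget and 10 words per exchange, the first two exchanges end up in the summary and the last two stay verbatim.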

ConversationSummaryMemory and ConversationSummaryBufferMemory

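With ConversationBufferMemory, tokens are consumed rapidly, and a long conversation can even exceed the context window of today’s most advanced LLMs. To avoid this excessive token consumption, LangChain offers summary-based memories: ConversationSummaryMemory, which summarizes the entire history, and ConversationSummaryBufferMemory, which keeps the most recent exchanges verbatim and summarizes older ones once a token limit (max_token_limit) is exceeded.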

As its name implies, summary memory condenses the history of the conversation before passing it to the {history} parameter.

The following Python code initializes a ConversationChain with ConversationSummaryMemory. Remember to create an LLM first (for example llm = OpenAI(temperature=0)), as ConversationSummaryMemory needs one as input to perform the summarization.

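from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory

conversation_summary_memory = ConversationChain(
    llm=llm,
    # The memory uses its own LLM (here the same one) to write the running summary
    memory=ConversationSummaryMemory(llm=llm)
)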

This condenses each new interaction and folds it into a running summary of all prior interactions.
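A minimal plain-Python sketch of this running-summary loop. RunningSummarySketch and toy_summarize are hypothetical; the toy summarizer simply keeps the last 30 words, a crude stand-in for the LLM summarization prompt that ConversationSummaryMemory actually uses (a real summary compresses information instead of discarding it). What the sketch does reflect is the control flow: every saved exchange triggers one extra summarization call, and only the summary, not the raw history, ends up in {history}.

```python
from typing import Callable


class RunningSummarySketch:
    """Folds every new exchange into one running summary; no verbatim history is kept."""

    def __init__(self, summarize: Callable[[str, str], str]) -> None:
        # summarize(current_summary, new_lines) -> new_summary (an LLM call in practice)
        self.summarize = summarize
        self.summary = ""
        self.summarization_calls = 0  # extra calls on top of the ones that answer the user

    def save(self, human: str, ai: str) -> None:
        new_lines = f"Human: {human}\nAI: {ai}"
        self.summary = self.summarize(self.summary, new_lines)
        self.summarization_calls += 1

    def history(self) -> str:
        # Only the summary is injected into {history}.
        return self.summary


def toy_summarize(current_summary: str, new_lines: str, max_words: int = 30) -> str:
    # Crude placeholder for the LLM: keep the last max_words words of summary + new lines.
    words = f"{current_summary} {new_lines}".split()
    return " ".join(words[-max_words:])


memory = RunningSummarySketch(toy_summarize)
for i in range(1, 11):
    memory.save(f"question {i}", f"answer {i}")

print(memory.summarization_calls, len(memory.history().split()))  # 10 30
```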

You may expect this type of memory to use more tokens than ConversationBufferMemory, since each turn not only answers the latest user prompt but also calls the LLM a second time to update the summary. For short conversations, that is indeed the case.

So why use it rather than ConversationBufferWindowMemory or plain ConversationBufferMemory?

The answer lies in longer interactions. With a buffer, the history, and therefore the prompt, grows with every turn, so costs keep escalating. With summarization, the initial costs are higher because of the extra summarization call, but the summary grows far more slowly than the raw history, so over a long conversation the total cost ends up substantially lower, even though two LLM calls are made per turn. The conversation can therefore continue over an extended period without the per-turn cost growing linearly, which makes this the more efficient option for extended dialogues.
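A back-of-the-envelope comparison makes the trade-off visible. All sizes below are hypothetical assumptions, not measured values: each exchange is 100 tokens, the running summary settles at 150 tokens, and the summarization prompt adds 60 tokens of instructions. The AI’s answers are left out because they cost the same in both setups.

```python
# Hypothetical sizes in tokens; only the shape of the curves matters.
TEMPLATE = 50            # fixed instructions of the chat prompt
QUESTION = 20            # the new user message
EXCHANGE = 100           # one human message plus the AI's reply
SUMMARY = 150            # size the running summary is assumed to settle at
SUMMARIZE_PROMPT = 60    # instructions of the summarization prompt


def buffer_turn(turn: int) -> int:
    """Tokens processed at a 1-based turn when the full history is sent."""
    return TEMPLATE + (turn - 1) * EXCHANGE + QUESTION


def summary_turn(turn: int) -> int:
    """Tokens processed at a turn with a running summary (two LLM calls)."""
    history = SUMMARY if turn > 1 else 0
    answer_call = TEMPLATE + history + QUESTION
    # Second call: old summary + new exchange in, new summary out.
    summarize_call = SUMMARIZE_PROMPT + history + EXCHANGE + SUMMARY
    return answer_call + summarize_call


def cumulative(cost_per_turn, turns: int) -> int:
    return sum(cost_per_turn(t) for t in range(1, turns + 1))


for n in (5, 10, 20, 50):
    print(n, cumulative(buffer_turn, n), cumulative(summary_turn, n))
# 5 1350 3100
# 10 5200 6500
# 20 20400 13300
# 50 126000 33700

break_even = next(n for n in range(1, 1000)
                  if cumulative(buffer_turn, n) > cumulative(summary_turn, n))
print("summary is cumulatively cheaper from turn", break_even)  # 13
```

Under these assumptions, summarization is more expensive for the first dozen turns and cumulatively cheaper from turn 13 on. Different token sizes move the break-even point, but not the quadratic-versus-linear shape of the two totals.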

ConversationKGMemory (Knowledge Graph Memory)

Conversation Knowledge Graph Memory (ConversationKGMemory in LangChain) is a more advanced type of memory that stores what is said in the conversation as a knowledge graph, allowing knowledge triples (subject, predicate, object) to be captured and retrieved. It uses the LLM to identify entities and extract these triples, actively dissecting the details of the conversational content.

This memory is valuable when structured information has to be extracted from a conversation, preserved, and recovered later in the form of a knowledge graph. Whether the goal is to analyze data, grasp the context, or keep information for later use, it offers an effective way of handling dialogue information. It bridges the fluidity of casual conversation and the rigidity of structured data management, making it a robust instrument for anyone who wants to get the most out of a conversational interface.
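The retrieval side of a knowledge graph memory can be sketched in plain Python. In LangChain, ConversationKGMemory asks the LLM to extract the triples from each message; in this hypothetical KnowledgeGraphSketch the triples are typed in by hand, and entities are matched with a simple case-insensitive substring test, which is far cruder than real entity extraction. The example facts are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class KnowledgeGraphSketch:
    """Stores triples and returns those touching entities mentioned in the input."""

    def __init__(self) -> None:
        self.edges: Dict[str, List[Triple]] = defaultdict(list)

    def add_triple(self, triple: Triple) -> None:
        subject, _, obj = triple
        # Index the triple under both entities so either one can retrieve it.
        self.edges[subject].append(triple)
        self.edges[obj].append(triple)

    def relevant_knowledge(self, user_input: str) -> str:
        """Text that would fill {history}: one sentence per relevant triple."""
        text = user_input.lower()
        seen, lines = set(), []
        for entity, triples in self.edges.items():
            if entity.lower() in text:
                for t in triples:
                    if t not in seen:
                        seen.add(t)
                        lines.append(f"{t[0]} {t[1]} {t[2]}.")
        return "\n".join(lines)


kg = KnowledgeGraphSketch()
# In LangChain an LLM extracts such triples from each message; here they are typed by hand.
kg.add_triple(("Christophe", "works in", "finance"))
kg.add_triple(("Christophe", "uses", "LangChain"))
kg.add_triple(("Sam", "is a friend of", "Christophe"))

print(kg.relevant_knowledge("What does Christophe do for a living?"))
# Christophe works in finance.
# Christophe uses LangChain.
# Sam is a friend of Christophe.
```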

To use this in Python, the following code sets up a ConversationChain with a custom prompt and a ConversationKGMemory:

from langchain.chains import ConversationChain
from langchain.llms import OpenAI
from langchain.memory import ConversationKGMemory
from langchain.prompts.prompt import PromptTemplate

llm = OpenAI(temperature=0)

template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know. The AI ONLY uses information contained in the "Relevant Information" section and does not hallucinate.

Relevant Information:

{history}

Conversation:
Human: {input}
AI:"""
prompt = PromptTemplate(input_variables=["history", "input"], template=template)

# ConversationKGMemory uses the LLM to extract knowledge triples and fills {history} with the relevant ones
conversation_with_kg = ConversationChain(
    llm=llm, verbose=True, prompt=prompt, memory=ConversationKGMemory(llm=llm)
)

Conclusion

In the ever-evolving world of AI and conversational technology, the implementation of conversational memory is a defining factor in creating engaging, human-like interactions.

LangChain provides a versatile toolkit for managing conversational memory, offering different methods such as:

ConversationBufferMemory
ConversationBufferWindowMemory
ConversationSummaryMemory and ConversationSummaryBufferMemory
ConversationKGMemory (knowledge graph memory)

→ each with unique attributes and applications.

The simplest forms, ConversationBufferMemory and ConversationBufferWindowMemory, pass past interactions verbatim: the former keeps the entire history, while the latter limits it to a window of the most recent interactions. The window variant addresses the need to minimize costs while maintaining contextual understanding.

On the other hand, ConversationSummaryBufferMemory presents a strategic balance by summarizing conversation history, thereby preserving essential context without a linear increase in costs. This approach is valuable for longer conversations where the balance between retaining context and managing costs is crucial.

Lastly, the more advanced ConversationKGMemory structures conversation data as a network of interconnected information, bridging the free flow of conversation with the precision of structured data storage. It reflects the ongoing effort to combine human-like engagement with computational efficiency.

Together, these tools empower developers to create chatbots that not only respond to queries but also remember, analyze, and engage in meaningful dialogue. Whether for simple Q&A, extended interactions, or in-depth analysis, LangChain’s memory management techniques offer tailored solutions to harness the full potential of large language models like OpenAI’s GPT-3.5 and GPT-4.


AI in Finance | Christophe Atten | Substack


christopheatten.substack.com