Christophe Atten

Summary

The provided content discusses the enhancement of chatbots with memory capabilities using the Langchain library to create personalized and context-aware interactions with large language models like OpenAI's GPT-3.5 and GPT-4.

Abstract

The article delves into the significance of conversational memory in chatbots, emphasizing the importance of remembering past interactions to provide relevant and engaging responses. It introduces the Langchain library as a tool for managing memory in large language models, detailing various memory types such as ConversationBufferMemory, ConversationBufferWindowMemory, ConversationSummaryBufferMemory, and ConversationKnowledgeGraphMemory. Each memory type serves different purposes, from simply storing interaction history to summarizing conversations for cost efficiency and even constructing knowledge graphs for complex data structuring. The article also touches on the cost implications of using these memory systems, particularly with respect to the token-based pricing model of language models like GPT-3.5 and GPT-4.

Opinions

  • The author posits that a good chatbot should be able to engage in conversations as insightfully as a human by remembering past exchanges and understanding context.
  • The article suggests that memory management is crucial for chatbots to deliver personalized interactions, and it highlights the Langchain library as an effective solution for this purpose.
  • The author implies that while simpler memory systems like ConversationBufferMemory and ConversationBufferWindowMemory are useful, they may not be cost-effective for long conversations due to the token-based pricing model of language models.
  • The use of ConversationSummaryBufferMemory is presented as a strategic approach to balance context retention with cost management by summarizing conversation history.
  • The author expresses that ConversationKnowledgeGraphMemory is particularly valuable for extracting, preserving, and retrieving structured information from conversations, which is essential for complex interactions and data analysis.
  • The conclusion of the article underscores the importance of conversational memory in creating chatbots that can engage in meaningful dialogue, with Langchain's memory management techniques being key to leveraging the full potential of large language models.

Enhancing Chatbots with Memory: Promoting better customer experience

The power of memory management in chatbots and building personalized interactions using Langchain

Photo by Natasha Connell on Unsplash

What makes a good chatbot in your opinion?

A good chatbot is as engaging as an insightful human conversation. Its ability to remember past exchanges, draw context, and deliver relevant responses is pivotal.

But how does one enable such memory and contextual understanding in a machine, specifically in large language models like OpenAI’s GPT-3.5 and GPT-4?

The answer is conversational memory: a chatbot’s ability to handle successive inquiries as one dialogue. It makes the interaction seamless and meaningful; without it, every question would be processed as a completely separate input, ignoring any previous exchanges.

This article aims to explore how to expand and condense conversational memory in large language models using the Langchain library, demonstrating techniques with Python code snippets.

The Mechanics of Memory in AI

Artificial Intelligence memory is not akin to biological memory.

Instead, memory in language models, such as those offered by OpenAI, is a crucial component in understanding the context and the flow of conversation.

By default, LLMs (large language models) are stateless: each request from the user (the prompt) is processed independently of earlier ones. Langchain therefore offers several types of memory that carry context from one request to the next.

There are several options, all of which can be plugged into a ConversationChain:

  1. ConversationBufferMemory
  2. ConversationBufferWindowMemory
  3. ConversationSummaryBufferMemory
  4. ConversationKnowledgeGraphMemory

What options do I have for Conversational Memory?

ConversationBufferMemory

The simplest form of conversational memory in LangChain is the ConversationBufferMemory. It stores the previous interactions between the human and the AI verbatim and injects them directly into the {history} parameter of the prompt.

In Python, you only need these few lines of code:

from langchain.chains import ConversationChain
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory

# Assumes OPENAI_API_KEY is set in the environment
llm = OpenAI(temperature=0)

conversation_buffer_memory = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory()
)

conversation_buffer_memory("My name is Christophe, can you be my AI Assistant?")

From now on, every interaction is stored in the buffer memory, and the history keeps growing until the prompt exceeds the model’s context window. As a result, the large language model takes the questions and answers from previous interactions into account when generating its most recent answer.

To extract the past interactions you can simply access the buffer with the following line of code:

print(conversation_buffer_memory.memory.buffer)
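
To make the mechanism concrete, here is a minimal sketch in plain Python of what a buffer memory does conceptually. It is not LangChain's implementation: the class SimpleBufferMemory, the function build_prompt, and the prompt template are hypothetical names chosen for illustration.

```python
# Illustrative sketch only: a hand-rolled buffer memory, not LangChain's code.
PROMPT_TEMPLATE = (
    "The following is a conversation between a human and an AI.\n\n"
    "{history}\n"
    "Human: {input}\n"
    "AI:"
)


class SimpleBufferMemory:
    """Stores every exchange verbatim and renders it as one history string."""

    def __init__(self):
        self.turns = []  # list of (human_text, ai_text) tuples

    def save(self, human_text, ai_text):
        self.turns.append((human_text, ai_text))

    @property
    def buffer(self):
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


def build_prompt(memory, user_input):
    """Fill the template with the full history plus the new user input."""
    return PROMPT_TEMPLATE.format(history=memory.buffer, input=user_input)


memory = SimpleBufferMemory()
memory.save("My name is Christophe.", "Nice to meet you, Christophe!")
prompt = build_prompt(memory, "What is my name?")
print(prompt)
```

Because the whole history is replayed in every prompt, the model can answer the second question, but the prompt grows with each exchange.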

ConversationBufferWindowMemory

Next, we have the buffer window memory from Langchain. It is quite similar to the first one, but instead of taking the whole history into account, it only considers the last k interactions (for example, the last 3) when generating the most recent response.

In order to be able to use the ConversationBufferWindowMemory, we can take the following code:

from langchain.memory import ConversationBufferWindowMemory

conversation_buffer_window_memory = ConversationChain(
    llm=llm,
    memory=ConversationBufferWindowMemory(k=3)
)

In this code, we need to specify k, the size of the window: the number of most recent interactions (one human message plus the AI reply) to keep. Older interactions are dropped from the history.

Why would we consider removing knowledge from the history by setting k to a relatively small number?
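
The windowing behaviour can be sketched in plain Python with a bounded deque. This is an illustration of the idea under simplifying assumptions, not LangChain's implementation; SimpleWindowMemory is a hypothetical name.

```python
from collections import deque


class SimpleWindowMemory:
    """Keeps only the last k exchanges; older ones are dropped automatically."""

    def __init__(self, k):
        # A deque with maxlen discards the oldest item once it is full.
        self.turns = deque(maxlen=k)

    def save(self, human_text, ai_text):
        self.turns.append((human_text, ai_text))

    @property
    def buffer(self):
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


window_memory = SimpleWindowMemory(k=3)
for i in range(1, 6):
    window_memory.save(f"question {i}", f"answer {i}")

# Only exchanges 3, 4 and 5 remain in the history.
print(window_memory.buffer)
```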

With models like OpenAI’s GPT-3.5 or GPT-4, fees are determined based on the size of the prompt, measured in tokens. The larger the prompt, the higher the fees. In an effort to minimize these costs, one strategy is to limit the historical interactions to only the most recent exchanges. However, this approach has its drawbacks. For instance, information from earlier in the conversation may be pertinent to a current question, so how can this issue be addressed while still reducing fees?

ConversationSummaryBufferMemory might be the next suitable solution in line for you. By utilizing this method, the conversation’s history can be summarized, preserving the essential information while also reducing the overall token count. This not only helps in retaining the context and relevance of the entire conversation but also plays a role in managing costs. It strikes a balance between limiting the prompt size and ensuring that valuable information from earlier interactions remains accessible.
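
The cost trade-off between a full buffer and a window can be made tangible with some back-of-the-envelope arithmetic. The numbers below are assumptions chosen for illustration (100 tokens per exchange, 50 tokens for instructions plus the new question), not measurements, and the helper prompt_tokens is hypothetical.

```python
TOKENS_PER_EXCHANGE = 100  # assumed average size of one human + AI exchange
BASE_PROMPT_TOKENS = 50    # assumed size of instructions plus the new question


def prompt_tokens(turn, k=None):
    """Prompt size at a given turn (1-based); k limits history to the last k exchanges."""
    past_exchanges = turn - 1
    if k is not None:
        past_exchanges = min(past_exchanges, k)
    return BASE_PROMPT_TOKENS + past_exchanges * TOKENS_PER_EXCHANGE


for turn in (1, 5, 20):
    # Columns: turn, full-buffer prompt size, window (k=3) prompt size
    print(turn, prompt_tokens(turn), prompt_tokens(turn, k=3))
```

Under these assumptions the full buffer sends 1950 tokens at turn 20, while the window stays at 350 tokens per prompt, at the price of forgetting everything older than three exchanges.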

ConversationSummaryBufferMemory

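With ConversationBufferMemory, it is possible to rapidly consume a significant number of tokens, even surpassing the context window limit of today’s most advanced LLMs. To avoid this excessive consumption of tokens, ConversationSummaryMemory can be used. (LangChain also ships a ConversationSummaryBufferMemory, a related variant that keeps the most recent exchanges verbatim and summarizes only the older ones; the code in this section uses the plain ConversationSummaryMemory.)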

Hence, this type of memory condenses the history of the conversation before transmitting it to the {history} parameter, as implied by its name.

The ConversationChain is initialized with the summary memory using the following Python code. Do not forget to create an LLM, as the ConversationSummaryMemory needs one as input to perform the summarization.

from langchain.memory import ConversationSummaryMemory

conversation_summary_buffer_memory = ConversationChain(
 llm=llm,
 memory=ConversationSummaryMemory(llm=llm)
)

This allows us to condense each new interaction and augment it into an ongoing summary of all prior interactions.
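
The running-summary idea can be sketched without any LLM. In the example below, fake_summarize is a hypothetical stand-in for the summarization call and SimpleSummaryMemory is an illustrative class, not LangChain's implementation. The stub only keeps the first sentence of each message so that the example runs offline; a real LLM would rewrite and compress the whole summary.

```python
def first_sentence(text):
    """Return the text up to the first period (crude, for illustration only)."""
    return text.split(".")[0].strip()


def fake_summarize(previous_summary, human_text, ai_text):
    """Hypothetical stand-in for an LLM call that returns an updated summary."""
    new_line = (
        f"The human said: {first_sentence(human_text)}. "
        f"The AI replied: {first_sentence(ai_text)}."
    )
    return f"{previous_summary} {new_line}".strip()


class SimpleSummaryMemory:
    """Keeps one running summary instead of the raw exchanges."""

    def __init__(self, summarize):
        self.summarize = summarize
        self.summary = ""

    def save(self, human_text, ai_text):
        # Fold the newest exchange into the existing summary.
        self.summary = self.summarize(self.summary, human_text, ai_text)


summary_memory = SimpleSummaryMemory(summarize=fake_summarize)
summary_memory.save("My name is Christophe. I work in finance.", "Nice to meet you. How can I help?")
summary_memory.save("I need help with Python. It is urgent.", "Sure. What is the problem?")
print(summary_memory.summary)
```

The key point is the shape of the update: the new summary is computed from the previous summary plus the latest exchange, so the raw history never has to be sent again.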

You may think that the token count for this type of conversation is higher than when utilizing ConversationBufferMemory, as we are not only responding to the last user prompt but also using the LLM to summarize.

So you may wonder why we do this rather than using the BufferWindowMemory or simply the BufferMemory.

The answer lies in longer interactions. As the number of interactions grows over time, so does the history, leading to escalating costs as the prompt continues to expand. Summarization presents a different scenario. Though the initial costs are higher because every turn requires an extra summarization call, it eventually results in substantially lower costs in the long run, even though two LLM calls are made per turn. This approach ensures that the conversation can continue over an extended period without a linearly growing prompt, making it a more efficient option for extended dialogues.
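
A rough cost model shows where the break-even point can lie. All numbers are assumptions for illustration: 100 tokens per exchange, 50 tokens of instructions plus the new question, and a running summary that never exceeds 150 tokens. Only input tokens are counted, and the function names are hypothetical.

```python
EXCHANGE_TOKENS = 100     # assumed size of one human + AI exchange
BASE_TOKENS = 50          # assumed instructions plus the new question
SUMMARY_CAP_TOKENS = 150  # assumed upper bound on the running summary


def buffer_turn_cost(turn):
    """Input tokens at a given turn when the full history is replayed."""
    return BASE_TOKENS + (turn - 1) * EXCHANGE_TOKENS


def summary_turn_cost(turn):
    """Input tokens at a given turn with a running summary (two LLM calls)."""
    summary = min((turn - 1) * EXCHANGE_TOKENS, SUMMARY_CAP_TOKENS)
    answer_call = BASE_TOKENS + summary
    summarize_call = summary + EXCHANGE_TOKENS  # old summary + newest exchange
    return answer_call + summarize_call


def cumulative(cost_fn, turns):
    """Total input tokens over the first `turns` turns."""
    return sum(cost_fn(t) for t in range(1, turns + 1))


for turns in (3, 8, 30):
    # Columns: turns, cumulative buffer tokens, cumulative summary tokens
    print(turns, cumulative(buffer_turn_cost, turns), cumulative(summary_turn_cost, turns))
```

Under these assumptions the summary approach costs more for short conversations (950 versus 450 tokens after 3 turns), breaks even at turn 8, and is far cheaper after 30 turns (13100 versus 45000 tokens). The actual break-even point depends on message length and on how well the summary compresses.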

ConversationKnowledgeGraphMemory

Conversation Knowledge Graph Memory is an advanced type of memory that uses a knowledge graph to store and retrieve knowledge triples (subject, predicate, object) extracted from a conversation. It relies on the large language model (LLM) to identify and extract entities and knowledge triples from the conversational content.

This specialized memory system is invaluable when the task involves the extraction, preservation, and recovery of ordered information from conversational content, translating it into a knowledge graph. Whether the purpose is analyzing data, grasping the context, or preserving information for subsequent use, Conversation Knowledge Graph Memory serves as an optimized and effective means of handling dialogue information. It stands as a vital bridge linking the fluidity of casual conversation with the rigidity of structured data management, acting as a robust instrument for those aspiring to maximize the capabilities of their conversational interfaces.

In order to use this with Python, you need to execute the following code.

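from langchain.chains import ConversationChain
from langchain.llms import OpenAI
from langchain.memory import ConversationKGMemory
from langchain.prompts.prompt import PromptTemplate

# Assumes OPENAI_API_KEY is set in the environment
llm = OpenAI(temperature=0)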

template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. 
If the AI does not know the answer to a question, it truthfully says it does not know. The AI ONLY uses information contained in the "Relevant Information" section and does not hallucinate.

Relevant Information:

{history}

Conversation:
Human: {input}
AI:"""
prompt = PromptTemplate(input_variables=["history", "input"], template=template)
conversation_with_kg = ConversationChain(
    llm=llm, verbose=True, prompt=prompt, memory=ConversationKGMemory(llm=llm)
)
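
To illustrate what a knowledge triple store does, here is a minimal sketch in plain Python. It is not LangChain's implementation: in ConversationKGMemory the LLM extracts the triples, whereas here they are written by hand, and SimpleTripleStore with its methods is a hypothetical name chosen for this example.

```python
class SimpleTripleStore:
    """Stores (subject, predicate, object) triples and looks them up by entity."""

    def __init__(self):
        self.triples = []

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        if triple not in self.triples:  # avoid storing duplicates
            self.triples.append(triple)

    def about(self, entity):
        """Return all triples whose subject or object equals the entity."""
        key = entity.lower()
        return [t for t in self.triples if key in (t[0].lower(), t[2].lower())]

    def relevant_information(self, question):
        """Render the triples whose subject or object is mentioned in the question."""
        text = question.lower()
        lines = []
        for subject, predicate, obj in self.triples:
            # Naive substring match; a real system would extract entities first.
            if subject.lower() in text or obj.lower() in text:
                lines.append(f"{subject} {predicate} {obj}")
        return "\n".join(lines)


kg = SimpleTripleStore()
kg.add("Christophe", "works in", "finance")
kg.add("Christophe", "uses", "Langchain")
kg.add("Langchain", "is a", "Python library")

print(kg.relevant_information("What do you know about Langchain?"))
```

Only the facts connected to the entities in the question are returned, which is the kind of text that would fill the "Relevant Information" section of the prompt instead of the full conversation history.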

Conclusion

In the ever-evolving world of AI and conversational technology, the implementation of conversational memory is a defining factor in creating engaging, human-like interactions.

Langchain provides a versatile library for managing conversational memory, offering different methods such as:

  1. ConversationBufferMemory
  2. ConversationBufferWindowMemory
  3. ConversationSummaryBufferMemory
  4. ConversationKnowledgeGraphMemory

→ each with unique attributes and applications.

The simplest forms, ConversationBufferMemory and ConversationBufferWindowMemory, retain interactions verbatim: the former keeps the entire history, while the latter limits it to a window of the most recent interactions. The window caters to needs around minimizing costs while maintaining recent context.

On the other hand, ConversationSummaryBufferMemory presents a strategic balance by summarizing conversation history, thereby preserving essential context without a linear increase in costs. This approach is valuable for longer conversations where the balance between retaining context and managing costs is crucial.

Lastly, the advanced Conversation Knowledge Graph Memory offers a method to meticulously structure conversation data within a network of interconnected information, bridging the free flow of conversation with the precision of structured data storage. It highlights the continuous efforts to bridge human-like engagement with computational efficiency.

Together, these tools empower developers to create chatbots that not only respond to queries but remember, analyze, and engage in meaningful dialogue. Whether for simple Q&A, extended interaction, or in-depth analysis, Langchain’s memory management techniques offer tailored solutions to harness the full potential of large language models like OpenAI’s GPT-3.5 and GPT-4.
