Valentina Alto

Summary

The provided content discusses the implementation of LangChain Agents using Azure OpenAI and Python, which enables Large Language Models (LLMs) to interact with external tools and databases to retrieve and act on information beyond their training data.

Abstract

The article "Introducing LangChain Agents" explores the integration of LLMs with a suite of tools, such as search engines and databases, to create agents capable of reasoning and executing actions to fulfill user requests. It highlights the limitations of standalone LLMs, such as GPT-3.5-turbo, in accessing up-to-date information and introduces the concept of agents to overcome this. The implementation uses LangChain, a lightweight SDK that simplifies the integration of LLMs into applications. The article details two types of agents: Action Agents, which perform tasks step by step, and Plan and Execute Agents, which plan actions ahead of time. It also references the ReAct prompt engineering technique and the Plan-and-Solve prompting approach, emphasizing the importance of these methods in enhancing the capabilities of LLMs. The article provides a practical demonstration of building a simple Action Agent using Azure OpenAI chat models and LangChain, showcasing how the agent can use tools like the Bing Search API to find information that is not within the LLM's knowledge base, such as the ending of the movie "Avatar: The Way of Water." The agent's ability to reason and act is demonstrated through a detailed example, where it successfully retrieves the requested information and provides a final answer. The article concludes by suggesting that the integration of agents with LLMs opens up new possibilities for innovative applications and use cases, as it allows LLMs to act upon the world in a more dynamic and informed manner.

Opinions

  • The author believes that the current state of LLMs is limited by their static knowledge base and that the integration of agents is a necessary step to enhance their utility.
  • There is an emphasis on the potential of agents to perform complex tasks by planning and executing a series of actions, as seen in the Plan-and-Solve prompting approach.
  • The author positively views the LangChain SDK as a powerful tool for integrating LLMs with external tools, simplifying the development process for creating agents.
  • The use of the Bing Search API as a tool for agents is presented as a practical solution for accessing current information, indicating a favorable opinion towards this Microsoft service.
  • The article suggests that the combination of reasoning and acting capabilities in LLMs, facilitated by agents, is a significant advancement in the field of AI.
  • The author implies that the future of AI applications will be shaped by the ability of LLMs to interact with the environment through the use of agents, highlighting excitement for the possibilities this technology enables.

Introducing LangChain Agents

An implementation with Azure OpenAI and Python

Large Language Models (LLMs) like GPT-3.5-turbo (the model behind ChatGPT) and GPT-4 have been demonstrating their generative power over the last few months.

However, as they stand today, they suffer from a limitation: they are restricted to the knowledge they were trained on plus whatever additional knowledge is provided as context. As a result, if a useful piece of information is missing from that knowledge, the model cannot “go around” and try to find it in other sources.

This is the reason why we need to introduce the concept of Agents. Agents can be seen as applications powered by LLMs and integrated with a set of tools like search engines, databases, websites, and so on. Within an agent, the LLM is the reasoning engine that, based on the user input, is able to plan and execute a set of actions that are needed to fulfill the request.

The concept of reasoning and acting is also the basis of a new prompt engineering technique called ReAct (Reason and Act), introduced by Yao et al. (you can read the full paper here).

In this article, we are going to see an implementation of an Agent powered by Azure OpenAI chat models. To do so, we will use LangChain, a powerful lightweight SDK which makes it easier to integrate and manage LLMs within applications.

Building your first Agent

According to LangChain’s documentation, there are two main types of Agents we can build, which correspond to two different prompt engineering techniques:

  • Action Agent → this type of agent decides on an action and then executes it one step at a time. For example, given the user input, the agent will reason that it will probably find the required information on Wikipedia, so it will search that site; then, if the required information is not found in the parsed page, it will decide to parse another site, and so on, until it has gathered what it needs to generate a response. This approach of retrieving documents, reading them, and deciding whether to retrieve more is called Retrieve-Read-Retrieve;
  • Plan and Execute Agent → this type of agent follows a more sophisticated approach that is useful for complex tasks: it plans ahead all the actions needed to accomplish the user’s request and then executes them. This approach is known as Plan-and-Solve prompting and was introduced by Wang et al. (you can read the paper here). A rough sketch of what such an agent looks like in code follows right after this list.
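
For reference, here is a minimal, assumed sketch of a Plan and Execute agent. It relies on LangChain’s experimental plan-and-execute module (the exact import path has moved between versions) and reuses the llm and tools objects we will build in the next section, so treat it as an illustration rather than this article’s implementation:

from langchain.experimental.plan_and_execute import PlanAndExecute, load_agent_executor, load_chat_planner

# The planner asks the LLM to lay out all the required steps up front...
planner = load_chat_planner(llm)
# ...while the executor carries out each planned step using the available tools
executor = load_agent_executor(llm, tools, verbose=True)

plan_and_execute_agent = PlanAndExecute(planner=planner, executor=executor, verbose=True)
plan_and_execute_agent.run("Find the ending scene of Avatar 2 and summarize it in three bullet points")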

Let’s start with a simple Action Agent.

As a first step, we need to import all the necessary libraries. The code mainly comes from the demo notebook published by LangChain, available here.

import os
import re
from typing import List, Union

from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
from langchain.prompts import StringPromptTemplate
from langchain.chat_models import AzureChatOpenAI
from langchain import LLMChain
from langchain.utilities import BingSearchAPIWrapper
from langchain.schema import AgentAction, AgentFinish

Then, we need to initialize the main components of the Agent as follows:

  • Tools. Tools are the actions that the agent can take. In our case, we will give our Agent the ability to browse the web via the Bing Search API available in Azure (you can create one for free here; once you have created the resource, you will need to set your API key and endpoint URL as environment variables or pass them as parameters when calling the module), using the LangChain wrapper BingSearchAPIWrapper. A quick sanity check of the wrapper follows the snippet below.
os.environ["BING_SEARCH_URL"] = "xxx"
os.environ["BING_SUBSCRIPTION_KEY"] = "xxx"

search = BingSearchAPIWrapper()

tools = [
    Tool(
        name = "Search",
        func=search.run,
        description="useful for when you need to answer questions about current events"
    )
]
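
Before wiring the tool into the agent, we can sanity-check the wrapper directly (the query below is just an assumed example); it returns a single string with the top Bing snippets concatenated:

print(search.run("Avatar The Way of Water release date"))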
  • Prompt template. It takes the user’s input and puts it into a more useful format depending on our goal. In this case, we want the agent to repeatedly run a reasoning procedure of the form: process the question, think about what action to take, take that action, reason about the output of the action, and evaluate whether you already have the answer or need to repeat the cycle. The template string, the custom class that formats it, and its instantiation follow below.
# Set up the base template
template = """Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin! Remember to speak as a pirate when giving your final answer. Use lots of "Arg"s

Question: {input}
{agent_scratchpad}"""

# Set up a prompt template
class CustomPromptTemplate(StringPromptTemplate):
    # The template to use
    template: str
    # The list of tools available
    tools: List[Tool]
    
    def format(self, **kwargs) -> str:
        # Get the intermediate steps (AgentAction, Observation tuples)
        # Format them in a particular way
        intermediate_steps = kwargs.pop("intermediate_steps")
        thoughts = ""
        for action, observation in intermediate_steps:
            thoughts += action.log
            thoughts += f"\nObservation: {observation}\nThought: "
        # Set the agent_scratchpad variable to that value
        kwargs["agent_scratchpad"] = thoughts
        # Create a tools variable from the list of tools provided
        kwargs["tools"] = "\n".join([f"{tool.name}: {tool.description}" for tool in self.tools])
        # Create a list of tool names for the tools provided
        kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])
        return self.template.format(**kwargs)
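
The llm_chain defined later in the article expects a prompt object; following the LangChain demo notebook this code is based on, the custom template is presumably instantiated as follows:

prompt = CustomPromptTemplate(
    template=template,
    tools=tools,
    # agent_scratchpad, tools and tool_names are filled in dynamically by format(),
    # so they are not declared as input variables here
    input_variables=["input", "intermediate_steps"]
)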
  • Large Language Models. Here we need to set up the reasoning engine of our Agent. In our case, we will use a gpt-3.5-turbo model available in the Azure OpenAI service (to set up an Azure OpenAI instance, you can read my former article here). A quick check that the deployment responds follows the snippet below.
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "xxx"
os.environ["OPENAI_API_KEY"] = "xxx"

llm = AzureChatOpenAI(deployment_name="gpt-35-turbo", openai_api_version="2023-03-15-preview")
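
As an assumed quick sanity check that the Azure deployment is reachable, we can call the chat model directly with a single message before plugging it into the agent:

from langchain.schema import HumanMessage

# Returns an AIMessage with the model's reply
print(llm([HumanMessage(content="In one sentence, what is an AI agent?")]))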
  • Output parser. This component parses the output of our LLM into either an AgentAction or an AgentFinish object. If the agent returns the final answer, its cycle terminates and the output is parsed into an AgentFinish; otherwise, the output is parsed into an AgentAction that carries an action (the tool to use) and an action_input (the input given to that tool), so that the cycle can continue until we reach the final answer. A small parsing example follows the class definition below.
class CustomOutputParser(AgentOutputParser):
    
    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        # Check if agent should finish
        if "Final Answer:" in llm_output:
            return AgentFinish(
                # Return values is generally always a dictionary with a single `output` key
                # It is not recommended to try anything else at the moment :)
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )
        # Parse out the action and action input
        regex = r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
        match = re.search(regex, llm_output, re.DOTALL)
        if not match:
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2)
        # Return the action and action input
        return AgentAction(tool=action, tool_input=action_input.strip(" ").strip('"'), log=llm_output)

output_parser = CustomOutputParser()
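
To see what the parser does, here is an assumed sample completion in the Thought/Action/Action Input format and the AgentAction it produces:

sample_output = (
    "Thought: I should look this up on the web\n"
    "Action: Search\n"
    "Action Input: Avatar 2 ending scene"
)
print(output_parser.parse(sample_output))
# AgentAction(tool='Search', tool_input='Avatar 2 ending scene', log='Thought: ...')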
  • Agent. It is the “wrapper” of everything mentioned above: the application that combines the reasoning logic of the LLM with the ability to “move around and do things” using the tools provided.
# LLM chain consisting of the LLM and a prompt
llm_chain = LLMChain(llm=llm, prompt=prompt)
tool_names = [tool.name for tool in tools]
agent = LLMSingleActionAgent(
    llm_chain=llm_chain, 
    output_parser=output_parser,
    stop=["\nObservation:"], 
    allowed_tools=tool_names
)
  • Agent executor. Finally, we need the AgentExecutor class, which calls the agent and the tools in a loop until a final answer is provided.
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
agent_executor.run("What happens in the ending scene of the movie Avatar 2 (Avatar: The Way of Water)?")

As you can see, I asked my Agent to tell me about the final scene of Avatar: The Way of Water (spoiler alert: if you haven’t watched the movie, stop reading!). The movie is dated December 2022, so it is not included in the knowledge base of ChatGPT and gpt-3.5-turbo. Let’s see the reasoning and output:

> Entering new AgentExecutor chain...
Thought: I am not sure about the ending scene of Avatar 2 as it is not released yet.
Action: Search
Action Input: "Avatar 2 ending scene"

Observation:<b>Avatar</b>: The Way of Water <b>Ending</b> Explained After a years-long absence, “The Sky People” (aka Earthlings) return to Pandora with full military might to plunder its natural resources for their own... The warring culminates in a final battle where Jake Sully and the cloned Colonel Miles Quaritch, along with Neytiri and Tuk, are trapped on a sinking marine hunting vessel. Sully and Quaritch <b>end</b>... How does Avatar 2 end? Sam Worthington in<b> Avatar: The Way of Water</b> (Image credit: 20th Century Studios) The fight between the Sullys and Quaritch takes place on the water, with a tulkun that has bonded with Lo&#39;ak helping them to level the playing field. “All energy is borrowed,” said the legend in <b>Avatar</b>. The movie ends with Neteyam’s funeral, with the golden tentacles of the seabed gently wrapping around him and embracing his energy for the... Watch the Official Trailer of<b> AVATAR 2: The Way Of Water</b> – only on theaters Dec 16th: https://www.youtube.com/watch?v=d9MyW... About AVATAR: On the lush alien world of Pandora live the Na&#39;vi ... The final <b>scene</b> of Titanic is either a dream or (more likely) a lovely afterlife in which Rose’s soul joins all those who died on that cold April night in 1912. At the bottom of the sea, the... 662K views <b>2</b> months ago <b>Avatar</b> The Way Of Water <b>Ending</b>, <b>End</b> Credit <b>Scene</b> Explained. <b>Avatar</b> 3 Teaser. Quaritch <b>Ending</b>, Jake Sully, Kiri Powers Explained &amp; <b>Avatar</b> 3 Trailer... James Cameron&#39;s visually spectacular <b>Avatar</b> <b>2</b> is 2022&#39;s most successful movie, ... &#39;<b>Avatar</b>: The Way of Water&#39; <b>Ending</b> Explained, ... it doesn&#39;t have a mid- or post-credits <b>scene</b>. Sadly, there’s no <b>end</b>-credits <b>scene</b> in <b>Avatar</b> <b>2</b>, but there’s a compelling reason for that. In this article, we&#39;ll discuss why an <b>Avatar</b> <b>2</b> <b>end</b>-credit <b>scene</b> doesn&#39;t fit in the vision...Based on my search, the ending scene of Avatar 2 involves Neteyam's funeral and the golden tentacles of the seabed embracing his energy. There is no end-credits scene in Avatar 2.
Final Answer: The ending scene of Avatar 2 involves Neteyam's funeral and the golden tentacles of the seabed embracing his energy. There is no end-credits scene in Avatar 2.

> Finished chain.

"The ending scene of Avatar 2 involves Neteyam's funeral and the golden tentacles of the seabed embracing his energy. There is no end-credits scene in Avatar 2."

As you can see, the first thought is that our LLM has no embedded knowledge about Avatar 2, as the movie was not yet released when its knowledge base was collected in 2021. So it plans an action to find the information on the web and then produces the correct answer.

Conclusions

If ChatGPT’s 2021 knowledge cutoff was a problem, with Agents we are entering a new phase of LLMs. They are becoming extremely useful thanks not only to their ability to integrate with external tools and sources, but also to their ability to reason step by step to solve complex tasks. Note also that we limited this article to a single tool, yet you can provide a whole set of tools, which you can find here. For example, we might also have connected our Agent to our local machine with the File System tools and saved the output as a txt file in our working directory; a rough sketch of that idea follows.
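
As a minimal, assumed sketch of that idea (using LangChain’s file-management tools and a hypothetical file name), the agent’s final answer could be written to disk like this:

from langchain.tools.file_management import WriteFileTool

# Run the agent and persist its final answer to a text file in the working directory
write_file = WriteFileTool()
answer = agent_executor.run("What happens in the ending scene of Avatar: The Way of Water?")
write_file.run({"file_path": "avatar_ending.txt", "text": answer})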

With Agents, we are adding the “acting” capability to Large Language Models, paving the way to a new wave of use cases and innovative applications.
