zhaozhiming

Unveiling the Functionality of ChatGLM3-6B

In the previous article, we deployed ChatGLM3-6B and tried out its new features, but that left open how those features are actually implemented. Today, let’s look at two of them in detail: tool invocation and the code interpreter.

Adding Custom Tools

The official documentation outlines the process for adding new tools to enhance the model’s capabilities:

New tools can be added by registering them in tool_registry.py. This can be done simply by using the @register_tool decorator. For tool declaration, the function name becomes the tool's name, and the function docstring serves as the tool's description. For tool parameters, use Annotated[typ: type, description: str, required: bool] to annotate the type, description, and whether they are mandatory.

Let’s try adding a web search tool built on SerpApi, a web search API that supports several search engines, including Google, Baidu, and Bing. To use it, first register an account on the SerpApi website and copy the API key from your account settings; every search request needs this key. Next, install the SerpApi Python library: pip install google-search-results. Now we can write the tool code.

We add a web_search tool in tool_registry.py:

@register_tool
def web_search(
    query: Annotated[str, 'The query text to be queried', True],
) -> str:
    """
    Search the result for input `query` from web
    """
    from serpapi import GoogleSearch

    search = GoogleSearch({
        "q": query,
        "gl": "cn",
        "location": "China",
        "output": "json",
        "api_key": "your serpapi api key"
    })
    try:
        query_result = search.get_dict()
        result = process_response(query_result)
        return result
    except Exception:
        import traceback
        # Return the traceback as text so the model can see why the search failed.
        ret = "Error encountered while searching!\n" + traceback.format_exc()
        return ret

In the code, we use the @register_tool decorator to register our tool, and the tool’s parameters are annotated with Annotated. We then call SerpApi to run the web search. The process_response method parses the search results and returns the content of the first usable result. For a full implementation, refer to LangChain’s source code.
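
As a rough idea of what process_response might look like, here is a simplified sketch. It is not LangChain's code: I assume the response dictionary may contain an "error" string, an "answer_box" with "answer" or "snippet", and a list of "organic_results" that each may carry a "snippet". The fallback message is also my own choice.

```python
def process_response(res: dict) -> str:
    """Return the first usable piece of text from a SerpApi-style result dict."""
    if "error" in res:
        raise ValueError(f"Search API returned an error: {res['error']}")

    # Prefer a direct answer when the search engine provides one.
    answer_box = res.get("answer_box") or {}
    if isinstance(answer_box, list):
        answer_box = answer_box[0] if answer_box else {}
    for key in ("answer", "snippet"):
        if answer_box.get(key):
            return str(answer_box[key])

    # Otherwise fall back to the first organic result that has a snippet.
    for item in res.get("organic_results", []):
        if item.get("snippet"):
            return str(item["snippet"])

    return "No good search result found"


sample = {"organic_results": [{"title": "no snippet here"}, {"snippet": "Beijing is the capital of China."}]}
print(process_response(sample))
```

Returning a plain string keeps the tool's output easy to pass back to the model as an observation.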

After adding the code, restart the WebUI service and try the new tool to see how it performs.

Using API Interface for Tool Invocation

Next, let’s explore how tools are invoked through the API. In the messages parameter of an API request, a message can now carry metadata and tools in addition to role and content: metadata holds the name of the specific tool, and tools is the list of all available tools. ChatGLM3 essentially borrows this design from ChatGPT’s Function Calling feature, where metadata and tools correspond to function_call and functions, respectively.

In the initial request, we need to pass the tools parameter to inform the LLM of the available tools. Each element in the tools parameter has several attributes:

  • name: Tool name
  • description: Tool description
  • parameters: Tool parameters, including each parameter’s type, description, and whether it is mandatory (two formats are accepted, as shown below)

# Format 1
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for `city_name`",
        "parameters": [
            {
                "name": "city_name",
                "description": "The name of the city to be queried",
                "type": "str",
                "required": True
            }
        ]
    }
]
# Format 2
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for `city_name`",
        "parameters": {
            "type": "object",
            "properties": {
                "city_name": {
                    "description": "The name of the city to be queried"
                }
            },
            "required": ["city_name"]
        }
    }
]
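
To show how the two formats relate, here is a small sketch that converts a Format 1 description into the Format 2 layout. The function name to_schema_format is hypothetical, and copying each parameter's "type" into the property is my assumption, since the Format 2 example above lists only a description.

```python
def to_schema_format(tool: dict) -> dict:
    """Convert a Format 1 tool description into the Format 2 layout."""
    properties = {}
    required = []
    for param in tool["parameters"]:
        properties[param["name"]] = {
            "description": param["description"],
            "type": param["type"],  # assumption: keep the type alongside the description
        }
        if param.get("required"):
            required.append(param["name"])
    return {
        "name": tool["name"],
        "description": tool["description"],
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


format_1_tool = {
    "name": "get_weather",
    "description": "Get the current weather for `city_name`",
    "parameters": [
        {
            "name": "city_name",
            "description": "The name of the city to be queried",
            "type": "str",
            "required": True,
        }
    ],
}
print(to_schema_format(format_1_tool))
```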

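Next, we send the API request from Python using OpenAI’s Python library. The code below uses the library’s pre-1.0 interface (openai.api_base and openai.ChatCompletion), which was removed in version 1.0, so install an earlier release: pip install "openai<1".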

import openai

# Setting OpenAI parameters
openai.api_base = "http://localhost:7861/v1"
openai.api_key = "xxx"
system_info = {
    "role": "system",
    "content": "Answer the following questions as best as you can. You have access to the following tools:",
    "tools": tools,
}
messages = [
    system_info,
    {
        "role": "user",
        "content": "Help me check the weather in Beijing",
    }
]
response = openai.ChatCompletion.create(
    model="chatglm3",
    messages=messages,
    temperature=0,
    return_function_call=True
)

  • We point the OpenAI client at the local ChatGLM3 API address. Because the model runs locally, no real OpenAI api_key is needed; any placeholder string will do.
  • We pass the system role’s prompt, which includes the tools parameter.
  • We pass the user role’s message as usual, with the role and content parameters.
  • We send a ChatCompletion request. Note that model must be set to chatglm3, and return_function_call must be set to True so that the LLM can invoke tools.

After sending the initial request, let’s look at how the tool is invoked:

import json
from tool_register import dispatch_tool

function_call = json.loads(response.choices[0].message.content) # Returns information of `get_weather` tool
tool_response = dispatch_tool(function_call["name"], function_call["parameters"])
messages = response.choices[0].history  # Retrieve conversation history
messages.append(
    {
        "role": "observation",
        "content": tool_response,  # Tool execution result
    }
)
response = openai.ChatCompletion.create(
    model="chatglm3",
    messages=messages,
    temperature=0,
)
print(response.choices[0].message.content)

  • The LLM selects a tool from the toolset based on the user’s query, here get_weather.
  • The dispatch_tool method executes the tool. It can be implemented in many ways; because Python functions are first-class values, a simple lookup from tool name to function is enough.
  • The tool’s result is appended to the conversation history under the observation role, which is how the result gets back to the LLM.
  • A second ChatCompletion request lets the LLM generate the final answer, which is then printed.
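
One possible shape for dispatch_tool is sketched below. This is an illustration rather than the official code: the TOOLS dictionary, the tool decorator, and the get_weather stub are hypothetical names. The JSON string mimics the reply format used in the walkthrough above, with "name" and "parameters" keys.

```python
import json
import traceback
from typing import Callable

# Hypothetical registry: tool name -> Python callable.
TOOLS: dict[str, Callable[..., object]] = {}


def tool(func: Callable[..., object]) -> Callable[..., object]:
    """Register a function under its own name."""
    TOOLS[func.__name__] = func
    return func


def dispatch_tool(tool_name: str, tool_params: dict) -> str:
    """Look up a tool by name, call it, and always return a string."""
    if tool_name not in TOOLS:
        return f"Tool `{tool_name}` not found. Please use a provided tool."
    try:
        result = TOOLS[tool_name](**tool_params)
    except Exception:
        # Hand the traceback back as text so the model can react to the failure.
        result = traceback.format_exc()
    return str(result)


@tool
def get_weather(city_name: str) -> str:
    return f"(stub) It is sunny in {city_name}"


# A reply in the same shape as the one parsed with json.loads in the walkthrough.
reply = '{"name": "get_weather", "parameters": {"city_name": "Beijing"}}'
call = json.loads(reply)
print(dispatch_tool(call["name"], call["parameters"]))
```

Returning error text instead of raising keeps the conversation going: the failure becomes an observation that the model can read and respond to.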

This is how tools are invoked through the API. For more details, refer to the official source code (https://github.com/THUDM/ChatGLM3/tree/main/tool_using).

Code Interpreter

The sample code for the code interpreter shows that its general flow is as follows: the user asks a question -> the LLM generates code -> the generated code is extracted -> the code execution tool is called -> the tool (Jupyter) runs the code -> the execution result is read back from Jupyter -> the result is returned to the user.

On top of the usual three roles (system, user, assistant), ChatGLM3 adds three more: observation, interpreter, and tool. They are defined in conversation.py:

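# conversation.py
from enum import Enum, auto

class Role(Enum):
    SYSTEM = auto()
    USER = auto()
    ASSISTANT = auto()
    TOOL = auto()
    INTERPRETER = auto()
    OBSERVATION = auto()

    def __str__(self):
        # Map each role to the special token that opens its turn in the prompt.
        match self:
            case Role.SYSTEM:
                return "<|system|>"
            case Role.USER:
                return "<|user|>"
            case Role.ASSISTANT | Role.TOOL | Role.INTERPRETER:
                return "<|assistant|>"
            case Role.OBSERVATION:
                return "<|observation|>"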

The tool role is for tool invocation, the interpreter role for code interpretation, and the observation role for observing various results, including LLM outputs, tool responses, and code interpreter execution results. Let's further explore how the code interpreter functionality is implemented:

case '<|observation|>':  # the model has finished writing code and now expects its result
    code = extract_code(output_text)
    print("Code:", code)

    display_text = output_text.split('interpreter')[-1].strip()
    append_conversation(Conversation(
        Role.INTERPRETER,
        postprocess_text(display_text),
    ), history, markdown_placeholder)
    message_placeholder = placeholder.chat_message(name="observation", avatar="user")
    markdown_placeholder = message_placeholder.empty()
    output_text = ''
    with markdown_placeholder:
        with st.spinner('Executing code...'):
            try:
                res_type, res = execute(code, get_kernel())
            except Exception as e:
                st.error(f'Error when executing code: {e}')
                return
    print("Received:", res_type, res)
    if res_type == 'text' and len(res) > TRUNCATE_LENGTH:
        res = res[:TRUNCATE_LENGTH] + ' [TRUNCATED]'
    append_conversation(Conversation(
        Role.OBSERVATION,
        '[Image]' if res_type == 'image' else postprocess_text(res),
        tool=None,
        image=res if res_type == 'image' else None,
    ), history, markdown_placeholder)
    message_placeholder = placeholder.chat_message(name="assistant", avatar="assistant")
    markdown_placeholder = message_placeholder.empty()
    output_text = ''
    break

  • The extract_code method pulls the code out of the LLM output, where it normally appears inside a markdown code block.
  • A conversation record with the interpreter role is added so that the code is displayed on the page.
  • The code is executed and its result is collected. A conversation record with the observation role is added to return the result to the LLM, which then generates the final answer and displays it on the page.
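
The execute-then-observe step can be sketched outside the demo as follows. Note the swap: instead of a Jupyter kernel, this sketch runs the code in a fresh Python subprocess, so no state survives between calls and there is no image output. The names execute_in_subprocess and to_observation are hypothetical, as is the TRUNCATE_LENGTH value.

```python
import subprocess
import sys

TRUNCATE_LENGTH = 1024  # assumed limit; the demo defines its own constant


def execute_in_subprocess(code: str, timeout: float = 30.0) -> tuple[str, str]:
    """Run code in a fresh Python process and return (result_type, text)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Report stderr when the script fails, stdout otherwise.
    output = proc.stdout if proc.returncode == 0 else proc.stderr
    return "text", output.strip()


def to_observation(res_type: str, res: str) -> dict:
    """Build the observation message that is fed back to the model."""
    if res_type == "text" and len(res) > TRUNCATE_LENGTH:
        # Long outputs are cut so they do not flood the model's context.
        res = res[:TRUNCATE_LENGTH] + " [TRUNCATED]"
    return {"role": "observation", "content": res}


res_type, res = execute_in_subprocess("print(sum(range(10)))")
print(to_observation(res_type, res))
```

A Jupyter kernel is the better fit for a real interpreter because variables persist across turns; the subprocess version only illustrates the control flow.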

Here’s the functionality for extracting code, where markdown code is parsed using regex:

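import re

def extract_code(text: str) -> str:
    # Capture (language tag, body) for every fenced block; DOTALL lets the body span lines.
    pattern = r'```([^\n]*)\n(.*?)```'
    matches = re.findall(pattern, text, re.DOTALL)
    # Use the body of the last block; raises IndexError if the text has no fenced block.
    return matches[-1][1]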
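
To see how this regex behaves, here is a small usage sketch with one deliberate change: it returns an empty string when no code block is found instead of raising. The function name extract_last_code_block and the sample reply are made up for illustration, and the three backticks are built indirectly only to keep the example readable.

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly so this example stays readable


def extract_last_code_block(text: str) -> str:
    """Return the body of the last fenced block in text, or "" if there is none."""
    pattern = FENCE + r"([^\n]*)\n(.*?)" + FENCE
    matches = re.findall(pattern, text, re.DOTALL)
    return matches[-1][1] if matches else ""


reply = (
    "First attempt:\n"
    f"{FENCE}python\nprint('draft')\n{FENCE}\n"
    "Corrected version:\n"
    f"{FENCE}python\nprint('final')\n{FENCE}\n"
)
print(repr(extract_last_code_block(reply)))
```

Taking the last match means that if the model revises its code within one reply, only the final version is executed.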

For more details, refer to the official composite demo source code (https://github.com/THUDM/ChatGLM3/tree/main/composite_demo). If you run into problems while testing, reading the source is a good way to troubleshoot them.

Conclusion

ChatGPT offered tool invocation and a code interpreter first, but because it is closed-source, we could not see how they work. ChatGLM3 implements both in open-source code, which lets us study the underlying principles and customize or extend them for our own needs; that is the appeal of open source. My time for this research was limited, so the article may contain oversights. If you spot an inaccuracy, please raise it in the comments section.

Follow me to learn about various artificial intelligence and AIGC technologies. Feel free to share your thoughts and questions in the comments section.
