Parallel Function Calling in OpenAI’s Assistants API

A complete guide to enhancing your Assistant’s capabilities by defining custom tools

🙋‍♂️This is a deep dive into how Function Calling works in OpenAI's Assistants 
API. Readers will walk away with a complete understanding of how they can 
implement this in their own applications.

It is strongly recommended that you have a good understanding of the Assistants
API. I have covered this in a separate tutorial.

Please feel free to skip to any section 🙏🏾

∘ Why Do We Even Need Function Calling?
∘ Traditional Function Calling
∘ Creating Custom Tools with Function Calling
∘ Parallel Function Calling
∘ Conclusion
∘ References

Why Do We Even Need Function Calling?

There’s more to OpenAI’s Assistants API than meets the eye. You can build really powerful and impressive applications by diving deeper into what’s on offer.

OpenAI’s Assistants API comes packed with prebuilt tools that make it easier than ever for developers to build powerful AI applications.

At the moment, you have the option to use the Retrieval tool as well as a Code Interpreter, which are hosted by OpenAI so you don’t have to worry about implementing them yourself.

While we can already build really cool applications on top of these tools, we very quickly run into the need to extend the Assistant’s capabilities.

Suppose you want to build a Car Assistant 🚗: you will quickly realise that you need custom tools.

The Assistants API has you covered here too. We can define our own custom tools by using a feature known as Function Calling.

Function Calling allows you to describe functions to the Assistant and have it intelligently return the functions that need to be called, along with their arguments. The Assistants API pauses execution during a Run when the model wants to call functions, and you supply the results of those function calls to let the Run continue.

Traditional Function Calling

Actually, Function Calling isn’t new; it was first released in the Chat Completions API, where developers can use it to extract structured data from text[2].

For example, if we want to extract specific data, say student information, from a user’s input, we can define a custom function describing exactly which fields we want[1].

student_custom_functions = [
    {
        'name': 'extract_student_info',
        'description': 'Get the student information from the body of the input text',
        'parameters': {
            'type': 'object',
            'properties': {
                'name': {
                    'type': 'string',
                    'description': 'Name of the person'
                },
                'major': {
                    'type': 'string',
                    'description': 'Major subject.'
                },
                'school': {
                    'type': 'string',
                    'description': 'The university name.'
                },
                'grades': {
                    'type': 'number',  # a GPA such as 3.8 is not an integer
                    'description': 'GPA of the student.'
                },
                'club': {
                    'type': 'string',
                    'description': 'School club for extracurricular activities.'
                }
            }
        }
    }
]

Then, if we define an example input from a user:

student_1_description = "David Nguyen is a sophomore majoring in computer science at Stanford University. He is Asian American and has a 3.8 GPA. David is known for his programming skills and is an active member of the university's Robotics Club. He hopes to pursue a career in artificial intelligence after graduating."

The model can return a JSON object containing the information we asked for.

{'name': 'David Nguyen', 'major': 'computer science', 'school': 'Stanford University', 'grades': 3.8, 'club': 'Robotics Club'}
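The model hands these fields back as a JSON string, and nothing forces it to include every field or to respect the declared types, so it is worth checking the result before using it. Below is a minimal sketch that assumes the schema layout above. `check_arguments` and `JSON_TYPES` are hypothetical names, and only flat JSON Schema types are handled; a dedicated validator such as the `jsonschema` package covers far more.

```python
import json

# Map the JSON Schema type names used in the schema to Python types
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_arguments(function_schema, arguments_json):
    """Parse the model's arguments string and list fields that do not match the schema."""
    args = json.loads(arguments_json)  # raises json.JSONDecodeError on malformed output
    params = function_schema["parameters"]
    problems = []
    for field in params.get("required", []):
        if field not in args:
            problems.append(f"missing required field: {field}")
    for field, value in args.items():
        spec = params["properties"].get(field)
        if spec is None:
            problems.append(f"unexpected field: {field}")
            continue
        expected = JSON_TYPES[spec["type"]]
        # Reject booleans for non-boolean fields, since bool is a subclass of int in Python
        wrong_bool = isinstance(value, bool) and spec["type"] != "boolean"
        if not isinstance(value, expected) or wrong_bool:
            problems.append(f"{field}: expected {spec['type']}, got {type(value).__name__}")
    return args, problems


# A cut-down copy of the student schema, so this block runs on its own
schema = {
    "name": "extract_student_info",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "grades": {"type": "number"},
        },
    },
}

args, problems = check_arguments(schema, '{"name": "David Nguyen", "grades": 3.8}')
print(args, problems)
```

Declaring `grades` as `'integer'` would make this check report `grades: expected integer, got float` for a 3.8 GPA; JSON Schema's `number` type is the one that accepts decimals.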

Creating Custom Tools with Function Calling

In the case of the Assistants API, Function Calling plays an even bigger role than just extracting structured data.

The fact that we can reliably extract data from a user’s query means we can define our own custom functions (I refer to these as tools).

Note: If you're following along with code, please run the code in the
sequence it appears. Also note that the provided code was run in a Jupyter
notebook, so each code block represents a separate cell of the notebook.

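from openai import OpenAI
from IPython.display import display  # available by default in Jupyter; imported explicitly for clarity
import json

# Pretty-print any SDK object (Assistant, Run, ...) as JSON in the notebook
def show_json(obj):
    display(json.loads(obj.model_dump_json()))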


#Custom tools

# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    if "adelaide" in location.lower():
        return json.dumps({"location": "Adelaide", "temperature": "10", "unit": unit})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "72", "unit": unit})
    elif "paris" in location.lower():
        return json.dumps({"location": "Paris", "temperature": "22", "unit": unit})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

# Example dummy function to play the song requested by the user
def play_song(song):
    """Play a song"""
    return json.dumps({"Now playing": song})

# Example dummy function to set the volume
def set_audio_volume(volume):
    """Set the volume"""
    return json.dumps({"Volume set to": volume})

We then describe these functions in JSON format so that the Assistant knows which custom tools are available in our backend. Note that the list below also includes the prebuilt tools, Retrieval and Code Interpreter.

tools = [
    {"type": "code_interpreter"},
    {"type": "retrieval"},  # renamed "file_search" in version 2 of the Assistants API
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the weather in location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "The city and state e.g. San Francisco, CA"},
                    # Full unit names, consistent with get_current_weather's default of "fahrenheit"
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "play_song",
            "description": "Play a song",
            "parameters": {
                "type": "object",
                "properties": {
                    "song": {"type": "string", "description": "The song to play"},
                },
                "required": ["song"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "set_audio_volume",
            "description": "Set the volume",
            "parameters": {
                "type": "object",
                "properties": {
                    "volume": {"type": "string", "description": "The volume to set"},
                },
                "required": ["volume"],
            },
        },
    },
]
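Because the `name` fields in `tools` must match our Python functions exactly, a typo or a renamed function only surfaces when the Assistant requests a tool we cannot call. A minimal sketch of an offline sanity check using the standard library's `inspect` module. `check_tool_registry` is a hypothetical helper, not part of the OpenAI SDK, and it assumes the `{"type": "function", "function": {...}}` layout used above.

```python
import inspect
import json


def check_tool_registry(tools, implementations):
    """Return a list of mismatches between function-tool schemas and local Python callables."""
    problems = []
    for tool in tools:
        if tool.get("type") != "function":
            continue  # hosted tools (code_interpreter, retrieval) run on OpenAI's side
        spec = tool["function"]
        name = spec["name"]
        func = implementations.get(name)
        if func is None:
            problems.append(f"no implementation for tool '{name}'")
            continue
        params = spec.get("parameters", {})
        declared = set(params.get("properties", {}))
        required = set(params.get("required", []))
        signature = inspect.signature(func).parameters
        # Every argument the model may send must be accepted by the function
        for arg in sorted(declared - set(signature)):
            problems.append(f"'{name}': schema declares '{arg}' but the function does not accept it")
        # Every function parameter without a default must be marked required in the schema
        for arg, param in signature.items():
            if param.default is inspect.Parameter.empty and arg not in required:
                problems.append(f"'{name}': parameter '{arg}' has no default but is not required in the schema")
    return problems


# Self-contained example: one matching tool and one schema with no Python function behind it
def play_song(song):
    return json.dumps({"Now playing": song})


example_tools = [
    {"type": "code_interpreter"},
    {"type": "function", "function": {"name": "play_song", "parameters": {
        "type": "object", "properties": {"song": {"type": "string"}}, "required": ["song"]}}},
    {"type": "function", "function": {"name": "start_music", "parameters": {
        "type": "object", "properties": {}}}},
]

print(check_tool_registry(example_tools, {"play_song": play_song}))
# ["no implementation for tool 'start_music'"]
```

In the notebook, calling `check_tool_registry(tools, {"get_current_weather": get_current_weather, "play_song": play_song, "set_audio_volume": set_audio_volume})` with the definitions above should return an empty list.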

Once we’ve defined our custom tools and the list of tool descriptions, we can create an Assistant that knows which custom tools are available.

client = OpenAI()  # Make sure your API key is set in the OPENAI_API_KEY environment variable

assistant = client.beta.assistants.create(
    name="Car Assistant 🚘",
    instructions="You are a helpful in-car assistant. Please call the appropriate function based on the user's request.",
    model="gpt-4-1106-preview",
    tools=tools,
)
show_json(assistant)

If you head over to the Assistants Playground, you should be able to see your Assistant. Because it comes with the Retrieval and Code Interpreter tools enabled, you can test these out in the Playground.

If you ask something like “play Thriller” in the Playground, it will only show the Assistant requesting a call to the play_song() custom tool.

Because it doesn’t have access to our backend code, there’s no way of actually running the custom tool in the Playground.

But we can achieve that in our code base, so let’s head back to our Jupyter notebook to make this a reality.

We’ll first define a set of helper functions to simplify our code further down the track[2].

import time

CAR_ASSISTANT_ID = assistant.id

# Add the user's message to an existing thread and start a new Run on it
def submit_message(assistant_id, thread, user_message):
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=user_message
    )
    return client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant_id,
    )

# Create a fresh thread and start a Run with the user's first message
def create_thread_and_run(user_input):
    thread = client.beta.threads.create()
    run = submit_message(CAR_ASSISTANT_ID, thread, user_input)
    return thread, run


# List every message in the thread, oldest first
def get_response(thread):
    return client.beta.threads.messages.list(thread_id=thread.id, order="asc")


# Poll the Run until it is no longer "queued" or "in_progress"
def wait_on_run(run, thread):
    while run.status == "queued" or run.status == "in_progress":
        run = client.beta.threads.runs.retrieve(
            thread_id=thread.id,
            run_id=run.id,
        )
        time.sleep(0.5)
    return run

# Pretty printing helper
def pretty_print(messages):
    print("# Messages")
    for m in messages:
        print(f"{m.role}: {m.content[0].text.value}")
    print()
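The `wait_on_run` helper polls forever if a Run never leaves `queued` or `in_progress`. A minimal variant with a deadline follows; `wait_on_run_with_timeout` and `PENDING_STATUSES` are hypothetical names. The status set is based on the Run lifecycle as we understand it (a Run can also sit in `cancelling`), so check it against the current API reference. The fetch function is passed in so the sketch can be exercised without an API key.

```python
import time
from types import SimpleNamespace

# Statuses in which a Run is still being processed, so we keep polling.
# Assumption: based on the Assistants API Run lifecycle; verify against the API reference.
PENDING_STATUSES = {"queued", "in_progress", "cancelling"}


def wait_on_run_with_timeout(fetch_run, interval=0.5, timeout=60.0):
    """Call fetch_run() until the returned Run leaves a pending status, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    run = fetch_run()
    while run.status in PENDING_STATUSES:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still '{run.status}' after {timeout} seconds")
        time.sleep(interval)  # wait before the next status check
        run = fetch_run()
    return run


# Offline demo: a fake fetch function that walks through a few statuses
statuses = iter(["queued", "in_progress", "requires_action"])
final_run = wait_on_run_with_timeout(lambda: SimpleNamespace(status=next(statuses)), interval=0.01)
print(final_run.status)  # requires_action
```

In the notebook, the fetch function could wrap the same call `wait_on_run` uses, e.g. `wait_on_run_with_timeout(lambda: client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id))`.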

Now here’s the important part: because we have provided tools of type function, whenever a user’s input comes in, the Assistant will decide whether an external tool is required to answer the user’s query.

For example, if we ask the Assistant to “please play Thriller”, it will recognise that the prebuilt Retrieval and Code Interpreter tools are not sufficient to answer that query.

When this is the case, the Assistant will inform us that there’s an action we need to execute on our end.

thread, run = create_thread_and_run(
    "Play Thriller"
)
run = wait_on_run(run, thread)
run.status  # 'requires_action'

We can inspect what we get from the Run.

show_json(run)

This prints a long JSON output, but the important part looks like this:

'required_action': {'submit_tool_outputs': {'tool_calls': [{'id': 'call_gvpztu1qHKPaRmnRXr4ibK65',
     'function': {'arguments': '{"song":"Thriller"}', 'name': 'play_song'},
     'type': 'function'}]},
  'type': 'submit_tool_outputs'},
 'started_at': 1703830204,
 'status': 'requires_action',

The required action refers to the tool, or set of tools, that the Assistant needs us to run in order to answer the user’s query.

# Extract single tool call
tool_call = run.required_action.submit_tool_outputs.tool_calls[0]
name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)

print("Function Name:", name)
print("Function Arguments:")
arguments

# Output 👇
Function Name: play_song
Function Arguments:
{'song': 'Thriller'}

Now that we know which tool the Assistant wants us to invoke on our end, we call it and then submit our tool output back to the Assistant.

response = play_song(arguments["song"])
print("Response:", response)

#Prints
#Response: {"Now playing": "Thriller"}

run = client.beta.threads.runs.submit_tool_outputs(
    thread_id=thread.id,
    run_id=run.id,
    tool_outputs=[
        {
            "tool_call_id": tool_call.id,
            "output": response,  # play_song already returns a JSON string
        }
    ],
)
show_json(run)

{Output not shown to improve readability 🙏🏾}
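The `output` field of a tool output must be a string. Our dummy tools already return JSON strings, so passing their result through `json.dumps` a second time wraps the payload in an extra layer of quotes and escapes. The model can usually still read it, but submitting the string as-is keeps it clean. A small standalone comparison:

```python
import json

# play_song (defined earlier) already returns a JSON *string* like this one
response = json.dumps({"Now playing": "Thriller"})

single = response              # what the tool produced
double = json.dumps(response)  # the same string encoded a second time

print(single)  # {"Now playing": "Thriller"}
print(double)  # "{\"Now playing\": \"Thriller\"}"

# Decoding the double-encoded value once only gives back a string, not an object
print(type(json.loads(double)).__name__)  # str
```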

Now that the Assistant knows what our custom tool returned, it can respond with an appropriate answer.

run = wait_on_run(run, thread)
pretty_print(get_response(thread))

# Messages
user: Play Thriller
assistant: "Now playing: Thriller."

Before we move on to the next section, let’s recap everything we’ve learned in a nice diagram.

An overview of how Function Calling works in the Assistants API

Parallel Function Calling

So far, we’ve seen that the Assistant can handle situations that require only one custom tool.

What if our user’s query requires multiple custom tools to be called at the same time?

OpenAI has solved this with a feature called Parallel Function Calling, which lets the model call multiple functions at once.

For example, the user might ask something like “Hey, play Thriller and turn the volume to 20”.

Before parallel function calling, we would have had to make two separate calls to the underlying model, which of course meant extra cost as well as latency.

Before and after Parallel Function Calling

With parallel function calling, such a use case can be handled with one single call.

def run_conversation(user_input):
    # Map each tool name the Assistant knows about to our local implementation
    available_functions = {
        "get_current_weather": get_current_weather,
        "play_song": play_song,
        "set_audio_volume": set_audio_volume,
    }

    # Create and run a thread with the user input
    thread, run = create_thread_and_run(user_input)
    run = wait_on_run(run, thread)

    # If the run requires action, execute every requested tool call; otherwise print the response
    if run.status == "requires_action":
        tool_calls = run.required_action.submit_tool_outputs.tool_calls
        # Collect the outputs of all tool calls before submitting them to the run
        tool_outputs = []
        for tool_call in tool_calls:
            tool_name = tool_call.function.name
            # Note: the arguments JSON may not always be valid; be sure to handle errors
            tool_args = json.loads(tool_call.function.arguments)

            # Look up and call the matching Python function
            function_to_call = available_functions[tool_name]
            response = function_to_call(**tool_args)

            # Our tools already return JSON strings, so they can be submitted as-is
            tool_outputs.append(
                {
                    "tool_call_id": tool_call.id,
                    "output": response,
                }
            )

        # All outputs must be submitted in a single request,
        # so this if statement sits outside the for loop
        if tool_outputs:
            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread.id,
                run_id=run.id,
                tool_outputs=tool_outputs,
            )
            run = wait_on_run(run, thread)
            pretty_print(get_response(thread))
        else:
            print("No tool outputs to submit")

    else:
        print("No action required.")
        pretty_print(get_response(thread))

Disclaimer: The explanation below was generated by an AI🤖. I personally
proofread it to make sure it was correct. I used the
following prompt in GitHub Copilot:
"""
Please explain what this function is doing.
"""

The run_conversation function is responsible for running a conversation with a user and interacting with various tools based on the user's input. Let's break down the function step by step:

1. The function takes a user_input parameter as input.
2. It calls the create_thread_and_run function, passing the user_input to it. This function creates a new thread and submits the user's input as a message in that thread. It returns the created thread and the initial run object.
3. The function then calls the wait_on_run function, passing the initial run object and the thread. This function repeatedly retrieves the status of the run until it is no longer in the "queued" or "in_progress" state. It uses a while loop and sleeps briefly between retrievals to avoid excessive API requests.
4. Once the run is no longer "queued" or "in_progress", the function checks whether the run status is "requires_action". If it is, there are tool calls that need to be executed based on the user's input.
5. If there are tool calls to execute, the function retrieves them from the run object and iterates over each one.
6. For each tool call, the function extracts the tool call ID, tool name, and tool arguments. It uses the tool name to look up which function to call in a dictionary of available functions.
7. The selected function is called with the tool arguments, and the response is stored.
8. The function appends the tool call ID and the response to a list of tool outputs.
9. After iterating over all the tool calls, the function checks whether there are any tool outputs to submit. If there are, it calls submit_tool_outputs, passing the thread ID, run ID, and the list of tool outputs, so that all the outputs are submitted in a single request.
10. The function then calls wait_on_run again to wait for the run to complete after the tool outputs are submitted.
11. Finally, the function calls pretty_print, passing the thread's messages from get_response, to print the conversation.
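Parallel function calling means the model returns several tool calls in one `requires_action` step; it does not require us to run them concurrently, and run_conversation executes them one after another. If your real tools are slow network calls, you could run them on a thread pool before the single `submit_tool_outputs` request. Below is a minimal sketch using the standard library's `concurrent.futures`. `execute_tool_calls` and `run_one` are hypothetical helpers, and the `SimpleNamespace` objects only mimic the `id`, `function.name` and `function.arguments` fields of the SDK's tool-call objects so the block runs offline.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


# Dummy tools, copied from earlier so this block is self-contained
def play_song(song):
    return json.dumps({"Now playing": song})


def set_audio_volume(volume):
    return json.dumps({"Volume set to": volume})


AVAILABLE_FUNCTIONS = {"play_song": play_song, "set_audio_volume": set_audio_volume}


def run_one(tool_call):
    """Execute a single tool call and package its result for submit_tool_outputs."""
    func = AVAILABLE_FUNCTIONS[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)
    return {"tool_call_id": tool_call.id, "output": func(**args)}


def execute_tool_calls(tool_calls, max_workers=4):
    """Run all tool calls on a thread pool; results keep the order of tool_calls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, tool_calls))


# Stand-ins shaped like run.required_action.submit_tool_outputs.tool_calls
fake_calls = [
    SimpleNamespace(id="call_1", function=SimpleNamespace(name="play_song", arguments='{"song": "Thriller"}')),
    SimpleNamespace(id="call_2", function=SimpleNamespace(name="set_audio_volume", arguments='{"volume": "60"}')),
]

for item in execute_tool_calls(fake_calls):
    print(item)
```

In run_conversation, the list returned by `execute_tool_calls(tool_calls)` would take the place of the `tool_outputs` list built by the loop and would still be sent in one `submit_tool_outputs` call. Each output carries its `tool_call_id`, so the Assistant can match results to calls regardless of completion order.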

And with that, you should be able to test the function with a query that requires the Assistant to access two custom tools.

run_conversation("Play Thriller and turn the volume to 60")

# Messages
user: Play Thriller and turn the volume to 60
assistant: "Thriller" is now playing at volume 60.

Or test a query that needs just one tool call.

run_conversation("What's the weather in Adelaide?")

# Messages
user: What's the weather in Adelaide?
assistant: The current temperature in Adelaide is 10 degrees Fahrenheit.

Conclusion

We have covered a lot of material in this tutorial, so congrats on making it this far 👏🏾. I hope you now feel well equipped to apply this powerful feature to your Assistants. If you have any feedback, I would really appreciate it if you could leave a comment.

Find the full code here.

References

[1] https://www.datacamp.com/tutorial/open-ai-function-calling-tutorial
[2] OpenAI cookbook: Assistants API Overview
[3] Chat Completions API
[4] OpenAI Dev Day 2023: New products deep dive
[5] What is JSON? by Oracle