avatarYeyu Huang

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

5874

Abstract

ur request will be directed to OpenAI.</p><figure id="b586"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*7_DIcBgjasKTigGoe-UN2A.png"><figcaption>Use the correct name from the model page</figcaption></figure><p id="0b21">Then define the LLM_config for agents.</p><div id="fb95"><pre>import autogen

llm_config={ <span class="hljs-string">"timeout"</span>: 600, <span class="hljs-string">"cache_seed"</span>: 28, <span class="hljs-comment"># change the seed for different trials</span> <span class="hljs-string">"config_list"</span>: autogen.config_list_from_json( <span class="hljs-string">"OAI_CONFIG_LIST"</span>, filter_dict={<span class="hljs-string">"model"</span>: [<span class="hljs-string">"google/gemma-7b-it"</span>]}, ), <span class="hljs-string">"temperature"</span>: 0.7, }</pre></div><h1 id="e11d">Case 1–1 on 1 conversation</h1><p id="fad0">OK, let’s construct the two agents.</p><div id="7704"><pre><span class="hljs-keyword">from</span> autogen.agentchat.contrib.math_user_proxy_agent <span class="hljs-keyword">import</span> MathUserProxyAgent

<span class="hljs-comment"># create an AssistantAgent instance named "assistant"</span> assistant = autogen.AssistantAgent( name=<span class="hljs-string">"coder"</span>, system_message=<span class="hljs-string">"You are good at coding"</span>, llm_config=llm_config,

) mathproxyagent = MathUserProxyAgent( name=<span class="hljs-string">"mathproxyagent"</span>, human_input_mode=<span class="hljs-string">"NEVER"</span>, is_termination_msg=is_termination_msg=<span class="hljs-keyword">lambda</span> x: x.get(<span class="hljs-string">"content"</span>, <span class="hljs-string">""</span>) <span class="hljs-keyword">and</span> x.get(<span class="hljs-string">"content"</span>, <span class="hljs-string">""</span>).rstrip().endswith(<span class="hljs-string">"TERMINATE"</span>),
code_execution_config={ <span class="hljs-string">"work_dir"</span>: <span class="hljs-string">"work_dir"</span>, <span class="hljs-string">"use_docker"</span>: <span class="hljs-literal">False</span>, },

max_consecutive_auto_reply=<span class="hljs-number">3</span>,

)</pre></div><p id="3ad8">The first agent “coder” is an assistant agent who is driven by the Gemma model to generate code-related responses. The second one is a special user proxy agent that can generate well-structured prompts for math problems and execute code in its environment as well.</p><p id="e41e">Simply prompt them with an inequality calculation.</p><div id="46c0"><pre>task1 = <span class="hljs-string">""" Find all x that satisfy the inequality (2x+10)(x+3)&lt;(3x+9)(x+8). Use code. """</span>

mathproxyagent.initiate_chat(assistant, problem=task1)</pre></div><h2 id="ebea">Case 1 Result</h2><p id="568c">From the output, basically, the conversation workflow can work with Gemma. The “coder” generated a snippet of Python code that followed the instructions from the math user proxy agent.</p><figure id="2e12"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*TF7FG-3EL_YlevlkoKEd0w.png"><figcaption></figcaption></figure><p id="f463">Unfortunately, the first shot of the answer is not correct, so the user proxy returns an error message after execution.</p><figure id="e37a"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*tb-OhMUtt58GgM3Cpoo2QQ.png"><figcaption></figcaption></figure><p id="ce38">Then the “coder” looks incapable of following the error message to iterate the code but only generates an explanation instead. I found that Gemma-7B was struggling to complete the coding task once the AI agent could not complete it in one shot.</p><p id="6272">This result reveals two defects of Gemma-7B:</p><ul><li>The 0-shot code generation capability is still not at an acceptable level</li><li>The reasoning capability is also not strong enough to understand error messages that support its follow-up generation on code updates.</li></ul><p id="86fd">In the AutoGen framework, you may have to add more human interaction between the conversations to help the model understand the error and update the code, by setting <code>human_input_mode=”ALWAYS”</code> in UserProxyAgent.</p><h1 id="5fbe">Case 2 — Group Chat</h1><p id="a477">Let’s move to the group chat application to see whether the model can perform the expected orchestration. In this group chat, we will ask the group to generate a travel blog post with some revisions based on the review conversations between a writer and an editor.</p><figure id="3e84"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*vsaeSLI1Ceg5HQEZ0zxLVg.png"><figcaption>A group chat for blog post writing</figcaption></figure><p id="32c4">To achieve that, we construct three agents: user proxy, writer, and editor. For easy evaluation, the editor is instructed to give recommendations on making the blog posts more viral on social media. In a real project, you may have to enrich its instruction through <code>system_message</code> to give the editor guidelines on blog review.</p><div id="8c3c"><pre>user_proxy = autogen.UserProxyAgent( name=<span class="hljs-string">"User_proxy"</span>, system_message=<span class="hljs-string">"A human admin."</span>, code_execution_config=<span class="hljs-literal">False</span>,
human_input_mode=<span class="hljs-string">"TERMINATE"</span>, is_termination_msg=is_termination_msg=<span class="hljs-keyword">lambda</span> x: x.get(<span class="hljs-string">"content"</span>, <span class="hljs-string">""</span>) <span class="hljs-keyword">and</span> x.get(<span class="hljs-string">"content"</span>, <span class="hljs-string">""</span>).rstrip().endswith(<span class="hljs-string">"TERMINATE"</span>), ) writer = autogen.AssistantAgent( name=<span class="hljs-string">"Write

Options

r"</span>, system_message=<span class="hljs-string">"""You are a blog post writer who is capable of writing a travel blog. You generate one iteration of an article once at a time. You never provide review comments. You are open to comments and willing to make changes in your article based on these comments. """</span>, llm_config=llm_config, description=<span class="hljs-string">"""This is a blog post writer who is capable of writing travel blogs. The writer is open to any comments and recommendations for improving the article. """</span> ) editor = autogen.AssistantAgent( name=<span class="hljs-string">"Editor"</span>, system_message=<span class="hljs-string">"""You review blog posts and give editor recommendations to make them viral on social media. You never write or revise blogs by yourself. You will say "TERMINATE" to terminate the conversation once you feel the article has followed your directions. """</span>, llm_config=llm_config, description=<span class="hljs-string">"""This is an editor who reviews the blogs of writers and provides change ideas. The editor should be called every time the writer provides a version of a blog post. """</span> )</pre></div><p id="cb3e">In the agent construction, be aware of the difference between <code>system_message</code> and <code>description</code>:</p><ul><li>the <code>system_message</code> is provided to the AI agent for learning its role, capability, and objectives.</li><li>the <code>description</code> is provided to the orchestrator for describing the role and position of each agent in the chat group, which helps the orchestrator (manager) clearly know who is the next speaker during the chat.</li></ul><p id="b3c5">Building on these agents, we will now construct the orchestrator (manager) and group chat.</p><div id="1ee2"><pre><span class="hljs-attr">groupchat</span> = autogen.GroupChat(agents=[user_proxy, writer, editor], messages=[], max_round=<span class="hljs-number">8</span>) <span class="hljs-attr">manager</span> = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)</pre></div><p id="0e81">As a final step, prompt the group chat to generate a blog post about traveling in Bohol Island.</p><div id="44ff"><pre>user_proxy.initiate_chat( manager, message=<span class="hljs-string">"""Generate a blog post about traveling in Bohol Island. """</span> )</pre></div><h2 id="e2ab">Case 2 Result</h2><p id="69bf">Moving forward, let’s examine the running result. The conclusion in advance: <b>the orchestration and writing performance of Gemma-7B is quite good in AutoGen.</b></p><p id="3fa6">This is the peep of the initial article written by the Writer agent.</p><figure id="4b08"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*wHhHH76DirM2CA9WI18O1g.png"><figcaption>Step 1 — initial writing</figcaption></figure><p id="c52a">The Editor agent is called right after the Writer and can generate quite decent comments and recommendations for the original article.</p><figure id="6674"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*DqudgOwwF6csfi3vjScaEw.png"><figcaption>Step 2 — Review comments</figcaption></figure><p id="6064">To my surprise, the Writer agent is picked up as a speaker again and follows the Editor’s suggestion to add hashtags for the blog post. This is quite incredible for such a small-size model to orchestrate the multi-agent generation workflow.</p><figure id="b4a2"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*gu4w-h-5RcHF8ZHxUEVVvw.png"><figcaption>Step 3 — revised version</figcaption></figure><p id="fd6f">This proves the high general and reasoning score of Gemma-7 since I never explicitly asked the writer to revise the blog posts, and only described “<i>The writer is open to any comments and recommendations for improving the article.</i>” in the description section.</p><p id="510b">Then, the Editor shows up again to give a summary of the revised version.</p><figure id="d59a"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*s2W7lR5OHzUv1rQZq59lPg.png"><figcaption>Step 4- Summary</figcaption></figure><h1 id="9e11">In Conclusion</h1><p id="3059">According to my evaluation of multi-agent applications using Gemma, the code and debugging reasoning are not good enough to handle med-level autonomous tasks that rely on code execution. I hope there will be a fine-tuned Gemma model for better code capability.</p><p id="fdb1">However, I fully appreciate the generating quality at such a 7B size model in the group chat application of AutoGen, especially its orchestration performance. By providing a refined <code>system_message</code> and <code>description</code>, and with necessary human interaction, the collaboration between Gemma-7B and AutoGen will definitely be a competitive development toolset for creating professional writing assistants.</p><p id="9c94">Thanks for reading. If you think it’s helpful, please Clap 👏 for this article. Your encouragement and comments mean a lot to me, mentally and financially. 🍔</p><p id="523d"><b>Before you go:</b></p><p id="789f">✍️ If you have any questions or business requests, please leave me responses or find me on <a href="https://twitter.com/Yeyu2HUANG/"><b>X</b></a><b> </b>and <a href="https://discord.gg/KPTCE4CEmp"><b>Discord</b></a> where you can have my active support on development and deployment.</p><p id="5d65">☕️ If you would like to have exclusive resources and technical services, checking the membership or services on my<b> <a href="https://ko-fi.com/yeyuh">Ko-fi</a> </b>will be a good choice.</p></article></body>

Is Gemma Capable of Building Multi-agent Applications in AutoGen?

A Quick Tutorial on Building Gemma + AutoGen Apps with Performance Evaluations

Image by author

Big news every day! While I am playing with Google’s most powerful large language model — Gemini 1.5 Pro, Google suddenly unveiled its latest contribution to the open-source LLM community: Gemma, which comes with a family of lightweight, text-generation, decoder-only models with state-of-the-art performance, trained on a massive dataset of 6 trillion tokens. Available in 2 billion and 7 billion parameter scales, with both base and instruct versions, LLM app developers now have another brand new option on the shelf to upgrade the apps with fair generation quality at less cost. Trained with the inspiration from Gemini’s techniques, these models achieve impressive benchmark scores, and their open-source nature allows for further fine-tuning and customization.

As a SOTA open-source language model, it can never run away from comparing with the Llama-2 models and of course, should “beat” them in many benchmarks for various generative capabilities. From Gemma’s released report, it easily defeats Llama-2–7B and even the 13B model on categories of General, Reasoning, Math, and Code with overwhelming results. When I checked the same data of Mistral-7B which commonly stands on the top of 7B models, it surprised me that the Gemma-7B almost won Mistral-7B entirely on these categories.

Gemma v.s. Llama-2
Mistral-7B v.s. Llama-2

You will find more details from Google’s blog post.

The data is data, as a LLM app developer, I understand that the real value of the commercial lies in its actual quality in practice under complex tasks. Therefore, I am going to implement a multi-agent application using AutoGen framework, showing how to create agents with Gemma models. From the experiment, hopefully, we can understand whether this model is qualified for reasoning and orchestration capability to complete more complex tasks than simple one-answer generation.

AutoGen + Gemma models

This demonstration will be based on remote inference API for those who don’t have GPU resources and another tutorial for local deployment for those who have computational hardware with data privacy concerns will be published very soon.

In the application, I am going to create a scenario of 1 on 1 conversation to test the model’s overall generation quality, especially on coding, and another scenario of a group chat with multiple agents to see how well (or bad) the orchestration works.

Gemma API from OpenRouter

The implementation remotely with Gemma is by directly calling model API from a third-party inference service which aims to help developers who have no GPU locally create apps using open-source language models, the same way as developing GPT or Gemini models, but at a much lower cost.

In this demo, I will use the service from a platform called OpenRouter which has immediately provided an inference endpoint for the Gemma-7B-it (instruction) model right after Google’s release and most importantly, for free!

And, if you don’t want to tolerate the rate limit (10 requests/mins) for the free version, you can also try the paid version for only $0.13/M tokens cost for input and output. It’s a very competitive price even for individual play.

Let’s drop a breakpoint here and step through the code.

LLM Configuration

The AutoGen framework supports the requests and responses through OpenAI API by default, and fortunately, OpenRouter supports the OpenAI-compatible API which means the developers don’t have to implement additional web requests for its inference service.

Therefore, we should install the AutoGen package that packs OpenAI functions first.

!pip install --quiet --upgrade pyautogen

Next, we should create an LLM config as a necessary step for the AutoGen application. Define an environment variable for the config list first.

import os

os.environ['OAI_CONFIG_LIST'] ="""[{"model": "google/gemma-7b-it",
"api_key": "sk-or-v1-Your OpenRouter KEY",
"base_url": "https://openrouter.ai/api/v1",
"max_tokens":1000}]"""

Make sure the model name and path match what is on the model page together with the API key that you created in your OpenRouter account. Don’t forget to replace the base_url with the new OpenRouter’s endpoint, otherwise, your request will be directed to OpenAI.

Use the correct name from the model page

Then define the LLM_config for agents.

import autogen

llm_config={
    "timeout": 600,
    "cache_seed": 28,  # change the seed for different trials
    "config_list": autogen.config_list_from_json(
        "OAI_CONFIG_LIST",
        filter_dict={"model": ["google/gemma-7b-it"]},
    ),
    "temperature": 0.7,
}

Case 1–1 on 1 conversation

OK, let’s construct the two agents.

from autogen.agentchat.contrib.math_user_proxy_agent import MathUserProxyAgent

# create an AssistantAgent instance named "assistant"
assistant = autogen.AssistantAgent(
    name="coder",
    system_message="You are good at coding",
    llm_config=llm_config,

)
mathproxyagent = MathUserProxyAgent(
    name="mathproxyagent",
    human_input_mode="NEVER",
    is_termination_msg=is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),    
    code_execution_config={
        "work_dir": "work_dir",
        "use_docker": False,
    },

    max_consecutive_auto_reply=3,
)

The first agent “coder” is an assistant agent who is driven by the Gemma model to generate code-related responses. The second one is a special user proxy agent that can generate well-structured prompts for math problems and execute code in its environment as well.

Simply prompt them with an inequality calculation.

task1 = """
Find all $x$ that satisfy the inequality $(2x+10)(x+3)<(3x+9)(x+8)$. Use code.
"""

mathproxyagent.initiate_chat(assistant, problem=task1)

Case 1 Result

From the output, basically, the conversation workflow can work with Gemma. The “coder” generated a snippet of Python code that followed the instructions from the math user proxy agent.

Unfortunately, the first shot of the answer is not correct, so the user proxy returns an error message after execution.

Then the “coder” looks incapable of following the error message to iterate the code but only generates an explanation instead. I found that Gemma-7B was struggling to complete the coding task once the AI agent could not complete it in one shot.

This result reveals two defects of Gemma-7B:

  • The 0-shot code generation capability is still not at an acceptable level
  • The reasoning capability is also not strong enough to understand error messages that support its follow-up generation on code updates.

In the AutoGen framework, you may have to add more human interaction between the conversations to help the model understand the error and update the code, by setting human_input_mode=”ALWAYS” in UserProxyAgent.

Case 2 — Group Chat

Let’s move to the group chat application to see whether the model can perform the expected orchestration. In this group chat, we will ask the group to generate a travel blog post with some revisions based on the review conversations between a writer and an editor.

A group chat for blog post writing

To achieve that, we construct three agents: user proxy, writer, and editor. For easy evaluation, the editor is instructed to give recommendations on making the blog posts more viral on social media. In a real project, you may have to enrich its instruction through system_message to give the editor guidelines on blog review.

user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="A human admin.",
    code_execution_config=False,  
    human_input_mode="TERMINATE",
    is_termination_msg=is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
)
writer = autogen.AssistantAgent(
    name="Writer",
    system_message="""You are a blog post writer who is capable of writing a travel blog.
                      You generate one iteration of an article once at a time.
                      You never provide review comments.
                      You are open to comments and willing to make changes in your article based on these comments.
                    """,
    llm_config=llm_config,
    description="""This is a blog post writer who is capable of writing travel blogs.
                   The writer is open to any comments and recommendations for improving the article.
                """
)
editor = autogen.AssistantAgent(
    name="Editor",
    system_message="""You review blog posts and give editor recommendations to make them viral on social media. 
                      You never write or revise blogs by yourself.
                      You will say "TERMINATE" to terminate the conversation once you feel the article has followed your directions.
                       """,
    llm_config=llm_config,
    description="""This is an editor who reviews the blogs of writers and provides change ideas.
                    The editor should be called every time the writer provides a version of a blog post.
                """
)

In the agent construction, be aware of the difference between system_message and description:

  • the system_message is provided to the AI agent for learning its role, capability, and objectives.
  • the description is provided to the orchestrator for describing the role and position of each agent in the chat group, which helps the orchestrator (manager) clearly know who is the next speaker during the chat.

Building on these agents, we will now construct the orchestrator (manager) and group chat.

groupchat = autogen.GroupChat(agents=[user_proxy, writer, editor], messages=[], max_round=8)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

As a final step, prompt the group chat to generate a blog post about traveling in Bohol Island.

user_proxy.initiate_chat(
    manager, message="""Generate a blog post about traveling in Bohol Island.
                      """
)

Case 2 Result

Moving forward, let’s examine the running result. The conclusion in advance: the orchestration and writing performance of Gemma-7B is quite good in AutoGen.

This is the peep of the initial article written by the Writer agent.

Step 1 — initial writing

The Editor agent is called right after the Writer and can generate quite decent comments and recommendations for the original article.

Step 2 — Review comments

To my surprise, the Writer agent is picked up as a speaker again and follows the Editor’s suggestion to add hashtags for the blog post. This is quite incredible for such a small-size model to orchestrate the multi-agent generation workflow.

Step 3 — revised version

This proves the high general and reasoning score of Gemma-7 since I never explicitly asked the writer to revise the blog posts, and only described “The writer is open to any comments and recommendations for improving the article.” in the description section.

Then, the Editor shows up again to give a summary of the revised version.

Step 4- Summary

In Conclusion

According to my evaluation of multi-agent applications using Gemma, the code and debugging reasoning are not good enough to handle med-level autonomous tasks that rely on code execution. I hope there will be a fine-tuned Gemma model for better code capability.

However, I fully appreciate the generating quality at such a 7B size model in the group chat application of AutoGen, especially its orchestration performance. By providing a refined system_message and description, and with necessary human interaction, the collaboration between Gemma-7B and AutoGen will definitely be a competitive development toolset for creating professional writing assistants.

Thanks for reading. If you think it’s helpful, please Clap 👏 for this article. Your encouragement and comments mean a lot to me, mentally and financially. 🍔

Before you go:

✍️ If you have any questions or business requests, please leave me responses or find me on X and Discord where you can have my active support on development and deployment.

☕️ If you would like to have exclusive resources and technical services, checking the membership or services on my Ko-fi will be a good choice.

Llm
Autogen
Gemma
Open Source Model
AI
Recommended from ReadMedium