ChatGPT, Next Level: Meet 10 Autonomous AI Agents: Auto-GPT, BabyAGI, AgentGPT, Microsoft Jarvis, ChaosGPT & friends

The ultimate curated list of autonomous AI agents: complete with tools, resources and examples

ChatGPT and many of the other current foundation models are great. They can answer innumerable questions, create AI art that rivals human masterpieces, analyze photos, and in some cases, they even show what we would call intelligence.

But there’s one simple challenge they’ve yet to conquer — to efficiently complete a laborious task made up of distinct steps.

Currently, AI models are like eager office interns, tireless and enthusiastic but desperately in need of guidance. They require monitoring, frequent directions, and vigilance against fudging or half-truths (aka “hallucinations”).

This is where AI agents step in: they can work through such multi-step tasks autonomously. These autonomous helpers take user input, break it down into smaller tasks with the assistance of LLMs, and tackle them one at a time. The agents store the results and use them, if necessary, for subsequent steps in the process. As a result, AI agents can handle complex tasks and access various foundation models that are not limited to language alone. For example, an agent might independently decide to use code, video or voice models, or to employ search engines or calculation tools, to accomplish the task you’ve given it.
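To make the idea concrete, here is a minimal sketch of an agent that plans steps and routes each one to a tool. Everything in it is an assumption for illustration: `plan`, `search`, `calculate` and the `TOOLS` registry are hypothetical stand-ins, and a real agent would ask an LLM for the plan and call real search or calculation services.

```python
from typing import Callable, Dict, List, Tuple


def search(query: str) -> str:
    """Hypothetical search tool; a real agent would call a search engine."""
    return f"search results for '{query}'"


def calculate(expression: str) -> str:
    """Toy calculator that only handles 'a + b' with integers."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


# Registry of tools the agent may choose from (names are illustrative).
TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "calculate": calculate}


def plan(goal: str) -> List[Tuple[str, str]]:
    """Stand-in for the LLM planning call: returns (tool name, tool input) pairs."""
    return [("search", goal), ("calculate", "19 + 23")]


def run_agent(goal: str) -> List[str]:
    """Work through the plan one step at a time and keep every result."""
    results: List[str] = []
    for tool_name, tool_input in plan(goal):
        results.append(TOOLS[tool_name](tool_input))
    return results


if __name__ == "__main__":
    for line in run_agent("task management tools"):
        print(line)
```

The stored `results` list is what a fuller agent would feed back into later steps as context.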

The autonomous agents are not simply smarter than the foundation models on which they are based; they open up a completely new dimension: they are capable of “slow thinking” (Kahneman’s “System 2”). They solve complicated questions, the kind where one crawls to the goal bit by bit via intermediate results. Until now, slow thinking was only possible for LLMs via prompting techniques such as chain-of-thought, and even then only to a very limited extent.

While the addressable level of complexity does not increase significantly with agent AIs, they cover an incredible amount of additional area in the problem space (dashed red box) due to their ability to solve complicated problems, that is, everything that requires more than a few steps to solve.

Content:

Intro: What are Autonomous AI Agents?
Must-know AI Platforms: A deep-dive into AgentGPT, Auto-GPT, BabyAGI, Jarvis & more, resources included
The Completely Incomplete List of AI Agent Platforms
Outlook: From sterile AI to powerful and dangerous agents

Intro: What are autonomous AI agents?

Let’s say we want to use an AI model to create a deck of 52 cards, with each card featuring a different musician. We’d also like to replace the usual card suits, such as clubs or hearts, with music genres, such as soul or house.

Is it possible for an AI model to complete such a complex task?

The simple answer is no.

While a language model can compile a list of genres and artists, we need at least one additional model (AI art model such as Midjourney) to produce the visuals. We may also need additional systems to search the internet and to store contents.

We could write a batch processing script doing all this.

Or — and here our agent AIs fly in — we could just provide a prompt telling what we want to do, and the agent writes the batch script, executes it and monitors the outcome.

Usually, AI agents use various external models both for the single steps (e.g., selecting an artist for a single card) and for framework tasks (e.g., generating a task list). They outsource the thinking steps while storing information, tracking tasks, managing the interface and orchestrating the entire process.

Image credit: Maximilian Vogel, note: This is an illustrative example only — the results of most current AI agents are not as overwhelming.

Autonomous AI agents have only emerged in the last few weeks, but they’re already developing at breakneck speed. Even Microsoft is getting in on the action with Jarvis / HuggingGPT. I’ll give a brief introduction to some of the main AI agents and discuss possible impacts on application development, along with AI safety.

AgentGPT

Assemble, configure, and deploy autonomous AI Agents in your browser.

This is the first platform in the list, not because it is the most important, but because no installation or OpenAI keys are needed.

You can just try it right now.

Features:

browser-based
simple to use
based on OpenAI models
No OpenAI keys needed for test usage

Platform: https://agentgpt.reworkd.ai/
Developer: Asim Shrestha

Demo

Let’s deep-dive into how AgentGPT managed a job I gave it:

My task: “Find the 3 most widely used task management software tools for usage in a small company and compare them in terms of price, scope, ease of installation”

Reasoning:

Image credit: Maximilian Vogel / AgentGPT

Some intermediate output:

Many, many more lines of output later, we have the final result (the whole process took approximately 3 minutes):

Auto-GPT

An experimental and open-source agent library based on GPT-4. It chains together LLM “thoughts” to autonomously achieve whatever task you set. Auto-GPT is one of the first platforms to run GPT-4 fully autonomously, pushing the boundaries of what is possible with AI.

Features:

Accesses the internet for queries and gathering information
Long and short-term memory management
GPT-4 instances for text generation
Accesses popular websites and platforms
File storage and summarization with GPT-3.5

Repository: https://github.com/Significant-Gravitas/Auto-GPT Developer: https://www.significantgravitas.com/

Setup: Guide

Demo-Task: Look for a seasonal event on the internet and create a recipe for it.

Baby AGI

Baby AGI is an AI-powered task management system. The system uses OpenAI and Pinecone APIs to create, prioritize, and execute tasks. The appeal of Baby AGI is in its ability to autonomously solve tasks based on the results of previous tasks and to keep a predefined objective. It also prioritizes tasks efficiently.

Mode of work:

Pulls up the first task from the task list.
Sends the task to the execution agent, which uses OpenAI’s API and Llama to complete the task based on the context.
Enriches the result and stores it in Pinecone.
Creates new tasks and reprioritizes the task list based on the objective and the result of the previous task.
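The four steps above can be sketched as a simple loop. This is not BabyAGI’s actual code: `execute`, `create_tasks` and `prioritize` are hypothetical stand-ins for the LLM-backed agents, and a plain Python list plays the role of the Pinecone store.

```python
from collections import deque
from typing import Deque, List, Tuple


def execute(objective: str, task: str, memory: List[Tuple[str, str]]) -> str:
    """Stand-in for the execution agent; a real one would call an LLM API."""
    return f"done: {task}"


def create_tasks(objective: str, task: str, result: str) -> List[str]:
    """Stand-in for task creation: one follow-up per task that is not a review."""
    return [] if task.startswith("review") else [f"review {task}"]


def prioritize(tasks: Deque[str], objective: str) -> Deque[str]:
    """Stand-in for reprioritization: shortest task description first."""
    return deque(sorted(tasks, key=len))


def run(objective: str, first_task: str, max_steps: int = 10) -> List[Tuple[str, str]]:
    tasks: Deque[str] = deque([first_task])
    memory: List[Tuple[str, str]] = []  # plays the role of the vector store
    steps = 0
    while tasks and steps < max_steps:
        task = tasks.popleft()                               # pull the first task
        result = execute(objective, task, memory)            # run the execution agent
        memory.append((task, result))                        # store the result
        tasks.extend(create_tasks(objective, task, result))  # create new tasks
        tasks = prioritize(tasks, objective)                 # reprioritize the list
        steps += 1
    return memory


if __name__ == "__main__":
    for task, result in run("write a Linux tutorial", "draft outline"):
        print(task, "->", result)
```

The `max_steps` cap is an addition of this sketch: since new tasks are created on every pass, a loop like this needs some stopping rule to avoid running (and spending tokens) indefinitely.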

Repository: https://github.com/yoheinakajima/babyagi
Developer: Twitter, Blog
Setup Guide and Background: http://babyagi.org/
Test Baby AGI (bring your OpenAI key): Hugging Face

Task example: Find popular topics that don’t have enough documentation, for articles for my Linux tutorial blog:

Task Example: Plan a romantic dinner for my wife this Friday night in central Singapore:

JARVIS / HuggingGPT

Jarvis, or HuggingGPT, is a collaborative system comprising a Large Language Model (LLM) as the central controller and numerous expert models as collaborative executors, sourced from the Hugging Face Hub. This agent can employ LLMs as well as other models. The workflow of the system consists of four stages:

• Task Planning: Uses ChatGPT to analyze user requests to discern intent and breaks them down into manageable tasks.

• Model Selection: To solve the given tasks, ChatGPT selects the best suited expert models from Hugging Face, based on their descriptions.

• Task Execution: Invokes and executes each selected model, subsequently returning the results to ChatGPT.

• Response Generation: Finally, it uses ChatGPT to integrate the prediction of all models, and generate a comprehensive response.
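As a rough illustration of these four stages, the sketch below wires them together with stubs. It is not the JARVIS implementation: the `EXPERTS` and `KEYWORDS` tables and all function names are assumptions, and simple keyword matching stands in for the ChatGPT calls that do planning, model selection and response generation in the real system.

```python
from typing import Callable, Dict, List

# Hypothetical expert models keyed by task type. HuggingGPT would instead pick
# models from the Hugging Face Hub based on their descriptions.
EXPERTS: Dict[str, Callable[[str], str]] = {
    "image-captioning": lambda data: f"caption({data})",
    "translation": lambda data: f"translated({data})",
}

# Keyword matching stands in for the controller LLM parsing the request.
KEYWORDS: Dict[str, str] = {"caption": "image-captioning", "translate": "translation"}


def plan_tasks(request: str) -> List[str]:
    """Stage 1, task planning: turn the request into a list of task types."""
    text = request.lower()
    return [task for word, task in KEYWORDS.items() if word in text]


def select_model(task: str) -> Callable[[str], str]:
    """Stage 2, model selection: pick an expert for the task."""
    return EXPERTS[task]


def execute_tasks(tasks: List[str], data: str) -> Dict[str, str]:
    """Stage 3, task execution: run each selected expert on the input."""
    return {task: select_model(task)(data) for task in tasks}


def generate_response(outputs: Dict[str, str]) -> str:
    """Stage 4, response generation: merge all expert outputs into one answer."""
    return "; ".join(f"{task}: {out}" for task, out in outputs.items())


def handle(request: str, data: str) -> str:
    return generate_response(execute_tasks(plan_tasks(request), data))


if __name__ == "__main__":
    print(handle("Caption and translate this", "photo.jpg"))
```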

Repository: https://github.com/microsoft/JARVIS
Detailed setup guide: How to use Jarvis / HuggingGPT
Paper: Arxiv
How it works:

Image credit: Yongliang Shen, et al., Microsoft

The Completely Incomplete List of AI Agent Platforms

🌐 Web-based platforms

AgentGPT See above.

https://aiagent.app/ A web-based solution similar to AgentGPT

Cognosys An AI powered web-based agent.

DoAnythingMachine It has an enticing name and an even more promising claim as the “To-Do list that does itself for you.” Sadly, there’s a waitlist.

alphakit Early access list: A team of autonomous AI agents for everyone.

📚 Frameworks, libraries & platforms:

Auto-GPT See above.

AutoGPT.js Create a custom AI agent, name it and assign it a mission for any goal you can imagine — all while running within the browser. Watch while it generates tasks, executes them, and learns from the outcomes for optimal results.

AutoGPT GUI A graphical user interface to AutoGPT

Auto-GPT-Plugins Plugins for Auto-GPT

Free-AUTO-GPT-with-NO-API Free AUTOGPT with NO API is a repository that offers a simple version of Autogpt, an autonomous AI agent capable of performing tasks independently. Unlike other versions, the implementation does not rely on any paid OpenAI API, making it accessible to anyone.

babyAGI See above

BabyBeeAGI A slower, buggier, but more powerful mod of babyAGI

BabyAGI-asi BASI is a modified version of BabyAGI that shows how LLMs can perform in the real world.

JARVIS See above

OpenAGI Where LLM meets domain experts.

Agent-LLM Agent-LLM is an Artificial Intelligence Automation Platform designed to power efficient AI instruction management across multiple providers. The agents are equipped with adaptive memory, and this versatile solution offers a powerful plugin system that supports a wide range of commands, including web browsing. With growing support for numerous AI providers and models, Agent-LLM is constantly evolving to empower diverse applications.

AutoGPT-Next-Web
1. Free one-click deployment with Vercel in 1 minute
2. Improved local support: After typing in Chinese, the content will be displayed in Chinese instead of English
3. UI designed to match AgentGPT, responsive design, and support for dark mode
4. Have your own domain? Even better, after binding, you can quickly access it anywhere without barriers
5. Support access code control, only you or trusted individuals can use the website

MiniGPT-4 Enhancing Vision-language Understanding with Advanced Large Language Models

Micro-GPT MicroGPT is a simple and effective autonomous agent compatible with GPT-3.5-Turbo and GPT-4. It combines robust prompting, a minimal set of tools and short-term memory (Chain of Thoughts). Data augmentation via vector stores will be added soon.

Teenage-AGI An(other) OpenAI and Pinecone-based agent. Process steps when getting a user query:

1. AI vectorizes the query and stores it in a Pinecone Vector Database
2. AI looks inside its memory and finds memories and past queries that are relevant to the current query
3. AI thinks about what action to take
4. AI stores the thought from Step 3
5. Based on the thought from Step 3 and relevant memories from Step 2, AI generates an output
6. AI stores the current query and its answer in its Pinecone vector database memory
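The memory loop described in these steps can be sketched without any external service. The following is an illustration under loud assumptions, not Teenage-AGI’s code: word counts stand in for real embeddings, the `Memory` class is a hypothetical in-memory substitute for Pinecone, and the "thought" and "output" strings are placeholders for LLM calls.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real agent would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Memory:
    """In-memory stand-in for the vector database."""

    def __init__(self) -> None:
        self.items: List[Tuple[Counter, str]] = []

    def store(self, text: str) -> None:
        self.items.append((embed(text), text))

    def recall(self, query: str, k: int = 2) -> List[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def answer(query: str, memory: Memory) -> str:
    relevant = memory.recall(query)                      # vectorize query, find memories
    thought = f"use {len(relevant)} memories"            # placeholder for the LLM thought
    memory.store(f"thought: {thought}")                  # store the thought
    output = f"answer to '{query}' ({thought})"          # placeholder for the LLM output
    memory.store(f"query: {query} | answer: {output}")   # store query and answer
    return output


if __name__ == "__main__":
    mem = Memory()
    mem.store("the cat likes fish")
    print(answer("what does the cat eat", mem))
```

Unlike the description above, this sketch stores the query only once, together with its answer, to keep the example short.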

Camel To address the challenges of achieving autonomous cooperation, Cameleers have introduced a new communicative agent framework which is called role-playing.

ai-legion AI Legion is an LLM-powered autonomous agent platform.

Tools:

Xircuits The Xircuits toolkit provides a comprehensive set of components for experimenting and creating Collaborative Large Language Model-based automatons (Agents) in the style of BabyAGI and Auto-GPT. By default, the toolkit comes with BabyAGI agents, but it can easily be modified to accommodate your own custom prompts.

Games

gptrpg Contains: 1. A basic RPG-like environment for an LLM-enabled AI Agent to inhabit. 2. A basic AI Agent connected to the OpenAI API to exist in the environment, serving as a proof of concept.

SFighterAI SFighterAI features an AI agent trained through deep reinforcement learning to defeat the final boss in ‘Street Fighter II: Special Champion Edition’. The AI agent makes decisions based solely on the game screen’s RGB pixel values, achieving a 100% win rate in some scenarios.

Doomsday agents

ChaosGPT ChaosGPT, aka the autonomous agent trying to destroy humanity. It has failed miserably so far due to lack of access to weapons of mass destruction. It is nevertheless fascinating to observe its attempts at world domination, especially as its underlying models are trained on humanity’s collective ideas about the topic.

Let’s hope it runs out of OpenAI tokens before achieving its goal… it seems to have been eerily quiet in the last few days.

More Resources

Additional lists of models:
https://github.com/yoheinakajima/babyagi/blob/main/inspired-projects.md
https://github.com/yzfly/Awesome-AGI

Scientific papers on arxiv.org:
Yongliang Shen, et al.: HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace
Chaoning Zhang, et al.: One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era
Joon Sung Park, et al.: Generative Agents: Interactive Simulacra of Human Behavior

Some final thoughts: From sterile AI to powerful and dangerous Agents

AI agents offer more than just an improvement on foundation models—they add a new dimension altogether. While they do not outperform classic foundation models when it comes to executing simple and specific tasks, they excel in breaking down complex tasks into smaller ones.

If foundation models get better in the future, they will not replace AI agents, but make them more powerful still.

Autonomous agents can:

Integrate different types of models (language, code, AI art, strategy and many more)
Integrate non-foundation model components such as search engines and calculation engines.
Branch into task sub-branches
Verify and rewrite output from one model using another model
Try something, check the results, accept it if it works or try something different
Run continuously and process ongoing input (e.g., controlling a running system over time)
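The “try something, check the results, accept it if it works or try something different” pattern from the list above can be sketched in a few lines. All names here are hypothetical: `sloppy_model` and `careful_model` stand in for two different models, and a JSON parse stands in for whatever check a real agent would run on the output.

```python
import json
from typing import Callable, Iterable, Optional


def try_until_accepted(
    attempts: Iterable[Callable[[], str]],
    check: Callable[[str], bool],
) -> Optional[str]:
    """Run each attempt in order and return the first result that passes the check."""
    for attempt in attempts:
        result = attempt()
        if check(result):
            return result
    return None  # nothing passed; a real agent might escalate to the user here


def sloppy_model() -> str:
    """Stand-in for a model that returns malformed output."""
    return "{name: Auto-GPT}"


def careful_model() -> str:
    """Stand-in for a second model that returns well-formed output."""
    return '{"name": "Auto-GPT"}'


def is_valid_json(text: str) -> bool:
    """Verifier: accept only output that parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    print(try_until_accepted([sloppy_model, careful_model], is_valid_json))
```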

In short — AI agents allow the development of business applications without much need for software development. High-level processing schemas can, to a large extent, be specified in natural language, just as the granular description for a single task can be given to a foundation model.

Classic foundation models like GPT-4 are rather “sterile” in a positive sense. Meaning that if we initiate a task we won’t end up with an unintended sequence of actions with potentially disastrous outcomes. GPT-4 simply answers your question and reverts back to base after the end of a session. This is in contrast to how AI agents of the future could behave. Based on user instructions, they could set off a chain of actions that we have no way of anticipating.

A future AI agent (unlike the ones we discussed above) could create plans and actions which are beyond the control of any human.

If an agent is connected to the internet, it may do things it considers necessary to complete a task — things that were not intended by a human user — such as hacking into cloud systems to retrieve information. If an agent is able to train models or configure future instances of itself in order to complete tasks, a huge AI alignment problem could arise: Systems may emerge which are far beyond human control.

More on that by Seth Herd: Agentized LLMs will change the alignment landscape

If you know of more agent AIs, resources, or a story about autonomous AIs, please drop me a note (e.g., by responding to this article). Hit the speech bubble down here: ↓🗨
