What’s next for AI: AI agentic workflows?
I’m sure most of you have recently heard about Devin AI, which gathered a lot of attention as the world’s first AI software developer. Now we have another one from India called Devika. So, are AI agents the future of AI development? Let’s go a layer deeper into what AI agents are, how they are evolving, and how they are going to change the workflow of AI development. Most important of all, is this the next step towards AGI, or AGI itself? So, without further ado, let’s jump right into AI agents and agentic workflows.
In general, an AI agent is a system that perceives its environment through sensors and acts upon that environment using actuators based on its perception, internal states, and experiences to achieve specific goals. But in this article, we are specifically talking about LLM-based AI agents. They operate autonomously on the web or an OS (Operating System), can learn from their interactions, and make decisions to pursue their objectives, often optimizing for certain criteria.
Table of Contents
- History of AI Agents
- Improving the Prompts
- Giving Self-Reflection Capabilities to LLMs
- Using Tools to Operate Autonomously
- Understanding AI Agents
- Agentic Workflows
- Conclusion
History of AI Agents
Back in 2016, RL agents were all the hype. People were trying to create different types of RL agents to play Atari and other similar games. There was no concept of AI agents as we use the term today. However, a few researchers at OpenAI, including Jim Fan, Andrej Karpathy, and Tim Shi, wanted to use these RL agents to do some of the things that current AI agents are doing. The project was called World of Bits, and the idea was to create an agent that could go to a webpage and handle small requests like ordering a pizza. They wanted to navigate operating systems through an agent. But they were way ahead of the game: the technology hadn’t been invented yet, and they couldn't get it to work properly.
What was missing exactly? LLMs.
They were still 5 years away from the basics of a much more generalized intelligent behavior. What LLMs became good at was understanding language, so good that they could modify their output and behavior based on instructions. LLMs became the right recipe: a model that could be instructed in human language and eventually tasked with creating workflows. Creating an agentic workflow was the most logical next step.
A bit of warning about building AI agents.
It is not as simple as people might think, or as the current hype suggests. It is like the autonomous car: easy to think of, easy to build a proof of concept for, but really hard to make actually usable. We still don’t have fully autonomous cars after a decade's worth of research and billions of dollars. Another such technology is VR. We have had the idea and the POCs of VR since the late 2000s, and yet it is still not scalable.
So, the same might be true with AI agents.
Improving the Prompts
The first step to creating a good agent is to give it good prompts. But are humans really good at creating good prompts? For a given subject an expert might be able to create an optimized prompt, but what about others? So, there is a strategy called PROMPTBREEDER. It is a self-improving system that evolves prompts for specific domains.
- Using LLMs, it adjusts and assesses task prompts based on training data over multiple iterations.
- PROMPTBREEDER also refines the rules (mutation-prompts) guiding task-prompt adjustments. This results in a dual layer of self-improvement: refining prompts and refining methods (self-referential).
- PROMPTBREEDER outperforms leading strategies in arithmetic and reasoning tests.
- It can also create detailed prompts for complex challenges like hate speech classification.
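To make the evolutionary idea concrete, here is a minimal sketch of one layer of such a loop. It is not the PROMPTBREEDER implementation: `mutate` and `fitness` are hypothetical stand-ins for LLM calls and for accuracy on training data, and the second layer (evolving the mutation-prompts themselves) is left out for brevity.

```python
import random

# Hypothetical stand-in for an LLM applying a mutation-prompt to a task-prompt.
# Here it ignores the mutation-prompt and just appends a random instruction.
def mutate(task_prompt: str, mutation_prompt: str, rng: random.Random) -> str:
    additions = ["Think step by step.", "Check your arithmetic.", "Answer with a number only."]
    return f"{task_prompt} {rng.choice(additions)}"

# Hypothetical fitness. A real system would score the prompt on training data.
def fitness(task_prompt: str) -> int:
    useful = ["Think step by step.", "Check your arithmetic."]
    return sum(phrase in task_prompt for phrase in useful)

def evolve(population, mutation_prompts, generations, seed=0):
    rng = random.Random(seed)
    population = list(population)
    for _ in range(generations):
        # Binary tournament: pick two prompts, mutate the fitter one,
        # and overwrite the weaker one with the mutated copy.
        i, j = rng.sample(range(len(population)), 2)
        if fitness(population[i]) >= fitness(population[j]):
            winner, loser = i, j
        else:
            winner, loser = j, i
        population[loser] = mutate(population[winner], rng.choice(mutation_prompts), rng)
    return max(population, key=fitness)

best = evolve(["Solve the problem."] * 4, ["Rephrase the instruction."], generations=50)
```

Because the weaker prompt is always replaced by a mutated copy of the stronger one, the best fitness in this toy population never goes down from one generation to the next.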
Giving Self-Reflection Capabilities to LLMs
In order to give self-reflection capabilities to LLMs, we need to first understand the problems with current LLMs.
- They give very generalized responses, often lacking nuance, and at times repeat themselves.
- They produce a lot of verbiage: unnecessary words that don’t say anything.
- They often try to phrase things in a politically correct way and can’t make good arguments from a given worldview.
- They hallucinate and often get things wrong on complex problems whose answers might require more than 8k or 16k tokens.
- They run out of memory to store the relevant context for the given problem.
Hallucination is one of the biggest problems of LLMs. But what exactly are these hallucinations?
Instead of me explaining LLM hallucinations, look at what the man himself has to say about them.

A probable solution to hallucinations is to let the system think more before it responds. That’s where strategies like Chain of Thought, Tree of Thought, and Algorithm of Thought come in.

The idea behind self-reflection is to let the system explore diverse paths before answering a question. The system should have some capability to backtrack along its path and re-evaluate its own response. Tree of Thought and Algorithm of Thought use a tree-based or graph-based data structure to organize the candidate reasoning steps, so that different branches can be explored and abandoned.
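Here is a toy sketch of that explore-and-backtrack pattern. It is only an illustration under simple assumptions: the "thoughts" are arithmetic moves on a positive number, and `propose` and `promising` are hypothetical stand-ins for the LLM calls that would generate and evaluate reasoning steps.

```python
from typing import List, Optional, Tuple

def propose(state: int) -> List[Tuple[str, int]]:
    # Stand-in for the LLM proposing candidate next thoughts.
    return [("+3", state + 3), ("*2", state * 2)]

def promising(state: int, target: int) -> bool:
    # Stand-in for the LLM judging a partial solution.
    # With positive numbers, overshooting the target can never be repaired.
    return state <= target

def search(state: int, target: int, depth: int, path: Tuple[str, ...] = ()) -> Optional[List[str]]:
    if state == target:
        return list(path)
    if depth == 0:
        return None
    for label, next_state in propose(state):
        if not promising(next_state, target):
            continue  # prune this branch without exploring it
        found = search(next_state, target, depth - 1, path + (label,))
        if found is not None:
            return found
    return None  # dead end: the caller backtracks and tries its next branch

steps = search(1, 11, depth=4)
```

In this run the search first follows 1, 4, 7, 10, hits a dead end, backtracks to 4, and then finds `["+3", "*2", "+3"]`.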
But I’m personally a bit cautious about these prompting strategies. We often think that an LLM comes up with better plans when prompted through these advanced strategies. But a few researchers have shown that we inadvertently feed the answer, or a hint to the answer, into these prompts.
Using Tools to Operate Autonomously
An AI agent definitely needs the capability to use different types of tools. Without this ability, we can’t have AI agents that operate our computers and accomplish tasks.
But why do we need tools? Why can’t we give all the knowledge directly to the LLM itself?
LLMs are really bad at mathematical calculations. Earlier they couldn’t even access the internet; now they can. But why are they bad at even basic calculation? Baking precise information into an LLM is quite tough, so instead of the LLM calculating a mathematical answer on its own, it is better for it to use a calculator or a similar tool.
But the question is: how does an LLM know when to use a tool?
Newer LLMs can not only produce text but also use different tools. For instance, LLMs have been given the capability to search the internet and use that information to give more up-to-date and better answers.
And this is how it works:
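As a rough sketch of the mechanism (not any vendor's actual API): the model emits a structured tool request instead of a final answer, and the surrounding program runs the tool and returns the result. Everything here is an assumption made for illustration, including the `CALL name: argument` format, the `fake_llm` stand-in, and the `TOOLS` registry.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expression: str) -> float:
    """Evaluate +, -, *, / on plain numbers without using eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

TOOLS = {"calculator": calculator}  # hypothetical tool registry

def fake_llm(question: str) -> str:
    # Hypothetical stand-in for a model trained to emit tool calls:
    # if the question contains arithmetic, request the calculator.
    match = re.search(r"\d[\d\s+\-*/.]*\d", question)
    if match:
        return f"CALL calculator: {match.group(0)}"
    return "I can answer that directly."

def run(question: str) -> str:
    reply = fake_llm(question)
    if reply.startswith("CALL "):
        name, argument = reply[len("CALL "):].split(": ", 1)
        return str(TOOLS[name](argument))  # the program, not the model, does the math
    return reply

result = run("What is 12 * 7 + 5?")
```

The design point is that the model only has to decide that a calculation is needed and what to pass to the tool. The exact arithmetic happens outside the model.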

Understanding AI Agents
Currently, the term AI agent is mostly used in the context of LLMs. Agents are being looked at as the future of RAG pipelines or the next step towards AGI. The diagram below summarises what AI agents are:
An “agent” is an automated reasoning and decision engine. It takes in a user input/query and can make internal decisions for executing that query in order to return the correct result. The key agent components can include, but are not limited to:
- Breaking down a complex question into smaller ones
- Choosing an external Tool to use + coming up with parameters for calling the Tool
- Planning out a set of tasks
- Storing previously completed tasks in a memory module
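The components listed above can be wired together in a very small loop. This is a sketch only: the `Agent` class, its rule-based `decompose` and `choose_tool` methods, and the two toy tools are hypothetical placeholders for what would be LLM calls and real integrations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Agent:
    tools: Dict[str, Callable[[str], str]]
    # Memory module: (task, result) pairs for tasks that are already completed.
    memory: List[Tuple[str, str]] = field(default_factory=list)

    def decompose(self, query: str) -> List[str]:
        # Stand-in for an LLM breaking a complex question into smaller ones.
        return [part.strip() for part in query.split(" and ") if part.strip()]

    def choose_tool(self, task: str) -> str:
        # Stand-in for an LLM choosing a tool and its parameters.
        return "search" if task.lower().startswith("find") else "echo"

    def run(self, query: str) -> List[str]:
        plan = self.decompose(query)  # the planned set of tasks
        results = []
        for task in plan:
            result = self.tools[self.choose_tool(task)](task)
            self.memory.append((task, result))
            results.append(result)
        return results

agent = Agent(tools={
    "search": lambda task: f"searched: {task}",
    "echo": lambda task: f"done: {task}",
})
outputs = agent.run("find flights to Pune and book the cheapest one")
```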

We have different types of agents that can handle anything from simple tasks to very complex ones like dynamic planning. Or, let me correct myself: as RaoK puts it, they can help generate plans that can later be checked for feasibility with automated planners.
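That generate-then-check split can be sketched as follows. The domain, the `ACTIONS` table, and `propose_plans` are invented for illustration. In a real setup the candidate plans would come from an LLM and the check from a proper automated planner or verifier.

```python
# Each action maps to (preconditions, effects) over a set of facts.
ACTIONS = {
    "boil water": (set(), {"hot water"}),
    "add tea bag": ({"hot water"}, {"tea"}),
    "serve": ({"tea"}, {"served"}),
}

def propose_plans():
    # Stand-in for LLM-generated candidate plans. The first one is infeasible
    # because it adds the tea bag before there is hot water.
    yield ["add tea bag", "boil water", "serve"]
    yield ["boil water", "add tea bag", "serve"]

def is_feasible(plan, goal):
    # Simulate the plan step by step and reject it at the first unmet precondition.
    state = set()
    for action in plan:
        preconditions, effects = ACTIONS[action]
        if not preconditions <= state:
            return False
        state |= effects
    return goal in state

def first_valid_plan(goal):
    for plan in propose_plans():
        if is_feasible(plan, goal):
            return plan
    return None

plan = first_valid_plan("served")
```

The generator is allowed to be wrong, because only plans that pass the external check are ever returned.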


Agentic Workflows
Let’s look at how we can make an LLM think a little more. The real question is whether breaking a problem down into simpler problems makes LLMs smarter. The answer is both yes and no.
Keep in mind that we are still using the same LLM, so why should the performance increase? The answer lies in context.
When we break the problem down into simpler problems, the LLM answers each of them and thus adds more context for solving the overall problem.
But we inadvertently tell the LLM how to break down the problem, so it is we who do the planning. The LLM can’t tell which plan is better and which is not. In the coming months, we may be able to train or instruct the LLM so that it first breaks the problem into subproblems, uses them to add more context, and then solves the problem. But here lies the catch: the LLM itself doesn’t know whether the subproblems it came up with are correct. As of now, humans have to decide which subproblems to use to solve the main task.
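The context effect can be seen in a small sketch. The `answer` function is a hypothetical stand-in for one and the same LLM, and the subproblems are supplied by a human, which is exactly the planning step described above.

```python
def answer(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; it "answers" the last line of the prompt.
    return f"<answer to: {prompt.splitlines()[-1]}>"

def solve_with_subproblems(problem: str, subproblems: list) -> tuple:
    context = []
    for sub in subproblems:
        # Each subproblem is answered with all earlier answers in view.
        prompt = "\n".join(context + [sub])
        context.append(f"{sub} -> {answer(prompt)}")
    # The final prompt carries every sub-answer as extra context.
    final_prompt = "\n".join(context + [problem])
    return final_prompt, answer(final_prompt)

final_prompt, final_answer = solve_with_subproblems(
    "Plan a 3-day trip to Goa",
    ["List the must-see places", "Estimate travel time between them"],
)
```

The model is unchanged between the direct call and the decomposed call. The only difference is that the final prompt is now longer and more specific.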
Now, I know that this might not make much sense yet, so please check out https://twitter.com/rao2z. He explains in great detail why LLMs appear to plan but can’t actually plan. In the best case, they can only come up with average plans for similar problems.

I have myself seen this revised behavior break many times when prompting about certain topics.










