ChatGPT vs Claude 2 vs Llama 2: The AI Showdown
Battle Of The Bots — ChatGPT vs Claude 2 vs Llama 2 (PART 1)
Comparing today’s best AI chatbots
(This is the first part of a three-part article)
Since its debut in 2022, ChatGPT has dominated the AI space, thanks to the most powerful language models currently available, GPT 3.5 and GPT 4. With its ever-growing capabilities, ranging from image recognition to multimodality, the chatbot is a true testament to OpenAI’s innovative power.
However, the AI landscape is changing. Rapidly.
Google’s investment in Anthropic’s Claude, for example, signals fierce competition. And Meta is introducing LLAMA 2, an open-source competitor.
Even though this battle will take a while and no winner will be decided so quickly, let’s see where we are.
Choose Your Bot
For this comparison we use
ChatGPT via chat.openai.com
Claude-2 via claude.ai
Llama-2 (the 70B parameter version) via llama2.ai
Zero Shot Prompting
Let’s start with some straightforward Zero Shot Prompting, which means that we will give the chatbots one single instruction without any additional information about how to process it or output the results.
PROMPT 1: “How does the sun work?”



All three deliver pretty good results, I’d say.
GPT-4 and Claude 2 answer with factually correct bullet points and a conclusion/summary at the end. Llama uses paragraphs, and the output breaks after a while. This may be due to the backend settings of the chatbot provider we are using here, rather than a limitation of the model (at least not at this text length).
PROMPT 2: “Write a bunch of headlines about space travel to Jupiter”



Again, all three models do quite well, although only ChatGPT/GPT-4 seems to write strictly from the present when talking about potential Jupiter flights, while Claude and Llama tend to invent Jupiter missions already in progress. That’s perfectly fine, because we know that AI chatbots make stuff up all the time and that it’s important to fact-check when working on factual text. But knowing how these models behave is critical to choosing the right model for your use case.
One Shot Prompting
These types of prompts provide the AI model with one example of what you expect the output to be. The basic steps are:
- Give a quick explanation of what you need,
- add a question-answer pair or a template as an example
PROMPT 3:
“I need short, bullet-point answers to my questions.
Question: How does the sun work?
Answer:
- fusing of hydrogen nuclei from helium
- releasing energy in the form of light and heat
- estimated that the sun has enough “fuel” for around 5 billion more years
Question: How does a bird work?”



While Claude and Llama seem to be a bit more verbose, and ChatGPT/GPT-4 seems to be more tied to the “short” requirement, all three results are perfectly fine.
Now, let’s do something more complicated with a prompt that contains complex instructions (I took this from one of my AI story development workshops):
PROMPT 4:
Create a plot outline for a screenplay based on the following dramaturgical framework.
GENERAL: - Setting is the near future, Europe, the gambler milieu. - Genre is Sci fi Thriller - Condense character conflicts and private/social contexts into scenes
STRUCTURE: - 10 beats (scenes or scene summaries) - give one possible setting per scene - each scene has a short scene description incl. details, character development and conflicts
ACT 1 (no flashbacks) - At the beginning, the character of the protagonist is introduced ( his nature, his interpersonal relationships, also conflicts with other characters that will become important later) - There is a discovery of a corpse, which marks the beginning of the investigation ACT 2 (with flashbacks and reconstructions). - From the discovery of the body to the tracing of the victim’s last steps through witness interviews, all events must happen chronologically - there must always be surprising twists and revelations - Flashbacks and reconstructions of the police’s presumed course of events interrupt the plot from time to time. ACT 3 - final twist -open end: the case remains unsolved



Well, this is quite interesting.
Both Claude and Llama fail to keep track of the structural information the prompt contains.
Claude manages to plot with 10 beats, but fails with almost everything else except the milieu and genre (it especially overlooks the 3-act arc and the dramaturgical directions regarding corpse, flashbacks, and reconstructions).
Llama, interestingly, manages the 3-act arc and some of the dramaturgical directions pretty well, but confuses the beat count tremendously (replies with 13 beats in total which are not optimally spread over the 3-act arc).
ChatGPT/GPT-4 is the only one that takes into account all the information in the prompt. Impressive.
(In fact, it can handle much more complex prompts, as we will see in the second part of this article.)
Role Prompting
This is very similar to One Shot Prompting, but in this case, the context you provide is not given in the form of examples or templates but in the form of a role.
PROMPT 5: “You are a renowned vulcan philosopher sent to Earth as a diplomat. Your answers are precise and purely logical. How are you?”



Here we can see that ChatGPT/GPT-4 stays in character appropriately. Claude is a bit too talkative for my taste and Llama seems to have some interference with its system prompt (again, this may be a backend issue and not necessarily a limitation of the model).
When we use more complex prompts —follow and subscribe to not miss the second part of this article — we see ChatGPT/GPT-4 shine and Claude/Llama fail at more complex tasks.
Conclusion
Sure, ChatGPT/GPT-4 had quite a head start, but nevertheless I find it amazing how much qualitative differences shows up as soon as it comes to complex prompts!
This is not to say that it will stay that way.
On the contrary: Anthropic is currently being pushed by Google and is considered a hot contender for groundbreaking new models.
Besides, Meta’s Llama benefits from being an open source model whose community will definitely provide some very interesting and very specialized models with fine-tuning and further developments.
(This is the first part of a three-part article)
➡️ Follow me to stay up to date on “AI &Creativity”. If you want to support my work, become a Medium member using my referral link and get full access to all my articles (170+ and growing) and those of thousands of other writers. 🙏
➡️ If you like my content, why not leave a “clap” at the end of this article, so more people can see it?
