Battle of the Bots (2024 Update): ChatGPT vs. Gemini vs. Pi vs. Claude 2
I originally posted this article on Medium last year, but with the latest updates to ChatGPT, Google’s Gemini, and Anthropic’s Claude, I thought it was time to revisit my analysis and see how much has changed. So here is an update on how four popular large language models (LLMs) compare in terms of functionality: OpenAI’s ChatGPT, Google’s Gemini (previously Bard), Inflection’s Pi, and Anthropic’s Claude 2.
As a reminder, an LLM is a type of artificial intelligence (AI) model trained on large amounts of text data. An LLM essentially learns to generate human-like text by predicting which word (more precisely, which token, i.e., a word or word fragment) is most likely to follow the previous ones (the actual statistics are far more complex than this, but you get the gist).
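To make the "predict the next word" idea a bit more concrete, here is a toy sketch in Python. It is only an illustration, not how ChatGPT or any of the other chatbots actually work: it simply counts which word follows which in a tiny made-up text (a so-called bigram model) and always picks the most frequent follower. Real LLMs use huge neural networks, work on tokens rather than whole words, and look at much more than just the previous word. The function names `train_bigrams`, `predict_next`, and `generate` are my own hypothetical names for this example.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(model: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent word seen after `word`, or None if it was never followed."""
    counts = model.get(word.lower())
    if not counts:
        return None
    # On ties, Counter.most_common keeps the word that was seen first.
    return counts.most_common(1)[0][0]


def generate(model: dict[str, Counter], start: str, length: int) -> list[str]:
    """Repeatedly append the most likely next word, stopping early if none is known."""
    words = [start.lower()]
    for _ in range(length):
        nxt = predict_next(model, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return words


# A tiny made-up "training text" for illustration only.
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # prints "cat" ("cat" follows "the" twice)
print(generate(model, "the", 4))
```

Because this toy model always picks the single most likely word, it quickly starts repeating itself; real chatbots instead sample from a probability distribution over possible next tokens, which is one reason you can get different answers to the same prompt.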
Just a note before we start: I’m not an AI developer or expert, so I will be comparing the four chatbots from a user’s perspective rather than on their technical specifications (e.g., training methodology and training data). Also, I will only be considering the publicly available free versions of the chatbots, so bear in mind that paid versions may offer a lot more functionality. Finally, all assessments reflect my own opinions and experience with these AIs, so please let me know your thoughts, and whether you agree, in the comments (but be nice!). Now, with the disclaimers out of the way, let’s get into the discussion itself!
OpenAI’s ChatGPT (plus Microsoft Copilot)
I still use ChatGPT the most, maybe because it was my first experience with generative AI, so it will always be one of my favourites. However, I am not as impressed with the latest version (I’m not sure what exactly has changed), as I now have to over-explain what I want before I get the answer I’m looking for (if I get it at all!). Last time, I noted that I liked how it could understand even fairly obscure prompts, but this seems to be less the case lately, which I’m a bit disappointed about.
I would say that its help with coding in Python is still on point, which is part of the reason why I keep using ChatGPT. Aside from coding, ChatGPT can do all sorts of other things: solve math problems, write essays on a given topic, summarise large amounts of text, generate custom advice for your problem, and more. ChatGPT can also present data in different formats, including tables, JSON, and HTML. Each chat is also automatically named (although you can rename it if you like) and saved, so you can come back to it later.
ChatGPT still has a relatively low rate of AI hallucinations (a fancy term for when an AI presents false information as fact in its answer), at least in my experience, although the other chatbots are catching up! This does require well-designed prompts, but overall, I rarely notice mistakes from ChatGPT, and when I do, it usually corrects itself once I point them out.
The free version of ChatGPT is, however, limited to training data up to January 2022 (in the version I tested), which means that it can’t comment on or analyse current affairs, something that would be especially useful for my job as a pharma consultant. Moreover, the free version neither accepts nor outputs images, tables, or other files. This is not the case for GPT-4, available through the paid ChatGPT Plus subscription, which is connected to the internet and can handle different input and output data formats.
A way around paying for ChatGPT is to use Microsoft Copilot, which I have recently become a fan of. It is currently available as a preview to Microsoft 365 users (if admin permissions allow), but you can also access it through the Microsoft Edge browser. Copilot is built on OpenAI’s GPT-4 (the model behind the paid version of ChatGPT), is connected to the internet, and can generate images using another OpenAI tool, DALL-E 3. When using Edge, you can ask Copilot about the webpage you have open in the browser. One feature I found helpful is using Copilot to summarise a PDF written in a foreign language (it can’t necessarily translate large files yet) and then asking follow-up questions about anything in the summary I don’t understand. Copilot automatically provides links to its sources, so you can quickly fact-check it if in doubt. For now, Copilot is limited to 30 responses per conversation, although I have never really gone over this limit. Copilot will let users essentially use ChatGPT’s capabilities within Microsoft apps, which would be pretty convenient, so if you use Microsoft 365, it might be worth investing in Copilot given its seamless integration with the system.
Google’s Gemini
By and large, I would say that Gemini remains on par with ChatGPT in terms of capabilities like coding help, math problems, data analysis, and summaries. However, Gemini’s one big advantage is that it’s connected to the internet, which lets me ask about things that are happening in real time.
Compared with its predecessor, Bard, Gemini seems to give more accurate answers about current affairs (i.e., fewer AI hallucinations), but it is still far from perfect, so be careful not to take its answers at face value. Gemini does have a feature that uses Google Search to double-check its answers, although I found it isn’t very helpful when Google can’t find exact matches for Gemini’s claims (Google leaves most of the answer unevaluated).
Gemini can handle a lot more information than Bard could, but the more you provide, the more it seems to struggle to analyse it correctly.
Overall, I think Gemini is a decent option if you need to analyse current news, but I would definitely not rely on it alone for this. In other respects, it is pretty interchangeable with ChatGPT, and at this rate, it might even overtake ChatGPT for certain use cases. If you work with Google apps a lot, Gemini’s integration with them would probably make it the best option for you.
Inflection’s Pi AI
As far as chatbots go, Pi is the most “conversational” AI of the four. And I mean emoji-at-the-end-of-every-paragraph kind of conversational! In my opinion, it is also the most entertaining one.
Unlike ChatGPT or Gemini, Pi is not very proficient when it comes to helping with code or presenting data in different output formats. Pi has increased its prompt character limit (from 1,000 to 4,000), but it still cannot analyse large amounts of data at once. Most of your interactions still happen in a single long conversation, although you can now create separate threads.
Nevertheless, I found Pi to be the most engaging when it comes to conversations on life or career topics. ChatGPT and Gemini would most certainly give you quite a formal answer and then send you to a qualified professional (which, don’t get me wrong, is great advice!) But if you are just looking for a way to vent and feel understood, Pi will be your friend! However, always remember that Pi is just an AI chatbot, so please seek professional help for any serious issues you might have.
Anthropic’s Claude AI
I first heard about Claude from a friend and decided to give it a try, but frankly, I was not impressed with its original capabilities. The latest version, Claude 2, does, however, appear to be an improvement. Claude can still answer simple questions from a text-based prompt, but now it can also help with coding, summarise data, and even summarise user-provided PDF files (although there is a limit on that as well). In my experience, its summaries of large amounts of data were actually better and more accurate than those from ChatGPT 3.5 and Gemini!
It is still not connected to the internet, and when it comes to coding help, the way Claude formats code makes it harder to just copy and paste. There is also a limit on how many responses you can get in a given time window (in the free version at least), so you have to be mindful of that. Nevertheless, I would say that it has improved the most since I reviewed it last year, and I imagine it will get even better in future iterations, so I would now definitely consider using Claude 2 for some of my tasks.
To sum up, all four chatbots have their merits and limitations, so which one is actually the “best” largely depends on the task at hand. I created a quick summary table to highlight which chatbot has which capabilities (listing the ones I found most helpful):
That’s it for now! If you enjoyed this blog, you might also find my 2023 generative AI chatbot comparison, my article on using ChatGPT to learn Python, and my ChatGPT use cases article interesting. And as always, let me know if you have any comments, suggestions, or ideas for future blogs. Follow and subscribe to my email list so you don’t miss my next post (I usually post once a week, on Sundays)!