Wayne Stelk, PhD

Summary

A science writer evaluates the effectiveness of Google's Gemini Ultra 1.0 against ChatGPT4 as research assistants, ultimately favoring ChatGPT4 for its depth and clarity in handling complex questions.

Abstract

In a comparative analysis, the author, a science writer focused on deep investigations into contemporary issues, tests the capabilities of Google's Gemini Ultra 1.0 and OpenAI's ChatGPT4. The writer assesses the AI models on a range of questions, from straightforward to highly abstract, to determine their utility as research assistants. Despite Gemini's promise of advanced coding and logical reasoning abilities, ChatGPT4 consistently outperforms it, particularly when addressing nuanced and complex issues. The author concludes that ChatGPT4 remains the preferred tool for

I Stumped Google’s Gemini — But Not ChatGPT

Can Gemini replace ChatGPT as a research assistant?

Image: Mojahid Mottakin for Shutterstock

I stumped Google’s new Gemini, Ultra 1.0, with a cannonball question. ChatGPT never blinked. Can Gemini replace ChatGPT as a research assistant? And here’s a more basic question: Can generative AI be a legitimate research assistant for Medium writers?

I am a science writer for a particular type of reader. I hope to connect with those who take a deeper look at the problems of the day, those who ask not only “What happened?” but “Why did this happen?” In a world gone mad, the goal of my writing genre is to bring “science to sanity.”

My stories are heavy on research. I always write my own articles, but topics related to the problems of the day sometimes put me outside the scope of my knowledge of the many sciences related to human nature. Some well-established writers, such as columnists at major newspapers, have research departments, personal assistants, and specialized databases at their disposal when they need greater depth on a topic.

Moi, je suis tout seul. I have no commercial or academic affiliations. My budget for writing shows negative numbers.

For low-budget (or no-budget) writers like me, the world changed with the advent of generative AI. As a long-term follower of the AI industry, I jumped on the many research applications built on generative pre-trained transformers (GPTs) that suddenly appeared, such as AI-Writer, Perplexity.ai, ChatPDF, Scholarcy, Grammarly, Hemingway Editor, and Scite, to name a few. I found these specialty AI apps to be interesting but overshadowed by the behemoths: OpenAI’s ChatGPT and Google’s Bard.

After putting many queries to both ChatGPT and Bard, I was happy to adopt ChatGPT as my go-to research assistant, with Bard as a backup for questions about headline-of-the-day events. ChatGPT gave me a lot of depth in its analysis of topics, but its knowledge was limited by the cutoff date of its training data. Bard generated less depth, but it could search the Internet in real time. Since the “early” days (one year ago), both models have evolved.

Drum-Roll, Please: ChatGPT Threatened. Google Announces Gemini.

On February 11, 2024, my treasured research assistant, ChatGPT, suddenly became threatened. I received an email from Google announcing that Google’s generative large-language model (LLM) had been updated and renamed Gemini, Ultra 1.0 (at a monthly subscription cost of $19.99).

Here is the advert-pitch sent by Google:

Google’s most capable AI model, Ultra 1.0

  • Far more capable at coding, logical reasoning, following nuanced instructions, and creative collaboration
  • Able to understand, explain, and generate high-quality code in many programming languages
  • Always improving, with more exclusive features and capabilities in the coming months

Wow! A new AI model has moved into the neighborhood. How do I decide between my tried-and-true office mate, ChatGPT, and the shiny new object, Gemini, to be my new go-to research assistant?

Taking advantage of Google’s two-month free trial period, I decided to put these two monster applications to the test. I came up with softball and hardball questions. They both did well. Then I came up with a cannonball question. To my surprise, it was the punch that decided the contest.

Staging the Comparative Tests of Gemini’s Ultra 1.0 Versus ChatGPT4

I began my initial tests of Gemini vs. ChatGPT with softball questions, like “What are the new National Football League (NFL) rules for overtime?”, which I asked when regulation play in the Super Bowl between the Kansas City Chiefs and the San Francisco 49ers ended in a tie. As expected, both gave me the correct updated rules.

I stepped it up with more difficult questions. Since systems biology is not my strength, I posed this high-geek question: “What is hematopoiesis?” (What? You haven’t yet talked about hematopoiesis over beer with your friends?)

You can find answers to geeky questions like these in Wikipedia, but you need to sort through lots and lots of detail to make the concept remotely understandable. You can think of generative AI as creating abstracts for lengthy Wiki articles.

For medium to hard questions, I found ChatGPT to give better-organized and more complete responses than Gemini. Generally, both models generated comparable content, but I preferred the output style and depth of ChatGPT.

But what about a really hard question — one that packs a cannonball’s punch? With that challenge in mind, I pitched a cannonball to both and was surprised by the results.

In the following two sections, I will reproduce each app’s verbatim responses (in italics) to a highly abstract question that I gave first to Gemini and then ChatGPT.

Gemini’s Ultra 1.0 Response to the Cannonball Test

Image: Mojahid Mottakin for Shutterstock (cropped by author)

My Prompt to Gemini:

By what psychological and social processes does a nation of people come to a consensus on a polarized perspective? For example, by what processes have U.S. citizens come to a consensus that Joe Biden is too old for a second presidency, but Donald Trump is not too old? What are the psychological and social determinants of this consensus?

Gemini’s Response:

Elections are a complex topic with fast-changing information. To make sure you have the latest and most accurate information, try Google Search.

Oops!

ChatGPT4’s Response to the Cannonball Test

Image: Mojahid Mottakin for Shutterstock (cropped by author)

My Prompt to ChatGPT:

(See the exact same prompt that I fed to Gemini, above.)

ChatGPT’s Response:

The consensus on political opinions, such as perceptions of a political figure’s suitability for office based on age or other characteristics, is shaped by a complex interplay of psychological and social processes. In the case of the U.S. citizens’ views on the suitability of Biden or Trump for presidency based on age, several factors contribute to this consensus-building process:

Psychological Processes

  1. Cognitive Biases: Humans are subject to cognitive biases that affect their judgment. For instance, confirmation bias leads people to favor information that confirms their pre-existing beliefs. If individuals already have a preference for or against a political figure, they are more likely to perceive information about that figure’s age in a way that confirms their existing views.
  2. Social Identity Theory: This theory suggests that individuals derive part of their identity from the social groups to which they belong, including political parties or movements. This identification can influence perceptions so that the attributes of a favored political figure are viewed more positively, and those of an opposing figure more negatively.
  3. Heuristic Processing: People often use mental shortcuts, or heuristics, to make judgments. The “representativeness heuristic,” for example, might lead individuals to judge a candidate’s suitability based on how well they match the individual’s prototype of a competent leader, which can include perceptions about age.

Social Processes

  1. Social Influence and Norms: People’s opinions are influenced by those around them, including friends, family, and influential figures in society and media. The consensus can be shaped by what is perceived as the dominant or most vocal opinion within one’s social network or media consumption.
  2. Media Framing and Representation: How media outlets frame the age and capabilities of political figures can significantly influence public opinion. Media framing can highlight certain aspects while ignoring others, shaping the narrative around a candidate’s suitability based on age.
  3. Political Campaigns and Rhetoric: Political campaigns strategically shape perceptions of candidates. The way a campaign addresses concerns about a candidate’s age, either by rebutting them or by highlighting the candidate’s experience and capability, can influence public opinion.
  4. Cultural Values and Historical Context: Cultural beliefs about leadership, age, and capability also play a role. In some cultures, age is associated with wisdom and experience, while in others, youth may be valued for innovation and energy. Historical events and the current sociopolitical context can also influence how age is perceived in political leaders.

These psychological and social determinants do not operate in isolation but interact in complex ways. Public opinion is dynamic and can be influenced by current events, changes in the information environment, and shifts in social norms and values. It’s important to note that while there may be a perceived consensus on issues like the suitability of presidential candidates based on age, public opinion is often more nuanced, with significant variation across different groups and individuals.

Success!

Which Generative AI App Shall Be My Favored Research Assistant?

As you can see, on this really tough and abstract question, Google’s Gemini failed, whereas ChatGPT gave a comprehensive response in fewer than 20 seconds. I retried this same question with Gemini several times over several days. The response was always the same.

I have thrown other cannonball questions at Gemini and gotten mixed results. Sometimes I get a reasonable response, and sometimes I do not. The nothing-burger responses typically end with this advice: “To make sure you have the latest and most accurate information, try Google Search.”

Which generative AI model shall be my favored research assistant?

For now, ChatGPT will continue to be my preferred research tool, but Google’s Gemini is a close second.

What Are the Ethics of Using Generative AI as a Legitimate Research Assistant?

As I compare these two AI products for their utility as a research tool for the science of human nature, I can anticipate questions from readers about whether a science writer should be using any AI product for research. Am I cheating or violating some principle of hard sweat by using AI prompts?

Medium’s CEO, Tony Stubblebine, has written about the injustice of AI companies scraping copyrighted information from the internet, including that of Medium writers, without giving writers the three “Cs” in return: consent, compensation, and credit.

Generative AI is in its early stages of development as a tool available to the public, which can now use or abuse this technology in countless ways.

I am well aware that generative AI poses many dangers from which consumers will need protection, such as deepfake videos and fake voice representations. These misrepresentations can have a devastating effect on a public that cannot tell if a political message has been weaponized by polarizing chicanery.

As noted by Tony Stubblebine, there is also the ethical dilemma of AI models generating their content using copyrighted data that has not been licensed. This is a major problem, and governments worldwide are wrangling over solutions.

As someone interested in science writing long before joining Medium, I have built my own indexed library of journal articles (thank you, EndNote) and news clippings (thank you, Evernote). Shall I turn my head away from the AI tools that can help me leverage my self-generated knowledge-base to gain deeper insights?

My response is that the ethics of using online content that has been sucked into LLMs are under consideration by governmental and industry regulators. I look forward to rules that will improve the ethical use of generative AI.

For now, I will defend the use of generative AI models as legitimate research tools for these reasons:

1. We live in a global community of connected nations.

2. All nations share, unequally, the same problems of scarce resources, climate control, and quality of life.

3. Science and technology are important components for mitigating these problems.

4. No single science or technology can work in isolation to solve these problems.

5. The world’s complex problems will require solutions generated by transdisciplinary sciences and transcultural cooperation.

6. Large language AI models (LLMs) are the world’s best opportunity for creating a transdisciplinary knowledge base that can be accessed by researchers and policy-makers on scales large and small.

To say the least, I am a small-scale science writer addressing topics that require a large-scale transdisciplinary perspective. I believe that as generative AI continues to evolve, this resource will increasingly supply researchers, on scales small and large, with an expansive range of transdisciplinary knowledge.

I have noted that all of my articles are written by me. I always find a primary source for any scientific “opinion” rendered by my AI apps. Even though each article I write takes far longer than I would like, having the assistance of an AI tool helps me to concentrate my independent research efforts, which abbreviates a still lengthy process. For this, I am grateful for my office mate, ChatGPT, and now an evolving Gemini, which will improve with age.
