The PyCoach


Here’s Why I Still Don’t Buy the Hype of Google Gemini

Gemini might not be as good as it seems to be.

Source: Google

Google’s Gemini was just unveiled, and I haven’t seen so much hype since ChatGPT was released by OpenAI.

Gemini is Google’s most powerful AI model, and what sets it apart from others is its multimodality. Traditionally, multimodality meant combining separate models, each trained for a specific task (text, image, etc.). Gemini, however, was built from the ground up to be multimodal, which allows it to reason seamlessly across text, images, video, audio, and code.

The result? An AI that beats GPT-4 … on paper (and demos).

At least that’s what some of us feel after discovering what I’m about to show you. Here’s why I still don’t buy the hype of Gemini.

Gemini AI beats GPT-4 … but the gap isn’t that big

You’ve probably seen the image below, where Google shows that Gemini Ultra is more powerful than GPT-4.

Source: Google

And you might’ve also seen this detailed comparison of Gemini Ultra and GPT-4.

Source: Google

In the detailed comparison, we can see that Gemini Ultra outperforms GPT-4, but the gap narrows once you check the 60-page paper released by Google.

Check out the MMLU comparison. GPT-4’s 86.4% rises to 87.29% when it is evaluated with the same prompting technique, CoT@32.

Source: Gemini paper
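To make the MMLU point concrete, here is a small sketch of the arithmetic. The two GPT-4 scores are the ones quoted above. The Gemini Ultra score (90.04% with CoT@32) is not quoted in this article and is taken from Google's published comparison, so treat it as an outside assumption.

```python
# MMLU accuracy (%) by evaluation setup.
# The GPT-4 figures are the ones quoted in the article. The Gemini Ultra
# CoT@32 figure is an assumption taken from Google's published comparison.
GPT4_5_SHOT = 86.4
GPT4_COT32 = 87.29
GEMINI_ULTRA_COT32 = 90.04


def gap(a: float, b: float) -> float:
    """Return the lead of score a over score b, in percentage points."""
    return round(a - b, 2)


# Headline comparison: Gemini with CoT@32 against GPT-4 with 5-shot prompting.
mismatched = gap(GEMINI_ULTRA_COT32, GPT4_5_SHOT)

# Like-for-like comparison: both models evaluated with CoT@32.
matched = gap(GEMINI_ULTRA_COT32, GPT4_COT32)

print(f"Gap with mismatched setups: {mismatched} points")
print(f"Gap with matched setups:    {matched} points")
```

Under these numbers, Gemini Ultra still leads, but by less than the headline comparison suggests once both models are evaluated the same way.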

The only version of Gemini available for users right now is Gemini Pro, which was integrated into Bard and is no match for GPT-4.

Gemini was introduced in three different versions.

  • Gemini Ultra: The largest and most powerful model, designed to handle highly complex tasks (the one that beats GPT-4)
  • Gemini Pro: Suitable for a wide range of tasks. It has fewer parameters and will compete directly with GPT-3.5
  • Gemini Nano: Tailored for on-device tasks

Source: Google

The thing is, the model everyone is talking about, Gemini Ultra, isn’t available yet. It’s only expected to reach users early next year, through “Bard Advanced”.

In the meantime, all we know about Gemini Ultra comes from the numbers Google showed us and a “Hands-on with Gemini” video made by Google, which isn’t much of a “hands-on”.

The Hands-on with Gemini demo isn’t that real

I have to admit the Gemini demo blew my mind.

In the demo, Google shows off Gemini’s multimodal capabilities. We see how we can easily talk with the AI, how it can recognize your images quickly, track objects in real time, and more.

Very impressive … until you open the video description and read this.

For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.

So the video neither happened in real time nor used the spoken prompts.

In fact, according to a Bloomberg report (https://www.theverge.com/2023/12/7/23992737/google-gemini-misrepresentation-ai-accusation), Google admitted when asked for comment that the demo didn’t happen in real time with spoken prompts. Instead, it was made from still image frames taken from raw footage, with text prompts written out for Gemini to respond to.

When searching for more information about the demo, I came across this “How it’s Made” article on the Google blog (https://developers.googleblog.com/2023/12/how-its-made-gemini-multimodal-prompting.html). I was surprised to discover that what seemed to be one of Gemini’s differentiators compared to GPT-4 (the ability to understand video and respond to it) was, in reality, a sequence of pre-selected image frames.
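The workflow described above, still frames plus written prompts instead of live video and speech, can be sketched roughly as follows. This is only an illustration: the `ImagePart`, `TextPart`, and `build_prompt` names, the file names, and the question are all made up for this sketch and are not Google's API or the prompts Google actually used.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ImagePart:
    """A single still frame pulled from raw footage (hypothetical type)."""
    path: str


@dataclass
class TextPart:
    """A typed-out instruction or question (hypothetical type)."""
    text: str


Part = Union[ImagePart, TextPart]


def build_prompt(frame_paths: List[str], question: str) -> List[Part]:
    """Combine hand-picked still frames with a written question into one prompt."""
    parts: List[Part] = [ImagePart(path) for path in frame_paths]
    parts.append(TextPart(question))
    return parts


# Three stills instead of a live video stream; file names are made up.
prompt = build_prompt(
    ["frame_rock.png", "frame_paper.png", "frame_scissors.png"],
    "What game do these three pictures show?",
)

for part in prompt:
    print(part)
```

The point of the sketch is that the model receives a short, curated list of images followed by text, which is a much easier input than a continuous video feed with spoken questions.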

Here’s how the rock, paper, scissors clip was made.

In the demo, Gemini’s ability to interpret a game of rock, paper, scissors in real time looked impressive. In reality, working from a handful of pre-selected frames, it might not be that impressive.

Google had obvious reasons to leave these details out of the video, but the result is a hands-on that looks more like an ad.

Besides the video editing, there’s the matter of the prompts. The prompts we hear in the video are not the ones that were used to get the results.

Here’s an example. At 4:36 in the demo, we hear “based on the design, which of these would go faster?” referring to the two images on the table. Gemini responds, “The car on the right will go faster. It’s more aerodynamic.”

However, this was the actual prompt used.

Source: Google blog

As you can see, the written prompt that produced the result differs from the one spoken in the video.

For some, the demo raises doubts about Gemini’s capabilities. I can’t tell whether Gemini Ultra is as good as some say until I do my own hands-on or see someone do one without all this fancy editing.

What about the training data?

Many have pointed out on Twitter (X) that Google hasn’t provided any information on how the training data was made or filtered, which is ironic because Google itself says the training data is key.

Jeff Dean, Chief Scientist of Google DeepMind and Google Research, replied to one of these tweets.

Source: Twitter (X) (https://twitter.com/JeffDean/status/1732461004901282238)

Hopefully, people will have access to the Gemini Ultra model soon and we’ll see whether it can live up to the hype.

While you may approach this latest news with excitement, it’s important to be cautious.

  • Gemini might not be as good as it seems to be in the demo
  • Gemini Ultra isn’t available yet. Gemini Pro is available in Bard, but it only competes with GPT-3.5
  • Details about the training data used for the tests have not been provided yet

Caution is also warranted given the less-than-ideal experience when Bard was launched on a grand scale in early 2023. Despite the initial hype, it disappointed users because of the various errors that emerged once they tried it.

That said, if Gemini Ultra is as good as it seems, I’ll be praising it after I do my own hands-on with Gemini.

In the meantime, I just can’t buy the hype.

