
Ding Dong, The King Is Dead
Friendship With Gemini Is Over, Claude 3 Is My New Best Friend
Yesterday was April Fools’ Day. This year there weren’t many April Fools pranks, especially from Google. In other years they had tons of gags. This year… Google didn’t do much. But I did get an email from them.

It explains that Google Gemini’s pay-as-you-go tier is almost ready. It will be ready on May 2nd. So let’s take a look.

What the literal… Surely this must be an April Fools joke, right? Well, it’s April second now so… no, not an April Fools joke.
So the reason why I’m so upset about this... well, maybe upset is not the right word for it. I can still use Gemini 1.0 Pro for the same cost as GPT-3.5, but I’m just disappointed. I’m disappointed because I thought Gemini’s Pro models would all be pretty cheap. I thought that we’d get Gemini Ultra and that would be the GPT-4 competitor. But it looks like Gemini 1.5 Pro is the GPT-4 competitor instead.
I thought that Gemini 1.5 Pro was going to be roughly the same cost as Gemini 1.0 Pro’s original price because they announced that Gemini 1.0 Pro was getting a price drop at the same time that Gemini 1.5 Pro was announced. No, it’s more expensive. Not a little more expensive. A lot more expensive. A whopping 14x more expensive compared to Gemini 1.0 Pro.
And for some reason it’s limited to 2 requests per minute on the free plan and 5 on the paid plan. Whose idea was that? Does Google just not have enough servers available? You should be able to scale these things pretty easily. It’s not like a database where you have to keep things in sync.
And not just that, the normal Gemini 1.0 Pro plans got nerfed too. The free plan used to give you 60 requests per minute. Now it’s only 15. That’s not so bad on its own, but there are also a bunch of other limits. Like the 32,000 tokens per minute limit. Which sounds like a lot until you realize that Gemini Pro can use 32,768 tokens in a single request: 30,720 input tokens and 2,048 output.
“Who would possibly use 32,000 tokens in a single request?” you say. I would! When summarizing YouTube videos in my app Stratum (iOS, Android), because some YouTube videos have transcripts that exceed 32,000 tokens.
Now maybe I shouldn’t be too mad, because this is the free tier of the service. You can still pay and have these restrictions lifted. Not by that much though. You’re still restricted to 120,000 tokens per minute, only about 4x more. And it’s still not clear what happens to your first few requests per minute. I thought they would be free because they overlap with the free plan, but maybe not.
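To put numbers on that, here is a rough Python sketch of the token math. The figures are the ones quoted above. The helper names are made up for illustration, and it assumes that both input and output tokens count against the per-minute budget, which I have not confirmed.

```python
# Figures quoted in the post. Assumption: input AND output tokens both
# count against the tokens-per-minute budget (not confirmed by Google).
MAX_INPUT_TOKENS = 30_720
MAX_OUTPUT_TOKENS = 2_048
FREE_TOKENS_PER_MINUTE = 32_000
PAID_TOKENS_PER_MINUTE = 120_000


def max_request_tokens() -> int:
    """Largest possible single request: full input plus full output."""
    return MAX_INPUT_TOKENS + MAX_OUTPUT_TOKENS


def fits_in_minute(request_tokens: int, tokens_per_minute: int) -> bool:
    """True if one request of this size stays within the per-minute budget."""
    return request_tokens <= tokens_per_minute


def full_requests_per_minute(tokens_per_minute: int) -> int:
    """How many maximum-size requests fit into one minute's token budget."""
    return tokens_per_minute // max_request_tokens()


if __name__ == "__main__":
    biggest = max_request_tokens()
    print("Largest single request:", biggest, "tokens")
    print("Fits in free tier minute:", fits_in_minute(biggest, FREE_TOKENS_PER_MINUTE))
    print("Max-size requests per paid minute:", full_requests_per_minute(PAID_TOKENS_PER_MINUTE))
    print("Paid vs free budget:", PAID_TOKENS_PER_MINUTE / FREE_TOKENS_PER_MINUTE, "x")
```

Under those assumptions, a single maximum-size request does not fit into a free-tier minute at all, and the paid tier only has room for three of them.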
But perhaps the biggest reason I’m disappointed by this news is Claude 3. Now I’ve been skeptical of Claude 3 ever since it was announced, because Anthropic is not a huge name in the AI sphere. Not as big as Google or OpenAI. And they published this chart.
Which is definitely fishy. It shows that Claude 3 Opus beats every other model in every single test. This triggered alarm bells immediately. Like why are some of the cells blank? And many of the tests were not done under identical conditions. ‘Grade School Math’ was apparently done using 0-shot CoT for Claude 3, 5-shot CoT for GPT-4, 5-shot for GPT-3.5, and Maj1@32 for Gemini.
CoT refers to chain-of-thought reasoning, which asks the model to explain its thinking process. 5-shot means 5 examples were given to the AI. 0-shot means 0 examples. Maj1@32 appears to mean majority voting: the model answers the same question 32 times and the most common answer is the one that gets graded.
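For the curious, Maj1@32 is usually read as majority voting over 32 sampled answers. Here is a minimal Python sketch of that idea. The function name `majority_vote` and the sample answers are made up for illustration; this is not anyone’s actual evaluation code.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most common answer among the sampled answers.

    On a tie, Counter.most_common keeps the answer that appeared first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical: 32 samples for one math question, 20 say "42", 12 say "41".
    samples = ["42"] * 20 + ["41"] * 12
    print(majority_vote(samples))
```

The point is that a model graded this way gets 32 tries per question, so its score is not directly comparable to a model that gets one 0-shot attempt.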
So it looks like Anthropic stitched together a bunch of data from different sources. And they didn’t even try to make the data consistent. Very suspicious.
And in fact, if you scroll down you can see this chart which shows Claude 3 not being the best in certain tests.
Now this is a much less biased chart. Oddly enough, it does not even show Claude 3 Opus winning at anything. In fact, you could argue that Gemini 1.0 Ultra beats out Claude 3 in these tests, because Claude 3’s wins are divided between Opus, Sonnet, and Haiku.
But as time has gone on some more news has trickled out. Most importantly the LLM leaderboard.
