Ding Dong, The King Is Dead

Friendship With Gemini Is Over, Claude 3 Is My New Best Friend

Yesterday was April Fools’ Day. This year we didn’t get many April Fools’ pranks, especially from Google. In other years they had tons of gags. This year… Google didn’t do much. But I did get an email from them.

It explains that Google Gemini’s pay-as-you-go tier is almost ready. It will be ready on May 2nd. So let’s take a look.

What the literal… Surely this must be an April Fools joke, right? Well, it’s April second now so… no, not an April Fools joke.

So the reason why I’m so upset about this… Well, maybe upset is not the right word for it. I can still use Gemini 1.0 Pro for the same cost as GPT-3.5, but I’m disappointed. I’m disappointed because I thought Gemini’s Pro models would all be pretty cheap. I thought that we’d get Gemini Ultra and that would be the GPT-4 competitor. But it looks like Gemini 1.5 Pro is the GPT-4 competitor instead.

I thought that Gemini 1.5 Pro was going to be roughly the same cost as Gemini 1.0 Pro’s original price because they announced that Gemini 1.0 Pro was getting a price drop at the same time that Gemini 1.5 Pro was announced. No, it’s more expensive. Not a little more expensive. A lot more expensive. A whopping 14x more expensive compared to Gemini 1.0 Pro.

And for some reason it’s limited to 2 requests per minute on the free plan and 5 on the paid plan. Whose idea was that? Does Google just not have enough servers available? You should be able to scale these things pretty easily. It’s not like a database where you have to keep things in sync.

And not just that, the normal Gemini 1.0 Pro plans got nerfed too. The free plan used to give you 60 requests per minute. Now it’s only 15. That’s not that bad, but there are also a bunch of other limits. Like the 32,000 tokens per minute limit. Which sounds like a lot until you realize that Gemini Pro can use 32,768 tokens in a single request: 30,720 input tokens and 2,048 output.
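To put numbers on that, here is a small Python sketch. It assumes the per-minute cap counts input and output tokens together, which the email doesn’t spell out, and the function name is just mine.

```python
# Limits as quoted in this post; Google may change them.
FREE_TOKENS_PER_MINUTE = 32_000
MAX_INPUT_TOKENS = 30_720
MAX_OUTPUT_TOKENS = 2_048


def fits_in_minute_budget(input_tokens: int, output_tokens: int,
                          tokens_per_minute: int = FREE_TOKENS_PER_MINUTE) -> bool:
    """Return True if one request stays within the per-minute token cap.

    Assumption: the cap is applied to input + output tokens combined.
    """
    return input_tokens + output_tokens <= tokens_per_minute


# The largest request Gemini 1.0 Pro accepts.
largest_request = MAX_INPUT_TOKENS + MAX_OUTPUT_TOKENS
print(largest_request, fits_in_minute_budget(MAX_INPUT_TOKENS, MAX_OUTPUT_TOKENS))
```

Under that assumption, a single maximum-size request is already over the free tier’s budget for the whole minute.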

“Who would possibly use 32,000 tokens in a single request?” you say. I would! I do it when summarizing YouTube videos in my app Stratum (iOS, Android), because some YouTube videos have transcripts that exceed 32,000 tokens.

Now maybe I shouldn’t be too mad. This is just the free tier of the service, after all. You can still pay and have these restrictions loosened. Not by that much though. You’re still restricted to 120,000 tokens per minute, only about 4x more. And it’s still not clear what happens to your first few requests per minute. I thought they would be free because they overlap with the free plan, but maybe not.

But perhaps the biggest reason I’m disappointed by this news is Claude 3. Now, I’ve been skeptical of Claude 3 ever since it was announced. Anthropic is not a huge name in the AI sphere. Not as big as Google or OpenAI. And they published this chart.

Which is definitely fishy. It shows that Claude 3 Opus beats every other model in every single test. This triggered alarm bells immediately. Like why are some of the cells blank? And many of the tests were not done under identical conditions. ‘Grade School Math’ was apparently done using 0-shot CoT for Claude 3, 5-shot CoT for GPT-4, 5-shot for GPT-3.5, and Maj1@32 for Gemini.

CoT refers to chain-of-thought reasoning, which asks the model to explain its thinking process. 5-shot means 5 examples were given to the AI. 0-shot means 0 examples. Maj1@32 appears to mean majority voting: the model answers the same question 32 times and the most common answer is the one that counts.
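If Maj1@32 is what it looks like, majority voting over 32 sampled answers, it works roughly like the Python sketch below. The `ask_model` callable and the fake model are hypothetical stand-ins, not any real API, and this is my reading of the label rather than anything Anthropic or Google published as code.

```python
import itertools
from collections import Counter
from typing import Callable


def majority_vote(ask_model: Callable[[str], str], question: str,
                  samples: int = 32) -> str:
    """Ask the same question `samples` times and return the most common answer."""
    answers = [ask_model(question) for _ in range(samples)]
    return Counter(answers).most_common(1)[0][0]


# Hypothetical model that answers correctly two times out of three.
_canned_answers = itertools.cycle(["42", "42", "41"])


def fake_model(question: str) -> str:
    return next(_canned_answers)


print(majority_vote(fake_model, "What is 6 * 7?"))
```

The point is that a model that is only right most of the time can still score as right on the question, which is why this setup isn’t comparable to a single 0-shot attempt.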

So it looks like Anthropic stitched together a bunch of data from different sources. And they didn’t even try to make the data consistent. Very suspicious.

And in fact, if you scroll down you can see this chart which shows Claude 3 not being the best in certain tests.

Now this is a much less biased chart. Oddly enough, it doesn’t even show Claude 3 Opus winning at anything. You could even argue that Gemini 1.0 Ultra beats out Claude 3 in these tests because Claude 3’s wins are divided between Opus, Sonnet, and Haiku.

But as time has gone on some more news has trickled out. Most importantly the LLM leaderboard.

This is a test where users get responses from 2 random AIs and they’re asked which AI gives the better result.
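I don’t know exactly how the leaderboard turns those votes into a ranking, but one common way to do it is an Elo-style rating like the one used in chess. Here is a minimal Python sketch under that assumption. The model names, starting ratings, and K-factor are all made up for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def record_vote(ratings: dict[str, float], winner: str, loser: str,
                k: float = 32.0) -> None:
    """Update both ratings in place after a user prefers `winner` over `loser`."""
    expected_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - expected_win)  # upsets move ratings further
    ratings[winner] += delta
    ratings[loser] -= delta


# Two hypothetical models starting level; one vote for model_a.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
record_vote(ratings, "model_a", "model_b")
print(ratings)
```

Each vote only nudges the numbers, so one odd result doesn’t move a model much. It takes a lot of votes pointing the same way.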

Now I’ve always been skeptical of tests like this because it’s so subjective and different models do better at different things. I took the test once and I think I ended up saying that Llama 7B beat out GPT-4. 👀 OK. And people are going to write nonsense ‘tests’ like ‘Write 5 sentences each ending with the word ‘apple’’. It’s easy to test. But does it have any real-world relevance? Not really.

And these tests show GPT-4 beating out Gemini 1.0 Pro, although in my apps I like the output from Gemini better. I think this is because people are biased to think a longer response is better. GPT-4 gives really long responses. But are they better? Not really. They don’t sound as natural. They sound like you’re talking to a robot, because that’s exactly what you’re doing.

Well, regardless, it does correlate with real-world performance. So it’s interesting that Claude 3 Opus has beaten out all of the GPT-4 models. And what’s more, Claude 3 Haiku and Sonnet (the two lower-cost versions) are not too far behind. Both even manage to beat out an earlier version of GPT-4 and all but one Gemini Pro model.

I’m actually most interested in Claude 3 Haiku because it’s the cheapest and the fastest. I argued before that we don’t need GPT 5, we need 4.5:

What I’d like to see is for companies to take their foot off the gas pedal for a second and optimize their models for speed. Hold back on giving us GPT 5 and instead give us GPT 4.5. Like GPT 4 but cheaper and faster. And just like how GPT 3.5 revolutionized everything with its cheap and fast replies, I think GPT 4.5 can revolutionize everything once again too.

Well, isn’t that Claude 3 Haiku? If it really is as fast as it says it is it could be that mythical GPT 4.5 I’ve been looking for. In fact I like everything about Claude 3 Haiku.

It’s also cheaper than Gemini 1.0 Pro. The output tokens aren’t that much cheaper but the input tokens are a lot cheaper, half the price. And my prompts are very very large, like feeding it a 20,000+ character YouTube transcript large. And the response is only like a few thousand characters. So what this means is Claude 3 Haiku is going to give me a significant cost reduction.
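Here is a rough Python sketch of why cheaper input matters so much for input-heavy prompts like mine. The prices are illustrative placeholders picked to match the ratios described above (input at half the price, output only a bit cheaper), and the request size is a made-up example in tokens. Check the providers’ pricing pages for real numbers.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Cost in dollars of one request, given per-million-token prices."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Hypothetical summarization request: big transcript in, short summary out.
INPUT_TOKENS = 20_000
OUTPUT_TOKENS = 1_000

# Placeholder prices in dollars per million tokens (assumptions, not quotes).
current_cost = request_cost(INPUT_TOKENS, OUTPUT_TOKENS, 0.50, 1.50)
cheaper_cost = request_cost(INPUT_TOKENS, OUTPUT_TOKENS, 0.25, 1.25)

savings = 1 - cheaper_cost / current_cost
print(f"{current_cost:.5f} -> {cheaper_cost:.5f} ({savings:.0%} cheaper)")
```

With a prompt this lopsided, almost all of the bill is input tokens, so halving the input price cuts the total nearly in half even though the output price barely moves.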

I also like the context window. 200,000 tokens. This is one of the reasons I was so excited for Gemini 1.5 Pro. As already mentioned, some YouTube videos are too long to feed Gemini 1.0 Pro. But they’re not that much longer. They’re only like 80,000 characters. So 1 million tokens is a little overkill. 200,000 tokens is just right and all Claude 3 models deliver this.

So looking at the future I fully expect to be ripping out my Gemini code and putting in calls to Claude 3 Haiku. It’ll be pretty easy too because my code no longer calls the Gemini API directly. Instead it calls a Firebase Cloud Function which then forwards the request. This allows me to do some additional authentication on the server and I can avoid exposing my API keys. I’m actually planning another post on this.
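The general shape of that setup looks something like the sketch below. To be clear, this is not my actual Cloud Function and it doesn’t use the Firebase SDK. It’s a plain Python illustration of the forwarding pattern, and the names (`handle_summarize`, `verify_user`, `call_provider`, `LLM_API_KEY`) are all hypothetical.

```python
import os
from typing import Callable


def handle_summarize(request: dict,
                     verify_user: Callable[[str], bool],
                     call_provider: Callable[[str, str], str]) -> dict:
    """Server-side handler: authenticate the caller, then forward the prompt.

    `call_provider(api_key, prompt)` is whatever talks to the AI provider.
    Switching providers means swapping this one function; the app is untouched.
    """
    if not verify_user(request.get("auth_token", "")):
        return {"status": 401, "body": "unauthorized"}

    # The API key lives only in the server's environment, never in the app.
    api_key = os.environ.get("LLM_API_KEY", "")
    summary = call_provider(api_key, request.get("prompt", ""))
    return {"status": 200, "body": summary}


# Usage with stand-in functions instead of a real auth check and provider.
response = handle_summarize(
    {"auth_token": "valid-token", "prompt": "Summarize this transcript"},
    verify_user=lambda token: token == "valid-token",
    call_provider=lambda key, prompt: "summary of: " + prompt,
)
print(response)
```

Because the app only ever talks to the handler, the provider behind it can change without shipping an app update.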

I’m really disappointed by Gemini. It looked like Google was making all the right moves. But Anthropic somehow leapfrogged the trillion-dollar company and made a cheaper and better AI.

I wonder how Google and OpenAI will respond. I highly doubt they’re just going to wait to get clobbered by Anthropic. I fully expect OpenAI to announce something. GPT-4 is actually quite an old model and GPT-4 Turbo is almost 6 months old. GPT-3.5, their budget model, is even older. And Google… Google is going to have to do something. To see Gemini Pro get beaten so badly before its official launch?

The AI industry is just moving so quickly. It’s interesting to watch. It’s hard to believe that just over a year ago I was paying way more money for responses that weren’t nearly as good. And the increased accuracy of these models and the lower prices are opening up way more opportunities. AI is going to be shaking up the app industry in a big way. And I plan to be at the forefront of that shakeup.
