Summary

A new open-source model named MiQu has emerged on HuggingFace, sparking excitement and speculation within the AI community because its performance reportedly rivals GPT-4 and because of its apparent connection to the successful AI start-up Mistral.

Abstract

The AI community is abuzz with the sudden appearance of MiQu, a 70B model on HuggingFace, which has shown performance comparable to GPT-4 on EQ-Bench. This mysterious model has prompted discussions and comparisons with Mistral, a prominent European AI start-up known for its open-source contributions and significant funding. Mistral's previous models, such as Mistral 7B and Mixtral 8x7B, have been well-received, and the community is intrigued by the possibility that MiQu could be a quantized version of a Mistral model. The model's unexpected release, apparently by an employee of an early-access customer of Mistral, has raised questions about the intent behind its availability on HuggingFace. The situation is particularly noteworthy as it underscores the potential of open-source models to compete with proprietary giants like OpenAI and Google, offering state-of-the-art capabilities for free and challenging the status quo of AI development.

Opinions

The AI community is excited and intrigued by the performance of the MiQu model, considering it a potential rival to GPT-4.
Some speculate that MiQu could be a quantized version of a Mistral model, possibly released by an over-enthusiastic employee or in an unconventional manner, as seen with previous Mistral releases.
The release of MiQu has significant implications for the AI industry, demonstrating that open-source models can achieve state-of-the-art results and could potentially disrupt the dominance of large proprietary models.
The incident highlights the importance of open-source contributions in democratizing access to cutting-edge AI technologies.
The community is divided on whether the release was intentional or accidental, but the fact that the model remains on HuggingFace, with no immediate action taken to remove it, suggests a possible tacit endorsement of its availability.

| ARTIFICIAL INTELLIGENCE | LLMs | AI |

MiQu: Can a mysterious model be a GPT-4 rival?

An open-source model seems to perform on par with GPT-4, but we do not know much about it

A new model appears on HuggingFace, and users immediately notice something strange. The mystery inflames the data science community until an official statement arrives. What is MiQu? Who built it? And why so much excitement?

In short, we discuss it here.

The mysterious model

screenshot by the author. image source: here

The open-source community has been shaken by a small earthquake, and this model is its epicenter. Who posted it? More importantly, what model is it?

A 70B model appeared on HuggingFace and immediately caught the eye. Why? Some users noticed that its prompt format is the same as the one used by Mistral’s models.

To recap: in recent months, Mistral has emerged as the leading open-source alternative to OpenAI. For the few who don’t know, Mistral is a Paris-based start-up founded by a few researchers who came out of large companies. Within a few months it became a unicorn, and in December it became one of the best-funded European start-ups, with nearly $500 million raised (at a valuation of nearly $2B to date).

Mistral AI continues its mission to deliver the best open models to the developer community. Moving forward in AI requires taking new technological turns beyond reusing well-known architectures and training paradigms. Most importantly, it requires making the community benefit from original models to foster new inventions and usages. (source)

From what we read on their website, Mistral is intent on revolutionizing AI by releasing models in open source. So far, it has released two models that have been hugely successful:

Mistral 7B, a large language model that performs on par with LLaMA 13B.
Mixtral 8x7B, a high-quality sparse mixture-of-experts (SMoE) model that manages to compete with the largest model in the LLaMA family.

Returning to MiQu, it became apparent that something was out of the ordinary because a link to the model was posted on 4chan. As some will remember, when LLaMA was published, one could only access the model weights after filling out a form on the Meta site, until someone leaked the weights on 4chan.

Ok, weird, but why so much interest?

Because it appears that the model is not just another LLM, but one capable of rivaling GPT-4 on EQ-Bench. Clearly, this is a bombshell.

The mystery deepens because this model has results similar to those of a model called Mistral Medium, available on Perplexity. According to some, the name might stand for MIstral QUantized (or MiQu to its friends).

So, according to some, this is a Mistral model that has been quantized. Quantization is a technique that stores a model’s weights at lower numerical precision (for example, 8-bit or 4-bit integers instead of 16-bit floats), which reduces the model’s size and memory footprint at the cost of some accuracy.
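To make the trade-off concrete, here is a minimal sketch of one simple scheme, symmetric 8-bit ("absmax") quantization, written in plain Python. This is an illustration only: the article does not say which quantization scheme was used for MiQu, and the function names and toy weights below are made up for the example.

```python
def quantize_int8(weights):
    """Symmetric (absmax) quantization of a list of floats to 8-bit integers."""
    max_abs = max(abs(w) for w in weights)
    # One float scale is stored per group of weights; guard against all zeros.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


# Toy weights (hypothetical values, not taken from any real model).
weights = [0.82, -1.27, 0.003, 0.4501, -0.0999]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding step is where accuracy is lost: each weight can be off by
# up to half a quantization step (scale / 2).
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", q)
print("scale:", scale)
print("max absolute error:", max_error)
```

Each weight now fits in one byte instead of two (for 16-bit floats), plus one shared scale, which is roughly where the memory saving comes from. Schemes used in practice are more elaborate (smaller groups, 4-bit or lower formats), but the principle is the same.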

The question remains whether it was Mistral who secretly released it or someone in the company who decided to release it without consent.

In the end, the CEO of Mistral confirmed that the leak came from an employee of one of its customers:

An over-enthusiastic employee of one of our early access customers leaked a quantised (and watermarked) version of an old model we trained and distributed quite openly.

He also seems to have taken it well, judging by another comment he made.

Parting Thoughts

We don’t know whether it was intentional or not, but the model is still on HuggingFace and the company does not seem intent on removing it. On the other hand, Mistral is no stranger to releasing models in unusual ways: its previous model, Mixtral 8x7B, was released via torrent.

Whether intended or not, this model looks extremely promising. We don’t know whether an unquantized or fine-tuned version would be able to beat GPT-4, but if it did, it would be a first for open source (and certainly a problem for companies like Google or OpenAI). Why?

Well, because a model that reaches the state of the art would be available to everyone and for free. This also has a more political significance: the open-source community can obtain models that catch up with the best proprietary models.

What do you think about it? Let me know in the comments!

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

GitHub — SalvatoreRa/tutorial: Tutorials on machine learning, artificial intelligence, data science…
