Tristan Wolff

Summary

Mistral AI has released an open-source language model named Mixtral-8x7B-32kseqlen, which uses the "Mixture of Experts" architecture that OpenAI's GPT-4 reportedly relies on, potentially revolutionizing the open-source AI community.

Abstract

Mistral AI, a startup, has made headlines by releasing an open-source language model, Mixtral-8x7B-32kseqlen, built on the kind of architecture reportedly used by OpenAI's GPT-4. This model, which can be downloaded from a link on X, has a context size of 32k tokens and employs a "Mixture of Experts" architecture with 8 experts built on a 7-billion-parameter base ("8x7B"). The release of Mixtral has sparked excitement within the developer community for its advanced architecture and its potential to democratize access to cutting-edge AI technology, much like Stable Diffusion did for AI image processing. Unlike the fanfare surrounding Google's announcement of its "Gemini" model, Mistral AI's low-key approach targets practical AI users and developers, signaling a shift in how AI advancements are shared and used.

Opinions

  • The author suggests that Mistral AI's release of Mixtral could be more impactful than Google's highly publicized "Gemini" model, emphasizing substance over spectacle.
  • There is a subtle critique of Google's approach to announcing their AI model, with the author implying that the details and capabilities of Google's "Gemini" are less clear and possibly exaggerated.
  • The open-source nature of Mixtral is celebrated as it allows for widespread innovation and development in the AI field, potentially leading to a significant shift ("a quiet revolution") in AI accessibility.
  • The "Mixture of Experts" approach is highlighted as a key factor in the success of GPT-4 and now Mixtral, indicating its importance in the future of AI model training.
  • The author expresses enthusiasm for the potential of Mixtral to become a game-changer for the open-source community, drawing parallels to the impact of Stable Diffusion in AI image processing.
  • The article encourages reader engagement through claps, follows, and comments, and invites readers to support the author's work by using their Medium referral link to become a member.

New LLM by Mistral AI

A Quiet Revolution? Mistral AI Releases Sensational New AI Model

GPT-4’s “secret weapon” is available as open source

Image by the author & Midjourney

At the same time that Google announced their new “Gemini” model with great fanfare, including a press tour and a spectacular (but possibly not entirely honest) demo video, a quiet revolution may have begun.

No, we’re not talking about Gemini (which isn’t even fully available yet, and where some details about the model’s capabilities remain unclear, to say the least).

https://arstechnica.com/information-technology/2023/12/google-admits-it-fudged-a-gemini-ai-demo-video-which-critics-say-misled-viewers/

Instead, we will look at the startup Mistral AI, which posted a download link to its latest language model on X.

Just the link.

No comment.

Casually dropping an open-source language model built on the same architecture that OpenAI’s flagship GPT-4 reportedly runs on.

Mistral AI’s download link: www.twitter.com

Let’s find out why the open source community is so excited about this.

An Open-Source Alternative to GPT-4?

Let’s first look at that download link that has been posted on X and the files it leads to.

I mean, just reading the file name of that model must have been a true delight to developers: Mixtral-8x7B-32kseqlen

Nope, not the name of Elon Musk’s next child, but actually a preview of the new language model’s capabilities.

And these are impressive:

  • the context size is 32k tokens (on par with the 32k-token version of GPT-4)
  • the model’s architecture is the one GPT-4 reportedly uses: the so-called “Mixture of Experts,” where several highly specialized sub-models (“experts”) are combined. In the case of Mixtral, there are 8 experts built on a 7-billion-parameter base, hence “8x7B.” Because the experts share the attention layers, the total is roughly 47 billion parameters rather than 56 billion.
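The name “8x7B” invites the back-of-the-envelope sum 8 × 7 = 56 billion parameters. In a Mixtral-style model, however, only the feed-forward block of each layer is duplicated per expert, while attention and embeddings are shared. The sketch below counts parameters under that assumption. The configuration values are our assumption about Mixtral’s published config and are not stated in this article: hidden size 4096, 32 layers, expert feed-forward size 14336, grouped-query attention with 8 key/value heads, and a 32,000-token vocabulary.

```python
# Rough parameter count for a Mixtral-style sparse Mixture-of-Experts transformer.
# ASSUMPTION: the config values below are our reading of Mixtral-8x7B's published
# configuration; they are not given in the article itself.
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    dim: int = 4096            # hidden size
    n_layers: int = 32
    n_heads: int = 32
    n_kv_heads: int = 8        # grouped-query attention: fewer key/value heads
    ffn_dim: int = 14336       # inner size of each expert's feed-forward block
    vocab: int = 32000
    n_experts: int = 8
    experts_per_token: int = 2


def count_params(cfg: MoEConfig) -> tuple[int, int]:
    """Return (total, active_per_token) parameter counts, assuming no bias terms."""
    head_dim = cfg.dim // cfg.n_heads
    # Query and output projections are dim x dim; key/value are smaller under GQA.
    attn = 2 * cfg.dim * cfg.dim + 2 * cfg.dim * cfg.n_kv_heads * head_dim
    expert = 3 * cfg.dim * cfg.ffn_dim      # SwiGLU feed-forward: three weight matrices
    router = cfg.dim * cfg.n_experts         # gating network: one logit per expert
    norms = 2 * cfg.dim                      # two RMSNorm weight vectors per layer
    shared = attn + router + norms           # used by every token in every layer
    embed = 2 * cfg.vocab * cfg.dim + cfg.dim  # input embedding, output head, final norm

    total = cfg.n_layers * (shared + cfg.n_experts * expert) + embed
    active = cfg.n_layers * (shared + cfg.experts_per_token * expert) + embed
    return total, active


total, active = count_params(MoEConfig())
print(f"naive 8 x 7B : {8 * 7e9 / 1e9:.0f}B")
print(f"total        : {total / 1e9:.1f}B")
print(f"per token    : {active / 1e9:.1f}B")
```

Under these assumptions, the total comes to about 46.7B parameters. Only about 12.9B of them are used for any single token, because each token passes through just 2 of the 8 experts in every layer.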

Why Is “Mixture of Experts” So Important?

The concept dates back to the early 1990s and is reported to underpin the success of GPT-4: “Mixture of Experts” (MoE) is a machine learning technique in which, instead of a single model learning everything, a combination of specialized sub-models is used.

Imagine it as a team of experts working together to solve a complex problem.

To coordinate this group of experts efficiently, an additional component called a gating network comes into play. It can be thought of as a team leader assigning tasks to the experts.

https://machinelearningmastery.com/mixture-of-experts/

Interestingly, the gating network does not have to rely on a single expert but can combine the insights of several experts, with multiple nuanced viewpoints contributing to solving a problem.
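The team-leader picture can be made concrete with a toy example. The sketch below follows the original “dense” form of the idea:

- a linear gating network scores every expert;
- a softmax turns the scores into weights that sum to 1;
- the output is the weighted sum of all the experts’ answers.

The three experts and the gate weights are hypothetical, invented purely for illustration. In a real MoE layer, both the experts and the gate are learned neural networks.

```python
# Toy "dense" mixture of experts: a gating network weights every expert's output.
# The experts and gate weights below are hypothetical, for illustration only.
import math
from typing import Callable, Sequence

Vector = list[float]


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def dense_moe(x: Vector,
              experts: Sequence[Callable[[Vector], Vector]],
              gate_weights: Sequence[Vector]) -> tuple[Vector, Vector]:
    """Combine all experts, weighted by a linear gate followed by a softmax.

    Assumes every expert returns a vector of the same length as x.
    """
    gates = softmax([dot(w, x) for w in gate_weights])  # one weight per expert
    outputs = [expert(x) for expert in experts]
    combined = [sum(g * out[i] for g, out in zip(gates, outputs))
                for i in range(len(x))]
    return combined, gates


# Hypothetical experts: each applies a different simple transformation.
experts = [
    lambda v: [2 * t for t in v],   # expert 0 doubles its input
    lambda v: [t + 1 for t in v],   # expert 1 shifts it by one
    lambda v: [-t for t in v],      # expert 2 negates it
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

y, gates = dense_moe([1.0, 0.5], experts, gate_weights)
print("gate weights:", [round(g, 3) for g in gates])
print("output:", [round(t, 3) for t in y])
```

Here every expert contributes to every answer, just with different weights. This is the “several nuanced viewpoints” behaviour described above.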

In the case of Mixtral, the gating network decides which experts contribute to each prediction. Notably, the metadata released with Mixtral shows that the model consults 2 of its 8 experts for every prediction: each individual token is computed by combining the outputs of two specialized sub-models.
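Consulting only 2 of 8 experts is what makes the model “sparse.” The sketch below shows that top-2 routing for a single token:

- the router scores all 8 experts;
- only the two highest-scoring experts are actually run;
- their outputs are mixed using a softmax taken over just those two scores.

That last renormalization step is our assumption based on how Mistral AI later described Mixtral, not something stated in this article. The experts and router scores are made up for illustration.

```python
# Sketch of sparse top-k routing for one token (k = 2 of 8 experts).
# ASSUMPTION: weights are a softmax over the k selected router scores only.
# Experts and router logits below are hypothetical.
import math
from typing import Callable, Sequence

Vector = list[float]


def top_k_route(router_logits: Sequence[float], k: int = 2) -> list[tuple[int, float]]:
    """Return (expert_index, weight) pairs for the k best experts; weights sum to 1."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def sparse_moe_token(x: Vector, router_logits: Sequence[float],
                     experts: Sequence[Callable[[Vector], Vector]], k: int = 2) -> Vector:
    """Run only the k selected experts and return their weighted combination."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)                  # unselected experts do no work at all
        out = [o + weight * t for o, t in zip(out, y)]
    return out


# Hypothetical toy experts: expert i scales its input by (i + 1).
experts = [lambda v, s=i + 1: [s * t for t in v] for i in range(8)]
logits = [0.1, 2.0, -1.0, 0.3, 1.5, -0.2, 0.0, 0.9]   # made-up router scores

print(top_k_route(logits))                   # experts 1 and 4 are selected
print(sparse_moe_token([1.0, -1.0], logits, experts))
```

The design choice is a trade-off. All 8 experts’ parameters must be stored, but each token only pays the compute cost of 2 experts.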

Mixtral thus uses one of the most advanced model architectures available and could become a game-changer for the open-source community. We saw something similar with the release of the AI image model Stable Diffusion, which allowed countless developers worldwide to build new AI models and workflows that are now standard repertoire in generative AI for image processing.

How To Use Mixtral 8x7B?

At the time of writing, there’s only one platform offering free testing of Mixtral: Poe.com.

(This section will be updated as soon as more inference endpoints become available.)

A Quiet Revolution?

In contrast to Google’s glitzy show, Mistral AI’s release strategy seems to specifically target those who actually work with AI: developers looking for a publicly accessible and extremely powerful AI model to adapt to their field of work.

And with MoE entering the open-source space, Mistral may indeed have started a quiet revolution, not just for developers but for anybody who is experimenting with AI and looking for new creative possibilities.

Please, if you liked the article, be so kind and leave some claps, follow me and feel free to comment with questions or suggestions. ❤️ 🙏

➡️ If you want to support my work, become a Medium member using my referral link and get full access to all my articles (180+ and growing) and those of thousands of other writers. 🙏


Artificial Intelligence
Open Source
Programming
Technology
Creativity