Mixtral MOE 8x7b — A new open-source giant

Where can you try it out?

The model is available in two versions: Mixtral 8x7B v0.1 and Mixtral 8x7B Instruct v0.1.

Here are the original versions from Mistral AI on Hugging Face:

mistralai/Mixtral-8x7B-v0.1: https://huggingface.co/mistralai/Mixtral-8x7B-v0.1

mistralai/Mixtral-8x7B-Instruct-v0.1: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1

But it is a very big model. For those who may not have access to very powerful GPUs, here are some other options:

Quantized

TheBloke uploaded quantized versions of the model in the GPTQ, AWQ and GGUF formats to his Hugging Face page. The GGUF versions, which can run on CPU, need about 18.14 GB of RAM for the smallest variant, 28.94 GB for a medium one and 49.62 GB for the largest.

TheBloke/Mixtral-8x7B-v0.1-GGUF: https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF

TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF: https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF

This Hugging Face user has also released a chat-tuned version of the model:

mattshumer/mistral-8x7b-chat: https://huggingface.co/mattshumer/mistral-8x7b-chat

If even the quantized versions are too big for you, there are a few other options.

Mistral AI API

Mistral AI offers versions of the model through its API, and you can request access.

Perplexity Labs

Perplexity has made the model available for testing in its Perplexity Labs chatbot: https://labs.perplexity.ai

Vercel

The model is also available for testing in the Vercel AI Playground: https://sdk.vercel.ai

For anyone interested, I am creating a Generative AI Basic Course that teaches how to use various online tools for all kinds of generation: text, image, video and audio. The course is still in development, but the full curriculum is already published. The text, image and video generation modules are complete, and audio will follow soon.

For those who join at this early stage, I am giving a 35% discount with the coupon EARLIERADOPTERS. Follow the link to access it:

From Zero to Generation: Basic Introduction to Generative AI - Danielle Schmitt França | Hotmart
https://hotmart.com/en/marketplace/products/from-zero-to-generation-practical-introduction-to-generative-ai/I87440732O

The French company Mistral AI really isn’t falling behind the American ones. Its model Mistral 7B is considered one of the best open-source models released, surpassing even bigger models. Now the company has launched one of the first successful implementations of the MoE (Mixture of Experts) architecture in an open-source model.

Author’s note: For those not familiar with this architecture, a leak earlier this year suggested that GPT-4 uses it too, and the results we see from this model suggest the rumor may be true. To learn more, you can read my earlier post, “GPT4- All Details Leaked”.

This weekend, Mistral AI released the model as a torrent that anyone could download, causing a commotion in the community. Today, various fine-tuned versions by Hugging Face users are already available to try out.

But first let’s take a step back:

What is the MOE Architecture?

The “Mixture of Experts” (MoE) architecture in machine learning is like having a team of specialists where each member is good at solving a specific type of problem. Imagine you have a big, complex problem to solve. Instead of asking one person (or one model) to do everything, you divide the problem into smaller parts and assign each part to a different expert who is really good at that specific thing.

So, instead of one big model doing all the work, we have a set of smaller specialist networks (the experts) and a gating network (the router) that decides which experts to use for each input token.
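To make the "gateway" idea concrete, here is a minimal plain-Python sketch of the simplest routing scheme. A gate scores every expert and sends the input to the single best one (top-1 routing). The toy experts and gate weights are hypothetical values chosen for illustration. Real MoE layers use learned weight matrices and usually also scale the chosen expert's output by its gate probability.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def route_top1(x: Vector, gate_weights: List[Vector]) -> int:
    """Score every expert with a linear gate and return the index of the best one."""
    scores = [dot(w, x) for w in gate_weights]
    return max(range(len(scores)), key=lambda i: scores[i])


def moe_top1(x: Vector, gate_weights: List[Vector], experts: List[Expert]) -> Tuple[int, Vector]:
    """Run only the expert chosen by the gate; the other experts do no work."""
    chosen = route_top1(x, gate_weights)
    return chosen, experts[chosen](x)


# Hypothetical toy setup: three "specialists" and one gate row per expert.
GATE = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
EXPERTS: List[Expert] = [
    lambda v: [2.0 * a for a in v],  # expert 0: doubles its input
    lambda v: [-a for a in v],       # expert 1: negates its input
    lambda v: [a + 1.0 for a in v],  # expert 2: adds one
]

print(moe_top1([3.0, 1.0], GATE, EXPERTS))    # (0, [6.0, 2.0])
print(moe_top1([-2.0, -2.0], GATE, EXPERTS))  # (2, [-1.0, -1.0])
```

Because only the chosen expert runs, adding more experts increases the model's capacity without increasing the work done per input.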

A few advantages of this approach are:

- Faster inference. Only a few experts run for each token, so each token costs far less compute than in a dense model with the same total number of parameters.
- More efficient and effective problem-solving.
- Easy scaling by adding more experts.
- Efficient dynamic allocation of resources.

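Some disadvantages are the added complexity of the implementation, greater difficulty in training, and high memory requirements, since all the experts must be loaded even though only a few are used for each token.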
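One reason training is harder is that the router can collapse onto a few favorite experts, leaving the rest undertrained. A common remedy is an auxiliary load-balancing loss, N * sum_i f_i * P_i. Here f_i is the fraction of tokens whose top-1 choice is expert i, and P_i is the mean router probability for expert i. This formulation comes from the Switch Transformer paper and is not necessarily what Mistral AI used. The sketch below computes the loss in plain Python. The example router logits are hypothetical, and in practice the loss is multiplied by a small coefficient and added to the main training loss.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits: List[List[float]]) -> float:
    """Switch-Transformer-style auxiliary loss: N * sum_i f_i * P_i.

    router_logits holds one row of N expert scores per token. The loss is 1.0
    when tokens and probability mass are spread evenly, and approaches N when
    routing collapses onto a single expert.
    """
    num_tokens = len(router_logits)
    n = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # f_i: fraction of tokens whose top-1 choice is expert i.
    counts = [0] * n
    for p in probs:
        counts[max(range(n), key=lambda i: p[i])] += 1
    f = [c / num_tokens for c in counts]

    # P_i: average probability the router assigns to expert i.
    mean_p = [sum(p[i] for p in probs) / num_tokens for i in range(n)]

    return n * sum(fi * pi for fi, pi in zip(f, mean_p))


# Hypothetical logits for 4 tokens and 4 experts.
balanced = [[10.0 if i == t else 0.0 for i in range(4)] for t in range(4)]
collapsed = [[10.0, 0.0, 0.0, 0.0] for _ in range(4)]
print(round(load_balancing_loss(balanced), 3))   # 1.0
print(round(load_balancing_loss(collapsed), 3))  # 3.999
```

Minimizing this term nudges the router toward spreading tokens across all experts.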

How was Mixtral MOE implemented?

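As its name indicates, Mixtral 8x7B has 8 experts built on a Mistral 7B-sized architecture. It is a decoder-only model (as is most common for LLMs) in which each feedforward block is replaced by 8 experts. At every layer, for every token, a router network (the gate) chooses two of those experts and combines their outputs, weighted by the router’s scores. Only the feedforward blocks are replicated; all tokens share the same attention layers.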
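The routing just described can be sketched in plain Python. The sketch follows the gating formula in Mistral AI's Mixtral paper, y = sum_i Softmax(Top2(x W_g))_i * SwiGLU_i(x). The router scores all experts, keeps the two best, renormalizes those two scores with a softmax and sums the two experts' outputs using those weights. Each expert is a SwiGLU feedforward block, w2(silu(w1 x) * (w3 x)). The 2-dimensional vectors, the 4 experts (Mixtral uses 8) and all weights are hypothetical toy values, and the code favors readability over speed.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output dimension
ExpertWeights = Tuple[Matrix, Matrix, Matrix]  # (w1, w2, w3)


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def silu(z: float) -> float:
    """SiLU (swish) activation, z * sigmoid(z), written to avoid overflow."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def swiglu_expert(x: Vector, weights: ExpertWeights) -> Vector:
    """SwiGLU feedforward block: w2(silu(w1 x) * (w3 x))."""
    w1, w2, w3 = weights
    hidden = [silu(a) * b for a, b in zip(matvec(w1, x), matvec(w3, x))]
    return matvec(w2, hidden)


def sparse_moe(x: Vector, router: Matrix, experts: List[ExpertWeights], k: int = 2) -> Tuple[List[int], Vector]:
    """Top-k MoE layer: only the k highest-scoring experts run for this token."""
    logits = matvec(router, x)  # one logit per expert
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

    # Softmax over the selected logits only, i.e. Softmax(TopK(x W_g)).
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())

    out = [0.0] * len(x)
    for i in top:
        y = swiglu_expert(x, experts[i])
        out = [o + (exps[i] / total) * yi for o, yi in zip(out, y)]
    return top, out


def toy_expert(scale: float) -> ExpertWeights:
    """Hypothetical 2x2 expert whose three matrices are scaled identities."""
    eye = [[scale, 0.0], [0.0, scale]]
    return (eye, eye, eye)


ROUTER = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
EXPERTS = [toy_expert(float(i + 1)) for i in range(4)]

chosen, y = sparse_moe([1.0, 2.0], ROUTER, EXPERTS)
print(chosen)  # [2, 1]
```

Experts that are not selected are never evaluated. That is why the per-token cost depends on k rather than on the total number of experts.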

Mixtral has 46.7B total parameters but uses only 12.9B parameters per token. It therefore processes input and generates output at roughly the same speed and cost as a 12.9B model.
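The 46.7B and 12.9B figures can be checked with a little arithmetic. The sketch below assumes the hyperparameters listed in the mistralai/Mixtral-8x7B-v0.1 config.json on Hugging Face:

- hidden size 4096
- expert hidden size 14336
- 32 layers
- 32 query heads and 8 key/value heads
- a 32000-token vocabulary
- 8 experts, with 2 used per token

It counts the full embedding tables as "active", which is a bookkeeping convention rather than an official definition.

```python
# Hyperparameters assumed from mistralai/Mixtral-8x7B-v0.1's config.json.
HIDDEN = 4096        # model (embedding) dimension
FFN_HIDDEN = 14336   # hidden size inside each expert
LAYERS = 32
N_HEADS = 32         # query heads
N_KV_HEADS = 8       # key/value heads (grouped-query attention)
VOCAB = 32000
N_EXPERTS = 8
TOP_K = 2            # experts used per token


def count_params(experts_used: int) -> int:
    """Count parameters, including `experts_used` experts in every layer."""
    head_dim = HIDDEN // N_HEADS                            # 128
    kv_dim = N_KV_HEADS * head_dim                          # 1024
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim   # q, o and k, v projections
    expert = 3 * HIDDEN * FFN_HIDDEN                        # SwiGLU: w1, w2, w3
    router = HIDDEN * N_EXPERTS                             # the gate
    norms = 2 * HIDDEN                                      # two RMSNorm weight vectors
    per_layer = attention + experts_used * expert + router + norms
    embeddings = 2 * VOCAB * HIDDEN                         # input embeddings + untied output head
    final_norm = HIDDEN
    return LAYERS * per_layer + embeddings + final_norm


print(f"total  = {count_params(N_EXPERTS) / 1e9:.1f}B")  # total  = 46.7B
print(f"active = {count_params(TOP_K) / 1e9:.1f}B")      # active = 12.9B
print(f"1 expert = {count_params(1) / 1e9:.1f}B")        # 1 expert = 7.2B
```

With a single expert, the count comes out at about 7.2B, essentially a Mistral 7B plus the small router. This shows why "8x7B" does not mean 8 × 7B = 56B parameters: only the expert blocks are multiplied.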

It has a context window of 32K tokens, understands English, French, Italian, German and Spanish, and shows strong performance in code generation.
