Meta's LLaMA 2.0 is a disruptive, commercially available AI model that aims to reshape the chatbot and large language model (LLM) landscape through its open-source approach and collaboration with Microsoft.
Abstract
Meta has launched LLaMA 2.0, an advanced AI model developed in collaboration with Microsoft, which is not only freely available for commercial use but also demonstrates superior performance compared to its predecessor. This new model, which has been trained on a significantly larger dataset and offers a longer context window, is poised to challenge the dominance of closed-source models like GPT-4 and PaLM. While it does not surpass the performance of these models, LLaMA 2.0's open-source nature positions it as a community-driven alternative that promises to foster innovation and democratize access to cutting-edge AI technology. The model's architecture includes technical enhancements such as Grouped-Query Attention, and it has been fine-tuned to mitigate biases and toxic behaviors, although it is not entirely free from such issues. LLaMA 2.0's release has sparked discussions about the future of AI, with its potential impact on research, business, and the broader societal implications of open-source AI models.
Opinions
The partnership between Meta and Microsoft is seen as a strategic move to democratize access to AI technologies and counter the dominance of companies like Google and OpenAI.
The open-source release of LLaMA 2.0 is expected to encourage widespread experimentation and the creation of new applications within the research community and beyond.
Some opinions suggest that the risks associated with releasing LLaMA 2.0 as open-source are lower for Meta compared to its competitors, potentially offering significant advantages without the same level of exposure.
There is an opinion that the choice of using only public data for training LLaMA 2.0 is a wise decision to avoid legal issues related to data privacy and to align with the ethos of open-source development.
The release of LLaMA 2.0 has sparked a debate about the balance between model performance and the ethical implications of AI, including concerns about bias and toxicity in language models.
The integration of LLaMA 2.0 into consumer applications like Instagram and WhatsApp is anticipated to have a significant impact on how users interact with AI in their daily lives.
There is an opinion that the open-source nature of LLaMA 2.0 could lead to the development of niche applications that may disrupt the market share of larger companies offering proprietary AI solutions.
Some critics have pointed out that despite being labeled as open-source, LLaMA 2.0's license includes usage restrictions, which may limit its application for certain large-scale users and uses.
The article conveys an opinion that the future of AI will be shaped by the interplay between open-source innovation and the regulatory landscape, including lawsuits and emerging AI regulations.
|CHATBOT | LLM | ARTIFICIAL INTELLIGENCE|
META LLaMA 2.0: the most disruptive AInimal
Meta LLaMA can reshape the chatbot and LLM usage landscape
Meta announced LLaMA2, which is not only commercially available but has outstanding performance. In this article, we find out what’s new and why it’s important
Second, the model is commercially available, can be downloaded freely, and is free to use. LLaMA 1.0, on the other hand, was only for the use of researchers and could only be used after filling out a form (then the model weights are leaked but that is another story). Third, it is trained on much more data and has better performance.
and here is a summary of the news (do not worry we will discuss it in detail):
“Llama 2 pretrained models are trained on 2 trillion tokens, and have double the context length than Llama 1. Its fine-tuned models have been trained on over 1 million human annotations.” image source: here
To be fair, META and Microsoft have collaborated before. In fact, they have collaborated in setting up Open Neural Network Exchange (ONNX) format, a system that allows a deep learning model to be transposed between formats. In addition, Microsoft also collaborates with PyTorch (which was a project that META bet on). Microsoft has also decided to participate to create immersive experiences for the metaverse. In addition, both META and Microsoft participate together in several initiatives.
Now, with this expanded partnership, Microsoft and Meta are supporting an open approach to provide increased access to foundational AI technologies to the benefits of businesses globally. (source)
So far, though, LLMs are one of the businesses of the future, and it was hard to imagine such a partnership on what is one of the hottest technologies. After all, Microsoft collaborates extensively with OpenAi (provided servers to train the models), invested $10 B in OpenAI, and also used OpenAI models such as ChatGPT and GPT-4. Why is Microsoft collaborating with META’s open-source answer?
Meanwhile, as we said there are not only closed-source models, but several open-source models have come out in recent months (Alpaca, Dolly, Falcon, and so on). This shows the communities are more active than ever and it is difficult to have a monopoly. Not to mention that the future of LLMs is still very uncertain and the battle is open.
It’s not just Meta and Microsoft that believe in democratizing access to today’s AI models. We have a broad range of diverse supporters around the world who believe in this approach too — including companies that have given us early feedback and are excited to build new products with Llama 2, cloud providers that will include Llama 2 in their offerings for customers, research institutions who are collaborating with us on the safe and responsible deployment of large generative models, and people across tech, academia, and policy who see the benefits as we do. (source)
META must have noticed the incredible success of LLaMA. Dozens and dozens of articles citing LLaMA have been published, and many companies have decided to use it for internal products, while others have moved toward using other open-source models for models to be released publicly.
Having an open-source and available model, on the one hand, might help competitors (or at least it might seem that way). In reality, if it is much riskier for Google or OpenAI to release Bard or GPT-4 in open-source, for META the risks are much less.
First, META used only public data for LLaMA, so releasing the model has no risk of data leakage. Second, Google and OpenAI are fighting for first place in the race, LLaMA aims to be on the podium instead. While Google and OpenAI aim to have the best-performing model ever at the cost of a huge number of parameters and costs, LLaMA wants at most not to be too inferior (and LLaMA despite using the latest in technology is not aiming for any huge breakthroughs). So while the other two companies lose an advantage in releasing their model recipe, META instead assembles already-known ingredients.
Instead, the advantages are enormous. The research community is skeptical of anything that is not open-source, plus most researchers do not want to pay to use the models. LLaMA can be the hub of an active community that can experiment and create new applications. Second, LLaMA can become the community standard and thus attract more and more companies and researchers. For META it is then a no-brainer to import the published code based on their model and use it for their own internal applications.
So what about Microsoft?
As mentioned, a new open-source model comes out every month, and a standard has not yet been established. It will certainly continue to integrate ChatGPT and GPT-4 into its products, but Microsoft has a cross-sector business and could also use LLaMA (which is a much lighter family of models) in other products.
We expand the context window for Llama 2 from 2048 tokens to 4096 tokens. The longer context window enables models to process more information, which is particularly useful for supporting longer histories in chat applications, various summarization tasks, and understanding longer documents. (source)
Now as mentioned, attention has a quadratic cost and this scales with the number of tokens, so it becomes very expensive to double the context length. There are several tricks, though, to be able to succeed in enlarging the context window.
On the one hand, LLaMA 1 already used flash attention, now the authors have added Grouped-Query Attention:
For larger models, where KV cache size becomes a bottleneck, key and value projections can be shared across multiple heads without much degradation of performance (Chowdhery et al., 2022). Either the original multi-query format with a single KV projection (MQA, Shazeer, 2019) or a grouped-query attention variant with 8 KV projections (GQA, Ainslie et al., 2023) can be used. (source)
In this, the authors decided to use a greater amount of data but also an approach of greater attention to quality:
The model was trained on 40% more data than its predecessor. Al-Dahle says there were two sources of training data: data that was scraped online, and a data set fine-tuned and tweaked according to feedback from human annotators to behave in a more desirable way. The company says it did not use Meta user data in LLaMA 2, and excluded data from sites it knew had lots of personal information. (source)
Now actually this was imaginable for a variety of reasons. These models are much fatter (PaLM is 540 B of parameters) or they are not even a single model (GPT-4 is an ensemble of models). Of other models, nothing is even known about either the training or the architecture (PaLM-2). These models have also been trained with data obtained from the Internet and private data, while LLaMA is smaller and trained only with public data.
In any case, LLaMA’s real competitors are open-source models. Moreover, META has every interest in having its models used (fine-tune, adapted for other tasks) so it has released smaller models without participating in the parameter race.
The authors also focused on the safety of the model showing that it is superior to currently available open-source models.
Getting LLaMA 2 ready to launch required a lot of tweaking to make the model safer and less likely to spew toxic falsehoods than its predecessor, Al-Dahle says. (source)
Despite that, LLaMA 2 still spews offensive, harmful, and otherwise problematic language, just like rival models. Meta says it did not remove toxic data from the data set, because leaving it in might help LLaMA 2 detect hate speech better, and removing it could risk accidentally filtering out some demographic groups. (source)
In the end, you can try to mitigate but that still remains a limitation of the transformer and thus of all derived architectures (garbage in, garbage out).
There is one advantage with LLaMa though, the community can test and inspect it in both its parameters and its behaviors, this may allow for a better understanding of where some of its bias and toxic behaviors stem from:
The fact that LLaMA 2 is an open-source model will also allow external researchers and developers to probe it for security flaws, which will make it safer than proprietary models, Al-Dahle says. (source)
An added benefit for META is to allow the community to be active in testing the limitations and proposing solutions to flaws in its model.
Llama 2-Chat, a fine-tuned version of Llama 2 that is optimized for dialogue use cases. We release variants of this model with 7B, 13B, and 70B parameters as well. (source)
This is one of the most interesting new features. As mentioned some time ago, Google in an internal memo stated that the company did not have MOAT in the AI market and that open-source could win the race.
LLaMA 2.0 was designed for a very specific reason, to talk to them. As Mark said in an interview with Lex Friedman, the idea is for the model to be integrated into both Instagram and Whatsapp:
Zuckerberg: You’ll have an assistant that you can talk to in WhatsApp. I think in the future, every creator will have kind of an AI agent that can kind of act on their behalf that their fans can talk to. I want to go get to the point where every small business basically has an AI agent that people can talk to to do commerce and customer support and things like that. (source)
In fact, as seen the community has already responded and there are implementations of LLaMA chat already available (for example here or tutorials on how to use it)
The model will be available through a wide network of resources:
The tool can also run directly on Windows PCs, and will be available through outside providers like Amazon Web Services and Hugging Face. (source)
and also:
We’ve collaborated with Meta to ensure smooth integration into the Hugging Face ecosystem. You can find the 12 open-access models (3 base models & 3 fine-tuned ones with the original Meta checkpoints, plus their corresponding transformers models) on the Hub. (source)
HuggingFace collaborated with META, and the model can be found on the huggingFace site. In theory, you could also download it from the META site, but the fact that it is integrated into the HuggingFace ecosystem helps its use and spread.
It also appears that the model already has the top of HuggingFace’s leaderboard:
As Yann LeCun said LLaMA 2.0 will probably have a big impact on the community.
Few companies can afford to train such models, and the fact that it is released to the public allows the possibility for many derivative applications to be born.
Meta executives say they believe public releases of technologies actually reduce safety risks by harnessing the wisdom of the crowd to identify problems and build resilience into the systems. (source)
Similar models allow researchers to explore the limitations of LLMs and also the potential biases that are present in the model. The study of risk can only be done if a model can be tested.
META plans to integrate it inside Instagram and WhatsApp although it is probably too early for it to be safely used. LLaMA is better than its predecessors but is not without bias.
Google and OpenAI also do well to care about the open-source community. Lots of groups are working and have produced lots of interesting techniques to reduce the cost of inferences and the technical requirements to be able to use a model. Models such as LLaMA 7B may soon be used on a cell phone.
Bard is a very promising model, but ChatGPT is currently more widely used. Microsoft has resurrected Bing through AI. Open-source is lurking, though. How many companies and developers will want to use models they have to pay for when they can use free alternatives?
LLaMA may not have the best performance, but it is good enough and most importantly it is not huge. It can be fine-tuned by anyone and used at any site. There are many developers who could use it to create niches and take the market away from larger companies. Also, lawsuits (even Musk launched one) and new regulations could reduce the advantage of large companies that have used non-open-source data.
The model however is not exactly open-source as revealed later but some commercial applications are restricted:
These critics highlight that Meta’s license places usage restrictions on Llama 2, excluding licensees with over 700 million active daily users (mentioned above) and restricting the use of its outputs to improve other LLMs. (source)
You can look for my other articles, you can also subscribe to get notified when I publish articles, you can become a Medium member to access all its stories (affiliate links of the platform for which I get small revenues without cost to you) and you can also connect or reach me onLinkedIn.
Here is the link to my GitHub repository, where I am planning to collect code and many resources related to machine learning, artificial intelligence, and more.