The Shift in AI Trends: Keep Up with LLM / NLP in 2023 (Summary)

This blog provides a high-level overview of the shift in AI trends and does not cover all the nuances and details of the subject matter.

Artificial Intelligence (AI) has witnessed remarkable advancements in recent years, with researchers continually pushing the boundaries of what is possible. One notable trend has been the development of novel architectures, such as transformers and vision transformers, which have revolutionized various domains. However, the current AI landscape is noticeably shifting toward the optimization of large models, particularly Language Models (LMs) like GPT. So many papers are being published that it is getting difficult to keep up with the trends.

In this blog, we will explore the recent trends and provide useful papers for you to stay updated with this crazy area.

Large Language Models

Large Language Models (LLMs) have emerged as an unrivaled force. OpenAI was still in its infancy when the Transformer was introduced to the world in 2017. Inspired by the Transformer, OpenAI built its own model to explore the potential of Generative Pre-Training (GPT), a line of work that later led to the famous ChatGPT. As the technology progressed, OpenAI released GPT-3, at the time its largest LLM, in June 2020 with 175 billion parameters.

Following the mainstream sensation of ChatGPT, LLMs have gone from an obscure AI technology to a topic of conversation among non-technical people. The year 2023 is predicted to see even wider adoption of this technology, likely turning it into a battleground for huge companies like Microsoft and Google.

Papers with Code - Improving Language Understanding by Generative Pre-Training

#3 best model for Natural Language Inference on SciTail (Accuracy metric)

paperswithcode.com

Industry: Google and OpenAI

That said, 2023 is expected to bring significant developments. First, Google's public use of its FLAN family of models is likely to make an impact. Second, OpenAI and its challengers may push toward trillion-scale parameter counts with the much-anticipated GPT-4, provided the optimization challenges can be addressed. While cost may keep such models from being the main engine behind LLM-as-a-Service offerings, they are likely to become the next headline-making technology in AI.

Papers with Code - Finetuned Language Models Are Zero-Shot Learners

🏆 SOTA for Question Answering on Story Cloze (Accuracy metric)

paperswithcode.com

Papers with Code - GPT-4 Technical Report

🏆 SOTA for Multi-task Language Understanding on MMLU (Average (%) metric)

paperswithcode.com

Open Science & Open Source

As part of its commitment to open science, Meta publicly released LLaMA (Large Language Model Meta AI), a state-of-the-art foundational large language model designed to help researchers advance their work in this subfield of AI. Smaller yet more performant models such as LLaMA enable researchers who don't have access to large amounts of infrastructure to study these models, further democratizing access in this important, fast-changing field.

Because LLaMA is smaller than many other models, one of its main advantages is cost efficiency, which makes it accessible to a wider range of users. In addition, Meta makes it available to researchers and other organizations under a non-commercial license.

LLaMA: Open and Efficient Foundation Language Models

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models…

arxiv.org

Microsoft, unexpectedly, has presented Orca, a model much smaller than ChatGPT that relies on an innovative training method. Orca is a 13-billion-parameter model that learns from rich signals from GPT-4, including explanation traces, step-by-step thought processes, and other complex instructions, guided by teacher assistance from ChatGPT. Orca became the first model to outpace ChatGPT on Vicuna's benchmark.
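To make "learning from explanation traces" a little more concrete, here is a minimal Python sketch of how such training data could be organized. It assumes, as we read the paper, that each example pairs a user query with a system message asking the teacher to explain its reasoning, and that the student is trained first on ChatGPT responses and then on GPT-4 responses. The system messages, the `ExplanationExample` record, the prompt layout in `to_training_text`, and all helper names are hypothetical illustrations, not Orca's actual code or data format.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical system messages that ask the teacher to expose its reasoning.
SYSTEM_MESSAGES = [
    "You are a helpful assistant. Think step-by-step and justify your answer.",
    "Explain your reasoning in simple terms before giving the final answer.",
]


@dataclass
class ExplanationExample:
    system: str    # instruction that asks the teacher to explain itself
    query: str     # the user task
    response: str  # the teacher's explanation-rich answer
    teacher: str   # which teacher model produced the response


def make_example(query: str, response: str, teacher: str, index: int) -> ExplanationExample:
    """Pair a query with a rotating system message and the teacher's answer."""
    system = SYSTEM_MESSAGES[index % len(SYSTEM_MESSAGES)]
    return ExplanationExample(system, query, response, teacher)


def progressive_curriculum(
    chatgpt_data: Sequence[ExplanationExample],
    gpt4_data: Sequence[ExplanationExample],
) -> List[ExplanationExample]:
    """Order data so the student learns from the intermediate teacher (ChatGPT)
    before the stronger teacher (GPT-4)."""
    return list(chatgpt_data) + list(gpt4_data)


def to_training_text(ex: ExplanationExample) -> str:
    """Flatten one example into a single string for supervised fine-tuning
    (hypothetical prompt layout)."""
    return f"### System:\n{ex.system}\n### User:\n{ex.query}\n### Assistant:\n{ex.response}"
```

Placing the weaker teacher's data first is the "progressive" part of the idea: ChatGPT acts as an intermediate teacher that bridges the capacity gap between a 13B student and GPT-4.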

Orca: Progressive Learning from Complex Explanation Traces of GPT-4

Recent research has focused on enhancing the capability of smaller models through imitation learning, drawing on the…

arxiv.org

Bloom is a decoder-only Transformer language model and the world’s largest open-science, open-access multilingual large language model (LLM) with 176 billion parameters. The model was trained on a large amount of text data using industrial-scale computational resources in 46 natural languages and 13 programming languages. Bloom achieves competitive performance on a wide range of benchmarks, with even stronger results after multitask prompted finetuning.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural…

arxiv.org

LLM Optimization

Researchers at Google show how to leverage LLMs to distill smaller language models that can potentially outperform both LLMs and task-specific supervised models.

The approach outperforms both standard fine-tuning and distillation with up to 85% less training data
The distilled models are up to 2000x smaller than the LLMs
The approach reduces model size and training-data requirements at the same time
The smaller models still outperform LLMs even when only unlabeled data is available
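The core of the Distilling Step-by-Step idea is a multi-task objective: the small model is trained both to predict the task label and to reproduce the rationale an LLM generated for that example, so the total loss is L = L_label + λ·L_rationale. The Python sketch below illustrates that objective under simplifying assumptions: per-token probabilities are passed in as plain floats instead of coming from a real model, the task prefixes `[label]` and `[rationale]` and the weight `rationale_weight` (λ) are illustrative, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def multitask_inputs(question: str, label: str, rationale: str) -> List[Tuple[str, str]]:
    """Build the two (input, target) pairs for one training example.

    The same question is fed twice with different task prefixes: once to
    predict the label, once to generate the LLM-provided rationale.
    """
    return [
        ("[label] " + question, label),
        ("[rationale] " + question, rationale),
    ]


def sequence_nll(target_token_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood the student assigns to the target tokens."""
    return -sum(math.log(p) for p in target_token_probs) / len(target_token_probs)


def step_by_step_loss(
    label_token_probs: Sequence[float],
    rationale_token_probs: Sequence[float],
    rationale_weight: float = 1.0,
) -> float:
    """Multi-task objective: L = L_label + lambda * L_rationale."""
    return sequence_nll(label_token_probs) + rationale_weight * sequence_nll(rationale_token_probs)
```

At inference time only the `[label]` task is needed, so generating rationales adds training cost but no serving cost.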

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller…

Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for…

arxiv.org

Researchers at Microsoft propose a method called Low-Rank Adaptation (LoRA) for natural language processing.

LoRA aims to reduce the number of trainable parameters and GPU memory requirements for downstream tasks.
Compared to full fine-tuning of GPT-3 175B, LoRA can reduce the number of trainable parameters by 10,000 times and GPU memory requirements by 3 times.
Despite having fewer trainable parameters, LoRA performs on par with or better than fine-tuning on RoBERTa, DeBERTa, GPT-2, and GPT-3.
The researchers also investigate rank-deficiency in language model adaptation, shedding light on the effectiveness of LoRA.
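LoRA keeps the pretrained weight matrix W frozen and learns a low-rank update, so an adapted layer computes h = Wx + (α/r)·BAx, where A is r × d_in, B is d_out × r, and the rank r is much smaller than either dimension. Below is a dependency-free Python sketch of one such layer. It has no autograd, so it only shows the forward pass and the parameter count; the class and helper names are hypothetical, and the initialization (small random A, zero B) follows the paper's description as we understand it.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weight, shape (d_out, d_in)
        # A: (rank, d_in) with small random values; B: (d_out, rank) all zeros,
        # so the adapted layer starts out identical to the pretrained one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: List[float]) -> List[float]:
        # h = W x + (alpha / r) * B (A x); in training only A and B would be updated.
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        """Number of entries in A and B (W is frozen and not counted)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_ratio(d_out: int, d_in: int, rank: int) -> float:
    """How many times fewer trainable parameters LoRA needs for one weight matrix."""
    return (d_out * d_in) / (rank * (d_in + d_out))
```

For a single 4096 × 4096 matrix with r = 8, `lora_param_ratio(4096, 4096, 8)` returns 256.0: 65,536 trainable parameters instead of 16,777,216. The much larger 10,000x figure above refers to the whole GPT-3 175B model, where only some of the weight matrices receive adapters.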

LoRA: Low-Rank Adaptation of Large Language Models

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and…

arxiv.org

Researchers at Meta propose LIMA, a 65B-parameter LLaMA language model.

LIMA is fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without reinforcement learning or human preference modeling.
LIMA shows strong performance, learning to follow specific response formats from only a handful of examples in the training data.
The model also generalizes well to unseen tasks that were not in the training data.
In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases.
This statistic rises to 58% when compared to Bard and 65% when compared to DaVinci003, which was trained with human feedback.
These results suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is needed to produce high-quality output.
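LIMA's recipe is plain supervised fine-tuning: the model is trained with the standard next-token loss on a small set of curated prompt/response pairs. A common way to implement this, sketched below in dependency-free Python, is to concatenate prompt and response and exclude the prompt positions from the loss. Treat this as an assumption about typical practice rather than LIMA's exact code; the helper names, the end-of-turn token id, and the `-100` mask value (borrowed from the default `ignore_index` of PyTorch's `CrossEntropyLoss`) are illustrative choices.

```python
from typing import List, Sequence, Tuple

IGNORE_INDEX = -100  # same value as the default ignore_index of torch.nn.CrossEntropyLoss


def build_sft_example(
    prompt_ids: List[int], response_ids: List[int], eot_id: int
) -> Tuple[List[int], List[int]]:
    """Concatenate prompt, response, and an end-of-turn token; mask the prompt.

    Labels are aligned with input_ids; the one-position shift needed for
    next-token prediction is assumed to happen inside the model or loss.
    """
    input_ids = prompt_ids + response_ids + [eot_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eot_id]
    return input_ids, labels


def masked_nll(labels: Sequence[int], target_logprobs: Sequence[float]) -> float:
    """Average negative log-likelihood over positions whose label is not masked.

    target_logprobs[i] is the model's log-probability of labels[i] at position i.
    """
    kept = [lp for lab, lp in zip(labels, target_logprobs) if lab != IGNORE_INDEX]
    return -sum(kept) / len(kept)
```

Masking the prompt means the loss only rewards producing good responses, not reproducing the user's prompt.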

LIMA: Less Is More for Alignment

Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose…

arxiv.org

Existing methods for gaining steerability collect human labels of model generations and fine-tune the LM using reinforcement learning from human feedback (RLHF). RLHF is complex and often unstable: it involves fitting a reward model and then fine-tuning the unsupervised LM with reinforcement learning. Direct Preference Optimization (DPO) is a novel approach to training LMs given a dataset of the form (prompt, worse completion, better completion). Simply put, you train your LLM with a new loss function that encourages it to increase the likelihood of the better completion and decrease the likelihood of the worse completion, relative to a frozen reference model. You can read the mathematical details in the paper, but the key point is that it's just a loss function you can optimize using backpropagation.

DPO is stable, performant, and computationally lightweight, eliminating the need to fit a reward model, sample from the LM during fine-tuning, or perform significant hyperparameter tuning.
Experimental results show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods.
DPO exceeds PPO-based RLHF in controlling the sentiment of generations, matches or improves response quality in summarization and single-turn dialogue, and is simpler to implement and train.
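For a single preference triple, the DPO loss is -log σ(β · [(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]), where y_w is the better completion, y_l the worse one, π_θ the model being trained, π_ref the frozen reference model, and β a coefficient controlling how far the policy may drift from the reference. The Python sketch below evaluates that formula on scalar log-probabilities; in a real training loop these would be tensors summed over the completion tokens so that backpropagation flows through the policy terms. The function names and the default β = 0.1 are illustrative assumptions.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_logp_better: float,
    policy_logp_worse: float,
    ref_logp_better: float,
    ref_logp_worse: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (prompt, worse completion, better completion) triple.

    Each argument is the summed log-probability of a full completion given the
    prompt, under the trainable policy or the frozen reference model.
    """
    better_margin = policy_logp_better - ref_logp_better  # implicit reward of y_w (up to beta)
    worse_margin = policy_logp_worse - ref_logp_worse      # implicit reward of y_l (up to beta)
    return -log_sigmoid(beta * (better_margin - worse_margin))
```

When the policy still equals the reference model, the argument of σ is 0 and the loss is log 2 (about 0.693). It drops below that as the policy shifts probability toward the better completion relative to the reference, and rises above it if the policy favors the worse one.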

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving…

arxiv.org

Community

For more LLM papers, there are many threads and forums where people share useful work. LinkedIn is one of them, provided you connect with or follow people in the industry. Another option I found is the following forum:

Foundational must read GPT/LLM papers

Initializing a new thread on the very best, must read, well-written, papers on Large Language Model capabilities…

community.openai.com

where people share papers they have found insightful on recent trends in LLMs.

PyTorch introduced ultra-low-latency inference for LLaMA, the powerhouse language model, accelerated by PyTorch/XLA on Google Cloud TPU! 💪✨

They sped up inference for the 65B-parameter LLaMA model, improving latency by 6.4x.

PyTorch on LinkedIn: Introducing ultra-low latency inference on LLaMA, the powerhouse language...

Introducing ultra-low latency inference on LLaMA, the powerhouse language model accelerated by PyTorch/XLA on Google…

www.linkedin.com

Lightning AI launched a fully open-source (Apache 2.0) implementation of LLaMA.

The LLaMA Takeover

This week, we launched a fully open-source (Apache 2.0) implementation of LLaMA and are too hyped to talk about…

www.linkedin.com

Some Last Words

Significantly, progress in LLMs is likely to have beneficial cascading effects on other AI fields such as Computer Vision, Information Retrieval, and Reinforcement Learning, as was already observed in 2022.

A key area where LLMs have already made an impressive impact is coding. For over a year, GitHub Copilot, backed by an LLM, has been quietly redefining how code is written. Google revealed in 2022 that 3% of its new code was already being generated by accepting ML-based code completion suggestions. Given continuous advancements, it's conceivable that LLM-based code completion will extend this trend, leading to an evolutionary shift in coding practices.

In conclusion, LLMs are becoming increasingly central to the AI landscape. With technology titans vying for a piece of the pie, the impact of LLMs on fields such as coding and other AI domains is only set to widen.
