
GPT4 — Facts & Reasonable Expectations

GPT-4 (Generative Pre-trained Transformer 4) is a highly anticipated language model that is expected to be released in the first quarter of 2023. While not much is known about GPT-4 at this time, it is likely to be a significant advancement in the performance of Large Language Models.

If you search for GPT4, you may come across mentions of a “100T parameters” model. This estimate is based on the significant increase in parameters between GPT2 (1.5B parameters) and GPT3 (175B parameters). However, it is important to note that this is just an extrapolation and the actual number of parameters for GPT4 has not been officially announced.

So what do we know about GPT4? How large will it be? Which training and optimization techniques will go into its design?

Let’s dig into it. But when a tweet by Sam Altman (https://twitter.com/sama/status/1590416386765254656) hints at GPT4’s ability to consistently pass the Turing test, you can be sure that there will be a before and an after GPT4!

We believe that the “100T parameters” extrapolation is wrong. Yet regardless of how much its size grows, we expect the latest major large-model optimizations and fine-tuning approaches to be part of this release (see below).

Here is a quick summary of what we know and can anticipate for this model.

But first, please consider supporting us: 🔔 clap & follow 🔔

What do we know about its size and multi-modality?

GPT4 Size

A 100T-parameter model? No.

Size will not be the main improvement factor: Sam Altman announced that GPT4’s size will not be much larger than GPT3.

This comes as no surprise since we know that GPT3 isn’t compute optimal (too large considering the number of training tokens). “Simply” adding more data to it can still help GPT3 improve significantly.

That doesn’t tell us much, but we shouldn’t expect anything more than one order of magnitude larger than GPT3, i.e. roughly 1T parameters.

Will GPT4 be multi-modal?

A multi-modal model? No.

Sam Altman also announced during a Q&A session that GPT4 won’t be multi-modal.

We should expect a text-only model fine-tuned to better follow our instructions (i.e., better “alignment” with our intended prompt).

Still, GPT4 will be way better than GPT3, and here’s why

Better in what sense?

We can still expect a significant improvement compared to GPT3, one that can justify the current hype.

Why? Because new optimizations show that a model 100x smaller than GPT3 can produce outputs that people prefer to GPT3’s (see reinforcement learning fine tuning below). This means that a well-optimized GPT4 model, with a size similar to GPT3, should still perform way better.

Namely, it should have:

1. Improved existing emergent abilities, i.e., following instructions as well as reasoning, coding, and so on. See this article about emergent abilities if you’re not familiar with this concept: https://readmedium.com/large-language-models-emergent-abilities-how-they-solve-problems-they-were-not-trained-to-address-90da1ee7ae6d
2. New emergent abilities, e.g., implicit chain-of-thought reasoning.

Major optimizations that should be in GPT4

Mainly three, each of which had a significant impact on model performance in 2022.

(1) Compute Optimal: In a nutshell, increasing the training data can still improve GPT3’s performance without increasing the model’s size. DeepMind showed through an empirical analysis that, for a given compute budget, there is an optimal balance between the number of training tokens and the number of parameters of a model. As a result, GPT3 (175B parameters) can be outperformed by a model roughly 2.5 times smaller: Chinchilla, with 70B parameters. Here is a post to read more about it: https://readmedium.com/training-compute-optimal-large-language-models-deepminds-70b-parameter-chinchilla-outperforms-b6098d040265
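To make the compute-optimal idea concrete, here is a minimal sketch. It relies on two common approximations that are our assumptions, not figures from this post: training compute is about 6 FLOPs per parameter per token, and the compute-optimal ratio is about 20 training tokens per parameter. The function names are ours, and the 300B-token figure for GPT3 comes from the GPT-3 paper.

```python
import math

# Assumed rules of thumb (approximations, not exact laws):
FLOPS_PER_PARAM_TOKEN = 6.0   # training FLOPs ~ 6 * parameters * tokens
TOKENS_PER_PARAM = 20.0       # compute-optimal ratio of tokens to parameters


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute for a dense transformer."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def compute_optimal(flops_budget: float) -> tuple[float, float]:
    """Split a compute budget into (parameters, tokens) under the assumptions above.

    Solves flops = 6 * N * D with D = 20 * N, so N = sqrt(flops / 120).
    """
    n_params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # GPT3: 175B parameters trained on about 300B tokens.
    budget = training_flops(175e9, 300e9)
    n_params, n_tokens = compute_optimal(budget)
    print(f"GPT3 tokens per parameter: {300e9 / 175e9:.1f}")
    print(f"Same budget, compute-optimal: {n_params / 1e9:.0f}B parameters, "
          f"{n_tokens / 1e12:.2f}T tokens")
```

Under these assumptions, GPT3’s training budget would be better spent on a model of roughly 51B parameters trained on roughly 1T tokens, which is the sense in which GPT3 is “too large” for its data.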

(2) Reinforcement Learning Fine Tuning: OpenAI released in 2022 a family of models called InstructGPT, the smallest of which has 1.3B parameters. In their paper (https://arxiv.org/pdf/2203.02155.pdf), they show that, in human evaluations, “outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters”. GPT3.5 and ChatGPT are both based on this new approach.
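One ingredient of this approach is a reward model trained on human comparisons between two answers to the same prompt. The sketch below shows the pairwise ranking loss commonly used for that step, as we understand it from the InstructGPT paper. It is only an illustration: the function name is ours, the rewards are plain numbers rather than outputs of a real model, and the full pipeline (supervised fine tuning, then reinforcement learning against the reward model) is not shown.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Loss for one human comparison: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the answer the human
    preferred above the answer the human rejected, and large otherwise.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow in exp().
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Hypothetical reward scores for (preferred, rejected) answers.
    for chosen, rejected in [(2.0, 0.0), (0.0, 0.0), (0.0, 2.0)]:
        loss = pairwise_preference_loss(chosen, rejected)
        print(f"chosen={chosen:+.1f} rejected={rejected:+.1f} loss={loss:.4f}")
```

Minimizing this loss over many comparisons pushes the reward model to rank answers the way human labelers do, and the language model is then fine-tuned to score well under that reward.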

(3) New parameterization (μP): This is a method for optimizing the training of large neural networks (https://www.microsoft.com/en-us/research/blog/%C2%B5transfer-a-technique-for-hyperparameter-tuning-of-enormous-neural-networks/). In a nutshell, µP can be used to transfer hyper-parameters across different model sizes, reducing the need for trial and error in finding the optimal hyper-parameters for a given model. In other words, we can optimize a small model’s hyper-parameters and reuse the same hyper-parameters for larger models!
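To give a feel for what “transfer” means in practice, here is a deliberately simplified sketch of a single μP rule, as we understand it from the μP paper: when training with Adam, the learning rate of hidden weight matrices shrinks in proportion to 1/width as the model gets wider. The function name and the example widths are hypothetical, and real μP also changes initialization and output-layer scaling, which this sketch ignores.

```python
def mup_hidden_lr(base_lr: float, base_width: int, target_width: int) -> float:
    """Rescale a learning rate tuned at base_width for a wider model.

    Simplified assumption: under μP with Adam, the hidden-matrix learning
    rate scales like 1/width, so the tuned value is multiplied by
    base_width / target_width.
    """
    if base_width <= 0 or target_width <= 0:
        raise ValueError("widths must be positive")
    return base_lr * base_width / target_width


if __name__ == "__main__":
    # Hypothetical setup: the learning rate was tuned on a width-256 proxy model.
    tuned_lr, proxy_width = 1e-3, 256
    for width in (256, 1024, 4096):
        lr = mup_hidden_lr(tuned_lr, proxy_width, width)
        print(f"width={width:5d} hidden-layer lr={lr:.2e}")
```

The point is that the expensive search happens once, on the small proxy model, and the larger models get their values from a fixed rule rather than from a new sweep.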

Let’s summarize this in the context of GPT4: a GPT3-sized model can be trained on more data without having to increase its size. Relying on reinforcement learning from human feedback, small models can produce outputs that people prefer to those of models 100x larger. And thanks to the μP parameterization, we can optimize hyper-parameters on small models and reuse the optimal values on the largest models.

All of this points to a potentially huge improvement relying solely on training data and training optimization! I would still bet on a 5–10x model size increase though, because at the end of the day we know that, under the same training conditions, larger models generally perform better.

Happy New Year everyone and see you in 2023!