idgets/media.html?type=text%2Fhtml&key=a19fcc184b9711e1b4764040d3dc5c07&schema=twitter&url=https%3A//twitter.com/goodside/status/1635711013566795776&image=https%3A//i.embed.ly/1/image%3Furl%3Dhttps%253A%252F%252Fabs.twimg.com%252Ferrors%252Flogo46x38.png%26key%3Da19fcc184b9711e1b4764040d3dc5c07" allowfullscreen="" frameborder="0" height="281" width="500">
</div>
</div>
</figure></iframe></div></div></figure><p id="f637">As noted, image input is not yet publicly available. Additionally, while GPT-4 can take images as input, it is not capable of <i>generating</i> images. The output is still only text.</p><h2 id="ff21">Increased capabilities</h2><p id="4dc7">Compared with its predecessors, GPT-4 is more collaborative and creative, and it also reasons better. While GPT-3.5 was amazing at many tasks, it lacked the ability to logically solve some problems that differed from its training data. I wrote an article illustrating some of these weaknesses:</p><div id="6c9a" class="link-block">
<a href="https://levelup.gitconnected.com/the-surprising-things-chatgpt-cant-do-yet-4362842da5b7">
<div>
<div>
<h2>The Surprising Things ChatGPT Can’t Do (Yet)</h2>
<div><h3>At this point, most of us have seen amazing examples of ChatGPT and its abilities. Everyone is eager to find out what…</h3></div>
<div><p>levelup.gitconnected.com</p></div>
</div>
<div>
<div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*mos_Z_8XciH5uOoJhVKv2Q.png)"></div>
</div>
</div>
</a>
</div><p id="0871">During evaluations, GPT-4 has displayed clear improvements, solving more difficult problems than GPT-3.5. For instance, it passed a simulated bar exam with a score around the top 10% of test takers, whereas GPT-3.5 scored around the bottom 10%.</p><h2 id="ca3f">Safer</h2><p id="179f">Not surprisingly, the lessons learned from letting the public test ChatGPT have led to a model that is less prone to “going rogue” and acting outside of its predetermined instructions. The current model is better at sticking to its initial instructions and is 82% less likely to respond to requests for disallowed content.</p><p id="213c">Not only that, GPT-4 scored 40% higher than GPT-3.5 on a set of factuality evaluations. Apparently, GPT-4 itself was used to produce training data in order to improve the safety of the model.</p><h1 id="55e8">Applications</h1><p id="6018">For the general public, everything with GPT-4 is new. But OpenAI has been working with various companies on GPT-4 for some time, such as <a href="https://openai.com/customer-stories/duolingo">Duolingo</a>, <a href="https://openai.com/customer-stories/be-my-eyes">Be My Eyes</a> and <a href="https://openai.com/customer-stories/khan-academy">Khan Academy</a>.</p><p id="90c9">Here are some early examples. Doing taxes:</p>
<figure id="c25c">
<div>
<div>
<img class="ratio" src="http://placehold.it/16x9">
<iframe class="" src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FoutcGtbnMuQ%3Fstart%3D1143%26feature%3Doembed%26start%3D1143&display_name=YouTube&url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DoutcGtbnMuQ&image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FoutcGtbnMuQ%2Fhqdefault.jpg&key=a19fcc184b9711e1b4764040d3dc5c07&type=text%2Fhtml&schema=youtube" allowfullscreen="" frameborder="0" height="480" width="854">
</div>
</div>
</figure></iframe></div></div></figure><p id="4772">Analyzing smart contracts:</p>
<figure id="2803">
<div>
<div>
<img class="ratio" src="http://placehold.it/16x9">
<iframe class="" src="https://cdn.embedly.com/widgets/media.html?type=text%2Fhtml&key=a19fcc184b9711e1b4764040d3dc5c07&schema=twitter&url=https%3A//twitter.com/jconorgrogan/status/1635695064692273161&image=https%3A//i.embed.ly/1/image%3Furl%3Dhttps%253A%252F%252Fabs.twimg.com%252Ferrors%252Flogo46x38.png%26key%3Da19fcc184b9711e1b4764040d3dc5c07" allowfullscreen="" frameborder="0" height="281" width="500">
</div>
</div>
</figure></iframe></div></div></figure><p id="5ebc">Building simple games in seconds:</p>
<figure id="c297">
<div>
<div>
<img class="ratio" src="http://placehold.it/16x9">
<iframe class="" src="https://cdn.embedly.com/widgets/media.html?type=text%2Fhtml&key=a19fcc184b9711e1b4764040d3dc5c07&schema=twitter&url=https%3A//twitter.com/skirano/status/1635736107949195278&image=" allowfullscreen="" frameborder="0" height="281" width="500">
</div>
</div>
</figure></iframe></div></div></figure><h1 id="590e">Weaknesses</h1><p id="b799">While GPT-4 is considered safer and less error-prone, weaknesses of GPT-3.5 such as hallucinations and bias still exist. Similarly, while it did well on the bar exam, it performed poorly in the Codeforces programming contest, where it had a rating of 392 (below the 5th percentile). OpenAI also states that it can be “confidently wrong in its predictions”.</p><h1 id="127f">Summary</h1><p id="9d03">There has been a lot of speculation about GPT-4, but now we finally have it in front of us. The most impressive feature is the multimodality, which is a necessary step to achieve any form of artificial general intelligence.</p><p id="e3b5">The evaluations show it’s better than previous models, but it’s difficult to say by how much in practice. As the model is rolled out to the public and more use cases surface, I believe we will gradually get a better feel for how capable it is. Similarly, we will also get an idea of what limitations still exist, and whether there are other problems, like those <a href="https://readmedium.com/the-new-bing-is-terrible-3d923e876fd9">we saw with the early release of Bing</a>.</p><p id="7826">If you enjoyed this article:</p><ul><li>👏 Clap, this will help me understand what my readers like and want more of</li><li>🙏 Follow or subscribe if you would like to read my upcoming articles, new ones every week!</li><li>📚 If you are looking for more content, check out my reading lists in <a href="https://medium.com/@dreamferus/list/ai-ea01474f2db5">AI</a>, <a href="https://medium.com/@dreamferus/list/python-c8e4719d93da">Python</a> or <a href="https://medium.com/@dreamferus/list/data-science-57808dcf16f0">Data Science</a></li></ul><p id="f761">Thanks for reading and have a great day.</p><h1 id="8aa6">Level Up Coding</h1><p id="5495">Thanks for being a part of our community! 
Before you go:</p><ul><li>👏 Clap for the story and follow the author 👉</li><li>📰 View more content in the <a href="https://levelup.gitconnected.com/?utm_source=pub&utm_medium=post">Level Up Coding publication</a></li><li>💰 Free coding interview course ⇒ <a href="https://skilled.dev/?utm_source=luc&utm_medium=article">View Course</a></li><li>🔔 Follow us: <a href="https://twitter.com/gitconnected">Twitter</a> | <a href="https://www.linkedin.com/company/gitconnected">LinkedIn</a> | <a href="https://newsletter.levelup.dev">Newsletter</a></li></ul><p id="a9a0">🚀👉 <a href="https://jobs.levelup.dev/talent/welcome?referral=true"><b>Join the Level Up talent collective and find an amazing job</b></a></p></article></body>
The wait is over: GPT-4 is finally here. With increased context length, more advanced reasoning and the capability of processing visual input, we are in for a treat.
Let’s dive in.
Access
You can try it out right now if you have ChatGPT Plus, or join the waitlist for the API. For now, only text input is available publicly. Image input is still in research preview, where OpenAI is collaborating with Be My Eyes, an app that assists blind and low-vision people using tech. Here’s how they use GPT-4:
Features
Increased context size
The context size tells us how much information a GPT model can process and produce in a single request. It was previously limited to 4097 tokens, or approximately 3072 words. If you wanted to process content longer than this, you had to resort to tricks such as iterative summarization. In practice, though, such workarounds struggle to match processing everything in one go, both in terms of results and speed.
The new base GPT-4 model roughly doubles this limit to 8192 tokens, or approximately 6144 words. Better yet, OpenAI is also providing limited access to a model with a context size of 32768 tokens, or about 50 pages of text. This is huge.
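To make the numbers above concrete, here is a minimal Python sketch of the token-to-word arithmetic and of the iterative summarization trick. It assumes roughly 0.75 words per token, which is the ratio implied by the figures in this article; real tokenizers vary with the text. The function names and the `summarize` callable are hypothetical placeholders, not part of any OpenAI API.

```python
from typing import Callable, List

# Assumption: ~0.75 English words per token, the ratio implied by the
# article's figures (4097 tokens ≈ 3072 words). Real tokenizers vary.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int) -> int:
    """Rough word capacity of a context window of `tokens` tokens."""
    return int(tokens * WORDS_PER_TOKEN)


def split_into_chunks(text: str, max_tokens: int) -> List[str]:
    """Split text on whitespace into chunks of at most ~max_tokens tokens."""
    max_words = tokens_to_words(max_tokens)
    if max_words < 1:
        raise ValueError("max_tokens is too small to hold a single word")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def iterative_summary(
    text: str, max_tokens: int, summarize: Callable[[str], str]
) -> str:
    """Summarize chunk by chunk, carrying the running summary forward.

    `summarize` is a hypothetical stand-in for a call to a language model.
    Each chunk uses only half the budget, to leave room for the running
    summary; this split is an arbitrary choice for the sketch.
    """
    summary = ""
    for chunk in split_into_chunks(text, max_tokens // 2):
        summary = summarize((summary + " " + chunk).strip())
    return summary


print(tokens_to_words(4097))   # 3072
print(tokens_to_words(8192))   # 6144
print(tokens_to_words(32768))  # 24576
```

With a larger context window, fewer chunks are needed, so fewer model calls are made and less information is lost between summarization steps.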
Multimodality
The AI is no longer limited to text input. It can now understand and process images in combination with text to generate descriptions, categorizations, and other analyses, with capabilities comparable to those it shows on text alone.
Here’s an example from OpenAI’s developer live stream, where GPT-4 processed a photo of a hand-written mockup of an app:
More examples: