avatarThe PyCoach

Summary

ChatGPT's latest upgrade includes vision capabilities, allowing users to interact with images, enhancing learning, app development, and everyday tasks, though it has some limitations.

Abstract

The integration of vision capabilities into ChatGPT represents a significant advancement, enabling users to upload and discuss images, interpret complex visual information, and leverage the AI for educational purposes, creative projects, and practical applications. This feature simplifies the understanding of visual data, such as parking signs or biology diagrams, and can even generate code for app development from sketches. While it offers powerful use cases and streamlines various tasks, it is not without its imperfections, occasionally misinterpreting images or missing obvious details.

Opinions

  • The author views ChatGPT's vision upgrade as a substantial improvement, particularly for visual learners and developers.
  • There is an acknowledgment that this technology can both facilitate learning and enable cheating more easily.
  • The author expresses excitement about the potential for generating apps from simple sketches, highlighting the efficiency and creativity this can bring to development processes.
  • Despite the enthusiasm, the author recognizes the current limitations and imperfections of the vision feature, such as misidentifying objects or failing to notice anomalies in images.
  • The author encourages readers to explore the possibilities of this new feature and stay informed about its capabilities and limitations.

Vision is The Biggest Upgrade to ChatGPT in Months. Here Are Some Ways To Make The Most of It

ChatGPT now can see. Here’s how you can benefit from it.

Source: Pexels

After plugins and the code interpreter, vision is one of the biggest upgrades to ChatGPT.

With ChatGPT vision, you only need to snap pictures and share them with ChatGPT to ask questions about them, figure things out, get explanations, create apps, and more!

ChatGPT vision is currently available to some ChatGPT Plus users and, in this article, I’ll highlight the best use cases I found on the internet.

Take a photo, upload it to ChatGPT, and chat about it

Some images can be a pain in the neck sometimes. They might be easy to understand for some people, but for others, they’re like reading a text in another language.

Imagine you have to look at the image below to understand the parking rules and avoid getting a ticket.

That’s too much information for a human to process quickly, but with ChatGPT vision, you only need to take a photo with your phone and share it with ChatGPT to know whether now is a good time for parking.

You can also upload pictures from the internet and figure out what they’re about. Very useful when you don’t get the message of the story or joke in a meme.

Passing images to ChatGPT has more powerful use cases. Let’s see some of them.

Learning (and cheating) is easier than ever now

Visual learners will definitely benefit from ChatGPT vision. If you love learning by reading or seeing pictures, now you can upload your material from school or university to ChatGPT and ask questions.

Here’s how ChatGPT vision would help a biology student understand an image of a human cell.

The way it breaks down the labels in the diagram is amazing! Now you don’t need to google each label one by one to get a definition but ChatGPT gives the definition of all the labels in seconds.

But with easy learning also comes easy cheating.

Now solving questions and exercises from an exam or homework is as simple as taking a photo of the exam, sharing it to ChatGPT, and letting AI answer all the questions for you. No need to type the questions anymore!

I think many of you wish you were in school now.

Generate apps from a sketchpad

One of the most powerful things you can do with ChatGPT vision (and my favorite) is create apps. If you watched the introduction of GPT-4, you might remember how they could build a joke website from a napkin sketch.

That was impressive, but what I’m about to show you is even more impressive. Imagine you’re a developer and after hours of having a session with your team, you take a photo of everything planned in the whiteboarding. Now you can give that photo to ChatGPT to create the app for you.

ChatGPT is able to successfully understand things in the sketch like the arrows between the email and name blocks, the two branches from the age block, and the cross over the sorry screen block.

Once it understands the image, it generates the code for you. Then with some tweaks, you have a fully working web app!

The sky is the limit

As I always say when talking about this type of tech — the sky is the limit.

  • You can share the image of a chart and get a quick analysis.
  • You can share the image of a malfunctioning artifact and get instructions on how to fix it.
  • You can share the image of your favorite dish and get the recipe.

Just remember that this new feature isn’t perfect, has some limitations, and might hallucinate now and then.

In the examples below, you’ll see how GPT-4 vision confuses the Jack for a Queen in the first picture, mistakes the axe for a wrench in the second picture, and isn’t able to quickly notice that the woman in the last picture has three legs! That last one is very funny since it’s something that would quickly draw our attention.

That’s all the use cases I found on Twitter. As soon as I have the chance to play with this new feature, I’ll share with you other things we can do with it.

Join my newsletter with 30K+ people to get my free ChatGPT cheat sheet.

If you enjoy reading stories like these and want to support me as a writer, subscribe to my Substack. On Substack, I publish articles that you won’t find on the other platforms where I create content.

ChatGPT
Technology
Artificial Intelligence
Science
Python
Recommended from ReadMedium