Tristan Wolff

What Is Visual ChatGPT?

A sneak peek at multimodality

Visual ChatGPT is not a new model. Instead, it uses existing vision and language foundation models and merges them into a ChatGPT-like interface.

https://github.com/microsoft/visual-chatgpt

Visual ChatGPT allows us to use text prompts to control so-called Visual Foundation Models (existing models like Stable Diffusion, ControlNet, Pix2Pix, and others) that are included in the Visual ChatGPT framework.

This gives ChatGPT capabilities it lacks on its own, for example:

  • understanding images and describing them (via the BLIP foundation model)
  • generating images from text (via the Stable Diffusion foundation model)

https://arxiv.org/abs/2303.04671
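The routing idea behind this can be sketched in a few lines of Python. This is not Visual ChatGPT's code: the `Tool` class, the `route` and `tool_prompt` helpers, and the keyword matching are hypothetical simplifications, and both tools are stubs that only return placeholder strings. In the real system, ChatGPT itself reads descriptions of the available visual models (the paper describes injecting this information into ChatGPT's prompts) and decides which model to call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """Hypothetical wrapper around one visual foundation model."""
    name: str
    description: str         # text a language model could read to choose this tool
    keywords: List[str]      # crude stand-in for the language model's decision
    run: Callable[[str], str]


def caption_image(path: str) -> str:
    # Stub: a real tool would run an image-captioning model such as BLIP here.
    return f"[caption of {path}]"


def generate_image(prompt: str) -> str:
    # Stub: a real tool would call a text-to-image model such as Stable Diffusion
    # and return the path of the saved image.
    return f"[image generated from '{prompt}']"


TOOLS: Dict[str, Tool] = {
    "describe": Tool("describe", "Describe what is in an image.",
                     ["describe", "what is in"], caption_image),
    "generate": Tool("generate", "Create an image from a text prompt.",
                     ["draw", "generate", "create"], generate_image),
}


def tool_prompt() -> str:
    """Build the kind of tool list a system could inject into an LLM prompt."""
    return "\n".join(f"{t.name}: {t.description}" for t in TOOLS.values())


def route(request: str, argument: str) -> str:
    """Pick the first tool whose keyword appears in the request and run it."""
    text = request.lower()
    for tool in TOOLS.values():
        if any(k in text for k in tool.keywords):
            return f"{tool.name}: {tool.run(argument)}"
    return "chat: no visual tool needed"  # fall back to ordinary conversation


print(route("Please describe this picture", "cat.png"))
# describe: [caption of cat.png]
print(route("Draw a red bicycle", "a red bicycle"))
# generate: [image generated from 'a red bicycle']
```

Keyword matching keeps the sketch self-contained; the point of Visual ChatGPT is that a language model makes this choice instead, which also lets it chain several models in one answer.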

Here are some examples of experiments conducted by the Visual ChatGPT research team. As you can see, not everything works as expected, but a ChatGPT-like interface with access to visual foundation models offers a sneak peek at the coming shift toward multimodal AI.

https://arxiv.org/abs/2303.04671

How to use Visual ChatGPT

If you have about 50 GB of free disk space, you can clone the GitHub repository linked below (Mac users will need this workaround: https://github.com/microsoft/visual-chatgpt/issues/37). Otherwise, you can use the online demo or the Google Colab notebook linked below.
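Before cloning, you can check whether your drive has roughly that much room. This is a minimal sketch using Python's standard `shutil.disk_usage`; the 50 GB threshold is this article's estimate rather than a figure from the repository's documentation, and the `has_enough_space` helper is hypothetical. It treats 1 GB as 1024³ bytes.

```python
import shutil

REQUIRED_GB = 50  # rough estimate from this article, not an official requirement


def has_enough_space(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the drive holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    usage = shutil.disk_usage(".")
    print(f"Free space: {usage.free / 1024**3:.1f} GB")
    print("Enough space to clone:", has_enough_space("."))
```

Run it from the directory you plan to clone into, since free space is reported per drive.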

Link to GitHub repo: https://github.com/microsoft/visual-chatgpt/

Link to Hugging Face demo: https://huggingface.co/spaces/RamAnanth1/visual-chatGPT

Link to Google Colab: https://colab.research.google.com/drive/11BtP3h-w0dZjA-X8JsS9_eo8OeGYvxXB

Link to original paper: https://arxiv.org/abs/2303.04671
