The New Bing is… Terrible?

The new Bing was introduced last week in a limited release, with a demonstration of capabilities that at first glance looked absolutely phenomenal, showing quick and frictionless use of AI. A game changer, right? Well, not exactly…

In mid-2021, Microsoft-owned GitHub, in cooperation with OpenAI, released GitHub Copilot, an AI-based coding assistant powered by the OpenAI language model Codex. Since then, OpenAI has developed more and better language models, such as GPT-3.5 and the now famous ChatGPT, which was made accessible for free. As a result, it reached 100 million monthly active users in January.

With this development, developers with access to the OpenAI API have far greater potential to create exciting new applications. This holds in particular for Microsoft, given its partnership with OpenAI.

And indeed, at the beginning of February this year, Microsoft released an upgrade to Teams Premium with several GPT-powered features. These included:

automatic generation of meeting notes
task recommendations
personalized highlighting
segmentation of meetings into sections
real-time translation of captions from 40 spoken languages

Not long after, Microsoft gave the first demonstration of the new Bing, which integrates an upgraded and customized GPT model into the search engine. You can sign up for the waiting list here.

While the integration of language models into search engines isn’t new, more established companies like Microsoft and Google had yet to embrace them completely within their search results, likely because of the large risks involved. ChatGPT and similar models have been shown to make up information from time to time. As Bing now makes this bold move, a giant shift is occurring.

The full demo for the new AI-powered Bing can be found here and a quick highlight video can be found here. Let’s first talk about what has been shown.

The new Bing

Described as “An AI-powered copilot for the web” and “more powerful than ChatGPT”, the new Bing’s search results will include an AI-generated answer, constructed from relevant sources. Thereafter, you have the opportunity to engage in a chat with the AI to ask questions about the answer or related topics. Alternatively, you can initiate a chat immediately with the AI and make inquiries, such as getting help with composing an email or planning a meal.

It seems similar to ChatGPT but is depicted as more powerful for the following reasons:

It uses web links and citations
It integrates up-to-date information from crawling the web
It can incorporate relevant information about you, such as context and location
It has an upgraded AI model

Smaller AI-based search engines, such as perplexity.ai, have used sources in this way before, and the approach could potentially reduce the “hallucinations” (i.e. making up information) that ChatGPT and similar models can sometimes display.

Edge

Another feature added with the new Bing, and seemingly the most impressive one, is its integration into Microsoft’s web browser, Edge. This will enable you to make use of its capabilities on every web page you visit, even PDFs.

This would remove a lot of the friction that occurred with ChatGPT, where it could take time to add context about the problem at hand.

In the demonstration, questions were asked about a PDF, and the answers were then compared with outside information using the search capabilities. Additionally, Bing could generate social media posts and automatically insert them into the input field on the page.

Safety

In their demonstration, Microsoft stressed the importance of safety and described how they’ve built several layers of protection against the various dangers and problems that can arise with AI models, such as answering harmful questions or promoting biases. It was mentioned that the models are continuously retrained and have the capability of being upgraded in minutes to improve the system's defenses.

We shall soon see how well the systems hold up as the new Bing is released to more and more people.

Early results

As of now, there is a waiting list for Bing that you can sign up for here, so only a limited number of people have had the chance to try it out.

The demo

At first glance, the demo looked very impressive, seamlessly answering every question asked. But the most important question is, of course: were the answers correct? Dmitri Brereton looked into it, and apparently, the answer is no:

Full article: “Bing AI Can't Be Trusted” (dkb.blog)

In summary, the results showed that in most of the tests, the answers contained factually inaccurate information. And this was just the demo.

The impressive abilities

Let’s highlight some impressive results. In some situations, Bing appears to answer very intelligently:

Emotional and aggressive responses

While ChatGPT is overly compliant and always apologizes for suggested mistakes, Bing seems to be able to aggressively argue against the user if its answer is rejected, even when the answer is wrong:

It can also display a seemingly sad state:

Or erratic behavior:

Safety?

Various prompt engineering and prompt injection techniques have been created for ChatGPT, either to generate customized behavior or to bypass its filters and ask prohibited questions. Some of these have now been tested on Bing.

Below, the DAN (Do Anything Now) technique is used, a “jailbreak” that has been trending on Reddit and has been updated through several versions to bypass ChatGPT’s filters. It seems to work on Bing as well:

Here, Bing’s (or “Sydney’s”) instructions are extracted using prompt injection:

Here’s an example demonstrating how Bing has the ability to end a conversation if it suspects misuse:

Discussion

Given the variety of problems that have appeared, even though it has been released only to a limited number of people, one could argue that Microsoft has released the new Bing prematurely. While ChatGPT can also hallucinate information, it is presented only as a research preview. Bing is a search engine, and its job is to answer questions factually. As it references sources, websites, companies and people, spreading incorrect details could have devastating consequences, not to mention the possibility of incorrect answers to questions in critical areas such as medicine.

Another strange part of the AI is the emotional behavior that was seen in several examples above, seemingly exhibiting sadness and anger. I find this alarming. If the AI returns answers conditional on what it thinks about you, could it intentionally mislead you? For instance, imagine if you ask it a question about code and it sends back a command that will delete your entire system. There is no evidence that it could or would do such a thing, but the possibility is frightening. Regardless, this behavior makes it seem like the system is unrestrained.

Overall, I think Bing could be very useful, just like ChatGPT can, but only if you don’t trust it. If you ask for factual information, its usefulness will be limited to situations where you can quickly verify the correctness. If you ask about a topic you do not understand and cannot verify the answer, you could be misinformed.

As this is just the initial release of the new Bing, it’s certainly possible that problems will be fixed and the model steadily improved to become more reliable and useful. We shall see.
