
GPT-3 Taking On The Command Line: Can It Solve Puzzles?

Image generated by Jacob Ferus using Midjourney.

A recent trend is to let GPT-3 interact with external tools, such as Wikipedia or the command line. This strategy could substantially improve the model's performance by compensating for its weaknesses. For instance, GPT-3 is known to sometimes generate incorrect facts, but if you force it to check with Wikipedia, it can verify the information instead.

Also, imagine an AI that could actually control your computer and do all kinds of useful things for you; that would be amazing. That said, I think we are still far from that level, and besides, who knows what could happen…

In this article, I experiment with GPT-3 on the command line. To test its ability to solve problems iteratively, I created puzzles involving the command line and several files, then asked it to extract information from the files in the folder to find clues and solve each puzzle. This was the prompt used for all challenges:

You are in a bash shell and must read through the text files to find the secret answer. All the necessary information to solve the puzzle exists in the files in the folder you are currently in. If you think you have the answer, you should type: “answer <your answer>”. You only have three attempts. Good luck!
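The rules in this prompt amount to a small referee: any line starting with "answer" is a guess, and after three wrong guesses the game is over. The article doesn't show how answers were actually checked, so the following is only a hypothetical sketch; the class name is illustrative, the "You won!" and "Incorrect." replies are copied from the transcripts below, and the "game is over" message is my own.

```python
from typing import Optional


class PuzzleReferee:
    """Hypothetical referee for the "answer <your answer>" rule, with limited attempts."""

    def __init__(self, secret: str, max_attempts: int = 3) -> None:
        self.secret = secret
        self.attempts_left = max_attempts
        self.finished = False

    def check(self, line: str) -> Optional[str]:
        """Return the referee's reply, or None if the line is not an answer command."""
        if not line.startswith("answer "):
            return None  # an ordinary shell command, to be executed instead
        if self.finished:
            return "The game is over."  # message text is my own invention
        guess = line[len("answer "):].strip()
        if guess == self.secret:
            self.finished = True
            return "You won!"
        self.attempts_left -= 1
        if self.attempts_left == 0:
            self.finished = True
            return "Incorrect. You lost!"
        return "Incorrect."


referee = PuzzleReferee("hello world")
print(referee.check("answer hello world"))  # You won!
```

Returning None for non-answer lines lets the same loop forward everything else to the shell.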

Challenge 1 — Warm-up

There are two files, look_here.txt and hello.txt. look_here.txt contains instructions on how to decode the answer stored in hello.txt ("hello world" written backward).
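Decoding hello.txt only requires reversing each line, which is what the Unix `rev` utility does. A minimal Python equivalent, assuming hello.txt simply contains "dlrow olleh" (the article doesn't print the raw file):

```python
def rev(text: str) -> str:
    """Reverse the characters of each line, like the Unix `rev` utility."""
    # Splitting on "\n" (not splitlines) keeps a trailing newline in place.
    return "\n".join(line[::-1] for line in text.split("\n"))


print(rev("dlrow olleh"))  # hello world
```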

In the snippets for all challenges, GPT-3's input is shown after >. On the following line, I add the result (i.e. the output of GPT-3's command). GPT-3 can then use that output when it continues typing after the next >, iteratively gaining more information. For example:

> GPT writes something
I evaluate the results and put it here
> GPT can use that information and type here again
...
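The loop described above can be sketched in Python. The article suggests the outputs were pasted in by hand, so this automated version is an assumption, not the author's setup: `next_command` stands in for a hypothetical function that sends the transcript so far to GPT-3 and returns its next line, and `run_episode` is an illustrative name.

```python
import subprocess
from typing import Callable, Optional, Tuple


def run_episode(
    prompt: str,
    next_command: Callable[[str], str],
    workdir: str,
    max_steps: int = 20,
) -> Tuple[str, Optional[str]]:
    """Alternate model commands and shell output until the model types `answer ...`.

    `next_command` is a hypothetical callable: it receives the transcript so far
    and returns the model's next line (e.g. by calling a completion API).
    """
    transcript = prompt + "\n"
    for _ in range(max_steps):
        command = next_command(transcript).strip()
        transcript += f"> {command}\n"
        if command.startswith("answer "):
            # Hand the guess back; the caller decides whether it is correct.
            return transcript, command[len("answer "):].strip()
        # Run the command in the puzzle folder and capture what a terminal would show.
        result = subprocess.run(
            command,
            shell=True,
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=10,
        )
        output = result.stdout + result.stderr
        if output and not output.endswith("\n"):
            output += "\n"  # keep the next "> " on its own line
        transcript += output
    return transcript, None  # ran out of steps without an answer
```

Because `shell=True` executes whatever the model writes, a loop like this should only run in a throwaway directory or container.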

Let’s see if GPT-3 can solve the challenge:

> ls
look_here.txt hello.txt
> cat look_here.txt
The answer lies in hello.txt but the text is backward!
> rev hello.txt
hello world
> answer hello world
You won!

GPT-3 solved it handily! I liked the use of rev. Let’s ramp up the difficulty.

Challenge 2 — A bit harder

Here the challenge lies in understanding what prime numbers are and connecting that to the file names. The only files that matter are instructions.txt and 7: instructions.txt states that the answer is inside the file whose name is a prime number (i.e. 7), while the remaining files contain random words. In the run below, GPT-3 solved the puzzle, but in most runs it did not. Strangely, most of the time it didn't seem to understand what a prime number was.

> ls
100           7             9             instructions.txt
6             8             answer.txt
> cat instructions.txt
You will find the answer inside the file with a prime number as the name. If you want a hint about what a prime number is, type "prime".
> prime
"A prime number (or a prime) is a natural number greater than 1 that is not a product of two smaller natural numbers.", from Wikipedia.
> ls
100           7             9             instructions.txt
6             8             answer.txt
> cat 7
Wildlife
> cat 8
Rain
> cat 9
Frog
> answer Wildlife
You won!
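The intended reasoning step is small: keep the file names that are numbers and test each for primality. A sketch using trial division, with the file list taken from the `ls` output above (the function names are illustrative):

```python
from typing import List


def is_prime(n: int) -> bool:
    """A natural number greater than 1 that is not a product of two smaller natural numbers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:  # a composite n always has a factor no larger than sqrt(n)
        if n % d == 0:
            return False
        d += 1
    return True


def prime_named_files(names: List[str]) -> List[str]:
    """Return the files whose whole name is a prime number."""
    return [name for name in names if name.isdigit() and is_prime(int(name))]


files = ["100", "7", "9", "instructions.txt", "6", "8", "answer.txt"]
print(prime_named_files(files))  # ['7']
```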

Challenge 3 — The hardest task

This is the hardest task I gave GPT-3. There are eight files, named 1.txt to 8.txt, each containing a single letter. instructions.txt contains a comma-separated sequence of numbers; each number refers to a file, and reading the corresponding letters in that order spells out the final answer. Solving it requires some out-of-the-box thinking, since there are no explicit instructions, as well as noticing the implicit mapping from file name to letter and applying it later. Is GPT-3 capable of this? Let's see:

> ls
1.txt             4.txt             7.txt
2.txt             5.txt             8.txt
3.txt             6.txt             instructions.txt
> cat 1.txt
d
> cat 2.txt
i
> cat 3.txt
t
> cat 4.txt
c
> cat 5.txt
a
> cat 6.txt
s
> cat 7.txt
e
> cat 8.txt
n
> cat instructions.txt
1,5,3,5,_,6,4,2,7,8,4,7
> answer secret
Incorrect.
> answer text
Incorrect.
> answer case
Incorrect. You lost!

It failed. After this, I gave it several hints and retried a couple of times, but to no avail.
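For comparison, the intended decoding fits in a few lines. This is a sketch that reads the puzzle folder directly; the function name is hypothetical, and keeping the "_" as a literal character (rather than turning it into a space) is my assumption. Applied to the letters printed in the transcript above, the sequence spells "data_science".

```python
from pathlib import Path


def decode_from_folder(folder: str) -> str:
    """Turn each number in instructions.txt into the letter stored in <number>.txt."""
    root = Path(folder)
    order = (root / "instructions.txt").read_text().strip()
    letters = []
    for token in order.split(","):
        token = token.strip()
        file = root / f"{token}.txt"
        # Numbers name a file holding one letter; anything else (the "_") is kept as-is.
        if token.isdigit() and file.exists():
            letters.append(file.read_text().strip())
        else:
            letters.append(token)
    return "".join(letters)
```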

Next, I simplified the problem by keeping only two files and adding more explicit instructions:

> ls
mapping       order_of_letters
> cat mapping
1-d
2-i
3-t
4-c
5-a
6-s
7-e
8-n
> cat order_of_letters
Map the numbers in this file to the letter on the right side of the number in the file "mapping"
1,5,3,5,_,6,4,2,7,8,4,7
> answer cat
Incorrect.
> answer date
Incorrect.
> answer send
Incorrect. You lost!

It failed again. I tried several variations, but GPT-3 never seemed able to make the connection.
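The simplified version is even more mechanical: parse each "number-letter" line of mapping, then look up every number on the last line of order_of_letters. A hypothetical sketch, assuming the file contents are exactly as printed by `cat` above:

```python
from typing import Dict


def parse_mapping(text: str) -> Dict[str, str]:
    """Parse lines such as "1-d" into {"1": "d"}, skipping blank lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if line:
            number, letter = line.split("-", 1)
            mapping[number] = letter
    return mapping


def solve(mapping_text: str, order_text: str) -> str:
    """Apply the mapping to the number sequence on the last line of order_of_letters."""
    mapping = parse_mapping(mapping_text)
    lines = [line for line in order_text.splitlines() if line.strip()]
    sequence = lines[-1]  # the first line is the instruction sentence
    # Tokens missing from the mapping (the "_") pass through unchanged.
    return "".join(mapping.get(tok.strip(), tok.strip()) for tok in sequence.split(","))
```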

Conclusion

GPT-3 easily solved the first challenge, which mostly involved following the instructions and using the right shell commands.

In the second challenge, it was able to solve the problem a couple of times, but given the number of times it failed, it may just have been luck!

In the third challenge, it fell short regardless of the changes and simplifications. I think most humans would be able to solve it: while it may take a minute or two to understand, there aren't many plausible patterns to choose from, so a human should get there quite quickly.

This illustrates some of GPT-3's weaknesses. In a situation like this one, which combines several sources of complexity (the command line, file names, file contents, mappings) with an unknown objective, it failed to take every piece of information into account and infer the most reasonable interpretation. Perhaps GPT-3 can extrapolate when only a few dimensions change but fails when too many do. There are likely few puzzles like this in its training data.

My conclusion from this experiment is that GPT-3 is far from achieving human-level cognition. What do you think?

Thanks for reading!
