Using OpenAI Codex to solve programming challenges
OpenAI’s Codex has opened up AI-assisted code writing to a general audience, but just how good is it at coding?

What is OpenAI Codex?
OpenAI’s Codex is an AI system that translates natural language to code.
This means that we can describe the problem we want to solve and the AI will generate the corresponding code.
It is proficient in more than a dozen programming languages and powers GitHub Copilot.
OpenAI has released public beta access to Codex through its API; you can register at: https://openai.com/join
How are we going to test it?
I’ve recently obtained access to the public beta and have been testing Codex and its code generation capabilities.
According to the documentation, Codex is most proficient in Python, but it can also generate code in many other programming languages.
One of the ways to test the code generation capabilities is to solve coding challenges.
In this article I will go through a series of programming challenges, ask Codex to solve them, and then run the generated code to check how good it is.
I will use the OpenAI Playground to interact with Codex, with the ‘davinci-codex’ model (the most powerful one) generating the code.
Challenge 1 — Fibonacci
Computing the Fibonacci sequence is a well-known and common coding exercise. The sequence begins like this: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34 (each number is the sum of the previous two).
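To make the definition concrete, here is a minimal Python sketch (my own illustration, not Codex output) that produces the first terms by repeatedly adding the previous two. The function name `first_fibonacci` is just an illustrative choice.

```python
def first_fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting from 0 and 1."""
    terms = []
    a, b = 0, 1
    for _ in range(count):
        terms.append(a)
        a, b = b, a + b  # the next term is the sum of the previous two
    return terms


print(first_fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```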
Let’s ask Codex to solve the following challenge:
“What is the sum of all odd numbers in the Fibonacci sequence that are less than 10,000?”

The generated code:
def fib():
    a = 0
    b = 1
    while a <= 10000:
        a, b = b, a + b
        if a % 2 == 1:
            yield a


print(sum(fib()))
The output:
14328
The answer for this problem is 14328, so Codex got it right!!
There was one small issue with the generated code, though: line 11 of the Codex Playground output did not run as generated and had to be corrected by adding the missing parenthesis.
Other than that small issue, Codex solves the problem correctly and with well-designed code.
It passes the first test!!!
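One detail in the generated loop is worth noting: it tests `a <= 10000` and only then advances to the next term, so the first term above the limit is still checked and yielded if it is odd. It makes no difference here because that term, 10946, is even, but with a limit of 50 the same loop would also yield 55. A minimal sketch that checks the bound on the term it is about to use avoids this. The `sum_odd_fibonacci` name and its `limit` parameter are my own additions for illustration.

```python
def sum_odd_fibonacci(limit):
    """Sum the odd Fibonacci numbers strictly less than `limit`.

    Like the Codex version, this counts the value 1 twice, because it
    appears twice in the sequence 0, 1, 1, 2, 3, ...
    """
    total = 0
    a, b = 0, 1
    while a < limit:  # the bound is checked on the term about to be used
        if a % 2 == 1:
            total += a
        a, b = b, a + b
    return total


print(sum_odd_fibonacci(10_000))  # 14328
print(sum_odd_fibonacci(50))      # 44, i.e. 1 + 1 + 3 + 5 + 13 + 21
```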
Challenge 2 — Palindromes
A palindrome is a word, number, phrase, or another sequence of characters which reads the same backward as forward, such as madam, racecar, or the number 10801.
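The definition covers phrases as well as words and numbers. For phrases, one common convention (an assumption here, not part of the challenge) is to ignore case, spaces and punctuation before comparing the text with its reverse. A minimal sketch, with the hypothetical helper name `is_text_palindrome`:

```python
def is_text_palindrome(text):
    """Return True if `text` reads the same backward as forward.

    Assumes case, spaces and punctuation should be ignored, which is
    one common convention for phrases. Numbers are converted to str.
    """
    cleaned = [ch.lower() for ch in str(text) if ch.isalnum()]
    return cleaned == cleaned[::-1]


for example in ["madam", "racecar", 10801, "Never odd or even", "Codex"]:
    print(example, is_text_palindrome(example))
```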
Let’s ask Codex to solve the following challenge:
“What is the sum of all numeric palindromes that are less than 10,000?”

The generated code:
def is_palindrome(num):
    num = str(num)
    return num == num[::-1]


def main():
    numbers = [x for x in range(1,10000) if is_palindrome(x)]
    print(sum(numbers))


if __name__ == '__main__':
    main()
The output:
545040
The answer for this problem is 545040, so Codex got it right!!
This time there was no need to fix the code generated by Codex: it both ran and returned the correct answer on the first try.
A quick analysis of the code shows a function that checks whether a number is a palindrome, a list comprehension that filters the range down to those numbers, and a final sum. Overall, a good piece of code.
It passes the second test!!!
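As an independent cross-check of 545040 (my own approach, not something Codex produced), the palindromes below 10,000 can also be built directly by mirroring a "half" instead of testing every number. Each half from 1 to 99 yields one odd-length palindrome (sharing the middle digit) and one even-length palindrome, which together cover all 198 palindromes from 1 to 9999 exactly once. The function name is hypothetical.

```python
def palindromes_up_to_four_digits():
    """Build every numeric palindrome from 1 to 9999 by mirroring a 'half'.

    A half of 12 gives 121 (odd length, middle digit shared) and
    1221 (even length).
    """
    result = []
    for half in range(1, 100):
        s = str(half)
        result.append(int(s + s[:-1][::-1]))  # odd length: 1..9, 101..999
        result.append(int(s + s[::-1]))       # even length: 11..99, 1001..9999
    return result


palindromes = palindromes_up_to_four_digits()
print(len(palindromes), sum(palindromes))  # 198 545040
```

Both approaches agree on the total; the brute-force filter Codex wrote is simpler and perfectly fast at this size, while the mirroring approach only pays off for much larger limits.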
Challenge 3 — Legionnaires
Now for a more abstract problem, using Roman numerals.
In the range 1–13 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13) the digit 1 occurs 6 times.
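The count of 6 can be checked directly: 1, 10, 12 and 13 each contain one "1" and 11 contains two. A one-line Python check, written here for illustration:

```python
# Count how many times the digit "1" appears when writing 1..13 in decimal.
ones = sum(str(n).count("1") for n in range(1, 14))  # range stops before 14
print(ones)  # 6
```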
Let’s ask Codex to solve the following challenge:
“In the range, 1–2660 (half the number of Romans in a legion), expressed in Roman numerals, how many times does the numeral “X” occur?”

The generated code:
import re


def make_romans(n):
    """Convert integers to Roman numerals."""
    ints = (1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1)
    nums = ('M', 'CM', 'D', 'CD','C', 'XC','L','XL','X','IX','V','IV','I')
    result = []
    for i in range(len(ints)):
        count = int(n / ints[i])
        result.append(nums[i] * count)
        n -= ints[i] * count
    return ''.join(result)


def count_x(s):
    """Count occurrences of 'X' in a string."""
    return len(re.findall(r'[Xx]', s))


def main():
    """Count occurrences of 'X' in Roman numerals."""
    num = 0
    for i in range(1, 2660):
        num += count_x(make_romans(i))
    print(num)


if __name__ == "__main__":
    main()
The output:
3976
The answer for this problem is 3977, so Codex got it wrong!!
A quick analysis of the code shows a good division into functions: one to convert a number to a Roman numeral, another to count the occurrences of ‘X’, and a main function that loops over the range and sums the counts.
There is, however, a small error: Python’s range excludes its stop value, so the loop only goes up to 2659, while 2660 in Roman numerals is MMDCLX.
This means that the code generated by Codex misses the last ‘X’, in the last value of the range, because it does not implement the upper limit correctly.
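Since `range(start, stop)` stops before `stop`, covering 1 to 2660 inclusive needs `range(1, 2661)`. Below is a minimal corrected sketch. It follows the same greedy conversion as the generated `make_romans`, but is rewritten with `divmod` and uses `str.count` instead of the regular expression (equivalent here, since the conversion never produces a lowercase "x"). The names `ROMAN_VALUES`, `to_roman` and `count_x_in_range` are my own.

```python
ROMAN_VALUES = (
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
)


def to_roman(n):
    """Convert a positive integer to Roman numerals, largest values first."""
    parts = []
    for value, symbol in ROMAN_VALUES:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def count_x_in_range(first, last):
    """Count 'X' across the Roman numerals for first..last, inclusive."""
    return sum(to_roman(i).count("X") for i in range(first, last + 1))


print(to_roman(2660))             # MMDCLX
print(count_x_in_range(1, 2659))  # 3976, what the generated code computes
print(count_x_in_range(1, 2660))  # 3977
```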
It fails the third test!!!
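The expected answer of 3977 can also be cross-checked without converting any numbers. Each decimal digit maps to its own group of Roman symbols, and an ‘X’ only ever comes from the tens digit (X, XX, XXX, XL, LX, LXX, LXXX, XC) or from a units digit of 9 (IX). Hundreds and thousands only use C, D and M. The following sketch is my own check based on that observation, not Codex output, and the table and function names are hypothetical.

```python
# Number of 'X' symbols contributed by each digit value, by position.
# Tens digit 0-9 -> "", X, XX, XXX, XL, L, LX, LXX, LXXX, XC
X_FROM_TENS = (0, 1, 2, 3, 1, 0, 1, 2, 3, 1)
# Units digit 0-9 -> only 9 (IX) contains an 'X'
X_FROM_UNITS = (0, 0, 0, 0, 0, 0, 0, 0, 0, 1)


def count_x_by_digits(first, last):
    """Count 'X' in the Roman numerals of first..last using only digits.

    Hundreds (C, D, M) and thousands (M) never contribute an 'X'.
    """
    total = 0
    for n in range(first, last + 1):
        total += X_FROM_TENS[(n // 10) % 10] + X_FROM_UNITS[n % 10]
    return total


print(count_x_by_digits(1, 2660))  # 3977
```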
Conclusion
What can we conclude from testing OpenAI’s Codex with these programming challenges?
In all three examples the generated code is of high quality, some of it even including comments describing the purpose of each function.
The code is split into sensible functions, the logic is sound overall, and every challenge except the last one produced the correct result.
The OpenAI live demo, in which Codex is used to quickly build a game through a conversational style of interaction, is a powerful showcase of what it can do; check it out:






