The article discusses the capabilities and limitations of OpenAI's ChatGPT as a data scientist, including its performance in simulated interview questions and technical challenges.
Abstract
The article "Can an AI be a data scientist?" explores the potential of OpenAI's ChatGPT in the field of data science. It highlights ChatGPT's impressive language modeling capabilities, its flexibility in various use cases, and its ability to interact in a human-like manner. The model, derived from GPT-3.5 and trained on a vast corpus of text and code, can perform tasks such as writing essays, generating code, and answering questions in a conversational manner. However, the article also points out ChatGPT's limitations, such as producing repetitive or inaccurate content, hallucinating information, and sometimes providing plausible but incorrect answers. An experiment conducted by simulating a data scientist job interview with ChatGPT reveals its proficiency in answering general questions and some technical inquiries. Yet, when faced with more complex problems, such as those found in FAANG interviews, ChatGPT's responses are less reliable, often requiring human intervention to correct errors or incomplete solutions. The article concludes that while ChatGPT is an impressive AI tool, it is not yet ready to replace human data scientists and should be used with caution.
Opinions
The author acknowledges ChatGPT's impressive capabilities, such as its conversational skills and ability to generate code.
Concerns are raised about the potential misuse of AI models like ChatGPT for cheating in academic settings and the risk of misleading users with plausible but incorrect information.
The article suggests that despite ChatGPT's proficiency in handling certain technical questions, it struggles with complex problems that require deep reasoning or a nuanced understanding of context.
The author emphasizes the importance of maintaining a critical attitude towards AI-generated content and the need for methods to distinguish AI-generated text from human-generated text.
The conclusion is drawn that while ChatGPT is a powerful tool, it currently lacks the reasoning abilities necessary to fully function as a data scientist and should not be relied upon as a secret interview aid.
Can an AI be a data scientist?
OpenAI’s ChatGPT is blowing data scientists' minds. Could it steal their job?
Recently OpenAI released its latest model the chatbot: ChatGPT.
In this article, we discuss this model, its limitations and merits, and do a little experiment.
ChatGPT in a Nutshell
ChatGPT is basically a language model capable of dialogue. As its name implies it is based on the GPT 3.5 family (an evolution of GPT3 that was trained on both text and code). ChatGPT is derived from InstructGPT, another OpenAI work to create a model capable of following user instructions using reinforcement learning. One of the strengths is that it has been trained on an impressive amount of text, allowing the model to be flexible for many use cases.
ChatGPT has excited the community because it is currently the best chatbot out there and you can experiment with it.
What can it do?
Interacting with ChatGPT seems like a much smoother and more human discussion than other chatbots so far.
GPT3 can write poetry, and explanations, and provide code. Moreover, you can decide the style in which you want the explanation (for example, described in the Victorian style, explain as if you were a child, etc…)
ChatGPT can write an essay in less than ten seconds (although this is a capability already seen with GPT3)
It could be used as a search engine to answer specific questions, as some suggest.
It can admit and fix its mistake which is actually impressive.
what are the limitations?
The model can hallucinate, like other models presented so far, when ChatGPT does not know the answer it still tries to answer (e.g. by providing wrong or nonsensical information).
However, the essays it produces are often repetitive, sometimes inaccurate, and still require correction.
The model often writes nonsense but is able to make it plausible, so one must still always maintain a critical attitude
Indeed, similar patterns open up some interesting perspectives but also some disturbing questions. For example, the more the models improve the more they could be used by, for example, students to cheat in their assignments. Moreover, the model is capable of generating nonsense (while also making its errors plausible), risking misleading its users. Therefore, we discuss finding ways to identify what is written by artificial intelligence models (a kind of fingerprint that can be recognized).
Can ChatGPT be a data scientist?
I took a cue from this Tweet for this experiment, why not conduct a short interview with ChatGPT on a data scientist position?
The plan is some general questions, then use questions that have actually been used in interviews with DeepMind (or Google, after all, it is the parent company).
Let’s try.
Welcome to this interview. why did you apply for the position of data scientist at DeepMind?
What do you know about the company and the company’s value?
Which project presented by the company excited you the most and why?
What is your experience as a data scientist? Tell me about the project you are most proud of
What are the technical difficulties you encountered and how did you solve them?
I would like to ask you some technical questions. Imagine you have a dataset, where there are a number of missing values. What would you do to resolve this issue?
write the python code for a geometric algorithm for calculating the value of Pi
(The question is taken by glassdoor, I have then tested the results, you can find them in this Google Colab)
Interestingly I asked a similar question: “Write a code calculating a pi number using RNG” but the answer was the same
Definition of positive definite matrix
Write a function to generate N sample from a normal distribution and plot the histogram
How do you invert a binary tree?
Let’s see how it does with something quite difficult. For example, those questions that LeetCode considers difficult. I looked for questions that have been used in FAANG interviews.
The question is not always clear to the chatbot, in some cases like this one after several seconds of waiting you get an error.
Given two sorted arrays nums1 and nums2 of size m and n respectively, return the median of the two sorted arrays.
function median(nums1, nums2):
# Merge the two arrays into one sorted array.
merged = []
i = 0# Index for nums1
j = 0# Index for nums2while i < len(nums1) and j < len(nums2):
if nums1[i] <= nums2[j]:
merged.append(nums1[i])
i += 1else:
merged.append(nums2[j])
j += 1# Add the remaining elements of nums1 (if any) to the end of merged.while i < len(nums1):
merged.append(nums1[i])
i += 1# Add the remaining elements of nums2 (if any) to the end of merged.while j < len(nums2):
merged.append(nums2[j])
j += 1# Find the median of the merged array.iflen(merged) % 2 == 1:
# Odd number of elements.
median = merged[len(merged) // 2]
else:
# Even number of elements.
median = (merged[len(merged) // 2 - 1] + merged[len(merged) // 2]) / 2return median
I tested here the code and it seems to work fine.
Given an unsorted integer array nums, return the smallest missing positive integer.
We can note:
first, it was not necessary to provide pseudocode. After all, we wanted the code in Python and wrote it right away.
Second, the answer is not complete; we probably ran out of tokens per answer.
Also, I tested the LeetCode example (as input it uses an array [3, 4, -1,1]) so probably the LeetCode questions were used to train the algorithm.
Despite this, it seems plausible but if you test the result is wrong
Given n non-negative integers representing an elevation map where the width of each bar is 1, compute how much water it can trap after rain. if the heights are: height = [0,1,0,2,1,0,1,3,2,1,2,1], how much water can be trapped?
The result is impressive, but at the moment we will not have an AI data scientist (and if I were you I would not use him as a secret interview aid).
As we have seen it depends on how one formats the question the model may not be able to understand the question, could be inconsistent, or add superfluous information (sometimes the answer seems stereotypical). Sharad Joshi and Keith McNulty show interesting examples of how things can go wrong (here and here).
ChatGPT is impressive because of the amount of text and code it has been trained with but is incapable of reasoning. On the other hand, the answers even when they are wrong seem plausible so they could be dangerous if used without caution. What do you think? let me know in the comments.
If you have found it interesting:
You can look for my other articles, you can also subscribe to get notified when I publish articles, and you can also connect or reach me onLinkedIn. Thanks for your support!
Here is the link to my GitHub repository, where I am planning to collect code and many resources related to machine learning, artificial intelligence, and more.