Only Use LLMs If You Know How to Do the Task on Your Own
Otherwise you may end up with silent mistakes or harsh consequences
To most of us (or all of us), LLMs are mysterious boxes that get complicated things done surprisingly quickly. We’re usually not interested in the “how” part as long as they give us what we need.
ChatGPT and other LLMs are certainly productivity boosters. They can easily handle a variety of tasks that would otherwise be tedious and time-consuming.
However, we can’t rely on them entirely. For instance, when it comes to data analysis, how can we make sure ChatGPT’s insights about the data are accurate? Yes, it knows Pandas, a popular data analysis library, but what if it makes a mistake? Or, what happens if it does part of the task and fails to perform the rest?
The best solution that complements ChatGPT is you. You need to know how to do the task on your own so that:
- You can make sure ChatGPT’s solution is correct.
- You can substitute ChatGPT when it fails to perform or doesn’t know how to.
In this article, I’ll show you three examples that back up these two claims.
Example 1: Data cleaning with Pandas
I had a used-car dataset containing prices and some other attributes of the cars. It was a fairly messy dataset that required a lot of cleaning. I used the ChatGPT Advanced Data Analysis (ADA) plugin for this task.
It performed well for the most part. However, there was a very specific operation that ChatGPT ADA failed to do. Hence, the task couldn’t have been completed if I hadn’t known how to do it myself.
Let me first explain the part ChatGPT ADA could not handle.
Probably due to data-entry errors, some rows of the make (i.e., the car brand or manufacturer) column include year values. When I investigated these rows, I realized that the make values were recorded in the model column. To fix this, the make part needs to be extracted from the model column and used to replace the year value in the make column.
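To make the issue concrete, here is a tiny hypothetical example. The values below are made up for illustration and are not taken from the actual dataset:
import pandas as pd
# Made-up rows illustrating the data-entry error: in the second row,
# the make column holds a year and the true make ("bmw") sits at the
# start of the model column.
data = pd.DataFrame({
    "make": ["ford", "2015", "toyota"],
    "model": ["focus se", "bmw 3 series", "corolla le"],
})
# After the fix, the second row's make value should be "bmw".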
Here is how I got it fixed. I asked ChatGPT ADA several times, but it failed to provide an approach for fixing this issue. Its best shot was to replace these year values with “Unknown”.
Then, I wrote the following prompt in excruciating detail to get ChatGPT ADA to give me the Python code needed to solve the task:
For rows in which the make value is actually a year value, the model column includes the correct make value. For these rows, extract the first word of the value in the model column and use it to replace the year value in the make column.
Here is the code written by ChatGPT ADA for this specific issue:
# Identify rows where 'make' is a year value
is_year = data['make'].str.isnumeric() & (data['make'].str.len() == 4)
# For these rows, split the 'model' column and get the first word
correct_make = data.loc[is_year, 'model'].str.split().str.get(0)
# Update the 'make' column with the correct values
data.loc[is_year, 'make'] = correct_make
It’s not the most optimal solution, but it does the task.
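If you know Pandas, you can also tighten it up yourself. A possible alternative (my sketch, not ChatGPT’s output) is to treat a make value as a year only if it is exactly four digits, then pull the first word of the model column in one step:
# Sketch of an alternative: match exactly four digits with a regex,
# then replace the bad make values with the first word of model.
is_year = data["make"].astype(str).str.fullmatch(r"\d{4}")
data.loc[is_year, "make"] = data.loc[is_year, "model"].str.split().str[0]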
Example 2: Replacing missing values
The dataset I was trying to clean had missing values in its cylinders column, indicated by “other”, “unknown”, or None. I asked ChatGPT to replace them with the most frequent value in the column.
Here is ChatGPT’s response:
most_common_cylinder = data['cylinders'].mode()[0]
data['cylinders'] = data['cylinders'].apply(lambda x: most_common_cylinder if "cylinders" not in x else x)
It’s correct in that it replaces the values that do not contain “cylinders” with the most common value. However, it uses the apply function, which is not recommended when working with large datasets: apply is not a vectorized operation and can become a performance bottleneck.
A better approach that uses a vectorized operation is as follows:
data.loc[~data["cylinders"].str.contains("cylinders"), "cylinders"] = data["cylinders"].mode()[0]
If I didn’t know Pandas, I wouldn’t have realized that the use of the apply function could cause performance issues, and I wouldn’t have looked for an alternative solution.
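This is also where knowing the tool lets you verify ChatGPT’s output: since I knew both approaches, I could check that they produce the same result. A quick sanity check on a toy column (the values below are made up) might look like this:
import pandas as pd
# Made-up values imitating the cylinders column.
toy = pd.DataFrame(
    {"cylinders": ["4 cylinders", "unknown", "4 cylinders", "other", "6 cylinders"]}
)
most_common = toy["cylinders"].mode()[0]
# apply-based version (ChatGPT's suggestion)
applied = toy["cylinders"].apply(lambda x: most_common if "cylinders" not in x else x)
# vectorized version
vectorized = toy.copy()
vectorized.loc[~vectorized["cylinders"].str.contains("cylinders"), "cylinders"] = most_common
# Raises an AssertionError if the two approaches disagree.
pd.testing.assert_series_equal(applied, vectorized["cylinders"])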
Example 3: Writing a unit test in a more Pythonic way
I wanted to test whether ChatGPT could improve unit tests or make them more Pythonic.
I wrote the following unit test, which is actually quite simple:
def test_query(submission):
    query = submission.query
    assert query.lower().count("where") == 1
When I asked ChatGPT to improve it, what I expected was a small update as follows:
def test_query(submission):
    assert submission.query.lower().count("where") == 1
The second version eliminates the creation of the intermediate variable query, which is unnecessary.
On its first attempt, ChatGPT wrote the unit test as follows:
# first solution
def test_query(submission):
    query = submission.query
    assert query.count("where", flags=re.IGNORECASE) == 1
This is wrong. The count method doesn’t have a flags parameter. Also, how is this simpler (or more Pythonic) than my first attempt?
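If you know the standard library, confirming this takes seconds: str.count accepts only a substring and optional start and end positions, so the flags keyword is rejected. Using a made-up query string:
import re
query = "SELECT * FROM cars WHERE price > 10000"
try:
    query.count("where", flags=re.IGNORECASE)
except TypeError as error:
    # str.count() does not accept keyword arguments such as flags.
    print(error)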
The second try was correct but still not any simpler.
# second solution
import re
def test_query(submission):
    query = submission.query
    assert len(re.findall(r'where', query, flags=re.IGNORECASE)) == 1
Then, I told ChatGPT this was not simpler than my solution and suggested using the following (which was what I had in mind):
def test_query(submission):
    assert submission.query.lower().count("where") == 1
ChatGPT approved my suggestion, acknowledging that it was more compact and Pythonic.
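To try the final version yourself, all you need is an object with a query attribute. Here is a minimal, hypothetical stand-in for the submission object used in my project (the query string is made up):
from types import SimpleNamespace
def test_query(submission):
    assert submission.query.lower().count("where") == 1
# Hypothetical stand-in for the real submission object.
submission = SimpleNamespace(query="SELECT * FROM cars WHERE price > 10000")
test_query(submission)  # passes silently; a failing check would raise an AssertionError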
Final thoughts
The example use cases I showed in this article do not make ChatGPT or other LLMs any less useful. I’ve used ChatGPT for many different tasks and gotten satisfying results.
What I wanted to emphasize is that they can make mistakes. Some of these mistakes are obvious, and some can be silent. To make sure you get accurate results, keep an eye on how ChatGPT does what it does. I suggest not relying on it entirely for tools you don’t know yourself. You can still use it to learn new tools, but test its output before taking any important action.
If you liked the article, make sure you clap and comment to help me earn more. Follow me for more content on Python, Data Science, Machine Learning, and AI.
Thank you for reading. Please let me know if you have any feedback.