Data Cleaning: 10 Essential SQL Queries for Data Scientists
Effective data cleaning is a crucial step in the data science pipeline. SQL, a powerful language for managing and querying databases, offers a variety of tools to streamline the data cleaning process. In this article, we’ll explore 10 essential SQL queries that data scientists can leverage to clean and prepare their datasets for analysis.

Section 1: Removing Duplicates
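The rowid-based delete used in this section is easy to try on a throwaway table before it touches real data. The following is a minimal sketch, assuming Python's standard-library sqlite3 module, an in-memory database, and made-up rows; the helper name dedupe_exact is hypothetical. SQLite exposes an implicit rowid column, so the pattern runs unchanged there, but other databases may need a different row identifier.

```python
import sqlite3


def dedupe_exact(conn: sqlite3.Connection) -> int:
    """Delete all but the highest-rowid row in each group of identical rows."""
    cur = conn.execute(
        """
        DELETE FROM your_table
        WHERE rowid NOT IN (
            SELECT MAX(rowid)
            FROM your_table
            GROUP BY column1, column2
        )
        """
    )
    return cur.rowcount  # number of rows deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (column1 TEXT, column2 INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("a", 1), ("a", 1), ("b", 2), ("a", 1), ("b", 3)],  # illustrative rows
)

deleted = dedupe_exact(conn)
remaining = conn.execute(
    "SELECT column1, column2 FROM your_table ORDER BY column1, column2"
).fetchall()
```

Listing every column in GROUP BY removes only exact duplicates; grouping on fewer columns would also collapse rows that merely share those columns.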
1.1 Remove Exact Duplicates
-- Keep the row with the highest rowid in each group of identical rows; delete the rest.
DELETE FROM your_table
WHERE rowid NOT IN (
    SELECT MAX(rowid)
    FROM your_table
    GROUP BY column1, column2, ...
);
1.2 Identify and Remove Partial Duplicates
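-- Rows count as partial duplicates when they match on a key column (here column1) even if other columns differ.
DELETE FROM your_table
WHERE rowid NOT IN (SELECT MIN(rowid) FROM your_table GROUP BY column1);

Section 2: Handling Missing Values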
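Before deleting or filling anything, it helps to know how many values are actually missing. As a rough sketch, the Python snippet below uses the standard-library sqlite3 module and an in-memory table with made-up rows to count NULLs per column, then applies this section's fill and delete statements. The table contents, and the choice to fill column1 but drop rows on column2, are assumptions made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (column1 TEXT, column2 INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("x", 1), (None, 2), ("y", None), (None, None)],  # illustrative rows
)

# COUNT(*) counts every row while COUNT(col) skips NULLs,
# so the difference is the number of NULLs in that column.
null_counts = conn.execute(
    """
    SELECT COUNT(*) - COUNT(column1) AS column1_nulls,
           COUNT(*) - COUNT(column2) AS column2_nulls
    FROM your_table
    """
).fetchone()

# Fill column1 with a placeholder default, then drop rows still missing column2.
filled = conn.execute(
    "UPDATE your_table SET column1 = ? WHERE column1 IS NULL",
    ("default_value",),
).rowcount
dropped = conn.execute(
    "DELETE FROM your_table WHERE column2 IS NULL"
).rowcount

rows = conn.execute(
    "SELECT column1, column2 FROM your_table ORDER BY column2"
).fetchall()
```

Whether to fill or delete depends on the column: a sensible default may exist for a label, while a missing measurement often cannot be invented.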
2.1 Remove Rows with NULL Values
DELETE FROM your_table
WHERE column1 IS NULL OR column2 IS NULL;
2.2 Fill NULL Values with Defaults
UPDATE your_table
SET column1 = 'default_value'
WHERE column1 IS NULL;

Section 3: Data Standardization
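Case and whitespace differences often hide values that are really the same. The sketch below, which assumes Python's standard-library sqlite3 module and invented city names, combines this section's UPPER and TRIM calls in one update and compares the distinct values before and after. The helper distinct_values is a hypothetical name. Note that in SQLite, UPPER converts only ASCII letters by default and TRIM with one argument strips spaces only, so other databases may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (column1 TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?)",
    [("  new york",), ("New York ",), ("NEW YORK",), ("boston",)],
)


def distinct_values(conn: sqlite3.Connection) -> list:
    """Return the distinct values of column1 in sorted order."""
    query = "SELECT DISTINCT column1 FROM your_table ORDER BY column1"
    return [row[0] for row in conn.execute(query)]


before = distinct_values(conn)  # four spellings of two cities

# Trim first, then uppercase, so both rules apply in a single pass.
conn.execute("UPDATE your_table SET column1 = UPPER(TRIM(column1))")

after = distinct_values(conn)
```

Comparing the distinct counts before and after is a quick way to confirm that the standardization actually merged the variants.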
3.1 Convert Text to Uppercase
UPDATE your_table
SET column1 = UPPER(column1);
3.2 Trim Whitespace
UPDATE your_table
SET column1 = TRIM(column1);

Section 4: Date and Time Manipulation
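Date functions vary more between databases than most SQL. The queries in this section use TO_DATE and EXTRACT, which exist in Oracle and PostgreSQL; SQLite has neither and relies on date() and strftime() instead. The following sketch assumes Python's standard-library sqlite3 module and made-up date strings. It first flags strings SQLite cannot parse, then pulls out year, month, and day from the valid ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (date_string TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?)",
    [("2023-01-15",), ("15/01/2023",), ("2024-12-03",)],  # illustrative rows
)

# date() returns NULL for strings it cannot parse, which flags rows to fix.
unparseable = [
    row[0]
    for row in conn.execute(
        "SELECT date_string FROM your_table WHERE date(date_string) IS NULL"
    )
]

# strftime returns text, so CAST turns each part into an integer.
parts = conn.execute(
    """
    SELECT CAST(strftime('%Y', date_string) AS INTEGER) AS year,
           CAST(strftime('%m', date_string) AS INTEGER) AS month,
           CAST(strftime('%d', date_string) AS INTEGER) AS day
    FROM your_table
    WHERE date(date_string) IS NOT NULL
    ORDER BY date_string
    """
).fetchall()
```

Checking for unparseable strings before converting avoids silently turning bad input into NULL dates.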
4.1 Convert String to Date
UPDATE your_table
SET date_column = TO_DATE(date_string, 'YYYY-MM-DD');
4.2 Extract Year/Month/Day
SELECT EXTRACT(YEAR FROM date_column) AS year,
       EXTRACT(MONTH FROM date_column) AS month,
       EXTRACT(DAY FROM date_column) AS day
FROM your_table;

Section 5: Handling Outliers
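The queries in this section take lower_threshold and upper_threshold as given, so the thresholds have to come from somewhere. One common convention, used here only as an assumption and not prescribed by this article, is the 1.5 × IQR rule. The sketch below uses Python's standard-library sqlite3 and statistics modules with invented values: it computes the quartiles in Python, then passes the thresholds to the capping update as named parameters.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (column1 REAL)")
conn.executemany(
    "INSERT INTO your_table VALUES (?)",
    [(v,) for v in [10, 12, 11, 13, 12, 11, 10, 95, -40]],  # illustrative values
)

# Assumption: thresholds follow the 1.5 * IQR rule.
values = [row[0] for row in conn.execute("SELECT column1 FROM your_table")]
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
lower_threshold = q1 - 1.5 * iqr
upper_threshold = q3 + 1.5 * iqr

# Cap values outside the thresholds; the WHERE clause limits the update
# to rows that actually change, so rowcount reports the number capped.
capped = conn.execute(
    """
    UPDATE your_table
    SET column1 = CASE
        WHEN column1 < :lo THEN :lo
        WHEN column1 > :hi THEN :hi
        ELSE column1
    END
    WHERE column1 < :lo OR column1 > :hi
    """,
    {"lo": lower_threshold, "hi": upper_threshold},
).rowcount

bounds = conn.execute(
    "SELECT MIN(column1), MAX(column1) FROM your_table"
).fetchone()
```

Capping keeps every row, which suits cases where the outlier is a plausible but extreme reading; deleting is the better fit when the value is clearly an entry error.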
5.1 Identify and Remove Outliers
DELETE FROM your_table
WHERE column1 < lower_threshold OR column1 > upper_threshold;
5.2 Cap Outlier Values
UPDATE your_table
SET column1 = CASE
    WHEN column1 < lower_threshold THEN lower_threshold
    WHEN column1 > upper_threshold THEN upper_threshold
    ELSE column1
END;

Conclusion:
These 10 SQL queries provide a solid foundation for data cleaning tasks. Depending on your specific dataset and requirements, you can customize and combine them to address unique challenges. Always work on a copy of your data to avoid accidental data loss, and refine your cleaning process iteratively as you gain insights from the data. By mastering these queries, you'll be better equipped to transform raw data into a clean and reliable foundation for analysis.
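The advice to work on a copy can be sketched in a few lines. The example below is one possible approach, assuming Python's standard-library sqlite3 module and made-up rows: it snapshots the table, then runs a destructive statement as a dry run and rolls it back. The names your_table_backup and row_count are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (column1 TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?)",
    [("a",), (None,), ("b",), (None,)],  # illustrative rows
)
conn.commit()

# Snapshot the table before cleaning (your_table_backup is a hypothetical name).
conn.execute("CREATE TABLE your_table_backup AS SELECT * FROM your_table")
conn.commit()


def row_count(table: str) -> int:
    # Table names cannot be bound as parameters; only pass trusted names here.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Dry run: sqlite3 opens a transaction before the DELETE,
# so rollback() undoes it and leaves the table untouched.
would_delete = conn.execute(
    "DELETE FROM your_table WHERE column1 IS NULL"
).rowcount
conn.rollback()

count_after_dry_run = row_count("your_table")
backup_count = row_count("your_table_backup")
```

Once the dry-run count looks right, the same statement can be run again and committed, with the backup table still available if something else goes wrong.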
Thank you for your time and interest! 🚀 You can find even more content at SQL Fundamentals 💫