avatarAsad iqbal

Summary

The provided web content is a comprehensive guide on learning SQL, detailing data manipulation, querying, aggregate functions, and working with multiple tables, along with practical examples and references for further learning.

Abstract

The web content serves as an educational resource for individuals looking to understand and utilize SQL for data management and analysis. It begins by emphasizing the importance of SQL in various fields, including data science and analytics, and then proceeds to cover key SQL concepts and operations. The guide explains how to create and manipulate tables using SQL commands, demonstrates how to perform complex queries with conditions and sorting, and introduces aggregate functions for data summarization. It also discusses the use of constraints to maintain data integrity and illustrates how to work with multiple interconnected tables. The article concludes with additional resources for readers to further their understanding of SQL and related technologies.

Opinions

  • The author believes in the significance of SQL as a tool for handling and analyzing data effectively, regardless of one's technical background.
  • The guide is structured to cater to both beginners and those looking to refresh their SQL knowledge, suggesting a broad target audience.
  • The inclusion of practical examples and code snippets indicates the author's commitment to providing hands-on learning experiences.
  • The mention of external resources, such as books and online tutorials, reflects the author's view on the value of continuous learning and the importance of diverse educational materials.
  • The encouragement for readers to engage with the content by clapping, following the author, and visiting associated websites suggests the author values community interaction and feedback.

Learn SQL

Learn how to manage large datasets and analyze real data

Information is everywhere around us, and knowing how to use it effectively is a important skill. SQL (Structured Query Language) is a language used in data science, analytics, and engineering to handle and analyze data. By learning SQL basics, you can manage large datasets, extract meaningful insights, and make better decisions in various fields, even if you’re not in a technical role. This article will guide you through the importance of SQL, helping you understand how to work with data and perform important operations like creating tables, running queries, and using aggregate functions.

Image by Gerd Altmann from Pixabay

What we are going to discuss:

  1. Data Manipulation: Creating, updating, deleting, and altering tables involves several commands. To create a new table, you can use the CREATE TABLE command, providing the table name and column details. Use the INSERT INTO command to insert data into a table, specifying the target table name and the data values. Modify existing tables with the ALTER TABLE command, adding new columns or changing properties. Delete records with the DELETE command, specifying the table name and optional filters. Update records with the UPDATE command, setting new values for specific columns and applying filters as needed.
  2. Queries: Querying data enables analysis and decision making. Combine conditions with logical operators such as AND, OR, and LIKE. Alias columns or tables with the AS keyword, renaming items temporarily for readability. Sort results using the ORDER BY clause with ASC (ascending) or DESC (descending) keywords. Filter ranges of values with the BETWEEN operator, while comparing patterns with the LIKE operator. Select unique entries with the DISTINCT keyword, and control displayed records with the LIMIT keyword. Finally, handle missing values using IS NULL and IS NOT NULL comparisons within the WHERE clause.
  3. Aggregate Functions: Analyze your dataset by summarizing data points. Count elements with the COUNT() function, calculating minimums with MIN(), maximums with MAX(), averages with AVG(), and totals with SUM(). Group related records together using the GROUP BY clause, optionally ordering these subsets with the ORDER BY clause. Narrow down groups meeting specific criteria with the HAVING clause, allowing advanced conditional aggregation.
  4. Working with Multiple Tables: Work efficiently with complex data structures involving multiple interconnected tables. Perform inner joins with the JOIN clause, returning results satisfying the specified join condition. Utilize cross joins via the CROSS JOIN clause, generating all possible combinations across two tables. Merge results from separate queries with the UNION clause, eliminating duplicate records automatically. Lastly, store intermediate results using the WITH clause, enabling reusable subqueries throughout your code.

1. Manipulation

Column Constraints

When you create a table in SQL, you can set certain rules for the columns, known as constraints. These rules ensure that the data in your table follows specific guidelines.

  • PRIMARY KEY constraint can be used to uniquely identify the row.
  • UNIQUE columns have a different value for every row.
  • NOT NULL This column must always have a value; it can’t be empty.
  • DEFAULT If you don’t provide a value for this column, it will automatically use a default value that you specify.

There can be only one PRIMARY KEY column per table and multiple UNIQUE columns.

CREATE TABLE students (
  student_id INT PRIMARY KEY,
  student_name VARCHAR(100) UNIQUE,
  student_grade INT NOT NULL,
  student_age INT DEFAULT 15
);

CREATE TABLE Statement

The CREATE TABLE statement is how you make a new table in your database. You define the table’s name and its columns.

CREATE TABLE employees (
  employee_id INT,
  first_name VARCHAR(50),
  last_name VARCHAR(50),
  hire_date DATE
);

INSERT Statement

The INSERT INTO statement is used to add a new record (row) to a table.

It has two forms as shown:

  • Insert into columns in order.
  • Insert into columns by name.
-- Insert values in order:
INSERT INTO employees
VALUES (101, 'John', 'Doe', '2024-01-15');
-- Insert values by column name:
INSERT INTO employees (employee_id, last_name, first_name)
VALUES (102, 'Smith', 'Jane');

ALTER TABLE Statement

The ALTER TABLE statement is used to modify the columns of an existing table. When combined with the ADD COLUMN clause, it is used to add a new column.

ALTER TABLE employees
ADD department VARCHAR(50);

DELETE Statement

The DELETE statement is used to remove rows from a table. You can specify which rows to delete using the WHERE clause. If you don’t use WHERE, all rows in the table will be deleted.

DELETE FROM employees
WHERE employee_id = 101;

UPDATE Statement

The UPDATE statement is used to edit records (rows) in a table. It includes a SET clause that indicates the column to edit and a WHERE clause for specifying the record(s).

UPDATE employees
SET last_name = 'Brown'
WHERE employee_id = 102;

2. Queries

AND Operator

The AND operator lets you combine multiple conditions in a query. All conditions must be true for the rows to be included in the results. For example, to find cars that are red and were manufactured in 2015 or later, you would write:

SELECT model 
FROM vehicles 
WHERE color = 'red' 
  AND year >= 2015;

AS Clause

You can use the AS clause to give columns or tables a temporary name (alias) in your result set. For example, if you want to display the title column as "book_title," you can do this:

SELECT title AS 'book_title'
FROM library;

OR Operator

The OR operator allows you to include rows that meet at least one of several conditions. For example, to find customers from either Florida (FL) or Texas (TX), you would use:

SELECT customer_name
FROM orders 
WHERE state = 'FL' 
   OR state = 'TX';

% Wildcard

The % wildcard is used with the LIKE operator to match patterns. It can stand for any sequence of characters. For example, to find products that start with "Pro," you would use:

SELECT product_name
FROM products
WHERE product_name LIKE 'Pro%';

SELECT Statement

The SELECT statement retrieves specific columns from a table. For example, this query will return the name and year columns from the "movies" table:

SELECT name, year
FROM movies;

_ Wildcard

The _ wildcard is used with the LIKE operator to match any single character. For instance, to find books where the title starts with "H" followed by four characters and ending with "y," you would write:

SELECT title
FROM books
WHERE title LIKE 'H____y';

ORDER BY Clause

The ORDER BY clause can be used to sort the result set by a particular column either alphabetically or numerically. It can be ordered in two ways:

  • DESC is a keyword used to sort the results in descending order.
  • ASC is a keyword used to sort the results in ascending order (default).
SELECT *
FROM contacts
ORDER BY birth_date DESC;

LIKE Operator

The LIKE operator can be used inside of a WHERE clause to match a specified pattern. The given query will match any movie that begins with Star in its title.

SELECT name
FROM movies
WHERE name LIKE 'Star%';

DISTINCT Clause

Unique values of a column can be selected using a DISTINCT query. For a table contact_details having five rows in which the city column contains Chicago, Madison, Boston, Madison, and Denver, the given query would return:

  • Chicago
  • Madison
  • Boston
  • Denver
SELECT DISTINCT city
FROM contact_details;

BETWEEN Operator

The BETWEEN operator can be used to filter by a range of values. The range of values can be text, numbers, or date data. The given query will match any movie made between the years 1980 and 1990, inclusive.

SELECT *
FROM movies
WHERE year BETWEEN 1980 AND 1990;

LIMIT Clause

The LIMIT clause is used to narrow, or limit, a result set to the specified number of rows. The given query will limit the result set to 5 rows.

SELECT *
FROM movies
LIMIT 5;

NULL Values

Column values can be NULL, or have no value. These records can be matched (or not matched) using the IS NULL and IS NOT NULL operators in combination with the WHERE clause. The given query will match all addresses where the address has a value or is not NULL.

SELECT address
FROM records
WHERE address IS NOT NULL;

WHERE Clause

The WHERE clause is used to filter records (rows) that match a certain condition. The given query will select all records where the pub_year equals 2017.

SELECT title
FROM library
WHERE pub_year = 2017;

3. Aggregate Functions

Column References

The GROUP BY and ORDER BY clauses can reference the selected columns by number in which they appear in the SELECT statement. The example query will count the number of movies per rating, and will:

  • GROUP BY column 2 (rating)
  • ORDER BY column 1 (total_movies)
SELECT COUNT(*) AS 'total_movies', 
   rating 
FROM movies 
GROUP BY 2 
ORDER BY 1;

SUM() Aggregate Function

The SUM() aggregate function takes the name of a column as an argument and returns the sum of all the value in that column.

SELECT SUM(salary)
FROM salary_disbursement;

MAX() Aggregate Function

The MAX() aggregate function takes the name of a column as an argument and returns the largest value in a column. The given query will return the largest value from the amount column.

SELECT MAX(amount) 
FROM transactions;

COUNT() Aggregate Function

The COUNT() aggregate function returns the total number of rows that match the specified criteria. For instance, to find the total number of employees who have less than 5 years of experience, the given query can be used.

Note: A column name of the table can also be used instead of *. Unlike COUNT(*), this variation COUNT(column) will not count NULL values in that column.

SELECT COUNT(*)
FROM employees
WHERE experience < 5;

GROUP BY Clause

The GROUP BY clause will group records in a result set by identical values in one or more columns. It is often used in combination with aggregate functions to query information of similar records. The GROUP BY clause can come after FROM or WHERE but must come before any ORDER BY or LIMIT clause.

The given query will count the number of movies per rating.

SELECT rating, 
   COUNT(*) 
FROM movies 
GROUP BY rating;

MIN() Aggregate Function

The MIN() aggregate function returns the smallest value in a column. For instance, to find the smallest value of the amount column from the table named transactions, the given query can be used.

SELECT MIN(amount) 
FROM transactions;

AVG() Aggregate Function

The AVG() aggregate function returns the average value in a column. For instance, to find the average salary for the employees who have less than 5 years of experience, the given query can be used.

SELECT AVG(salary)
FROM employees
WHERE experience < 5;

HAVING Clause

The HAVING clause is used to further filter the result set groups provided by the GROUP BY clause. HAVING is often used with aggregate functions to filter the result set groups based on an aggregate property. The given query will select only the records (rows) from only years where more than 5 movies were released per year.

The HAVING clause must always come after a GROUP BY clause but must come before any ORDER BY or LIMIT clause.

SELECT year, 
   COUNT(*) 
FROM movies 
GROUP BY year
HAVING COUNT(*) > 5;

4. Multiple Tables

Outer Join

An outer join will combine rows from different tables even if the join condition is not met. In a LEFT JOIN, every row in the left table is returned in the result set, and if the join condition is not met, then NULL values are used to fill in the columns from the right table.

SELECT column_name(s)
FROM table1
LEFT JOIN table2
  ON table1.column_name = table2.column_name;

WITH Clause

The WITH clause stores the result of a query in a temporary table (temporary_movies) using an alias.

Multiple temporary tables can be defined with one instance of the WITH keyword.

WITH temporary_movies AS (
   SELECT *
   FROM movies
)
SELECT *
FROM temporary_movies
WHERE year BETWEEN 2000 AND 2020;

UNION Clause

The UNION clause is used to combine results that appear from multiple SELECT statements and filter duplicates.

For example, given a first_names table with a column name containing rows of data “James” and “Hermione”, and a last_names table with a column name containing rows of data “James”, “Hermione” and “Cassidy”, the result of this query would contain three names: “Cassidy”, “James”, and “Hermione”.

SELECT name
FROM first_names
UNION
SELECT name
FROM last_names

CROSS JOIN Clause

The CROSS JOIN clause is used to combine each row from one table with each row from another in the result set. This JOIN is helpful for creating all possible combinations for the records (rows) in two tables.

The given query will select the shirt_color and pants_color columns from the result set, which will contain all combinations of combining the rows in the shirts and pants tables. If there are 3 different shirt colors in the shirts table and 5 different pants colors in the pants table then the result set will contain 3 x 5 = 15 rows.

SELECT shirts.shirt_color,
   pants.pants_color
FROM shirts
CROSS JOIN pants;

Inner Join

The JOIN clause allows for the return of results from more than one table by joining them together with other results based on common column values specified using an ON clause. INNER JOIN is the default JOIN and it will only return results matching the condition specified by ON.

SELECT * 
FROM books
JOIN authors
  ON books.author_id = authors.id;

References

  1. https://www.codecademy.com/learn/learn-sql/modules/learn-sql-manipulation/cheatsheet
  2. https://www.youtube.com/watch?v=p3qvj9hO_Bo

Thanks for reading✨ If you like the article make sure to:

Programming
Machine Learning
Data Science
Python
Sql
Recommended from ReadMedium