A gentle Introduction to SQL (Part II)


Chainlink Fence

This is part 2 of a series. The first can be found here

The scripts I use to create the tables and insert the data we are going to use in the rest of the tutorial are in github here

The examples I’m using is that of a small College that maintains a small amount of data in a MySQL database. The tables are;

In the previous article, we talked about the basic SQL query commands (SELECT, JOIN). In this article, we go a bit deeper with more complex SQL queries.

GROUP BY

The GROUP BY clause groups similar rows/columns, and is usually used in combination with other query modifiers/functions. For example;

Show me how much each student should be paying this semester based on the classes that they have enrolled in. Order the results from students who owe the most to students who owe the least.

Let us break this more complex question into a series of simpler questions.

First, we know that we need to get student information (stored in the students table) as well as what they are enrolled in (enrollments). We also know that in order to find out what courses they are enrolled in, we need to join enrollments to courseInstructor . To get how much each course costs, we would join to coursePricings.

Lets look at this for a single student

Show me what courses Awori Otieno is enrolled in, as well as how much each course costs.

SELECT c.name, cp.price from enrollments e
JOIN students s
    ON s.studentId = e.studentId
JOIN courseInstructors ci
    ON ci.courseInstructorId = e.courseInstructorId
JOIN courses c
    ON c.courseId = ci.courseId
JOIN coursePricings cp
    ON cp.type = c.type
WHERE s.firstName = Awori
    AND s.lastName =Otieno


Results -----------------------------------


name               price
              
College Algebra  0000002000
Figure Drawing   0000002000

Awori Otieno is enrolled in 2 courses, both of which cost $2000 each

Now, I’d like to know how much Awori Otieno owes for this semester. I could do this the hard way, fetch the rows above, and then programmatically sum the price column. If I wanted to do this for every student I would have to repeat the above procedure for each and every student.

But there is a better way using plain SQL). Enter GROUP BY!

How much should Awori Otieno pay this sememster?

SELECT s.firstName, s.lastName, SUM(cp.price)
FROM enrollments e
JOIN CourseInstructors ci
    ON ci.courseInstructorId = e.courseInstructorId
JOIN students s
    ON s.studentId = e.studentId
JOIN courses c
    ON c.courseId = ci.courseId
JOIN coursePricings cp
    ON cp.type = c.type
WHERE s.firstName = Awori
    AND s.lastName =Otieno
GROUP BY s.studentId

Results -----------------------------------

firstName  lastName SUM(cp.price)
                     
Awori      Otieno   4000

which makes sense (2000 + 2000 = 4000)

Here is the fully constructed query to get how much each student owes sorted from highest to lowest.

Show me what each student should pay this semester, and order it from highest to lowest.

SELECT s.firstName, s.lastName, SUM(cp.price)
FROM enrollments e
JOIN CourseInstructors ci
    ON ci.courseInstructorId = e.courseInstructorId
JOIN students s
    ON s.studentId = e.studentId
JOIN courses c
    ON c.courseId = ci.courseId
JOIN coursePricings cp
    ON cp.type = c.type
GROUP BY s.studentId
ORDER BY SUM(cp.price)  DESC

Results -----------------------------------

firstName, lastName, SUM(cp.price)
                   
Claire,    Donovan,  11000
Samuel,    Johnston, 10500
Maureen,   Solano,   10500
John,      Kamaru,   6500
Xie,       Chung,    6500

the SUM function, in conjunction with the GROUP BY clause, will add up all similar rows. So if a student has 3 rows in the enrollments, in the query above the price for each course would be summed into one. I can specify how I want the data ordered (ORDER BY) i.e. order by the specified column (SUM(cp.price)) and DESC is shorthand for DESCENDING, meaning the rows will be ordered from highest to lowest.

LEFT JOIN

Most of the SQL we have seen so far has been about finding data, but the LEFT JOIN, a variant of JOIN, behaves a bit differently. When you LEFT JOIN table A to table B, you will get one row for each record in table A, rows in table B will also be returned (if they match on the join column), but if a row in Table A has no associated row in table B, then a NULL row will be returned from Table B.

Why would anyone want that?

Lets look at a practical example. One of our students is a notorious hacker, and he/she has realized that we determine how much a student owes by looking at the accounts. Therefore, if we don’t have a student record in the accounts table, we don’t know to charge the student. Genius! (How the student was able to access the database is left up to your imagination)

So our hacker has hacked the database and dropped her row from the accounts table.

/* I am Hacker, hear me roar/hack!!! . They will never catch me!! */
DELETE from accounts WHERE studentId = 1023;

So how do we foil the hackers nefarious plan to get free education (a worthy goal, just illegal). None of the queries so far will tell us that something is amiss.

Show me students who are currently enrolled including those missing a row from the accounts table when they shouldn’t

So, lets start with our regular JOIN (we need to join to the enrollments table because it is possible for a student to not have any rows in the accounts table because they are not currently enrolled in any courses). The DISTINCT is needed because we will get a single row for each student enrollment, so if they are enrolled in more than one course. DISTINCT will collapse similar results into a single row.

SELECT DISTINCT s.firstName, s.lastName, a.total, a.paid, a.balance, e.studentId
FROM students s
JOIN enrollments e
    ON e.studentId = s.studentId
LEFT JOIN accounts a
    ON s.studentId = a.studentId;

Results -----------------------------------

firstName, lastName,  total,                    paid,     balance
                   

Sarah,      Abebi,    0000004500,  0000000000,  0000000000,1022
Allison,    Hacker,   NULL,        NULL,        NULL,      1023

Only show me the rows that are missing

SELECT DISTINCT s.firstName, s.lastName, a.total, a.paid, a.balance, e.studentId
FROM students s
JOIN enrollments e
    ON e.studentId = s.studentId
LEFT JOIN accounts a
    ON s.studentId = a.studentId
WHERE a.total IS NULL;

Results -----------------------------------

firstName, lastName, total, paid, balance,studentId
                   
Allison,    Hacker,  NULL,  NULL, NULL,    1023

the IS NULL construct checks for the absence of a value (keep in mind = NULL will not work)

We have successfully thwarted Allison Hackers attempt to get free education (though considering how much we are charging who could blame her for trying)!

Article originally published here

2022

2020

2019

2017

2016

2012

2008