4 - Sub- or Nested Queries

Table of Contents

coming soon

Types of Query Results

In order to understand when and why you might use sub-queries, it is important to first understand the results that we get from non nested queries. All queries produce a table of data, regardless of the number of columns or rows that are returned; the results is still a table. However, looking closer we can determine that not all tables are the same and that tables that look one way might actually not be what they appear to be.

So query results can generally be divided into 3 hierarchical categories.

It is sometimes misleading when you get a result that gives a single value (1 column and 1 row) but it cannot be considered a scalar, because if in the future the data changes and you could get more than one row, than it cannot be a scalar, but is a list.

Note that scalars are also lists and tables, lists are tables, but not scalars, and tables are tables, but not lists or scalars.

Examples

All of the following SELECT statements will produce tables, but cannot be classified as lists or scalars.

Tables

    SELECT * FROM employees;

    SELECT firstname, lastname 
    FROM employees;

    SELECT employeeID, firstName, lastName 
    FROM employees 
    WHERE lower(lastname) LIKE 'jones' AND lower(firstname) LIKE 'andrew';

The third example above has multiple columns and looks like it would return a single row, but could have multiple rows as well.

Lists

    SELECT employeeID FROM employees;

    SELECT lastname FROM employees;

    SELECT employeeID
    FROM employees 
    WHERE lower(lastname) LIKE 'jones' AND lower(firstname) LIKE 'andrew';

The third example above can sometimes be confusing because we are obviously intending to find a single person and return their one employeeID, but there is no guarantee that there is not more than one 'Andrew Jones' working for the company, and therefore it is possible to get more than one row returned, and thus it can not be considered a scalar.

Scalar

    SELECT count(employeeID) FROM employees;

    SELECT max(orderNumber)
    FROM orders;

    SELECT teamName
    FROM teams
    WHERE teamID = 12;

All three of the above queries can only ever return a single column and single row and therefore can be considered scalar queries.

Why is Query Type Important

The reason why we must consider the type of query is because we are about to embark on the implementation of sub-queries. Inside of SELECT statements, we can place alternative sources for the information by nesting queries or placing a sub-query inside another query. The type of query that can be placed will depend on where it will be placed.

Inserting a sub-query in the FROM clause will require a table (remembering that both lists and scalars are also tables).

Sub-queries can also be used within the WHERE clause can include lists and scalars.
    - Placing a sub-query after an "=" will require a scalar query as the comparison operator requires a single value to be compared.
    - Placing a sub-query within an IN() comparison operator will accept a list query, as the IN operator contains a comma separated list.

In advanced queries, the IN comparator can accept multiple columns on both sides.
SELECT x, y, title
FROM coordinates
WHERE (x, y) IN ( (3,4), (5,6), (7,3), (-2,8) );
is a possible use of the IN comparator including multiple columns on both sides.

Examples

Let us learn through examples using the following table schema.

Example 1

List all the employees, by name, whom work in the city of Seattle! (Do not use JOINS)


We will start this solution by looking at the information we know (City = Seattle) and obtaining the location_id for that record or records

-- STEP 1
SELECT location_id FROM locations WHERE lower(city) = 'seattle';
-- let us say this query return the value of 1700
-- it must also be noted that although we only got 1 result, it is possible to get more than one, 
--     so this must be considered a list query and not a scalar Query

Now using this result of 1700 we can get the departments that are in that location or locations.

-- STEP 2
SELECT department_id
FROM departments
WHERE location_id IN (1700);
-- note that we used IN rather than =, because the previous query may have produced more than one value
-- we might get results like 10, 90, 110, 190

having returned a single column list of department_id's gives us then the opportunity to get the employee information whom work in those departments.

-- STEP 3
SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id IN (10, 90, 1010, 1090);
-- this should give us the list employees whom work in Seattle.

so although we now have the answer we were looking for, we have created 3 queries and hard coded values at more than one step. We should create queries that are more dynamic and allow us to use the same query for the results to many cities directly.

-- first we will replace 'Seattle' from Step 1 above with a parameter
SELECT location_id 
FROM locations 
WHERE lower(city) LIKE '%' || TRIM(lower(&cityInput)) || '%';
-- this allows us to find the location_ids from any city typed in by a user.

-- Next we will take step 2 and replace the value 1700 with the query that got us that value, with the parameter
SELECT department_id
FROM departments
WHERE location_id IN (
    SELECT location_id 
    FROM locations 
    WHERE lower(city) LIKE '%' || TRIM(lower(&cityInput)) || '%'
    );

-- Now we take step 3 and replace the hard coded values with the query above that would produce those values anyways
SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id IN (
    SELECT department_id
    FROM departments
    WHERE location_id IN (
        SELECT location_id 
        FROM locations 
        WHERE lower(city) LIKE '%' || TRIM(lower(&cityInput)) || '%'
        )
    );
-- first we will replace 'Seattle' from Step 1 above with a parameter
DECLARE @city varchar(20);
SET @city = 'Seattle';

SELECT location_id 
FROM locations 
WHERE lower(city) LIKE '%' + TRIM(lower(@city)) + '%';

-- this allows us to find the location_ids from any city typed in by a user.

-- Next we will take step 2 and replace the value 1700 with the query that got us that value, with the parameter
SELECT department_id
FROM departments
WHERE location_id IN (
    SELECT location_id 
    FROM locations 
    WHERE lower(city) LIKE '%' + TRIM(lower(@city)) + '%'
    );

-- Now we take step 3 and replace the hard coded values with the query above that would produce those values anyways
SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id IN (
    SELECT department_id
    FROM departments
    WHERE location_id IN (
        SELECT location_id 
        FROM locations 
        WHERE lower(city) LIKE '%' + TRIM(lower(@city)) + '%'
        )
    );

In both MS SQL and Oracle SQL, this final query above will now let us get the employees from any city at any time as typed in by a parameter. Much more useful and reusable.

So far we have seen examples where the sub-query is nested within the WHERE clause. We can also use Tabular queries as the data source for queries as well.

Example 2

List all employees whom work in Seattle whose first name starts with an 'S'.


we will start where we ended with the last example!
SELECT employee_id, first_name, last_name
    FROM employees
    WHERE department_id IN (
        SELECT department_id
        FROM departments
        WHERE location_id IN (
            SELECT location_id 
            FROM locations 
            WHERE lower(city) LIKE '%' || TRIM(lower(&cityInput)) || '%'
            )
        );
SELECT employee_id, first_name, last_name
    FROM employees
    WHERE department_id IN (
        SELECT department_id
        FROM departments
        WHERE location_id IN (
            SELECT location_id 
            FROM locations 
            WHERE lower(city) LIKE '%' + TRIM(lower(@city)) + '%'
            )
        );

and then wrap those queries with a simple query!

SELECT * 
FROM employees
WHERE upper(first_name) LIKE 'S%';
A simple statement that does not yet take into account employees in Seattle yet!

becomes ....

SELECT * 
FROM (
    SELECT employee_id, first_name, last_name
    FROM employees
    WHERE department_id IN (
        SELECT department_id
        FROM departments
        WHERE location_id IN (
            SELECT location_id 
            FROM locations 
            WHERE lower(city) LIKE '%' || TRIM(lower(&cityInput)) || '%'
            )
        )
    )
WHERE upper(first_name) LIKE 'S%';    
DECLARE @city varchar(20);
SET @city = 'Seattle';

SELECT * 
FROM (
    SELECT employee_id, first_name, last_name
    FROM employees
    WHERE department_id IN (
        SELECT department_id
        FROM departments
        WHERE location_id IN (
            SELECT location_id 
            FROM locations 
            WHERE lower(city) LIKE '%' + TRIM(lower(@city)) + '%'
            )
        )
    )
WHERE upper(first_name) LIKE 'S%';

so sub-queries can be used as the course for a FROM clause as well. This is very similar to the use of a view, in fact sub-queries are used in conjunction with views very often to help assist in creating object oriented SQL code.

Example 3

List the employees, by name, and the department name from which they work, without using joins.


So to get information, that will be an output column, in the output without using joins, it is important to get information from multiple tables. We can utilize the fact that sub-queries can also be implemented in the SELECT clause in addition the the previous examples using the WHERE and FROM clauses.

SELECT
    first_name,
    last_name,
    (   SELECT department_name
        FROM departments
        WHERE department_id = e.department_id )
FROM employees e
ORDER BY last_name, first_name;

In the above solution, we can insert the employees department_id into the WHERE clause of the sub-query by using the e table alias.

If we also recall from order of execution, that the SELECT clause executes iteratively, as a loop, through each row of the result set. Therefore, the sub-query is run on each row of the employees table and thus we can get the specific department_id for each row.