When we think about the business world and what questions are being asked in the workplace, it is absolutely impossible to not include questions that include "How much ... ", "How many ... ", "What is the highest ... ", "What is the lowest ... ". With questions like "How much money did we make today?", "How many employees work for us?"," etc.... it is easy to see that some answers to questions will require the aggregation of data across multiple rows of data. To this point in your SQL learning, all `SELECT`

statements have been executed on single rows of data, one at a time. Now we move into the world of multiple-row functions.

So far we have learned 4 clauses in the SELECT statement:

- SELECT
- FROM
- WHERE
- ORDER BY

Now by adding aggregate functions, we introduce 2 new clauses, GROUP BY and HAVING.

SELECT < comma separated field list > FROM < table(s) with/without JOINS > WHERE < conditions not involving aggregate functions > GROUP BY < comma separate field list (group by expression) > HAVING < conditions involving aggregate functions > ORDER BY < comma separated columns >

There are exactly 7 different aggregate functions in SQL, and these 7 will enable the programmer to answer almost any business question. The 7 aggregate functions are:

Function | Description | Example |

SUM() |
the summation (addition) of data | the total owing on a receipt adding up multiple purchased products |

COUNT() |
counting the number of instances (rows) | The number of employees that work at a specific location |

AVG() |
Average - the non-weighted average of a group of given numbers | The average price of products being sold on a website. |

MIN() |
Minimum - the smallest number, the first alphabetical string, and the earliest date chronologically. | The earliest date (first event) in a list of events |

MAX() |
Maximum - the largest number, last alphabetical string, or latest date | The largest number in a set of numbers |

STDDEV() |
Standard Deviation - A statistical measure of the horizontal distribution of data points. | |

VARIANCE() |
Standard Deviation - A statistical measure of the vertical distribution of data points. |

Some examples of the main functions might be:

SELECT count(playerID) AS numPlayers FROM players;

will return the number of rows in the player table. This should product a unique list as *playerid* would be the primary key, so in this case `DISTINCT`

would not be required, but it might be in other cases.

SELECT sum(balanceOwing) AS totalOwing FROM customers;

would return the total owing from ALL customers. This would be an important number for companies to know how much money they should have coming in within the next short time period: assuming customers all pay their bills that is :)

SELECT min(eventDate) AS firstEventDate FROM events;

would return the date of the first event in a table of events, if multiple events had the same date, it would still only return the date once.

It is often required to have more complex answers that involve grouping data rows together. For example: How many players in a players table, does not requiring any grouping, but a slightly different question: How many players play on each team in a league would require you to group players together that play on the same team and then count them. This is achieved through the GROUP BY statement.

SELECT teamName, firstName, lastName FROM players JOIN teams USING (teamID) ORDER BY teamName;

gives us the players on each team, but we can only count them manually because we sorted it by teamName. It is still a manual count. Therefore, if we want the SQL to produce the actual numbers automatically, we need to group the players together by team and then use the `count()`

aggregate function

SELECT teamName, count(playerID) AS numPlayers FROM players JOIN teams USING (teamID) GROUP BY teamName ORDER BY teamName;

gives us the results that we need. However, what if there is a possibility that the teamName is not unique and we therefore need to ensure we are not mixing players from multiple teams.

```
-- THIS CAUSES AN ERROR
SELECT
teamID,
teamName,
count(playerID) AS numPlayers
FROM players JOIN teams USING (teamID)
GROUP BY teamName
ORDER BY teamName;
```

will give us an error as the teamID is not part of an aggregate function nor is it in the `GROUP BY`

clause. *Note: That if we had used ON rather than USING then we would also have an ambiguous field error.*

Clint's Law: Any field in the `SELECT`

clause that is not part of an aggregate function, or a sub-query, MUST be in the `GROUP BY`

clause.*yes I called it my law, I made it up, but it works. Realistically it is just a different way of saying what other books have said, but I feel it simplifies it and makes it an easy to follow rule.*

So because of the above rule, the teamID in the above statement is a field that is not part of the aggregate function, and therefore must be in the `GROUP BY`

. Let us correct this along with the ambiguous issue we had above if we had used ON.

SELECT t.teamID, -- because an inner join is used, it does not matter if this is t.teamID or p.teamID teamName, -- outer joins are different and careful consideration is required. count(playerID) AS numPlayers FROM players p JOIN teams t ON p.teamID = t.teamID GROUP BY t.teamID, teamName ORDER BY teamName;

Caution when grouping by a`PRIMARY KEY`

, as grouping on a primary key can often result in nothing more than groups of one. In this case the`INNER JOIN`

will result in repeated values for the teamID in the out put so it would be okay.

Using the class sample database, How much money goes out for salaries per month per department

SELECT departmentID, sum(monthlySalary) AS totalSalary FROM employees GROUP BY departmentID ORDER BY departmentID; -- but this does not include commission yet SELECT department_id, sum(monthlySalary) AS totalSalary, sum(monthlySalary * IsNull(commissionPercent,0)) AS totalCommission FROM employees GROUP BY departmentID ORDER BY departmentID; -- SELECT department_id, sum(monthlySalary) AS totalSalary, sum(monthlySalary * IsNull(commissionPercent,0)) AS totalCommission, sum(monthlySalary) + sum(monthlySalary * IsNull(commissionPercent, 0)) AS totalIncome, sum(monthlySalary + monthlySalary * IsNull(commissionPercent, 0)) AS totalIncome2, count(departmentID) AS numEmployee FROM employees GROUP BY departmentID ORDER BY departmentID;

Using the class sample database, What is the average sales commission per month?

SELECT Avg(commissionPercent) AS AvgComm FROM employees;

results in 0.2125 (does this make sense)

*let us do this manually*

-- 0.2 + 0.3 + 0.2 + 0.15 = 0.85 -- 0.85 / 4 = 0.2125

But now we can question this answer, as the average is calculated based on the 4 employees whom receive a commission, but there are 20 employees? So the `avg()`

aggregate function simply ignores the *NULL* values present and only counts records with numeric values. But, does the average commission involve only those employees whom receive commission or is it the average commission for all employees; which would be:

-- 0.2 + 0.3 + 0.2 + 0.15 = 0.85 -- 0.85 / 20 = 0.0425

In order to allow the `avg()`

function to calculate the latter method, we would have to ensure that *NULL* values are included in the average calculation, and therefore need to convert the null values to a 0 in this case. We can use the `IsNull()`

function to perform this task.

```
SELECT Avg(IsNull(commissionPercent,0)) AS AvgComm
FROM employees;
```

The HAVING clause of the SQL statement can also be added once grouping has taken place. The HAVING clause is used, very similar to the WHERE clause, but for conditions that involve aggregate functions. There are quite often conditions that apply to summation data that can only be implemented after grouping takes place, therefore the `HAVING`

clause is executed after the `GROUP BY`

clause, where the `WHERE`

clause executed before the `GROUP BY`

clause.

- ALWAYS use
`WHERE`

for filter conditions that do not involve aggregate functions - ALWAYS use
`HAVING`

for filter conditions the DO involve aggregate functions

Display the average salary for employees in each job-title in each department

SELECT jobID, departmentID, avg(IsNull(monthlySalary,0)) AS avgSalary FROM employees GROUP BY jobID, departmentID ORDER BY departmentID, jobID;

Now, only include departments 20, 60, 80, and 90

SELECT jobID, departmentID, avg(IsNull(monthlySalary,0)) AS avgSalary FROM employees WHERE departmentID IN (20, 60, 80, 90) GROUP BY jobID, departmentID ORDER BY departmentID, jobID;

Now, include only those where the average is above 15,000

SELECT jobID, departmentID, avg(IsNull(monthlySalary,0)) AS avgSalary FROM employees WHERE department_id IN (20, 60, 80, 90) GROUP BY jobID, departmentID HAVING avg(IsNull(monthlySalary,0)) > 15000 ORDER BY departmentID, jobID;