# Sorting and Grouping Output

We will now discuss techniques to manage the order and grouping of the output.

##### ORDER BY

The `ORDER BY` clause sorts the data after the query has been evaluated. Consider an example:

```SELECT surname, firstname FROM customer
WHERE title='Mr'
AND city = 'Portsea'
ORDER by surname;
```

This query finds all customers who have a `title` Mr and live in Portsea. It then presents the results sorted alphabetically by ascending `surname`:

```+-----------+-----------+
| surname   | firstname |
+-----------+-----------+
| Dalion    | Anthony   |
| Galti     | Jim       |
| Keisling  | Mark      |
| Leramonth | James     |
| Mellili   | Derryn    |
| Mockridge | James     |
| Nancarral | Joshua    |
| Ritterman | James     |
+-----------+-----------+
8 rows in set (0.01 sec)
```

Sorting can be on multiple attributes. For example:

```SELECT surname, firstname, initial FROM customer
WHERE zipcode='3001' OR
zipcode='3000'
ORDER BY surname, firstname, initial;
```

This presents a list of customers in areas with `zipcode='3000'` or `zipcode='3001'`, sorted first by ascending `surname`, then (for those customers with the same surname) by `firstname`, and (for those customers with the same surname and first name), by `initial`. So, for example, the output may be:

```+-----------+-----------+---------+
| surname   | firstname | initial |
+-----------+-----------+---------+
| Keisling  | Belinda   | C       |
| Leramonth | Alexa      | D       |
| Leramonth | Joshua    | H       |
| Leramonth | Joshua    | R       |
| Young     | Bob       | A       |
+-----------+-----------+---------+
5 rows in set (0.11 sec)
```

By default, the `ORDER` `BY` clause sorts in ascending order, or `ASC`. To sort in reverse or descending order, `DESC` can be used. Consider an example:

```SELECT * FROM customer
WHERE city='Melbourne'
ORDER BY surname DESC;
```
##### GROUP BY

The `GROUP` `BY` clause is different from `ORDER` `BY` because it doesn't sort the data for output. Instead, it sorts the data early in the query process, for the purpose of grouping or aggregation. An example shows the difference:

```SELECT city, COUNT(*) FROM customer
GROUP BY city;
```

This query outputs a sorted list of cities and, for each city, the `COUNT` of the number of customers who live in that city. The effect of `COUNT(*)` is to count the number of rows per group. In this example, it doesn't matter what is counted; `COUNT(surname)` has exactly the same result.

Here are the first few lines output by the query:

```+--------------+----------+
| city         | COUNT(*) |
+--------------+----------+
| Alexandra    |       14 |
| Armidale     |        7 |
| Athlone      |        9 |
| Bauple       |        6 |
| Belmont      |       11 |
| Bentley      |       10 |
| Berala       |        9 |
```

The query aggregates or groups all the rows for each city into sets, and the `COUNT(*)` operation counts the number in each set. So, for example, there are 14 customers who live in Alexandra.

The `GROUP` `BY` clause can find different properties of the aggregated rows. Here's an example:

```SELECT city, MAX(salary) FROM customer
GROUP BY city;
```

This query first groups the rows by city and then shows the maximum salary in each city. The first few rows of the output are as follows:

```+-----------+-------------+
| city      | MAX(salary) |
+-----------+-------------+
| Alexandra |      109000 |
| Armidale  |       75000 |
| Athlone   |       84000 |
| Bauple    |       33000 |
```
 The `GROUP` `BY` clause should be used only when the query is designed to find a characteristic of a group of rows, not the details of individual rows.

There are several functions that can be used in aggregation with the `GROUP` `BY` clause. Five particularly useful functions are:

AVG( )

Finds the average value of a numeric attribute in a set

MIN( )

Finds a minimum value of a string or numeric attribute in a set

MAX( )

Finds a maximum value of a string or numeric attribute in a set

SUM( )

Finds the sum total of a numeric attribute

COUNT( )

Counts the number of rows in a set

The SQL standard places a constraint on the `GROUP` `BY` clause that MySQL doesn't enforce. In the standard, all attributes that are selected (i.e., appear after the `SELECT` statement) must appear in the `GROUP` `BY` clause. Most examples in this chapter don't meet this unnecessary constraint of the SQL standard.

##### HAVING

The `HAVING` clause permits conditional aggregation of data into groups. For example, consider the following query:

```SELECT city, count(*), max(salary)
FROM customer
GROUP BY city
HAVING count(*) > 10;
```

The query groups rows by `city`, but only for cities that have more than 10 resident customers. For those groups, the `city`, `count( )` of customers, and maximum `salary` of a customer in that city is output. Cities with less than 10 customers are omitted from the result set. The first few rows of the output are as follows:

```+--------------+----------+-------------+
| city         | count(*) | max(salary) |
+--------------+----------+-------------+
| Alexandra    |       14 |      109000 |
| Belmont      |       11 |       71000 |
| Broadmeadows |       11 |       51000 |
| Doveton      |       13 |       77000 |
| Eleker       |       11 |       97000 |
| Gray         |       12 |       77000 |
```

The `HAVING` clause must contain an attribute or expression from the `SELECT` clause.

 The `HAVING` clause is used exclusively with the `GROUP` `BY` clause. It is slow and should never be used instead of a `WHERE` clause.
##### DISTINCT

The `DISTINCT` operator presents only one example of each row from a query. Consider an example:

```SELECT DISTINCT surname FROM customer;
```

This shows one example of each different customer surname in the customer table. This example has exactly the same effect as:

```SELECT surname FROM customer GROUP BY surname;
```

The `DISTINCT` clause is usually slow to run, much like the `GROUP` `BY` clause. We discuss how indexes and query optimization can speed queries later in this chapter.

# Limiting Result Sets in MySQL

An additional operator is available in MySQL that limits the size of the result sets. For example, the following query returns only the first five rows from the wine table:

```SELECT * FROM wine LIMIT 5;
```

This saves query evaluation time and reduces the size of the result set that must be buffered by the DBMS. The `LIMIT` operator is MySQL-specific.