Choosing keys and indexes for fast searching
Elementary database-tuning techniques
Adding and deleting users of a DBMS, and changing user permissions
Limitations of MySQL
Keys, Primary Keys, and Indexes
As discussed earlier in our introduction to SQL, each table should have a
PRIMARY KEY definition as part of the
CREATE TABLE statement. A primary key is an attribute-or set of attributes-that uniquely identifies a row in a table. Storing two rows with the same primary key isn't permitted and, indeed, an attempt to
INSERT duplicate primary keys produces an error.
In MySQL, the attribute values of the primary key are stored in an index to allow fast access to a row. The default MySQL index type is fast for queries that find a specific row, a range of rows, for joins between tables, grouping data, ordering data, and finding minimum and maximum values. Indexes don't provide any speed improvement for retrieving all the rows in a table or for other query types.
Indexes are also useful for fast access to rows by values other than those that are associated with attributes in the primary key. For example, in the customer table, you might define an index by adding the clause:
TABLE statement. After you define this index, some queries that select a particular customer through a
WHERE clause can use it. Consider an example:
SELECT * FROM customer WHERE surname = 'Marzalla' AND firstname = 'Dimitria' AND city = 'St Albans';
This query can use the new index to locate-in at most a few disk accesses-the row that matches the search criteria. Without the index, the DBMS must scan all the rows in the customer table and compare each row to the
WHERE clause. This might be quite slow and certainly requires significantly more disk accesses than the index-based approach (assuming the table has more than a few rows).
A particular feature of DBMSs is that they develop a query evaluation strategy and optimize it without any interaction from the user or programmer. If an index is available, and it makes sense to use it in the context of a query, the DBMS does this automatically. All you need to do is identify which queries are common, and make an index available for those common queries by adding the
KEY clause to the
CREATE TABLE statement or using
ALTER TABLE on an existing table.
Careful index design is important. The
namecity index we have defined can also speed queries other than those that supply a complete
city. For example, consider a query:
SELECT * FROM customer WHERE surname = 'LaTrobe' AND firstname = 'Anthony';
This query can also use the index
namecity, because the index permits access to rows in sorted order first by
firstname, and then
city. With this sorting, all "LaTrobe, Anthony" index entries are clustered together in the index. Indeed, the index can also be used for the query:
SELECT * FROM customer WHERE surname LIKE 'Mar%';
Again, all surnames beginning with "Mar" are clustered together in the index. However, the index can't be used for a query such as:
SELECT * FROM customer WHERE firstname = 'Dimitria' AND city = 'St Albans';
The index can't be used because the leftmost attribute named in the index,
surname, isn't part of the
WHERE clause. In this case, all rows in the customer table must be scanned and the query is much slower (again assuming there are more than a few rows in the customer table, and assuming there is no other index).
There are other cases in which an index can't be used, such as when a query contains an
OR that isn't on an indexed attribute:
SELECT * FROM customer WHERE surname = 'Marzalla' OR email = '[email protected]';
Again, the customer table must be completely scanned, because the second condition,
email='[email protected]', requires all rows to be retrieved as there is no index available on the attribute
ORed attribute isn't the leftmost attribute in an index requires a complete scan of the customer table. The following example requires a complete scan:
SELECT * FROM customer WHERE firstname = 'Dimitria' OR surname = 'Marzalla';
If all the attributes in the index are used in all the queries, to optimize index size, the leftmost attribute in the
KEY clause should be the attribute with the highest number of duplicate entries.
Because indexes speed up queries, why not create indexes on all the attributes you can possibly search on? The answer is that while indexes are fast for searching, they consume space and require updates each time rows are added or deleted, or key attributes are changed. So, if a database is largely static, additional indexes have low overheads, but if a database changes frequently, each additional index slows the update process significantly. In either case, indexes consume additional space, and unnecessary indexes should be avoided.
One way to reduce the size of an index and speed updates is to create an index on a prefix of an attribute. Our
namecity index uses considerable space: for each row in the customer table, an index entry is up to 120 characters in length because it is created from the combined values of the
city attributes. To reduce space, you can define the index as:
 This isn't the space actually required by an index entry, because the data is compressed for storage. However, even with compression, the fewer characters indexed, the more compact the representation, the more space saved, and-depending on the usability of the index-the faster searching and updates are.
KEY namecity (surname(10),firstname(3),city(2));
This uses only the first 10 characters of
surname, 3 of
firstname, and the first 2 characters of
city to distinguish index entries. This is quite reasonable, because 10 characters from a surname distinguishes between most surnames, and the addition of a few characters from a first name and the prefix of their city should be sufficient to uniquely identify almost all customers. Having a smaller index with less information can also mean that queries are actually faster, because more index information can be retrieved from disk per second, and disk retrieval speed is almost always the bottleneck in query performance.
The space saving is significant with a reduced index. A new index entry requires only 15 characters, a saving of up to 105 characters, so index insertions, deletions, and modifications are now likely to be much faster. Note that for
BLOB attribute types, a prefix must be taken when indexing, because indexing the entire attribute is impractical and isn't permitted by the MySQL DBMS.