As discussed previously, accessing a hard disk is slow and is usually the bottleneck in DBMS performance. More specifically, disk seeking-moving the disk head to get information from another location of the disk-is the slowest component of disk access. Therefore, most techniques described in this section are also techniques that improve performance by minimizing disk space requirements.
 Reducing disk space requirements improves both disk seek and read performance. Disk read performance is improved because less data is required to be transferred, while seek performance is improved because the disk head has to move less on average when randomly accessing a smaller file than when accessing a larger file.
Here are some ways to improve DBMS performance:
Carefully choose attribute types and lengths. Where possible, use small variants such as
MEDIUMINTrather than the regular choice
INT. When using fixed-length attributes, such as
CHAR, specify a length that is as short as practical.
Use fixed-length attributes; that is, try to avoid types such as
BLOB. While fixed-length text attributes may waste space, scanning fixed-length rows in a query is much faster than scanning variable-length rows.
Design indexes with care. As discussed in the last section, keep the primary key index as small as possible, create only indexes that are needed, and use prefixes of attributes where possible. Ensure that the leftmost attribute in the index is the most frequently used in queries and, if all attributes are used, make sure the leftmost attribute is the one with the highest number of duplicate entries.
Create a statistics table if aggregate functions such as
SUM( )are frequently used in queries on large tables. A statistics table stores only one row that is manually updated with the aggregate values of another table. For example, if the statistics table maintains the count of rows in a large customer table, each time a row is inserted or deleted in the customer table, the count is updated in the statistics table. For large tables, this is often faster than calculating aggregate functions with the slow built-in functions that require complete processing of all rows.
If large numbers of rows are deleted from a table, or a table containing variable-length attributes is frequently modified, disk space may be wasted. MySQL doesn't usually remove deleted or modified data; it only marks the location as being no longer in use. Wasted space can affect access speed.
To reorganize a table-by copying data to a temporary location and back again-MySQL provides the
TABLEcommand, which should be used periodically. For example:
OPTIMIZE TABLE customer;
OPTIMIZEcommand should be run when the DBMS is offline for scheduled maintenance. The command is nonstandard SQL.
It is possible to create different table types for specific tasks. The default in MySQL is the
MyISAMtype, and all the tables described so far are this table type. For small, temporary, frequently used lookup tables, a different type, the
heaptable type, can be used. There are other types, and we briefly discuss alternatives in Chapter 6. More details are provided in Section 9.4 of the MySQL user manual.
Section 10.7 of the MySQL manual includes other excellent ideas for simple performance improvement.
Another aspect of database tuning is optimizing the performance of the DBMS itself. Included with the MySQL installation is the mysqladmin tool for database administration. Details of the system setup can be found by running the following command from a Linux shell:
% mysqladmin -p
This shows, in part, the following selected system parameters:
join_buffer current value: 131072 key_buffer current value: 8388600 net_buffer_length current value: 16384 record_buffer current value: 131072 sort_buffer current value: 2097144 table_cache current value: 64
The important parameters are those that impact disk use. MySQL has several main-memory buffer parameters that control how much data is kept in memory for processing. These include:
record_bufferfor scanning all rows in a table
key_bufferfor storing indexes in main memory
join_bufferfor joins that don't use indexes
In general, the larger these buffers, the more data from disk is cached or stored in memory and the fewer disk accesses are required. However, if the sum of these parameters is near to exceeding the size of the memory installed in the server, the underlying operating system will start to swap data between disk and memory, and the DBMS will be slow. In any case, careful experimentation based on the application is likely to improve DBMS performance.
Section 10.2.3 of the MySQL manual suggests parameter settings when starting the MySQL server. First, for machines with at least 64 MB of memory, large tables in the DBMS, and a moderate number of users, use:
safe_mysqld -O key_buffer=16M -O table_cache=128 \ -O sort_buffer=4M -O record_buffer=1M &
Second, if there is less than 64 MB of memory available, and there are many users, try the following:
safe_mysqld -O key_buffer=512k -O sort_buffer=100k \ -O record_buffer=100k &
The following setting might be appropriate for the winestore, because many users are expected, the queries are largely index-based, and the database is small:
safe_mysqld -O key_buffer=512k -O sort_buffer=16k \ -O table_cache=32 -O record_buffer=8k -O net_buffer=1K &
Even more conservative settings might also be acceptable.
There are two other parameters we have not discussed. The
table_cache parameter manages the maximum number of open tables per user connection, while the
net_buffer parameter sets the minimum size of the network query buffer in which incoming queries are kept before they are executed.
The mysqladmin utility can report the status of the DBMS:
% mysqladmin -p
The output has the following format:
Uptime: 5721024 Threads: 14 Questions: 7874982 Slow queries: 6 Opens: 115136 Flush tables: 1 Open tables: 62
This gives a brief point-in-time summary of the DBMS status and can help find more about the number of user connections, queries, and table use. Similar output can be generated by running the commands
SHOW STATUS and
SHOW VARIABLES through the MySQL command interpreter.
Information about query performance can be gained with the
benchmark( ) function, which can be used iteratively for tuning when altering table design or DBMS system parameters. The following statement illustrates benchmarking:
SELECT benchmark(10000, COUNT(*)) FROM items;
This statement reports the time taken to evaluate 10,000 calls to
COUNT( ) on the items table.