Cari di MySQL 
    MySQL Manual
Daftar Isi
(Sebelumnya) 7. Backup and Recovery8.5. Optimizing for InnoDB Tables (Berikutnya)

Chapter 8. Optimization

Daftar Isi

8.1. Optimization Overview
8.2. Optimizing SQL Statements
8.2.1. Optimizing SELECT Statements
8.2.2. Optimizing DML Statements
8.2.3. Optimizing Database Privileges
8.2.4. Optimizing INFORMATION_SCHEMA Queries
8.2.5. Other Optimization Tips
8.3. Optimization and Indexes
8.3.1. How MySQL Uses Indexes
8.3.2. Using Primary Keys
8.3.3. Using Foreign Keys
8.3.4. Column Indexes
8.3.5. Multiple-Column Indexes
8.3.6. Verifying Index Usage
8.3.7. InnoDB and MyISAM Index Statistics Collection
8.3.8. Comparison of B-Tree and Hash Indexes
8.4. Optimizing Database Structure
8.4.1. Optimizing Data Size
8.4.2. Optimizing MySQL Data Types
8.4.3. Optimizing for Many Tables
8.5. Optimizing for InnoDB Tables
8.5.1. Optimizing Storage Layout for InnoDB Tables
8.5.2. Optimizing InnoDB Transaction Management
8.5.3. Optimizing InnoDB Logging
8.5.4. Bulk Data Loading for InnoDB Tables
8.5.5. Optimizing InnoDB Queries
8.5.6. Optimizing InnoDB DDL Operations
8.5.7. Optimizing InnoDB Disk I/O
8.5.8. Optimizing InnoDB Configuration Variables
8.5.9. Optimizing InnoDB for Systems with Many Tables
8.6. Optimizing for MyISAM Tables
8.6.1. Optimizing MyISAM Queries
8.6.2. Bulk Data Loading for MyISAM Tables
8.6.3. Speed of REPAIR TABLE Statements
8.7. Optimizing for MEMORY Tables
8.8. Understanding the Query Execution Plan
8.8.1. Optimizing Queries with EXPLAIN
8.8.2. EXPLAIN Output Format
8.8.3. EXPLAIN EXTENDED Output Format
8.8.4. Estimating Query Performance
8.8.5. Controlling the Query Optimizer
8.9. Buffering and Caching
8.9.1. The InnoDB Buffer Pool
8.9.2. The MyISAM Key Cache
8.9.3. The MySQL Query Cache
8.10. Optimizing Locking Operations
8.10.1. Internal Locking Methods
8.10.2. Table Locking Issues
8.10.3. Concurrent Inserts
8.10.4. Metadata Locking Within Transactions
8.10.5. External Locking
8.11. Optimizing the MySQL Server
8.11.1. System Factors and Startup Parameter Tuning
8.11.2. Tuning Server Parameters
8.11.3. Optimizing Disk I/O
8.11.4. Optimizing Memory Use
8.11.5. Optimizing Network Use
8.11.6. The Thread Pool Plugin
8.12. Measuring Performance (Benchmarking)
8.12.1. Measuring the Speed of Expressions and Functions
8.12.2. The MySQL Benchmark Suite
8.12.3. Using Your Own Benchmarks
8.12.4. Measuring Performance with performance_schema
8.12.5. Examining Thread Information
8.13. Internal Details of MySQL Optimizations
8.13.1. Range Optimization
8.13.2. Index Merge Optimization
8.13.3. Engine Condition Pushdown Optimization
8.13.4. IS NULL Optimization
8.13.5. LEFT JOIN and RIGHT JOINOptimization
8.13.6. Nested-Loop Join Algorithms
8.13.7. Nested Join Optimization
8.13.8. Outer Join Simplification
8.13.9. ORDER BY Optimization
8.13.10. GROUP BY Optimization
8.13.11. DISTINCT Optimization
8.13.12. Optimizing Subqueries with EXISTS Strategy

This chapter explains how to optimize MySQL performance and provides examples. Optimization involves configuring, tuning, and measuring performance, at several levels. Depending on your job role (developer, DBA, or a combination of both), you might optimize at the level of individual SQL statements, entire applications, a single database server, or multiple networked database servers. Sometimes you can be proactive and plan in advance for performance, while other times you might troubleshoot a configuration or code issue after a problem occurs. Optimizing CPU and memory usage can also improve scalability, allowing the database to handle more load without slowing down.

8.1. Optimization Overview

Database performance depends on several factors at the database level, such as tables, queries, and configuration settings. These software constructs result in CPU and I/O operations at the hardware level, which you must minimize and make as efficient as possible. As you work on database performance, you start by learning the high-level rules and guidelines for the software side, and measuring performance using wall-clock time. As you become an expert, you learn more about what happens internally, and start measuring things such as CPU cycles and I/O operations.

Typical users aim to get the best database performance out of their existing software and hardware configurations. Advanced users look for opportunities to improve the MySQL software itself, or develop their own storage engines and hardware appliances to expand the MySQL ecosystem.

Optimizing at the Database Level

The most important factor in making a database application fast is its basic design:

  • Are the tables structured properly? In particular, do the columns have the right data types, and does each table have the appropriate columns for the type of work? For example, applications that perform frequent updates often have many tables with few columns, while applications that analyze large amounts of data often have few tables with many columns.

  • Are the right indexes in place to make queries efficient?

  • Are you using the appropriate storage engine for each table, and taking advantage of the strengths and features of each storage engine you use? In particular, the choice of a transactional storage engine such as InnoDB or a non-transactional one such as MyISAM can be very important for performance and scalability.

    Note

    In MySQL 5.5 and higher, InnoDB is the default storage engine for new tables. In practice, the advanced InnoDB performance features mean that InnoDB tables often outperform the simpler MyISAM tables, especially for a busy database.

  • Does each table use an appropriate row format? This choice also depends on the storage engine used for the table. In particular, compressed tables use less disk space and so require less disk I/O to read and write the data. Compression is available for all kinds of workloads with InnoDB tables, and for read-only MyISAM tables.

  • Does the application use an appropriate locking strategy? For example, by allowing shared access when possible so that database operations can run concurrently, and requesting exclusive access when appropriate so that critical operations get top priority. Again, the choice of storage engine is significant. The InnoDB storage engine handles most locking issues without involvement from you, allowing for better concurrency in the database and reducing the amount of experimentation and tuning for your code.

  • Are all memory areas used for caching sized correctly? That is, large enough to hold frequently accessed data, but not so large that they overload physical memory and cause paging. The main memory areas to configure are the InnoDB buffer pool, the MyISAM key cache, and the MySQL query cache.

Optimizing at the Hardware Level

Any database application eventually hits hardware limits as the database becomes more and more busy. A DBA must evaluate whether it is possible to tune the application or reconfigure the server to avoid these bottlenecks, or whether more hardware resources are required. System bottlenecks typically arise from these sources:

  • Disk seeks. It takes time for the disk to find a piece of data. With modern disks, the mean time for this is usually lower than 10ms, so we can in theory do about 100 seeks a second. This time improves slowly with new disks and is very hard to optimize for a single table. The way to optimize seek time is to distribute the data onto more than one disk.

  • Disk reading and writing. When the disk is at the correct position, we need to read or write the data. With modern disks, one disk delivers at least 10�20MB/s throughput. This is easier to optimize than seeks because you can read in parallel from multiple disks.

  • CPU cycles. When the data is in main memory, we must process it to get our result. Having small tables compared to the amount of memory is the most common limiting factor. But with small tables, speed is usually not the problem.

  • Memory bandwidth. When the CPU needs more data than can fit in the CPU cache, main memory bandwidth becomes a bottleneck. This is an uncommon bottleneck for most systems, but one to be aware of.

Balancing Portability and Performance

To use performance-oriented SQL extensions in a portable MySQL program, you can wrap MySQL-specific keywords in a statement within /*! */ comment delimiters. Other SQL servers ignore the commented keywords. For information about writing comments, see Section 9.6, "Comment Syntax".

8.2. Optimizing SQL Statements

The core logic of a database application is performed through SQL statements, whether issued directly through an interpreter or submitted behind the scenes through an API. The tuning guidelines in this section help to speed up all kinds of MySQL applications. The guidelines cover SQL operations that read and write data, the behind-the-scenes overhead for SQL operations in general, and operations used in specific scenarios such as database monitoring.

8.2.1. Optimizing SELECT Statements

Queries, in the form of SELECT statements, perform all the lookup operations in the database. Tuning these statements is a top priority, whether to achieve sub-second response times for dynamic web pages, or to chop hours off the time to generate huge overnight reports.

Besides SELECT statements, the tuning techniques for queries also apply to constructs such as CREATE TABLE...AS SELECT, INSERT INTO...SELECT, and WHERE clauses in DELETE statements. Those statements have additional performance considerations because they combine write operations with the read-oriented query operations.

The main considerations for optimizing queries are:

  • Set up indexes on columns used in the WHERE clause, to speed up evaluation, filtering, and the final retrieval of results. To avoid wasted disk space, construct a small set of indexes that speed up many related queries used in your application.

  • Minimize the number of table scans in your queries, particularly for big tables.

  • Keep table statistics up to date, so the optimizer has the information needed to construct an efficient execution plan.

  • Learn the tuning techniques, indexing techniques, and configuration parameters that are specific to the storage engine for each table. Both InnoDB and MyISAM have sets of guidelines for enabling and sustaining high performance in queries.

  • Isolate and tune any part of the query, such as a function call, that takes excessive time. Remember that such a function might get called millions of times for a big query or insert operation.

  • Avoid transforming the query in ways that make it hard to understand, especially if the optimizer does some of the same transformations automatically.

  • If a performance issue is not easily solved by one of the basic guidelines, investigate the internal details of the specific query by reading the EXPLAIN plan and adjusting your indexes, WHERE clauses, join clauses, and so on. (When you reach a certain level of expertise, reading the EXPLAIN plan might be your first step for every query.)

  • Adjust the size and properties of the memory areas that MySQL uses for caching. With efficient use of the InnoDB buffer pool, MyISAM key cache, and the MySQL query cache, repeated queries run faster because the results are retrieved from memory the second and subsequent times.

  • Even for a query that runs fast using the cache memory areas, you might still optimize further so that they require less cache memory, making your application more scalable. Scalability means that your application can handle more simultaneous users, larger requests, and so on without experiencing a big drop in performance.

  • Deal with locking issues, where the speed of your query might be affected by other sessions accessing the tables at the same time.

8.2.1.1. Speed of SELECT Statements

In general, when you want to make a slow SELECT ... WHERE query faster, the first thing to check is whether you can add an index. Indexes are especially important for queries that reference different tables, using features such as joins and foreign keys. You can use the EXPLAIN statement to determine which indexes are used for a SELECT. See Section 8.8.1, "Optimizing Queries with EXPLAIN", and Section 8.3.1, "How MySQL Uses Indexes".

8.2.1.2. How MySQL Optimizes WHERE Clauses

This section discusses optimizations that can be made for processing WHERE clauses. The examples use SELECT statements, but the same optimizations apply for WHERE clauses in DELETE and UPDATE statements.

Some examples of queries that are very fast:

SELECT COUNT(*) FROM tbl_name;SELECT MIN(key_part1),MAX(key_part1) FROM tbl_name;SELECT MAX(key_part2) FROM tbl_name  WHERE key_part1=constant;SELECT ... FROM tbl_name  ORDER BY key_part1,key_part2,... LIMIT 10;SELECT ... FROM tbl_name  ORDER BY key_part1 DESC, key_part2 DESC, ... LIMIT 10;

MySQL resolves the following queries using only the entries from a secondary index, if the indexed columns are numeric:

SELECT key_part1,key_part2 FROM tbl_name WHERE key_part1=val;SELECT COUNT(*) FROM tbl_name  WHERE key_part1=val1 AND key_part2=val2;SELECT key_part2 FROM tbl_name GROUP BY key_part1;

The following queries use the index data to retrieve the rows in sorted order without a separate sorting pass:

SELECT ... FROM tbl_name  ORDER BY key_part1,key_part2,... ;SELECT ... FROM tbl_name  ORDER BY key_part1 DESC, key_part2 DESC, ... ;

You might be tempted to rewrite your queries to make arithmetic operations faster, while sacrificing readability. Because MySQL does similar optimizations automatically, you can often avoid this work, and leave the query in a more understandable and maintainable form. Some of the optimizations performed by MySQL follow:

Note

Because work on the MySQL optimizer is ongoing, not all of the optimizations that MySQL performs are documented here.

  • Removal of unnecessary parentheses:

       ((a AND b) AND c OR (((a AND b) AND (c AND d))))-> (a AND b AND c) OR (a AND b AND c AND d)
  • Constant folding:

       (a<b AND b=c) AND a=5-> b>5 AND b=c AND a=5
  • Constant condition removal (needed because of constant folding):

       (B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)-> B=5 OR B=6
  • Constant expressions used by indexes are evaluated only once.

  • COUNT(*) on a single table without a WHERE is retrieved directly from the table information for MyISAM and MEMORY tables. This is also done for any NOT NULL expression when used with only one table.

  • Early detection of invalid constant expressions. MySQL quickly detects that some SELECT statements are impossible and returns no rows.

  • HAVING is merged with WHERE if you do not use GROUP BY or aggregate functions (COUNT(), MIN(), and so on).

  • For each table in a join, a simpler WHERE is constructed to get a fast WHERE evaluation for the table and also to skip rows as soon as possible.

  • All constant tables are read first before any other tables in the query. A constant table is any of the following:

    • An empty table or a table with one row.

    • A table that is used with a WHERE clause on a PRIMARY KEY or a UNIQUE index, where all index parts are compared to constant expressions and are defined as NOT NULL.

    All of the following tables are used as constant tables:

    SELECT * FROM t WHERE primary_key=1;SELECT * FROM t1,t2  WHERE t1.primary_key=1 AND t2.primary_key=t1.id;
  • The best join combination for joining the tables is found by trying all possibilities. If all columns in ORDER BY and GROUP BY clauses come from the same table, that table is preferred first when joining.

  • If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.

  • If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table.

  • Each table index is queried, and the best index is used unless the optimizer believes that it is more efficient to use a table scan. At one time, a scan was used based on whether the best index spanned more than 30% of the table, but a fixed percentage no longer determines the choice between using an index or a scan. The optimizer now is more complex and bases its estimate on additional factors such as table size, number of rows, and I/O block size.

  • MySQL can sometimes produce query results using data from the index, without consulting the table data. If all columns used from the index are numeric, only the index data is used to resolve the query.

  • Before each row is output, those that do not match the HAVING clause are skipped.

8.2.1.3. Optimizing LIMIT Queries

If you need only a specified number of rows from a result set, use a LIMIT clause in the query, rather than fetching the whole result set and throwing away the extra data.

MySQL sometimes optimizes a query that has a LIMIT row_count clause and no HAVING clause:

  • If you select only a few rows with LIMIT, MySQL uses indexes in some cases when normally it would prefer to do a full table scan.

  • If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. If a filesort must be done, all rows that match the query without the LIMIT clause are selected, and most or all of them are sorted, before the first row_count are found. After the initial rows have been found, MySQL does not sort any remainder of the result set.

  • When combining LIMIT row_count with DISTINCT, MySQL stops as soon as it finds row_count unique rows.

  • In some cases, a GROUP BY can be resolved by reading the key in order (or doing a sort on the key) and then calculating summaries until the key value changes. In this case, LIMIT row_count does not calculate any unnecessary GROUP BY values.

  • As soon as MySQL has sent the required number of rows to the client, it aborts the query unless you are using SQL_CALC_FOUND_ROWS.

  • LIMIT 0 quickly returns an empty set. This can be useful for checking the validity of a query. When using one of the MySQL APIs, it can also be employed for obtaining the types of the result columns. (This trick does not work in the MySQL Monitor (the mysql program), which merely displays Empty set in such cases; instead, use SHOW COLUMNS or DESCRIBE for this purpose.)

  • When the server uses temporary tables to resolve the query, it uses the LIMIT row_count clause to calculate how much space is required.

8.2.1.4. How to Avoid Full Table Scans

Frequently, a full table scan is a danger sign that a query can be speeded up significantly. For tables with more than a few rows, consider redesigning the query by adding an index for one or more of the columns tested in the WHERE clause. Put extra effort into avoiding table scans for queries that perform joins or reference foreign keys. If the nature of the data means there is no way to avoid reading all the rows, then it might not be practical to make the query faster, or making it faster might involve extensive restructuring of your tables that is beyond the scope of this section.

The output from EXPLAIN shows ALL in the type column when MySQL uses a table scan to resolve a query. This usually happens under the following conditions:

  • The ON or WHERE clauses do not reference any indexed columns that the query can use. Consider adding an index, or refining those clauses to refer to an indexed column.

  • The table is so small that it is faster to perform a table scan than to bother with a key lookup. This is common for tables with fewer than 10 rows and a short row length. Don't worry in this case.

  • You are comparing indexed columns with constant values and MySQL has calculated (based on the index tree) that the constants cover too large a part of the table and that a table scan would be faster. See Section 8.2.1.2, "How MySQL Optimizes WHERE Clauses". For example, to query census data only for males or only for females, MySQL must read most of the data blocks in the table, so locating the rows through the index would add unnecessary overhead. Don't worry if you encounter this condition for occasional big reports. If these reports are frequent or truly time-critical, and the table is huge, you might partition, shard, or create dimension tables using the relevant column.

  • You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it probably will do many key lookups and that a table scan would be faster.

For small tables, a table scan often is appropriate and the performance impact is negligible. For large tables, try the following techniques to avoid having the optimizer incorrectly choose a table scan:

  • Minimize the OR keywords in your WHERE clauses. If there is no index that helps to locate the values on both sides of the OR, any row could potentially be part of the result set, so all rows must be tested, and that requires a full table scan. If you have one index that helps to optimize one side of an OR query, and a different index that helps to optimize the other side, use a UNION operator to run separate fast queries and merge the results afterward.

  • With tables that use the MEMORY storage engine, if you run queries that examine ranges of values (using operators such as >, <=, or BETWEEN on the indexed columns), create the index with the USING BTREE clause. The default (USING HASH) is fast for retrieving individual rows with an equality operator (= or <=>), but is much slower (requiring a full table scan) to examine a range of column values. A MEMORY table created with the USING BTREE clause is still fast for equality comparisons, so use that clause for your MEMORY tables that handle a variety of queries.

  • Use ANALYZE TABLE tbl_name to update the key distributions for the scanned table. See Section 13.7.2.1, "ANALYZE TABLE Syntax".

  • Use FORCE INDEX for the scanned table to tell MySQL that table scans are very expensive compared to using the given index:

    SELECT * FROM t1, t2 FORCE INDEX (index_for_column)  WHERE t1.col_name=t2.col_name;

    See Section 13.2.9.3, "Index Hint Syntax".

  • Start mysqld with the --max-seeks-for-key=1000 option or use SET max_seeks_for_key=1000 to tell the optimizer to assume that no key scan causes more than 1,000 key seeks. See Section 5.1.4, "Server System Variables".

8.2.2. Optimizing DML Statements

This section explains how to speed up the data manipulation language (DML) statements, INSERT, UPDATE, and DELETE. Traditional OLTP applications and modern web applications typically do many small DML operations, where concurrency is vital. Data analysis and reporting applications typically run DML operations that affect many rows at once, where the main considerations is the I/O to write large amounts of data and keep indexes up-to-date. For inserting and updating large volumes of data (known in the industry as ETL, for "extract-transform-load"), sometimes you use other SQL statements or external commands, that mimic the effects of INSERT, UPDATE, and DELETE statements.

8.2.2.1. Speed of INSERT Statements

To optimize insert speed, combine many small operations into a single large operation. Ideally, you make a single connection, send the data for many new rows at once, and delay all index updates and consistency checking until the very end.

The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:

  • Connecting: (3)

  • Sending query to server: (2)

  • Parsing query: (2)

  • Inserting row: (1 � size of row)

  • Inserting indexes: (1 � number of indexes)

  • Closing: (1)

This does not take into consideration the initial overhead to open tables, which is done once for each concurrently running query.

The size of the table slows down the insertion of indexes by log N, assuming B-tree indexes.

You can use the following methods to speed up inserts:

8.2.2.2. Speed of UPDATE Statements

An update statement is optimized like a SELECT query with the additional overhead of a write. The speed of the write depends on the amount of data being updated and the number of indexes that are updated. Indexes that are not changed do not get updated.

Another way to get fast updates is to delay updates and then do many updates in a row later. Performing multiple updates together is much quicker than doing one at a time if you lock the table.

For a MyISAM table that uses dynamic row format, updating a row to a longer total length may split the row. If you do this often, it is very important to use OPTIMIZE TABLE occasionally. See Section 13.7.2.4, "OPTIMIZE TABLE Syntax".

8.2.2.3. Speed of DELETE Statements

The time required to delete individual rows in a MyISAM table is exactly proportional to the number of indexes. To delete rows more quickly, you can increase the size of the key cache by increasing the key_buffer_size system variable. See Section 8.11.2, "Tuning Server Parameters".

To delete all rows from a MyISAM table, TRUNCATE TABLE tbl_name is faster than than DELETE FROM tbl_name. Truncate operations are not transaction-safe; an error occurs when attempting one in the course of an active transaction or active table lock. See Section 13.1.33, "TRUNCATE TABLE Syntax".

8.2.3. Optimizing Database Privileges

The more complex your privilege setup, the more overhead applies to all SQL statements. Simplifying the privileges established by GRANT statements enables MySQL to reduce permission-checking overhead when clients execute statements. For example, if you do not grant any table-level or column-level privileges, the server need not ever check the contents of the tables_priv and columns_priv tables. Similarly, if you place no resource limits on any accounts, the server does not have to perform resource counting. If you have a very high statement-processing load, consider using a simplified grant structure to reduce permission-checking overhead.

8.2.4. Optimizing INFORMATION_SCHEMA Queries

Applications that monitor the database can make frequent use of the INFORMATION_SCHEMA tables. Certain types of queries for INFORMATION_SCHEMA tables can be optimized to execute more quickly. The goal is to minimize file operations (for example, scanning a directory or opening a table file) to collect the information that makes up these dynamic tables. These optimizations do have an effect on how collations are used for searches in INFORMATION_SCHEMA tables. For more information, see Section 10.1.7.9, "Collation and INFORMATION_SCHEMA Searches".

1) Try to use constant lookup values for database and table names in the WHERE clause

You can take advantage of this principle as follows:

  • To look up databases or tables, use expressions that evaluate to a constant, such as literal values, functions that return a constant, or scalar subqueries.

  • Avoid queries that use a nonconstant database name lookup value (or no lookup value) because they require a scan of the data directory to find matching database directory names.

  • Within a database, avoid queries that use a nonconstant table name lookup value (or no lookup value) because they require a scan of the database directory to find matching table files.

This principle applies to the INFORMATION_SCHEMA tables shown in the following table, which shows the columns for which a constant lookup value enables the server to avoid a directory scan. For example, if you are selecting from TABLES, using a constant lookup value for TABLE_SCHEMA in the WHERE clause enables a data directory scan to be avoided.

TableColumn to specify to avoid data directory scanColumn to specify to avoid database directory scan
COLUMNSTABLE_SCHEMATABLE_NAME
KEY_COLUMN_USAGETABLE_SCHEMATABLE_NAME
PARTITIONSTABLE_SCHEMATABLE_NAME
REFERENTIAL_CONSTRAINTSCONSTRAINT_SCHEMATABLE_NAME
STATISTICSTABLE_SCHEMATABLE_NAME
TABLESTABLE_SCHEMATABLE_NAME
TABLE_CONSTRAINTSTABLE_SCHEMATABLE_NAME
TRIGGERSEVENT_OBJECT_SCHEMAEVENT_OBJECT_TABLE
VIEWSTABLE_SCHEMATABLE_NAME

The benefit of a query that is limited to a specific constant database name is that checks need be made only for the named database directory. Example:

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLESWHERE TABLE_SCHEMA = 'test';

Use of the literal database name test enables the server to check only the test database directory, regardless of how many databases there might be. By contrast, the following query is less efficient because it requires a scan of the data directory to determine which database names match the pattern 'test%':

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLESWHERE TABLE_SCHEMA LIKE 'test%';

For a query that is limited to a specific constant table name, checks need be made only for the named table within the corresponding database directory. Example:

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLESWHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1';

Use of the literal table name t1 enables the server to check only the files for the t1 table, regardless of how many tables there might be in the test database. By contrast, the following query requires a scan of the test database directory to determine which table names match the pattern 't%':

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLESWHERE TABLE_SCHEMA = 'test' AND TABLE_NAME LIKE 't%';

The following query requires a scan of the database directory to determine matching database names for the pattern 'test%', and for each matching database, it requires a scan of the database directory to determine matching table names for the pattern 't%':

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLESWHERE TABLE_SCHEMA = 'test%' AND TABLE_NAME LIKE 't%';

2) Write queries that minimize the number of table files that must be opened

For queries that refer to certain INFORMATION_SCHEMA table columns, several optimizations are available that minimize the number of table files that must be opened. Example:

SELECT TABLE_NAME, ENGINE FROM INFORMATION_SCHEMA.TABLESWHERE TABLE_SCHEMA = 'test';

In this case, after the server has scanned the database directory to determine the names of the tables in the database, those names become available with no further file system lookups. Thus, TABLE_NAME requires no files to be opened. The ENGINE (storage engine) value can be determined by opening the table's .frm file, without touching other table files such as the .MYD or .MYI file.

Some values, such as INDEX_LENGTH for MyISAM tables, require opening the .MYD or .MYI file as well.

The file-opening optimization types are denoted thus:

  • SKIP_OPEN_TABLE: Table files do not need to be opened. The information has already become available within the query by scanning the database directory.

  • OPEN_FRM_ONLY: Only the table's .frm file need be opened.

  • OPEN_TRIGGER_ONLY: Only the table's .TRG file need be opened.

  • OPEN_FULL_TABLE: The unoptimized information lookup. The .frm, .MYD, and .MYI files must be opened.

The following list indicates how the preceding optimization types apply to INFORMATION_SCHEMA table columns. For tables and columns not named, none of the optimizations apply.

  • COLUMNS: OPEN_FRM_ONLY applies to all columns

  • KEY_COLUMN_USAGE: OPEN_FULL_TABLE applies to all columns

  • PARTITIONS: OPEN_FULL_TABLE applies to all columns

  • REFERENTIAL_CONSTRAINTS: OPEN_FULL_TABLE applies to all columns

  • STATISTICS:

    ColumnOptimization type
    TABLE_CATALOGOPEN_FRM_ONLY
    TABLE_SCHEMAOPEN_FRM_ONLY
    TABLE_NAMEOPEN_FRM_ONLY
    NON_UNIQUEOPEN_FRM_ONLY
    INDEX_SCHEMAOPEN_FRM_ONLY
    INDEX_NAMEOPEN_FRM_ONLY
    SEQ_IN_INDEXOPEN_FRM_ONLY
    COLUMN_NAMEOPEN_FRM_ONLY
    COLLATIONOPEN_FRM_ONLY
    CARDINALITYOPEN_FULL_TABLE
    SUB_PARTOPEN_FRM_ONLY
    PACKEDOPEN_FRM_ONLY
    NULLABLEOPEN_FRM_ONLY
    INDEX_TYPEOPEN_FULL_TABLE
    COMMENTOPEN_FRM_ONLY
  • TABLES:

    ColumnOptimization type
    TABLE_CATALOGSKIP_OPEN_TABLE
    TABLE_SCHEMASKIP_OPEN_TABLE
    TABLE_NAMESKIP_OPEN_TABLE
    TABLE_TYPEOPEN_FRM_ONLY
    ENGINEOPEN_FRM_ONLY
    VERSIONOPEN_FRM_ONLY
    ROW_FORMATOPEN_FULL_TABLE
    TABLE_ROWSOPEN_FULL_TABLE
    AVG_ROW_LENGTHOPEN_FULL_TABLE
    DATA_LENGTHOPEN_FULL_TABLE
    MAX_DATA_LENGTHOPEN_FULL_TABLE
    INDEX_LENGTHOPEN_FULL_TABLE
    DATA_FREEOPEN_FULL_TABLE
    AUTO_INCREMENTOPEN_FULL_TABLE
    CREATE_TIMEOPEN_FULL_TABLE
    UPDATE_TIMEOPEN_FULL_TABLE
    CHECK_TIMEOPEN_FULL_TABLE
    TABLE_COLLATIONOPEN_FRM_ONLY
    CHECKSUMOPEN_FULL_TABLE
    CREATE_OPTIONSOPEN_FRM_ONLY
    TABLE_COMMENTOPEN_FRM_ONLY
  • TABLE_CONSTRAINTS: OPEN_FULL_TABLE applies to all columns

  • TRIGGERS: OPEN_TRIGGER_ONLY applies to all columns

  • VIEWS:

    ColumnOptimization type
    TABLE_CATALOGOPEN_FRM_ONLY
    TABLE_SCHEMAOPEN_FRM_ONLY
    TABLE_NAMEOPEN_FRM_ONLY
    VIEW_DEFINITIONOPEN_FRM_ONLY
    CHECK_OPTIONOPEN_FRM_ONLY
    IS_UPDATABLEOPEN_FULL_TABLE
    DEFINEROPEN_FRM_ONLY
    SECURITY_TYPEOPEN_FRM_ONLY
    CHARACTER_SET_CLIENTOPEN_FRM_ONLY
    COLLATION_CONNECTIONOPEN_FRM_ONLY

3) Use EXPLAIN to determine whether the server can use INFORMATION_SCHEMA optimizations for a query

This applies particularly for INFORMATION_SCHEMA queries that search for information from more than one database, which might take a long time and impact performance. The Extra value in EXPLAIN output indicates which, if any, of the optimizations described earlier the server can use to evaluate INFORMATION_SCHEMA queries. The following examples demonstrate the kinds of information you can expect to see in the Extra value.

mysql> EXPLAIN SELECT TABLE_NAME FROM INFORMATION_SCHEMA.VIEWS WHERE -> TABLE_SCHEMA = 'test' AND TABLE_NAME = 'v1'\G*************************** 1. row ***************************   id: 1  select_type: SIMPLE table: VIEWS type: ALLpossible_keys: NULL  key: TABLE_SCHEMA,TABLE_NAME  key_len: NULL  ref: NULL rows: NULL Extra: Using where; Open_frm_only; Scanned 0 databases

Use of constant database and table lookup values enables the server to avoid directory scans. For references to VIEWS.TABLE_NAME, only the .frm file need be opened.

mysql> EXPLAIN SELECT TABLE_NAME, ROW_FORMAT FROM INFORMATION_SCHEMA.TABLES\G*************************** 1. row ***************************   id: 1  select_type: SIMPLE table: TABLES type: ALLpossible_keys: NULL  key: NULL  key_len: NULL  ref: NULL rows: NULL Extra: Open_full_table; Scanned all databases

No lookup values are provided (there is no WHERE clause), so the server must scan the data directory and each database directory. For each table thus identified, the table name and row format are selected. TABLE_NAME requires no further table files to be opened (the SKIP_OPEN_TABLE optimization applies). ROW_FORMAT requires all table files to be opened (OPEN_FULL_TABLE applies). EXPLAIN reports OPEN_FULL_TABLE because it is more expensive than SKIP_OPEN_TABLE.

mysql> EXPLAIN SELECT TABLE_NAME, TABLE_TYPE FROM INFORMATION_SCHEMA.TABLES -> WHERE TABLE_SCHEMA = 'test'\G*************************** 1. row ***************************   id: 1  select_type: SIMPLE table: TABLES type: ALLpossible_keys: NULL  key: TABLE_SCHEMA  key_len: NULL  ref: NULL rows: NULL Extra: Using where; Open_frm_only; Scanned 1 database

No table name lookup value is provided, so the server must scan the test database directory. For the TABLE_NAME and TABLE_TYPE columns, the SKIP_OPEN_TABLE and OPEN_FRM_ONLY optimizations apply, respectively. EXPLAIN reports OPEN_FRM_ONLY because it is more expensive.

mysql> EXPLAIN SELECT B.TABLE_NAME -> FROM INFORMATION_SCHEMA.TABLES AS A, INFORMATION_SCHEMA.COLUMNS AS B -> WHERE A.TABLE_SCHEMA = 'test' -> AND A.TABLE_NAME = 't1' -> AND B.TABLE_NAME = A.TABLE_NAME\G*************************** 1. row ***************************   id: 1  select_type: SIMPLE table: A type: ALLpossible_keys: NULL  key: TABLE_SCHEMA,TABLE_NAME  key_len: NULL  ref: NULL rows: NULL Extra: Using where; Skip_open_table; Scanned 0 databases*************************** 2. row ***************************   id: 1  select_type: SIMPLE table: B type: ALLpossible_keys: NULL  key: NULL  key_len: NULL  ref: NULL rows: NULL Extra: Using where; Open_frm_only; Scanned all databases;   Using join buffer

For the first EXPLAIN output row: Constant database and table lookup values enable the server to avoid directory scans for TABLES values. References to TABLES.TABLE_NAME require no further table files.

For the second EXPLAIN output row: All COLUMNS table values are OPEN_FRM_ONLY lookups, so COLUMNS.TABLE_NAME requires the .frm file to be opened.

mysql> EXPLAIN SELECT * FROM INFORMATION_SCHEMA.COLLATIONS\G*************************** 1. row ***************************   id: 1  select_type: SIMPLE table: COLLATIONS type: ALLpossible_keys: NULL  key: NULL  key_len: NULL  ref: NULL rows: NULL Extra:

In this case, no optimizations apply because COLLATIONS is not one of the INFORMATION_SCHEMA tables for which optimizations are available.

8.2.5. Other Optimization Tips

This section lists a number of miscellaneous tips for improving query processing speed:

  • If your application makes several database requests to perform related updates, combining the statements into a stored routine can help performance. Similarly, if your application computes a single result based on several column values or large volumes of data, combining the computation into a UDF (user-defined function) can help performance. The resulting fast database operations are then available to be reused by other queries, applications, and even code written in different programming languages. See Section 19.2, "Using Stored Routines (Procedures and Functions)" and Section 23.3, "Adding New Functions to MySQL" for more information.

  • To fix any compression issues that occur with ARCHIVE tables, use OPTIMIZE TABLE. See Section 14.8, "The ARCHIVE Storage Engine".

  • If possible, classify reports as "live" or as "statistical", where data needed for statistical reports is created only from summary tables that are generated periodically from the live data.

  • If you have data that does not conform well to a rows-and-columns table structure, you can pack and store data into a BLOB column. In this case, you must provide code in your application to pack and unpack information, but this might save I/O operations to read and write the sets of related values.

  • With Web servers, store images and other binary assets as files, with the path name stored in the database rather than the file itself. Most Web servers are better at caching files than database contents, so using files is generally faster. (Although you must handle backups and storage issues yourself in this case.)

  • If you need really high speed, look at the low-level MySQL interfaces. For example, by accessing the MySQL InnoDB or MyISAM storage engine directly, you could get a substantial speed increase compared to using the SQL interface.

  • Replication can provide a performance benefit for some operations. You can distribute client retrievals among replication servers to split up the load. To avoid slowing down the master while making backups, you can make backups using a slave server. See Chapter 16, Replication.

8.3. Optimization and Indexes

The best way to improve the performance of SELECT operations is to create indexes on one or more of the columns that are tested in the query. The index entries act like pointers to the table rows, allowing the query to quickly determine which rows match a condition in the WHERE clause, and retrieve the other column values for those rows. All MySQL data types can be indexed.

Although it can be tempting to create an indexes for every possible column used in a query, unnecessary indexes waste space and waste time for MySQL to determine which indexes to use. You must find the right balance to achieve fast queries using the optimal set of indexes.

8.3.1. How MySQL Uses Indexes

Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. If a table has 1,000 rows, this is at least 100 times faster than reading sequentially.

Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees. Exceptions are that indexes on spatial data types use R-trees, and that MEMORY tables also support hash indexes.

In general, indexes are used as described in the following discussion. Characteristics specific to hash indexes (as used in MEMORY tables) are described at the end of this section.

MySQL uses indexes for these operations:

  • To find the rows matching a WHERE clause quickly.

  • To eliminate rows from consideration. If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows (the most selective index).

  • To retrieve rows from other tables when performing joins. MySQL can use indexes on columns more efficiently if they are declared as the same type and size. In this context, VARCHAR and CHAR are considered the same if they are declared as the same size. For example, VARCHAR(10) and CHAR(10) are the same size, but VARCHAR(10) and CHAR(15) are not.

    Comparison of dissimilar columns may prevent use of indexes if values cannot be compared directly without conversion. Suppose that a numeric column is compared to a string column. For a given value such as 1 in the numeric column, it might compare equal to any number of values in the string column such as '1', ' 1', '00001', or '01.e1'. This rules out use of any indexes for the string column.

  • To find the MIN() or MAX() value for a specific indexed column key_col. This is optimized by a preprocessor that checks whether you are using WHERE key_part_N = constant on all key parts that occur before key_col in the index. In this case, MySQL does a single key lookup for each MIN() or MAX() expression and replaces it with a constant. If all expressions are replaced with constants, the query returns at once. For example:

    SELECT MIN(key_part2),MAX(key_part2)  FROM tbl_name WHERE key_part1=10;
  • To sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable key (for example, ORDER BY key_part1, key_part2). If all key parts are followed by DESC, the key is read in reverse order. See Section 8.13.9, "ORDER BY Optimization", and Section 8.13.10, "GROUP BY Optimization".

  • In some cases, a query can be optimized to retrieve values without consulting the data rows. (An index that provides all the necessary results for a query is called a covering index.) If a query uses only columns from a table that are numeric and that form a leftmost prefix for some key, the selected values can be retrieved from the index tree for greater speed:

    SELECT key_part3 FROM tbl_name  WHERE key_part1=1

Indexes are less important for queries on small tables, or big tables where report queries process most or all of the rows. When a query needs to access most of the rows, reading sequentially is faster than working through an index. Sequential reads minimize disk seeks, even if not all the rows are needed for the query. See Section 8.2.1.4, "How to Avoid Full Table Scans" for details.

8.3.2. Using Primary Keys

The primary key for a table represents the column or set of columns that you use in your most vital queries. It has an associated index, for fast query performance. Query performance benefits from the NOT NULL optimization, because it cannot include any NULL values. With the InnoDB storage engine, the table data is physically organized to do ultra-fast lookups and sorts based on the primary key column or columns.

If your table is big and important, but does not have an obvious column or set of columns to use as a primary key, you might create a separate column with auto-increment values to use as the primary key. These unique IDs can serve as pointers to corresponding rows in other tables when you join tables using foreign keys.

8.3.3. Using Foreign Keys

If a table has many columns, and you query many different combinations of columns, it might be efficient to split the less-frequently used data into separate tables with a few columns each, and relate them back to the main table by duplicating the numeric ID column from the main table. That way, each small table can have a primary key for fast lookups of its data, and you can query just the set of columns that you need using a join operation. Depending on how the data is distributed, the queries might perform less I/O and take up less cache memory because the relevant columns are packed together on disk. (To maximize performance, queries try to read as few data blocks as possible from disk; tables with only a few columns can fit more rows in each data block.)

8.3.4. Column Indexes

The most common type of index involves a single column, storing copies of the values from that column in a data structure, allowing fast lookups for the rows with the corresponding column values. The B-tree data structure lets the index quickly find a specific value, a set of values, or a range of values, corresponding to operators such as =, >, ≤, BETWEEN, IN, and so on, in a WHERE clause.

The maximum number of indexes per table and the maximum index length is defined per storage engine. See Chapter 14, Storage Engines. All storage engines support at least 16 indexes per table and a total index length of at least 256 bytes. Most storage engines have higher limits.

Prefix Indexes

With col_name(N) syntax in an index specification, you can create an index that uses only the first N characters of a string column. Indexing only a prefix of column values in this way can make the index file much smaller. When you index a BLOB or TEXT column, you must specify a prefix length for the index. For example:

CREATE TABLE test (blob_col BLOB, INDEX(blob_col(10)));

Prefixes can be up to 1000 bytes long (767 bytes for InnoDB tables).

Note

Prefix limits are measured in bytes, while the prefix length in CREATE TABLE statements is interpreted as number of characters. Take this into account when specifying a prefix length for a column that uses a multi-byte character set.

FULLTEXT Indexes

You can also create FULLTEXT indexes. These are used for full-text searches. Only the MyISAM storage engine supports FULLTEXT indexes and only for CHAR, VARCHAR, and TEXT columns. Indexing always takes place over the entire column and column prefix indexing is not supported. For details, see Section 12.9, "Full-Text Search Functions".

Spatial Indexes

You can also create indexes on spatial data types. Currently, only MyISAM supports R-tree indexes on spatial types. Other storage engines use B-trees for indexing spatial types (except for ARCHIVE, which does not support spatial type indexing).

Indexes in the MEMORY Storage Engine

The MEMORY storage engine uses HASH indexes by default, but also supports BTREE indexes.

8.3.5. Multiple-Column Indexes

MySQL can create composite indexes (that is, indexes on multiple columns). An index may consist of up to 16 columns. For certain data types, you can index a prefix of the column (see Section 8.3.4, "Column Indexes").

MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.

A multiple-column index can be considered a sorted array, the rows of which contain values that are created by concatenating the values of the indexed columns.

Note

As an alternative to a composite index, you can introduce a column that is "hashed" based on information from other columns. If this column is short, reasonably unique, and indexed, it might be faster than a "wide" index on many columns. In MySQL, it is very easy to use this extra column:

SELECT * FROM tbl_name  WHERE hash_col=MD5(CONCAT(val1,val2))  AND col1=val1 AND col2=val2;

Suppose that a table has the following specification:

CREATE TABLE test ( id INT NOT NULL, last_name  CHAR(30) NOT NULL, first_name CHAR(30) NOT NULL, PRIMARY KEY (id), INDEX name (last_name,first_name));

The name index is an index over the last_name and first_name columns. The index can be used for lookups in queries that specify values in a known range for combinations of last_name and first_name values. It can also be used for queries that specify just a last_name value because that column is a leftmost prefix of the index (see Section 8.3.5, "Multiple-Column Indexes"). Therefore, the name index is used for lookups in the following queries:

SELECT * FROM test WHERE last_name='Widenius';SELECT * FROM test  WHERE last_name='Widenius' AND first_name='Michael';SELECT * FROM test  WHERE last_name='Widenius'  AND (first_name='Michael' OR first_name='Monty');SELECT * FROM test  WHERE last_name='Widenius'  AND first_name >='M' AND first_name < 'N';

However, the name index is not used for lookups in the following queries:

SELECT * FROM test WHERE first_name='Michael';SELECT * FROM test  WHERE last_name='Widenius' OR first_name='Michael';

Suppose that you issue the following SELECT statement:

mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer attempts to use the Index Merge optimization (see Section 8.13.2, "Index Merge Optimization"), or attempts to find the most restrictive index by deciding which index excludes more rows and using that index to fetch the rows.

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

MySQL cannot use the index to perform lookups if the columns do not form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here:

SELECT * FROM tbl_name WHERE col1=val1;SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;SELECT * FROM tbl_name WHERE col2=val2;SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;

If an index exists on (col1, col2, col3), only the first two queries use the index. The third and fourth queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).

8.3.6. Verifying Index Usage

Always check whether all your queries really use the indexes that you have created in the tables. Use the EXPLAIN statement, as described in Section 8.8.1, "Optimizing Queries with EXPLAIN".

8.3.7. InnoDB and MyISAM Index StatisticsCollection

Storage engines collect statistics about tables for use by the optimizer. Table statistics are based on value groups, where a value group is a set of rows with the same key prefix value. For optimizer purposes, an important statistic is the average value group size.

MySQL uses the average value group size in the following ways:

  • To estimate how may rows must be read for each ref access

  • To estimate how many row a partial join will produce; that is, the number of rows that an operation of this form will produce:

    (...) JOIN tbl_name ON tbl_name.key = expr

As the average value group size for an index increases, the index is less useful for those two purposes because the average number of rows per lookup increases: For the index to be good for optimization purposes, it is best that each index value target a small number of rows in the table. When a given index value yields a large number of rows, the index is less useful and MySQL is less likely to use it.

The average value group size is related to table cardinality, which is the number of value groups. The SHOW INDEX statement displays a cardinality value based on N/S, where N is the number of rows in the table and S is the average value group size. That ratio yields an approximate number of value groups in the table.

For a join based on the <=> comparison operator, NULL is not treated differently from any other value: NULL <=> NULL, just as N <=> N for any other N.

However, for a join based on the = operator, NULL is different from non-NULL values: expr1 = expr2 is not true when expr1 or expr2 (or both) are NULL. This affects ref accesses for comparisons of the form tbl_name.key = expr: MySQL will not access the table if the current value of expr is NULL, because the comparison cannot be true.

For = comparisons, it does not matter how many NULL values are in the table. For optimization purposes, the relevant value is the average size of the non-NULL value groups. However, MySQL does not currently enable that average size to be collected or used.

For InnoDB and MyISAM tables, you have some control over collection of table statistics by means of the innodb_stats_method and myisam_stats_method system variables, respectively. These variables have three possible values, which differ as follows:

  • When the variable is set to nulls_equal, all NULL values are treated as identical (that is, they all form a single value group).

    If the NULL value group size is much higher than the average non-NULL value group size, this method skews the average value group size upward. This makes index appear to the optimizer to be less useful than it really is for joins that look for non-NULL values. Consequently, the nulls_equal method may cause the optimizer not to use the index for ref accesses when it should.

  • When the variable is set to nulls_unequal, NULL values are not considered the same. Instead, each NULL value forms a separate value group of size 1.

    If you have many NULL values, this method skews the average value group size downward. If the average non-NULL value group size is large, counting NULL values each as a group of size 1 causes the optimizer to overestimate the value of the index for joins that look for non-NULL values. Consequently, the nulls_unequal method may cause the optimizer to use this index for ref lookups when other methods may be better.

  • When the variable is set to nulls_ignored, NULL values are ignored.

If you tend to use many joins that use <=> rather than =, NULL values are not special in comparisons and one NULL is equal to another. In this case, nulls_equal is the appropriate statistics method.

The innodb_stats_method system variable has a global value; the myisam_stats_method system variable has both global and session values. Setting the global value affects statistics collection for tables from the corresponding storage engine. Setting the session value affects statistics collection only for the current client connection. This means that you can force a table's statistics to be regenerated with a given method without affecting other clients by setting the session value of myisam_stats_method.

To regenerate table statistics, you can use any of the following methods:

Some caveats regarding the use of innodb_stats_method and myisam_stats_method:

  • You can force table statistics to be collected explicitly, as just described. However, MySQL may also collect statistics automatically. For example, if during the course of executing statements for a table, some of those statements modify the table, MySQL may collect statistics. (This may occur for bulk inserts or deletes, or some ALTER TABLE statements, for example.) If this happens, the statistics are collected using whatever value innodb_stats_method or myisam_stats_method has at the time. Thus, if you collect statistics using one method, but the system variable is set to the other method when a table's statistics are collected automatically later, the other method will be used.

  • There is no way to tell which method was used to generate statistics for a given table.

  • These variables apply only to InnoDB and MyISAM tables. Other storage engines have only one method for collecting table statistics. Usually it is closer to the nulls_equal method.

8.3.8. Comparison of B-Tree and Hash Indexes

Understanding the B-tree and hash data structures can help predict how different queries perform on different storage engines that use these data structures in their indexes, particularly for the MEMORY storage engine that lets you choose B-tree or hash indexes.

B-Tree Index Characteristics

A B-tree index can be used for column comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN operators. The index also can be used for LIKE comparisons if the argument to LIKE is a constant string that does not start with a wildcard character. For example, the following SELECT statements use indexes:

SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';SELECT * FROM tbl_name WHERE key_col LIKE 'Pat%_ck%';

In the first statement, only rows with 'Patrick' <= key_col < 'Patricl' are considered. In the second statement, only rows with 'Pat' <= key_col < 'Pau' are considered.

The following SELECT statements do not use indexes:

SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';SELECT * FROM tbl_name WHERE key_col LIKE other_col;

In the first statement, the LIKE value begins with a wildcard character. In the second statement, the LIKE value is not a constant.

If you use ... LIKE '%string%' and string is longer than three characters, MySQL uses the Turbo Boyer-Moore algorithm to initialize the pattern for the string and then uses this pattern to perform the search more quickly.

A search using col_name IS NULL employs indexes if col_name is indexed.

Any index that does not span all AND levels in the WHERE clause is not used to optimize the query. In other words, to be able to use an index, a prefix of the index must be used in every AND group.

The following WHERE clauses use indexes:

... WHERE index_part1=1 AND index_part2=2 AND other_column=3 /* index = 1 OR index = 2 */... WHERE index=1 OR A=10 AND index=2 /* optimized like "index_part1='hello'" */... WHERE index_part1='hello' AND index_part3=5 /* Can use index on index1 but not on index2 or index3 */... WHERE index1=1 AND index2=2 OR index1=3 AND index3=3;

These WHERE clauses do not use indexes:

 /* index_part1 is not used */... WHERE index_part2=1 AND index_part3=2 /*  Index is not used in both parts of the WHERE clause  */... WHERE index=1 OR A=10 /* No index spans all rows  */... WHERE index_part1=1 OR index_part2=10

Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) However, if such a query uses LIMIT to retrieve only some of the rows, MySQL uses an index anyway, because it can much more quickly find the few rows to return in the result.

Hash Index Characteristics

Hash indexes have somewhat different characteristics from those just discussed:

  • They are used only for equality comparisons that use the = or <=> operators (but are very fast). They are not used for comparison operators such as < that find a range of values. Systems that rely on this type of single-value lookup are known as "key-value stores"; to use MySQL for such applications, use hash indexes wherever possible.

  • The optimizer cannot use a hash index to speed up ORDER BY operations. (This type of index cannot be used to search for the next entry in order.)

  • MySQL cannot determine approximately how many rows there are between two values (this is used by the range optimizer to decide which index to use). This may affect some queries if you change a MyISAM table to a hash-indexed MEMORY table.

  • Only whole keys can be used to search for a row. (With a B-tree index, any leftmost prefix of the key can be used to find rows.)

8.4. Optimizing Database Structure

In your role as a database designer, look for the most efficient way to organize your schemas, tables, and columns. As when tuning application code, you minimize I/O, keep related items together, and plan ahead so that performance stays high as the data volume increases. Starting with an efficient database design makes it easier for team members to write high-performing application code, and makes the database likely to endure as applications evolve and are rewritten.

8.4.1. Optimizing Data Size

Design your tables to minimize their space on the disk. This can result in huge improvements by reducing the amount of data written to and read from disk. Smaller tables normally require less main memory while their contents are being actively processed during query execution. Any space reduction for table data also results in smaller indexes that can be processed faster.

MySQL supports many different storage engines (table types) and row formats. For each table, you can decide which storage and indexing method to use. Choosing the proper table format for your application can give you a big performance gain. See Chapter 14, Storage Engines.

You can get better performance for a table and minimize storage space by using the techniques listed here:

Table Columns

  • Use the most efficient (smallest) data types possible. MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables. MEDIUMINT is often a better choice than INT because a MEDIUMINT column uses 25% less space.

  • Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. If you really need NULL values in your tables, use them. Just avoid the default setting that allows NULL values in every column.

Row Format

  • InnoDB tables use a compact storage format. In versions of MySQL earlier than 5.0.3, InnoDB rows contain some redundant information, such as the number of columns and the length of each column, even for fixed-size columns. By default, tables are created in the compact format (ROW_FORMAT=COMPACT). If you wish to downgrade to older versions of MySQL, you can request the old format with ROW_FORMAT=REDUNDANT.

    The presence of the compact row format decreases row storage space by about 20% at the cost of increasing CPU use for some operations. If your workload is a typical one that is limited by cache hit rates and disk speed it is likely to be faster. If it is a rare case that is limited by CPU speed, it might be slower.

    The compact InnoDB format also changes how CHAR columns containing UTF-8 data are stored. With ROW_FORMAT=REDUNDANT, a UTF-8 CHAR(N) occupies 3 � N bytes, given that the maximum length of a UTF-8 encoded character is three bytes. Many languages can be written primarily using single-byte UTF-8 characters, so a fixed storage length often wastes space. With ROW_FORMAT=COMPACT format, InnoDB allocates a variable amount of storage in the range from N to 3 � N bytes for these columns by stripping trailing spaces if necessary. The minimum storage length is kept as N bytes to facilitate in-place updates in typical cases.

  • To minimize space even further by storing table data in compressed form, specify ROW_FORMAT=COMPRESSED when creating InnoDB tables, or run the myisampack command on an existing MyISAM table. (InnoDB tables compressed tables are readable and writable, while MyISAM compressed tables are read-only.)

  • For MyISAM tables, if you do not have any variable-length columns (VARCHAR, TEXT, or BLOB columns), a fixed-size row format is used. This is faster but may waste some space. See Section 14.5.3, "MyISAM Table Storage Formats". You can hint that you want to have fixed length rows even if you have VARCHAR columns with the CREATE TABLE option ROW_FORMAT=FIXED.

Indexes

  • The primary index of a table should be as short as possible. This makes identification of each row easy and efficient. For InnoDB tables, the primary key columns are duplicated in each secondary index entry, so a short primary key saves considerable space if you have many secondary indexes.

  • Create only the indexes that you need to improve query performance. Indexes are good for retrieval, but slow down insert and update operations. If you access a table mostly by searching on a combination of columns, create a single composite index on them rather than a separate index for each column. The first part of the index should be the column most used. If you always use many columns when selecting from the table, the first column in the index should be the one with the most duplicates, to obtain better compression of the index.

  • If it is very likely that a long string column has a unique prefix on the first number of characters, it is better to index only this prefix, using MySQL's support for creating an index on the leftmost part of the column (see Section 13.1.13, "CREATE INDEX Syntax"). Shorter indexes are faster, not only because they require less disk space, but because they also give you more hits in the index cache, and thus fewer disk seeks. See Section 8.11.2, "Tuning Server Parameters".

Joins

  • In some circumstances, it can be beneficial to split into two a table that is scanned very often. This is especially true if it is a dynamic-format table and it is possible to use a smaller static format table that can be used to find the relevant rows when scanning the table.

  • Declare columns with identical information in different tables with identical data types, to speed up joins based on the corresponding columns.

  • Keep column names simple, so that you can use the same name across different tables and simplify join queries. For example, in a table named customer, use a column name of name instead of customer_name. To make your names portable to other SQL servers, consider keeping them shorter than 18 characters.

Normalization

  • Normally, try to keep all data nonredundant (observing what is referred to in database theory as third normal form). Instead of repeating lengthy values such as names and addresses, assign them unique IDs, repeat these IDs as needed across multiple smaller tables, and join the tables in queries by referencing the IDs in the join clause.

  • If speed is more important than disk space and the maintenance costs of keeping multiple copies of data, for example in a business intelligence scenario where you analyze all the data from large tables, you can relax the normalization rules, duplicating information or creating summary tables to gain more speed.

8.4.2. Optimizing MySQL Data Types

8.4.2.1. Optimizing for Numeric Data

  • For unique IDs or other values that can be represented as either strings or numbers, prefer numeric columns to string columns. Since large numeric values can be stored in fewer bytes than the corresponding strings, it is faster and takes less memory to transfer and compare them.

  • If you are using numeric data, it is faster in many cases to access information from a database (using a live connection) than to access a text file. Information in the database is likely to be stored in a more compact format than in the text file, so accessing it involves fewer disk accesses. You also save code in your application because you can avoid parsing the text file to find line and column boundaries.

8.4.2.2. Optimizing for Character and String Types

For character and string columns, follow these guidelines:

  • Use binary collation order for fast comparison and sort operations, when you do not need language-specific collation features. You can use the BINARY operator to use binary collation within a particular query.

  • When comparing values from different columns, declare those columns with the same character set and collation wherever possible, to avoid string conversions while running the query.

  • For column values less than 8KB in size, use binary VARCHAR instead of BLOB. The GROUP BY and ORDER BY clauses can generate temporary tables, and these temporary tables can use the MEMORY storage engine if the original table does not contain any BLOB columns.

  • If a table contains string columns such as name and address, but many queries do not retrieve those columns, consider splitting the string columns into a separate table and using join queries with a foreign key when necessary. When MySQL retrieves any value from a row, it reads a data block containing all the columns of that row (and possibly other adjacent rows). Keeping each row small, with only the most frequently used columns, allows more rows to fit in each data block. Such compact tables reduce disk I/O and memory usage for common queries.

  • When you use a randomly generated value as a primary key in an InnoDB table, prefix it with an ascending value such as the current date and time if possible. When consecutive primary values are physically stored near each other, InnoDB can insert and retrieve them faster.

  • See Section 8.4.2.1, "Optimizing for Numeric Data" for reasons why a numeric column is usually preferable to an equivalent string column.

8.4.2.3. Optimizing for BLOB Types

  • When storing a large blob containing textual data, consider compressing it first. Do not use this technique when the entire table is compressed by InnoDB or MyISAM.

  • For a table with several columns, to reduce memory requirements for queries that do not use the BLOB column, consider splitting the BLOB column into a separate table and referencing it with a join query when needed.

  • Since the performance requirements to retrieve and display a BLOB value might be very different from other data types, you could put the BLOB-specific table on a different storage device or even a separate database instance. For example, to retrieve a BLOB might require a large sequential disk read that is better suited to a traditional hard drive than to an SSD device.

  • See Section 8.4.2.2, "Optimizing for Character and String Types" for reasons why a binary VARCHAR column is sometimes preferable to an equivalent BLOB column.

  • Rather than testing for equality against a very long text string, you can store a hash of the column value in a separate column, index that column, and test the hashed value in queries. (Use the MD5() or CRC32() function to produce the hash value.) Since hash functions can produce duplicate results for different inputs, you still include a clause AND blob_column = long_string_value in the query to guard against false matches; the performance benefit comes from the smaller, easily scanned index for the hashed values.

8.4.2.4. Using PROCEDURE ANALYSE

ANALYSE([max_elements[,max_memory]])

ANALYSE() examines the result from a query and returns an analysis of the results that suggests optimal data types for each column that may help reduce table sizes. To obtain this analysis, append PROCEDURE ANALYSE to the end of a SELECT statement:

SELECT ... FROM ... WHERE ... PROCEDURE ANALYSE([max_elements,[max_memory]])

For example:

SELECT col1, col2 FROM table1 PROCEDURE ANALYSE(10, 2000);

The results show some statistics for the values returned by the query, and propose an optimal data type for the columns. This can be helpful for checking your existing tables, or after importing new data. You may need to try different settings for the arguments so that PROCEDURE ANALYSE() does not suggest the ENUM data type when it is not appropriate.

The arguments are optional and are used as follows:

  • max_elements (default 256) is the maximum number of distinct values that ANALYSE() notices per column. This is used by ANALYSE() to check whether the optimal data type should be of type ENUM; if there are more than max_elements distinct values, then ENUM is not a suggested type.

  • max_memory (default 8192) is the maximum amount of memory that ANALYSE() should allocate per column while trying to find all distinct values.

8.4.3. Optimizing for Many Tables

Some techniques for keeping individual queries fast involve splitting data across many tables. When the number of tables runs into the thousands or even millions, the overhead of dealing with all these tables becomes a new performance consideration.

8.4.3.1. How MySQL Opens and Closes Tables

When you execute a mysqladmin status command, you should see something like this:

Uptime: 426 Running threads: 1 Questions: 11082Reloads: 1 Open tables: 12

The Open tables value of 12 can be somewhat puzzling if you have only six tables.

MySQL is multi-threaded, so there may be many clients issuing queries for a given table simultaneously. To minimize the problem with multiple client sessions having different states on the same table, the table is opened independently by each concurrent session. This uses additional memory but normally increases performance. With MyISAM tables, one extra file descriptor is required for the data file for each client that has the table open. (By contrast, the index file descriptor is shared between all sessions.)

The table_open_cache and max_connections system variables affect the maximum number of files the server keeps open. If you increase one or both of these values, you may run up against a limit imposed by your operating system on the per-process number of open file descriptors. Many operating systems permit you to increase the open-files limit, although the method varies widely from system to system. Consult your operating system documentation to determine whether it is possible to increase the limit and how to do so.

table_open_cache is related to max_connections. For example, for 200 concurrent running connections, specify a table cache size of at least 200 * N, where N is the maximum number of tables per join in any of the queries which you execute. You must also reserve some extra file descriptors for temporary tables and files.

Make sure that your operating system can handle the number of open file descriptors implied by the table_open_cache setting. If table_open_cache is set too high, MySQL may run out of file descriptors and refuse connections, fail to perform queries, and be very unreliable. You also have to take into account that the MyISAM storage engine needs two file descriptors for each unique open table. You can increase the number of file descriptors available to MySQL using the --open-files-limit startup option to mysqld. See Section C.5.2.18, "'File' Not Found and Similar Errors".

The cache of open tables is kept at a level of table_open_cache entries. The default value is 400; this can be changed with the --table_open_cache option to mysqld. Note that MySQL may temporarily open more tables than this to execute queries.

MySQL closes an unused table and removes it from the table cache under the following circumstances:

  • When the cache is full and a thread tries to open a table that is not in the cache.

  • When the cache contains more than table_open_cache entries and a table in the cache is no longer being used by any threads.

  • When a table flushing operation occurs. This happens when someone issues a FLUSH TABLES statement or executes a mysqladmin flush-tables or mysqladmin refresh command.

When the table cache fills up, the server uses the following procedure to locate a cache entry to use:

  • Tables that are not currently in use are released, beginning with the table least recently used.

  • If a new table needs to be opened, but the cache is full and no tables can be released, the cache is temporarily extended as necessary. When the cache is in a temporarily extended state and a table goes from a used to unused state, the table is closed and released from the cache.

A MyISAM table is opened for each concurrent access. This means the table needs to be opened twice if two threads access the same table or if a thread accesses the table twice in the same query (for example, by joining the table to itself). Each concurrent open requires an entry in the table cache. The first open of any MyISAM table takes two file descriptors: one for the data file and one for the index file. Each additional use of the table takes only one file descriptor for the data file. The index file descriptor is shared among all threads.

If you are opening a table with the HANDLER tbl_name OPEN statement, a dedicated table object is allocated for the thread. This table object is not shared by other threads and is not closed until the thread calls HANDLER tbl_name CLOSE or the thread terminates. When this happens, the table is put back in the table cache (if the cache is not full). See Section 13.2.4, "HANDLER Syntax".

You can determine whether your table cache is too small by checking the mysqld status variable Opened_tables, which indicates the number of table-opening operations since the server started:

mysql> SHOW GLOBAL STATUS LIKE 'Opened_tables';+---------------+-------+| Variable_name | Value |+---------------+-------+| Opened_tables | 2741  |+---------------+-------+

If the value is very large or increases rapidly, even when you have not issued many FLUSH TABLES statements, increase the table cache size. See Section 5.1.4, "Server System Variables", and Section 5.1.6, "Server Status Variables".

8.4.3.2. Disadvantages of Creating Many Tables in the Same Database

If you have many MyISAM tables in the same database directory, open, close, and create operations are slow. If you execute SELECT statements on many different tables, there is a little overhead when the table cache is full, because for every table that has to be opened, another must be closed. You can reduce this overhead by increasing the number of entries permitted in the table cache.

8.4.3.3. How MySQL Uses Internal Temporary Tables

In some cases, the server creates internal temporary tables while processing queries. Such a table can be held in memory and processed by the MEMORY storage engine, or stored on disk and processed by the MyISAM storage engine. The server may create a temporary table initially as an in-memory table, then convert it to an on-disk table if it becomes too large. Users have no direct control over when the server creates an internal temporary table or which storage engine the server uses to manage it.

Temporary tables can be created under conditions such as these:

  • If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.

  • DISTINCT combined with ORDER BY may require a temporary table.

  • If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.

To determine whether a query requires a temporary table, use EXPLAIN and check the Extra column to see whether it says Using temporary. See Section 8.8.1, "Optimizing Queries with EXPLAIN".

Some conditions prevent the use of an in-memory temporary table, in which case the server uses an on-disk table instead:

  • Presence of a BLOB or TEXT column in the table

  • Presence of any column in a GROUP BY or DISTINCT clause larger than 512 bytes

  • Presence of any column larger than 512 bytes in the SELECT list, if UNION or UNION ALL is used

If an internal temporary table is created initially as an in-memory table but becomes too large, MySQL automatically converts it to an on-disk table. The maximum size for in-memory temporary tables is the minimum of the tmp_table_size and max_heap_table_size values. This differs from MEMORY tables explicitly created with CREATE TABLE: For such tables, the max_heap_table_size system variable determines how large the table is permitted to grow and there is no conversion to on-disk format.

When the server creates an internal temporary table (either in memory or on disk), it increments the Created_tmp_tables status variable. If the server creates the table on disk (either initially or by converting an in-memory table) it increments the Created_tmp_disk_tables status variable.

Copyright © 1997, 2013, Oracle and/or its affiliates. All rights reserved. Legal Notices
(Sebelumnya) 7. Backup and Recovery8.5. Optimizing for InnoDB Tables (Berikutnya)