Optimizing for InnoDB Tables - MySQL Manual

8.5. Optimizing for InnoDB Tables

8.5.1. Optimizing Storage Layout for InnoDB Tables
8.5.2. Optimizing InnoDB Transaction Management
8.5.3. Optimizing InnoDB Logging
8.5.4. Bulk Data Loading for InnoDB Tables
8.5.5. Optimizing InnoDB Queries
8.5.6. Optimizing InnoDB DDL Operations
8.5.7. Optimizing InnoDB Disk I/O
8.5.8. Optimizing InnoDB Configuration Variables
8.5.9. Optimizing InnoDB for Systems with Many Tables

InnoDB is the storage engine that MySQL customers typically use in production databases where reliability and concurrency are important. Because InnoDB is the default storage engine in MySQL 5.5 and higher, you can expect to see InnoDB tables more often than before. This section explains how to optimize database operations for InnoDB tables.

8.5.1. Optimizing Storage Layout for InnoDB Tables

Once your data reaches a stable size, or a growing table has increased by tens or some hundreds of megabytes, consider using the OPTIMIZE TABLE statement to reorganize the table and compact any wasted space. The reorganized tables require less disk I/O to perform full table scans. This is a straightforward technique that can improve performance when other techniques such as improving index usage or tuning application code are not practical.
OPTIMIZE TABLE copies the data part of the table and rebuilds the indexes. The benefits come from improved packing of data within indexes, and reduced fragmentation within the tablespaces and on disk. The benefits vary depending on the data in each table. You may find that there are significant gains for some and not for others, or that the gains decrease over time until you next optimize the table. This operation can be slow if the table is large or if the indexes being rebuilt don't fit into the buffer pool. The first run after adding a lot of data to a table is often much slower than later runs.
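Here is a minimal sketch of that workflow. It assumes a hypothetical InnoDB table `mydb.orders`. It checks the space reported for the table, rebuilds the table, and checks again. For InnoDB tables in MySQL 5.5, OPTIMIZE TABLE is carried out as a table rebuild followed by an analyze, so the output includes a note to that effect. The DATA_FREE figure is specific to this table only when the table has its own tablespace (innodb_file_per_table).

```
-- Hypothetical schema and table names.
SELECT table_name, data_length, index_length, data_free
FROM information_schema.TABLES
WHERE table_schema = 'mydb' AND table_name = 'orders';

-- Rebuild the table and its indexes (for InnoDB: recreate + analyze).
OPTIMIZE TABLE mydb.orders;

-- Run the first query again and compare the sizes.
SELECT table_name, data_length, index_length, data_free
FROM information_schema.TABLES
WHERE table_schema = 'mydb' AND table_name = 'orders';
```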
In InnoDB, having a long PRIMARY KEY (either a single column with a lengthy value, or several columns that form a long composite value) wastes a lot of disk space. The primary key value for a row is duplicated in all the secondary index records that point to the same row. (See Section 14.3.11, "InnoDB Table and Index Structures".) Create an AUTO_INCREMENT column as the primary key if your primary key is long, or index a prefix of a long VARCHAR column instead of the entire column.
Use the VARCHAR data type instead of CHAR to store variable-length strings or for columns with many NULL values. A CHAR(N) column always takes N characters to store data, even if the string is shorter or its value is NULL. Smaller tables fit better in the buffer pool and reduce disk I/O.
When using COMPACT row format (the default InnoDB format in MySQL 5.5) and variable-length character sets, such as utf8 or sjis, CHAR(N) columns occupy a variable amount of space, but still at least N bytes.
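The primary key and string-type advice can be illustrated with a sketch that uses hypothetical table and column names. A 4-byte AUTO_INCREMENT integer serves as the primary key, so each secondary index record carries only that short value. The long url column gets a 100-character prefix index instead of a full-column index.

```
CREATE TABLE page_visits (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
  url        VARCHAR(1000) NOT NULL,
  -- VARCHAR rather than CHAR for variable-length, often-NULL data
  referrer   VARCHAR(1000) NULL,
  visited_at DATETIME NOT NULL,
  -- Short primary key: its value is copied into every secondary index record
  PRIMARY KEY (id),
  -- Index only the first 100 characters of the long column
  KEY idx_url_prefix (url(100)),
  KEY idx_visited_at (visited_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```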
For tables that are big, or contain lots of repetitive text or numeric data, consider using COMPRESSED row format. Less disk I/O is required to bring data into the buffer pool, or to perform full table scans. Before making a permanent decision, measure the amount of compression you can achieve by using COMPRESSED versus COMPACT row format.
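In MySQL 5.5, COMPRESSED row format requires innodb_file_per_table to be enabled and innodb_file_format to be set to Barracuda. One way to measure before committing is to build a compressed copy of the table and compare the two. The sketch below assumes a hypothetical table named orders in the current database.

```
-- Prerequisites in MySQL 5.5 (these can also be set in my.cnf).
SET GLOBAL innodb_file_per_table = 1;
SET GLOBAL innodb_file_format = Barracuda;

-- Build a compressed copy of the hypothetical orders table.
CREATE TABLE orders_compressed LIKE orders;
ALTER TABLE orders_compressed ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
INSERT INTO orders_compressed SELECT * FROM orders;

SELECT table_name, row_format, data_length, index_length
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name IN ('orders', 'orders_compressed');
```

The size columns in information_schema.TABLES are approximate for InnoDB. Comparing the sizes of the two .ibd files in the data directory is another way to check.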

8.5.2. Optimizing InnoDB Transaction Management

To optimize InnoDB transaction processing, find the ideal balance between the performance overhead of transactional features and the workload of your server. For example, an application might encounter performance issues if it commits thousands of times per second, and different performance issues if it commits only every 2-3 hours.

The default MySQL setting AUTOCOMMIT=1 can impose performance limitations on a busy database server. Where practical, wrap several related DML operations into a single transaction, by issuing SET AUTOCOMMIT=0 or a START TRANSACTION statement, followed by a COMMIT statement after making all the changes.
InnoDB must flush the log to disk at each transaction commit if that transaction made modifications to the database. When each change is followed by a commit (as with the default autocommit setting), the I/O throughput of the storage device puts a cap on the number of potential operations per second.
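For example, assuming hypothetical accounts and transfers tables, grouping three related changes into one transaction means InnoDB flushes the log once, at COMMIT. With autocommit enabled, the same three statements would each commit and flush separately.

```
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
INSERT INTO transfers (from_id, to_id, amount) VALUES (1, 2, 100);
-- One commit, one log flush for all three changes.
COMMIT;
```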
Avoid performing rollbacks after inserting, updating, or deleting huge numbers of rows. If a big transaction is slowing down server performance, rolling it back can make the problem worse, potentially taking several times as long to perform as the original DML operations. Killing the database process does not help, because the rollback starts again on server startup.
To minimize the chance of this issue occurring: increase the size of the buffer pool so that all the DML changes can be cached rather than immediately written to disk; set innodb_change_buffering=all so that update and delete operations are buffered in addition to inserts; and consider issuing COMMIT statements periodically during the big DML operation, possibly breaking a single delete or update into multiple statements that operate on smaller numbers of rows.
To get rid of a runaway rollback once it occurs, increase the buffer pool so that the rollback becomes CPU-bound and runs fast, or kill the server and restart with innodb_force_recovery=3, as explained in Section 14.3.7.1, "The InnoDB Recovery Process".
This issue is expected to be less prominent in MySQL 5.5 and up, or in MySQL 5.1 with the InnoDB Plugin, because the default setting innodb_change_buffering=all allows update and delete operations to be cached in memory, making them faster to perform in the first place, and also faster to roll back if needed. Make sure to use this parameter setting on servers that process long-running transactions with many inserts, updates, or deletes.
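The advice about breaking a big DELETE into smaller statements can be sketched as a stored procedure. The table log_entries, its created_at column, the cutoff date, and the batch size of 10,000 rows are all hypothetical. The DELIMITER lines are mysql client commands; they keep the semicolons inside the procedure body from ending the CREATE PROCEDURE statement.

```
DELIMITER //
CREATE PROCEDURE purge_old_log_entries()
BEGIN
  DECLARE rows_deleted INT DEFAULT 0;
  REPEAT
    -- Each pass deletes at most 10,000 rows...
    DELETE FROM log_entries
    WHERE created_at < '2012-01-01'
    LIMIT 10000;
    SET rows_deleted = ROW_COUNT();
    -- ...and commits, so no single transaction grows very large.
    COMMIT;
  UNTIL rows_deleted = 0 END REPEAT;
END //
DELIMITER ;

CALL purge_old_log_entries();
```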
If you can afford the loss of some of the latest committed transactions if a crash occurs, you can set the innodb_flush_log_at_trx_commit parameter to 0. InnoDB tries to flush the log once per second anyway, although the flush is not guaranteed. Also, set the value of innodb_support_xa to 0, which will reduce the number of disk flushes due to synchronizing on disk data and the binary log.
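Where that trade-off is acceptable, both settings can go in the option file. They can also be changed at runtime with SET GLOBAL. A minimal fragment:

```
[mysqld]
# Do not flush the log at every commit; InnoDB tries to flush about once per
# second instead, so the most recent commits can be lost in a crash.
innodb_flush_log_at_trx_commit = 0
# Skip the extra flushes used to keep InnoDB and the binary log synchronized.
innodb_support_xa = 0
```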
When rows are modified or deleted, the rows and associated undo logs are not physically removed immediately, or even immediately after the transaction commits. The old data is preserved until transactions that started earlier or concurrently are finished, so that those transactions can access the previous state of modified or deleted rows. Thus, a long-running transaction can prevent InnoDB from purging data that was changed by a different transaction.
When rows are modified or deleted within a long-running transaction, other transactions using the READ COMMITTED and REPEATABLE READ isolation levels have to do more work to reconstruct the older data if they read those same rows.
When a long-running transaction modifies a table, queries against that table from other transactions do not make use of the covering index technique. Queries that normally could retrieve all the result columns from a secondary index, instead look up the appropriate values from the table data.
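In MySQL 5.5, the INFORMATION_SCHEMA.INNODB_TRX table lists the transactions currently executing inside InnoDB. It offers one way to find the long-running transactions described above. A sketch:

```
-- Open InnoDB transactions, oldest first.
SELECT trx_id,
       trx_state,
       trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds,
       trx_mysql_thread_id,
       trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started;
```

The trx_mysql_thread_id value matches the Id column of SHOW PROCESSLIST, which identifies the client connection that owns each transaction.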

8.5.3. Optimizing InnoDB Logging

Make your log files big, even as big as the buffer pool. When InnoDB has written the log files full, it must write the modified contents of the buffer pool to disk in a checkpoint. Small log files cause many unnecessary disk writes. Although historically big log files caused lengthy recovery times, recovery is now much faster and you can confidently use large log files.
Make the log buffer quite large as well (on the order of 8MB).
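A sketch of these settings in my.cnf, with hypothetical sizes: two 512MB log files, or 1GB in total. In MySQL 5.5, the combined size of the log files must be less than 4GB. Changing innodb_log_file_size also requires a clean shutdown, and the old ib_logfile files must be moved out of the way before restarting.

```
[mysqld]
# Hypothetical sizes: 2 x 512MB = 1GB of redo log.
innodb_log_file_size      = 512M
innodb_log_files_in_group = 2
innodb_log_buffer_size    = 8M
```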

8.5.4. Bulk Data Loading for InnoDB Tables

These performance tips supplement the general guidelines for fast inserts in Section 8.2.2.1, "Speed of INSERT Statements".

When importing data into InnoDB, turn off autocommit mode, because it performs a log flush to disk for every insert. To disable autocommit during your import operation, surround it with SET autocommit and COMMIT statements:
```
SET autocommit=0;
... SQL import statements ...
COMMIT;
```
The mysqldump option --opt creates dump files that are fast to import into an InnoDB table, even without wrapping them with the SET autocommit and COMMIT statements.
If you have UNIQUE constraints on secondary keys, you can speed up table imports by temporarily turning off the uniqueness checks during the import session:
```
SET unique_checks=0;
... SQL import statements ...
SET unique_checks=1;
```
For big tables, this saves a lot of disk I/O because InnoDB can use its insert buffer to write secondary index records in a batch. Be certain that the data contains no duplicate keys.
If you have FOREIGN KEY constraints in your tables, you can speed up table imports by turning off the foreign key checks for the duration of the import session:
```
SET foreign_key_checks=0;
... SQL import statements ...
SET foreign_key_checks=1;
```
For big tables, this can save a lot of disk I/O.
Use the multiple-row INSERT syntax to reduce communication overhead between the client and the server if you need to insert many rows:
```
INSERT INTO yourtable VALUES (1,2), (5,5), ...;
```
This tip is valid for inserts into any table, not just InnoDB tables.
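Taken together, the session-level tips above could produce an import script like the following sketch. It uses the hypothetical two-column yourtable from the example above. Relax the unique and foreign key checks only when the input is known to be valid.

```
-- These settings affect only the current connection.
SET autocommit = 0;
-- Only if the input is known to contain no duplicate keys:
SET unique_checks = 0;
-- Only if the input is known to satisfy all foreign key constraints:
SET foreign_key_checks = 0;

-- Multiple-row INSERT: one round trip for many rows.
INSERT INTO yourtable VALUES (1,2), (5,5), (7,9);
-- ... more INSERT statements ...

COMMIT;

-- Restore the defaults for the rest of the session.
SET foreign_key_checks = 1;
SET unique_checks = 1;
SET autocommit = 1;
```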
When doing bulk inserts into tables with auto-increment columns, set innodb_autoinc_lock_mode to 2 instead of the default value 1. See Section 14.3.5.3.2, "Configurable InnoDB Auto-Increment Locking" for details.
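innodb_autoinc_lock_mode can only be set at server startup, so the change belongs in the option file. A minimal fragment:

```
[mysqld]
# "Interleaved" auto-increment lock mode; read only at startup.
innodb_autoinc_lock_mode = 2
```

With mode 2, the auto-increment values generated by concurrent inserts can be interleaved. The setting is also not safe with statement-based replication, so check both points before using it.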

8.5.5. Optimizing InnoDB Queries

To tune queries for InnoDB tables, create an appropriate set of indexes on each table. See Section 8.3.1, "How MySQL Uses Indexes" for details. Follow these guidelines for InnoDB indexes:

Because each InnoDB table has a primary key (whether you request one or not), specify a set of primary key columns for each table, columns that are used in the most important and time-critical queries.
Do not specify too many or too long columns in the primary key, because these column values are duplicated in each secondary index.
Do not create a separate secondary index for each column, because each query can only make use of one index. Indexes on rarely tested columns or columns with only a few different values might not be helpful for any queries. If you have many queries for the same table, testing different combinations of columns, try to create a small number of concatenated indexes rather than a large number of single-column indexes. If an index contains all the columns needed for the result set (known as a covering index), the query might be able to avoid reading the table data at all.
If an indexed column cannot contain any NULL values, declare it as NOT NULL when you create the table. The optimizer can better determine which index is most effective to use for a query, when it knows whether each column contains NULL values or not.
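A sketch of the concatenated-index and NOT NULL advice, using hypothetical names. Every InnoDB secondary index record also contains the primary key columns. The query below therefore needs only customer_id, order_date, and order_id, all of which are in idx_customer_date. For such a covered query, the Extra column of the EXPLAIN output should normally include "Using index".

```
CREATE TABLE orders (
  order_id    INT UNSIGNED  NOT NULL AUTO_INCREMENT,
  customer_id INT UNSIGNED  NOT NULL,
  order_date  DATE          NOT NULL,
  status      TINYINT       NOT NULL,
  total       DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (order_id),
  -- One concatenated index instead of separate single-column indexes.
  KEY idx_customer_date (customer_id, order_date)
) ENGINE=InnoDB;

EXPLAIN SELECT order_id, order_date
FROM orders
WHERE customer_id = 42
ORDER BY order_date;
```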
If you often have recurring queries for tables that are not updated frequently, enable the query cache:
```
[mysqld]
query_cache_type = 1
query_cache_size = 10M
```

8.5.6. Optimizing InnoDB DDL Operations

For DDL operations on tables and indexes (CREATE, ALTER, and DROP statements), the most significant aspect for InnoDB tables is that creating and dropping secondary indexes is much faster in MySQL 5.5 and higher than in MySQL 5.1 and earlier. See Section 14.4.2, "Fast Index Creation in the InnoDB Storage Engine" for details.
"Fast index creation" makes it faster in some cases to drop an index before loading data into a table, then re-create the index after loading the data.
Use TRUNCATE TABLE to empty a table, not DELETE FROM tbl_name. Foreign key constraints can make a TRUNCATE statement work like a regular DELETE statement, in which case a sequence of commands like DROP TABLE and CREATE TABLE might be fastest.
Because the primary key is integral to the storage layout of each InnoDB table, and changing the definition of the primary key involves reorganizing the whole table, always set up the primary key as part of the CREATE TABLE statement, and plan ahead so that you do not need to ALTER or DROP the primary key afterward.
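The drop-then-re-create technique for secondary indexes mentioned earlier in this section could look like the following sketch. The table, index, and column names are hypothetical. The technique applies to secondary indexes only, since changing the primary key reorganizes the whole table.

```
ALTER TABLE orders DROP INDEX idx_customer_date;

-- ... load the data, for example with LOAD DATA INFILE or multiple-row INSERTs ...

ALTER TABLE orders ADD INDEX idx_customer_date (customer_id, order_date);
```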

8.5.7. Optimizing InnoDB Disk I/O

If you follow the best practices for database design and the tuning techniques for SQL operations, but your database is still slowed by heavy disk I/O activity, explore these low-level techniques related to disk I/O. If the Unix top tool or the Windows Task Manager shows that the CPU usage percentage with your workload is less than 70%, your workload is probably disk-bound.

When table data is cached in the InnoDB buffer pool, it can be processed over and over by queries without requiring any disk I/O. Specify the size of the buffer pool with the innodb_buffer_pool_size option. This memory area is important enough that busy databases often specify a size approximately 80% of the amount of physical memory. For more information, see Section 8.9.1, "The InnoDB Buffer Pool".
In some versions of GNU/Linux and Unix, flushing files to disk with the Unix fsync() call (which InnoDB uses by default) and similar methods is surprisingly slow. If database write performance is an issue, conduct benchmarks with the innodb_flush_method parameter set to O_DSYNC.
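Consider a hypothetical dedicated database server with 32GB of RAM. The 80% guideline works out to 0.8 * 32GB = 25.6GB, which could be rounded down to 25G. The fragment below also shows the O_DSYNC flush method as a candidate to benchmark against the default; it is not a setting to adopt without measuring.

```
[mysqld]
# Hypothetical dedicated server with 32GB of RAM: about 80% for the buffer pool.
innodb_buffer_pool_size = 25G
# Candidate to benchmark against the default fsync()-based flushing.
innodb_flush_method = O_DSYNC
```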
When using the InnoDB storage engine on Solaris 10 for x86_64 architecture (AMD Opteron), use direct I/O for InnoDB-related files, to avoid degradation of InnoDB performance. To use direct I/O for an entire UFS file system used for storing InnoDB-related files, mount it with the forcedirectio option; see mount_ufs(1M). (The default on Solaris 10/x86_64 is not to use this option.) To apply direct I/O only to InnoDB file operations rather than the whole file system, set innodb_flush_method = O_DIRECT. With this setting, InnoDB calls directio() instead of fcntl() for I/O to data files (not for I/O to log files).
When using the InnoDB storage engine with a large innodb_buffer_pool_size value on any release of Solaris 2.6 and up and any platform (sparc/x86/x64/amd64), conduct benchmarks with InnoDB data files and log files on raw devices or on a separate direct I/O UFS file system, using the forcedirectio mount option as described earlier. (It is necessary to use the mount option rather than setting innodb_flush_method if you want direct I/O for the log files.) Users of the Veritas file system VxFS should use the convosync=direct mount option.
Do not place other MySQL data files, such as those for MyISAM tables, on a direct I/O file system. Executables or libraries must not be placed on a direct I/O file system.
If you have additional storage devices available to set up a RAID configuration or symbolic links to different disks, see Section 8.11.3, "Optimizing Disk I/O" for additional low-level I/O tips.

8.5.8. Optimizing InnoDB Configuration Variables

Different settings work best for servers with light, predictable loads, versus servers that are running near full capacity all the time, or that experience spikes of high activity.

Because the InnoDB storage engine performs many of its optimizations automatically, many performance-tuning tasks involve monitoring to ensure that the database is performing well, and changing configuration options when performance drops. See Section 14.4.7.17, "Integration with the MySQL Performance Schema" for information about detailed InnoDB performance monitoring.

For information about the most important and most recent InnoDB performance features, see Section 14.4.7, "InnoDB Performance and Scalability Enhancements". Even if you have used InnoDB tables in prior versions, these features might be new to you, because they come from the "InnoDB Plugin". The Plugin coexisted with the built-in InnoDB in MySQL 5.1, and became the default storage engine in MySQL 5.5 and higher.

The main configuration steps you can perform include:

Enabling InnoDB to use high-performance memory allocators on systems that include them. See Section 14.4.7.3, "Using Operating System Memory Allocators".
Controlling the types of DML operations for which InnoDB buffers the changed data, to avoid frequent small disk writes. See Section 14.4.7.4, "Controlling InnoDB Change Buffering". Because the default is to buffer all types of DML operations, only change this setting if you need to reduce the amount of buffering.
Turning the adaptive hash indexing feature on and off. See Section 14.4.7.5, "Controlling Adaptive Hash Indexing". You might change this setting during periods of unusual activity, then restore it to its original setting.
Setting a limit on the number of concurrent threads that InnoDB processes, if context switching is a bottleneck. See Section 14.4.7.6, "Changes Regarding Thread Concurrency".
Controlling the amount of prefetching that InnoDB does with its read-ahead operations. When the system has unused I/O capacity, more read-ahead can improve the performance of queries. Too much read-ahead can cause periodic drops in performance on a heavily loaded system. See Section 14.4.7.7, "Changes in the Read-Ahead Algorithm".
Increasing the number of background threads for read or write operations, if you have a high-end I/O subsystem that is not fully utilized by the default values. See Section 14.4.7.8, "Multiple Background I/O Threads".
Controlling how much I/O InnoDB performs in the background. See Section 14.4.7.11, "Controlling the Master Thread I/O Rate". The amount of background I/O is higher than in MySQL 5.1, so you might scale back this setting if you observe periodic drops in performance.
Controlling the algorithm that determines when InnoDB performs certain types of background writes. See Section 14.4.7.12, "Controlling the Flushing Rate of Dirty Pages". The algorithm works for some types of workloads but not others, so you might turn off this setting if you observe periodic drops in performance.
Taking advantage of multicore processors and their cache memory configuration, to minimize delays in context switching. See Section 14.4.7.14, "Control of Spin Lock Polling".
Preventing one-time operations such as table scans from interfering with the frequently accessed data stored in the InnoDB buffer pool. See Section 14.4.7.15, "Making the Buffer Pool Scan Resistant".
Adjusting your log files to a size that makes sense for reliability and crash recovery. See Section 14.4.7.16, "Improvements to Crash Recovery Performance". Historically, people have kept their InnoDB log files small to avoid long startup times after a crash. Internal improvements in InnoDB make startup much faster, so the log file size is not such a performance factor anymore. If your log files are artificially small, increasing the size can help performance by reducing the I/O that occurs as redo log records are recycled.
Configuring the size and number of instances for the InnoDB buffer pool, especially important for systems with multi-gigabyte buffer pools. See Section 14.4.7.18, "Improvements to Performance from Multiple Buffer Pools".
Increasing the maximum number of concurrent transactions, which dramatically improves scalability for the busiest databases. See Section 14.4.7.19, "Better Scalability with Multiple Rollback Segments". Although this feature does not require any action during day-to-day operation, you must perform a slow shutdown during or after upgrading the database to MySQL 5.5 to enable the higher limit.
Moving purge operations (a type of garbage collection) into a background thread. See Section 14.4.7.20, "Better Scalability with Improved Purge Scheduling". To effectively measure the results of this setting, tune the other I/O-related and thread-related configuration settings first.
Reducing the amount of switching that InnoDB does between concurrent threads, so that SQL operations on a busy server do not queue up and form a "traffic jam". Set a value for the innodb_thread_concurrency option, up to approximately 32 for a high-powered modern system. Increase the value for the innodb_concurrency_tickets option, typically to 5000 or so. This combination of options sets a cap on the number of threads that InnoDB processes at any one time, and allows each thread to do substantial work before being swapped out, so that the number of waiting threads stays low and operations can complete without excessive context switching.
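Several items in the list above correspond to server options in the [mysqld] section of my.cnf. The following fragment is only a sketch. The thread-concurrency values come from the text above. The buffer pool instance count and the innodb_old_blocks_time value are hypothetical starting points to benchmark, not recommendations.

```
[mysqld]
# Buffer inserts, deletes, and purges (the MySQL 5.5 default).
innodb_change_buffering = all
# Cap on the number of threads inside InnoDB at one time.
innodb_thread_concurrency = 32
# Times a thread can enter InnoDB before it must wait in the queue again.
innodb_concurrency_tickets = 5000
# Hypothetical: split a multi-gigabyte buffer pool into 8 instances.
innodb_buffer_pool_instances = 8
# Run purge in its own background thread instead of the master thread.
innodb_purge_threads = 1
# Hypothetical: a page first read by a scan must stay in the old sublist
# for 1000 ms before it can move to the new sublist.
innodb_old_blocks_time = 1000
```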

8.5.9. Optimizing InnoDB for Systems with Many Tables

InnoDB computes index cardinality values for a table the first time that table is accessed after startup, instead of storing such values in the table. This step can take significant time on systems that partition the data into many tables. Since this overhead only applies to the initial table open operation, to "warm up" a table for later use, access it immediately after startup by issuing a statement such as SELECT 1 FROM tbl_name LIMIT 1.
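Here is a sketch of such a warm-up, assuming hypothetical schema and table names. The same statements could be placed in a file named by the server's init_file option so that they run automatically at startup. In MySQL 5.5, each statement in that file must be on a single line and should not include comments.

```
-- Open each table once after startup so that its index statistics are computed now.
SELECT 1 FROM sales.orders LIMIT 1;
SELECT 1 FROM sales.order_items LIMIT 1;
SELECT 1 FROM sales.customers LIMIT 1;
```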

8.6. Optimizing for MyISAM Tables

8.6.1. Optimizing MyISAM Queries
8.6.2. Bulk Data Loading for MyISAM Tables
8.6.3. Speed of REPAIR TABLE Statements

The MyISAM storage engine performs best with read-mostly data or with low-concurrency operations, because table locks limit the ability to perform simultaneous updates. In MySQL 5.5, InnoDB is the default storage engine rather than MyISAM.

8.6.1. Optimizing MyISAM Queries

Some general tips for speeding up queries on MyISAM tables:

To help MySQL better optimize queries, use ANALYZE TABLE or run myisamchk --analyze on a table after it has been loaded with data. This updates a value for each index part that indicates the average number of rows that have the same value. (For unique indexes, this is always 1.) MySQL uses this to decide which index to choose when you join two tables based on a nonconstant expression. You can check the result from the table analysis by using SHOW INDEX FROM tbl_name and examining the Cardinality value. myisamchk --description --verbose shows index distribution information.
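In SQL, the analysis and the follow-up check look like this. Here tbl_name is the placeholder used in the text.

```
-- Update the key distribution statistics for the table.
ANALYZE TABLE tbl_name;
-- Inspect the Cardinality column for each index part.
SHOW INDEX FROM tbl_name;
```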
To sort an index and data according to an index, use myisamchk --sort-index --sort-records=1 (assuming that you want to sort on index 1). This is a good way to make queries faster if you have a unique index from which you want to read all rows in order according to the index. The first time you sort a large table this way, it may take a long time.
Try to avoid complex SELECT queries on MyISAM tables that are updated frequently, to avoid problems with table locking that occur due to contention between readers and writers.
MyISAM supports concurrent inserts: If a table has no free blocks in the middle of the data file, you can INSERT new rows into it at the same time that other threads are reading from the table. If it is important to be able to do this, consider using the table in ways that avoid deleting rows. Another possibility is to run OPTIMIZE TABLE to defragment the table after you have deleted a lot of rows from it. This behavior is altered by setting the concurrent_insert variable. You can force new rows to be appended (and therefore permit concurrent inserts), even in tables that have deleted rows. See Section 8.10.3, "Concurrent Inserts".
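For example, the current mode can be checked and then changed so that new rows are always appended. In MySQL 5.5, the value 2 permits concurrent inserts even for tables that have deleted rows in the middle of the data file.

```
SHOW GLOBAL VARIABLES LIKE 'concurrent_insert';
-- Append new rows at the end of the data file even when the table has holes.
SET GLOBAL concurrent_insert = 2;
```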
For MyISAM tables that change frequently, try to avoid all variable-length columns (VARCHAR, BLOB, and TEXT). The table uses dynamic row format if it includes even a single variable-length column. See Chapter 14, Storage Engines.
It is normally not useful to split a table into different tables just because the rows become large. In accessing a row, the biggest performance hit is the disk seek needed to find the first byte of the row. After finding the data, most modern disks can read the entire row fast enough for most applications. The only cases where splitting up a table makes an appreciable difference are when it is a MyISAM table using dynamic row format that you can change to a fixed row size, or when you very often need to scan the table but do not need most of the columns. See Chapter 14, Storage Engines.
Use ALTER TABLE ... ORDER BY expr1, expr2, ... if you usually retrieve rows in expr1, expr2, ... order. By using this option after extensive changes to the table, you may be able to get higher performance.
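For example, take a hypothetical MyISAM table that is usually read in customer and date order. The table does not stay in this order as rows are later inserted and deleted, so the statement is worth repeating after extensive changes.

```
ALTER TABLE orders_archive ORDER BY customer_id, order_date;
```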
If you often need to calculate results such as counts based on information from a lot of rows, it may be preferable to introduce a new table and update the counter in real time. An update of the following form is very fast:
```
UPDATE tbl_name SET count_col=count_col+1 WHERE key_col=constant;
```
This is very important when you use MySQL storage engines such as MyISAM that have only table-level locking (multiple readers with single writers). This also gives better performance with most database systems, because the row locking manager in this case has less to do.
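A sketch of the counter-table approach, with hypothetical names. The summary table is keyed by the value being counted. Each event becomes a single-row UPDATE, and reading the total becomes a primary-key lookup instead of a COUNT(*) over many rows.

```
CREATE TABLE page_hit_counts (
  page_id   INT UNSIGNED NOT NULL,
  hit_count INT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (page_id)
) ENGINE=MyISAM;

INSERT INTO page_hit_counts (page_id) VALUES (17);

-- On every hit for page 17:
UPDATE page_hit_counts SET hit_count = hit_count + 1 WHERE page_id = 17;

-- Reading the total is a single primary-key lookup:
SELECT hit_count FROM page_hit_counts WHERE page_id = 17;
```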
Use INSERT DELAYED when you do not need to know when your data is written. This reduces the overall insertion impact because many rows can be written with a single disk write.
Use INSERT LOW_PRIORITY when you want to give SELECT statements higher priority than your inserts.
Use SELECT HIGH_PRIORITY to get retrievals that jump the queue. That is, the SELECT is executed even if there is another client waiting to do a write.
LOW_PRIORITY and HIGH_PRIORITY have an effect only for storage engines that use only table-level locking (such as MyISAM, MEMORY, and MERGE).
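The three modifiers in use, against a hypothetical MyISAM table access_log. INSERT DELAYED returns to the client immediately and the rows are queued by the server. Rows still waiting in that queue are lost if the server crashes.

```
-- Queue the row and return immediately.
INSERT DELAYED INTO access_log (message) VALUES ('page viewed');
-- Wait until no clients are reading from the table.
INSERT LOW_PRIORITY INTO access_log (message) VALUES ('nightly batch row');
-- Run ahead of writers that are waiting for the table lock.
SELECT HIGH_PRIORITY COUNT(*) FROM access_log;
```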
Use OPTIMIZE TABLE periodically to avoid fragmentation with dynamic-format MyISAM tables. See Section 14.5.3, "MyISAM Table Storage Formats".
Declaring a MyISAM table with the DELAY_KEY_WRITE=1 table option makes index updates faster because they are not flushed to disk until the table is closed. The downside is that if something kills the server while such a table is open, you must ensure that the table is okay by running the server with the --myisam-recover-options option, or by running myisamchk before restarting the server. (However, even in this case, you should not lose anything by using DELAY_KEY_WRITE, because the key information can always be generated from the data rows.)
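The option can be given in CREATE TABLE or added later, as in this sketch with a hypothetical table. It takes effect as long as the delay_key_write system variable keeps its default value of ON.

```
ALTER TABLE access_log DELAY_KEY_WRITE = 1;
```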
Strings are automatically prefix- and end-space compressed in MyISAM indexes. See Section 13.1.13, "CREATE INDEX Syntax".
You can increase performance by caching queries or answers in your application and then executing many inserts or updates together. Locking the table during this operation ensures that the index cache is only flushed once after all updates. You can also take advantage of MySQL's query cache to achieve similar results; see Section 8.9.3, "The MySQL Query Cache".

8.6.2. Bulk Data Loading for MyISAM Tables

These performance tips supplement the general guidelines for fast inserts in Section 8.2.2.1, "Speed of INSERT Statements".

To improve performance when multiple clients insert a lot of rows, use the INSERT DELAYED statement. See Section 13.2.5.2, "INSERT DELAYED Syntax". This technique works for MyISAM and some other storage engines, but not InnoDB.
For a MyISAM table, you can use concurrent inserts to add rows at the same time that SELECT statements are running, if there are no deleted rows in the middle of the data file. See Section 8.10.3, "Concurrent Inserts".
With some extra work, it is possible to make LOAD DATA INFILE run even faster for a MyISAM table when the table has many indexes. Use the following procedure:
1. Execute a FLUSH TABLES statement or a mysqladmin flush-tables command.
2. Use myisamchk --keys-used=0 -rq /path/to/db/tbl_name to remove all use of indexes for the table.
3. Insert data into the table with LOAD DATA INFILE. This does not update any indexes and therefore is very fast.
4. If you intend only to read from the table in the future, use myisampack to compress it. See Section 14.5.3.3, "Compressed Table Characteristics".
5. Re-create the indexes with myisamchk -rq /path/to/db/tbl_name. This creates the index tree in memory before writing it to disk, which is much faster than updating the index during LOAD DATA INFILE because it avoids lots of disk seeks. The resulting index tree is also perfectly balanced.
6. Execute a FLUSH TABLES statement or a mysqladmin flush-tables command.
LOAD DATA INFILE performs the preceding optimization automatically if the MyISAM table into which you insert data is empty. The main difference between automatic optimization and using the procedure explicitly is that you can let myisamchk allocate much more temporary memory for the index creation than you might want the server to allocate for index re-creation when it executes the LOAD DATA INFILE statement.
You can also disable or enable the nonunique indexes for a MyISAM table by using the following statements rather than myisamchk. If you use these statements, you can skip the FLUSH TABLES operations:
```
ALTER TABLE tbl_name DISABLE KEYS;
ALTER TABLE tbl_name ENABLE KEYS;
```
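Combined with the load itself, the sequence might look like the following sketch. The input file path is hypothetical, and the file is assumed to use the LOAD DATA INFILE defaults of tab-separated fields and newline-terminated lines. DISABLE KEYS affects only nonunique indexes; unique indexes and the primary key are still maintained during the load.

```
ALTER TABLE tbl_name DISABLE KEYS;
-- Hypothetical tab-separated input file.
LOAD DATA INFILE '/tmp/tbl_name.txt' INTO TABLE tbl_name;
-- Rebuild the nonunique indexes in one pass.
ALTER TABLE tbl_name ENABLE KEYS;
```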
To speed up INSERT operations that are performed with multiple statements for nontransactional tables, lock your tables:
```
LOCK TABLES a WRITE;
INSERT INTO a VALUES (1,23),(2,34),(4,33);
INSERT INTO a VALUES (8,26),(6,29);
...
UNLOCK TABLES;
```
This benefits performance because the index buffer is flushed to disk only once, after all INSERT statements have completed. Normally, there would be as many index buffer flushes as there are INSERT statements. Explicit locking statements are not needed if you can insert all rows with a single INSERT.
Locking also lowers the total time for multiple-connection tests, although the maximum wait time for individual connections might go up because they wait for locks. Suppose that five clients attempt to perform inserts simultaneously as follows:
- Connection 1 does 1000 inserts
- Connections 2, 3, and 4 do 1 insert
- Connection 5 does 1000 inserts
If you do not use locking, connections 2, 3, and 4 finish before 1 and 5. If you use locking, connections 2, 3, and 4 probably do not finish before 1 or 5, but the total time should be about 40% faster.
INSERT, UPDATE, and DELETE operations are very fast in MySQL, but you can obtain better overall performance by adding locks around everything that does more than about five successive inserts or updates. If you do very many successive inserts, you could do a LOCK TABLES followed by an UNLOCK TABLES once in a while (every 1,000 rows or so) to permit other threads to access the table. This still results in a good performance gain.
INSERT is still much slower for loading data than LOAD DATA INFILE, even when using the strategies just outlined.
To increase performance for MyISAM tables, for both LOAD DATA INFILE and INSERT, enlarge the key cache by increasing the key_buffer_size system variable. See Section 8.11.2, "Tuning Server Parameters".

8.6.3. Speed of REPAIR TABLE Statements

REPAIR TABLE for MyISAM tables is similar to using myisamchk for repair operations, and some of the same performance optimizations apply:

myisamchk has variables that control memory allocation. You may be able to improve its performance by setting these variables, as described in Section 4.6.3.6, "myisamchk Memory Usage".
For REPAIR TABLE, the same principle applies, but because the repair is done by the server, you set server system variables instead of myisamchk variables. Also, in addition to setting memory-allocation variables, increasing the myisam_max_sort_file_size system variable increases the likelihood that the repair will use the faster filesort method and avoid the slower repair by key cache method. Set the variable to the maximum file size for your system, after checking to be sure that there is enough free space to hold a copy of the table files. The free space must be available in the file system containing the original table files.

Suppose that a myisamchk table-repair operation is done using the following options to set its memory-allocation variables:

```
--key_buffer_size=128M --myisam_sort_buffer_size=256M
--read_buffer_size=64M --write_buffer_size=64M
```

Some of those myisamchk variables correspond to server system variables:

myisamchk Variable	System Variable
key_buffer_size	key_buffer_size
myisam_sort_buffer_size	myisam_sort_buffer_size
read_buffer_size	read_buffer_size
write_buffer_size	none

Each of the server system variables can be set at runtime, and some of them (myisam_sort_buffer_size, read_buffer_size) have a session value in addition to a global value. Setting a session value limits the effect of the change to your current session and does not affect other users. Changing a global-only variable (key_buffer_size, myisam_max_sort_file_size) affects other users as well. For key_buffer_size, you must take into account that the buffer is shared with those users. For example, if you set the myisamchk key_buffer_size variable to 128MB, you could set the corresponding key_buffer_size system variable larger than that (if it is not already set larger), to permit key buffer use by activity in other sessions. However, changing the global key buffer size invalidates the buffer, causing increased disk I/O and slowdown for other sessions. An alternative that avoids this problem is to use a separate key cache, assign to it the indexes from the table to be repaired, and deallocate it when the repair is complete. See Section 8.9.2.2, "Multiple Key Caches".

Based on the preceding remarks, a REPAIR TABLE operation can be done as follows to use settings similar to the myisamchk command. Here a separate 128MB key buffer is allocated and the file system is assumed to permit a file size of at least 100GB.

SET SESSION myisam_sort_buffer_size = 256*1024*1024;
SET SESSION read_buffer_size = 64*1024*1024;
SET GLOBAL myisam_max_sort_file_size = 100*1024*1024*1024;
SET GLOBAL repair_cache.key_buffer_size = 128*1024*1024;
CACHE INDEX tbl_name IN repair_cache;
LOAD INDEX INTO CACHE tbl_name;
REPAIR TABLE tbl_name;
SET GLOBAL repair_cache.key_buffer_size = 0;
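The SET statements express sizes as byte products such as 256*1024*1024, whereas the myisamchk options and option files use suffixed values such as 256M. As a rough illustration of that correspondence, the following Python sketch converts suffixed sizes to bytes, assuming the 1024-based K, M, and G multipliers that MySQL applies to size values. The helper name to_bytes is hypothetical and is not part of any MySQL tool.

```python
# Sketch: convert size strings such as "128M" or "100G" into byte counts,
# assuming 1024-based multipliers for the K, M, and G suffixes.
# to_bytes is a hypothetical helper, not a MySQL API.
MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(size: str) -> int:
    """Return the byte count for '256M', '1G', '64k', or a plain integer string."""
    size = size.strip().upper()
    if size and size[-1] in MULTIPLIERS:
        return int(size[:-1]) * MULTIPLIERS[size[-1]]
    return int(size)


if __name__ == "__main__":
    # Sizes used by the myisamchk options and SET statements in this section.
    for option in ("128M", "256M", "64M", "100G"):
        print(option, "=", to_bytes(option), "bytes")
```

For example, to_bytes("256M") equals 256*1024*1024, the value assigned to myisam_sort_buffer_size in the first SET statement.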

If you intend to change a global variable but want to do so only for the duration of a REPAIR TABLE operation to minimally affect other users, save its value in a user variable and restore it afterward. For example:

SET @old_myisam_max_sort_file_size = @@global.myisam_max_sort_file_size;
SET GLOBAL myisam_max_sort_file_size = 100*1024*1024*1024;
REPAIR TABLE tbl_name;
SET GLOBAL myisam_max_sort_file_size = @old_myisam_max_sort_file_size;

The system variables that affect REPAIR TABLE can be set globally at server startup if you want the values to be in effect by default. For example, add these lines to the server my.cnf file:

[mysqld]
myisam_sort_buffer_size=256M
key_buffer_size=1G
myisam_max_sort_file_size=100G

These settings do not include read_buffer_size. Setting read_buffer_size globally to a large value does so for all sessions and can cause performance to suffer due to excessive memory allocation for a server with many simultaneous sessions.

8.7. Optimizing for MEMORY Tables

Consider using MEMORY tables for noncritical data that is accessed often, and is read-only or rarely updated. Benchmark your application against equivalent InnoDB or MyISAM tables under a realistic workload, to confirm that any additional performance is worth the risk of losing data, or the overhead of copying data from a disk-based table at application start.

For best performance with MEMORY tables, examine the kinds of queries against each table, and specify the type to use for each associated index, either a B-tree index or a hash index. On the CREATE INDEX statement, use the clause USING BTREE or USING HASH. B-tree indexes are fast for queries that do greater-than or less-than comparisons through operators such as > or BETWEEN. Hash indexes are fast only for queries that look up single values through the = operator, or a restricted set of values through the IN operator. For an explanation of why USING BTREE is often a better choice than the default USING HASH, see Section 8.2.1.4, "How to Avoid Full Table Scans". For implementation details of the different types of MEMORY indexes, see Section 8.3.8, "Comparison of B-Tree and Hash Indexes".
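The following Python sketch is only an analogy for why the two index types suit different operators: a dict stands in for a hash index, and a sorted list searched with the bisect module stands in for a B-tree. The data and function names are hypothetical, and the sketch does not model how the MEMORY engine actually stores its indexes.

```python
import bisect
from typing import List, Optional

# Hypothetical (key, value) rows; not a real MEMORY table.
ROWS = [(5, "e"), (1, "a"), (9, "i"), (3, "c"), (7, "g")]

# Hash-style index: exact key -> value. Fast for =, but keeps no ordering.
HASH_INDEX = {key: value for key, value in ROWS}

# B-tree-style index: keys kept in sorted order, so a range is a contiguous slice.
SORTED_KEYS = sorted(key for key, _ in ROWS)


def hash_equal(key: int) -> Optional[str]:
    """key = constant: a single hash probe."""
    return HASH_INDEX.get(key)


def btree_between(low: int, high: int) -> List[int]:
    """key BETWEEN low AND high: binary-search both bounds, then slice."""
    start = bisect.bisect_left(SORTED_KEYS, low)
    end = bisect.bisect_right(SORTED_KEYS, high)
    return SORTED_KEYS[start:end]


def hash_between(low: int, high: int) -> List[int]:
    """The same range on a hash index: with no ordering, every key is examined."""
    return sorted(key for key in HASH_INDEX if low <= key <= high)
```

Both range functions return the same keys, but btree_between locates the bounds with two binary searches, while hash_between has to test every key, which is the analogue of a hash index being of no help for BETWEEN.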

8.8. Understanding the Query Execution Plan

8.8.1. Optimizing Queries with EXPLAIN
8.8.2. EXPLAIN Output Format
8.8.3. EXPLAIN EXTENDED Output Format
8.8.4. Estimating Query Performance
8.8.5. Controlling the Query Optimizer

Depending on the details of your tables, columns, indexes, and the conditions in your WHERE clause, the MySQL optimizer considers many techniques to efficiently perform the lookups involved in an SQL query. A query on a huge table can be performed without reading all the rows; a join involving several tables can be performed without comparing every combination of rows. The set of operations that the optimizer chooses to perform the most efficient query is called the "query execution plan", also known as the EXPLAIN plan. Your goals are to recognize the aspects of the EXPLAIN plan that indicate a query is optimized well, and to learn the SQL syntax and indexing techniques to improve the plan if you see some inefficient operations.

8.8.1. Optimizing Queries with EXPLAIN

The EXPLAIN statement can be used either as a way to obtain information about how MySQL executes a statement, or as a synonym for DESCRIBE:

When you precede a SELECT statement with the keyword EXPLAIN, MySQL displays information from the optimizer about the query execution plan. That is, MySQL explains how it would process the statement, including information about how tables are joined and in which order. See Section 8.8.2, "EXPLAIN Output Format".
EXPLAIN EXTENDED can be used to obtain additional information; see Section 8.8.3, "EXPLAIN EXTENDED Output Format".
EXPLAIN PARTITIONS is useful only when examining queries involving partitioned tables. For details, see Section 18.3.4, "Obtaining Information About Partitions".
EXPLAIN tbl_name is synonymous with DESCRIBE tbl_name or SHOW COLUMNS FROM tbl_name. For information about DESCRIBE and SHOW COLUMNS, see Section 13.8.1, "DESCRIBE Syntax", and Section 13.7.5.6, "SHOW COLUMNS Syntax".

With the help of EXPLAIN, you can see where you should add indexes to tables so that the statement executes faster by using indexes to find rows. You can also use EXPLAIN to check whether the optimizer joins the tables in an optimal order. To give a hint to the optimizer to use a join order corresponding to the order in which the tables are named in a SELECT statement, begin the statement with SELECT STRAIGHT_JOIN rather than just SELECT. (See Section 13.2.9, "SELECT Syntax".)

If you have a problem with indexes not being used when you believe that they should be, run ANALYZE TABLE to update table statistics, such as cardinality of keys, that can affect the choices the optimizer makes. See Section 13.7.2.1, "ANALYZE TABLE Syntax".

8.8.2. EXPLAIN Output Format

The EXPLAIN statement provides information about the execution plan for a SELECT statement.

EXPLAIN returns a row of information for each table used in the SELECT statement. It lists the tables in the output in the order that MySQL would read them while processing the statement. MySQL resolves all joins using a nested-loop join method. This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on. When all tables are processed, MySQL outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows. The next row is read from this table and the process continues with the next table.
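The nested-loop method described above can be sketched in Python. This is a simplified model under stated assumptions: each table is a list of dicts, and each join condition is a hypothetical predicate evaluated once a row from that table has been added to the partial combination. It ignores indexes, join buffering, and everything else the optimizer does.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


def nested_loop_join(tables: List[List[Row]],
                     conditions: List[Callable[[Row], bool]]) -> Iterator[Row]:
    """Yield every combination of rows that satisfies all join conditions.

    conditions[i] is checked after a row from tables[i] has been added to
    the partial combination built from the earlier tables.
    """
    def recurse(depth: int, partial: Row) -> Iterator[Row]:
        if depth == len(tables):
            yield dict(partial)  # all tables processed: output this combination
            return
        for row in tables[depth]:  # read the next row from this table
            combined = {**partial, **row}
            if conditions[depth](combined):
                yield from recurse(depth + 1, combined)
        # Exhausting this loop backtracks to the previous table,
        # which then reads its next row.

    return recurse(0, {})


# Hypothetical two-table example: t1 joined to t2 on t1.id = t2.id.
t1 = [{"t1.id": 1}, {"t1.id": 2}]
t2 = [{"t2.id": 1, "t2.v": "x"}, {"t2.id": 2, "t2.v": "y"}, {"t2.id": 2, "t2.v": "z"}]

if __name__ == "__main__":
    joined = nested_loop_join(
        [t1, t2],
        [lambda r: True, lambda r: r["t1.id"] == r["t2.id"]],
    )
    for combination in joined:
        print(combination)
```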

When the EXTENDED keyword is used, EXPLAIN produces extra information that can be viewed by issuing a SHOW WARNINGS statement following the EXPLAIN statement. EXPLAIN EXTENDED also displays the filtered column. See Section 8.8.3, "EXPLAIN EXTENDED Output Format".

Note

You cannot use the EXTENDED and PARTITIONS keywords together in the same EXPLAIN statement.

EXPLAIN Output Columns

This section describes the output columns produced by EXPLAIN. Later sections provide additional information about the type and Extra columns.

Each output row from EXPLAIN provides information about one table. Each row contains the values summarized in Table 8.1, "EXPLAIN Output Columns", and described in more detail following the table.

Table 8.1. EXPLAIN Output Columns

Column	Meaning
id	The SELECT identifier
select_type	The SELECT type
table	The table for the output row
partitions	The matching partitions
type	The join type
possible_keys	The possible indexes to choose
key	The index actually chosen
key_len	The length of the chosen key
ref	The columns compared to the index
rows	Estimate of rows to be examined
filtered	Percentage of rows filtered by table condition
Extra	Additional information

id
The SELECT identifier. This is the sequential number of the SELECT within the query. The value can be NULL if the row refers to the union result of other rows. In this case, the table column shows a value like <unionM,N> to indicate that the row refers to the union of the rows with id values of M and N.

select_type

The type of SELECT, which can be any of those shown in the following table.

select_type Value	Meaning
SIMPLE	Simple SELECT (not using UNION or subqueries)
PRIMARY	Outermost SELECT
UNION	Second or later SELECT statement in a UNION
DEPENDENT UNION	Second or later SELECT statement in a UNION, dependent on outer query
UNION RESULT	Result of a UNION.
SUBQUERY	First SELECT in subquery
DEPENDENT SUBQUERY	First SELECT in subquery, dependent on outer query
DERIVED	Derived table SELECT (subquery in FROM clause)
UNCACHEABLE SUBQUERY	A subquery for which the result cannot be cached and must be re-evaluated for each row of the outer query
UNCACHEABLE UNION	The second or later select in a UNION that belongs to an uncacheable subquery (see UNCACHEABLE SUBQUERY)

DEPENDENT typically signifies the use of a correlated subquery. See Section 13.2.10.7, "Correlated Subqueries".

DEPENDENT SUBQUERY evaluation differs from UNCACHEABLE SUBQUERY evaluation. For DEPENDENT SUBQUERY, the subquery is re-evaluated only once for each set of different values of the variables from its outer context. For UNCACHEABLE SUBQUERY, the subquery is re-evaluated for each row of the outer context.
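A toy Python model of this difference follows. It assumes each outer row is represented only by the tuple of outer values the subquery refers to, and it counts how many times the subquery body runs under each strategy. The function names and data are hypothetical; the model says nothing about how MySQL implements subquery caching internally.

```python
from typing import Callable, Dict, List, Tuple

OuterValues = Tuple[int, ...]


def evaluate_dependent(outer_rows: List[OuterValues],
                       subquery: Callable[[OuterValues], int]) -> Tuple[List[int], int]:
    """DEPENDENT SUBQUERY: evaluate once per distinct set of outer values."""
    cache: Dict[OuterValues, int] = {}
    calls = 0
    results = []
    for values in outer_rows:
        if values not in cache:
            cache[values] = subquery(values)  # first time these values appear
            calls += 1
        results.append(cache[values])
    return results, calls


def evaluate_uncacheable(outer_rows: List[OuterValues],
                         subquery: Callable[[OuterValues], int]) -> Tuple[List[int], int]:
    """UNCACHEABLE SUBQUERY: evaluate again for every outer row."""
    calls = 0
    results = []
    for values in outer_rows:
        results.append(subquery(values))
        calls += 1
    return results, calls


def square_of_first(values: OuterValues) -> int:
    """Hypothetical stand-in for a correlated subquery body."""
    return values[0] ** 2


outer = [(1,), (2,), (1,), (1,), (3,)]  # five outer rows, three distinct values
```

With these five outer rows, both strategies produce the same results, but the dependent strategy runs the subquery three times (once per distinct value) and the uncacheable strategy runs it five times.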

Cacheability of subqueries differs from caching of query results in the query cache (which is described in Section 8.9.3.1, "How the Query Cache Operates"). Subquery caching occurs during query execution, whereas the query cache is used to store results only after query execution finishes.

table
The name of the table to which the row of output refers. This can also be one of the following values:
- <unionM,N>: The row refers to the union of the rows with id values of M and N.
- <derivedN>: The row refers to the derived table result for the row with an id value of N. A derived table may result, for example, from a subquery in the FROM clause.
partitions
The partitions from which records would be matched by the query. This column is displayed only if the PARTITIONS keyword is used. The value is NULL for nonpartitioned tables. See Section 18.3.4, "Obtaining Information About Partitions".
type
The join type. For descriptions of the different types, see EXPLAIN Join Types.
possible_keys
The possible_keys column indicates which indexes MySQL can choose from to find the rows in this table. Note that this column is totally independent of the order of the tables as displayed in the output from EXPLAIN. That means that some of the keys in possible_keys might not be usable in practice with the generated table order.
If this column is NULL, there are no relevant indexes. In this case, you may be able to improve the performance of your query by examining the WHERE clause to check whether it refers to some column or columns that would be suitable for indexing. If so, create an appropriate index and check the query with EXPLAIN again. See Section 13.1.7, "ALTER TABLE Syntax".
To see what indexes a table has, use SHOW INDEX FROM tbl_name.
key
The key column indicates the key (index) that MySQL actually decided to use. If MySQL decides to use one of the possible_keys indexes to look up rows, that index is listed as the key value.
It is possible that key will name an index that is not present in the possible_keys value. This can happen if none of the possible_keys indexes are suitable for looking up rows, but all the columns selected by the query are columns of some other index. That is, the named index covers the selected columns, so although it is not used to determine which rows to retrieve, an index scan is more efficient than a data row scan.
For InnoDB, a secondary index might cover the selected columns even if the query also selects the primary key because InnoDB stores the primary key value with each secondary index. If key is NULL, MySQL found no index to use for executing the query more efficiently.
To force MySQL to use or ignore an index listed in the possible_keys column, use FORCE INDEX, USE INDEX, or IGNORE INDEX in your query. See Section 13.2.9.3, "Index Hint Syntax".
For MyISAM and NDB tables, running ANALYZE TABLE helps the optimizer choose better indexes. For NDB tables, this also improves performance of distributed pushed-down joins. For MyISAM tables, myisamchk --analyze does the same as ANALYZE TABLE. See Section 7.6, "MyISAM Table Maintenance and Crash Recovery".
key_len
The key_len column indicates the length of the key that MySQL decided to use. The length is NULL if the key column says NULL. Note that the value of key_len enables you to determine how many parts of a multiple-part key MySQL actually uses.
ref
The ref column shows which columns or constants are compared to the index named in the key column to select rows from the table.
rows
The rows column indicates the number of rows MySQL believes it must examine to execute the query.
For InnoDB tables, this number is an estimate, and may not always be exact.
filtered
The filtered column indicates an estimated percentage of table rows that will be filtered by the table condition. That is, rows shows the estimated number of rows examined and rows × filtered / 100 shows the number of rows that will be joined with previous tables. This column is displayed if you use EXPLAIN EXTENDED.
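As a small worked illustration of the formula, the following Python function computes the estimated number of rows passed on to the next table. The function name and the example values are hypothetical.

```python
def rows_joined(rows: int, filtered: float) -> float:
    """Estimated rows joined with previous tables: rows x filtered / 100.

    rows and filtered are the values of the EXPLAIN EXTENDED columns of the same name.
    """
    return rows * filtered / 100


if __name__ == "__main__":
    # Hypothetical EXPLAIN EXTENDED values: rows = 1000, filtered = 10.00
    print(rows_joined(1000, 10.00))  # 100.0
```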
Extra
This column contains additional information about how MySQL resolves the query. For descriptions of the different values, see EXPLAIN Extra Information.

EXPLAIN Join Types

The type column of EXPLAIN output describes how tables are joined. The following list describes the join types, ordered from the best type to the worst:

system
The table has only one row (= system table). This is a special case of the const join type.
const
The table has at most one matching row, which is read at the start of the query. Because there is only one row, values from the column in this row can be regarded as constants by the rest of the optimizer. const tables are very fast because they are read only once.
const is used when you compare all parts of a PRIMARY KEY or UNIQUE index to constant values. In the following queries, tbl_name can be used as a const table:
```
SELECT * FROM tbl_name WHERE primary_key=1;

SELECT * FROM tbl_name
  WHERE primary_key_part1=1 AND primary_key_part2=2;
```
eq_ref
One row is read from this table for each combination of rows from the previous tables. Other than the system and const types, this is the best possible join type. It is used when all parts of an index are used by the join and the index is a PRIMARY KEY or UNIQUE NOT NULL index.
eq_ref can be used for indexed columns that are compared using the = operator. The comparison value can be a constant or an expression that uses columns from tables that are read before this table. In the following examples, MySQL can use an eq_ref join to process ref_table:
```
SELECT * FROM ref_table,other_table
  WHERE ref_table.key_column=other_table.column;

SELECT * FROM ref_table,other_table
  WHERE ref_table.key_column_part1=other_table.column
  AND ref_table.key_column_part2=1;
```
ref
All rows with matching index values are read from this table for each combination of rows from the previous tables. ref is used if the join uses only a leftmost prefix of the key or if the key is not a PRIMARY KEY or UNIQUE index (in other words, if the join cannot select a single row based on the key value). If the key that is used matches only a few rows, this is a good join type.
ref can be used for indexed columns that are compared using the = or <=> operator. In the following examples, MySQL can use a ref join to process ref_table:
```
SELECT * FROM ref_table WHERE key_column=expr;

SELECT * FROM ref_table,other_table
  WHERE ref_table.key_column=other_table.column;

SELECT * FROM ref_table,other_table
  WHERE ref_table.key_column_part1=other_table.column
  AND ref_table.key_column_part2=1;
```
fulltext
The join is performed using a FULLTEXT index.
ref_or_null
This join type is like ref, but with the addition that MySQL does an extra search for rows that contain NULL values. This join type optimization is used most often in resolving subqueries. In the following examples, MySQL can use a ref_or_null join to process ref_table:
```
SELECT * FROM ref_table
  WHERE key_column=expr OR key_column IS NULL;
```
See Section 8.13.4, "IS NULL Optimization".
index_merge
This join type indicates that the Index Merge optimization is used. In this case, the key column in the output row contains a list of indexes used, and key_len contains a list of the longest key parts for the indexes used. For more information, see Section 8.13.2, "Index Merge Optimization".
unique_subquery
This type replaces ref for some IN subqueries of the following form:
```
value IN (SELECT primary_key FROM single_table WHERE some_expr)
```
unique_subquery is just an index lookup function that replaces the subquery completely for better efficiency.
index_subquery
This join type is similar to unique_subquery. It replaces IN subqueries, but it works for nonunique indexes in subqueries of the following form:
```
value IN (SELECT key_column FROM single_table WHERE some_expr)
```
range
Only rows that are in a given range are retrieved, using an index to select the rows. The key column in the output row indicates which index is used. The key_len contains the longest key part that was used. The ref column is NULL for this type.
range can be used when a key column is compared to a constant using any of the =, <>, >, >=, <, <=, IS NULL, <=>, BETWEEN, or IN() operators:
```
SELECT * FROM tbl_name
  WHERE key_column = 10;

SELECT * FROM tbl_name
  WHERE key_column BETWEEN 10 and 20;

SELECT * FROM tbl_name
  WHERE key_column IN (10,20,30);

SELECT * FROM tbl_name
  WHERE key_part1 = 10 AND key_part2 IN (10,20,30);
```
index
The index join type is the same as ALL, except that the index tree is scanned. This occurs two ways:
- If the index is a covering index for the query and can be used to satisfy all data required from the table, only the index tree is scanned. In this case, the Extra column says Using index. An index-only scan usually is faster than ALL because the size of the index usually is smaller than the table data.
- A full table scan is performed using reads from the index to look up data rows in index order. Using index does not appear in the Extra column.
MySQL can use this join type when the query uses only columns that are part of a single index.
ALL
A full table scan is done for each combination of rows from the previous tables. This is normally not good if the table is the first table not marked const, and usually very bad in all other cases. Normally, you can avoid ALL by adding indexes that enable row retrieval from the table based on constant values or column values from earlier tables.

EXPLAIN Extra Information

The Extra column of EXPLAIN output contains additional information about how MySQL resolves the query. The following list explains the values that can appear in this column. If you want to make your queries as fast as possible, look out for Extra values of Using filesort and Using temporary.

Child of 'table' pushed join@1
This table is referenced as the child of table in a join that can be pushed down to the NDB kernel. Applies only in MySQL Cluster NDB 7.2 and later, when pushed-down joins are enabled. See the description of the ndb_join_pushdown server system variable for more information and examples.
const row not found
For a query such as SELECT ... FROM tbl_name, the table was empty.
Distinct
MySQL is looking for distinct values, so it stops searching for more rows for the current row combination after it has found the first matching row.
Full scan on NULL key
This occurs for subquery optimization as a fallback strategy when the optimizer cannot use an index-lookup access method.
Impossible HAVING
The HAVING clause is always false and cannot select any rows.
Impossible WHERE
The WHERE clause is always false and cannot select any rows.
Impossible WHERE noticed after reading const tables
MySQL has read all const (and system) tables and notices that the WHERE clause is always false.
No matching min/max row
No row satisfies the condition for a query such as SELECT MIN(...) FROM ... WHERE condition.
no matching row in const table
For a query with a join, there was an empty table or a table with no rows satisfying a unique index condition.
No tables used
The query has no FROM clause, or has a FROM DUAL clause.
Not exists
MySQL was able to do a LEFT JOIN optimization on the query and does not examine more rows in this table for the previous row combination after it finds one row that matches the LEFT JOIN criteria. Here is an example of the type of query that can be optimized this way:
```
SELECT * FROM t1 LEFT JOIN t2 ON t1.id=t2.id
  WHERE t2.id IS NULL;
```
Assume that t2.id is defined as NOT NULL. In this case, MySQL scans t1 and looks up the rows in t2 using the values of t1.id. If MySQL finds a matching row in t2, it knows that t2.id can never be NULL, and does not scan through the rest of the rows in t2 that have the same id value. In other words, for each row in t1, MySQL needs to do only a single lookup in t2, regardless of how many rows actually match in t2.
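The short-circuit can be illustrated with a simplified Python model. It assumes t2 is represented by a hypothetical dict that maps each id to its t2 rows (standing in for an index on the NOT NULL column t2.id) and counts how many t2 rows are read. It is an illustration of the idea, not how MySQL executes the join.

```python
from typing import Dict, List, Tuple


def not_exists_join(t1_ids: List[int],
                    t2_index: Dict[int, List[str]]) -> Tuple[List[int], int]:
    """Model of SELECT * FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.id IS NULL.

    Returns the t1 ids that have no match in t2 and the number of t2 rows read.
    """
    unmatched = []
    examined = 0
    for t1_id in t1_ids:
        found = False
        for _t2_row in t2_index.get(t1_id, []):
            examined += 1
            found = True
            break  # one match proves t2.id is not NULL: skip the remaining rows
        if not found:
            unmatched.append(t1_id)  # LEFT JOIN supplies NULL, so the row qualifies
    return unmatched, examined


if __name__ == "__main__":
    t2 = {1: ["a", "b", "c"], 2: ["d"]}  # hypothetical data: id 1 has three rows
    print(not_exists_join([1, 2, 3], t2))  # ([3], 2)
```

Although id 1 has three matching rows in this hypothetical t2, only one of them is read, so the model examines two t2 rows in total instead of four.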
Range checked for each record (index map: N)
MySQL found no good index to use, but found that some of the indexes might be used after column values from preceding tables are known. For each row combination in the preceding tables, MySQL checks whether it is possible to use a range or index_merge access method to retrieve rows. This is not very fast, but is faster than performing a join with no index at all. The applicability criteria are as described in Section 8.13.1, "Range Optimization", and Section 8.13.2, "Index Merge Optimization", with the exception that all column values for the preceding table are known and considered to be constants.
Indexes are numbered beginning with 1, in the same order as shown by SHOW INDEX for the table. The index map value N is a bitmask value that indicates which indexes are candidates. For example, a value of 0x19 (binary 11001) means that indexes 1, 4, and 5 will be considered.
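The bitmask can be decoded mechanically. This Python sketch, whose function name is hypothetical, lists the candidate index numbers for a given index map value, using the numbering described above (the least significant bit stands for index 1).

```python
from typing import List


def candidate_indexes(index_map: int) -> List[int]:
    """Return the 1-based index numbers whose bits are set in index_map.

    Bit 0 (the least significant bit) stands for index 1, bit 1 for index 2,
    and so on, in the order that SHOW INDEX lists the indexes.
    """
    numbers = []
    position = 1
    while index_map:
        if index_map & 1:
            numbers.append(position)
        index_map >>= 1
        position += 1
    return numbers


if __name__ == "__main__":
    print(candidate_indexes(0x19))  # [1, 4, 5]
```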
Scanned N databases
This indicates how many directory scans the server performs when processing a query for INFORMATION_SCHEMA tables, as described in Section 8.2.4, "Optimizing INFORMATION_SCHEMA Queries". The value of N can be 0, 1, or all.
Select tables optimized away
The query contained only aggregate functions (MIN(), MAX()) that were all resolved using an index, or COUNT(*) for MyISAM, and no GROUP BY clause. The optimizer determined that only one row should be returned.
Skip_open_table, Open_frm_only, Open_trigger_only, Open_full_table
These values indicate file-opening optimizations that apply to queries for INFORMATION_SCHEMA tables, as described in Section 8.2.4, "Optimizing INFORMATION_SCHEMA Queries".
- Skip_open_table: Table files do not need to be opened. The information has already become available within the query by scanning the database directory.
- Open_frm_only: Only the table's .frm file need be opened.
- Open_trigger_only: Only the table's .TRG file need be opened.
- Open_full_table: The unoptimized information lookup. The .frm, .MYD, and .MYI files must be opened.
unique row not found
For a query such as SELECT ... FROM tbl_name, no rows satisfy the condition for a UNIQUE index or PRIMARY KEY on the table.
Using filesort
MySQL must do an extra pass to find out how to retrieve the rows in sorted order. The sort is done by going through all rows according to the join type and storing the sort key and pointer to the row for all rows that match the WHERE clause. The keys then are sorted and the rows are retrieved in sorted order. See Section 8.13.9, "ORDER BY Optimization".
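The steps described above, collecting and sorting compact (sort key, row pointer) pairs and then fetching rows in key order, can be sketched in Python as follows. This is a simplified model with hypothetical names and data; it ignores memory limits, merge passes on disk, and any other sort strategies the server may use.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def filesort(table: List[Row],
             where: Callable[[Row], bool],
             sort_key: Callable[[Row], object]) -> List[Row]:
    # Pass 1: for every row that matches WHERE, keep only (sort key, row pointer).
    pairs = [(sort_key(row), pointer) for pointer, row in enumerate(table) if where(row)]
    # Sort the compact pairs; the pointer breaks ties between equal keys.
    pairs.sort()
    # Pass 2: retrieve the full rows in sorted order using the saved pointers.
    return [table[pointer] for _key, pointer in pairs]


if __name__ == "__main__":
    rows = [{"id": 1, "name": "c"}, {"id": 2, "name": "a"},
            {"id": 3, "name": "b"}, {"id": 4, "name": "d"}]
    result = filesort(rows, where=lambda r: r["id"] != 4, sort_key=lambda r: r["name"])
    print([r["id"] for r in result])  # [2, 3, 1]
```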
Using index
The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. This strategy can be used when the query uses only columns that are part of a single index.
If the Extra column also says Using where, it means the index is being used to perform lookups of key values. Without Using where, the optimizer may be reading the index to avoid reading data rows but not using it for lookups. For example, if the index is a covering index for the query, the optimizer may scan it without using it for lookups.
For InnoDB tables that have a user-defined clustered index, that index can be used even when Using index is absent from the Extra column. This is the case if type is index and key is PRIMARY.
Using index for group-by
Similar to the Using index table access method, Using index for group-by indicates that MySQL found an index that can be used to retrieve all columns of a GROUP BY or DISTINCT query without any extra disk access to the actual table. Additionally, the index is used in the most efficient way so that for each group, only a few index entries are read. For details, see Section 8.13.10, "GROUP BY Optimization".
Using join buffer
Tables from earlier joins are read in portions into the join buffer, and then their rows are used from the buffer to perform the join with the current table.
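A simplified Python model of joining through a buffer follows. It assumes rows from the earlier table are buffered in chunks of a hypothetical buffer_size rows, and it counts how many times the current table is scanned: once per buffer fill rather than once per earlier row. Names and data are illustrative only.

```python
from typing import Callable, List, Tuple


def buffered_join(outer: List[int], inner: List[int],
                  match: Callable[[int, int], bool],
                  buffer_size: int) -> Tuple[List[Tuple[int, int]], int]:
    """Return matching (outer, inner) pairs and the number of scans of inner."""
    results: List[Tuple[int, int]] = []
    inner_scans = 0
    for start in range(0, len(outer), buffer_size):
        buffer = outer[start:start + buffer_size]  # fill the join buffer
        inner_scans += 1
        for inner_row in inner:  # one pass over the current table per buffer fill
            for outer_row in buffer:  # compare against every buffered row
                if match(outer_row, inner_row):
                    results.append((outer_row, inner_row))
    return results, inner_scans


if __name__ == "__main__":
    earlier = list(range(10))  # hypothetical rows from the earlier table
    current = [0, 2, 4, 11]    # hypothetical rows from the current table
    pairs, scans = buffered_join(earlier, current, lambda a, b: a == b, buffer_size=4)
    print(pairs, scans)  # [(0, 0), (2, 2), (4, 4)] 3
```

With buffer_size=1 the same call scans the current table ten times, once per earlier row; buffering four rows at a time reduces that to three scans.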
Using sort_union(...), Using union(...), Using intersect(...)
These indicate how index scans are merged for the index_merge join type. See Section 8.13.2, "Index Merge Optimization".
Using temporary
To resolve the query, MySQL needs to create a temporary table to hold the result. This typically happens if the query contains GROUP BY and ORDER BY clauses that list columns differently.
Using where
A WHERE clause is used to restrict which rows to match against the next table or send to the client. Unless you specifically intend to fetch or examine all rows from the table, you may have something wrong in your query if the Extra value is not Using where and the table join type is ALL or index. Even if you are using an index for all parts of a WHERE clause, you may see Using where if the column can be NULL.
Using where with pushed condition
This item applies to NDBCLUSTER tables only. It means that MySQL Cluster is using the Condition Pushdown optimization to improve the efficiency of a direct comparison between a nonindexed column and a constant. In such cases, the condition is "pushed down" to the cluster's data nodes and is evaluated on all data nodes simultaneously. This eliminates the need to send nonmatching rows over the network, and can speed up such queries by a factor of 5 to 10 over cases where Condition Pushdown could be used but is not. For more information, see Section 8.13.3, "Engine Condition Pushdown Optimization".

EXPLAIN Output Interpretation

You can get a good indication of how good a join is by taking the product of the values in the rows column of the EXPLAIN output. This should tell you roughly how many rows MySQL must examine to execute the query. If you restrict queries with the max_join_size system variable, this row product also is used to determine which multiple-table SELECT statements to execute and which to abort. See Section 8.11.2, "Tuning Server Parameters".

The following example shows how a multiple-table join can be optimized progressively based on the information provided by EXPLAIN.

Suppose that you have the SELECT statement shown here and that you plan to examine it using EXPLAIN:

EXPLAIN SELECT tt.TicketNumber, tt.TimeIn,
               tt.ProjectReference, tt.EstimatedShipDate,
               tt.ActualShipDate, tt.ClientID,
               tt.ServiceCodes, tt.RepetitiveID,
               tt.CurrentProcess, tt.CurrentDPPerson,
               tt.RecordVolume, tt.DPPrinted, et.COUNTRY,
               et_1.COUNTRY, do.CUSTNAME
        FROM tt, et, et AS et_1, do
        WHERE tt.SubmitTime IS NULL
          AND tt.ActualPC = et.EMPLOYID
          AND tt.AssignedPC = et_1.EMPLOYID
          AND tt.ClientID = do.CUSTNMBR;

For this example, make the following assumptions:

The columns being compared have been declared as follows.
Table	Column	Data Type
tt	ActualPC	CHAR(10)
tt	AssignedPC	CHAR(10)
tt	ClientID	CHAR(10)
et	EMPLOYID	CHAR(15)
do	CUSTNMBR	CHAR(15)
The tables have the following indexes.
Table	Index
tt	ActualPC
tt	AssignedPC
tt	ClientID
et	EMPLOYID (primary key)
do	CUSTNMBR (primary key)
The tt.ActualPC values are not evenly distributed.

Initially, before any optimizations have been performed, the EXPLAIN statement produces the following information:

table type possible_keys key  key_len ref  rows  Extra
et    ALL  PRIMARY       NULL NULL    NULL 74
do    ALL  PRIMARY       NULL NULL    NULL 2135
et_1  ALL  PRIMARY       NULL NULL    NULL 74
tt    ALL  AssignedPC,   NULL NULL    NULL 3872
           ClientID,
           ActualPC
      Range checked for each record (index map: 0x23)

Because type is ALL for each table, this output indicates that MySQL is generating a Cartesian product of all the tables; that is, every combination of rows. This takes quite a long time, because the product of the number of rows in each table must be examined. For the case at hand, this product is 74 × 2135 × 74 × 3872 = 45,268,558,720 rows. If the tables were bigger, you can only imagine how long it would take.

One problem here is that MySQL can use indexes on columns more efficiently if they are declared as the same type and size. In this context, VARCHAR and CHAR are considered the same if they are declared as the same size. tt.ActualPC is declared as CHAR(10) and et.EMPLOYID is CHAR(15), so there is a length mismatch.

To fix this disparity between column lengths, use ALTER TABLE to lengthen ActualPC from 10 characters to 15 characters:

mysql> ALTER TABLE tt MODIFY ActualPC VARCHAR(15);

Now tt.ActualPC and et.EMPLOYID are both VARCHAR(15). Executing the EXPLAIN statement again produces this result:

table type   possible_keys key     key_len ref         rows    Extra
tt    ALL    AssignedPC,   NULL    NULL    NULL        3872    Using
             ClientID,                                         where
             ActualPC
do    ALL    PRIMARY       NULL    NULL    NULL        2135
      Range checked for each record (index map: 0x1)
et_1  ALL    PRIMARY       NULL    NULL    NULL        74
      Range checked for each record (index map: 0x1)
et    eq_ref PRIMARY       PRIMARY 15      tt.ActualPC 1

This is not perfect, but it is much better: the product of the rows values is smaller by a factor of 74. This version executes in a couple of seconds.
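The rows products quoted for this example can be checked with a few lines of Python. The rows values are copied from the first two EXPLAIN outputs in this example; the helper name rows_product is hypothetical.

```python
from math import prod
from typing import Dict

# rows column values copied from the first two EXPLAIN outputs in this example.
INITIAL_PLAN = {"et": 74, "do": 2135, "et_1": 74, "tt": 3872}
AFTER_FIRST_ALTER = {"tt": 3872, "do": 2135, "et_1": 74, "et": 1}


def rows_product(rows_by_table: Dict[str, int]) -> int:
    """Rough count of row combinations examined: the product of the rows values."""
    return prod(rows_by_table.values())


if __name__ == "__main__":
    before = rows_product(INITIAL_PLAN)       # 45268558720
    after = rows_product(AFTER_FIRST_ALTER)   # 611737280
    print(before, after, before // after)     # 45268558720 611737280 74
```

The ratio is exactly 74 because the only change in the rows column is that et drops from 74 rows to 1 once its eq_ref lookup can use the primary key.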

A second alteration can be made to eliminate the column length mismatches for the tt.AssignedPC = et_1.EMPLOYID and tt.ClientID = do.CUSTNMBR comparisons:

mysql> ALTER TABLE tt MODIFY AssignedPC VARCHAR(15),
    ->                MODIFY ClientID   VARCHAR(15);

After that modification, EXPLAIN produces the output shown here:

table type   possible_keys key      key_len ref           rows Extra
et    ALL    PRIMARY       NULL     NULL    NULL          74
tt    ref    AssignedPC,   ActualPC 15      et.EMPLOYID   52   Using
             ClientID,                                         where
             ActualPC
et_1  eq_ref PRIMARY       PRIMARY  15      tt.AssignedPC 1
do    eq_ref PRIMARY       PRIMARY  15      tt.ClientID   1

At this point, the query is optimized almost as well as possible. The remaining problem is that, by default, MySQL assumes that values in the tt.ActualPC column are evenly distributed, and that is not the case for the tt table. Fortunately, it is easy to tell MySQL to analyze the key distribution:

mysql> ANALYZE TABLE tt;

With the additional index information, the join is perfect and EXPLAIN produces this result:

table type   possible_keys key     key_len ref           rows Extra
tt    ALL    AssignedPC,   NULL    NULL    NULL          3872 Using
             ClientID,                                        where
             ActualPC
et    eq_ref PRIMARY       PRIMARY 15      tt.ActualPC   1
et_1  eq_ref PRIMARY       PRIMARY 15      tt.AssignedPC 1
do    eq_ref PRIMARY       PRIMARY 15      tt.ClientID   1

Note that the rows column in the output from EXPLAIN is an educated guess from the MySQL join optimizer. Check whether the numbers are even close to the truth by comparing the rows product with the actual number of rows that the query returns. If the numbers are quite different, you might get better performance by using STRAIGHT_JOIN in your SELECT statement and trying to list the tables in a different order in the FROM clause.

It is possible in some cases to execute statements that modify data when EXPLAIN SELECT is used with a subquery; for more information, see Section 13.2.10.8, "Subqueries in the FROM Clause".

8.8.3. EXPLAIN EXTENDED Output Format

When EXPLAIN is used with the EXTENDED keyword, the output includes a filtered column not otherwise displayed. This column indicates the estimated percentage of table rows that will be filtered by the table condition. In addition, the statement produces extra information that can be viewed by issuing a SHOW WARNINGS statement following the EXPLAIN statement. The Message value in SHOW WARNINGS output displays how the optimizer qualifies table and column names in the SELECT statement, what the SELECT looks like after the application of rewriting and optimization rules, and possibly other notes about the optimization process. Here is an example:

mysql> EXPLAIN EXTENDED
    -> SELECT t1.a, t1.a IN (SELECT t2.a FROM t2) FROM t1\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4
     filtered: 100.00
        Extra: Using index
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: t2
         type: index_subquery
possible_keys: a
          key: a
      key_len: 5
          ref: func
         rows: 2
     filtered: 100.00
        Extra: Using index
2 rows in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select `test`.`t1`.`a` AS `a`,
         <in_optimizer>(`test`.`t1`.`a`, <exists>(<index_lookup>(<cache>(`test`.`t1`.`a`) in t2 on a checking NULL having <is_not_null_test>(`test`.`t2`.`a`)))) AS `t1.a IN (SELECT t2.a FROM t2)` from `test`.`t1`
1 row in set (0.00 sec)

Because the statement displayed by SHOW WARNINGS may contain special markers to provide information about query rewriting or optimizer actions, the statement is not necessarily valid SQL and is not intended to be executed. The output may also include rows with Message values that provide additional non-SQL explanatory notes about actions taken by the optimizer.

The following list describes special markers that can appear in EXTENDED output displayed by SHOW WARNINGS:

<cache>(expr)
The expression (such as a scalar subquery) is executed once and the resulting value is saved in memory for later use.
<exists>(query fragment)
The subquery predicate is converted to an EXISTS predicate and the subquery is transformed so that it can be used together with the EXISTS predicate.
<in_optimizer>(query fragment)
This is an internal optimizer object with no user significance.
<index_lookup>(query fragment)
The query fragment is processed using an index lookup to find qualifying rows.
<is_not_null_test>(expr)
A test to verify that the expression does not evaluate to NULL.
<primary_index_lookup>(query fragment)
The query fragment is processed using a primary key lookup to find qualifying rows.
<ref_null_helper>(expr)
This is an internal optimizer object with no user significance.

8.8.4. Estimating Query Performance

In most cases, you can estimate query performance by counting disk seeks. For small tables, you can usually find a row in one disk seek (because the index is probably cached). For bigger tables, you can estimate that, using B-tree indexes, you need this many seeks to find a row: log(row_count) / log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1.

In MySQL, an index block is usually 1,024 bytes and the data pointer is usually four bytes. For a 500,000-row table with a key value length of three bytes (the size of MEDIUMINT), the formula gives log(500,000)/log(1024/3*2/(3+4)) + 1 ≈ 3.9, or about 4 seeks.

This index would require storage of about 500,000 * 7 * 3/2 = 5.2MB (assuming a typical index buffer fill ratio of 2/3), so much of the index is probably in memory, and you likely need only one or two calls to read data to find the row.
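The seek estimate and the index size can be reproduced with a short Python sketch. The function names are illustrative; the default parameter values are the typical figures given in the text (1,024-byte blocks, a 3-byte MEDIUMINT key, a 4-byte data pointer, and a 2/3 fill ratio), not values read from a server.

```python
import math


def estimated_seeks(row_count, index_block_length=1024,
                    index_length=3, data_pointer_length=4):
    # Entries per index block, assuming blocks are about 2/3 full.
    entries_per_block = index_block_length / 3 * 2 / (index_length + data_pointer_length)
    # B-tree depth plus one seek to read the row itself.
    return math.log(row_count) / math.log(entries_per_block) + 1


def estimated_index_bytes(row_count, index_length=3, data_pointer_length=4):
    # Raw entry size scaled by 3/2 for the assumed 2/3 fill ratio.
    return row_count * (index_length + data_pointer_length) * 3 / 2


seeks = estimated_seeks(500_000)
print(f"{seeks:.2f} seeks (about {math.ceil(seeks)})")
print(f"{estimated_index_bytes(500_000):,.0f} bytes")
```

Because the formula is a ratio of logarithms, any logarithm base gives the same result.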

For writes, however, you need four seek requests to find where to place a new index value and normally two seeks to update the index and write the row.

Note that the preceding discussion does not mean that your application performance slowly degenerates by log N. As long as everything is cached by the OS or the MySQL server, things become only marginally slower as the table gets bigger. After the data gets too big to be cached, things start to go much slower until your applications are bound only by disk seeks (which increase by log N). To avoid this, increase the key cache size as the data grows. For MyISAM tables, the key cache size is controlled by the key_buffer_size system variable. See Section 8.11.2, "Tuning Server Parameters".

8.8.5. Controlling the Query Optimizer

8.8.5.1. Controlling Query Plan Evaluation
8.8.5.2. Controlling Switchable Optimizations

MySQL provides optimizer control through system variables that affect how query plans are evaluated and which switchable optimizations are enabled.

8.8.5.1. Controlling Query Plan Evaluation

The task of the query optimizer is to find an optimal plan for executing an SQL query. Because the difference in performance between "good" and "bad" plans can be orders of magnitude (that is, seconds versus hours or even days), most query optimizers, including that of MySQL, perform a more or less exhaustive search for an optimal plan among all possible query evaluation plans. For join queries, the number of possible plans investigated by the MySQL optimizer grows exponentially with the number of tables referenced in a query. For small numbers of tables (typically fewer than 7 to 10) this is not a problem. However, when larger queries are submitted, the time spent in query optimization may easily become the major bottleneck in the server's performance.
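To give a sense of scale, the following Python sketch counts the join orders for n tables as n!, the number of left-deep orderings. This is a simplified counting model chosen for illustration (it ignores bushy plans and the choice of access method for each table), but it helps explain why an exhaustive search becomes expensive somewhere around 7 to 10 tables.

```python
import math


def left_deep_join_orders(table_count):
    # Every permutation of the tables is a distinct left-deep join order.
    return math.factorial(table_count)


for n in (3, 7, 10, 13):
    print(f"{n:2d} tables: {left_deep_join_orders(n):,} join orders")
```

In this model, 7 tables already have 5,040 orderings and 13 tables have more than six billion.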

A more flexible method for query optimization enables the user to control how exhaustive the optimizer is in its search for an optimal query evaluation plan. The general idea is that the fewer plans that are investigated by the optimizer, the less time it spends in compiling a query. On the other hand, because the optimizer skips some plans, it may miss finding an optimal plan.

The behavior of the optimizer with respect to the number of plans it evaluates can be controlled using two system variables:

The optimizer_prune_level variable tells the optimizer to skip certain plans based on estimates of the number of rows accessed for each table. Our experience shows that this kind of "educated guess" rarely misses optimal plans, and may dramatically reduce query compilation times. That is why this option is on (optimizer_prune_level=1) by default. However, if you believe that the optimizer missed a better query plan, this option can be switched off (optimizer_prune_level=0) with the risk that query compilation may take much longer. Note that, even with the use of this heuristic, the optimizer still explores a roughly exponential number of plans.
The optimizer_search_depth variable tells how far into the "future" of each incomplete plan the optimizer should look to evaluate whether it should be expanded further. Smaller values of optimizer_search_depth may result in orders of magnitude smaller query compilation times. For example, queries with 12, 13, or more tables may easily require hours and even days to compile if optimizer_search_depth is close to the number of tables in the query. At the same time, if compiled with optimizer_search_depth equal to 3 or 4, the optimizer may compile in less than a minute for the same query. If you are unsure of what a reasonable value is for optimizer_search_depth, this variable can be set to 0 to tell the optimizer to determine the value automatically.
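The effect of optimizer_search_depth can be illustrated with a deliberately simplified Python model of a depth-limited greedy search. The model assumes that at each step the optimizer examines every ordered sequence of up to search_depth not-yet-placed tables and then commits to the best next table. This is an assumption for illustration, not MySQL's exact algorithm, and it does not model the automatic choice made when the variable is 0.

```python
import math


def partial_plans_examined(table_count, search_depth):
    # Simplified model (an assumption, not MySQL's exact algorithm): at each
    # step, look at every ordered sequence of up to search_depth tables that
    # are not yet placed, then fix the single best next table.
    total = 0
    for remaining in range(table_count, 0, -1):
        depth = min(search_depth, remaining)
        total += math.perm(remaining, depth)
    return total


for depth in (3, 4, 13):
    plans = partial_plans_examined(13, depth)
    print(f"13 tables, search depth {depth:2d}: {plans:,} partial plans")
```

For 13 tables, a depth of 3 examines 6,009 partial plans in this model, whereas a depth equal to the number of tables examines more than six billion.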

8.8.5.2. Controlling Switchable Optimizations

The optimizer_switch system variable enables control over optimizer behavior. Its value is a set of flags, each of which has a value of on or off to indicate whether the corresponding optimizer behavior is enabled or disabled. This variable has global and session values and can be changed at runtime. The global default can be set at server startup.

To see the current set of optimizer flags, select the variable value:

mysql> SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,
                    index_merge_sort_union=on,
                    index_merge_intersection=on,
                    engine_condition_pushdown=on

To change the value of optimizer_switch, assign a value consisting of a comma-separated list of one or more commands:

SET [GLOBAL|SESSION] optimizer_switch='command[,command]...';

Each command value should have one of the forms shown in the following table.

Command Syntax	Meaning
default	Reset every optimization to its default value
opt_name=default	Set the named optimization to its default value
opt_name=off	Disable the named optimization
opt_name=on	Enable the named optimization

The order of the commands in the value does not matter, although the default command is executed first if present. Setting an opt_name flag to default sets it to whichever of on or off is its default value. Specifying any given opt_name more than once in the value is not permitted and causes an error. Any errors in the value cause the assignment to fail with an error, leaving the value of optimizer_switch unchanged.
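The assignment rules can be summarized in a small Python model: default is applied first, opt_name=default restores that flag's own default, a repeated flag name is an error, any error leaves the value unchanged, and flags that are not mentioned keep their current values. This is a hypothetical sketch, not MySQL's implementation; the function names are illustrative, and the flag names and all-on defaults are taken from the sample output in this section (actual defaults depend on the server version).

```python
# Hypothetical model of the documented optimizer_switch assignment rules.
DEFAULTS = {
    "index_merge": "on",
    "index_merge_union": "on",
    "index_merge_sort_union": "on",
    "index_merge_intersection": "on",
    "engine_condition_pushdown": "on",
}


def parse_switch(value):
    # "a=on,b=off" -> {"a": "on", "b": "off"}, preserving order.
    return dict(item.strip().split("=", 1) for item in value.split(","))


def format_switch(flags):
    return ",".join(f"{name}={setting}" for name, setting in flags.items())


def assign_optimizer_switch(current, value):
    commands = [command.strip() for command in value.split(",")]
    # Build the result in a new dict; if an error is raised, the caller's
    # current value is never replaced.
    flags = parse_switch(current)
    # The bare "default" command runs first, wherever it appears in the list.
    if "default" in commands:
        flags = dict(DEFAULTS)
    seen = set()
    for command in commands:
        if command == "default":
            continue
        name, sep, setting = command.partition("=")
        if not sep or name not in DEFAULTS or setting not in ("on", "off", "default"):
            raise ValueError(f"invalid command: {command!r}")
        if name in seen:
            raise ValueError(f"flag given more than once: {name}")
        seen.add(name)
        # opt_name=default restores that flag's own default value.
        flags[name] = DEFAULTS[name] if setting == "default" else setting
    # Flags not mentioned keep their current settings (unless "default" reset them).
    return format_switch(flags)


current = format_switch(DEFAULTS)
print(assign_optimizer_switch(current, "index_merge_union=off,index_merge_sort_union=off"))
```

With all flags initially on, the call turns off only the Union and Sort-Union flags and leaves the other three unchanged.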

The following table lists the permissible opt_name flag names, grouped by optimization strategy.

Optimization	Flag Name	Meaning
Engine Condition Pushdown	engine_condition_pushdown	Controls engine condition pushdown
Index Merge	index_merge	Controls all Index Merge optimizations
	index_merge_intersection	Controls the Index Merge Intersection Access optimization
	index_merge_sort_union	Controls the Index Merge Sort-Union Access optimization
	index_merge_union	Controls the Index Merge Union Access optimization

The flag for engine condition pushdown was added in MySQL 5.5.3.

For more information about individual optimization strategies, see the manual sections on Engine Condition Pushdown Optimization and Index Merge Optimization.

When you assign a value to optimizer_switch, flags that are not mentioned keep their current values. This makes it possible to enable or disable specific optimizer behaviors in a single statement without affecting other behaviors. The statement does not depend on what other optimizer flags exist and what their values are. Suppose that all Index Merge optimizations are enabled:

mysql> SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,
                    index_merge_sort_union=on,
                    index_merge_intersection=on,
                    engine_condition_pushdown=on

If the server is using the Index Merge Union or Index Merge Sort-Union access methods for certain queries and you want to check whether the optimizer will perform better without them, set the variable value like this:

mysql> SET optimizer_switch='index_merge_union=off,index_merge_sort_union=off';

mysql> SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=off,
                    index_merge_sort_union=off,
                    index_merge_intersection=on,
                    engine_condition_pushdown=on
