Cari di MySQL 
    MySQL Manual
Daftar Isi
(Sebelumnya) 8.9. Buffering and Caching9. Language Structure (Berikutnya)

8.12. Measuring Performance (Benchmarking)

To measure performance, consider the following factors:

  • Whether you are measuring the speed of a single operation on a quiet system, or how a set of operations (a "workload") works over a period of time. With simple tests, you usually test how changing one aspect (a configuration setting, the set of indexes on a table, the SQL clauses in a query) affects performance. Benchmarks are typically long-running and elaborate performance tests, where the results could dictate high-level choices such as hardware and storage configuration, or how soon to upgrade to a new MySQL version.

  • For benchmarking, sometimes you must simulate a heavy database workload to get an accurate picture.

  • Performance can vary depending on so many different factors that a difference of a few percentage points might not be a decisive victory. The results might shift the opposite way when you test in a different environment.

  • Certain MySQL features help or do not help performance depending on the workload. For completeness, always test performance with those features turned on and turned off. The two most important features to try with each workload are the MySQL query cache, and the adaptive hash index for InnoDB tables.

This section progresses from simple and direct measurement techniques that a single developer can do, to more complicated ones that require additional expertise to perform and interpret the results.

8.12.1. Measuring the Speed of Expressions and Functions

To measure the speed of a specific MySQL expression or function, invoke the BENCHMARK() function using the mysql client program. Its syntax is BENCHMARK(loop_count,expression). The return value is always zero, but mysql prints a line displaying approximately how long the statement took to execute. For example:

mysql> SELECT BENCHMARK(1000000,1+1);+------------------------+| BENCHMARK(1000000,1+1) |+------------------------+|  0 |+------------------------+1 row in set (0.32 sec)

This result was obtained on a Pentium II 400MHz system. It shows that MySQL can execute 1,000,000 simple addition expressions in 0.32 seconds on that system.

The built-in MySQL functions are typically highly optimized, but there may be some exceptions. BENCHMARK() is an excellent tool for finding out if some function is a problem for your queries.

8.12.2. The MySQL Benchmark Suite

This benchmark suite is meant to tell any user what operations a given SQL implementation performs well or poorly. You can get a good idea for how the benchmarks work by looking at the code and results in the sql-bench directory in any MySQL source distribution.

Note that this benchmark is single-threaded, so it measures the minimum time for the operations performed. We plan to add multi-threaded tests to the benchmark suite in the future.

To use the benchmark suite, the following requirements must be satisfied:

After you obtain a MySQL source distribution, you can find the benchmark suite located in its sql-bench directory. To run the benchmark tests, build MySQL, and then change location into the sql-bench directory and execute the run-all-tests script:

shell> cd sql-benchshell> perl run-all-tests --server=server_name

server_name should be the name of one of the supported servers. To get a list of all options and supported servers, invoke this command:

shell> perl run-all-tests --help

The crash-me script also is located in the sql-bench directory. crash-me tries to determine what features a database system supports and what its capabilities and limitations are by actually running queries. For example, it determines:

  • What data types are supported.

  • How many indexes are supported.

  • What functions are supported.

  • How big a query can be.

  • How big a VARCHAR column can be.

For more information about benchmark results, visit http://www.mysql.com/why-mysql/benchmarks/.

8.12.3. Using Your Own Benchmarks

Benchmark your application and database to find out where the bottlenecks are. After fixing one bottleneck (or by replacing it with a "dummy" module), you can proceed to identify the next bottleneck. Even if the overall performance for your application currently is acceptable, you should at least make a plan for each bottleneck and decide how to solve it if someday you really need the extra performance.

For examples of portable benchmark programs, look at those in the MySQL benchmark suite. See Section 8.12.2, "The MySQL Benchmark Suite". You can take any program from this suite and modify it for your own needs. By doing this, you can try different solutions to your problem and test which really is fastest for you.

Another free benchmark suite is the Open Source Database Benchmark, available at http://osdb.sourceforge.net/.

It is very common for a problem to occur only when the system is very heavily loaded, even if other aspects of the system were tested extensively. These performance problems are typically due to issues of basic database design (for example, table scans are not good under high load) or problems with the operating system or libraries. These problems are much easier to fix if isolated during pre-production testing.

To avoid problems like this, benchmark your whole application under the worst possible load:

These programs or packages can bring a system to its knees, so be sure to use them only on your development systems.

8.12.4. Measuring Performance with performance_schema

You can query the tables in the performance_schema database to see real-time information about the performance characteristics of your server and the applications it is running. See Chapter 21, MySQL Performance Schema for details.

8.12.5. Examining Thread Information

As you monitor the performance of your MySQL server, examine the process list, which is the set of threads currently executing within the server. Process list information is available from these sources:

You can always view information about your own threads. To view information about threads being executed for other accounts, you must have the PROCESS privilege.

Each process list entry contains several pieces of information:

  • Id is the connection identifier for the client associated with the thread.

  • User and Host indicate the account associated with the thread.

  • db is the default database for the thread, or NULL if none is selected.

  • Command and State indicate what the thread is doing.

    Most states correspond to very quick operations. If a thread stays in a given state for many seconds, there might be a problem that needs to be investigated.

  • Time indicates how long the thread has been in its current state. The thread's notion of the current time may be altered in some cases: The thread can change the time with SET TIMESTAMP = value. For a thread running on a slave that is processing events from the master, the thread time is set to the time found in the events and thus reflects current time on the master and not the slave.

  • Info contains the text of the statement being executed by the thread, or NULL if it is not executing one. By default, this value contains only the first 100 characters of the statement. To see the complete statements, use SHOW FULL PROCESSLIST.

The following sections list the possible Command values, and State values grouped by category. The meaning for some of these values is self-evident. For others, additional description is provided.

8.12.5.1. Thread Command Values

A thread can have any of the following Command values:

  • Binlog Dump

    This is a thread on a master server for sending binary log contents to a slave server.

  • Change user

    The thread is executing a change-user operation.

  • Close stmt

    The thread is closing a prepared statement.

  • Connect

    A replication slave is connected to its master.

  • Connect Out

    A replication slave is connecting to its master.

  • Create DB

    The thread is executing a create-database operation.

  • Daemon

    This thread is internal to the server, not a thread that services a client connection.

  • Debug

    The thread is generating debugging information.

  • Delayed insert

    The thread is a delayed-insert handler.

  • Drop DB

    The thread is executing a drop-database operation.

  • Error

  • Execute

    The thread is executing a prepared statement.

  • Fetch

    The thread is fetching the results from executing a prepared statement.

  • Field List

    The thread is retrieving information for table columns.

  • Init DB

    The thread is selecting a default database.

  • Kill

    The thread is killing another thread.

  • Long Data

    The thread is retrieving long data in the result of executing a prepared statement.

  • Ping

    The thread is handling a server-ping request.

  • Prepare

    The thread is preparing a prepared statement.

  • Processlist

    The thread is producing information about server threads.

  • Query

    The thread is executing a statement.

  • Quit

    The thread is terminating.

  • Refresh

    The thread is flushing table, logs, or caches, or resetting status variable or replication server information.

  • Register Slave

    The thread is registering a slave server.

  • Reset stmt

    The thread is resetting a prepared statement.

  • Set option

    The thread is setting or resetting a client statement-execution option.

  • Shutdown

    The thread is shutting down the server.

  • Sleep

    The thread is waiting for the client to send a new statement to it.

  • Statistics

    The thread is producing server-status information.

  • Table Dump

    The thread is sending table contents to a slave server.

  • Time

    Unused.

8.12.5.2. General Thread States

The following list describes thread State values that are associated with general query processing and not more specialized activities such as replication. Many of these are useful only for finding bugs in the server.

  • After create

    This occurs when the thread creates a table (including internal temporary tables), at the end of the function that creates the table. This state is used even if the table could not be created due to some error.

  • Analyzing

    The thread is calculating a MyISAM table key distributions (for example, for ANALYZE TABLE).

  • checking permissions

    The thread is checking whether the server has the required privileges to execute the statement.

  • Checking table

    The thread is performing a table check operation.

  • cleaning up

    The thread has processed one command and is preparing to free memory and reset certain state variables.

  • closing tables

    The thread is flushing the changed table data to disk and closing the used tables. This should be a fast operation. If not, verify that you do not have a full disk and that the disk is not in very heavy use.

  • converting HEAP to MyISAM

    The thread is converting an internal temporary table from a MEMORY table to an on-disk MyISAM table.

  • copy to tmp table

    The thread is processing an ALTER TABLE statement. This state occurs after the table with the new structure has been created but before rows are copied into it.

  • Copying to group table

    If a statement has different ORDER BY and GROUP BY criteria, the rows are sorted by group and copied to a temporary table.

  • Copying to tmp table

    The server is copying to a temporary table in memory.

  • Copying to tmp table on disk

    The server is copying to a temporary table on disk. The temporary result set has become too large (see Section 8.4.3.3, "How MySQL Uses Internal Temporary Tables"). Consequently, the thread is changing the temporary table from in-memory to disk-based format to save memory.

  • Creating index

    The thread is processing ALTER TABLE ... ENABLE KEYS for a MyISAM table.

  • Creating sort index

    The thread is processing a SELECT that is resolved using an internal temporary table.

  • creating table

    The thread is creating a table. This includes creation of temporary tables.

  • Creating tmp table

    The thread is creating a temporary table in memory or on disk. If the table is created in memory but later is converted to an on-disk table, the state during that operation will be Copying to tmp table on disk.

  • deleting from main table

    The server is executing the first part of a multiple-table delete. It is deleting only from the first table, and saving columns and offsets to be used for deleting from the other (reference) tables.

  • deleting from reference tables

    The server is executing the second part of a multiple-table delete and deleting the matched rows from the other tables.

  • discard_or_import_tablespace

    The thread is processing an ALTER TABLE ... DISCARD TABLESPACE or ALTER TABLE ... IMPORT TABLESPACE statement.

  • end

    This occurs at the end but before the cleanup of ALTER TABLE, CREATE VIEW, DELETE, INSERT, SELECT, or UPDATE statements.

  • executing

    The thread has begun executing a statement.

  • Execution of init_command

    The thread is executing statements in the value of the init_command system variable.

  • freeing items

    The thread has executed a command. Some freeing of items done during this state involves the query cache. This state is usually followed by cleaning up.

  • Flushing tables

    The thread is executing FLUSH TABLES and is waiting for all threads to close their tables.

  • FULLTEXT initialization

    The server is preparing to perform a natural-language full-text search.

  • init

    This occurs before the initialization of ALTER TABLE, DELETE, INSERT, SELECT, or UPDATE statements. Actions taken by the server in this state include flushing the binary log, the InnoDB log, and some query cache cleanup operations.

    For the end state, the following operations could be happening:

    • Removing query cache entries after data in a table is changed

    • Writing an event to the binary log

    • Freeing memory buffers, including for blobs

  • Killed

    Someone has sent a KILL statement to the thread and it should abort next time it checks the kill flag. The flag is checked in each major loop in MySQL, but in some cases it might still take a short time for the thread to die. If the thread is locked by some other thread, the kill takes effect as soon as the other thread releases its lock.

  • Locked

    The query is locked by another query.

    As of MySQL 5.5.3, this state was removed because it was equivalent to the Table lock state and no longer appears in SHOW PROCESSLIST output.

  • logging slow query

    The thread is writing a statement to the slow-query log.

  • NULL

    This state is used for the SHOW PROCESSLIST state.

  • login

    The initial state for a connection thread until the client has been authenticated successfully.

  • manage keys

    The server is enabling or disabling a table index.

  • Opening tables, Opening table

    The thread is trying to open a table. This is should be very fast procedure, unless something prevents opening. For example, an ALTER TABLE or a LOCK TABLE statement can prevent opening a table until the statement is finished. It is also worth checking that your table_open_cache value is large enough.

  • optimizing

    The server is performing initial optimizations for a query.

  • preparing

    This state occurs during query optimization.

  • Purging old relay logs

    The thread is removing unneeded relay log files.

  • query end

    This state occurs after processing a query but before the freeing items state.

  • Reading from net

    The server is reading a packet from the network.

  • Removing duplicates

    The query was using SELECT DISTINCT in such a way that MySQL could not optimize away the distinct operation at an early stage. Because of this, MySQL requires an extra stage to remove all duplicated rows before sending the result to the client.

  • removing tmp table

    The thread is removing an internal temporary table after processing a SELECT statement. This state is not used if no temporary table was created.

  • rename

    The thread is renaming a table.

  • rename result table

    The thread is processing an ALTER TABLE statement, has created the new table, and is renaming it to replace the original table.

  • Reopen tables

    The thread got a lock for the table, but noticed after getting the lock that the underlying table structure changed. It has freed the lock, closed the table, and is trying to reopen it.

  • Repair by sorting

    The repair code is using a sort to create indexes.

  • Repair done

    The thread has completed a multi-threaded repair for a MyISAM table.

  • Repair with keycache

    The repair code is using creating keys one by one through the key cache. This is much slower than Repair by sorting.

  • Rolling back

    The thread is rolling back a transaction.

  • Saving state

    For MyISAM table operations such as repair or analysis, the thread is saving the new table state to the .MYI file header. State includes information such as number of rows, the AUTO_INCREMENT counter, and key distributions.

  • Searching rows for update

    The thread is doing a first phase to find all matching rows before updating them. This has to be done if the UPDATE is changing the index that is used to find the involved rows.

  • Sending data

    The thread is reading and processing rows for a SELECT statement, and sending data to the client. Because operations occurring during this this state tend to perform large amounts of disk access (reads), it is often the longest-running state over the lifetime of a given query.

  • setup

    The thread is beginning an ALTER TABLE operation.

  • Sorting for group

    The thread is doing a sort to satisfy a GROUP BY.

  • Sorting for order

    The thread is doing a sort to satisfy a ORDER BY.

  • Sorting index

    The thread is sorting index pages for more efficient access during a MyISAM table optimization operation.

  • Sorting result

    For a SELECT statement, this is similar to Creating sort index, but for nontemporary tables.

  • statistics

    The server is calculating statistics to develop a query execution plan. If a thread is in this state for a long time, the server is probably disk-bound performing other work.

  • System lock

    The thread is going to request or is waiting for an internal or external system lock for the table. If this state is being caused by requests for external locks and you are not using multiple mysqld servers that are accessing the same MyISAM tables, you can disable external system locks with the --skip-external-locking option. However, external locking is disabled by default, so it is likely that this option will have no effect. For SHOW PROFILE, this state means the thread is requesting the lock (not waiting for it).

  • Table lock

    The next thread state after System lock. The thread has acquired an external lock and is going to request an internal table lock.

    This state was replaced in MySQL 5.5.6 with Waiting for table level lock.

  • Updating

    The thread is searching for rows to update and is updating them.

  • updating main table

    The server is executing the first part of a multiple-table update. It is updating only the first table, and saving columns and offsets to be used for updating the other (reference) tables.

  • updating reference tables

    The server is executing the second part of a multiple-table update and updating the matched rows from the other tables.

  • User lock

    The thread is going to request or is waiting for an advisory lock requested with a GET_LOCK() call. For SHOW PROFILE, this state means the thread is requesting the lock (not waiting for it).

  • User sleep

    The thread has invoked a SLEEP() call.

  • Waiting for all running commits to finish

    A statement that causes an explicit or implicit commit is waiting for release of a read lock. This state was removed in MySQL 5.5.8; Waiting for commit lock is used instead.

  • Waiting for commit lock

    A statement that causes an explicit or implicit commit is waiting for release of a read lock or FLUSH TABLES WITH READ LOCK) is waiting for a commit lock. This state was added in MySQL 5.5.8.

  • Waiting for global read lock

    FLUSH TABLES WITH READ LOCK) is waiting for a global read lock.

  • Waiting for release of readlock

    The thread is waiting for a global read lock obtained by another thread (with FLUSH TABLES WITH READ LOCK) to be released. This state was removed in MySQL 5.5.8; Waiting for global read lock or Waiting for commit lock are used instead.

  • Waiting for tables, Waiting for table, Waiting for table flush

    The thread got a notification that the underlying structure for a table has changed and it needs to reopen the table to get the new structure. However, to reopen the table, it must wait until all other threads have closed the table in question.

    This notification takes place if another thread has used FLUSH TABLES or one of the following statements on the table in question: FLUSH TABLES tbl_name, ALTER TABLE, RENAME TABLE, REPAIR TABLE, ANALYZE TABLE, or OPTIMIZE TABLE.

    In MySQL 5.5.6, Waiting for table was replaced with Waiting for table flush.

  • Waiting for lock_type lock

    The server is waiting to acquire a lock, where lock_type indicates the type of lock:

    • Waiting for event metadata lock (added in MySQL 5.5.8)

    • Waiting for global metadata lock (replaced by Waiting for global read lock in MySQL 5.5.8)

    • Waiting for global read lock (added in MySQL 5.5.8)

    • Waiting for schema metadata lock

    • Waiting for stored function metadata lock

    • Waiting for stored procedure metadata lock

    • Waiting for table level lock

    • Waiting for table metadata lock

    • Waiting for trigger metadata lock (added in MySQL 5.5.8)

  • Waiting on cond

    A generic state in which the thread is waiting for a condition to become true. No specific state information is available.

  • Waiting to get readlock

    The thread has issued a FLUSH TABLES WITH READ LOCK statement to obtain a global read lock and is waiting to obtain the lock. This state was removed in MySQL 5.5.8; Waiting for global read lock is used instead.

  • Writing to net

    The server is writing a packet to the network.

8.12.5.3. Delayed-Insert Thread States

These thread states are associated with processing for DELAYED inserts (see Section 13.2.5.2, "INSERT DELAYED Syntax"). Some states are associated with connection threads that process INSERT DELAYED statements from clients. Other states are associated with delayed-insert handler threads that insert the rows. There is a delayed-insert handler thread for each table for which INSERT DELAYED statements are issued.

States associated with a connection thread that processes an INSERT DELAYED statement from the client:

  • allocating local table

    The thread is preparing to feed rows to the delayed-insert handler thread.

  • Creating delayed handler

    The thread is creating a handler for DELAYED inserts.

  • got handler lock

    This occurs before the allocating local table state and after the waiting for handler lock state, when the connection thread gets access to the delayed-insert handler thread.

  • got old table

    This occurs after the waiting for handler open state. The delayed-insert handler thread has signaled that it has ended its initialization phase, which includes opening the table for delayed inserts.

  • storing row into queue

    The thread is adding a new row to the list of rows that the delayed-insert handler thread must insert.

  • update

    The thread is getting ready to start updating the table.

  • waiting for delay_list

    This occurs during the initialization phase when the thread is trying to find the delayed-insert handler thread for the table, and before attempting to gain access to the list of delayed-insert threads.

  • waiting for handler insert

    An INSERT DELAYED handler has processed all pending inserts and is waiting for new ones.

  • waiting for handler lock

    This occurs before the allocating local table state when the connection thread waits for access to the delayed-insert handler thread.

  • waiting for handler open

    This occurs after the Creating delayed handler state and before the got old table state. The delayed-insert handler thread has just been started, and the connection thread is waiting for it to initialize.

States associated with a delayed-insert handler thread that inserts the rows:

  • insert

    The state that occurs just before inserting rows into the table.

  • reschedule

    After inserting a number of rows, the delayed-insert thread sleeps to let other threads do work.

  • upgrading lock

    A delayed-insert handler is trying to get a lock for the table to insert rows.

  • Waiting for INSERT

    A delayed-insert handler is waiting for a connection thread to add rows to the queue (see storing row into queue).

8.12.5.4. Query Cache Thread States

These thread states are associated with the query cache (see Section 8.9.3, "The MySQL Query Cache").

  • checking privileges on cached query

    The server is checking whether the user has privileges to access a cached query result.

  • checking query cache for query

    The server is checking whether the current query is present in the query cache.

  • invalidating query cache entries

    Query cache entries are being marked invalid because the underlying tables have changed.

  • sending cached result to client

    The server is taking the result of a query from the query cache and sending it to the client.

  • storing result in query cache

    The server is storing the result of a query in the query cache.

  • Waiting for query cache lock

    This state occurs while a session is waiting to take the query cache lock. This can happen for any statement that needs to perform some query cache operation, such as an INSERT or DELETE that invalidates the query cache, a SELECT that looks for a cached entry, RESET QUERY CACHE, and so forth.

8.12.5.5. Replication Master Thread States

The following list shows the most common states you may see in the State column for the master's Binlog Dump thread. If you see no Binlog Dump threads on a master server, this means that replication is not running-that is, that no slaves are currently connected.

  • Sending binlog event to slave

    Binary logs consist of events, where an event is usually an update plus some other information. The thread has read an event from the binary log and is now sending it to the slave.

  • Finished reading one binlog; switching to next binlog

    The thread has finished reading a binary log file and is opening the next one to send to the slave.

  • Master has sent all binlog to slave; waiting for binlog to be updated

    The thread has read all outstanding updates from the binary logs and sent them to the slave. The thread is now idle, waiting for new events to appear in the binary log resulting from new updates occurring on the master.

  • Waiting to finalize termination

    A very brief state that occurs as the thread is stopping.

8.12.5.6. Replication Slave I/O Thread States

The following list shows the most common states you see in the State column for a slave server I/O thread. This state also appears in the Slave_IO_State column displayed by SHOW SLAVE STATUS, so you can get a good view of what is happening by using that statement.

  • Waiting for master update

    The initial state before Connecting to master.

  • Connecting to master

    The thread is attempting to connect to the master.

  • Checking master version

    A state that occurs very briefly, after the connection to the master is established.

  • Registering slave on master

    A state that occurs very briefly after the connection to the master is established.

  • Requesting binlog dump

    A state that occurs very briefly, after the connection to the master is established. The thread sends to the master a request for the contents of its binary logs, starting from the requested binary log file name and position.

  • Waiting to reconnect after a failed binlog dump request

    If the binary log dump request failed (due to disconnection), the thread goes into this state while it sleeps, then tries to reconnect periodically. The interval between retries can be specified using the CHANGE MASTER TO statement.

  • Reconnecting after a failed binlog dump request

    The thread is trying to reconnect to the master.

  • Waiting for master to send event

    The thread has connected to the master and is waiting for binary log events to arrive. This can last for a long time if the master is idle. If the wait lasts for slave_net_timeout seconds, a timeout occurs. At that point, the thread considers the connection to be broken and makes an attempt to reconnect.

  • Queueing master event to the relay log

    The thread has read an event and is copying it to the relay log so that the SQL thread can process it.

  • Waiting to reconnect after a failed master event read

    An error occurred while reading (due to disconnection). The thread is sleeping for the number of seconds set by the CHANGE MASTER TO statement (default 60) before attempting to reconnect.

  • Reconnecting after a failed master event read

    The thread is trying to reconnect to the master. When connection is established again, the state becomes Waiting for master to send event.

  • Waiting for the slave SQL thread to free enough relay log space

    You are using a nonzero relay_log_space_limit value, and the relay logs have grown large enough that their combined size exceeds this value. The I/O thread is waiting until the SQL thread frees enough space by processing relay log contents so that it can delete some relay log files.

  • Waiting for slave mutex on exit

    A state that occurs briefly as the thread is stopping.

8.12.5.7. Replication Slave SQL Thread States

The following list shows the most common states you may see in the State column for a slave server SQL thread:

  • Waiting for the next event in relay log

    The initial state before Reading event from the relay log.

  • Reading event from the relay log

    The thread has read an event from the relay log so that the event can be processed.

  • Making temp file

    The thread is executing a LOAD DATA INFILE statement and is creating a temporary file containing the data from which the slave will read rows.

  • Slave has read all relay log; waiting for the slave I/O thread to update it

    The thread has processed all events in the relay log files, and is now waiting for the I/O thread to write new events to the relay log.

  • Waiting for slave mutex on exit

    A very brief state that occurs as the thread is stopping.

The State column for the I/O thread may also show the text of a statement. This indicates that the thread has read an event from the relay log, extracted the statement from it, and is executing it.

8.12.5.8. Replication Slave Connection Thread States

These thread states occur on a replication slave but are associated with connection threads, not with the I/O or SQL threads.

  • Changing master

    The thread is processing a CHANGE MASTER TO statement.

  • Killing slave

    The thread is processing a STOP SLAVE statement.

  • Opening master dump table

    This state occurs after Creating table from master dump.

  • Reading master dump table data

    This state occurs after Opening master dump table.

  • Rebuilding the index on master dump table

    This state occurs after Reading master dump table data.

8.12.5.9. MySQL Cluster Thread States

  • Committing events to binlog

  • Opening mysql.ndb_apply_status

  • Processing events

    The thread is processing events for binary logging.

  • Processing events from schema table

    The thread is doing the work of schema replication.

  • Shutting down

  • Syncing ndb table schema operation and binlog

    This is used to have a correct binary log of schema operations for NDB.

  • Waiting for event from ndbcluster

    The server is acting as an SQL node in a MySQL Cluster, and is connected to a cluster management node.

  • Waiting for first event from ndbcluster

  • Waiting for ndbcluster binlog update to reach current position

  • Waiting for ndbcluster to start

  • Waiting for schema epoch

    The thread is waiting for a schema epoch (that is, a global checkpoint).

8.12.5.10. Event Scheduler Thread States

These states occur for the Event Scheduler thread, threads that are created to execute scheduled events, or threads that terminate the scheduler.

  • Clearing

    The scheduler thread or a thread that was executing an event is terminating and is about to end.

  • Initialized

    The scheduler thread or a thread that will execute an event has been initialized.

  • Waiting for next activation

    The scheduler has a nonempty event queue but the next activation is in the future.

  • Waiting for scheduler to stop

    The thread issued SET GLOBAL event_scheduler=OFF and is waiting for the scheduler to stop.

  • Waiting on empty queue

    The scheduler's event queue is empty and it is sleeping.

8.13. Internal Details of MySQL Optimizations

This background information helps you to understand some of the terms you see in the EXPLAIN plan output. If you are rewriting queries to make them more efficient, or writing your own application logic for lookups, joining, or sorting, use this information to determine which optimizations you write yourself and which you can rely on MySQL to perform.

8.13.1. Range Optimization

The range access method uses a single index to retrieve a subset of table rows that are contained within one or several index value intervals. It can be used for a single-part or multiple-part index. The following sections give a detailed description of how intervals are extracted from the WHERE clause.

8.13.1.1. The Range Access Method for Single-Part Indexes

For a single-part index, index value intervals can be conveniently represented by corresponding conditions in the WHERE clause, so we speak of range conditions rather than "intervals."

The definition of a range condition for a single-part index is as follows:

  • For both BTREE and HASH indexes, comparison of a key part with a constant value is a range condition when using the =, <=>, IN(), IS NULL, or IS NOT NULL operators.

  • Additionally, for BTREE indexes, comparison of a key part with a constant value is a range condition when using the >, <, >=, <=, BETWEEN, !=, or <> operators, or LIKE comparisons if the argument to LIKE is a constant string that does not start with a wildcard character.

  • For all types of indexes, multiple range conditions combined with OR or AND form a range condition.

"Constant value" in the preceding descriptions means one of the following:

  • A constant from the query string

  • A column of a const or system table from the same join

  • The result of an uncorrelated subquery

  • Any expression composed entirely from subexpressions of the preceding types

Here are some examples of queries with range conditions in the WHERE clause:

SELECT * FROM t1  WHERE key_col > 1  AND key_col < 10;SELECT * FROM t1  WHERE key_col = 1  OR key_col IN (15,18,20);SELECT * FROM t1  WHERE key_col LIKE 'ab%'  OR key_col BETWEEN 'bar' AND 'foo';

Note that some nonconstant values may be converted to constants during the constant propagation phase.

MySQL tries to extract range conditions from the WHERE clause for each of the possible indexes. During the extraction process, conditions that cannot be used for constructing the range condition are dropped, conditions that produce overlapping ranges are combined, and conditions that produce empty ranges are removed.

Consider the following statement, where key1 is an indexed column and nonkey is not indexed:

SELECT * FROM t1 WHERE  (key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR  (key1 < 'bar' AND nonkey = 4) OR  (key1 < 'uux' AND key1 > 'z');

The extraction process for key key1 is as follows:

  1. Start with original WHERE clause:

    (key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR(key1 < 'bar' AND nonkey = 4) OR(key1 < 'uux' AND key1 > 'z')
  2. Remove nonkey = 4 and key1 LIKE '%b' because they cannot be used for a range scan. The correct way to remove them is to replace them with TRUE, so that we do not miss any matching rows when doing the range scan. Having replaced them with TRUE, we get:

    (key1 < 'abc' AND (key1 LIKE 'abcde%' OR TRUE)) OR(key1 < 'bar' AND TRUE) OR(key1 < 'uux' AND key1 > 'z')
  3. Collapse conditions that are always true or false:

    • (key1 LIKE 'abcde%' OR TRUE) is always true

    • (key1 < 'uux' AND key1 > 'z') is always false

    Replacing these conditions with constants, we get:

    (key1 < 'abc' AND TRUE) OR (key1 < 'bar' AND TRUE) OR (FALSE)

    Removing unnecessary TRUE and FALSE constants, we obtain:

    (key1 < 'abc') OR (key1 < 'bar')
  4. Combining overlapping intervals into one yields the final condition to be used for the range scan:

    (key1 < 'bar')

In general (and as demonstrated by the preceding example), the condition used for a range scan is less restrictive than the WHERE clause. MySQL performs an additional check to filter out rows that satisfy the range condition but not the full WHERE clause.

The range condition extraction algorithm can handle nested AND/OR constructs of arbitrary depth, and its output does not depend on the order in which conditions appear in WHERE clause.

Currently, MySQL does not support merging multiple ranges for the range access method for spatial indexes. To work around this limitation, you can use a UNION with identical SELECT statements, except that you put each spatial predicate in a different SELECT.

8.13.1.2. The Range Access Method for Multiple-Part Indexes

Range conditions on a multiple-part index are an extension of range conditions for a single-part index. A range condition on a multiple-part index restricts index rows to lie within one or several key tuple intervals. Key tuple intervals are defined over a set of key tuples, using ordering from the index.

For example, consider a multiple-part index defined as key1(key_part1, key_part2, key_part3), and the following set of key tuples listed in key order:

key_part1  key_part2  key_part3  NULL   1  'abc'  NULL   1  'xyz'  NULL   2  'foo'   1 1  'abc'   1 1  'xyz'   1 2  'abc'   2 1  'aaa'

The condition key_part1 = 1 defines this interval:

(1,-inf,-inf) <= (key_part1,key_part2,key_part3) < (1,+inf,+inf)

The interval covers the 4th, 5th, and 6th tuples in the preceding data set and can be used by the range access method.

By contrast, the condition key_part3 = 'abc' does not define a single interval and cannot be used by the range access method.

The following descriptions indicate how range conditions work for multiple-part indexes in greater detail.

  • For HASH indexes, each interval containing identical values can be used. This means that the interval can be produced only for conditions in the following form:

     key_part1 cmp const1AND key_part2 cmp const2AND ...AND key_partN cmp constN;

    Here, const1, const2, � are constants, cmp is one of the =, <=>, or IS NULL comparison operators, and the conditions cover all index parts. (That is, there are N conditions, one for each part of an N-part index.) For example, the following is a range condition for a three-part HASH index:

    key_part1 = 1 AND key_part2 IS NULL AND key_part3 = 'foo'

    For the definition of what is considered to be a constant, see Section 8.13.1.1, "The Range Access Method for Single-Part Indexes".

  • For a BTREE index, an interval might be usable for conditions combined with AND, where each condition compares a key part with a constant value using =, <=>, IS NULL, >, <, >=, <=, !=, <>, BETWEEN, or LIKE 'pattern' (where 'pattern' does not start with a wildcard). An interval can be used as long as it is possible to determine a single key tuple containing all rows that match the condition (or two intervals if <> or != is used).

    The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is =, <=>, or IS NULL. If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses = from the first comparison. It also uses >= from the second comparison but considers no further key parts and does not use the third comparison for interval construction:

    key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10

    The single interval is:

    ('foo',10,-inf) < (key_part1,key_part2,key_part3) < ('foo',+inf,+inf)

    It is possible that the created interval contains more rows than the initial condition. For example, the preceding interval includes the value ('foo', 11, 0), which does not satisfy the original condition.

  • If conditions that cover sets of rows contained within intervals are combined with OR, they form a condition that covers a set of rows contained within the union of their intervals. If the conditions are combined with AND, they form a condition that covers a set of rows contained within the intersection of their intervals. For example, for this condition on a two-part index:

    (key_part1 = 1 AND key_part2 < 2) OR (key_part1 > 5)

    The intervals are:

    (1,-inf) < (key_part1,key_part2) < (1,2)(5,-inf) < (key_part1,key_part2)

    In this example, the interval on the first line uses one key part for the left bound and two key parts for the right bound. The interval on the second line uses only one key part. The key_len column in the EXPLAIN output indicates the maximum length of the key prefix used.

    In some cases, key_len may indicate that a key part was used, but that might be not what you would expect. Suppose that key_part1 and key_part2 can be NULL. Then the key_len column displays two key part lengths for the following condition:

    key_part1 >= 1 AND key_part2 < 2

    But, in fact, the condition is converted to this:

    key_part1 >= 1 AND key_part2 IS NOT NULL

Section 8.13.1.1, "The Range Access Method for Single-Part Indexes", describes how optimizations are performed to combine or eliminate intervals for range conditions on a single-part index. Analogous steps are performed for range conditions on multiple-part indexes.

8.13.2. Index Merge Optimization

The Index Merge method is used to retrieve rows with several range scans and to merge their results into one. The merge can produce unions, intersections, or unions-of-intersections of its underlying scans. This access method merges index scans from a single table; it does not merge scans across multiple tables.

In EXPLAIN output, the Index Merge method appears as index_merge in the type column. In this case, the key column contains a list of indexes used, and key_len contains a list of the longest key parts for those indexes.

Examples:

SELECT * FROM tbl_name WHERE key1 = 10 OR key2 = 20;SELECT * FROM tbl_name  WHERE (key1 = 10 OR key2 = 20) AND non_key=30;SELECT * FROM t1, t2  WHERE (t1.key1 IN (1,2) OR t1.key2 LIKE 'value%')  AND t2.key1=t1.some_col;SELECT * FROM t1, t2  WHERE t1.key1=1  AND (t2.key1=t1.some_col OR t2.key2=t1.some_col2);

The Index Merge method has several access algorithms (seen in the Extra field of EXPLAIN output):

  • Using intersect(...)

  • Using union(...)

  • Using sort_union(...)

The following sections describe these methods in greater detail.

Note

The Index Merge optimization algorithm has the following known deficiencies:

  • If your query has a complex WHERE clause with deep AND/OR nesting and MySQL doesn't choose the optimal plan, try distributing terms using the following identity laws:

    (x AND y) OR z = (x OR z) AND (y OR z)(x OR y) AND z = (x AND z) OR (y AND z)
  • Index Merge is not applicable to full-text indexes. We plan to extend it to cover these in a future MySQL release.

  • If a range scan is possible on some key, the optimizer will not consider using Index Merge Union or Index Merge Sort-Union algorithms. For example, consider this query:

    SELECT * FROM t1 WHERE (goodkey1 < 10 OR goodkey2 < 20) AND badkey < 30;

    For this query, two plans are possible:

    • An Index Merge scan using the (goodkey1 < 10 OR goodkey2 < 20) condition.

    • A range scan using the badkey < 30 condition.

    However, the optimizer considers only the second plan.

The choice between different possible variants of the Index Merge access method and other access methods is based on cost estimates of various available options.

8.13.2.1. The Index Merge Intersection Access Algorithm

This access algorithm can be employed when a WHERE clause was converted to several range conditions on different keys combined with AND, and each condition is one of the following:

  • In this form, where the index has exactly N parts (that is, all index parts are covered):

    key_part1=const1 AND key_part2=const2 ... AND key_partN=constN
  • Any range condition over a primary key of an InnoDB table.

Examples:

SELECT * FROM innodb_table WHERE primary_key < 10 AND key_col1=20;SELECT * FROM tbl_name  WHERE (key1_part1=1 AND key1_part2=2) AND key2=2;

The Index Merge intersection algorithm performs simultaneous scans on all used indexes and produces the intersection of row sequences that it receives from the merged index scans.

If all columns used in the query are covered by the used indexes, full table rows are not retrieved (EXPLAIN output contains Using index in Extra field in this case). Here is an example of such a query:

SELECT COUNT(*) FROM t1 WHERE key1=1 AND key2=1;

If the used indexes don't cover all columns used in the query, full rows are retrieved only when the range conditions for all used keys are satisfied.

If one of the merged conditions is a condition over a primary key of an InnoDB table, it is not used for row retrieval, but is used to filter out rows retrieved using other conditions.

8.13.2.2. The Index Merge Union Access Algorithm

The applicability criteria for this algorithm are similar to those for the Index Merge method intersection algorithm. The algorithm can be employed when the table's WHERE clause was converted to several range conditions on different keys combined with OR, and each condition is one of the following:

  • In this form, where the index has exactly N parts (that is, all index parts are covered):

    key_part1=const1 AND key_part2=const2 ... AND key_partN=constN
  • Any range condition over a primary key of an InnoDB table.

  • A condition for which the Index Merge method intersection algorithm is applicable.

Examples:

SELECT * FROM t1 WHERE key1=1 OR key2=2 OR key3=3;SELECT * FROM innodb_table WHERE (key1=1 AND key2=2) OR  (key3='foo' AND key4='bar') AND key5=5;

8.13.2.3. The Index Merge Sort-Union Access Algorithm

This access algorithm is employed when the WHERE clause was converted to several range conditions combined by OR, but for which the Index Merge method union algorithm is not applicable.

Examples:

SELECT * FROM tbl_name WHERE key_col1 < 10 OR key_col2 < 20;SELECT * FROM tbl_name  WHERE (key_col1 > 10 OR key_col2 = 20) AND nonkey_col=30;

The difference between the sort-union algorithm and the union algorithm is that the sort-union algorithm must first fetch row IDs for all rows and sort them before returning any rows.

8.13.3. Engine Condition Pushdown Optimization

This optimization improves the efficiency of direct comparisons between a nonindexed column and a constant. In such cases, the condition is "pushed down" to the storage engine for evaluation. This optimization can be used only by the NDBCLUSTER storage engine.

For MySQL Cluster, this optimization can eliminate the need to send nonmatching rows over the network between the cluster's data nodes and the MySQL Server that issued the query, and can speed up queries where it is used by a factor of 5 to 10 times over cases where condition pushdown could be but is not used.

Suppose that a MySQL Cluster table is defined as follows:

CREATE TABLE t1 ( a INT, b INT, KEY(a)) ENGINE=NDBCLUSTER;

Condition pushdown can be used with queries such as the one shown here, which includes a comparison between a nonindexed column and a constant:

SELECT a, b FROM t1 WHERE b = 10;

The use of condition pushdown can be seen in the output of EXPLAIN:

mysql> EXPLAIN SELECT a,b FROM t1 WHERE b = 10\G*************************** 1. row ***************************   id: 1  select_type: SIMPLE table: t1 type: ALLpossible_keys: NULL  key: NULL  key_len: NULL  ref: NULL rows: 10 Extra: Using where with pushed condition

However, condition pushdown cannot be used with either of these two queries:

SELECT a,b FROM t1 WHERE a = 10;SELECT a,b FROM t1 WHERE b + 1 = 10;

Condition pushdown is not applicable to the first query because an index exists on column a. (An index access method would be more efficient and so would be chosen in preference to condition pushdown.) Condition pushdown cannot be employed for the second query because the comparison involving the nonindexed column b is indirect. (However, condition pushdown could be applied if you were to reduce b + 1 = 10 to b = 9 in the WHERE clause.)

Condition pushdown may also be employed when an indexed column is compared with a constant using a > or < operator:

mysql> EXPLAIN SELECT a, b FROM t1 WHERE a < 2\G*************************** 1. row ***************************   id: 1  select_type: SIMPLE table: t1 type: rangepossible_keys: a  key: a  key_len: 5  ref: NULL rows: 2 Extra: Using where with pushed condition

Other supported comparisons for condition pushdown include the following:

  • column [NOT] LIKE pattern

    pattern must be a string literal containing the pattern to be matched; for syntax, see Section 12.5.1, "String Comparison Functions".

  • column IS [NOT] NULL

  • column IN (value_list)

    Each item in the value_list must be a constant, literal value.

  • column BETWEEN constant1 AND constant2

    constant1 and constant2 must each be a constant, literal value.

In all of the cases in the preceding list, it is possible for the condition to be converted into the form of one or more direct comparisons between a column and a constant.

Engine condition pushdown is enabled by default. To disable it at server startup, set the optimizer_switch system variable. For example, in a my.cnf file, use these lines:

[mysqld]optimizer_switch=engine_condition_pushdown=off

At runtime, disable condition pushdown like this:

SET optimizer_switch='engine_condition_pushdown=off';

Before MySQL 5.5.3, disable condition pushdown using the engine_condition_pushdown system variable. At server startup:

[mysqld]engine_condition_pushdown=0

At runtime, use either of these statements:

SET engine_condition_pushdown=OFF;
SET engine_condition_pushdown=0;

Limitations. Condition pushdown is subject to the following limitations:

  • Condition pushdown is supported only by the NDBCLUSTER storage engine.

  • Columns may be compared with constants only; however, this includes expressions which evaluate to constant values.

  • Columns used in comparisons cannot be of any of the BLOB or TEXT types.

  • A string value to be compared with a column must use the same collation as the column.

  • Joins are not directly supported; conditions involving multiple tables are pushed separately where possible. Use EXPLAIN EXTENDED to determine which conditions are actually pushed down.

8.13.4. IS NULL Optimization

MySQL can perform the same optimization on col_name IS NULL that it can use for col_name = constant_value. For example, MySQL can use indexes and ranges to search for NULL with IS NULL.

Examples:

SELECT * FROM tbl_name WHERE key_col IS NULL;SELECT * FROM tbl_name WHERE key_col <=> NULL;SELECT * FROM tbl_name  WHERE key_col=const1 OR key_col=const2 OR key_col IS NULL;

If a WHERE clause includes a col_name IS NULL condition for a column that is declared as NOT NULL, that expression is optimized away. This optimization does not occur in cases when the column might produce NULL anyway; for example, if it comes from a table on the right side of a LEFT JOIN.

MySQL can also optimize the combination col_name = expr OR col_name IS NULL, a form that is common in resolved subqueries. EXPLAIN shows ref_or_null when this optimization is used.

This optimization can handle one IS NULL for any key part.

Some examples of queries that are optimized, assuming that there is an index on columns a and b of table t2:

SELECT * FROM t1 WHERE t1.a=expr OR t1.a IS NULL;SELECT * FROM t1, t2 WHERE t1.a=t2.a OR t2.a IS NULL;SELECT * FROM t1, t2  WHERE (t1.a=t2.a OR t2.a IS NULL) AND t2.b=t1.b;SELECT * FROM t1, t2  WHERE t1.a=t2.a AND (t2.b=t1.b OR t2.b IS NULL);SELECT * FROM t1, t2  WHERE (t1.a=t2.a AND t2.a IS NULL AND ...)  OR (t1.a=t2.a AND t2.a IS NULL AND ...);

ref_or_null works by first doing a read on the reference key, and then a separate search for rows with a NULL key value.

Note that the optimization can handle only one IS NULL level. In the following query, MySQL uses key lookups only on the expression (t1.a=t2.a AND t2.a IS NULL) and is not able to use the key part on b:

SELECT * FROM t1, t2  WHERE (t1.a=t2.a AND t2.a IS NULL)  OR (t1.b=t2.b AND t2.b IS NULL);

8.13.5. LEFT JOIN and RIGHT JOINOptimization

MySQL implements an A LEFT JOIN B join_condition as follows:

  • Table B is set to depend on table A and all tables on which A depends.

  • Table A is set to depend on all tables (except B) that are used in the LEFT JOIN condition.

  • The LEFT JOIN condition is used to decide how to retrieve rows from table B. (In other words, any condition in the WHERE clause is not used.)

  • All standard join optimizations are performed, with the exception that a table is always read after all tables on which it depends. If there is a circular dependence, MySQL issues an error.

  • All standard WHERE optimizations are performed.

  • If there is a row in A that matches the WHERE clause, but there is no row in B that matches the ON condition, an extra B row is generated with all columns set to NULL.

  • If you use LEFT JOIN to find rows that do not exist in some table and you have the following test: col_name IS NULL in the WHERE part, where col_name is a column that is declared as NOT NULL, MySQL stops searching for more rows (for a particular key combination) after it has found one row that matches the LEFT JOIN condition.

The implementation of RIGHT JOIN is analogous to that of LEFT JOIN with the roles of the tables reversed.

The join optimizer calculates the order in which tables should be joined. The table read order forced by LEFT JOIN or STRAIGHT_JOIN helps the join optimizer do its work much more quickly, because there are fewer table permutations to check. Note that this means that if you do a query of the following type, MySQL does a full scan on b because the LEFT JOIN forces it to be read before d:

SELECT *  FROM a JOIN b LEFT JOIN c ON (c.key=a.key)  LEFT JOIN d ON (d.key=a.key)  WHERE b.key=d.key;

The fix in this case is reverse the order in which a and b are listed in the FROM clause:

SELECT *  FROM b JOIN a LEFT JOIN c ON (c.key=a.key)  LEFT JOIN d ON (d.key=a.key)  WHERE b.key=d.key;

For a LEFT JOIN, if the WHERE condition is always false for the generated NULL row, the LEFT JOIN is changed to a normal join. For example, the WHERE clause would be false in the following query if t2.column1 were NULL:

SELECT * FROM t1 LEFT JOIN t2 ON (column1) WHERE t2.column2=5;

Therefore, it is safe to convert the query to a normal join:

SELECT * FROM t1, t2 WHERE t2.column2=5 AND t1.column1=t2.column1;

This can be made faster because MySQL can use table t2 before table t1 if doing so would result in a better query plan. To provide a hint about the table join order, use STRAIGHT_JOIN. (See Section 13.2.9, "SELECT Syntax".)

8.13.6. Nested-Loop Join Algorithms

MySQL executes joins between tables using a nested-loop algorithm or variations on it.

Nested-Loop Join Algorithm

A simple nested-loop join (NLJ) algorithm reads rows from the first table in a loop one at a time, passing each row to a nested loop that processes the next table in the join. This process is repeated as many times as there remain tables to be joined.

Assume that a join between three tables t1, t2, and t3 is to be executed using the following join types:

Table   Join Typet1  ranget2  reft3  ALL

If a simple NLJ algorithm is used, the join is processed like this:

for each row in t1 matching range {  for each row in t2 matching reference key { for each row in t3 {  if row satisfies join conditions,  send to client }  }}

Because the NLJ algorithm passes rows one at a time from outer loops to inner loops, it typically reads tables processed in the inner loops many times.

Block Nested-Loop Join Algorithm

A Block Nested-Loop (BNL) join algorithm uses buffering of rows read in outer loops to reduce the number of times that tables in inner loops must be read. For example, if 10 rows are read into a buffer and the buffer is passed to the next inner loop, each row read in the inner loop can be compared against all 10 rows in the buffer. The reduces the number of times the inner table must be read by an order of magnitude.

MySQL uses join buffering under these conditions:

  • The join_buffer_size system variable determines the size of each join buffer.

  • Join buffering can be used when the join is of type ALL or index (in other words, when no possible keys can be used, and a full scan is done, of either the data or index rows, respectively), or range.

  • One buffer is allocated for each join that can be buffered, so a given query might be processed using multiple join buffers.

  • A join buffer is never allocated for the first nonconst table, even if it would be of type ALL or index.

  • A join buffer is allocated prior to executing the join and freed after the query is done.

  • Only columns of interest to the join are stored in the join buffer, not whole rows.

For the example join described previously for the NLJ algorithm (without buffering), the join is done as follow using join buffering:

for each row in t1 matching range {  for each row in t2 matching reference key { store used columns from t1, t2 in join buffer if buffer is full {  for each row in t3 { for each t1, t2 combination in join buffer {  if row satisfies join conditions,  send to client }  }  empty buffer }  }}if buffer is not empty {  for each row in t3 { for each t1, t2 combination in join buffer {  if row satisfies join conditions,  send to client }  }}

If S is the size of each stored t1, t2 combination is the join buffer and C is the number of combinations in the buffer, the number of times table t3 is scanned is:

(S * C)/join_buffer_size + 1

The number of t3 scans decreases as the value of join_buffer_size increases, up to the point when join_buffer_size is large enough to hold all previous row combinations. At that point, there is no speed to be gained by making it larger.

8.13.7. Nested Join Optimization

The syntax for expressing joins permits nested joins. The following discussion refers to the join syntax described in Section 13.2.9.2, "JOIN Syntax".

The syntax of table_factor is extended in comparison with the SQL Standard. The latter accepts only table_reference, not a list of them inside a pair of parentheses. This is a conservative extension if we consider each comma in a list of table_reference items as equivalent to an inner join. For example:

SELECT * FROM t1 LEFT JOIN (t2, t3, t4) ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

is equivalent to:

SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4) ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

In MySQL, CROSS JOIN is a syntactic equivalent to INNER JOIN (they can replace each other). In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause; CROSS JOIN is used otherwise.

In general, parentheses can be ignored in join expressions containing only inner join operations. After removing parentheses and grouping operations to the left, the join expression:

t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL)   ON t1.a=t2.a

transforms into the expression:

(t1 LEFT JOIN t2 ON t1.a=t2.a) LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL

Yet, the two expressions are not equivalent. To see this, suppose that the tables t1, t2, and t3 have the following state:

  • Table t1 contains rows (1), (2)

  • Table t2 contains row (1,101)

  • Table t3 contains row (101)

In this case, the first expression returns a result set including the rows (1,1,101,101), (2,NULL,NULL,NULL), whereas the second expression returns the rows (1,1,101,101), (2,NULL,NULL,101):

mysql> SELECT * -> FROM t1 ->  LEFT JOIN ->  (t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL) ->  ON t1.a=t2.a;+------+------+------+------+| a | a | b | b |+------+------+------+------+| 1 | 1 |  101 |  101 || 2 | NULL | NULL | NULL |+------+------+------+------+mysql> SELECT * -> FROM (t1 LEFT JOIN t2 ON t1.a=t2.a) ->  LEFT JOIN t3 ->  ON t2.b=t3.b OR t2.b IS NULL;+------+------+------+------+| a | a | b | b |+------+------+------+------+| 1 | 1 |  101 |  101 || 2 | NULL | NULL |  101 |+------+------+------+------+

In the following example, an outer join operation is used together with an inner join operation:

t1 LEFT JOIN (t2, t3) ON t1.a=t2.a

That expression cannot be transformed into the following expression:

t1 LEFT JOIN t2 ON t1.a=t2.a, t3.

For the given table states, the two expressions return different sets of rows:

mysql> SELECT * -> FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a;+------+------+------+------+| a | a | b | b |+------+------+------+------+| 1 | 1 |  101 |  101 || 2 | NULL | NULL | NULL |+------+------+------+------+mysql> SELECT * -> FROM t1 LEFT JOIN t2 ON t1.a=t2.a, t3;+------+------+------+------+| a | a | b | b |+------+------+------+------+| 1 | 1 |  101 |  101 || 2 | NULL | NULL |  101 |+------+------+------+------+

Therefore, if we omit parentheses in a join expression with outer join operators, we might change the result set for the original expression.

More exactly, we cannot ignore parentheses in the right operand of the left outer join operation and in the left operand of a right join operation. In other words, we cannot ignore parentheses for the inner table expressions of outer join operations. Parentheses for the other operand (operand for the outer table) can be ignored.

The following expression:

(t1,t2) LEFT JOIN t3 ON P(t2.b,t3.b)

is equivalent to this expression:

t1, t2 LEFT JOIN t3 ON P(t2.b,t3.b)

for any tables t1,t2,t3 and any condition P over attributes t2.b and t3.b.

Whenever the order of execution of the join operations in a join expression (join_table) is not from left to right, we talk about nested joins. Consider the following queries:

SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b) ON t1.a=t2.a  WHERE t1.a > 1SELECT * FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a  WHERE (t2.b=t3.b OR t2.b IS NULL) AND t1.a > 1

Those queries are considered to contain these nested joins:

t2 LEFT JOIN t3 ON t2.b=t3.bt2, t3

The nested join is formed in the first query with a left join operation, whereas in the second query it is formed with an inner join operation.

In the first query, the parentheses can be omitted: The grammatical structure of the join expression will dictate the same order of execution for join operations. For the second query, the parentheses cannot be omitted, although the join expression here can be interpreted unambiguously without them. (In our extended syntax the parentheses in (t2, t3) of the second query are required, although theoretically the query could be parsed without them: We still would have unambiguous syntactical structure for the query because LEFT JOIN and ON would play the role of the left and right delimiters for the expression (t2,t3).)

The preceding examples demonstrate these points:

  • For join expressions involving only inner joins (and not outer joins), parentheses can be removed. You can remove parentheses and evaluate left to right (or, in fact, you can evaluate the tables in any order).

  • The same is not true, in general, for outer joins or for outer joins mixed with inner joins. Removal of parentheses may change the result.

Queries with nested outer joins are executed in the same pipeline manner as queries with inner joins. More exactly, a variation of the nested-loop join algorithm is exploited. Recall by what algorithmic schema the nested-loop join executes a query. Suppose that we have a join query over 3 tables T1,T2,T3 of the form:

SELECT * FROM T1 INNER JOIN T2 ON P1(T1,T2) INNER JOIN T3 ON P2(T2,T3)  WHERE P(T1,T2,T3).

Here, P1(T1,T2) and P2(T3,T3) are some join conditions (on expressions), whereas P(t1,t2,t3) is a condition over columns of tables T1,T2,T3.

The nested-loop join algorithm would execute this query in the following manner:

FOR each row t1 in T1 {  FOR each row t2 in T2 such that P1(t1,t2) { FOR each row t3 in T3 such that P2(t2,t3) {  IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t;  } }  }}

The notation t1||t2||t3 means "a row constructed by concatenating the columns of rows t1, t2, and t3." In some of the following examples, NULL where a row name appears means that NULL is used for each column of that row. For example, t1||t2||NULL means "a row constructed by concatenating the columns of rows t1 and t2, and NULL for each column of t3."

Now let's consider a query with nested outer joins:

SELECT * FROM T1 LEFT JOIN  (T2 LEFT JOIN T3 ON P2(T2,T3))  ON P1(T1,T2)  WHERE P(T1,T2,T3).

For this query, we modify the nested-loop pattern to get:

FOR each row t1 in T1 {  BOOL f1:=FALSE;  FOR each row t2 in T2 such that P1(t1,t2) { BOOL f2:=FALSE; FOR each row t3 in T3 such that P2(t2,t3) {  IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t;  }  f2=TRUE;  f1=TRUE; } IF (!f2) {  IF P(t1,t2,NULL) { t:=t1||t2||NULL; OUTPUT t;  }  f1=TRUE; }  }  IF (!f1) { IF P(t1,NULL,NULL) {  t:=t1||NULL||NULL; OUTPUT t; }  }}

In general, for any nested loop for the first inner table in an outer join operation, a flag is introduced that is turned off before the loop and is checked after the loop. The flag is turned on when for the current row from the outer table a match from the table representing the inner operand is found. If at the end of the loop cycle the flag is still off, no match has been found for the current row of the outer table. In this case, the row is complemented by NULL values for the columns of the inner tables. The result row is passed to the final check for the output or into the next nested loop, but only if the row satisfies the join condition of all embedded outer joins.

In our example, the outer join table expressed by the following expression is embedded:

(T2 LEFT JOIN T3 ON P2(T2,T3))

Note that for the query with inner joins, the optimizer could choose a different order of nested loops, such as this one:

FOR each row t3 in T3 {  FOR each row t2 in T2 such that P2(t2,t3) { FOR each row t1 in T1 such that P1(t1,t2) {  IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t;  } }  }}

For the queries with outer joins, the optimizer can choose only such an order where loops for outer tables precede loops for inner tables. Thus, for our query with outer joins, only one nesting order is possible. For the following query, the optimizer will evaluate two different nestings:

SELECT * T1 LEFT JOIN (T2,T3) ON P1(T1,T2) AND P2(T1,T3)  WHERE P(T1,T2,T3)

The nestings are these:

FOR each row t1 in T1 {  BOOL f1:=FALSE;  FOR each row t2 in T2 such that P1(t1,t2) { FOR each row t3 in T3 such that P2(t1,t3) {  IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t;  }  f1:=TRUE }  }  IF (!f1) { IF P(t1,NULL,NULL) {  t:=t1||NULL||NULL; OUTPUT t; }  }}

and:

FOR each row t1 in T1 {  BOOL f1:=FALSE;  FOR each row t3 in T3 such that P2(t1,t3) { FOR each row t2 in T2 such that P1(t1,t2) {  IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t;  }  f1:=TRUE }  }  IF (!f1) { IF P(t1,NULL,NULL) {  t:=t1||NULL||NULL; OUTPUT t; }  }}

In both nestings, T1 must be processed in the outer loop because it is used in an outer join. T2 and T3 are used in an inner join, so that join must be processed in the inner loop. However, because the join is an inner join, T2 and T3 can be processed in either order.

When discussing the nested-loop algorithm for inner joins, we omitted some details whose impact on the performance of query execution may be huge. We did not mention so-called "pushed-down" conditions. Suppose that our WHERE condition P(T1,T2,T3) can be represented by a conjunctive formula:

P(T1,T2,T2) = C1(T1) AND C2(T2) AND C3(T3).

In this case, MySQL actually uses the following nested-loop schema for the execution of the query with inner joins:

FOR each row t1 in T1 such that C1(t1) {  FOR each row t2 in T2 such that P1(t1,t2) AND C2(t2)  { FOR each row t3 in T3 such that P2(t2,t3) AND C3(t3) {  IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t;  } }  }}

You see that each of the conjuncts C1(T1), C2(T2), C3(T3) are pushed out of the most inner loop to the most outer loop where it can be evaluated. If C1(T1) is a very restrictive condition, this condition pushdown may greatly reduce the number of rows from table T1 passed to the inner loops. As a result, the execution time for the query may improve immensely.

For a query with outer joins, the WHERE condition is to be checked only after it has been found that the current row from the outer table has a match in the inner tables. Thus, the optimization of pushing conditions out of the inner nested loops cannot be applied directly to queries with outer joins. Here we have to introduce conditional pushed-down predicates guarded by the flags that are turned on when a match has been encountered.

For our example with outer joins with:

P(T1,T2,T3)=C1(T1) AND C(T2) AND C3(T3)

the nested-loop schema using guarded pushed-down conditions looks like this:

FOR each row t1 in T1 such that C1(t1) {  BOOL f1:=FALSE;  FOR each row t2 in T2  such that P1(t1,t2) AND (f1?C2(t2):TRUE) { BOOL f2:=FALSE; FOR each row t3 in T3 such that P2(t2,t3) AND (f1&&f2?C3(t3):TRUE) {  IF (f1&&f2?TRUE:(C2(t2) AND C3(t3))) { t:=t1||t2||t3; OUTPUT t;  }  f2=TRUE;  f1=TRUE; } IF (!f2) {  IF (f1?TRUE:C2(t2) && P(t1,t2,NULL)) { t:=t1||t2||NULL; OUTPUT t;  }  f1=TRUE; }  }  IF (!f1 && P(t1,NULL,NULL)) {  t:=t1||NULL||NULL; OUTPUT t;  }}

In general, pushed-down predicates can be extracted from join conditions such as P1(T1,T2) and P(T2,T3). In this case, a pushed-down predicate is guarded also by a flag that prevents checking the predicate for the NULL-complemented row generated by the corresponding outer join operation.

Note that access by key from one inner table to another in the same nested join is prohibited if it is induced by a predicate from the WHERE condition. (We could use conditional key access in this case, but this technique is not employed yet in MySQL 5.5.)

8.13.8. Outer Join Simplification

Table expressions in the FROM clause of a query are simplified in many cases.

At the parser stage, queries with right outer joins operations are converted to equivalent queries containing only left join operations. In the general case, the conversion is performed according to the following rule:

(T1, ...) RIGHT JOIN (T2,...) ON P(T1,...,T2,...) =(T2, ...) LEFT JOIN (T1,...) ON P(T1,...,T2,...)

All inner join expressions of the form T1 INNER JOIN T2 ON P(T1,T2) are replaced by the list T1,T2, P(T1,T2) being joined as a conjunct to the WHERE condition (or to the join condition of the embedding join, if there is any).

When the optimizer evaluates plans for join queries with outer join operation, it takes into consideration only the plans where, for each such operation, the outer tables are accessed before the inner tables. The optimizer options are limited because only such plans enables us to execute queries with outer joins operations by the nested loop schema.

Suppose that we have a query of the form:

SELECT * T1 LEFT JOIN T2 ON P1(T1,T2)  WHERE P(T1,T2) AND R(T2)

with R(T2) narrowing greatly the number of matching rows from table T2. If we executed the query as it is, the optimizer would have no other choice besides to access table T1 before table T2 that may lead to a very inefficient execution plan.

Fortunately, MySQL converts such a query into a query without an outer join operation if the WHERE condition is null-rejected. A condition is called null-rejected for an outer join operation if it evaluates to FALSE or to UNKNOWN for any NULL-complemented row built for the operation.

Thus, for this outer join:

T1 LEFT JOIN T2 ON T1.A=T2.A

Conditions such as these are null-rejected:

T2.B IS NOT NULL,T2.B > 3,T2.C <= T1.C,T2.B < 2 OR T2.C > 1

Conditions such as these are not null-rejected:

T2.B IS NULL,T1.B < 3 OR T2.B IS NOT NULL,T1.B < 3 OR T2.B > 3

The general rules for checking whether a condition is null-rejected for an outer join operation are simple. A condition is null-rejected in the following cases:

  • If it is of the form A IS NOT NULL, where A is an attribute of any of the inner tables

  • If it is a predicate containing a reference to an inner table that evaluates to UNKNOWN when one of its arguments is NULL

  • If it is a conjunction containing a null-rejected condition as a conjunct

  • If it is a disjunction of null-rejected conditions

A condition can be null-rejected for one outer join operation in a query and not null-rejected for another. In the query:

SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A LEFT JOIN T3 ON T3.B=T1.B  WHERE T3.C > 0

the WHERE condition is null-rejected for the second outer join operation but is not null-rejected for the first one.

If the WHERE condition is null-rejected for an outer join operation in a query, the outer join operation is replaced by an inner join operation.

For example, the preceding query is replaced with the query:

SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A INNER JOIN T3 ON T3.B=T1.B  WHERE T3.C > 0

For the original query, the optimizer would evaluate plans compatible with only one access order T1,T2,T3. For the replacing query, it additionally considers the access sequence T3,T1,T2.

A conversion of one outer join operation may trigger a conversion of another. Thus, the query:

SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A LEFT JOIN T3 ON T3.B=T2.B  WHERE T3.C > 0

will be first converted to the query:

SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A INNER JOIN T3 ON T3.B=T2.B  WHERE T3.C > 0

which is equivalent to the query:

SELECT * FROM (T1 LEFT JOIN T2 ON T2.A=T1.A), T3  WHERE T3.C > 0 AND T3.B=T2.B

Now the remaining outer join operation can be replaced by an inner join, too, because the condition T3.B=T2.B is null-rejected and we get a query without outer joins at all:

SELECT * FROM (T1 INNER JOIN T2 ON T2.A=T1.A), T3  WHERE T3.C > 0 AND T3.B=T2.B

Sometimes we succeed in replacing an embedded outer join operation, but cannot convert the embedding outer join. The following query:

SELECT * FROM T1 LEFT JOIN  (T2 LEFT JOIN T3 ON T3.B=T2.B)  ON T2.A=T1.A  WHERE T3.C > 0

is converted to:

SELECT * FROM T1 LEFT JOIN  (T2 INNER JOIN T3 ON T3.B=T2.B)  ON T2.A=T1.A  WHERE T3.C > 0,

That can be rewritten only to the form still containing the embedding outer join operation:

SELECT * FROM T1 LEFT JOIN  (T2,T3)  ON (T2.A=T1.A AND T3.B=T2.B)  WHERE T3.C > 0.

When trying to convert an embedded outer join operation in a query, we must take into account the join condition for the embedding outer join together with the WHERE condition. In the query:

SELECT * FROM T1 LEFT JOIN  (T2 LEFT JOIN T3 ON T3.B=T2.B)  ON T2.A=T1.A AND T3.C=T1.C  WHERE T3.D > 0 OR T1.D > 0

the WHERE condition is not null-rejected for the embedded outer join, but the join condition of the embedding outer join T2.A=T1.A AND T3.C=T1.C is null-rejected. So the query can be converted to:

SELECT * FROM T1 LEFT JOIN  (T2, T3)  ON T2.A=T1.A AND T3.C=T1.C AND T3.B=T2.B  WHERE T3.D > 0 OR T1.D > 0

8.13.9. ORDER BY Optimization

In some cases, MySQL can use an index to satisfy an ORDER BY clause without doing any extra sorting.

The index can also be used even if the ORDER BY does not match the index exactly, as long as all of the unused portions of the index and all the extra ORDER BY columns are constants in the WHERE clause. The following queries use the index to resolve the ORDER BY part:

SELECT * FROM t1  ORDER BY key_part1,key_part2,... ;SELECT * FROM t1  WHERE key_part1=constant  ORDER BY key_part2;SELECT * FROM t1  ORDER BY key_part1 DESC, key_part2 DESC;SELECT * FROM t1  WHERE key_part1=1  ORDER BY key_part1 DESC, key_part2 DESC;

In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:

  • You use ORDER BY on different keys:

    SELECT * FROM t1 ORDER BY key1, key2;
  • You use ORDER BY on nonconsecutive parts of a key:

    SELECT * FROM t1 WHERE key2=constant ORDER BY key_part2;
  • You mix ASC and DESC:

    SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 ASC;
  • The key used to fetch the rows is not the same as the one used in the ORDER BY:

    SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
  • You use ORDER BY with an expression that includes terms other than the key column name:

    SELECT * FROM t1 ORDER BY ABS(key);SELECT * FROM t1 ORDER BY -key;
  • You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)

  • You have different ORDER BY and GROUP BY expressions.

  • You index only a prefix of a column named in the ORDER BY clause. In this case, the index cannot be used to fully resolve the sort order. For example, if you have a CHAR(20) column, but index only the first 10 bytes, the index cannot distinguish values past the 10th byte and a filesort will be needed.

  • The type of table index used does not store rows in order. For example, this is true for a HASH index in a MEMORY table.

Availability of an index for sorting may be affected by the use of column aliases. Suppose that the column t1.a is indexed. In this statement, the name of the column in the select list is a. It refers to t1.a, so for the reference to a in the ORDER BY, the index can be used:

SELECT a FROM t1 ORDER BY a;

In this statement, the name of the column in the select list is also a, but it is the alias name. It refers to ABS(a), so for the reference to a in the ORDER BY, the index cannot be used:

SELECT ABS(a) AS a FROM t1 ORDER BY a;

In the following statement, the ORDER BY refers to a name that is not the name of a column in the select list. But there is a column in t1 named a, so the ORDER BY uses that, and the index can be used. (The resulting sort order may be completely different from the order for ABS(a), of course.)

SELECT ABS(a) AS b FROM t1 ORDER BY a;

By default, MySQL sorts all GROUP BY col1, col2, ... queries as if you specified ORDER BY col1, col2, ... in the query as well. If you include an ORDER BY clause explicitly that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs. If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL. For example:

INSERT INTO fooSELECT a, COUNT(*) FROM bar GROUP BY a ORDER BY NULL;

With EXPLAIN SELECT ... ORDER BY, you can check whether MySQL can use indexes to resolve the query. It cannot if you see Using filesort in the Extra column. See Section 8.8.1, "Optimizing Queries with EXPLAIN". Filesort uses a fixed-length row-storage format similar to that used by the MEMORY storage engine. Variable-length types such as VARCHAR are stored using a fixed length.

MySQL has two filesort algorithms for sorting and retrieving results. The original method uses only the ORDER BY columns. The modified method uses not just the ORDER BY columns, but all the columns used in the query.

The optimizer selects which filesort algorithm to use. It normally uses the modified algorithm except when BLOB or TEXT columns are involved, in which case it uses the original algorithm.

The original filesort algorithm works as follows:

  1. Read all rows according to key or by table scanning. Rows that do not match the WHERE clause are skipped.

  2. For each row, store a pair of values in a buffer (the sort key and the row pointer). The size of the buffer is the value of the sort_buffer_size system variable.

  3. When the buffer gets full, run a qsort (quicksort) on it and store the result in a temporary file. Save a pointer to the sorted block. (If all pairs fit into the sort buffer, no temporary file is created.)

  4. Repeat the preceding steps until all rows have been read.

  5. Do a multi-merge of up to MERGEBUFF (7) regions to one block in another temporary file. Repeat until all blocks from the first file are in the second file.

  6. Repeat the following until there are fewer than MERGEBUFF2 (15) blocks left.

  7. On the last multi-merge, only the pointer to the row (the last part of the sort key) is written to a result file.

  8. Read the rows in sorted order by using the row pointers in the result file. To optimize this, we read in a big block of row pointers, sort them, and use them to read the rows in sorted order into a row buffer. The size of the buffer is the value of the read_rnd_buffer_size system variable. The code for this step is in the sql/records.cc source file.

One problem with this approach is that it reads rows twice: One time when evaluating the WHERE clause, and again after sorting the pair values. And even if the rows were accessed successively the first time (for example, if a table scan is done), the second time they are accessed randomly. (The sort keys are ordered, but the row positions are not.)

The modified filesort algorithm incorporates an optimization such that it records not only the sort key value and row position, but also the columns required for the query. This avoids reading the rows twice. The modified filesort algorithm works like this:

  1. Read the rows that match the WHERE clause.

  2. For each row, record a tuple of values consisting of the sort key value and row position, and also the columns required for the query.

  3. Sort the tuples by sort key value

  4. Retrieve the rows in sorted order, but read the required columns directly from the sorted tuples rather than by accessing the table a second time.

Using the modified filesort algorithm, the tuples are longer than the pairs used in the original method, and fewer of them fit in the sort buffer (the size of which is given by sort_buffer_size). As a result, it is possible for the extra I/O to make the modified approach slower, not faster. To avoid a slowdown, the optimization is used only if the total size of the extra columns in the sort tuple does not exceed the value of the max_length_for_sort_data system variable. (A symptom of setting the value of this variable too high is a combination of high disk activity and low CPU activity.)

For slow queries for which filesort is not used, try lowering max_length_for_sort_data to a value that is appropriate to trigger a filesort.

If you want to increase ORDER BY speed, check whether you can get MySQL to use indexes rather than an extra sorting phase. If this is not possible, you can try the following strategies:

  • Increase the size of the sort_buffer_size variable.

  • Increase the size of the read_rnd_buffer_size variable.

  • Use less RAM per row by declaring columns only as large as they need to be to hold the values stored in them. For example, CHAR(16) is better than CHAR(200) if values never exceed 16 characters.

  • Change tmpdir to point to a dedicated file system with large amounts of free space. Also, this option accepts several paths that are used in round-robin fashion, so you can use this feature to spread the load across several directories. Paths should be separated by colon characters (":") on Unix and semicolon characters (";") on Windows. The paths should be for directories in file systems that are located on different physical disks, not different partitions on the same disk.

8.13.10. GROUP BY Optimization

The most general way to satisfy a GROUP BY clause is to scan the whole table and create a new temporary table where all rows from each group are consecutive, and then use this temporary table to discover groups and apply aggregate functions (if any). In some cases, MySQL is able to do much better than that and to avoid creation of temporary tables by using index access.

The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index, and that the index stores its keys in order (for example, this is a BTREE index and not a HASH index). Whether use of temporary tables can be replaced by index access also depends on which parts of an index are used in a query, the conditions specified for these parts, and the selected aggregate functions.

There are two ways to execute a GROUP BY query through index access, as detailed in the following sections. In the first method, the grouping operation is applied together with all range predicates (if any). The second method first performs a range scan, and then groups the resulting tuples.

In MySQL, GROUP BY is used for sorting, so the server may also apply ORDER BY optimizations to grouping. See Section 8.13.9, "ORDER BY Optimization".

8.13.10.1. Loose Index Scan

The most efficient way to process GROUP BY is when an index is used to directly retrieve the grouping columns. With this access method, MySQL uses the property of some index types that the keys are ordered (for example, BTREE). This property enables use of lookup groups in an index without having to consider all keys in the index that satisfy all WHERE conditions. This access method considers only a fraction of the keys in an index, so it is called a loose index scan. When there is no WHERE clause, a loose index scan reads as many keys as the number of groups, which may be a much smaller number than that of all keys. If the WHERE clause contains range predicates (see the discussion of the range join type in Section 8.8.1, "Optimizing Queries with EXPLAIN"), a loose index scan looks up the first key of each group that satisfies the range conditions, and again reads the least possible number of keys. This is possible under the following conditions:

  • The query is over a single table.

  • The GROUP BY names only columns that form a leftmost prefix of the index and no other columns. (If, instead of GROUP BY, the query has a DISTINCT clause, all distinct attributes refer to columns that form a leftmost prefix of the index.) For example, if a table t1 has an index on (c1,c2,c3), loose index scan is applicable if the query has GROUP BY c1, c2,. It is not applicable if the query has GROUP BY c2, c3 (the columns are not a leftmost prefix) or GROUP BY c1, c2, c4 (c4 is not in the index).

  • The only aggregate functions used in the select list (if any) are MIN() and MAX(), and all of them refer to the same column. The column must be in the index and must follow the columns in the GROUP BY.

  • Any other parts of the index than those from the GROUP BY referenced in the query must be constants (that is, they must be referenced in equalities with constants), except for the argument of MIN() or MAX() functions.

  • For columns in the index, full column values must be indexed, not just a prefix. For example, with c1 VARCHAR(20), INDEX (c1(10)), the index cannot be used for loose index scan.

If loose index scan is applicable to a query, the EXPLAIN output shows Using index for group-by in the Extra column.

Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). The loose index scan access method can be used for the following queries:

SELECT c1, c2 FROM t1 GROUP BY c1, c2;SELECT DISTINCT c1, c2 FROM t1;SELECT c1, MIN(c2) FROM t1 GROUP BY c1;SELECT c1, c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > const GROUP BY c1, c2;SELECT c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2;

The following queries cannot be executed with this quick select method, for the reasons given:

  • There are aggregate functions other than MIN() or MAX():

    SELECT c1, SUM(c2) FROM t1 GROUP BY c1;
  • The columns in the GROUP BY clause do not form a leftmost prefix of the index:

    SELECT c1, c2 FROM t1 GROUP BY c2, c3;
  • The query refers to a part of a key that comes after the GROUP BY part, and for which there is no equality with a constant:

    SELECT c1, c3 FROM t1 GROUP BY c1, c2;

    Were the query to include WHERE c3 = const, loose index scan could be used.

As of MySQL 5.5, the loose index scan access method can be applied to other forms of aggregate function references in the select list, in addition to the MIN() and MAX() references already supported:

Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). The loose index scan access method can be used for the following queries:

SELECT COUNT(DISTINCT c1), SUM(DISTINCT c1) FROM t1;SELECT COUNT(DISTINCT c1, c2), COUNT(DISTINCT c2, c1) FROM t1;

Loose index scan is not applicable for the following queries:

SELECT DISTINCT COUNT(DISTINCT c1) FROM t1;SELECT COUNT(DISTINCT c1) FROM t1 GROUP BY c1;

8.13.10.2. Tight Index Scan

A tight index scan may be either a full index scan or a range index scan, depending on the query conditions.

When the conditions for a loose index scan are not met, it still may be possible to avoid creation of temporary tables for GROUP BY queries. If there are range conditions in the WHERE clause, this method reads only the keys that satisfy these conditions. Otherwise, it performs an index scan. Because this method reads all keys in each range defined by the WHERE clause, or scans the whole index if there are no range conditions, we term it a tight index scan. With a tight index scan, the grouping operation is performed only after all keys that satisfy the range conditions have been found.

For this method to work, it is sufficient that there is a constant equality condition for all columns in a query referring to parts of the key coming before or in between parts of the GROUP BY key. The constants from the equality conditions fill in any "gaps" in the search keys so that it is possible to form complete prefixes of the index. These index prefixes then can be used for index lookups. If we require sorting of the GROUP BY result, and it is possible to form search keys that are prefixes of the index, MySQL also avoids extra sorting operations because searching with prefixes in an ordered index already retrieves all the keys in order.

Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). The following queries do not work with the loose index scan access method described earlier, but still work with the tight index scan access method.

  • There is a gap in the GROUP BY, but it is covered by the condition c2 = 'a':

    SELECT c1, c2, c3 FROM t1 WHERE c2 = 'a' GROUP BY c1, c3;
  • The GROUP BY does not begin with the first part of the key, but there is a condition that provides a constant for that part:

    SELECT c1, c2, c3 FROM t1 WHERE c1 = 'a' GROUP BY c2, c3;

8.13.11. DISTINCT Optimization

DISTINCT combined with ORDER BY needs a temporary table in many cases.

Because DISTINCT may use GROUP BY, learn how MySQL works with columns in ORDER BY or HAVING clauses that are not part of the selected columns. See Section 12.16.3, "MySQL Extensions to GROUP BY".

In most cases, a DISTINCT clause can be considered as a special case of GROUP BY. For example, the following two queries are equivalent:

SELECT DISTINCT c1, c2, c3 FROM t1WHERE c1 > const;SELECT c1, c2, c3 FROM t1WHERE c1 > const GROUP BY c1, c2, c3;

Due to this equivalence, the optimizations applicable to GROUP BY queries can be also applied to queries with a DISTINCT clause. Thus, for more details on the optimization possibilities for DISTINCT queries, see Section 8.13.10, "GROUP BY Optimization".

When combining LIMIT row_count with DISTINCT, MySQL stops as soon as it finds row_count unique rows.

If you do not use columns from all tables named in a query, MySQL stops scanning any unused tables as soon as it finds the first match. In the following case, assuming that t1 is used before t2 (which you can check with EXPLAIN), MySQL stops reading from t2 (for any particular row in t1) when it finds the first row in t2:

SELECT DISTINCT t1.a FROM t1, t2 where t1.a=t2.a;

8.13.12. Optimizing Subqueries with EXISTS Strategy

Certain optimizations are applicable to comparisons that use the IN operator to test subquery results (or that use =ANY, which is equivalent). This section discusses these optimizations, particularly with regard to the challenges that NULL values present. The last part of the discussion includes suggestions on what you can do to help the optimizer.

Consider the following subquery comparison:

outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)

MySQL evaluates queries "from outside to inside." That is, it first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.

A very useful optimization is to "inform" the subquery that the only rows of interest are those where the inner expression inner_expr is equal to outer_expr. This is done by pushing down an appropriate equality into the subquery's WHERE clause. That is, the comparison is converted to this:

EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)

After the conversion, MySQL can use the pushed-down equality to limit the number of rows that it must examine when evaluating the subquery.

More generally, a comparison of N values to a subquery that returns N-value rows is subject to the same conversion. If oe_i and ie_i represent corresponding outer and inner expression values, this subquery comparison:

(oe_1, ..., oe_N) IN  (SELECT ie_1, ..., ie_N FROM ... WHERE subquery_where)

Becomes:

EXISTS (SELECT 1 FROM ... WHERE subquery_where  AND oe_1 = ie_1  AND ...  AND oe_N = ie_N)

The following discussion assumes a single pair of outer and inner expression values for simplicity.

The conversion just described has its limitations. It is valid only if we ignore possible NULL values. That is, the "pushdown" strategy works as long as both of these two conditions are true:

  • outer_expr and inner_expr cannot be NULL.

  • You do not need to distinguish NULL from FALSE subquery results. (If the subquery is a part of an OR or AND expression in the WHERE clause, MySQL assumes that you do not care.)

When either or both of those conditions do not hold, optimization is more complex.

Suppose that outer_expr is known to be a non-NULL value but the subquery does not produce a row such that outer_expr = inner_expr. Then outer_expr IN (SELECT ...) evaluates as follows:

  • NULL, if the SELECT produces any row where inner_expr is NULL

  • FALSE, if the SELECT produces only non-NULL values or produces nothing

In this situation, the approach of looking for rows with outer_expr = inner_expr is no longer valid. It is necessary to look for such rows, but if none are found, also look for rows where inner_expr is NULL. Roughly speaking, the subquery can be converted to:

EXISTS (SELECT 1 FROM ... WHERE subquery_where AND (outer_expr=inner_expr OR inner_expr IS NULL))

The need to evaluate the extra IS NULL condition is why MySQL has the ref_or_null access method:

mysql> EXPLAIN -> SELECT outer_expr IN (SELECT t2.maybe_null_key ->   FROM t2, t3 WHERE ...) -> FROM t1;*************************** 1. row ***************************   id: 1  select_type: PRIMARY table: t1...*************************** 2. row ***************************   id: 2  select_type: DEPENDENT SUBQUERY table: t2 type: ref_or_nullpossible_keys: maybe_null_key  key: maybe_null_key  key_len: 5  ref: func rows: 2 Extra: Using where; Using index...

The unique_subquery and index_subquery subquery-specific access methods also have "or NULL" variants. However, they are not visible in EXPLAIN output, so you must use EXPLAIN EXTENDED followed by SHOW WARNINGS (note the checking NULL in the warning message):

mysql> EXPLAIN EXTENDED -> SELECT outer_expr IN (SELECT maybe_null_key FROM t2) FROM t1\G*************************** 1. row ***************************   id: 1  select_type: PRIMARY table: t1...*************************** 2. row ***************************   id: 2  select_type: DEPENDENT SUBQUERY table: t2 type: index_subquerypossible_keys: maybe_null_key  key: maybe_null_key  key_len: 5  ref: func rows: 2 Extra: Using indexmysql> SHOW WARNINGS\G*************************** 1. row ***************************  Level: Note   Code: 1003Message: select (`test`.`t1`.`outer_expr`, (((`test`.`t1`.`outer_expr`) in t2 on maybe_null_key checking NULL))) AS `outer_expr IN (SELECT maybe_null_key FROM t2)` from `test`.`t1`

The additional OR ... IS NULL condition makes query execution slightly more complicated (and some optimizations within the subquery become inapplicable), but generally this is tolerable.

The situation is much worse when outer_expr can be NULL. According to the SQL interpretation of NULL as "unknown value," NULL IN (SELECT inner_expr ...) should evaluate to:

  • NULL, if the SELECT produces any rows

  • FALSE, if the SELECT produces no rows

For proper evaluation, it is necessary to be able to check whether the SELECT has produced any rows at all, so outer_expr = inner_expr cannot be pushed down into the subquery. This is a problem, because many real world subqueries become very slow unless the equality can be pushed down.

Essentially, there must be different ways to execute the subquery depending on the value of outer_expr.

The optimizer chooses SQL compliance over speed, so it accounts for the possibility that outer_expr might be NULL.

If outer_expr is NULL, to evaluate the following expression, it is necessary to run the SELECT to determine whether it produces any rows:

NULL IN (SELECT inner_expr FROM ... WHERE subquery_where)

It is necessary to run the original SELECT here, without any pushed-down equalities of the kind mentioned earlier.

On the other hand, when outer_expr is not NULL, it is absolutely essential that this comparison:

outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)

be converted to this expression that uses a pushed-down condition:

EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)

Without this conversion, subqueries will be slow. To solve the dilemma of whether to push down or not push down conditions into the subquery, the conditions are wrapped in "trigger" functions. Thus, an expression of the following form:

outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)

is converted into:

EXISTS (SELECT 1 FROM ... WHERE subquery_where  AND trigcond(outer_expr=inner_expr))

More generally, if the subquery comparison is based on several pairs of outer and inner expressions, the conversion takes this comparison:

(oe_1, ..., oe_N) IN (SELECT ie_1, ..., ie_N FROM ... WHERE subquery_where)

and converts it to this expression:

EXISTS (SELECT 1 FROM ... WHERE subquery_where  AND trigcond(oe_1=ie_1)  AND ...  AND trigcond(oe_N=ie_N)   )

Each trigcond(X) is a special function that evaluates to the following values:

  • X when the "linked" outer expression oe_i is not NULL

  • TRUE when the "linked" outer expression oe_i is NULL

Note that trigger functions are not triggers of the kind that you create with CREATE TRIGGER.

Equalities that are wrapped into trigcond() functions are not first class predicates for the query optimizer. Most optimizations cannot deal with predicates that may be turned on and off at query execution time, so they assume any trigcond(X) to be an unknown function and ignore it. At the moment, triggered equalities can be used by those optimizations:

  • Reference optimizations: trigcond(X=Y [OR Y IS NULL]) can be used to construct ref, eq_ref, or ref_or_null table accesses.

  • Index lookup-based subquery execution engines: trigcond(X=Y) can be used to construct unique_subquery or index_subquery accesses.

  • Table-condition generator: If the subquery is a join of several tables, the triggered condition will be checked as soon as possible.

When the optimizer uses a triggered condition to create some kind of index lookup-based access (as for the first two items of the preceding list), it must have a fallback strategy for the case when the condition is turned off. This fallback strategy is always the same: Do a full table scan. In EXPLAIN output, the fallback shows up as Full scan on NULL key in the Extra column:

mysql> EXPLAIN SELECT t1.col1, -> t1.col1 IN (SELECT t2.key1 FROM t2 WHERE t2.col2=t1.col2) FROM t1\G*************************** 1. row ***************************   id: 1  select_type: PRIMARY table: t1 ...*************************** 2. row ***************************   id: 2  select_type: DEPENDENT SUBQUERY table: t2 type: index_subquerypossible_keys: key1  key: key1  key_len: 5  ref: func rows: 2 Extra: Using where; Full scan on NULL key

If you run EXPLAIN EXTENDED followed by SHOW WARNINGS, you can see the triggered condition:

*************************** 1. row ***************************  Level: Note   Code: 1003Message: select `test`.`t1`.`col1` AS `col1`, <in_optimizer>(`test`.`t1`.`col1`, <exists>(<index_lookup>(<cache>(`test`.`t1`.`col1`) in t2 on key1 checking NULL where (`test`.`t2`.`col2` = `test`.`t1`.`col2`) having trigcond(<is_not_null_test>(`test`.`t2`.`key1`))))) AS `t1.col1 IN (select t2.key1 from t2 where t2.col2=t1.col2)` from `test`.`t1`

The use of triggered conditions has some performance implications. A NULL IN (SELECT ...) expression now may cause a full table scan (which is slow) when it previously did not. This is the price paid for correct results (the goal of the trigger-condition strategy was to improve compliance and not speed).

For multiple-table subqueries, execution of NULL IN (SELECT ...) will be particularly slow because the join optimizer does not optimize for the case where the outer expression is NULL. It assumes that subquery evaluations with NULL on the left side are very rare, even if there are statistics that indicate otherwise. On the other hand, if the outer expression might be NULL but never actually is, there is no performance penalty.

To help the query optimizer better execute your queries, use these tips:

  • Declare a column as NOT NULL if it really is. (This also helps other aspects of the optimizer by simplifying condition testing for the column.)

  • If you do not need to distinguish a NULL from FALSE subquery result, you can easily avoid the slow execution path. Replace a comparison that looks like this:

    outer_expr IN (SELECT inner_expr FROM ...)

    with this expression:

    (outer_expr IS NOT NULL) AND (outer_expr IN (SELECT inner_expr FROM ...))

    Then NULL IN (SELECT ...) will never be evaluated because MySQL stops evaluating AND parts as soon as the expression result is clear.

Copyright © 1997, 2013, Oracle and/or its affiliates. All rights reserved. Legal Notices
(Sebelumnya) 8.9. Buffering and Caching9. Language Structure (Berikutnya)