New Features of InnoDB 1.1 - MySQL Manual

14.4. New Features of InnoDB 1.1

14.4.1. Introduction to InnoDB 1.1
14.4.2. Fast Index Creation in the InnoDB Storage Engine
14.4.3. InnoDB Data Compression
14.4.4. InnoDB File-Format Management
14.4.5. How InnoDB Stores Variable-Length Columns
14.4.6. InnoDB INFORMATION_SCHEMA tables
14.4.7. InnoDB Performance and Scalability Enhancements
14.4.8. Changes for Flexibility, Ease of Use and Reliability
14.4.9. Installing the InnoDB Storage Engine
14.4.10. Upgrading the InnoDB Storage Engine
14.4.11. Downgrading the InnoDB Storage Engine
14.4.12. InnoDB Storage Engine Change History
14.4.13. Third-Party Software
14.4.14. List of Parameters Changed in InnoDB 1.1 and InnoDB Plugin 1.0

14.4.1. Introduction to InnoDB 1.1

14.4.1.1. Features of the InnoDB Storage Engine
14.4.1.2. Obtaining and Installing the InnoDB Storage Engine
14.4.1.3. Viewing the InnoDB Storage Engine Version Number
14.4.1.4. Compatibility Considerations for Downgrade and Backup

InnoDB 1.1 combines the familiar reliability and performance of the InnoDB storage engine with new performance and usability enhancements. InnoDB 1.1 includes all the features that were part of the InnoDB Plugin for MySQL 5.1, plus new features specific to MySQL 5.5 and higher.

Beginning with MySQL 5.5, InnoDB is the default storage engine, rather than MyISAM, to promote greater data reliability and reduce the chance of corruption.

14.4.1.1. Features of the InnoDB Storage Engine

InnoDB in MySQL 5.5 contains several important new features:

Upward and Downward Compatibility

Note that data compression and the new row format require a new InnoDB file format called Barracuda. The previous file format, used by the built-in InnoDB in MySQL 5.1 and earlier, is now called Antelope; it does not support these features, but does support the other features introduced with the InnoDB storage engine.

The InnoDB storage engine is upward compatible from the standard InnoDB built into, and distributed with, MySQL, so existing databases can be used with the InnoDB storage engine for MySQL. The new parameter innodb_file_format helps protect upward and downward compatibility between InnoDB versions and database files, letting you enable or disable the use of features that only certain versions of InnoDB support.
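A minimal sketch of inspecting and changing this parameter from the mysql client. It assumes a MySQL 5.5 server and an account with the SUPER privilege that SET GLOBAL requires. The setting applies to tables created or rebuilt afterward; existing tables keep their format.

```sql
-- Show the file format InnoDB uses for newly created file-per-table tablespaces.
SELECT @@GLOBAL.innodb_file_format;

-- Restrict new tables to the Antelope format, so their files stay usable
-- by older InnoDB versions such as the built-in InnoDB in MySQL 5.1.
SET GLOBAL innodb_file_format = 'Antelope';

-- Allow Barracuda-only features (compression, ROW_FORMAT=DYNAMIC).
SET GLOBAL innodb_file_format = 'Barracuda';
```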

Since MySQL 5.0.21, InnoDB has had a safety feature that prevents it from opening tables that are in an unknown format. However, the system tablespace may contain references to new-format tables that confuse the built-in InnoDB in MySQL 5.1 and earlier. A slow shutdown clears these references.

With previous versions of InnoDB, no error was returned until you tried to access a table in a format "too new" for the software. To provide earlier feedback, InnoDB 1.1 checks the system tablespace at startup to ensure that the file format used in the database is supported by the storage engine. See Section 14.4.4.2.1, "Compatibility Check When InnoDB Is Started" for details.

14.4.1.2. Obtaining and Installing the InnoDB Storage Engine

Starting with MySQL 5.4.2, you do not need to do anything special to get or install the most up-to-date InnoDB storage engine. From that version forward, the InnoDB storage engine in the server is what was formerly known as the InnoDB Plugin. Earlier versions of MySQL required some extra build and configuration steps to get the Plugin-specific features such as fast index creation and table compression.

Report any bugs in the InnoDB storage engine using the My Oracle Support site. For general discussions about InnoDB Storage Engine for MySQL, see http://forums.mysql.com/list.php?22.

14.4.1.3. Viewing the InnoDB Storage Engine Version Number

InnoDB storage engine releases are numbered with version numbers independent of MySQL release numbers. The initial release of the InnoDB storage engine is version 1.0, and it is designed to work with MySQL 5.1. Version 1.1 of the InnoDB storage engine is for MySQL 5.5 and up.

The first component of the InnoDB storage engine version number designates a major release level.
The second component corresponds to the MySQL release. The digit 0 corresponds to MySQL 5.1. The digit 1 corresponds to MySQL 5.5.
The third component indicates the specific release of the InnoDB storage engine (at a given major release level and for a specific MySQL release); only bug fixes and minor functional changes are introduced at this level.
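As an illustration of the numbering scheme above, this sketch splits the full version string into its three components with the standard SUBSTRING_INDEX function. For a value such as 1.1.8, the columns come out as 1, 1 and 8; the column aliases are arbitrary names chosen for this example.

```sql
-- Split @@innodb_version into major release, MySQL-release digit, and
-- specific InnoDB release.
SELECT
  SUBSTRING_INDEX(@@innodb_version, '.', 1) AS major_release,
  SUBSTRING_INDEX(SUBSTRING_INDEX(@@innodb_version, '.', 2), '.', -1) AS mysql_release_digit,
  SUBSTRING_INDEX(@@innodb_version, '.', -1) AS innodb_release;
```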

Once you have installed the InnoDB storage engine, you can check its version number in three ways:

In the error log, where it is printed during startup.
SELECT * FROM information_schema.plugins;
SELECT @@innodb_version;

The InnoDB storage engine writes its version number to the error log, which can be helpful in diagnosis of errors:

091105 12:28:06 InnoDB Plugin 1.0.5 started; log sequence number 46509

Note that the PLUGIN_VERSION column in the table INFORMATION_SCHEMA.PLUGINS does not display the third component of the version number, only the first and second components, as in 1.0.
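To see the difference described above, you can run both queries side by side on the same server; the first is expected to show only two components (for example 1.1), the second the full three-part version.

```sql
-- Two-component version, as recorded in INFORMATION_SCHEMA.PLUGINS.
SELECT PLUGIN_NAME, PLUGIN_VERSION
FROM INFORMATION_SCHEMA.PLUGINS
WHERE PLUGIN_NAME = 'InnoDB';

-- Full three-component InnoDB version.
SELECT @@innodb_version;
```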

14.4.1.4. Compatibility Considerations for Downgrade and Backup

Because InnoDB 1.1 supports the "Barracuda" file format, with new on-disk data structures within both the database and log files, pay special attention to file format compatibility with respect to the following scenarios:

Downgrading from MySQL 5.5 to MySQL 5.1 or earlier (without the InnoDB Plugin enabled), or otherwise using earlier versions of MySQL with database files created by MySQL 5.5 and higher.
Using mysqldump.
Using MySQL replication.
Using MySQL Enterprise Backup or InnoDB Hot Backup.

WARNING: Once you create any tables with the Barracuda file format, take care to avoid crashes and corruptions when using those files with an earlier version of MySQL. It is strongly recommended that you use a "slow shutdown" (SET GLOBAL innodb_fast_shutdown=0) when stopping the MySQL server before downgrading to MySQL 5.1 or earlier. This ensures that the log files and other system information do not cause consistency issues or startup problems when using a prior version of MySQL.
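A sketch of the slow-shutdown step before a downgrade. With innodb_fast_shutdown=0, InnoDB performs a full purge and an insert buffer merge before stopping. MySQL 5.5 has no SQL statement to stop the server, so the final step is shown as a comment using mysqladmin.

```sql
-- Request a slow shutdown for the next server stop.
SET GLOBAL innodb_fast_shutdown = 0;

-- Then stop the server from a shell, for example:
--   mysqladmin -u root -p shutdown
```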

WARNING: If you dump a database containing compressed tables with mysqldump, the dump file may contain CREATE TABLE statements that attempt to create compressed tables, or tables using ROW_FORMAT=DYNAMIC, in the new database. Therefore, if you want the tables re-created as they exist in the original database, be sure the new database is running the InnoDB storage engine with the proper settings for innodb_file_format and innodb_file_per_table. Typically, when the mysqldump file is loaded, MySQL and InnoDB ignore CREATE TABLE options they do not recognize, and the tables are created in a format used by the running server.
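One way to make a silent fallback visible while loading a dump is to enable InnoDB strict mode in the loading session first. This is a sketch, assuming the dump is loaded with the mysql client; the file path is hypothetical. In strict mode, the option combinations listed in Table 14.5 produce errors instead of warnings, so a table that cannot be created as compressed or DYNAMIC stops the load rather than being created as COMPACT.

```sql
-- Confirm the target server's settings before loading.
SELECT @@GLOBAL.innodb_file_format, @@GLOBAL.innodb_file_per_table;

-- Turn ignored table options into errors for this session only.
SET SESSION innodb_strict_mode = ON;

-- mysql client command; /path/to/dump.sql is a hypothetical file name.
SOURCE /path/to/dump.sql
```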

WARNING: If you use MySQL replication, ensure all slaves are configured with the InnoDB storage engine, with the same settings for innodb_file_format and innodb_file_per_table. If you do not do so, and you create tables that require the new Barracuda file format, replication errors may occur. If a slave MySQL server is running an older version of MySQL, it ignores the CREATE TABLE options to create a compressed table or one with ROW_FORMAT=DYNAMIC, and creates the table uncompressed, with ROW_FORMAT=COMPACT.
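A simple check for the consistency requirement above: run the same statements on the master and on every slave, then compare the output. The LIKE pattern matches innodb_file_format, innodb_file_per_table and the related innodb_file_format_* variables.

```sql
-- Server and InnoDB versions on this host.
SELECT @@version, @@innodb_version;

-- File-format and file-per-table settings on this host.
SHOW GLOBAL VARIABLES LIKE 'innodb_file%';
```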

WARNING: Version 3.0 of InnoDB Hot Backup does not support the new Barracuda file format. Using InnoDB Hot Backup Version 3 to back up databases in this format causes unpredictable behavior. MySQL Enterprise Backup, the successor product to InnoDB Hot Backup, does support tables with the Barracuda file format. You can also back up such databases with mysqldump.

14.4.2. Fast Index Creation in the InnoDB Storage Engine

14.4.2.1. Overview of Fast Index Creation
14.4.2.2. Examples of Fast Index Creation
14.4.2.3. Implementation Details of Fast Index Creation
14.4.2.4. Concurrency Considerations for Fast Index Creation
14.4.2.5. How Crash Recovery Works with Fast Index Creation
14.4.2.6. Limitations of Fast Index Creation

In MySQL 5.5 and higher, or in MySQL 5.1 with the InnoDB Plugin, creating and dropping secondary indexes does not copy the contents of the entire table, making this operation much more efficient than with prior releases.

14.4.2.1. Overview of Fast Index Creation

With MySQL 5.5 and higher, or MySQL 5.1 with the InnoDB Plugin, creating and dropping secondary indexes for InnoDB tables is much faster than before. Historically, adding or dropping an index on a table with existing data could be very slow. The CREATE INDEX and DROP INDEX statements worked by creating a new, empty table defined with the requested set of indexes, then copying the existing rows to the new table one-by-one, updating the indexes as the rows are inserted. After all rows from the original table were copied, the old table was dropped and the copy was renamed with the name of the original table.

The performance speedup for fast index creation applies to secondary indexes, not to the primary key index. The rows of an InnoDB table are stored in a clustered index organized based on the primary key, forming what some database systems call an "index-organized table". Because the table structure is so closely tied to the primary key, redefining the primary key still requires copying the data.

This new mechanism also means that you can generally speed the overall process of creating and loading an indexed table by creating the table with only the clustered index, and adding the secondary indexes after the data is loaded.
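A sketch of the load-then-index pattern. The table orders and the source table orders_staging are hypothetical names; the point is that rows go in while only the clustered index exists, and both secondary indexes are then built in one ALTER TABLE.

```sql
-- Create the table with only the clustered index (primary key).
CREATE TABLE orders (
  id          INT PRIMARY KEY,
  customer_id INT,
  created     DATE
) ENGINE=InnoDB;

-- Bulk-load the data (orders_staging is a hypothetical source table).
INSERT INTO orders SELECT id, customer_id, created FROM orders_staging;
COMMIT;

-- Add the secondary indexes afterward, scanning the table once.
ALTER TABLE orders ADD INDEX (customer_id), ADD INDEX (created);
```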

Although no syntax changes are required in the CREATE INDEX or DROP INDEX commands, some factors affect the performance, space usage, and semantics of this operation (see Section 14.4.2.6, "Limitations of Fast Index Creation").

14.4.2.2. Examples of Fast Index Creation

It is possible to create multiple indexes on a table with one ALTER TABLE statement. This is relatively efficient, because the clustered index of the table needs to be scanned only once (although the data is sorted separately for each new index). For example:

CREATE TABLE T1(A INT PRIMARY KEY, B INT, C CHAR(1)) ENGINE=InnoDB;
INSERT INTO T1 VALUES (1,2,'a'), (2,3,'b'), (3,2,'c'), (4,3,'d'), (5,2,'e');
COMMIT;
ALTER TABLE T1 ADD INDEX (B), ADD UNIQUE INDEX (C);

The above statements create table T1 with the clustered index (primary key) on column A, insert several rows, and then build two new indexes on columns B and C. If there were many rows inserted into T1 before the ALTER TABLE statement, this approach is much more efficient than creating all the secondary indexes before loading the data.

You can also create the indexes one at a time, but then the clustered index of the table is scanned (as well as sorted) once for each CREATE INDEX statement. Thus, the following statements are not as efficient as the ALTER TABLE statement above, even though neither requires recreating the clustered index for table T1.

CREATE INDEX B ON T1 (B);
CREATE UNIQUE INDEX C ON T1 (C);

Dropping InnoDB secondary indexes also does not require any copying of table data. You can equally quickly drop multiple indexes with a single ALTER TABLE statement or multiple DROP INDEX statements:

ALTER TABLE T1 DROP INDEX B, DROP INDEX C;

or:

DROP INDEX B ON T1;
DROP INDEX C ON T1;

Restructuring the clustered index in InnoDB always requires copying the data in the table. For example, if you create a table without a primary key, InnoDB chooses one for you, which may be the first UNIQUE key defined on NOT NULL columns, or a system-generated key. Defining a PRIMARY KEY later causes the data to be copied, as in the following example:

CREATE TABLE T2 (A INT, B INT) ENGINE=InnoDB;
INSERT INTO T2 VALUES (NULL, 1);
ALTER TABLE T2 ADD PRIMARY KEY (B);

When you create a UNIQUE or PRIMARY KEY index, InnoDB must do some extra work. For UNIQUE indexes, InnoDB checks that the table contains no duplicate values for the key. For a PRIMARY KEY index, InnoDB also checks that none of the PRIMARY KEY columns contains a NULL. It is best to define the primary key when you create a table, so you need not rebuild the table later.
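Because these checks make the statement fail if the data violates them, it can help to look for problems first. This sketch uses the T1 and T2 tables from the examples above. NULL values are excluded from the duplicate check because a UNIQUE index permits multiple NULLs.

```sql
-- Any row returned would make ALTER TABLE T1 ADD UNIQUE INDEX (C) fail.
SELECT C, COUNT(*) AS occurrences
FROM T1
WHERE C IS NOT NULL
GROUP BY C
HAVING COUNT(*) > 1;

-- A PRIMARY KEY on T2.B additionally requires that B contain no NULLs.
SELECT COUNT(*) AS null_keys FROM T2 WHERE B IS NULL;
```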

14.4.2.3. Implementation Details of Fast Index Creation

InnoDB has two types of indexes: the clustered index and secondary indexes. Since the clustered index contains the data values in its B-tree nodes, adding or dropping a clustered index does involve copying the data, and creating a new copy of the table. A secondary index, however, contains only the index key and the value of the primary key. This type of index can be created or dropped without copying the data in the clustered index. Because each secondary index contains copies of the primary key values (used to access the clustered index when needed), when you change the definition of the primary key, all secondary indexes are recreated as well.

Dropping a secondary index is simple. Only the internal InnoDB system tables and the MySQL data dictionary tables are updated to reflect the fact that the index no longer exists. InnoDB returns the storage used for the index to the tablespace that contained it, so that new indexes or additional table rows can use the space.

To add a secondary index to an existing table, InnoDB scans the table and sorts the rows by the values of the secondary index key columns, using memory buffers and temporary files. The B-tree is then built in key-value order, which is more efficient than inserting rows into an index in random order. Because B-tree nodes are split when they fill, random-order insertion tends to leave pages partly empty; building the index in sorted order results in a higher fill factor, making the index more efficient for subsequent access.

14.4.2.4. Concurrency Considerations for Fast Index Creation

While an InnoDB secondary index is being created or dropped, the table is locked in shared mode. Any writes to the table are blocked, but the data in the table can be read. When you alter the clustered index of a table, the table is locked in exclusive mode, because the data must be copied. Thus, during the creation of a new clustered index, all operations on the table are blocked.

A CREATE INDEX or ALTER TABLE statement for an InnoDB table always waits for currently executing transactions that are accessing the table to commit or roll back. ALTER TABLE statements that redefine an InnoDB primary key wait for all SELECT statements that access the table to complete, or their containing transactions to commit. No transactions whose execution spans the creation of the index can be accessing the table, because the original table is dropped when the clustered index is restructured.

Once a CREATE INDEX or ALTER TABLE statement that creates an InnoDB secondary index begins executing, queries can access the table for read access, but cannot update the table. If an ALTER TABLE statement is changing the clustered index for an InnoDB table, all queries wait until the operation completes.
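The waiting behavior described above can be observed with two mysql client sessions. This is a sketch using T1 from the earlier examples; the index name idx_b_c is hypothetical. While session 2 is blocked, SHOW PROCESSLIST typically reports its state as "Waiting for table metadata lock".

```sql
-- Session 1: open a transaction that reads T1, and leave it open.
START TRANSACTION;
SELECT COUNT(*) FROM T1;

-- Session 2: this statement waits until session 1 commits or rolls back.
ALTER TABLE T1 ADD INDEX idx_b_c (B, C);

-- Session 1: ending the transaction lets session 2 start building the index.
COMMIT;

-- While the index is being built, other sessions can still read T1,
-- but statements that modify T1 wait until the ALTER TABLE finishes.
```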

A newly-created InnoDB secondary index contains only the committed data in the table at the time the CREATE INDEX or ALTER TABLE statement begins to execute. It does not contain any uncommitted values, old versions of values, or values marked for deletion but not yet removed from the old index.

Because a newly-created index contains only information about data current at the time the index was created, queries that need to see data that was deleted or changed before the index was created cannot use the index. The only queries that could be affected by this limitation are those executing in transactions that began before index creation started. For such queries, unpredictable results could occur. Newer queries can use the index.

14.4.2.5. How Crash Recovery Works with Fast Index Creation

Although no data is lost if the server crashes while an ALTER TABLE statement is executing, the crash recovery process is different for clustered indexes and secondary indexes.

If the server crashes while creating an InnoDB secondary index, upon recovery, MySQL drops any partially created indexes. You must re-run the ALTER TABLE or CREATE INDEX statement.

When a crash occurs during the creation of an InnoDB clustered index, recovery is more complicated, because the data in the table must be copied to an entirely new clustered index. Remember that all InnoDB tables are stored as clustered indexes. In the following discussion, we use the terms table and clustered index interchangeably.

MySQL creates the new clustered index by copying the existing data from the original InnoDB table to a temporary table that has the desired index structure. Once the data is completely copied to this temporary table, the original table is renamed with a different temporary table name. The temporary table comprising the new clustered index is renamed with the name of the original table, and the original table is dropped from the database.

If a system crash occurs while creating a new clustered index, no data is lost, but you must complete the recovery process using the temporary tables that exist during the process. Since it is rare to re-create a clustered index or re-define primary keys on large tables, or to encounter a system crash during this operation, this manual does not provide information on recovering from this scenario. Instead, please see the InnoDB web site: http://www.innodb.com/support/tips.

14.4.2.6. Limitations of Fast Index Creation

Take the following considerations into account when creating or dropping InnoDB indexes:

During index creation, files are written to the temporary directory ($TMPDIR on Unix, %TEMP% on Windows, or the value of the --tmpdir configuration variable). Each temporary file is large enough to hold one column that makes up the new index, and each one is removed as soon as it is merged into the final index.
When you create an index on a TEMPORARY TABLE, the table is copied rather than using Fast Index Creation. This has been reported as MySQL Bug #39833.
To avoid consistency issues between the InnoDB data dictionary and the MySQL data dictionary, the table is copied, rather than using Fast Index Creation, when you rename a column with ALTER TABLE ... CHANGE.
The statement ALTER IGNORE TABLE t ADD UNIQUE INDEX does not delete duplicate rows. This has been reported as MySQL Bug #40344. The IGNORE keyword is ignored. If any duplicate rows exist, the operation fails with the following error message:
```
ERROR 23000: Duplicate entry '347' for key 'pl'
```
As noted above, a newly-created index contains only information about data current at the time the index was created. Therefore, you should not run queries in a transaction that might use a secondary index that did not exist at the beginning of the transaction. There is no way for InnoDB to access "old" data that is consistent with the rest of the data read by the transaction. See the discussion of locking in Section 14.4.2.4, "Concurrency Considerations for Fast Index Creation".
Prior to InnoDB storage engine 1.0.4, unexpected results could occur if a query attempted to use an index created after the start of the transaction containing the query. If an old transaction attempts to access a "too new" index, InnoDB storage engine 1.0.4 and later reports an error:
```
ERROR HY000: Table definition has changed, please retry transaction
```
As the error message suggests, committing (or rolling back) the transaction, and restarting it, cures the problem.
InnoDB storage engine 1.0.2 introduces some improvements in error handling when users attempt to drop indexes. See Section 14.4.8.6, "Better Error Handling when Dropping Indexes" for details.
MySQL 5.5 does not support efficient creation or dropping of FOREIGN KEY constraints. Therefore, if you use ALTER TABLE to add or remove a REFERENCES constraint, the child table is copied, rather than using Fast Index Creation.
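Two practical checks that follow from the list above, shown as a sketch for the mysql client: finding the directory that receives the temporary sort files, and the commit-or-rollback-then-retry pattern for the "Table definition has changed" error.

```sql
-- Directory used for the temporary files written during index creation.
SHOW VARIABLES LIKE 'tmpdir';

-- After "Table definition has changed, please retry transaction":
-- end the old transaction and start a new one before re-running the query.
ROLLBACK;
START TRANSACTION;
-- re-run the failed query here; the new transaction can use the new index
```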

14.4.3. InnoDB Data Compression

14.4.3.1. Overview of Table Compression
14.4.3.2. Enabling Compression for a Table
14.4.3.3. Tuning InnoDB Compression
14.4.3.4. How Compression Works in InnoDB

By setting InnoDB configuration options, you can create tables where the data is stored in compressed form. Compression means less data is transferred between disk and memory, and the data takes up less space on disk and in memory. The benefits are amplified for tables with secondary indexes, because index data is compressed also. Compression can be especially important for SSD storage devices, because they tend to have lower capacity than HDD devices.

14.4.3.1. Overview of Table Compression

Because processors and cache memories have increased in speed more than disk storage devices, many workloads are I/O-bound. Data compression enables smaller database size, reduced I/O, and improved throughput, at the small cost of increased CPU utilization. Compression is especially valuable for read-intensive applications, on systems with enough RAM to keep frequently-used data in memory.

An InnoDB table created with ROW_FORMAT=COMPRESSED can use a smaller page size on disk than the usual 16KB default. Smaller pages require less I/O to read from and write to disk, which is especially valuable for SSD devices.

The page size is specified through the KEY_BLOCK_SIZE parameter. The different page size means the table must be in its own .ibd file rather than the system tablespace, which requires enabling the innodb_file_per_table option. The level of compression is the same regardless of the KEY_BLOCK_SIZE value. As you specify smaller values for KEY_BLOCK_SIZE, you get the I/O benefits of increasingly smaller pages. But if you specify a value that is too small, there is additional overhead to reorganize the pages when data values cannot be compressed enough to fit multiple rows in each page. There is a hard limit on how small KEY_BLOCK_SIZE can be for a table, based on the lengths of the key columns for each of its indexes. Specify a value that is too small, and the CREATE TABLE or ALTER TABLE statement fails.

In the buffer pool, the compressed data is held in small pages, with a page size based on the KEY_BLOCK_SIZE value. For extracting or updating the column values, InnoDB also creates a 16KB page in the buffer pool with the uncompressed data. Within the buffer pool, any updates to the uncompressed page are also re-written back to the equivalent compressed page. You might need to size your buffer pool to accommodate the additional data of both compressed and uncompressed pages, although the uncompressed pages are evicted from the buffer pool when space is needed, and then uncompressed again on the next access.
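To see how much of the buffer pool is held as compressed pages of each size, you can query the INNODB_CMPMEM table from Section 14.4.6. This sketch selects only a few of its columns; uncompressed 16KB copies of compressed pages are not counted here.

```sql
-- Compressed pages currently in the buffer pool, per compressed page size.
SELECT PAGE_SIZE, PAGES_USED, PAGES_FREE
FROM INFORMATION_SCHEMA.INNODB_CMPMEM;
```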

14.4.3.2. Enabling Compression for a Table

The default uncompressed size of InnoDB data pages is 16KB. You can use the attributes ROW_FORMAT=COMPRESSED, KEY_BLOCK_SIZE, or both in the CREATE TABLE and ALTER TABLE statements to enable table compression. Depending on the combination of option values, InnoDB uses a page size of 1KB, 2KB, 4KB, 8KB, or 16KB for the .ibd file of the table. (The actual compression algorithm is not affected by the KEY_BLOCK_SIZE value.)

Note

Compression is applicable to tables, not to individual rows, despite the option name ROW_FORMAT.

To create a compressed table, you might use a statement like this:

CREATE TABLE name (column1 INT PRIMARY KEY)
  ENGINE=InnoDB
  ROW_FORMAT=COMPRESSED
  KEY_BLOCK_SIZE=4;

If you specify ROW_FORMAT=COMPRESSED but not KEY_BLOCK_SIZE, the default compressed page size of 8KB is used. If KEY_BLOCK_SIZE is specified, you can omit the attribute ROW_FORMAT=COMPRESSED.

Setting KEY_BLOCK_SIZE=16 typically does not result in much compression, since the normal InnoDB page size is 16KB. This setting may still be useful for tables with many long BLOB, VARCHAR or TEXT columns, because such values often do compress well, and might therefore require fewer "overflow" pages as described in Section 14.4.3.4, "Compressing BLOB, VARCHAR, and TEXT Columns".

All indexes of a table (including the clustered index) are compressed using the same page size, as specified in the CREATE TABLE or ALTER TABLE statement. Table attributes such as ROW_FORMAT and KEY_BLOCK_SIZE are not part of the CREATE INDEX syntax, and are ignored if they are specified (although you see them in the output of the SHOW CREATE TABLE statement).
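An existing table can also be converted with ALTER TABLE. This sketch uses T1 from the fast index creation examples and assumes innodb_file_per_table and innodb_file_format=Barracuda are already enabled. Changing the row format rebuilds the table, so the data is copied.

```sql
-- Rebuild T1 and all of its indexes using 8KB compressed pages.
ALTER TABLE T1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

-- Later secondary indexes on T1 use the same 8KB compressed page size.
CREATE INDEX idx_b ON T1 (B);
```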

14.4.3.2.1. Configuration Parameters for Compression

Compressed tables are stored in a format that previous versions of InnoDB cannot process. To preserve downward compatibility of database files, compression can be specified only when the Barracuda data file format is enabled using the configuration parameter innodb_file_format.

Table compression is also not available for the InnoDB system tablespace. The system tablespace (space 0, the ibdata* files) may contain user data, but it also contains internal InnoDB system information, and therefore is never compressed. Thus, compression applies only to tables (and indexes) stored in their own tablespaces.

To use compression, enable the file-per-table mode using the configuration parameter innodb_file_per_table and enable the Barracuda disk file format using the parameter innodb_file_format. If necessary, you can set these parameters in the MySQL option file my.cnf or my.ini, or with the SET statement without shutting down the MySQL server.
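A minimal sketch of both ways to enable these settings. The option-file lines are shown as comments; the SET GLOBAL statements apply to the running server (SUPER privilege required) and affect tables created or rebuilt afterward, but are lost at restart unless also placed in the option file.

```sql
-- Option-file equivalent, in the [mysqld] section of my.cnf or my.ini:
--   innodb_file_per_table = 1
--   innodb_file_format    = Barracuda

-- Runtime equivalent, without restarting the server:
SET GLOBAL innodb_file_per_table = 1;
SET GLOBAL innodb_file_format = 'Barracuda';
```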

Specifying ROW_FORMAT=COMPRESSED or KEY_BLOCK_SIZE in CREATE TABLE or ALTER TABLE statements produces these warnings if the Barracuda file format is not enabled. You can view them with the SHOW WARNINGS statement.

Level	Code	Message
Warning	1478	InnoDB: KEY_BLOCK_SIZE requires innodb_file_per_table.
Warning	1478	InnoDB: KEY_BLOCK_SIZE requires innodb_file_format=1
Warning	1478	InnoDB: ignoring KEY_BLOCK_SIZE=4.
Warning	1478	InnoDB: ROW_FORMAT=COMPRESSED requires innodb_file_per_table.
Warning	1478	InnoDB: assuming ROW_FORMAT=COMPACT.

Note

These messages are only warnings, not errors, and the table is created as if the options were not specified. When InnoDB "strict mode" (see Section 14.4.8.4, "InnoDB Strict Mode") is enabled, InnoDB generates an error, not a warning, for these cases. In strict mode, the table is not created if the current configuration does not permit using compressed tables.
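The difference between the two modes can be seen by issuing the same statement twice. This sketch assumes a server where innodb_file_per_table or Barracuda is not enabled; the table names are hypothetical.

```sql
-- Non-strict: the table is created, the options are ignored, warnings are raised.
SET SESSION innodb_strict_mode = OFF;
CREATE TABLE t_lenient (id INT PRIMARY KEY) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;
SHOW WARNINGS;
SHOW TABLE STATUS LIKE 't_lenient';  -- Row_format is Compact in this case

-- Strict: the same statement fails and no table is created.
SET SESSION innodb_strict_mode = ON;
CREATE TABLE t_strict (id INT PRIMARY KEY) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;
SHOW ERRORS;
```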

The "non-strict" behavior is intended to permit you to import a mysqldump file into a database that does not support compressed tables, even if the source database contained compressed tables. In that case, MySQL creates the table in ROW_FORMAT=COMPACT instead of preventing the operation.

When you import the dump file into a new database, if you want to have the tables re-created as they exist in the original database, ensure the server is running the InnoDB storage engine with the proper settings for the configuration parameters innodb_file_format and innodb_file_per_table.

14.4.3.2.2. SQL Compression Syntax Warnings and Errors

The attribute KEY_BLOCK_SIZE is permitted only when ROW_FORMAT is specified as COMPRESSED or is omitted. Specifying a KEY_BLOCK_SIZE with any other ROW_FORMAT generates a warning that you can view with SHOW WARNINGS. However, the table is not compressed, and the specified KEY_BLOCK_SIZE is ignored.

Level	Code	Message
Warning	1478	InnoDB: ignoring KEY_BLOCK_SIZE=n unless ROW_FORMAT=COMPRESSED.

If you are running in InnoDB strict mode, the combination of a KEY_BLOCK_SIZE with any ROW_FORMAT other than COMPRESSED generates an error, not a warning, and the table is not created.

Table 14.4, "Meaning of CREATE TABLE and ALTER TABLE options" summarizes how the various options on CREATE TABLE and ALTER TABLE are handled.

Table 14.4. Meaning of CREATE TABLE and ALTER TABLE options

Option	Usage	Description
ROW_FORMAT=REDUNDANT	Storage format used prior to MySQL 5.0.3	Less efficient than ROW_FORMAT=COMPACT; for backward compatibility
ROW_FORMAT=COMPACT	Default storage format since MySQL 5.0.3	Stores a prefix of 768 bytes of long column values in the clustered index page, with the remaining bytes stored in an overflow page
ROW_FORMAT=DYNAMIC	Available only with innodb_file_format=Barracuda	Stores values within the clustered index page if they fit; if not, stores only a 20-byte pointer to an overflow page (no prefix)
ROW_FORMAT=COMPRESSED	Available only with innodb_file_format=Barracuda	Compresses the table and indexes using zlib to a default compressed page size of 8K bytes; implies ROW_FORMAT=DYNAMIC
KEY_BLOCK_SIZE=n	Available only with innodb_file_format=Barracuda	Specifies a compressed page size of 1, 2, 4, 8 or 16K bytes; implies ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPRESSED

Table 14.5, "CREATE/ALTER TABLE Warnings and Errors when InnoDB Strict Mode is OFF" summarizes error conditions that occur with certain combinations of configuration parameters and options on the CREATE TABLE or ALTER TABLE statements, and how the options appear in the output of SHOW TABLE STATUS.

When InnoDB strict mode is OFF, InnoDB creates or alters the table, but may ignore certain settings, as shown below. You can see the warning messages in the MySQL error log. When InnoDB strict mode is ON, these specified combinations of options generate errors, and the table is not created or altered. You can see the full description of the error condition with SHOW ERRORS. For example:

mysql> CREATE TABLE x (id INT PRIMARY KEY, c INT)
    -> ENGINE=INNODB KEY_BLOCK_SIZE=33333;
ERROR 1005 (HY000): Can't create table 'test.x' (errno: 1478)
mysql> SHOW ERRORS;
+-------+------+-------------------------------------------+
| Level | Code | Message                                   |
+-------+------+-------------------------------------------+
| Error | 1478 | InnoDB: invalid KEY_BLOCK_SIZE=33333.     |
| Error | 1005 | Can't create table 'test.x' (errno: 1478) |
+-------+------+-------------------------------------------+
2 rows in set (0.00 sec)

Table 14.5. CREATE/ALTER TABLE Warnings and Errors when InnoDB Strict Mode is OFF

Syntax	Warning or Error Condition	Resulting ROW_FORMAT, as shown in SHOW TABLE STATUS
ROW_FORMAT=REDUNDANT	None	REDUNDANT
ROW_FORMAT=COMPACT	None	COMPACT
ROW_FORMAT=COMPRESSED or ROW_FORMAT=DYNAMIC or KEY_BLOCK_SIZE is specified	Ignored unless both innodb_file_format=Barracuda and innodb_file_per_table are enabled	COMPACT
Invalid KEY_BLOCK_SIZE is specified (not 1, 2, 4, 8 or 16)	KEY_BLOCK_SIZE is ignored	the requested one, or COMPACT by default
ROW_FORMAT=COMPRESSED and valid KEY_BLOCK_SIZE are specified	None; KEY_BLOCK_SIZE specified is used, not the 8K default	COMPRESSED
KEY_BLOCK_SIZE is specified with REDUNDANT, COMPACT or DYNAMIC row format	KEY_BLOCK_SIZE is ignored	REDUNDANT, COMPACT or DYNAMIC
ROW_FORMAT is not one of REDUNDANT, COMPACT, DYNAMIC or COMPRESSED	Ignored if recognized by the MySQL parser. Otherwise, an error is issued.	COMPACT or N/A

When InnoDB strict mode is ON (innodb_strict_mode=1), the InnoDB storage engine rejects invalid ROW_FORMAT or KEY_BLOCK_SIZE parameters. For compatibility with earlier versions of InnoDB, strict mode is not enabled by default; instead, InnoDB issues warnings (not errors) for ignored invalid parameters.

Note that it is not possible to see the chosen KEY_BLOCK_SIZE using SHOW TABLE STATUS. The statement SHOW CREATE TABLE displays the KEY_BLOCK_SIZE (even if it was ignored by InnoDB). The real compressed page size inside InnoDB cannot be displayed by MySQL.
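A sketch of what you can inspect instead, using T1 as an example table. SHOW CREATE TABLE echoes the KEY_BLOCK_SIZE from the statement, while the ROW_FORMAT column of INFORMATION_SCHEMA.TABLES reports the resulting row format, as in Table 14.5; CREATE_OPTIONS typically repeats the options as they were written.

```sql
-- Shows KEY_BLOCK_SIZE as specified, even if InnoDB ignored it.
SHOW CREATE TABLE T1;

-- Resulting row format, plus the options recorded for the table.
SELECT TABLE_NAME, ROW_FORMAT, CREATE_OPTIONS
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'T1';
```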

14.4.3.3. Tuning InnoDB Compression

Most often, the internal optimizations in InnoDB described in Section 14.4.3.4, "InnoDB Data Storage and Compression" ensure that the system runs well with compressed data. However, because the efficiency of compression depends on the nature of your data, there are some factors you should consider to get the best performance. You need to choose which tables to compress, and what compressed page size to use. You might also adjust the size of the buffer pool based on run-time performance characteristics, such as the amount of time the system spends compressing and uncompressing data.
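The time spent compressing and uncompressing can be read from the INNODB_CMP table (Section 14.4.6). In this sketch, the ratio column is a derived value, not a column of the table: a low ratio of successful to attempted compressions suggests the chosen KEY_BLOCK_SIZE is too small for the data. The counters accumulate since server startup; INNODB_CMP_RESET returns the same columns and resets them.

```sql
SELECT PAGE_SIZE,
       COMPRESS_OPS,
       COMPRESS_OPS_OK,
       COMPRESS_OPS_OK / NULLIF(COMPRESS_OPS, 0) AS compress_success_ratio,
       COMPRESS_TIME,     -- seconds spent compressing
       UNCOMPRESS_OPS,
       UNCOMPRESS_TIME    -- seconds spent uncompressing
FROM INFORMATION_SCHEMA.INNODB_CMP;
```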

When to Use Compression

In general, compression works best on tables that include a reasonable number of character string columns and where the data is read far more often than it is written. Because there are no guaranteed ways to predict whether or not compression benefits a particular situation, always test with a specific workload and data set running on a representative configuration. Consider the following factors when deciding which tables to compress.

Data Characteristics and Compression

A key determinant of the efficiency of compression in reducing the size of data files is the nature of the data itself. Recall that compression works by identifying repeated strings of bytes in a block of data. Completely randomized data is the worst case. Typical data often has repeated values, and so compresses effectively. Character strings often compress well, whether defined in CHAR, VARCHAR, TEXT or BLOB columns. On the other hand, tables containing mostly binary data (integers or floating point numbers) or data that is already compressed (for example, JPEG or PNG images) generally compress poorly, if at all.

You choose whether to turn on compression for each InnoDB table. A table and all of its indexes use the same (compressed) page size. The primary key (clustered) index, which contains the data for all columns of a table, might compress more effectively than the secondary indexes. For those cases where there are long rows, the use of compression might result in long column values being stored "off-page", as discussed in Section 14.4.5.3, "Barracuda File Format: DYNAMIC and COMPRESSED Row Formats". Those overflow pages may compress well. Given these considerations, for many applications, some tables compress more effectively than others, and you might find that your workload performs best only with a subset of tables compressed.

Experimenting is the only way to determine whether or not to compress a particular table. InnoDB compresses data in 16K chunks corresponding to the uncompressed page size, and in addition to user data, the page format includes some internal system data that is not compressed. Compression utilities compress an entire stream of data, and so may find more repeated strings across the entire input stream than InnoDB would find in a table compressed in 16K chunks. Still, you can get a rough sense of compression efficiency by running a utility that implements LZ77 compression (such as gzip or WinZip) on your data file.
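A rough version of that experiment can be scripted with Python's standard zlib module, which wraps the zlib library that InnoDB also uses. The sketch below compares compressing a sample as one stream with compressing it in independent 16K chunks, which is closer to what InnoDB sees. It is only an estimate under stated assumptions: it ignores the uncompressed page header and modification log that InnoDB keeps on each page, and the function names and synthetic sample are illustrative.

```python
import zlib

PAGE_SIZE = 16 * 1024  # uncompressed InnoDB page size


def whole_stream_ratio(data: bytes, level: int = 6) -> float:
    """Compressed/original size when the input is compressed as one stream."""
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)


def per_page_ratio(data: bytes, level: int = 6) -> float:
    """Compressed/original size when each 16K chunk is compressed on its own,
    so repeats that span chunk boundaries cannot be exploited."""
    if not data:
        return 1.0
    total = sum(
        len(zlib.compress(data[i:i + PAGE_SIZE], level))
        for i in range(0, len(data), PAGE_SIZE)
    )
    return total / len(data)


if __name__ == "__main__":
    # Synthetic stand-in for a table dump; substitute your own data file.
    sample = b"".join(b"order-%06d,status=SHIPPED;" % (i % 997) for i in range(50000))
    print(f"whole stream: {whole_stream_ratio(sample):.3f}")
    print(f"16K chunks:   {per_page_ratio(sample):.3f}")
```

If the per-chunk ratio is noticeably worse than the whole-stream ratio, much of the redundancy in the data spans page boundaries, and a gzip-based estimate will overstate what InnoDB can achieve.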

Another way to test compression on a specific table is to copy some data from your uncompressed table to a similar, compressed table (having all the same indexes) and look at the size of the resulting file. When you do so (if nothing else using compression is running), you can examine the ratio of successful compression operations to overall compression operations. (In the INNODB_CMP table, compare COMPRESS_OPS to COMPRESS_OPS_OK. See INNODB_CMP for more information.) If a high percentage of compression operations complete successfully, the table might be a good candidate for compression.

Compression and Application and Schema Design

Decide whether to compress data in your application or in the InnoDB table. It is usually not sensible to store data that is compressed by an application in an InnoDB compressed table. Further compression is extremely unlikely, and the attempt to compress just wastes CPU cycles.

Compressing in the Database

The InnoDB table compression is automatic and applies to all columns and index values. The columns can still be tested with operators such as LIKE, and sort operations can still use indexes even when the index values are compressed. Because indexes are often a significant fraction of the total size of a database, compression could result in significant savings in storage, I/O or processor time. The compression and decompression operations happen on the database server, which likely is a powerful system that is sized to handle the expected load.

Compressing in the Application

If you compress data such as text in your application before it is inserted into the database, you might save overhead for data that does not compress well by compressing some columns and not others. This approach uses CPU cycles for compression and uncompression on the client machine rather than the database server, which might be appropriate for a distributed application with many clients, or where the client machine has spare CPU cycles.
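A minimal Python sketch of this selective approach, assuming a hypothetical storage scheme in which the application records, per column, whether the stored value was compressed. Values whose zlib output is not meaningfully smaller than the input are stored unchanged, so no CPU is spent uncompressing them later. The function names and the 10% threshold are illustrative choices, not part of MySQL.

```python
import zlib
from typing import Dict, Tuple

Packed = Dict[str, Tuple[bool, bytes]]  # column -> (was_compressed, stored bytes)


def pack_row(row: Dict[str, bytes], min_saving: float = 0.10) -> Packed:
    """Compress only the columns that shrink by at least min_saving."""
    packed: Packed = {}
    for name, value in row.items():
        candidate = zlib.compress(value)
        if len(candidate) <= len(value) * (1 - min_saving):
            packed[name] = (True, candidate)
        else:
            packed[name] = (False, value)  # e.g. JPEG bytes: store unchanged
    return packed


def unpack_row(packed: Packed) -> Dict[str, bytes]:
    """Reverse pack_row on the client after reading the row back."""
    return {
        name: zlib.decompress(data) if was_compressed else data
        for name, (was_compressed, data) in packed.items()
    }
```

Values stored this way belong in uncompressed InnoDB tables, since asking InnoDB to compress them again would mostly waste CPU cycles.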

Hybrid Approach

Of course, it is possible to combine these approaches. For some applications, it may be appropriate to use some compressed tables and some uncompressed tables. It may be best to externally compress some data (and store it in uncompressed InnoDB tables) and allow InnoDB to compress (some of) the other tables in the application. As always, up-front design and real-life testing are valuable in reaching the right decision.

Workload Characteristics and Compression

In addition to choosing which tables to compress (and the page size), the workload is another key determinant of performance. If the application is dominated by reads, rather than updates, fewer pages need to be reorganized and recompressed after the index page runs out of room for the per-page "modification log" that InnoDB maintains for compressed data. If the updates predominantly change non-indexed columns or those containing BLOBs or large strings that happen to be stored "off-page", the overhead of compression may be acceptable. If the only changes to a table are INSERTs that use a monotonically increasing primary key, and there are few secondary indexes, there is little need to reorganize and recompress index pages. Since InnoDB can "delete-mark" and delete rows on compressed pages "in place" by modifying uncompressed data, DELETE operations on a table are relatively efficient.

For some environments, the time it takes to load data can be as important as run-time retrieval. Especially in data warehouse environments, many tables may be read-only or read-mostly. In those cases, the increased load time caused by compression might be acceptable only if the resulting savings in disk reads or storage cost are significant.

Fundamentally, compression works best when CPU time is available for compressing and uncompressing data. Thus, if your workload is I/O-bound rather than CPU-bound, you might find that compression can improve overall performance. When you test your application performance with different compression configurations, test on a platform similar to the planned configuration of the production system.

Configuration Characteristics and Compression

Reading and writing database pages from and to disk is the slowest aspect of system performance. Compression attempts to reduce I/O by using CPU time to compress and uncompress data, and is most effective when I/O is a relatively scarce resource compared to processor cycles.

This is often especially the case when running in a multi-user environment with fast, multi-core CPUs. When a page of a compressed table is in memory, InnoDB often uses an additional 16K in the buffer pool for an uncompressed copy of the page. The adaptive LRU algorithm in the InnoDB storage engine attempts to balance the use of memory between compressed and uncompressed pages to take into account whether the workload is running in an I/O-bound or CPU-bound manner. Still, a configuration with more memory dedicated to the InnoDB buffer pool tends to run better when using compressed tables than a configuration where memory is highly constrained.

Choosing the Compressed Page Size

The optimal setting of the compressed page size depends on the type and distribution of data that the table and its indexes contain. The compressed page size should always be bigger than the maximum record size, or operations may fail as noted in Section 14.4.3.4, "Compression of B-Tree Pages".

Setting the compressed page size too large wastes some space, but the pages do not have to be compressed as often. If the compressed page size is set too small, inserts or updates may require time-consuming recompression, and the B-tree nodes may have to be split more frequently, leading to bigger data files and less efficient indexing.

Typically, you set the compressed page size to 8K or 4K bytes. Given that the maximum InnoDB record size is around 8K, KEY_BLOCK_SIZE=8 is usually a safe choice.
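The rule that the compressed page must be bigger than the largest record can be written as a small helper. This is a hypothetical heuristic, not something InnoDB provides: it returns the smallest valid KEY_BLOCK_SIZE (1, 2, 4, 8 or 16) whose page exceeds a given maximum record size. It does not model page headers or the modification log, so treat the result as a lower bound and confirm any choice by testing.

```python
VALID_KEY_BLOCK_SIZES_KB = (1, 2, 4, 8, 16)


def smallest_key_block_size(max_record_bytes: int) -> int:
    """Return the smallest KEY_BLOCK_SIZE (in KB) whose page is larger than the
    largest record. Lower bound only: per-page overhead is not modelled."""
    for kb in VALID_KEY_BLOCK_SIZES_KB:
        if kb * 1024 > max_record_bytes:
            return kb
    raise ValueError("record is too large for any compressed page size")


print(smallest_key_block_size(3000))  # 4
```

Picking a value above this bound trades some wasted space for fewer recompressions and page splits, which is the balance described above.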

Monitoring Compression at Runtime

Overall application performance, CPU and I/O utilization and the size of disk files are good indicators of how effective compression is for your application.

To dig deeper into performance considerations for compressed tables, you can monitor compression performance at run time using the Information Schema tables described in Example 14.1, "Using the Compression Information Schema Tables". These tables reflect the internal use of memory and the rates of compression used overall.

The INNODB_CMP tables report information about compression activity for each compressed page size (KEY_BLOCK_SIZE) in use. The information in these tables is system-wide, and includes summary data across all compressed tables in your database. You can use this data to help decide whether or not to compress a table by examining these tables when no other compressed tables are being accessed.

The key statistics to consider are the number of, and amount of time spent performing, compression and uncompression operations. Since InnoDB must split B-tree nodes when they are too full to contain the compressed data following a modification, compare the number of "successful" compression operations with the number of such operations overall. Based on the information in the INNODB_CMP tables and overall application performance and hardware resource utilization, you might make changes in your hardware configuration, adjust the size of the InnoDB buffer pool, choose a different page size, or select a different set of tables to compress.

If the amount of CPU time required for compressing and uncompressing is high, changing to faster CPUs, or those with more cores, can help improve performance with the same data, application workload and set of compressed tables. Increasing the size of the InnoDB buffer pool might also help performance, so that more uncompressed pages can stay in memory, reducing the need to uncompress pages that exist in memory only in compressed form.

A large number of compression operations overall (compared to the number of INSERT, UPDATE and DELETE operations in your application and the size of the database) could indicate that some of your compressed tables are being updated too heavily for effective compression. If so, choose a larger page size, or be more selective about which tables you compress.

If the number of "successful" compression operations (COMPRESS_OPS_OK) is a high percentage of the total number of compression operations (COMPRESS_OPS), then the system is likely performing well. If the ratio is low, then InnoDB is reorganizing, recompressing, and splitting B-tree nodes more often than is desirable. In this case, avoid compressing some tables, or increase KEY_BLOCK_SIZE for some of the compressed tables. You might turn off compression for tables that cause the number of "compression failures" in your application to be more than 1% or 2% of the total. (Such a failure ratio might be acceptable during a temporary operation such as a data load.)
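This check is easy to script. The Python sketch below assumes you have already fetched rows with a query such as SELECT page_size, compress_ops, compress_ops_ok FROM INFORMATION_SCHEMA.INNODB_CMP; the CmpRow class and the 2% default threshold (the upper end of the guideline above) are illustrative choices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CmpRow:
    page_size: int        # INNODB_CMP.PAGE_SIZE, in bytes
    compress_ops: int     # INNODB_CMP.COMPRESS_OPS
    compress_ops_ok: int  # INNODB_CMP.COMPRESS_OPS_OK


def failure_ratio(row: CmpRow) -> float:
    """Fraction of compression operations that did not succeed."""
    if row.compress_ops == 0:
        return 0.0
    return (row.compress_ops - row.compress_ops_ok) / row.compress_ops


def page_sizes_to_review(rows: List[CmpRow], threshold: float = 0.02) -> List[int]:
    """Compressed page sizes whose failure ratio exceeds the threshold."""
    return [r.page_size for r in rows if failure_ratio(r) > threshold]


sample = [CmpRow(8192, 10000, 9950), CmpRow(4096, 5000, 4500)]
print(page_sizes_to_review(sample))  # [4096]
```

Because INNODB_CMP aggregates all tables that share a page size, the result points at a KEY_BLOCK_SIZE rather than a specific table, which is why the measurement is most informative when no other compressed tables are being accessed.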

14.4.3.4. How Compression Works in InnoDB

This section describes some internal implementation details about compression in InnoDB. The information presented here may be helpful in tuning for performance, but is not necessary to know for basic use of compression.

Compression Algorithms

Some operating systems implement compression at the file system level. Files are typically divided into fixed-size blocks that are compressed into variable-size blocks, which easily leads to fragmentation. Every time something inside a block is modified, the whole block is recompressed before it is written to disk. These properties make this compression technique unsuitable for use in an update-intensive database system.

InnoDB implements compression with the help of the well-known zlib library, which implements the LZ77 compression algorithm. This compression algorithm is mature, robust, and efficient in both CPU utilization and in reduction of data size. The algorithm is "lossless", so that the original uncompressed data can always be reconstructed from the compressed form. LZ77 compression works by finding sequences of data that are repeated within the data to be compressed. The patterns of values in your data determine how well it compresses, but typical user data often compresses by 50% or more.
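Both properties, losslessness and dependence on repeated byte sequences, can be observed with Python's zlib module, which wraps the same library. This illustrates the algorithm only; it says nothing about InnoDB's page layout, and the sample data is made up.

```python
import os
import zlib


def compressed_fraction(data: bytes) -> float:
    """Compress with zlib, verify the round trip, and return new/old size."""
    if not data:
        return 1.0
    packed = zlib.compress(data)
    if zlib.decompress(packed) != data:  # lossless: this must never happen
        raise RuntimeError("round trip changed the data")
    return len(packed) / len(data)


repetitive = b"status=ACTIVE;region=EMEA;" * 600
random_bytes = os.urandom(len(repetitive))

print(f"repetitive: {compressed_fraction(repetitive):.3f}")  # far below 1
print(f"random:     {compressed_fraction(random_bytes):.3f}")  # about 1 or slightly above
```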

Unlike compression performed by an application, or compression features of some other database management systems, InnoDB compression applies both to user data and to indexes. In many cases, indexes can constitute 40-50% or more of the total database size, so this difference is significant. When compression is working well for a data set, the size of the InnoDB data files (the .ibd files) is 25% to 50% of the uncompressed size or possibly smaller. Depending on the workload, this smaller database can in turn lead to a reduction in I/O, and an increase in throughput, at a modest cost in terms of increased CPU utilization.

InnoDB Data Storage and Compression

All user data in InnoDB is stored in pages comprising a B-tree index (the clustered index). In some other database systems, this type of index is called an "index-organized table". Each row in the index node contains the values of the (user-specified or system-generated) primary key and all the other columns of the table.

Secondary indexes in InnoDB are also B-trees, containing pairs of values: the index key and a pointer to a row in the clustered index. The pointer is in fact the value of the primary key of the table, which is used to access the clustered index if columns other than the index key and primary key are required. Secondary index records must always fit on a single B-tree page.

The compression of B-tree nodes (of both clustered and secondary indexes) is handled differently from compression of overflow pages used to store long VARCHAR, BLOB, or TEXT columns, as explained in the following sections.

Compression of B-Tree Pages

Because they are frequently updated, B-tree pages require special treatment. It is important to minimize the number of times B-tree nodes are split, as well as to minimize the need to uncompress and recompress their content.

One technique InnoDB uses is to maintain some system information in the B-tree node in uncompressed form, thus facilitating certain in-place updates. For example, this allows rows to be delete-marked and deleted without any compression operation.

In addition, InnoDB attempts to avoid unnecessary uncompression and recompression of index pages when they are changed. Within each B-tree page, the system keeps an uncompressed "modification log" to record changes made to the page. Updates and inserts of small records may be written to this modification log without requiring the entire page to be completely reconstructed.

When the space for the modification log runs out, InnoDB uncompresses the page, applies the changes and recompresses the page. If recompression fails, the B-tree nodes are split and the process is repeated until the update or insert succeeds.
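The control flow described in the last two paragraphs can be illustrated with a Python toy model. It is a simplified sketch, not InnoDB's implementation: the byte budgets are hypothetical parameters, records are opaque byte strings, and a "page" is just a list. Small inserts go to the uncompressed log; when the log is full, the page is rebuilt and recompressed, and if the result does not fit, the records are split across pages until every piece fits.

```python
import zlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyCompressedPage:
    page_budget: int  # hypothetical: bytes available for compressed records
    log_budget: int   # hypothetical: bytes for the uncompressed modification log
    records: List[bytes] = field(default_factory=list)
    log: List[bytes] = field(default_factory=list)
    recompressions: int = 0

    def _fits(self, records: List[bytes]) -> bool:
        return len(zlib.compress(b"".join(records))) <= self.page_budget

    def _split(self, records: List[bytes]) -> List[List[bytes]]:
        """Halve the record list until every group compresses within budget."""
        if self._fits(records):
            return [records]
        if len(records) == 1:
            raise ValueError("Row size too large for this compressed page")
        mid = len(records) // 2
        return self._split(records[:mid]) + self._split(records[mid:])

    def insert(self, record: bytes) -> List["ToyCompressedPage"]:
        """Insert a record; return any new sibling pages created by splits."""
        if sum(map(len, self.log)) + len(record) <= self.log_budget:
            self.log.append(record)  # cheap path: no recompression
            return []
        # Log is full: "uncompress", apply the logged changes, recompress.
        self.recompressions += 1
        groups = self._split(self.records + self.log + [record])
        self.records, self.log = groups[0], []
        return [ToyCompressedPage(self.page_budget, self.log_budget, g) for g in groups[1:]]
```

The ValueError branch corresponds to the case discussed next, where a single record cannot fit on a compressed page at all.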

Generally, InnoDB requires that each B-tree page can accommodate at least two records. For compressed tables, this requirement has been relaxed. Leaf pages of B-tree nodes (whether of the primary key or secondary indexes) only need to accommodate one record, but that record must fit in uncompressed form, in the per-page modification log. Starting with InnoDB storage engine version 1.0.2, and if InnoDB strict mode is ON, the InnoDB storage engine checks the maximum row size during CREATE TABLE or CREATE INDEX. If the row does not fit, the following error message is issued: ERROR HY000: Too big row.

If you create a table when InnoDB strict mode is OFF, and a subsequent INSERT or UPDATE statement attempts to create an index entry that does not fit in the size of the compressed page, the operation fails with ERROR 42000: Row size too large. (This error message does not name the index for which the record is too large, or mention the length of the index record or the maximum record size on that particular index page.) To solve this problem, rebuild the table with ALTER TABLE and select a larger compressed page size (KEY_BLOCK_SIZE), shorten any column prefix indexes, or disable compression entirely with ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPACT.

Compressing BLOB, VARCHAR, and TEXT Columns

In a clustered index, BLOB, VARCHAR, and TEXT columns that are not part of the primary key may be stored on separately allocated ("overflow") pages. We call these off-page columns; their values are stored on singly-linked lists of overflow pages.

For tables created in ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED, the values of BLOB, TEXT, or VARCHAR columns may be stored fully off-page, depending on their length and the length of the entire row. For columns that are stored off-page, the clustered index record only contains 20-byte pointers to the overflow pages, one per column. Whether any columns are stored off-page depends on the page size and the total size of the row. When the row is too long to fit entirely within the page of the clustered index, InnoDB chooses the longest columns for off-page storage until the row fits on the clustered index page. As noted above, if a row does not fit by itself on a compressed page, an error occurs.

Tables created in previous versions of InnoDB use the Antelope file format, which supports only ROW_FORMAT=REDUNDANT and ROW_FORMAT=COMPACT. In these formats, InnoDB stores the first 768 bytes of BLOB, VARCHAR, and TEXT columns in the clustered index record along with the primary key. The 768-byte prefix is followed by a 20-byte pointer to the overflow pages that contain the rest of the column value.
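The two storage rules can be compared in a short Python sketch. The 20-byte pointer and the 768-byte prefix come from the text above; everything else (the function names, and treating a row as a fixed part plus a set of variable-length column lengths) is a simplifying assumption. In particular, choose_off_page models only the "longest columns first" rule for DYNAMIC and COMPRESSED tables, not InnoDB's exact size thresholds.

```python
from typing import Dict, Set

POINTER_BYTES = 20     # pointer to overflow pages, one per off-page column
ANTELOPE_PREFIX = 768  # prefix kept in-record by REDUNDANT and COMPACT


def in_record_bytes(length: int, off_page: bool, barracuda: bool) -> int:
    """Bytes a column occupies in the clustered index record."""
    if not off_page:
        return length
    if barracuda:  # DYNAMIC / COMPRESSED: pointer only
        return POINTER_BYTES
    return min(length, ANTELOPE_PREFIX) + POINTER_BYTES  # Antelope: prefix + pointer


def choose_off_page(columns: Dict[str, int], fixed_bytes: int, max_in_page: int) -> Set[str]:
    """DYNAMIC/COMPRESSED: move the longest columns off-page until the row fits."""
    sizes = dict(columns)
    moved: Set[str] = set()
    for name in sorted(columns, key=columns.get, reverse=True):
        if fixed_bytes + sum(sizes.values()) <= max_in_page:
            break
        if columns[name] <= POINTER_BYTES:
            break  # moving it would not shrink the record
        sizes[name] = POINTER_BYTES
        moved.add(name)
    if fixed_bytes + sum(sizes.values()) > max_in_page:
        raise ValueError("row does not fit on the page even with columns off-page")
    return moved
```

For example, with columns of 6000, 3000 and 50 bytes plus 100 fixed bytes, an 8000-byte limit moves only the 6000-byte column off-page, while a 1000-byte limit moves the two longest.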

When a table is in COMPRESSED format, all data written to overflow pages is compressed "as is"; that is, InnoDB applies the zlib compression algorithm to the entire data item. Other than the data, compressed overflow pages contain an uncompressed header and trailer comprising a page checksum and a link to the next overflow page, among other things. Therefore, very significant storage savings can be obtained for longer BLOB, TEXT, or VARCHAR columns if the data is highly compressible, as is often the case with text data (but not previously compressed images).

The overflow pages are of the same size as other pages. A row containing ten columns stored off-page occupies ten overflow pages, even if the total length of the columns is only 8K bytes. In an uncompressed table, ten uncompressed overflow pages occupy 160K bytes. In a compressed table with an 8K page size, they occupy only 80K bytes. Thus, it is often more efficient to use compressed table format for tables with long column values.
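The arithmetic behind that example, as a Python sketch. It assumes, as the example does, that each off-page column occupies its own whole overflow pages, and it ignores page headers and trailers; for a compressed table, the lengths passed in would be the column sizes after compression. The function name is illustrative.

```python
import math
from typing import Iterable


def overflow_kb(column_lengths: Iterable[int], page_kb: int) -> int:
    """Space used by overflow pages: each off-page column needs at least one
    whole page, and longer values need ceil(length / page size) pages."""
    page_bytes = page_kb * 1024
    pages = sum(max(1, math.ceil(n / page_bytes)) for n in column_lengths)
    return pages * page_kb


ten_columns = [800] * 10             # about 8K of column data in total
print(overflow_kb(ten_columns, 16))  # 160 (uncompressed table, 16K pages)
print(overflow_kb(ten_columns, 8))   # 80  (compressed table, KEY_BLOCK_SIZE=8)
```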

Using a 16K compressed page size can reduce storage and I/O costs for BLOB, VARCHAR, or TEXT columns, because such data often compress well, and might therefore require fewer "overflow" pages, even though the B-tree nodes themselves take as many pages as in the uncompressed form.

Compression and the InnoDB Buffer Pool

In a compressed InnoDB table, every compressed page (whether 1K, 2K, 4K or 8K) corresponds to an uncompressed page of 16K bytes. To access the data in a page, InnoDB reads the compressed page from disk if it is not already in the buffer pool, then uncompresses the page to its original 16K byte form. This section describes how InnoDB manages the buffer pool with respect to pages of compressed tables.

To minimize I/O and to reduce the need to uncompress a page, at times the buffer pool contains both the compressed and uncompressed form of a database page. To make room for other required database pages, InnoDB may "evict" from the buffer pool an uncompressed page, while leaving the compressed page in memory. Or, if a page has not been accessed in a while, the compressed form of the page may be written to disk, to free space for other data. Thus, at any given time, the buffer pool may contain both the compressed and uncompressed forms of the page, or only the compressed form of the page, or neither.

InnoDB keeps track of which pages to keep in memory and which to evict using a least-recently-used (LRU) list, so that "hot" or frequently accessed data tends to stay in memory. When compressed tables are accessed, InnoDB uses an adaptive LRU algorithm to achieve an appropriate balance of compressed and uncompressed pages in memory. This adaptive algorithm is sensitive to whether the system is running in an I/O-bound or CPU-bound manner. The goal is to avoid spending too much processing time uncompressing pages when the CPU is busy, and to avoid doing excess I/O when the CPU has spare cycles that can be used for uncompressing compressed pages (that may already be in memory). When the system is I/O-bound, the algorithm prefers to evict the uncompressed copy of a page rather than both copies, to make more room for other disk pages to become memory resident. When the system is CPU-bound, InnoDB prefers to evict both the compressed and uncompressed page, so that more memory can be used for "hot" pages and to reduce the need to uncompress data that is in memory only in compressed form.
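The eviction preference just described reduces to a small decision rule. The Python sketch below is a hypothetical simplification: how InnoDB actually classifies the workload as I/O-bound or CPU-bound is not modelled, so the classification is simply passed in, and the enum names are illustrative.

```python
from enum import Enum


class Workload(Enum):
    IO_BOUND = "io-bound"
    CPU_BOUND = "cpu-bound"


class Eviction(Enum):
    UNCOMPRESSED_ONLY = "drop the uncompressed copy, keep the compressed page"
    BOTH = "drop both the compressed and uncompressed copies"
    COMPRESSED = "drop the compressed page (no uncompressed copy exists)"


def choose_eviction(workload: Workload, has_uncompressed_copy: bool) -> Eviction:
    """Which copy of a cold compressed-table page to evict."""
    if not has_uncompressed_copy:
        return Eviction.COMPRESSED
    if workload is Workload.IO_BOUND:
        # Keep the compressed page: uncompressing it later avoids a disk read.
        return Eviction.UNCOMPRESSED_ONLY
    # CPU-bound: free the most memory for hot pages held in uncompressed form.
    return Eviction.BOTH
```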

Compression and the InnoDB Log Files

Before a compressed page is written to a database file, InnoDB writes a copy of the page to the redo log (if it has been recompressed since the last time it was written to the database). This is done to ensure that redo logs will always be usable, even if a future version of InnoDB uses a slightly different compression algorithm. Therefore, some increase in the size of log files, or a need for more frequent checkpoints, can be expected when using compression. The amount of increase in the log file size or checkpoint frequency depends on the number of times compressed pages are modified in a way that requires reorganization and recompression.

Note that the redo log file format (and the database file format) are different from previous releases when using compression. The MySQL Enterprise Backup product does support this latest Barracuda file format for compressed InnoDB tables. The older InnoDB Hot Backup product can only back up tables using the file format Antelope, and thus does not support InnoDB tables that use compression.

14.4.4. InnoDB File-Format Management

14.4.4.1. Enabling File Formats
14.4.4.2. Verifying File Format Compatibility
14.4.4.3. Identifying the File Format in Use
14.4.4.4. Downgrading the File Format
14.4.4.5. Future InnoDB File Formats

As InnoDB evolves, new on-disk data structures are sometimes required to support new features. Features such as compressed tables (see Section 14.4.3, "InnoDB Data Compression"), and long variable-length columns stored off-page (see Section 14.4.5, "How InnoDB Stores Variable-Length Columns") require data file formats that are not compatible with prior versions of InnoDB. These features both require use of the new Barracuda file format.

Note

All other new features are compatible with the original Antelope file format and do not require the Barracuda file format.

This section discusses how to specify the file format for new InnoDB tables and the compatibility of different file formats between MySQL releases.

Named File Formats. InnoDB 1.1 has the idea of a named file format and a configuration parameter to enable the use of features that require use of that format. The new file format is the Barracuda format, and the original InnoDB file format is called Antelope. Compressed tables and the new row format that stores long columns "off-page" require the use of the Barracuda file format or newer. Future versions of InnoDB may introduce a series of file formats, identified with the names of animals, in ascending alphabetic order.

14.4.4.1. Enabling File Formats

The configuration parameter innodb_file_format controls whether such statements as CREATE TABLE and ALTER TABLE can be used to create tables that depend on support for the Barracuda file format.

Although Oracle recommends using the Barracuda format for new tables where practical, in MySQL 5.5 the default file format is still Antelope, for maximum compatibility with replication configurations containing different MySQL releases.

The innodb_file_format parameter is dynamic and global: it can be specified in the MySQL option file (my.cnf or my.ini) or changed with the SET GLOBAL statement.

14.4.4.2. Verifying File Format Compatibility

InnoDB 1.1 incorporates several checks to guard against the possible crashes and data corruptions that might occur if you run an older release of the MySQL server on InnoDB data files using a newer file format. These checks take place when the server is started, and when you first access a table. This section describes these checks, how you can control them, and error and warning conditions that might arise.

Backward Compatibility

Considerations of backward compatibility only apply when using a recent version of InnoDB (the InnoDB Plugin, or MySQL 5.5 and higher with InnoDB 1.1) alongside an older one (MySQL 5.1 or earlier, with the built-in InnoDB rather than the InnoDB Plugin). To minimize the chance of compatibility issues, you can standardize on the InnoDB Plugin for all your MySQL 5.1 and earlier database servers.

In general, a newer version of InnoDB may create a table or index that cannot safely be read or written with a prior version of InnoDB without risk of crashes, hangs, wrong results or corruptions. InnoDB 1.1 includes a mechanism to guard against these conditions, and to help preserve compatibility among database files and versions of InnoDB. This mechanism lets you take advantage of some new features of an InnoDB release (such as performance improvements and bug fixes), and still preserve the option of using your database with a prior version of InnoDB, by preventing accidental use of new features that create downward-incompatible disk files.

If a version of InnoDB supports a particular file format (whether or not that format is the default), you can query and update any table that requires that format or an earlier format. Only the creation of new tables using new features is limited based on the particular file format enabled. Conversely, if a tablespace contains a table or index that uses a file format that is not supported by the currently running software, it cannot be accessed at all, even for read access.

The only way to "downgrade" an InnoDB tablespace to an earlier file format is to copy the data to a new table, in a tablespace that uses the earlier format. This can be done with the ALTER TABLE statement, as described in Section 14.4.4.4, "Downgrading the File Format".

The easiest way to determine the file format of an existing InnoDB tablespace is to examine the properties of the table it contains, using the SHOW TABLE STATUS command or querying the table INFORMATION_SCHEMA.TABLES. If the Row_format of the table is reported as 'Compressed' or 'Dynamic', the tablespace containing the table uses the Barracuda format. Otherwise, it uses the prior InnoDB file format, Antelope.
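The access rules and the Row_format test above can be combined in a Python sketch. It assumes the formats are ordered as described under Named File Formats, and includes Cheetah only as the hypothetical future format used in Table 14.6; the function names are illustrative.

```python
# Ascending order; "Cheetah" is the hypothetical future format from Table 14.6.
FORMAT_RANK = {"Antelope": 0, "Barracuda": 1, "Cheetah": 2}


def file_format_of(row_format: str) -> str:
    """Map the Row_format reported by SHOW TABLE STATUS to its file format."""
    if row_format.lower() in ("compressed", "dynamic"):
        return "Barracuda"
    return "Antelope"  # Redundant or Compact


def can_access(table_format: str, supported: str) -> bool:
    """A table in a supported (or older) format can be read and written."""
    return FORMAT_RANK[table_format] <= FORMAT_RANK[supported]


def can_create(row_format: str, enabled: str) -> bool:
    """New tables are limited by the file format enabled with innodb_file_format."""
    return FORMAT_RANK[file_format_of(row_format)] <= FORMAT_RANK[enabled]
```

Note that can_access does not consult the enabled format: as described above, the enabled format restricts only the creation of new tables.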

Internal Details

Every InnoDB per-table tablespace (represented by a *.ibd file) is labeled with a file format identifier. The system tablespace (represented by the ibdata files) is tagged with the "highest" file format in use in a group of InnoDB database files, and this tag is checked when the files are opened.

Creating a compressed table, or a table with ROW_FORMAT=DYNAMIC, updates the file header for the corresponding .ibd file and the table type in the InnoDB data dictionary with the identifier for the Barracuda file format. From that point forward, the table cannot be used with a version of InnoDB that does not support this new file format. To protect against anomalous behavior, InnoDB version 5.0.21 and later performs a compatibility check when the table is opened. (In many cases, the ALTER TABLE statement recreates a table and thus changes its properties. The special case of adding or dropping indexes without rebuilding the table is described in Section 14.4.2, "Fast Index Creation in the InnoDB Storage Engine".)

Definition of ib-file set

To avoid confusion, for the purposes of this discussion we define the term "ib-file set" to mean the set of operating system files that InnoDB manages as a unit. The ib-file set includes the following files:

The system tablespace (one or more ibdata files), which contains internal system information (including internal catalogs and undo information) and may include user data and indexes.
Zero or more single-table tablespaces (also called "file per table" files, named *.ibd files).
InnoDB log files; usually two, ib_logfile0 and ib_logfile1. Used for crash recovery and in backups.

An "ib-file set" does not include the corresponding .frm files that contain metadata about InnoDB tables. The .frm files are created and managed by MySQL, and can sometimes get out of sync with the internal metadata in InnoDB.

Multiple tables, even from more than one database, can be stored in a single "ib-file set". (In MySQL, a "database" is a logical collection of tables, what other systems refer to as a "schema" or "catalog".)

14.4.4.2.1. Compatibility Check When InnoDB Is Started

To prevent possible crashes or data corruptions when InnoDB opens an ib-file set, it checks that it can fully support the file formats in use within the ib-file set. If the system is restarted following a crash, or a "fast shutdown" (i.e., innodb_fast_shutdown is greater than zero), there may be on-disk data structures (such as redo or undo entries, or doublewrite pages) that are in a "too-new" format for the current software. During the recovery process, serious damage can be done to your data files if these data structures are accessed. The startup check of the file format occurs before any recovery process begins, thereby preventing consistency issues with the new tables or startup problems for the MySQL server.

Beginning with InnoDB 1.0.1, the system tablespace records an identifier or tag for the "highest" file format used by any table in any of the tablespaces that are part of the ib-file set. Checks against this file format tag are controlled by the configuration parameter innodb_file_format_check, which is ON by default.

If the file format tag in the system tablespace is newer or higher than the highest version supported by the particular currently executing software and if innodb_file_format_check is ON, the following error is issued when the server is started:

InnoDB: Error: the system tablespace is in a file format that this version doesn't support

You can also set innodb_file_format_check to a file format name. Doing so prevents InnoDB from starting if the current software does not support the file format specified. It also sets the "high water mark" to the value you specify. The ability to set innodb_file_format_check will be useful (with future releases of InnoDB) if you manually "downgrade" all of the tables in an ib-file set (as described in Section 14.4.11, "Downgrading the InnoDB Storage Engine"). You can then rely on the file format check at startup if you subsequently use an older version of InnoDB to access the ib-file set.

In some limited circumstances, you might want to start the server and use an ib-file set that is in a "too new" format (one that is not supported by the software you are using). If you set the configuration parameter innodb_file_format_check to OFF, InnoDB opens the database, but issues this warning message in the error log:

InnoDB: Warning: the system tablespace is in a file format that this version doesn't support

Note

This is a very dangerous setting, as it permits the recovery process to run, possibly corrupting your database if the previous shutdown was a crash or "fast shutdown". You should only set innodb_file_format_check to OFF if you are sure that the previous shutdown was done with innodb_fast_shutdown=0, so that essentially no recovery process occurs. In a future release, this parameter setting may be renamed from OFF to UNSAFE. (However, until there are newer releases of InnoDB that support additional file formats, even disabling the startup checking is in fact "safe".)

The parameter innodb_file_format_check affects only what happens when a database is opened, not subsequently. Conversely, the parameter innodb_file_format (which enables a specific format) only determines whether or not a new table can be created in the enabled format and has no effect on whether or not a database can be opened.

The file format tag is a "high water mark", and as such it is increased after the server is started, if a table in a "higher" format is created or an existing table is accessed for read or write (assuming its format is supported). If you access an existing table in a format higher than the format the running software supports, the system tablespace tag is not updated, but table-level compatibility checking applies (and an error is issued), as described in Section 14.4.4.2.2, "Compatibility Check When a Table Is Opened". Any time the high water mark is updated, the value of innodb_file_format_check is updated as well, so the command SELECT @@innodb_file_format_check; displays the name of the newest file format known to be used by tables in the currently open ib-file set and supported by the currently executing software.

To best illustrate this behavior, consider the scenario described in Table 14.6, "InnoDB Data File Compatibility and Related InnoDB Parameters". Imagine that some future version of InnoDB supports the Cheetah format and that an ib-file set has been used with that version.

Table 14.6. InnoDB Data File Compatibility and Related InnoDB Parameters

innodb_file_format_check	innodb_file_format	Highest file format used in ib-file set	Highest file format supported by InnoDB	Result
OFF	Antelope or Barracuda	Barracuda	Barracuda	Database can be opened; tables can be created which require Antelope or Barracuda file format
OFF	Antelope or Barracuda	Cheetah	Barracuda	Database can be opened with a warning, since the database contains files in a "too new" format; tables can be created in Antelope or Barracuda file format; tables in Cheetah format cannot be accessed
OFF	Cheetah	Barracuda	Barracuda	Database cannot be opened; innodb_file_format cannot be set to Cheetah
ON	Antelope or Barracuda	Barracuda	Barracuda	Database can be opened; tables can be created in Antelope or Barracuda file format
ON	Antelope or Barracuda	Cheetah	Barracuda	Database cannot be opened, since the database contains files in a "too new" format (Cheetah)
ON	Cheetah	Barracuda	Barracuda	Database cannot be opened; innodb_file_format cannot be set to Cheetah
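The startup rules in Table 14.6 and the "high water mark" update described earlier can be condensed into a small Python model. This is a hypothetical sketch for illustration, not InnoDB source code. It assumes formats are ordered by internal identifier (Antelope, Barracuda, Cheetah), and the function names are invented here.

```python
# Hypothetical model of the file format tag and the startup checks (Table 14.6).
# Formats are ordered by internal identifier: Antelope=0, Barracuda=1, Cheetah=2.
FORMATS = ["Antelope", "Barracuda", "Cheetah"]
rank = FORMATS.index


def startup_result(check_on, file_format, highest_used, highest_supported):
    """Outcome of opening an ib-file set, following the rows of Table 14.6."""
    if rank(file_format) > rank(highest_supported):
        # innodb_file_format names a format this software cannot create.
        return "cannot open"
    if rank(highest_used) > rank(highest_supported):
        # The ib-file set is in a "too new" format.
        return "cannot open" if check_on else "open with warning"
    return "open"


def raise_tag(tag, table_format, highest_supported):
    """New high water mark after a table is created or accessed."""
    if rank(table_format) > rank(highest_supported):
        # Unsupported table: the tag is unchanged and the table-level check fails.
        return tag
    # The tag only ever moves to a "higher" format, never back down.
    return max(tag, table_format, key=rank)
```

In the "open with warning" case, tables in the too-new format are still rejected by the table-level check described in the next section. This model does not cover that check.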

14.4.4.2.2. Compatibility Check When a Table Is Opened

When a table is first accessed, InnoDB (including some releases prior to InnoDB 1.0) checks that the file format of the tablespace in which the table is stored is fully supported. This check prevents crashes or corruptions that would otherwise occur when tables using a "too new" data structure are encountered.

All tables using any file format supported by a release can be read or written (assuming the user has sufficient privileges). The setting of the system configuration parameter innodb_file_format can prevent creating a new table that uses specific file formats, even if they are supported by a given release. Such a setting might be used to preserve backward compatibility, but it does not prevent accessing any table that uses any supported format.
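The distinction between creating and accessing tables can be sketched as two independent checks. This hypothetical Python model assumes a release that supports Antelope and Barracuda. The names `can_create` and `can_access` are illustrative only.

```python
# Formats this hypothetical release can read and write, ordered by identifier.
SUPPORTED = ["Antelope", "Barracuda"]


def can_create(table_format, innodb_file_format):
    """innodb_file_format caps the format a *new* table may use."""
    if table_format not in SUPPORTED:
        return False
    return SUPPORTED.index(table_format) <= SUPPORTED.index(innodb_file_format)


def can_access(table_format):
    """Existing tables in any supported format stay readable and writable,
    whatever innodb_file_format is set to."""
    return table_format in SUPPORTED
```

With innodb_file_format set to Antelope, `can_create("Barracuda", "Antelope")` is false, but `can_access("Barracuda")` is still true.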

As noted in Named File Formats, versions of MySQL older than 5.0.21 cannot reliably use database files created by newer versions if a new file format was used when a table was created. To prevent various error conditions or corruptions, InnoDB checks file format compatibility when it opens a file (for example, upon first access to a table). If the currently running version of InnoDB does not support the file format identified by the table type in the InnoDB data dictionary, MySQL reports the following error:

ERROR 1146 (42S02): Table 'test.t1' doesn't exist

InnoDB also writes a message to the error log:

InnoDB: table test/t1: unknown table type 33

The table type should be equal to the tablespace flags, which contain the file format version as discussed in Section 14.4.4.3, "Identifying the File Format in Use".

Versions of InnoDB prior to MySQL 4.1 did not include table format identifiers in the database files, and versions prior to MySQL 5.0.21 did not include a table format compatibility check. Therefore, there is no way to ensure proper operations if a table in a "too new" format is used with versions of InnoDB prior to 5.0.21.

The file format management capability in InnoDB 1.0 and higher (tablespace tagging and run-time checks) allows InnoDB to verify as soon as possible that the running version of software can properly process the tables existing in the database.

If you permit InnoDB to open a database containing files in a format it does not support (by setting the parameter innodb_file_format_check to OFF), the table-level checking described in this section still applies.

Users are strongly urged not to use database files that contain Barracuda file format tables with releases of InnoDB older than the InnoDB Plugin for MySQL 5.1. It is possible to "downgrade" such tables to the Antelope format with the procedure described in Section 14.4.4.4, "Downgrading the File Format".

14.4.4.3. Identifying the File Format in Use

After you enable a given innodb_file_format, this change applies only to newly created tables rather than existing ones. If you do create a new table, the tablespace containing the table is tagged with the "earliest" or "simplest" file format that is required for the table's features. For example, if you enable file format Barracuda, and create a new table that is not compressed and does not use ROW_FORMAT=DYNAMIC, the new tablespace that contains the table is tagged as using file format Antelope.

It is easy to identify the file format used by a given tablespace or table. The table uses the Barracuda format if the Row_format reported by SHOW TABLE STATUS or INFORMATION_SCHEMA.TABLES is either 'Compressed' or 'Dynamic'. (The Row_format is a separate column; ignore the contents of the Create_options column, which may contain the string ROW_FORMAT.) If the table in a tablespace uses neither of those features, the file uses the format supported by prior releases of InnoDB, now called file format Antelope. In that case, the Row_format is either 'Redundant' or 'Compact'.
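The rule above amounts to a lookup from the reported Row_format value to a file format. Here is a hypothetical Python helper. `file_format_for` is not part of MySQL, and it assumes the Row_format string has already been fetched from the server.

```python
# Row_format values reported for InnoDB tables, mapped to their file format.
ROW_FORMAT_TO_FILE_FORMAT = {
    "Redundant": "Antelope",
    "Compact": "Antelope",
    "Dynamic": "Barracuda",
    "Compressed": "Barracuda",
}


def file_format_for(row_format):
    """Return the file format implied by a Row_format value (case-insensitive)."""
    try:
        return ROW_FORMAT_TO_FILE_FORMAT[row_format.capitalize()]
    except KeyError:
        raise ValueError(f"unknown InnoDB Row_format: {row_format!r}") from None
```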

Internal Details

The file format identifier is written as part of the tablespace flags (a 32-bit number) in the *.ibd file in the 4 bytes starting at position 54 of the file, most significant byte first. (The first byte of the file is byte zero.) On some systems, you can display these bytes in hexadecimal with the command od -t x1 -j 54 -N 4 tablename.ibd. If all bytes are zero, the tablespace uses the Antelope file format (which is the format used by the standard InnoDB storage engine up to version 5.1). Otherwise, the least significant bit should be set in the tablespace flags, and the file format identifier is stored in bits 5 through 11. (Divide the tablespace flags by 32 and take the remainder after dividing the integer part of the result by 128.)
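The following Python sketch applies this arithmetic to the raw bytes. It assumes the layout described in this paragraph: a big-endian 32-bit value at offset 54. It is not an official tool, and `read_tablespace_flags` and `file_format_id` are hypothetical names.

```python
import struct


def read_tablespace_flags(path):
    """Read the 4 bytes at offset 54 of an .ibd file, most significant byte first."""
    with open(path, "rb") as f:
        f.seek(54)
        data = f.read(4)
    if len(data) != 4:
        raise ValueError(f"{path}: file too short to contain tablespace flags")
    return struct.unpack(">I", data)[0]  # ">I" = big-endian unsigned 32-bit


def file_format_id(flags):
    """Return the file format identifier encoded in the tablespace flags."""
    if flags == 0:
        return 0  # all-zero flags: Antelope
    if flags & 1 == 0:
        raise ValueError(f"unexpected tablespace flags: {flags:#x}")
    # Bits 5..11, equivalent to (flags // 32) % 128.
    return (flags >> 5) & 0x7F
```

For example, the table type 33 in the error log message in Section 14.4.4.2.2 has bit 0 set and decodes to identifier 1.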

14.4.4.4. Downgrading the File Format

Each InnoDB tablespace file (with a name matching *.ibd) is tagged with the file format used to create its table and indexes. The way to downgrade the tablespace is to re-create the table and its indexes. The easiest way to re-create a table and its indexes is to use the command:

ALTER TABLE t ROW_FORMAT=COMPACT;

on each table that you want to downgrade. The COMPACT row format uses the file format Antelope. It was introduced in MySQL 5.0.3.

14.4.4.5. Future InnoDB File Formats

The file format used by the standard built-in InnoDB in MySQL 5.1 is the Antelope format. The file format introduced with InnoDB Plugin 1.0 is the Barracuda format. Although no new features have been announced that would require additional new file formats, the InnoDB file format mechanism allows for future enhancements.

For the sake of completeness, these are the file format names that might be used for future file formats: Antelope, Barracuda, Cheetah, Dragon, Elk, Fox, Gazelle, Hornet, Impala, Jaguar, Kangaroo, Leopard, Moose, Nautilus, Ocelot, Porpoise, Quail, Rabbit, Shark, Tiger, Urchin, Viper, Whale, Xenops, Yak and Zebra. These file formats correspond to the internal identifiers 0..25.
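The names are assigned alphabetically, so the order of identifiers and the order of names agree. A hypothetical Python lookup table built from the list above:

```python
# File format names indexed by internal identifier (0..25), as listed above.
FILE_FORMAT_NAMES = [
    "Antelope", "Barracuda", "Cheetah", "Dragon", "Elk", "Fox", "Gazelle",
    "Hornet", "Impala", "Jaguar", "Kangaroo", "Leopard", "Moose", "Nautilus",
    "Ocelot", "Porpoise", "Quail", "Rabbit", "Shark", "Tiger", "Urchin",
    "Viper", "Whale", "Xenops", "Yak", "Zebra",
]


def format_name(identifier):
    """Map an internal identifier to its file format name."""
    if not 0 <= identifier < len(FILE_FORMAT_NAMES):
        raise ValueError(f"no file format name for identifier {identifier}")
    return FILE_FORMAT_NAMES[identifier]


def format_id(name):
    """Map a file format name (case-insensitive) to its identifier."""
    return FILE_FORMAT_NAMES.index(name.capitalize())
```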

14.4.5. How InnoDB Stores Variable-Length Columns

14.4.5.1. Overview of InnoDB Row Storage
14.4.5.2. Specifying the Row Format for a Table
14.4.5.3. Barracuda File Format: DYNAMIC and COMPRESSED Row Formats
14.4.5.4. Antelope File Format: COMPACT and REDUNDANT Row Formats

This section discusses how certain InnoDB features, such as table compression and off-page storage of long columns, are controlled by the ROW_FORMAT clause of the CREATE TABLE statement. It discusses considerations for choosing the right row format and compatibility of row formats between MySQL releases.

14.4.5.1. Overview of InnoDB Row Storage

The storage for rows and associated columns affects performance for queries and DML operations. As more rows fit into a single disk page, queries and index lookups can work faster, less cache memory is required in the InnoDB buffer pool, and less I/O is required to write out updated values for the numeric and short string columns.

All data in InnoDB is stored in database pages that make up a B-tree index (the clustered index organized according to the primary key columns). Table data and indexes both use this type of structure. The nodes of the index data structure contain the values of all the columns in each row (for the clustered index) or the index columns and the primary key columns (for secondary indexes).

Variable-length columns are an exception to this rule. Columns such as BLOB and VARCHAR that are too long to fit on a B-tree page are stored on separately allocated disk pages called overflow pages. We call such columns off-page columns. The values of these columns are stored on singly-linked lists of overflow pages, and each such column has its own list of one or more overflow pages. In some cases, all or a prefix of the long column value is stored in the B-tree, to avoid wasting storage and to eliminate the need to read a separate page.
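Here is a sketch of the singly-linked overflow-page list for one column value. It assumes a fixed usable payload per page. `payload` is a hypothetical parameter, and real overflow pages also carry headers, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OverflowPage:
    data: bytes
    next: Optional["OverflowPage"] = None  # singly linked: each page points to the next


def store_off_page(value: bytes, payload: int) -> Optional[OverflowPage]:
    """Split value into a chain of pages holding at most `payload` bytes each."""
    head = None
    # Build the chain back to front so each page can point at its successor.
    for start in reversed(range(0, len(value), payload)):
        head = OverflowPage(value[start:start + payload], head)
    return head


def read_off_page(head: Optional[OverflowPage]) -> bytes:
    """Follow the chain from the first page and reassemble the value."""
    parts = []
    page = head
    while page is not None:
        parts.append(page.data)
        page = page.next
    return b"".join(parts)
```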

The Barracuda file format provides a new option (KEY_BLOCK_SIZE) to control how much column data is stored in the clustered index, and how much is placed on overflow pages.

14.4.5.2. Specifying the Row Format for a Table

You specify the row format for a table with the ROW_FORMAT clause of the CREATE TABLE and ALTER TABLE statements.

14.4.5.3. Barracuda File Format: DYNAMIC and COMPRESSED Row Formats

When innodb_file_format is set to Barracuda and a table is created with ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED, long column values are stored fully off-page, and the clustered index record contains only a 20-byte pointer to the overflow page.

Whether any columns are stored off-page depends on the page size and the total size of the row. When the row is too long, InnoDB chooses the longest columns for off-page storage until the clustered index record fits on the B-tree page.
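The selection rule can be sketched as a greedy loop. This is a simplified model, not the InnoDB implementation, and it makes two assumptions. First, `column_sizes` lists only the variable-length columns eligible for off-page storage; fixed-length columns and record headers are ignored. Second, `max_record_bytes` is a hypothetical limit standing in for what fits on the B-tree page.

```python
POINTER_BYTES = 20  # a column moved off-page leaves only a 20-byte pointer


def choose_off_page(column_sizes, max_record_bytes):
    """Return the columns moved off-page, longest first, until the record fits.

    column_sizes maps column name -> bytes the column takes if kept in the record.
    """
    sizes = dict(column_sizes)
    off_page = []
    # Consider columns from longest to shortest.
    for name in sorted(sizes, key=sizes.get, reverse=True):
        if sum(sizes.values()) <= max_record_bytes:
            break  # the record now fits on the page
        if sizes[name] <= POINTER_BYTES:
            break  # moving it would not shrink the record
        sizes[name] = POINTER_BYTES
        off_page.append(name)
    return off_page
```

For example, `choose_off_page({"a": 100, "b": 9000, "c": 5000}, 8000)` moves only `b`, because the record then totals 5120 bytes. The shorter columns remain in the B-tree node.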

The DYNAMIC row format maintains the efficiency of storing the entire row in the index node if it fits (as do the COMPACT and REDUNDANT formats), but this new format avoids the problem of filling B-tree nodes with a large number of data bytes of long columns. The DYNAMIC format is based on the idea that if a portion of a long data value is stored off-page, it is usually most efficient to store all of the value off-page. With DYNAMIC format, shorter columns are likely to remain in the B-tree node, minimizing the number of overflow pages needed for any given row.

The COMPRESSED row format uses similar internal details for off-page storage as the DYNAMIC row format, with additional storage and performance considerations from the table and index data being compressed and using smaller page sizes. For full details about the COMPRESSED row format, see Section 14.4.3, "InnoDB Data Compression".

14.4.5.4. Antelope File Format: COMPACT and REDUNDANT Row Formats

Early versions of InnoDB used an unnamed file format (now called Antelope) for database files. With that format, tables were defined with ROW_FORMAT=COMPACT (or ROW_FORMAT=REDUNDANT) and InnoDB stored up to the first 768 bytes of variable-length columns (such as BLOB and VARCHAR) in the index record within the B-tree node, with the remainder stored on the overflow pages.

To preserve compatibility with those prior versions, tables created with the newest InnoDB use the prefix format, unless one of ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED is specified (or implied) on the CREATE TABLE statement.

With the Antelope file format, if the value of a column is 768 bytes or less, no overflow page is needed, and some savings in I/O may result, since the value is in the B-tree node. This works well for relatively short BLOB values, but may cause B-tree nodes to fill with data rather than key values, reducing their efficiency. Tables with many BLOB columns could cause B-tree nodes to become too full of data, and contain too few rows, making the entire index less efficient than if the rows were shorter or if the column values were stored off-page.
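The two layouts can be compared for a single long column value. This Python sketch assumes the value is one that InnoDB places off-page. It also assumes that, as with DYNAMIC, the Antelope prefix is followed by a 20-byte pointer to the overflow pages. The function names are hypothetical.

```python
PREFIX_BYTES = 768   # Antelope keeps up to 768 bytes in the B-tree record
POINTER_BYTES = 20   # pointer into the overflow page list


def antelope_split(value_len):
    """(bytes in the B-tree record, bytes on overflow pages) for COMPACT/REDUNDANT."""
    if value_len <= PREFIX_BYTES:
        return value_len, 0  # fits in the prefix: no overflow page needed
    return PREFIX_BYTES + POINTER_BYTES, value_len - PREFIX_BYTES


def dynamic_split(value_len):
    """(bytes in the B-tree record, bytes on overflow pages) for DYNAMIC/COMPRESSED,
    when the column is stored off-page: only the pointer stays in the record."""
    return POINTER_BYTES, value_len
```

For a 10,000-byte BLOB, the Antelope layout keeps 788 bytes in the B-tree node, while DYNAMIC keeps 20. This is why many such columns crowd fewer rows into each Antelope page.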

14.4.6. InnoDB INFORMATION_SCHEMA tables

14.4.6.1. Information Schema Tables about Compression
14.4.6.2. Information Schema Tables about Transactions
14.4.6.3. Special Locking Considerations for InnoDB INFORMATION_SCHEMA Tables

The INFORMATION_SCHEMA is a MySQL feature that helps you monitor server activity to diagnose capacity and performance issues. Several InnoDB-related INFORMATION_SCHEMA tables (INNODB_CMP, INNODB_CMP_RESET, INNODB_CMPMEM, INNODB_CMPMEM_RESET, INNODB_TRX, INNODB_LOCKS and INNODB_LOCK_WAITS) contain live information about compressed InnoDB tables, the compressed InnoDB buffer pool, all transactions currently executing inside InnoDB, the locks that transactions hold and those that are blocking transactions waiting for access to a resource (a table or row).

The Information Schema tables are themselves plugins to the MySQL server, and must be activated by INSTALL PLUGIN statements. If they are installed but the InnoDB storage engine plugin is not installed, these tables appear to be empty.

This section describes the InnoDB-related Information Schema tables and shows some examples of their use.

14.4.6.1. Information Schema Tables about Compression

Two new pairs of Information Schema tables provided by the InnoDB storage engine can give you some insight into how well compression is working overall. One pair of tables contains information about the number of compression operations and the amount of time spent performing compression. Another pair of tables contains information on the way memory is allocated for compression.

14.4.6.1.1. INNODB_CMP and INNODB_CMP_RESET

The tables INNODB_CMP and INNODB_CMP_RESET contain status information on the operations related to compressed tables, which are covered in Section 14.4.3, "InnoDB Data Compression". The compressed page size is in the column PAGE_SIZE.

These two tables have identical contents, but reading from INNODB_CMP_RESET resets the statistics on compression and uncompression operations. For example, if you archive the output of INNODB_CMP_RESET every 60 minutes, you see the statistics for each hourly period. If you monitor the output of INNODB_CMP (making sure never to read INNODB_CMP_RESET), you see the cumulative statistics since InnoDB was started.
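Because both tables expose the same counters, the difference between them can be modeled with a single set of counters and two read methods. This hypothetical Python class only illustrates the read-versus-reset behavior. The field names mirror the columns shown in the sample output later in this section.

```python
class CompressionStats:
    """Hypothetical model of reading INNODB_CMP versus INNODB_CMP_RESET."""

    def __init__(self):
        self.counters = {"compress_ops": 0, "compress_ops_ok": 0, "uncompress_ops": 0}

    def record(self, field, n=1):
        self.counters[field] += n

    def read_cmp(self):
        # Like INNODB_CMP: return the current values, no side effects.
        return dict(self.counters)

    def read_cmp_reset(self):
        # Like INNODB_CMP_RESET: same contents, then the counters start over.
        snapshot = dict(self.counters)
        for key in self.counters:
            self.counters[key] = 0
        return snapshot
```

A monitor that only ever calls `read_cmp` sees cumulative totals. A job that calls `read_cmp_reset` once an hour sees per-hour figures, but it also changes what later `read_cmp` calls return.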

For the table definition, see Table 20.1, "Columns of INNODB_CMP and INNODB_CMP_RESET".

14.4.6.1.2. INNODB_CMPMEM and INNODB_CMPMEM_RESET

The tables INNODB_CMPMEM and INNODB_CMPMEM_RESET contain status information on the compressed pages that reside in the buffer pool. Please consult Section 14.4.3, "InnoDB Data Compression" for further information on compressed tables and the use of the buffer pool. The tables INNODB_CMP and INNODB_CMP_RESET should provide more useful statistics on compression.

Internal Details

The InnoDB storage engine uses a so-called "buddy allocator" system to manage memory allocated to pages of various sizes, from 1KB to 16KB. Each row of the two tables described here corresponds to a single page size.

These two tables have identical contents, but reading from INNODB_CMPMEM_RESET resets the statistics on relocation operations. For example, if every 60 minutes you archived the output of INNODB_CMPMEM_RESET, it would show the hourly statistics. If you never read INNODB_CMPMEM_RESET and monitored the output of INNODB_CMPMEM instead, it would show the cumulative statistics since InnoDB was started.

For the table definition, see Table 20.2, "Columns of INNODB_CMPMEM and INNODB_CMPMEM_RESET".

14.4.6.1.3. Using the Compression Information Schema Tables

Example 14.1. Using the Compression Information Schema Tables

The following is sample output from a database that contains compressed tables (see Section 14.4.3, "InnoDB Data Compression", INNODB_CMP, and INNODB_CMPMEM).

The following table shows the contents of INFORMATION_SCHEMA.INNODB_CMP under a light workload. The only compressed page size that the buffer pool contains is 8K. Compressing or uncompressing pages has consumed less than a second since the time the statistics were reset, as the columns COMPRESS_TIME and UNCOMPRESS_TIME (not shown here) are zero.

page size	compress ops	compress ops ok	uncompress ops
1024	0	0	0
2048	0	0	0
4096	0	0	0
8192	1048	921	61
16384	0	0	0

According to INNODB_CMPMEM, there are 6169 compressed 8KB pages in the buffer pool.
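The ratio of COMPRESS_OPS_OK to COMPRESS_OPS summarizes how often compression succeeded. Here is a small Python computation over the sample INNODB_CMP rows above. The numbers are copied from the table, and the helper name is hypothetical.

```python
# Rows from the sample INNODB_CMP output above:
# (page_size, compress_ops, compress_ops_ok, uncompress_ops)
SAMPLE = [
    (1024, 0, 0, 0),
    (2048, 0, 0, 0),
    (4096, 0, 0, 0),
    (8192, 1048, 921, 61),
    (16384, 0, 0, 0),
]


def compression_success(rows):
    """Map page size -> COMPRESS_OPS_OK / COMPRESS_OPS for sizes with any activity."""
    return {size: ok / ops for size, ops, ok, _uncompress in rows if ops > 0}


for size, ratio in compression_success(SAMPLE).items():
    print(f"{size}: {ratio:.1%} of compression operations succeeded")
```

For this sample, only the 8192-byte size has activity: 921 of 1048 operations (about 87.9%) succeeded.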

The following table shows the contents of INFORMATION_SCHEMA.INNODB_CMPMEM under a light workload. Some memory is unusable due to fragmentation of the InnoDB memory allocator for compressed pages: SUM(PAGE_SIZE*PAGES_FREE)=6784. This is because small memory allocation requests are fulfilled by splitting bigger blocks, starting from the 16K blocks that are allocated from the main buffer pool, using the buddy allocation system. The fragmentation is this low because some allocated blocks have been relocated (copied) to form bigger adjacent free blocks. This copying of SUM(PAGE_SIZE*RELOCATION_OPS) bytes has consumed less than a second (SUM(RELOCATION_TIME)=0).

page size	pages used	pages free	relocation ops
1024	0	0	0
2048	0	1	0
4096	0	1	0
8192	6169	0	5
16384	0	0	0
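The splitting and merging described above can be sketched with a minimal buddy allocator for power-of-two sizes from 1KB to 16KB. This is a hypothetical illustration of the technique, not InnoDB's allocator. It does not model relocation, and each fresh 16KB block only stands in for a page taken from the main buffer pool.

```python
class BuddyAllocator:
    """Minimal buddy allocator for block sizes 1 KB .. 16 KB (illustrative only)."""

    MIN, MAX = 1024, 16384

    def __init__(self):
        sizes, s = [], self.MIN
        while s <= self.MAX:
            sizes.append(s)
            s *= 2
        self.free = {size: set() for size in sizes}  # free block offsets per size
        self.next_offset = 0  # where the next fresh 16 KB block starts

    def alloc(self, size):
        if size not in self.free:
            raise ValueError("size must be a power of two from 1 KB to 16 KB")
        # Find the smallest free block that is large enough.
        s = size
        while s <= self.MAX and not self.free[s]:
            s *= 2
        if s > self.MAX:
            # Nothing free: take a fresh 16 KB block (stands in for a buffer pool page).
            offset, s = self.next_offset, self.MAX
            self.next_offset += self.MAX
        else:
            offset = self.free[s].pop()
        # Split down to the requested size; each upper half becomes a free buddy.
        while s > size:
            s //= 2
            self.free[s].add(offset + s)
        return offset

    def release(self, offset, size):
        # Merge with the buddy block for as long as the buddy is also free.
        while size < self.MAX:
            buddy = offset ^ size  # blocks are aligned to their size
            if buddy not in self.free[size]:
                break
            self.free[size].remove(buddy)
            offset = min(offset, buddy)
            size *= 2
        self.free[size].add(offset)

    def free_bytes(self):
        return sum(size * len(offsets) for size, offsets in self.free.items())
```

Allocating one 1KB block from an empty allocator leaves 8KB, 4KB, 2KB and 1KB blocks free. Releasing it merges them back into a single 16KB block.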

14.4.6.2. Information Schema Tables about Transactions

Three InnoDB-related Information Schema tables make it easy to monitor transactions and diagnose possible locking problems. The three tables are INNODB_TRX, INNODB_LOCKS and INNODB_LOCK_WAITS.

14.4.6.2.1. INNODB_TRX

INNODB_TRX contains information about every transaction currently executing inside InnoDB, including whether the transaction is waiting for a lock, when the transaction started, and the particular SQL statement the transaction is executing.

For the table definition, see Table 20.3, "INNODB_TRX Columns".

14.4.6.2.2. INNODB_LOCKS

Each transaction in InnoDB that is waiting for another transaction to release a lock (INNODB_TRX.TRX_STATE='LOCK WAIT') is blocked by exactly one "blocking lock request". That blocking lock request is for a row or table lock held by another transaction in an incompatible mode. The waiting or blocked transaction cannot proceed until the other transaction commits or rolls back, thereby releasing the requested lock. For every blocked transaction, INNODB_LOCKS contains one row that describes each lock the transaction has requested, and for which it is waiting. INNODB_LOCKS also contains one row for each lock that is blocking another transaction, whatever the state of the transaction that holds the lock ('RUNNING', 'LOCK WAIT', 'ROLLING BACK' or 'COMMITTING'). The lock that is blocking a transaction is always held in a mode (read vs. write, shared vs. exclusive) incompatible with the mode of the requested lock.

For the table definition, see Table 20.4, "INNODB_LOCKS Columns".

14.4.6.2.3. INNODB_LOCK_WAITS

Using this table, you can tell which transactions are waiting for a given lock, or for which lock a given transaction is waiting. This table contains one or more rows for each blocked transaction, indicating the lock it has requested and the locks that are blocking that request. The REQUESTED_LOCK_ID column refers to the lock that a transaction is requesting, and the BLOCKING_LOCK_ID column refers to the lock (held by another transaction) that is preventing the first transaction from proceeding. For any given blocked transaction, all rows in INNODB_LOCK_WAITS have the same value for REQUESTED_LOCK_ID and different values for BLOCKING_LOCK_ID.

For the table definition, see Table 20.5, "INNODB_LOCK_WAITS Columns".

14.4.6.2.4. Using the Transaction Information Schema Tables

Example 14.2. Identifying Blocking Transactions

It is sometimes helpful to be able to identify which transaction is blocking another. You can use the Information Schema tables to find out which transaction is waiting for another, and which resource is being requested.

Suppose you have the following scenario, with three users running concurrently. Each user (or session) corresponds to a MySQL thread, and executes one transaction after another. Consider the state of the system when these users have issued the following commands, but none has yet committed its transaction:

User A:

```
BEGIN;
SELECT a FROM t FOR UPDATE;
SELECT SLEEP(100);
```

User B:
```
SELECT b FROM t FOR UPDATE;
```
User C:
```
SELECT c FROM t FOR UPDATE;
```

In this scenario, you may use this query to see who is waiting for whom:

```
SELECT r.trx_id waiting_trx_id,
       r.trx_mysql_thread_id waiting_thread,
       r.trx_query waiting_query,
       b.trx_id blocking_trx_id,
       b.trx_mysql_thread_id blocking_thread,
       b.trx_query blocking_query
  FROM information_schema.innodb_lock_waits w
 INNER JOIN information_schema.innodb_trx b
    ON b.trx_id = w.blocking_trx_id
 INNER JOIN information_schema.innodb_trx r
    ON r.trx_id = w.requesting_trx_id;
```

waiting trx id	waiting thread	waiting query	blocking trx id	blocking thread	blocking query
A4	6	SELECT b FROM t FOR UPDATE	A3	5	SELECT SLEEP(100)
A5	7	SELECT c FROM t FOR UPDATE	A3	5	SELECT SLEEP(100)
A5	7	SELECT c FROM t FOR UPDATE	A4	6	SELECT b FROM t FOR UPDATE

In the above result, you can identify users by the "waiting query" or "blocking query". As you can see:

User B (trx id 'A4', thread 6) and User C (trx id 'A5', thread 7) are both waiting for User A (trx id 'A3', thread 5).
User C is waiting for User B as well as User A.

You can see the underlying data in the tables INNODB_TRX, INNODB_LOCKS, and INNODB_LOCK_WAITS.

The following table shows some sample contents of INFORMATION_SCHEMA.INNODB_TRX.

trx id	trx state	trx started	trx requested lock id	trx wait started	trx weight	trx mysql thread id	trx query
A3	RUNNING	2008-01-15 16:44:54	NULL	NULL	2	5	SELECT SLEEP(100)
A4	LOCK WAIT	2008-01-15 16:45:09	A4:1:3:2	2008-01-15 16:45:09	2	6	SELECT b FROM t FOR UPDATE
A5	LOCK WAIT	2008-01-15 16:45:14	A5:1:3:2	2008-01-15 16:45:14	2	7	SELECT c FROM t FOR UPDATE

The following table shows some sample contents of INFORMATION_SCHEMA.INNODB_LOCKS.

lock id	lock trx id	lock mode	lock type	lock table	lock index	lock space	lock page	lock rec	lock data
A3:1:3:2	A3	X	RECORD	`test`.`t`	`PRIMARY`	1	3	2	0x0200
A4:1:3:2	A4	X	RECORD	`test`.`t`	`PRIMARY`	1	3	2	0x0200
A5:1:3:2	A5	X	RECORD	`test`.`t`	`PRIMARY`	1	3	2	0x0200

The following table shows some sample contents of INFORMATION_SCHEMA.INNODB_LOCK_WAITS.

requesting trx id	requested lock id	blocking trx id	blocking lock id
A4	A4:1:3:2	A3	A3:1:3:2
A5	A5:1:3:2	A3	A3:1:3:2
A5	A5:1:3:2	A4	A4:1:3:2
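The "who is waiting for whom" conclusions listed earlier can be derived mechanically from INNODB_LOCK_WAITS. Here is a Python sketch over the sample rows above. A real monitoring script would first fetch these rows from the server, and the helper name is hypothetical.

```python
from collections import defaultdict

# Rows from the sample INNODB_LOCK_WAITS output above:
# (requesting_trx_id, requested_lock_id, blocking_trx_id, blocking_lock_id)
LOCK_WAITS = [
    ("A4", "A4:1:3:2", "A3", "A3:1:3:2"),
    ("A5", "A5:1:3:2", "A3", "A3:1:3:2"),
    ("A5", "A5:1:3:2", "A4", "A4:1:3:2"),
]


def blockers_by_waiter(rows):
    """Map each waiting transaction to the transactions blocking it."""
    waits = defaultdict(list)
    for requesting, _requested_lock, blocking, _blocking_lock in rows:
        waits[requesting].append(blocking)
    return dict(waits)


print(blockers_by_waiter(LOCK_WAITS))
```

This prints `{'A4': ['A3'], 'A5': ['A3', 'A4']}`: User B (A4) waits for User A (A3), and User C (A5) waits for both.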

Example 14.3. More Complex Example of Transaction Data in Information Schema Tables

Sometimes you would like to correlate the internal InnoDB locking information with session-level information maintained by MySQL. For example, you might like to know, for a given InnoDB transaction ID, the corresponding MySQL session ID and name of the user that may be holding a lock, and thus blocking another transaction.

The following output from the INFORMATION_SCHEMA tables is taken from a somewhat loaded system.

As can be seen in the following tables, there are several transactions running.

The following INNODB_LOCKS and INNODB_LOCK_WAITS tables show that:

Transaction 77F (executing an INSERT) is waiting for transactions 77E, 77D and 77B to commit.
Transaction 77E (executing an INSERT) is waiting for transactions 77D and 77B to commit.
Transaction 77D (executing an INSERT) is waiting for transaction 77B to commit.
Transaction 77B (executing an INSERT) is waiting for transaction 77A to commit.
Transaction 77A is running, currently executing SELECT.
Transaction E56 (executing an INSERT) is waiting for transaction E55 to commit.
Transaction E55 (executing an INSERT) is waiting for transaction 19C to commit.
Transaction 19C is running, currently executing an INSERT.

Note that there may be an inconsistency between the queries shown in the columns INNODB_TRX.TRX_QUERY and PROCESSLIST.INFO. The current transaction ID for a thread, and the query being executed in that transaction, may differ between these two tables for any given thread. See Section 14.4.6.3.3, "Possible Inconsistency with PROCESSLIST" for an explanation.

The following table shows the contents of INFORMATION_SCHEMA.PROCESSLIST in a system running a heavy workload.

ID	USER	HOST	DB	COMMAND	TIME	STATE	INFO
384	root	localhost	test	Query	10	update	insert into t2 values …
257	root	localhost	test	Query	3	update	insert into t2 values …
130	root	localhost	test	Query	0	update	insert into t2 values …
61	root	localhost	test	Query	1	update	insert into t2 values …
8	root	localhost	test	Query	1	update	insert into t2 values …
4	root	localhost	test	Query	0	preparing	SELECT * FROM processlist
2	root	localhost	test	Sleep	566		NULL

The following table shows the contents of INFORMATION_SCHEMA.INNODB_TRX in a system running a heavy workload.

trx id	trx state	trx started	trx requested lock id	trx wait started	trx weight	trx mysql thread id	trx query
77F	LOCK WAIT	2008-01-15 13:10:16	77F:806	2008-01-15 13:10:16	1	876	insert into t09 (D, B, C) values …
77E	LOCK WAIT	2008-01-15 13:10:16	77E:806	2008-01-15 13:10:16	1	875	insert into t09 (D, B, C) values …
77D	LOCK WAIT	2008-01-15 13:10:16	77D:806	2008-01-15 13:10:16	1	874	insert into t09 (D, B, C) values …
77B	LOCK WAIT	2008-01-15 13:10:16	77B:733:12:1	2008-01-15 13:10:16	4	873	insert into t09 (D, B, C) values …
77A	RUNNING	2008-01-15 13:10:16	NULL	NULL	4	872	select b, c from t09 where …
E56	LOCK WAIT	2008-01-15 13:10:06	E56:743:6:2	2008-01-15 13:10:06	5	384	insert into t2 values …
E55	LOCK WAIT	2008-01-15 13:10:06	E55:743:38:2	2008-01-15 13:10:13	965	257	insert into t2 values …
19C	RUNNING	2008-01-15 13:09:10	NULL	NULL	2900	130	insert into t2 values …
E15	RUNNING	2008-01-15 13:08:59	NULL	NULL	5395	61	insert into t2 values …
51D	RUNNING	2008-01-15 13:08:47	NULL	NULL	9807	8	insert into t2 values …

The following table shows the contents of INFORMATION_SCHEMA.INNODB_LOCK_WAITS in a system running a heavy workload.

requesting trx id	requested lock id	blocking trx id	blocking lock id
77F	77F:806	77E	77E:806
77F	77F:806	77D	77D:806
77F	77F:806	77B	77B:806
77E	77E:806	77D	77D:806
77E	77E:806	77B	77B:806
77D	77D:806	77B	77B:806
77B	77B:733:12:1	77A	77A:733:12:1
E56	E56:743:6:2	E55	E55:743:6:2
E55	E55:743:38:2	19C	19C:743:38:2

The following table shows the contents of INFORMATION_SCHEMA.INNODB_LOCKS in a system running a heavy workload.

lock id	lock trx id	lock mode	lock type	lock table	lock index	lock space	lock page	lock rec	lock data
77F:806	77F	AUTO_INC	TABLE	`test`.`t09`	NULL	NULL	NULL	NULL	NULL
77E:806	77E	AUTO_INC	TABLE	`test`.`t09`	NULL	NULL	NULL	NULL	NULL
77D:806	77D	AUTO_INC	TABLE	`test`.`t09`	NULL	NULL	NULL	NULL	NULL
77B:806	77B	AUTO_INC	TABLE	`test`.`t09`	NULL	NULL	NULL	NULL	NULL
77B:733:12:1	77B	X	RECORD	`test`.`t09`	`PRIMARY`	733	12	1	supremum pseudo-record
77A:733:12:1	77A	X	RECORD	`test`.`t09`	`PRIMARY`	733	12	1	supremum pseudo-record
E56:743:6:2	E56	S	RECORD	`test`.`t2`	`PRIMARY`	743	6	2	0, 0
E55:743:6:2	E55	X	RECORD	`test`.`t2`	`PRIMARY`	743	6	2	0, 0
E55:743:38:2	E55	S	RECORD	`test`.`t2`	`PRIMARY`	743	38	2	1922, 1922
19C:743:38:2	19C	X	RECORD	`test`.`t2`	`PRIMARY`	743	38	2	1922, 1922
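In a larger snapshot like this one, it often helps to find the transactions at the head of each wait chain. The following Python sketch works on (requesting trx id, blocking trx id) pairs copied from the INNODB_LOCK_WAITS table above. The helper names are hypothetical.

```python
# (requesting_trx_id, blocking_trx_id) pairs from the INNODB_LOCK_WAITS sample above.
WAITS = [
    ("77F", "77E"), ("77F", "77D"), ("77F", "77B"),
    ("77E", "77D"), ("77E", "77B"),
    ("77D", "77B"),
    ("77B", "77A"),
    ("E56", "E55"),
    ("E55", "19C"),
]


def root_blockers(pairs):
    """Transactions that block others but are not waiting themselves."""
    waiting = {req for req, _ in pairs}
    blocking = {blk for _, blk in pairs}
    return sorted(blocking - waiting)


def transitive_blockers(pairs, trx):
    """Every transaction that trx waits on, directly or through a chain."""
    direct = {}
    for req, blk in pairs:
        direct.setdefault(req, set()).add(blk)
    seen, stack = set(), [trx]
    while stack:
        for blk in direct.get(stack.pop(), ()):
            if blk not in seen:  # the seen set also guards against cycles
                seen.add(blk)
                stack.append(blk)
    return sorted(seen)
```

For this snapshot, `root_blockers(WAITS)` returns `['19C', '77A']`, the two running transactions identified in the list above. `transitive_blockers(WAITS, "77F")` shows that 77F ultimately waits on 77A.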

14.4.6.3. Special Locking Considerations for InnoDB INFORMATION_SCHEMA Tables

14.4.6.3.1. Understanding InnoDB Locking

When a transaction updates a row in a table, or locks it with SELECT FOR UPDATE, InnoDB establishes a list or queue of locks on that row. Similarly, InnoDB maintains a list of locks on a table for the table-level locks that transactions hold. If a second transaction wants to update a row or lock a table already locked by a prior transaction in an incompatible mode, InnoDB adds a lock request for the row or table to the corresponding queue. For a lock to be acquired by a transaction, all incompatible lock requests previously entered into the lock queue for that row or table must be removed (the transactions holding or requesting those locks either commit or roll back).

A transaction may have any number of lock requests for different rows or tables. At any given time, a transaction may be requesting a lock that is held by another transaction, in which case it is blocked by that other transaction. The requesting transaction must wait for the transaction that holds the blocking lock to commit or roll back. If a transaction is not waiting for a lock, it is in the 'RUNNING' state. If a transaction is waiting for a lock, it is in the 'LOCK WAIT' state.

The table INNODB_LOCKS holds one or more rows for each 'LOCK WAIT' transaction, indicating the lock requests that are preventing its progress. This table also contains one row describing each lock in a queue of locks pending for a given row or table. The table INNODB_LOCK_WAITS shows which locks already held by a transaction are blocking locks requested by other transactions.

14.4.6.3.2. Granularity of INFORMATION_SCHEMA Data

The data exposed by the transaction and locking tables represent a glimpse into fast-changing data. This is not like other (user) tables, where the data only changes when application-initiated updates occur. The underlying data is internal system-managed data, and can change very quickly.

For performance reasons, and to minimize the chance of misleading JOINs between the INFORMATION_SCHEMA tables, InnoDB collects the required transaction and locking information into an intermediate buffer whenever a SELECT on any of the tables is issued. This buffer is refreshed only if more than 0.1 seconds have elapsed since the last time the buffer was read. The data needed to fill the three tables is fetched atomically and consistently and is saved in this global internal buffer, forming a point-in-time "snapshot". If multiple table accesses occur within 0.1 seconds (as they almost certainly do when MySQL processes a join among these tables), then the same snapshot is used to satisfy the query.

A correct result is returned when you JOIN any of these tables together in a single query, because the data for the three tables comes from the same snapshot. Because the buffer is not refreshed with every query of any of these tables, if you issue separate queries against these tables within a tenth of a second, the results are the same from query to query. On the other hand, two separate queries of the same or different tables issued more than a tenth of a second apart may see different results, since the data come from different snapshots.
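The refresh rule can be modeled as a cache that is reused unless the previous read is more than 0.1 seconds old. This is a hypothetical Python sketch. `collect` stands in for InnoDB's atomic copy of the transaction and lock data, and the clock is injectable so the behavior can be tested.

```python
import time


class SnapshotCache:
    """Hypothetical model of the buffer behind the transaction and lock tables."""

    REFRESH_AFTER = 0.1  # seconds since the previous read, per the text above

    def __init__(self, collect, clock=time.monotonic):
        self._collect = collect  # returns a consistent copy of all three tables
        self._clock = clock
        self._snapshot = None
        self._last_read = None

    def read(self):
        now = self._clock()
        if self._snapshot is None or now - self._last_read > self.REFRESH_AFTER:
            self._snapshot = self._collect()  # InnoDB briefly stalls at this point
        self._last_read = now  # every read counts as "the last time the buffer was read"
        return self._snapshot
```

Every table referenced by a join is read within the same tenth of a second, so in this model they all see one snapshot. A client that polls more often than every 0.1 seconds also keeps getting the same snapshot.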

Because InnoDB must temporarily stall while the transaction and locking data is collected, querying these tables too frequently can hurt the performance seen by other users.

As these tables contain sensitive information (at least INNODB_LOCKS.LOCK_DATA and INNODB_TRX.TRX_QUERY), for security reasons, only the users with the PROCESS privilege are allowed to SELECT from them.

14.4.6.3.3. Possible Inconsistency with PROCESSLIST

As just described, the transaction and locking data is correct and consistent when these INFORMATION_SCHEMA tables are populated, but the underlying data changes so fast that a glimpse of other, similarly fast-changing data may not be in sync with it. Be careful, therefore, when comparing the data in the InnoDB transaction and locking tables with that in the MySQL PROCESSLIST table. The data from the PROCESSLIST table does not come from the same snapshot as the data about locking and transactions. Even if you issue a single SELECT (JOINing INNODB_TRX and PROCESSLIST, for example), the contents of those tables are generally not consistent: INNODB_TRX may reference rows that are not present in PROCESSLIST, and the currently executing SQL query of a transaction, shown in INNODB_TRX.TRX_QUERY, may differ from the one in PROCESSLIST.INFO. The query in INNODB_TRX is always consistent with the rest of INNODB_TRX, INNODB_LOCKS, and INNODB_LOCK_WAITS when the data comes from the same snapshot.

14.4.7. InnoDB Performance and Scalability Enhancements

14.4.7.1. Overview of InnoDB Performance
14.4.7.2. Faster Locking for Improved Scalability
14.4.7.3. Using Operating System Memory Allocators
14.4.7.4. Controlling InnoDB Change Buffering
14.4.7.5. Controlling Adaptive Hash Indexing
14.4.7.6. Changes Regarding Thread Concurrency
14.4.7.7. Changes in the Read-Ahead Algorithm
14.4.7.8. Multiple Background I/O Threads
14.4.7.9. Asynchronous I/O on Linux
14.4.7.10. Group Commit
14.4.7.11. Controlling the Master Thread I/O Rate
14.4.7.12. Controlling the Flushing Rate of Dirty Pages
14.4.7.13. Using the PAUSE Instruction in InnoDB Spin Loops
14.4.7.14. Control of Spin Lock Polling
14.4.7.15. Making the Buffer Pool Scan Resistant
14.4.7.16. Improvements to Crash Recovery Performance
14.4.7.17. Integration with the MySQL Performance Schema
14.4.7.18. Improvements to Performance from Multiple Buffer Pools
14.4.7.19. Better Scalability with Multiple Rollback Segments
14.4.7.20. Better Scalability with Improved Purge Scheduling
14.4.7.21. Improved Log Sys Mutex
14.4.7.22. Separate Flush List Mutex

This section discusses recent InnoDB enhancements to performance and scalability, covering the performance features in InnoDB 1.1 with MySQL 5.5, and the features in the InnoDB Plugin for MySQL 5.1. This information is useful to any DBA or developer who is concerned with performance and scalability. Although some of the enhancements do not require any action on your part, knowing this information can still help you diagnose performance issues more quickly and modernize systems and applications that rely on older, inefficient behavior.

14.4.7.1. Overview of InnoDB Performance

InnoDB has always been highly efficient, and includes several unique architectural elements to assure high performance and scalability. The latest InnoDB storage engine includes new features that take advantage of advances in operating systems and hardware platforms, such as multi-core processors and improved memory allocation systems. In addition, new configuration options let you better control some InnoDB internal subsystems to achieve the best performance with your workload.

Starting with MySQL 5.5 and InnoDB 1.1, the built-in InnoDB storage engine within MySQL is upgraded to the full feature set and performance of the former InnoDB Plugin. This change makes these performance and scalability enhancements available to a much wider audience than before, and eliminates the separate installation step of the InnoDB Plugin. After learning about the InnoDB performance features in this section, continue with Chapter 8, Optimization to learn the best practices for overall MySQL performance, and Section 8.5, "Optimizing for InnoDB Tables" in particular for InnoDB tips and guidelines.

14.4.7.2. Faster Locking for Improved Scalability

In MySQL and InnoDB, multiple threads of execution access shared data structures. InnoDB synchronizes these accesses with its own implementation of mutexes and read/write locks. InnoDB has historically protected the internal state of a read/write lock with an InnoDB mutex. On Unix and Linux platforms, the internal state of an InnoDB mutex is protected by a Pthreads mutex, as in IEEE Std 1003.1c (POSIX.1c).

On many platforms, there is a more efficient way to implement mutexes and read/write locks. Atomic operations can often be used to synchronize the actions of multiple threads more efficiently than Pthreads. Each operation to acquire or release a lock can be done in fewer CPU instructions, and thus result in less wasted time when threads are contending for access to shared data structures. This in turn means greater scalability on multi-core platforms.

InnoDB implements mutexes and read/write locks with the built-in functions provided by the GNU Compiler Collection (GCC) for atomic memory access instead of using the Pthreads approach previously used. More specifically, an InnoDB that is compiled with GCC version 4.1.2 or later uses the atomic builtins instead of a pthread_mutex_t to implement InnoDB mutexes and read/write locks.

On 32-bit Microsoft Windows, InnoDB has implemented mutexes (but not read/write locks) with hand-written assembler instructions. Beginning with Microsoft Windows 2000, functions for Interlocked Variable Access are available that are similar to the built-in functions provided by GCC. On Windows 2000 and higher, InnoDB makes use of the Interlocked functions. Unlike the old hand-written assembler code, the new implementation supports read/write locks and 64-bit platforms.

Solaris 10 introduced library functions for atomic operations, and InnoDB uses these functions by default. When MySQL is compiled on Solaris 10 with a compiler that does not support the built-in functions provided by the GNU Compiler Collection (GCC) for atomic memory access, InnoDB uses the library functions.

This change improves the scalability of InnoDB on multi-core systems. This feature is enabled out-of-the-box on the platforms where it is supported. You do not have to set any parameter or option to take advantage of the improved performance. On platforms where the GCC, Windows, or Solaris functions for atomic memory access are not available, InnoDB uses the traditional Pthreads method of implementing mutexes and read/write locks.

When MySQL starts, InnoDB writes a message to the log file indicating whether atomic memory access is used for mutexes, for mutexes and read/write locks, or neither. If suitable tools are used to build InnoDB and the target CPU supports the atomic operations required, InnoDB uses the built-in functions for mutexing. If, in addition, the compare-and-swap operation can be used on thread identifiers (pthread_t), then InnoDB uses the instructions for read-write locks as well.

Note: If you are building from source, ensure that the build process properly takes advantage of your platform capabilities.

For more information about the performance implications of locking, see Section 8.10, "Optimizing Locking Operations".

14.4.7.3. Using Operating System Memory Allocators

When InnoDB was developed, the memory allocators supplied with operating systems and run-time libraries were often lacking in performance and scalability. At that time, there were no memory allocator libraries tuned for multi-core CPUs. Therefore, InnoDB implemented its own memory allocator in the mem subsystem. This allocator is guarded by a single mutex, which may become a bottleneck. InnoDB also implements a wrapper interface around the system allocator (malloc and free) that is likewise guarded by a single mutex.

Today, as multi-core systems have become more widely available, and as operating systems have matured, significant improvements have been made in the memory allocators provided with operating systems. New memory allocators perform better and are more scalable than they were in the past. The leading high-performance memory allocators include Hoard, libumem, mtmalloc, ptmalloc, tbbmalloc, and TCMalloc. Most workloads, especially those where memory is frequently allocated and released (such as multi-table joins), benefit from using a more highly tuned memory allocator as opposed to the internal, InnoDB-specific memory allocator.

You can control whether InnoDB uses its own memory allocator or an allocator of the operating system by setting the value of the system configuration parameter innodb_use_sys_malloc in the MySQL option file (my.cnf or my.ini). If set to ON or 1 (the default), InnoDB uses the malloc and free functions of the underlying system rather than managing memory pools itself. This parameter is not dynamic; it takes effect only when the server starts. To continue to use the InnoDB memory allocator, set innodb_use_sys_malloc to 0.

Note

When the InnoDB memory allocator is disabled, InnoDB ignores the value of the parameter innodb_additional_mem_pool_size. The InnoDB memory allocator uses an additional memory pool for satisfying allocation requests without having to fall back to the system memory allocator. When the InnoDB memory allocator is disabled, all such allocation requests are fulfilled by the system memory allocator.

On Unix-like systems that use dynamic linking, replacing the memory allocator may be as easy as making the environment variable LD_PRELOAD or LD_LIBRARY_PATH point to the dynamic library that implements the allocator. On other systems, some relinking may be necessary. Please refer to the documentation of the memory allocator library of your choice.

Since InnoDB cannot track all memory use when the system memory allocator is used (innodb_use_sys_malloc is ON), the section "BUFFER POOL AND MEMORY" in the output of the SHOW ENGINE INNODB STATUS command only includes the buffer pool statistics in the "Total memory allocated". Any memory allocated using the mem subsystem or using ut_malloc is excluded.

For more information about the performance implications of InnoDB memory usage, see Section 8.9, "Buffering and Caching".

14.4.7.4. Controlling InnoDB Change Buffering

When INSERT, UPDATE, and DELETE operations are done to a table, often the values of indexed columns (particularly the values of secondary keys) are not in sorted order, requiring substantial I/O to bring secondary indexes up to date. InnoDB has an insert buffer that caches changes to secondary index entries when the relevant page is not in the buffer pool, thus avoiding the I/O of reading the page in from disk. The buffered changes are merged when the page is loaded into the buffer pool, and the updated page is later flushed to disk using the normal mechanism. The InnoDB main thread merges buffered changes when the server is nearly idle, and during a slow shutdown.

Because it can result in fewer disk reads and writes, this feature is most valuable for workloads that are I/O-bound, for example applications with a high volume of DML operations such as bulk inserts.

However, the insert buffer occupies a part of the buffer pool, reducing the memory available to cache data pages. If the working set almost fits in the buffer pool, or if your tables have relatively few secondary indexes, it may be useful to disable insert buffering. If the working set entirely fits in the buffer pool, insert buffering does not impose any extra overhead, because it only applies to pages that are not in the buffer pool.

You can control the extent to which InnoDB performs insert buffering with the system configuration parameter innodb_change_buffering. You can turn buffering on or off for inserts, delete operations (when index records are initially marked for deletion), and purge operations (when index records are physically deleted). An update operation is represented as a combination of an insert and a delete. In MySQL 5.5 and higher, the default value is changed from inserts to all.

The allowed values of innodb_change_buffering are:

all
The default value: buffer inserts, delete-marking operations, and purges.
none
Do not buffer any operations.
inserts
Buffer insert operations.
deletes
Buffer delete-marking operations.
changes
Buffer both inserts and delete-marking.
purges
Buffer the physical deletion operations that happen in the background.
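The following Python sketch shows which secondary-index operations each setting allows to be buffered, treating an UPDATE as a delete-mark plus an insert as described above. The function name and the operation labels are hypothetical, and the model ignores everything else InnoDB considers when deciding whether to buffer a change.

```python
# Operations each innodb_change_buffering value allows to be buffered.
BUFFERED_OPS = {
    "all": {"insert", "delete_mark", "purge"},
    "none": set(),
    "inserts": {"insert"},
    "deletes": {"delete_mark"},
    "changes": {"insert", "delete_mark"},
    "purges": {"purge"},
}

# Secondary-index operations generated by each kind of change (hypothetical labels).
OPS_FOR_CHANGE = {
    "INSERT": {"insert"},
    "DELETE": {"delete_mark"},            # marks the index record for deletion
    "UPDATE": {"delete_mark", "insert"},  # an update is a delete-mark plus an insert
    "PURGE": {"purge"},                   # background physical deletion
}


def bufferable_ops(setting, change, page_in_buffer_pool):
    """Return the set of operations that could be sent to the change buffer."""
    if page_in_buffer_pool:
        return set()  # buffering only applies to pages not in the buffer pool
    return OPS_FOR_CHANGE[change] & BUFFERED_OPS[setting]
```

For example, with innodb_change_buffering=inserts, only the insert half of an UPDATE against a page that is not cached would be buffered.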

You can set the value of this parameter in the MySQL option file (my.cnf or my.ini) or change it dynamically with the SET GLOBAL command, which requires the SUPER privilege. Changing the setting affects the buffering of new operations; the merging of already buffered entries is not affected.

For more information about speeding up INSERT, UPDATE, and DELETE statements, see Section 8.2.2, "Optimizing DML Statements".

14.4.7.5. Controlling Adaptive Hash Indexing

If a table fits almost entirely in main memory, the fastest way to perform queries on it is to use hash indexes rather than B-tree lookups. MySQL monitors searches on each index defined for an InnoDB table. If it notices that certain index values are being accessed frequently, it automatically builds an in-memory hash table for that index. See Section 14.3.11.4, "Adaptive Hash Indexes" for background information and usage guidelines for the adaptive hash index feature and the innodb_adaptive_hash_index configuration option.

14.4.7.6. Changes Regarding Thread Concurrency

InnoDB uses operating system threads to process requests from user transactions. (Transactions may issue many requests to InnoDB before they commit or roll back.) On modern operating systems and servers with multi-core processors, where context switching is efficient, most workloads run well without any limit on the number of concurrent threads. Scalability improvements in MySQL 5.5 and up reduce the need to limit the number of concurrently executing threads inside InnoDB.

In situations where it is helpful to minimize context switching between threads, InnoDB can use a number of techniques to limit the number of concurrently executing operating system threads (and thus the number of requests that are processed at any one time). When InnoDB receives a new request from a user session, if the number of threads concurrently executing is at a pre-defined limit, the new request sleeps for a short time before it tries again. A request that cannot be rescheduled after the sleep is put in a first-in/first-out queue and eventually is processed. Threads waiting for locks are not counted in the number of concurrently executing threads.

You can limit the number of concurrent threads by setting the configuration parameter innodb_thread_concurrency. Once the number of executing threads reaches this limit, additional threads sleep for a number of microseconds, set by the configuration parameter innodb_thread_sleep_delay, before being placed into the queue.

The default value for innodb_thread_concurrency and the implied default limit on the number of concurrent threads has been changed in various releases of MySQL and InnoDB. Currently, the default value of innodb_thread_concurrency is 0, so that by default there is no limit on the number of concurrently executing threads, as shown in Table 14.7, "Changes to innodb_thread_concurrency".

Table 14.7. Changes to innodb_thread_concurrency

InnoDB Version	MySQL Version	Default value	Default limit of concurrent threads	Value to allow unlimited threads
Built-in	Earlier than 5.1.11	20	No limit	20 or higher
Built-in	5.1.11 and newer	8	8	0
InnoDB before 1.0.3	(corresponding to Plugin)	8	8	0
InnoDB 1.0.3 and newer	(corresponding to Plugin)	0	No limit	0

Note that InnoDB causes threads to sleep only when the number of concurrent threads is limited. When there is no limit on the number of threads, all contend equally to be scheduled. That is, if innodb_thread_concurrency is 0, the value of innodb_thread_sleep_delay is ignored.

When there is a limit on the number of threads, InnoDB reduces context switching overhead by permitting multiple requests made during the execution of a single SQL statement to enter InnoDB without observing the limit set by innodb_thread_concurrency. Since an SQL statement (such as a join) may comprise multiple row operations within InnoDB, InnoDB assigns "tickets" that allow a thread to be scheduled repeatedly with minimal overhead.

When a new SQL statement starts, a thread has no tickets, and it must observe innodb_thread_concurrency. Once the thread is entitled to enter InnoDB, it is assigned a number of tickets that it can use for subsequently entering InnoDB. If the tickets run out, innodb_thread_concurrency is observed again and further tickets are assigned. The number of tickets to assign is specified by the global option innodb_concurrency_tickets, which is 500 by default. A thread that is waiting for a lock is given one ticket once the lock becomes available.
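A toy, single-threaded Python model of the admission rules described above: a concurrency limit of 0 means no limit, an admitted thread receives innodb_concurrency_tickets tickets, each later entry spends one ticket without checking the limit, and a thread whose tickets have run out must observe the limit again. The class and method names are hypothetical, and the model omits the sleep, the FIFO queue, and lock-wait handling.

```python
class ConcurrencyGate:
    """Hypothetical model of innodb_thread_concurrency plus concurrency tickets."""

    def __init__(self, thread_concurrency, concurrency_tickets=500):
        self.limit = thread_concurrency          # 0 means no limit
        self.tickets_per_grant = concurrency_tickets
        self.inside = set()                      # threads currently holding a slot
        self.tickets = {}                        # thread -> remaining tickets

    def enter(self, thread):
        """Return True if the thread may enter InnoDB now, False if it must wait."""
        if self.limit == 0:
            return True  # unlimited: innodb_thread_sleep_delay is ignored
        if self.tickets.get(thread, 0) > 0:
            self.tickets[thread] -= 1  # re-entry paid for with a ticket
            return True
        self.inside.discard(thread)    # out of tickets: observe the limit again
        if len(self.inside) >= self.limit:
            return False               # would sleep, then join the FIFO queue
        self.inside.add(thread)
        self.tickets[thread] = self.tickets_per_grant
        return True

    def end_statement(self, thread):
        """A new statement starts with no tickets, so release the slot here."""
        self.inside.discard(thread)
        self.tickets.pop(thread, None)
```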

The correct values of these variables depend on your environment and workload. Try a range of different values to determine what value works for your applications. Before limiting the number of concurrently executing threads, review configuration options that may improve the performance of InnoDB on multi-core and multi-processor computers, such as innodb_use_sys_malloc and innodb_adaptive_hash_index.

For general performance information about MySQL thread handling, see Section 8.11.5.1, "How MySQL Uses Threads for Client Connections".

14.4.7.7. Changes in the Read-Ahead Algorithm

A read-ahead request is an I/O request to prefetch multiple pages in the buffer pool asynchronously, in anticipation that these pages will be needed soon. InnoDB uses or has used two read-ahead algorithms to improve I/O performance:

Linear read-ahead is based on the access pattern of the pages in the buffer pool, not just their number. You can control when InnoDB performs a read-ahead operation by adjusting the number of sequential page accesses required to trigger an asynchronous read request, using the configuration parameter innodb_read_ahead_threshold. Before this parameter was added, InnoDB would only calculate whether to issue an asynchronous prefetch request for the entire next extent when it read in the last page of the current extent.

Random read-ahead is a former technique that was removed in MySQL 5.5. If a certain number of pages from the same extent (64 consecutive pages) were found in the buffer pool, InnoDB asynchronously issued a request to prefetch the remaining pages of the extent. Random read-ahead added unnecessary complexity to the InnoDB code and often resulted in performance degradation rather than improvement. This feature is no longer part of InnoDB, and users should generally see equivalent or improved performance.

If the number of pages read from an extent of 64 pages is greater than or equal to innodb_read_ahead_threshold, InnoDB initiates an asynchronous read-ahead operation of the entire following extent. Thus, this parameter controls how sensitive InnoDB is to the pattern of page accesses within an extent in deciding whether to read the following extent asynchronously. The higher the value, the more strict the access pattern check. For example, if you set the value to 48, InnoDB triggers a linear read-ahead request only when 48 pages in the current extent have been accessed sequentially. If the value is 8, InnoDB triggers an asynchronous read-ahead even if as few as 8 pages in the extent were accessed sequentially.

The new configuration parameter innodb_read_ahead_threshold can be set to any value from 0 to 64. The default value is 56, meaning that an asynchronous read-ahead is performed only when 56 of the 64 pages in the extent are accessed sequentially. You can set the value of this parameter in the MySQL option file (my.cnf or my.ini), or change it dynamically with the SET GLOBAL command, which requires the SUPER privilege.
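A simplified Python model of the threshold check. Real InnoDB inspects the access order of pages within the extent in more detail; here, as an assumption, an access counts as sequential when its page offset is higher than that of the previously accessed page. The function name is hypothetical.

```python
EXTENT_PAGES = 64  # pages per extent, as stated in the text


def should_read_ahead(access_order, threshold=56):
    """access_order: page offsets (0-63) in the order they were first accessed.

    Simplified model: count accesses that continue an ascending pattern and
    compare the count with innodb_read_ahead_threshold.
    """
    if not 0 <= threshold <= EXTENT_PAGES:
        raise ValueError("innodb_read_ahead_threshold must be between 0 and 64")
    sequential = 0
    prev = None
    for page in access_order:
        if prev is None or page > prev:
            sequential += 1
        prev = page
    return sequential >= threshold
```

With the default of 56, reading the first 55 pages of an extent in order does not trigger a prefetch of the next extent, while the 56th sequential page does.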

The SHOW ENGINE INNODB STATUS command displays statistics to help you evaluate the effectiveness of the read-ahead algorithm. See Section 14.4.8.8, "More Read-Ahead Statistics" for more information.

For more information about I/O performance, see Section 8.5.7, "Optimizing InnoDB Disk I/O" and Section 8.11.3, "Optimizing Disk I/O".

14.4.7.8. Multiple Background I/O Threads

InnoDB uses background threads to service various types of I/O requests. You can configure the number of background threads that service read and write I/O on data pages, using the configuration parameters innodb_read_io_threads and innodb_write_io_threads. These parameters signify the number of background threads used for read and write requests respectively. They are effective on all supported platforms. You can set the value of these parameters in the MySQL option file (my.cnf or my.ini); you cannot change them dynamically. The default value for these parameters is 4 and the permissible values range from 1-64.

These parameters replace innodb_file_io_threads from earlier versions of MySQL. If you try to set a value for this obsolete parameter, a warning is written to the log file and the value is ignored. This parameter only applied to Windows platforms. (On non-Windows platforms, there was only one thread each for read and write.)

The purpose of this change is to make InnoDB more scalable on high-end systems. Each background thread can handle up to 256 pending I/O requests. A major source of background I/O is read-ahead requests. InnoDB tries to balance the load of incoming requests in such a way that most of the background threads share work equally. InnoDB also attempts to allocate read requests from the same extent to the same thread to increase the chances of coalescing the requests. If you have a high-end I/O subsystem and you see more than 64 × innodb_read_io_threads pending read requests in SHOW ENGINE INNODB STATUS, you might gain by increasing the value of innodb_read_io_threads.
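The two rules of thumb in this paragraph can be expressed as small helper functions. The extent-to-thread mapping below is a hypothetical illustration of keeping an extent's reads on one thread, not the assignment InnoDB actually uses; the 64 × innodb_read_io_threads check comes directly from the text.

```python
PAGES_PER_EXTENT = 64
PENDING_READS_PER_THREAD = 64  # multiplier from the rule of thumb in the text


def read_thread_for_page(page_no, read_io_threads):
    """Hypothetical mapping that sends every page of an extent to one thread."""
    return (page_no // PAGES_PER_EXTENT) % read_io_threads


def may_benefit_from_more_read_threads(pending_reads, read_io_threads):
    """True when pending reads exceed 64 x innodb_read_io_threads."""
    if not 1 <= read_io_threads <= 64:
        raise ValueError("innodb_read_io_threads must be between 1 and 64")
    return pending_reads > PENDING_READS_PER_THREAD * read_io_threads
```

With the default of 4 read threads, the threshold is 256 pending reads.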

For more information about InnoDB I/O performance, see Section 8.5.7, "Optimizing InnoDB Disk I/O".

14.4.7.9. Asynchronous I/O on Linux

Starting in InnoDB 1.1 with MySQL 5.5, the asynchronous I/O capability that InnoDB has had on Windows systems is now available on Linux systems. (Other Unix-like systems continue to use synchronous I/O calls.) This feature improves the scalability of heavily I/O-bound systems, which typically show many pending reads/writes in the output of the command SHOW ENGINE INNODB STATUS\G.

Running with a large number of InnoDB I/O threads, and especially running multiple such instances on the same server machine, can exceed capacity limits on Linux systems. In this case, you may receive the following error:

EAGAIN: The specified maxevents exceeds the user's limit of available events.

You can typically fix this error by writing a higher limit to /proc/sys/fs/aio-max-nr.

In general, if a problem with the asynchronous I/O subsystem in the OS prevents InnoDB from starting, set the option innodb_use_native_aio=0 in the configuration file. This new configuration option applies to Linux systems only, and cannot be changed once the server is running.

For more information about InnoDB I/O performance, see Section 8.5.7, "Optimizing InnoDB Disk I/O".

14.4.7.10. Group Commit

InnoDB, like any other ACID-compliant database engine, flushes the redo log of a transaction before it is committed. Historically, InnoDB used group commit functionality to group multiple such flush requests together to avoid one flush for each commit. With group commit, InnoDB issues a single write to the log file to perform the commit action for multiple user transactions that commit at about the same time, significantly improving throughput.
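A simplified Python model of group commit: when the log is idle, a committing transaction starts a flush at once, and transactions that commit while that flush is running are all covered by the next single write. The function name and the fixed flush duration are assumptions made for illustration.

```python
def group_commit_batches(arrival_times, flush_duration):
    """Return the number of commits covered by each log flush (toy model)."""
    arrivals = sorted(arrival_times)
    batches = []
    log_free_at = float("-inf")
    i = 0
    while i < len(arrivals):
        # The next flush starts once the log is free and a commit is waiting.
        start = max(arrivals[i], log_free_at)
        j = i
        while j < len(arrivals) and arrivals[j] <= start:
            j += 1  # every commit that has arrived by `start` joins this flush
        batches.append(j - i)
        log_free_at = start + flush_duration
        i = j
    return batches
```

For example, five commits arriving at 0.0, 0.1, 0.2, 0.3, and 5.0 seconds with a 1-second flush need three flushes (batches of 1, 3, and 1) instead of five.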

Group commit in InnoDB worked until MySQL 4.x, and works once again with MySQL 5.1 with the InnoDB Plugin, and MySQL 5.5 and higher. The introduction of support for distributed transactions and Two Phase Commit (2PC) in MySQL 5.0 interfered with the InnoDB group commit functionality. This issue is now resolved.

The group commit functionality inside InnoDB works with the Two Phase Commit protocol in MySQL. Re-enabling group commit preserves the ordering of commits in the MySQL binary log and the InnoDB log file, which remains the same as it was before. As a result, it is safe to use MySQL Enterprise Backup with InnoDB 1.0.4 (that is, the InnoDB Plugin with MySQL 5.1) and above. When the binary log is enabled, you typically also set the configuration option sync_binlog=0, because group commit for the binary log is only supported if it is set to 0.

Group commit is transparent; you do not need to do anything to take advantage of this significant performance improvement.

For more information about performance of COMMIT and other transactional operations, see Section 8.5.2, "Optimizing InnoDB Transaction Management".

14.4.7.11. Controlling the Master Thread I/O Rate

The master thread in InnoDB is a thread that performs various tasks in the background. Most of these tasks are I/O related, such as flushing dirty pages from the buffer pool or writing changes from the insert buffer to the appropriate secondary indexes. The master thread attempts to perform these tasks in a way that does not adversely affect the normal working of the server. It tries to estimate the free I/O bandwidth available and tune its activities to take advantage of this free capacity. Historically, InnoDB has used a hard-coded value of 100 IOPS (input/output operations per second) as the total I/O capacity of the server.

The parameter innodb_io_capacity indicates the overall I/O capacity available to InnoDB. Set it to approximately the number of I/O operations that the system can perform per second; the value depends on your system configuration. When innodb_io_capacity is set, the master thread estimates the I/O bandwidth available for background tasks based on the set value. Setting the value to 100 reverts to the old behavior.

You can set the value of innodb_io_capacity to any number 100 or greater. The default value is 200, reflecting that the performance of typical modern I/O devices is higher than in the early days of MySQL. Typically, values around the previous default of 100 are appropriate for consumer-level storage devices, such as hard drives up to 7200 RPM. Faster hard drives, RAID configurations, and SSDs benefit from higher values.

You can set the value of this parameter in the MySQL option file (my.cnf or my.ini) or change it dynamically with the SET GLOBAL command, which requires the SUPER privilege.

For more information about InnoDB I/O performance, see Section 8.5.7, "Optimizing InnoDB Disk I/O".

14.4.7.12. Controlling the Flushing Rate of Dirty Pages

InnoDB performs certain tasks in the background, including flushing of dirty pages (those pages that have been changed but are not yet written to the database files) from the buffer pool, a task performed by the master thread. Currently, InnoDB aggressively flushes buffer pool pages if the percentage of dirty pages in the buffer pool exceeds innodb_max_dirty_pages_pct.

InnoDB uses a new algorithm to estimate the required rate of flushing, based on the speed of redo log generation and the current rate of flushing. The intent is to smooth overall performance by ensuring that buffer flush activity keeps up with the need to keep the buffer pool "clean". Automatically adjusting the rate of flushing can help to avoid steep dips in throughput, when excessive buffer pool flushing limits the I/O capacity available for ordinary read and write activity.

InnoDB uses its log files in a circular fashion. Before reusing a portion of a log file, InnoDB flushes to disk all dirty buffer pool pages whose redo entries are contained in that portion of the log file, a process known as a sharp checkpoint. If a workload is write-intensive, it generates a lot of redo information, all written to the log file. If all available space in the log files is used up, a sharp checkpoint occurs, causing a temporary reduction in throughput. This situation can happen even though innodb_max_dirty_pages_pct is not reached.

InnoDB uses a heuristic-based algorithm to avoid such a scenario, by measuring the number of dirty pages in the buffer pool and the rate at which redo is being generated. Based on these numbers, InnoDB decides how many dirty pages to flush from the buffer pool each second. This self-adapting algorithm is able to deal with sudden changes in the workload.

Internal benchmarking has also shown that this algorithm not only maintains throughput over time, but can also improve overall throughput significantly.

Because adaptive flushing is a new feature that can significantly affect the I/O pattern of a workload, a new configuration parameter lets you turn off this feature. The default value of the boolean parameter innodb_adaptive_flushing is TRUE, enabling the new algorithm. You can set the value of this parameter in the MySQL option file (my.cnf or my.ini) or change it dynamically with the SET GLOBAL command, which requires the SUPER privilege.

For more information about InnoDB I/O performance, see Section 8.5.7, "Optimizing InnoDB Disk I/O".

14.4.7.13. Using the PAUSE Instruction in InnoDB Spin Loops

Synchronization inside InnoDB frequently involves the use of spin loops: while waiting, InnoDB executes a tight loop of instructions repeatedly to avoid having the InnoDB process and threads be rescheduled by the operating system. If the spin loops are executed too quickly, system resources are wasted, imposing a performance penalty on transaction throughput. Most modern processors implement the PAUSE instruction for use in spin loops, so the processor can be more efficient.

InnoDB uses a PAUSE instruction in its spin loops on all platforms where such an instruction is available. This technique increases overall performance with CPU-bound workloads, and has the added benefit of minimizing power consumption during the execution of the spin loops.

You do not have to do anything to take advantage of this performance improvement.

For performance considerations for InnoDB locking operations, see Section 8.10, "Optimizing Locking Operations".

14.4.7.14. Control of Spin Lock Polling

Many InnoDB mutexes and rw-locks are reserved for a short time. On a multi-core system, it can be more efficient for a thread to continuously check if it can acquire a mutex or rw-lock for a while before sleeping. If the mutex or rw-lock becomes available during this polling period, the thread can continue immediately, in the same time slice. However, too-frequent polling of a shared object by multiple threads can cause "cache ping pong", with different processors invalidating portions of each other's caches. InnoDB minimizes this issue by waiting a random time between subsequent polls. The delay is implemented as a busy loop.

You can control the maximum delay between testing a mutex or rw-lock using the parameter innodb_spin_wait_delay. The duration of the delay loop depends on the C compiler and the target processor. (In the 100MHz Pentium era, the unit of delay was one microsecond.) On a system where all processor cores share a fast cache memory, you might reduce the maximum delay or disable the busy loop altogether by setting innodb_spin_wait_delay=0. On a system with multiple processor chips, the effect of cache invalidation can be more significant and you might increase the maximum delay.

The default value of innodb_spin_wait_delay is 6. The spin wait delay is a dynamic global parameter that you can specify in the MySQL option file (my.cnf or my.ini) or change at runtime with the command SET GLOBAL innodb_spin_wait_delay=delay, where delay is the desired maximum delay. Changing the setting requires the SUPER privilege.
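A Python sketch of spin-then-block lock acquisition with a random delay between polls. The number of polling rounds and the loop iterations per delay unit are arbitrary assumptions (the text notes that the real unit depends on the compiler and processor), and a threading.Lock stands in for an InnoDB mutex.

```python
import random
import threading

# Arbitrary stand-ins for illustration; these are not InnoDB internals.
SPIN_ROUNDS = 30          # polls before giving up and blocking
ITERATIONS_PER_UNIT = 50  # busy-loop iterations per "delay unit"


def busy_delay(units):
    """Burn CPU for roughly `units` delay units (a busy loop, no sleeping)."""
    for _ in range(units * ITERATIONS_PER_UNIT):
        pass


def acquire_with_spin(lock, spin_wait_delay=6, rng=random):
    """Poll `lock`, waiting a random 0..spin_wait_delay units between polls.

    Returns True if the lock was obtained while spinning, False if the caller
    had to block (the analogue of the thread going to sleep).
    """
    for _ in range(SPIN_ROUNDS):
        if lock.acquire(blocking=False):
            return True
        # A random gap between polls makes it less likely that many threads
        # poll the same shared object at the same moment.
        busy_delay(rng.randint(0, spin_wait_delay))  # 0 disables the busy loop
    lock.acquire()  # stop spinning and wait for the lock
    return False
```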

For performance considerations for InnoDB locking operations, see Section 8.10, "Optimizing Locking Operations".

14.4.7.15. Making the Buffer Pool Scan Resistant

Rather than using a strictly LRU algorithm, InnoDB uses a technique to minimize the amount of data that is brought into the buffer pool and never accessed again. The goal is to make sure that frequently accessed ("hot") pages remain in the buffer pool, even as read-ahead and full table scans bring in new blocks that might or might not be accessed afterward.

Newly read blocks are inserted into the middle of the LRU list that represents the buffer pool. By default, all newly read pages are inserted at a location 3/8 from the tail of the LRU list. The pages are moved to the front of the list (the most-recently used end) when they are accessed in the buffer pool for the first time. Thus, pages that are never accessed never make it to the front portion of the LRU list, and "age out" sooner than with a strict LRU approach. This arrangement divides the LRU list into two segments: the pages downstream of the insertion point are considered "old" and are desirable victims for LRU eviction.
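To make the midpoint insertion strategy concrete, here is a toy Python model of an LRU list that inserts newly read pages a configurable percentage from the tail and moves a page to the head when it is accessed. It is a simplification for illustration (the class `MidpointLRU` and its methods are hypothetical), not InnoDB's buffer pool code; for example, it treats every access as making a page young, as with the default innodb_old_blocks_time=0.

```python
class MidpointLRU:
    """Toy model of midpoint insertion. pages[0] is the most-recently-used
    head; pages[-1] is the tail, where eviction happens."""

    def __init__(self, capacity, old_pct=37):
        self.capacity = capacity
        self.old_pct = old_pct  # share of the list treated as "old"
        self.pages = []

    def _midpoint(self):
        # Index where the "old" segment starts: old_pct percent from the tail.
        old_len = len(self.pages) * self.old_pct // 100
        return len(self.pages) - old_len

    def read(self, page):
        """Bring a page in without using it (read-ahead or scan)."""
        if page in self.pages:
            return
        if len(self.pages) >= self.capacity:
            self.pages.pop()  # evict from the tail of the old segment
        self.pages.insert(self._midpoint(), page)

    def access(self, page):
        """Read the page if needed, then move it to the head ("make it young")."""
        self.read(page)
        self.pages.remove(page)
        self.pages.insert(0, page)


def run(pool):
    for p in range(100, 108):  # fill the pool with cold pages
        pool.read(p)
    for p in (1, 2, 3):        # three hot pages are used
        pool.access(p)
    for p in range(200, 300):  # a long burst of read-ahead pages
        pool.read(p)
    return pool


midpoint = run(MidpointLRU(8, old_pct=37))
strict = run(MidpointLRU(8, old_pct=100))
```

With the default 37% old segment the three hot pages survive the read-ahead burst, because new pages only ever land in, and are evicted from, the old segment. `old_pct=100` is outside the real 5 to 95 range and is used here only because it degenerates to strict LRU (insertion at the head), which evicts the hot pages.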

For an explanation of the inner workings of the InnoDB buffer pool and the specifics of its LRU replacement algorithm, see Section 8.9.1, "The InnoDB Buffer Pool".

You can control the insertion point in the LRU list, and choose whether InnoDB applies the same optimization to blocks brought into the buffer pool by table or index scans. The configuration parameter innodb_old_blocks_pct controls the percentage of "old" blocks in the LRU list. The default value of innodb_old_blocks_pct is 37, corresponding to the original fixed ratio of 3/8. The value range is 5 (new pages in the buffer pool age out very quickly) to 95 (only 5% of the buffer pool is reserved for hot pages, making the algorithm close to the familiar LRU strategy).

The same optimization that keeps read-ahead from churning the buffer pool can also avoid similar problems caused by table or index scans. In these scans, a data page is typically accessed a few times in quick succession and is never touched again. The configuration parameter innodb_old_blocks_time specifies the time window (in milliseconds) after the first access to a page during which it can be accessed without being moved to the front (most-recently used end) of the LRU list. The default value of innodb_old_blocks_time is 0, corresponding to the original behavior of moving a page to the most-recently used end of the buffer pool list when it is first accessed in the buffer pool. Increasing this value makes more blocks likely to age out of the buffer pool faster.
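The time-window rule can be expressed as a small decision function. This Python sketch models only what the paragraph above states; the function names are hypothetical, and the model ignores everything else InnoDB considers when repositioning pages.

```python
from typing import Optional


def makes_page_young(first_access_ms: Optional[int], now_ms: int,
                     old_blocks_time_ms: int) -> bool:
    """Decide whether an access to a page in the "old" LRU segment moves it
    to the most-recently-used end. first_access_ms is None on the first access."""
    if old_blocks_time_ms == 0:
        return True   # default: move the page on its first (or any later) access
    if first_access_ms is None:
        return False  # the first access only starts the time window
    return now_ms - first_access_ms >= old_blocks_time_ms


def scan_page_made_young(access_times_ms, old_blocks_time_ms) -> bool:
    """Replay the accesses to one page and report whether any made it young."""
    first = None
    for t in access_times_ms:
        if makes_page_young(first, t, old_blocks_time_ms):
            return True
        if first is None:
            first = t
    return False
```

A page touched at 0, 1, and 2 ms by a scan stays in the old segment with a 1000 ms window, but is made young immediately with the default of 0.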

Both new parameters, innodb_old_blocks_pct and innodb_old_blocks_time, are dynamic and global, and can be specified in the MySQL option file (my.cnf or my.ini) or changed at runtime with the SET GLOBAL command. Changing the setting requires the SUPER privilege.

To help you gauge the effect of setting these parameters, the SHOW ENGINE INNODB STATUS command reports additional statistics. The BUFFER POOL AND MEMORY section now looks like:

Total memory allocated 1107296256; in additional pool allocated 0
Dictionary memory allocated 80360
Buffer pool size   65535
Free buffers       0
Database pages     63920
Old database pages 23600
Modified db pages  34969
Pending reads 32
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 414946, not young 2930673
1274.75 youngs/s, 16521.90 non-youngs/s
Pages read 486005, created 3178, written 1605852
132.37 reads/s, 3.40 creates/s, 323.74 writes/s
Buffer pool hit rate 950 / 1000, young-making rate 30 / 1000 not 392 / 1000
Pages read ahead 1510.10/s, evicted without access 0.00/s
LRU len: 63920, unzip_LRU len: 0
I/O sum[43690]:cur[221], unzip sum[0]:cur[0]

Old database pages is the number of pages in the "old" segment of the LRU list.
Pages made young and not young are the total numbers of "old" pages that have been made young and not made young, respectively.
youngs/s and non-youngs/s are the rates at which accesses to "old" pages have resulted in making those pages young or not, respectively, since the last invocation of the command.
young-making rate and not provide the same rates, but in terms of overall buffer pool accesses rather than accesses to "old" pages only.
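If you want to track these counters over time, a small parser can pull them out of the status text. The following Python sketch assumes the line layout shown in the sample output in this section; other server versions may format these lines differently, so treat the regular expressions (and the hypothetical `parse_lru_stats` name) as a starting point.

```python
import re

STATUS_SAMPLE = """Old database pages 23600
Pages made young 414946, not young 2930673
1274.75 youngs/s, 16521.90 non-youngs/s
Buffer pool hit rate 950 / 1000, young-making rate 30 / 1000 not 392 / 1000
"""


def parse_lru_stats(text: str) -> dict:
    """Extract the LRU-related counters from SHOW ENGINE INNODB STATUS text."""
    stats = {}
    m = re.search(r"Old database pages (\d+)", text)
    if m:
        stats["old_pages"] = int(m.group(1))
    m = re.search(r"Pages made young (\d+), not young (\d+)", text)
    if m:
        stats["made_young"] = int(m.group(1))
        stats["not_young"] = int(m.group(2))
    m = re.search(r"([\d.]+) youngs/s, ([\d.]+) non-youngs/s", text)
    if m:
        stats["youngs_per_s"] = float(m.group(1))
        stats["non_youngs_per_s"] = float(m.group(2))
    m = re.search(r"young-making rate (\d+) / 1000 not (\d+) / 1000", text)
    if m:
        stats["young_per_1000"] = int(m.group(1))
        stats["not_young_per_1000"] = int(m.group(2))
    return stats


stats = parse_lru_stats(STATUS_SAMPLE)
```

Comparing two parsed samples taken before and after a parameter change gives a quick, if rough, view of whether more or fewer "old" pages are being made young.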

Because the effects of these parameters can vary widely based on your hardware configuration, your data, and the details of your workload, always benchmark to verify the effectiveness before changing these settings in any performance-critical or production environment.

In mixed workloads where most of the activity is OLTP type, with periodic batch reporting queries that result in large scans, setting innodb_old_blocks_time to a nonzero value during the batch runs can help keep the working set of the normal workload in the buffer pool.
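One way to apply this only while a batch job runs is to change the global setting and restore it afterward. The Python sketch below assumes a PEP 249 (DB-API) cursor on a connection with the SUPER privilege; `temporary_old_blocks_time` is a hypothetical helper, and `RecordingCursor` is a stand-in that records statements so the helper can be tried without a server. Keep in mind that the setting is global, so it also affects other sessions running during the batch.

```python
from contextlib import contextmanager


@contextmanager
def temporary_old_blocks_time(cursor, ms):
    """Set innodb_old_blocks_time for the duration of a block, then restore it."""
    cursor.execute("SELECT @@GLOBAL.innodb_old_blocks_time")
    (previous,) = cursor.fetchone()
    # int() guards against anything other than a number reaching the SQL text.
    cursor.execute("SET GLOBAL innodb_old_blocks_time = %d" % int(ms))
    try:
        yield
    finally:
        cursor.execute("SET GLOBAL innodb_old_blocks_time = %d" % int(previous))


class RecordingCursor:
    """Stand-in cursor that records statements (illustration only)."""

    def __init__(self, current_value):
        self.statements = []
        self._current = current_value

    def execute(self, sql):
        self.statements.append(sql)

    def fetchone(self):
        return (self._current,)


cur = RecordingCursor(0)
with temporary_old_blocks_time(cur, 1000):
    cur.execute("SELECT ... /* batch reporting query */")
```

The `finally` clause restores the previous value even if the batch query raises an error.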

When scanning large tables that cannot fit entirely in the buffer pool, setting innodb_old_blocks_pct to a small value keeps data that is read only once from consuming a significant portion of the buffer pool. For example, setting innodb_old_blocks_pct=5 restricts such data to 5% of the buffer pool.

When scanning small tables that do fit into memory, there is less overhead for moving pages around within the buffer pool, so you can leave innodb_old_blocks_pct at its default value, or even higher, such as innodb_old_blocks_pct=50.

The effect of the innodb_old_blocks_time parameter is harder to predict than that of innodb_old_blocks_pct, is relatively small, and varies more with the workload. If the performance improvement from adjusting innodb_old_blocks_pct is not sufficient, conduct your own benchmarks to arrive at an optimal value for innodb_old_blocks_time.

For more information about the InnoDB buffer pool, see Section 8.9.1, "The InnoDB Buffer Pool".

14.4.7.16. Improvements to Crash Recovery Performance

A number of optimizations speed up certain steps of the recovery that happens on the next startup after a crash. In particular, scanning the redo log and applying the redo log are faster than in MySQL 5.1 and earlier, due to improved algorithms for memory management. You do not need to take any action to take advantage of this performance enhancement. If you kept the size of your redo log files artificially low because recovery took a long time, you can consider increasing the file size.

For more information about InnoDB recovery, see Section 14.3.7.1, "The InnoDB Recovery Process".

14.4.7.17. Integration with the MySQL Performance Schema

Starting with InnoDB 1.1 with MySQL 5.5, you can profile certain internal InnoDB operations using the MySQL Performance Schema feature. This type of tuning is primarily for expert users, those who push the limits of MySQL performance, read the MySQL source code, and evaluate optimization strategies to overcome performance bottlenecks. DBAs can also use this feature for capacity planning, to see whether their typical workload encounters any performance bottlenecks with a particular combination of CPU, RAM, and disk storage; and if so, to judge whether performance can be improved by increasing the capacity of some part of the system.

To use this feature to examine InnoDB performance:

You must be running MySQL 5.5 or higher. You must build the database server from source, enabling the Performance Schema feature by building with the --with-perfschema option. Since the Performance Schema feature introduces some performance overhead, you should use it on a test or development system rather than on a production system.
You must be running InnoDB 1.1 or higher.
You must be generally familiar with how to use the Performance Schema feature, for example to query tables in the performance_schema database.
Examine the following kinds of InnoDB objects by querying the appropriate performance_schema tables. The items associated with InnoDB all contain the substring innodb in the NAME column.
- Mutexes in the mutex_instances table. (Mutexes and RW-locks related to the InnoDB buffer pool are not included in this coverage; the same applies to the output of the SHOW ENGINE INNODB MUTEX command.)
- RW-locks in the rwlock_instances table.
- File I/O operations in the file_instances, file_summary_by_event_name, and file_summary_by_instance tables.
- Threads in the threads table.
For the definitions of the *_instances tables, see Section 21.7.2, "Performance Schema Instance Tables". For the definitions of the *_summary_* tables, see Section 21.7.4, "Performance Schema Summary Tables". For the definition of the threads table, see Section 21.7.5, "Performance Schema Miscellaneous Tables". For the definitions of the *_current_* and *_history_* tables, see Section 21.7.3, "Performance Schema Wait Event Tables".
During performance testing, examine the performance data in the events_waits_current and events_waits_history_long tables. If you are especially interested in InnoDB-related objects, use the clause WHERE NAME LIKE '%innodb%' to see just those entries; otherwise, examine the performance statistics for the overall MySQL server.

For more information about the MySQL Performance Schema, see Chapter 21, MySQL Performance Schema.

14.4.7.18. Improvements to Performance from Multiple Buffer Pools

This performance enhancement is primarily useful for people with a large buffer pool size, typically in the multi-gigabyte range. To take advantage of this speedup, you must set the new innodb_buffer_pool_instances configuration option, and you might also adjust the innodb_buffer_pool_size value.

When the InnoDB buffer pool is large, many data requests can be satisfied by retrieving from memory. You might encounter bottlenecks from multiple threads trying to access the buffer pool at once. Starting in InnoDB 1.1 and MySQL 5.5, you can enable multiple buffer pools to minimize this contention. Each page that is stored in or read from the buffer pool is assigned to one of the buffer pools randomly, using a hashing function. Each buffer pool manages its own free lists, flush lists, LRU lists, and all other data structures connected to a buffer pool, and is protected by its own buffer pool mutex.

To enable this feature, set the innodb_buffer_pool_instances configuration option to a value greater than 1 (the default) up to 64 (the maximum). This option takes effect only when you set the innodb_buffer_pool_size to a size of 1 gigabyte or more. The total size you specify is divided among all the buffer pools. For best efficiency, specify a combination of innodb_buffer_pool_instances and innodb_buffer_pool_size so that each buffer pool instance is at least 1 gigabyte.
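The sizing rules in the preceding paragraph can be checked with a little arithmetic. The Python sketch below encodes only the rules stated here (1 to 64 instances, multiple instances only with a total of at least 1 gigabyte, and the recommendation of at least 1 gigabyte per instance). The function names are hypothetical, and `instance_for_page` uses Python's built-in `hash` purely as a stand-in, since InnoDB uses its own hashing function.

```python
GIB = 1024 ** 3


def plan_buffer_pools(pool_size_bytes, instances):
    """Return (effective_instances, bytes_per_instance, meets_1gb_guideline)."""
    if not 1 <= instances <= 64:
        raise ValueError("innodb_buffer_pool_instances must be between 1 and 64")
    # Multiple instances take effect only with a total of at least 1 GB.
    effective = instances if pool_size_bytes >= GIB else 1
    per_instance = pool_size_bytes // effective
    return effective, per_instance, per_instance >= GIB


def instance_for_page(space_id, page_no, instances):
    """Map a page to an instance (stand-in hash, not InnoDB's function)."""
    return hash((space_id, page_no)) % instances
```

For example, an 8 GB pool split 8 ways gives 1 GB per instance, while a 4 GB pool split 8 ways falls below the 1 GB-per-instance recommendation.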

For more information about the InnoDB buffer pool, see Section 8.9.1, "The InnoDB Buffer Pool".

14.4.7.19. Better Scalability with Multiple Rollback Segments

Starting in InnoDB 1.1 with MySQL 5.5, the limit on concurrent transactions is greatly expanded, removing a bottleneck with the InnoDB rollback segment that affected high-capacity systems. The limit applies to concurrent transactions that change any data; read-only transactions do not count against that maximum.

The single rollback segment is now divided into 128 segments, each of which can support up to 1023 transactions that perform writes, for a total of approximately 128K concurrent transactions. The original transaction limit was 1023.
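The capacity figure follows from simple multiplication, compared with 128K taken as 128 × 1024:

```latex
\[
128 \text{ segments} \times 1023 \text{ transactions per segment} = 130{,}944
\;\approx\; 128\text{K} = 128 \times 1024 = 131{,}072
\]
```

That is 128 times the original limit of 1023 concurrent writing transactions.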

Each transaction is assigned to one of the rollback segments, and remains tied to that rollback segment for the duration. This enhancement improves both scalability (higher number of concurrent transactions) and performance (less contention when different transactions access the rollback segments).
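As a rough illustration of how writing transactions spread across segments, the Python model below assigns each writing transaction to a segment, frees the slot on commit, and gives read-only transactions no slot. The round-robin assignment policy and the class name `RollbackSegments` are assumptions of this sketch, not a description of InnoDB internals.

```python
class RollbackSegments:
    """Toy model: n segments, each with a fixed number of transaction slots."""

    def __init__(self, n_segments=128, slots_per_segment=1023):
        self.free = [slots_per_segment] * n_segments
        self._next = 0  # round-robin cursor (an assumption of this model)

    def begin(self, read_only=False):
        """Return the segment index for a writing transaction, or None for a
        read-only transaction, which does not count against the limit."""
        if read_only:
            return None
        n = len(self.free)
        for step in range(n):
            seg = (self._next + step) % n
            if self.free[seg] > 0:
                self.free[seg] -= 1
                self._next = (seg + 1) % n
                return seg  # the transaction stays tied to this segment
        raise RuntimeError("too many concurrent writing transactions")

    def commit(self, seg):
        if seg is not None:
            self.free[seg] += 1
```

With the default sizes the model holds 130,944 concurrent writers; a tiny two-segment instance makes the spreading and the limit easy to see.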

To take advantage of this feature, you do not need to create any new database or tables, or reconfigure anything. You must do a slow shutdown, either before upgrading from MySQL 5.1 or earlier or at some time afterward. InnoDB makes the required changes inside the system tablespace automatically the first time you restart after performing a slow shutdown.

For more information about performance of InnoDB under high transactional load, see Section 8.5.2, "Optimizing InnoDB Transaction Management".

14.4.7.20. Better Scalability with Improved Purge Scheduling

Starting in InnoDB 1.1 with MySQL 5.5, the purge operations (a type of garbage collection) that InnoDB performs automatically can be done in a separate thread, rather than as part of the master thread. This change improves scalability, because the main database operations run independently from maintenance work happening in the background.

To enable this feature, set the configuration option innodb_purge_threads=1, as opposed to the default of 0, which combines the purge operation into the master thread.
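The separation between foreground work and background purge is a producer/consumer arrangement. This toy Python sketch shows its shape: one thread queues work while a dedicated purge thread drains it in batches. The `purge_worker` function, the use of a queue, and the batch size of 20 (borrowed from the innodb_purge_batch_size default only as a convenient number) are illustrative assumptions; InnoDB's purge operates on undo logs, not on a Python queue.

```python
import queue
import threading


def purge_worker(undo_queue, batch_size, batch_sizes):
    """Toy purge thread: drain queued records in batches of up to batch_size
    until a None sentinel arrives, recording the size of each batch."""
    done = False
    while not done:
        batch = []
        while len(batch) < batch_size:
            record = undo_queue.get()  # blocks until work is available
            if record is None:
                done = True
                break
            batch.append(record)
        if batch:
            # A real purge would remove obsolete row versions here.
            batch_sizes.append(len(batch))


undo_records = queue.Queue()
batch_sizes = []
purge = threading.Thread(target=purge_worker, args=(undo_records, 20, batch_sizes))
purge.start()
for record_id in range(45):  # stands in for foreground transactions producing work
    undo_records.put(record_id)
undo_records.put(None)  # ask the purge thread to stop
purge.join()
```

Because the producer never waits for purge to finish, foreground work continues while the background thread catches up.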

You might not notice a significant speedup, because the purge thread might encounter new types of contention; the single purge thread really lays the groundwork for further tuning and possibly multiple purge threads in the future. There is another new configuration option, innodb_purge_batch_size, with a default of 20 and a maximum of 5000. This option is mainly intended for experimentation and tuning of purge operations, and is of little interest to typical users.

For more information about InnoDB I/O performance, see Section 8.5.7, "Optimizing InnoDB Disk I/O".

14.4.7.21. Improved Log Sys Mutex

This is another performance improvement that comes for free, with no user action or configuration needed. The details here are intended for performance experts who delve into the InnoDB source code, or interpret reports with keywords such as "mutex" and "log_sys".

The mutex known as the log sys mutex has historically done double duty, controlling access to internal data structures related to log records and the LSN, as well as pages in the buffer pool that are changed when a mini-transaction is committed. Starting in InnoDB 1.1 with MySQL 5.5, these two kinds of operations are protected by separate mutexes, with a new log_buf mutex controlling writes to buffer pool pages due to mini-transactions.

For performance considerations for InnoDB locking operations, see Section 8.10, "Optimizing Locking Operations".

14.4.7.22. Separate Flush List Mutex

Starting with InnoDB 1.1 with MySQL 5.5, concurrent access to the buffer pool is faster. Operations involving the flush list, a data structure related to the buffer pool, are now controlled by a separate mutex and do not block access to the buffer pool. You do not need to configure anything to take advantage of this speedup; it is fully automatic.

For more information about the InnoDB buffer pool, see Section 8.9.1, "The InnoDB Buffer Pool".

14.4.8. Changes for Flexibility, Ease of Use and Reliability

14.4.8.1. The Barracuda File Format
14.4.8.2. Dynamic Control of System Configuration Parameters
14.4.8.3. TRUNCATE TABLE Reclaims Space
14.4.8.4. InnoDB Strict Mode
14.4.8.5. Controlling Optimizer Statistics Estimation
14.4.8.6. Better Error Handling when Dropping Indexes
14.4.8.7. More Compact Output of SHOW ENGINE INNODB MUTEX
14.4.8.8. More Read-Ahead Statistics

This section describes several recently added InnoDB features that offer new flexibility and improve ease of use, reliability, and performance. The Barracuda file format improves efficiency for storing large variable-length columns, and enables table compression. Configuration options that once were unchangeable after startup are now flexible and can be changed dynamically. Some improvements are automatic, such as faster and more efficient TRUNCATE TABLE. Others give you more control over InnoDB behavior; for example, you can control whether certain problems cause errors or just warnings. Informational messages and error reporting also continue to be made more user-friendly.

14.4.8.1. The Barracuda File Format

InnoDB has started using named file formats to improve compatibility in upgrade and downgrade situations, or heterogeneous systems running different levels of MySQL. Many important InnoDB features, such as table compression and the DYNAMIC row format for more efficient BLOB storage, require creating tables in the Barracuda file format. The original file format, which previously didn't have a name, is known now as Antelope.

To create new tables that take advantage of the Barracuda features, enable that file format using the configuration parameter innodb_file_format. The value of this parameter determines whether a newly created table or index can use compression or the new DYNAMIC row format.

To preclude the use of new features that would make your database inaccessible to the built-in InnoDB in MySQL 5.1 and prior releases, omit innodb_file_format or set it to Antelope.

You can set the value of innodb_file_format on the command line when you start mysqld, or in the option file my.cnf (Unix operating systems) or my.ini (Windows). You can also change it dynamically with the SET GLOBAL statement.

For more information about managing file formats, see Section 14.4.4, "InnoDB File-Format Management".

14.4.8.2. Dynamic Control of System Configuration Parameters

In MySQL 5.5 and higher, you can change certain system configuration parameters without shutting down and restarting the server, as was necessary in MySQL 5.1 and lower. This increases uptime, and makes it easier to test and prototype new SQL and application code. The following sections explain these parameters.

14.4.8.2.1. Dynamically Changing innodb_file_per_table

Since MySQL version 4.1, InnoDB has provided two alternatives for how tables are stored on disk. You can create a new table and its indexes in the shared system tablespace, physically stored in the ibdata files. Or, you can store a new table and its indexes in a separate tablespace (a .ibd file). The storage layout for each InnoDB table is determined by the configuration parameter innodb_file_per_table at the time the table is created.

In MySQL 5.5 and higher, the configuration parameter innodb_file_per_table is dynamic, and can be set ON or OFF using the SET GLOBAL statement. Previously, the only way to set this parameter was in the MySQL option file (my.cnf or my.ini), and changing it required shutting down and restarting the server.

The default setting is OFF, so new tables and indexes are created in the system tablespace. Dynamically changing the value of this parameter requires the SUPER privilege and immediately affects the operation of all connections.

Tables created when innodb_file_per_table is enabled can use the Barracuda file format, and TRUNCATE returns the disk space for those tables to the operating system. The Barracuda file format in turn enables features such as table compression and the DYNAMIC row format. Tables created when innodb_file_per_table is off cannot use these features. To take advantage of those features for an existing table, you can turn on the file-per-table setting and run ALTER TABLE t ENGINE=INNODB for that table.
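The rules in the last two paragraphs can be summarized as a small decision function. This Python sketch restates what this section says, not server logic; the `TableLayout` type and `layout_at_create` function are hypothetical names, and the sketch ignores other conditions, such as the FOREIGN KEY restriction on fast TRUNCATE described in Section 14.4.8.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableLayout:
    location: str
    truncate_returns_space_to_os: bool
    can_use_barracuda_features: bool  # compression, DYNAMIC row format


def layout_at_create(table, file_per_table, file_format):
    """Where a new table goes, given the settings in effect when it is created."""
    if file_per_table:
        return TableLayout(
            location=f"{table}.ibd",
            truncate_returns_space_to_os=True,
            can_use_barracuda_features=file_format.lower() == "barracuda",
        )
    return TableLayout("system tablespace (ibdata files)", False, False)
```

Because the layout is fixed at creation time, an existing table only picks up new settings when it is rebuilt, for example with ALTER TABLE t ENGINE=INNODB.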

When you redefine the primary key for an InnoDB table, the table is re-created using the current settings for innodb_file_per_table and innodb_file_format. This behavior does not apply when adding or dropping InnoDB secondary indexes, as explained in Section 14.4.2, "Fast Index Creation in the InnoDB Storage Engine". When a secondary index is created without rebuilding the table, the index is stored in the same file as the table data, regardless of the current innodb_file_per_table setting.

14.4.8.2.2. Dynamically Changing innodb_stats_on_metadata

In MySQL 5.5 and higher, you can change the setting of innodb_stats_on_metadata dynamically at runtime, to control whether or not InnoDB performs statistics gathering when metadata statements are executed. To change the setting, issue the statement SET GLOBAL innodb_stats_on_metadata=mode, where mode is either ON or OFF (or 1 or 0). Changing this setting requires the SUPER privilege and immediately affects the operation of all connections.

This setting is related to the feature described in Section 14.4.8.5, "Controlling Optimizer Statistics Estimation".

14.4.8.2.3. Dynamically Changing innodb_lock_wait_timeout

The length of time a transaction waits for a resource, before giving up and rolling back the statement, is determined by the value of the configuration parameter innodb_lock_wait_timeout. (In MySQL 5.0.12 and earlier, the entire transaction was rolled back, not just the statement.) Your application can try the statement again (usually after waiting for a while), or roll back the entire transaction and restart.

The error returned when the timeout period is exceeded is:

ERROR HY000: Lock wait timeout exceeded; try restarting transaction
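An application-side retry for this error might look like the following Python sketch. It assumes the database driver raises an exception carrying the MySQL error number in an `errno` attribute (MySQL Connector/Python does this; other drivers may expose the code differently), and uses 1205, the MySQL error number for a lock wait timeout. `LockWaitTimeout` and `run_with_retry` are hypothetical names used only to exercise the idea. Because only the failed statement is rolled back, retrying that statement alone is reasonable; if your transaction logic depends on earlier reads, rolling back and restarting the whole transaction may be safer.

```python
import time

ER_LOCK_WAIT_TIMEOUT = 1205  # MySQL error number for "Lock wait timeout exceeded"


class LockWaitTimeout(Exception):
    """Hypothetical stand-in for a driver exception carrying error 1205."""
    errno = ER_LOCK_WAIT_TIMEOUT


def run_with_retry(statement, attempts=3, wait_s=0.1):
    """Call statement(); on a lock wait timeout, pause and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return statement()
        except Exception as exc:
            is_timeout = getattr(exc, "errno", None) == ER_LOCK_WAIT_TIMEOUT
            if not is_timeout or attempt == attempts:
                raise  # other errors, or the last attempt, propagate
            time.sleep(wait_s)


calls = []


def flaky_update():
    calls.append(1)
    if len(calls) < 3:
        raise LockWaitTimeout()
    return "ok"
```

Calling `run_with_retry(flaky_update, attempts=3, wait_s=0)` fails twice with the simulated timeout and succeeds on the third attempt.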

In MySQL 5.5 and higher, the configuration parameter innodb_lock_wait_timeout can be set at runtime with the SET GLOBAL or SET SESSION statement. Changing the GLOBAL setting requires the SUPER privilege and affects the operation of all clients that subsequently connect. Any client can change the SESSION setting for innodb_lock_wait_timeout, which affects only that client.

In MySQL 5.1 and earlier, the only way to set this parameter was in the MySQL option file (my.cnf or my.ini), and changing it required shutting down and restarting the server.

14.4.8.2.4. Dynamically Changing innodb_adaptive_hash_index

As described in Section 14.4.7.5, "Controlling Adaptive Hash Indexing", it may be desirable, depending on your workload, to dynamically enable or disable the adaptive hash indexing scheme InnoDB uses to improve query performance.

The start-up option innodb_adaptive_hash_index allows the adaptive hash index to be disabled. It is enabled by default. You can modify this parameter through the SET GLOBAL statement, without restarting the server. Changing the setting requires the SUPER privilege.

Disabling the adaptive hash index empties the hash table immediately. Normal operations can continue while the hash table is emptied, and executing queries that were using the hash table access the index B-trees directly instead. When the adaptive hash index is re-enabled, the hash table is populated again during normal operation.

14.4.8.3. TRUNCATE TABLE Reclaims Space

When you truncate a table that is stored in a .ibd file of its own (because innodb_file_per_table was enabled when the table was created), and if the table is not referenced in a FOREIGN KEY constraint, the table is dropped and re-created in a new .ibd file. This operation is much faster than deleting the rows one by one. The operating system can reuse the disk space, in contrast to tables within the InnoDB system tablespace, where only InnoDB can use the space after they are truncated. Physical backups can also be smaller, without big blocks of unused space in the middle of the system tablespace.

Previous versions of InnoDB would re-use the existing .ibd file, thus releasing the space only to InnoDB for storage management, but not to the operating system. Note that when the table is truncated, the count of rows affected by the TRUNCATE TABLE statement is an arbitrary number.

Note

If there is a referential constraint between two columns in the same table, that table can still be truncated using this fast technique.

If there are referential constraints between the table being truncated and other tables, the truncate operation fails. This is a change to the previous behavior, which would transform the TRUNCATE operation to a DELETE operation that removed all the rows and triggered ON DELETE operations on child tables.

14.4.8.4. InnoDB Strict Mode

To guard against ignored typos and syntax errors in SQL, or other unintended consequences of various combinations of operational modes and SQL statements, InnoDB provides a strict mode of operations. In this mode, InnoDB raises error conditions in certain cases, rather than issuing a warning and processing the specified statement (perhaps with unintended behavior). This is analogous to sql_mode in MySQL, which controls what SQL syntax MySQL accepts, and determines whether it silently ignores errors, or validates input syntax and data values. Since strict mode is relatively new, some statements that execute without errors with earlier versions of MySQL might generate errors unless you disable strict mode.

The setting of InnoDB strict mode affects the handling of syntax errors on the CREATE TABLE, ALTER TABLE and CREATE INDEX statements. The strict mode also enables a record size check, so that an INSERT or UPDATE never fails due to the record being too large for the selected page size.

We recommend running in strict mode when using the ROW_FORMAT and KEY_BLOCK_SIZE clauses on CREATE TABLE, ALTER TABLE, and CREATE INDEX statements. Without strict mode, InnoDB ignores conflicting clauses and creates the table or index, with only a warning in the message log. The resulting table might have different behavior than you intended, such as having no compression when you tried to create a compressed table. When InnoDB strict mode is on, such problems generate an immediate error and the table or index is not created, avoiding a troubleshooting session later.
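To illustrate the difference between the two modes, the following toy Python check validates a KEY_BLOCK_SIZE request against settings this section mentions, raising an error in strict mode and returning warnings otherwise. It covers only a few conditions (a KEY_BLOCK_SIZE of 1, 2, 4, 8, or 16, plus innodb_file_per_table and the Barracuda file format), and its function name and messages are made up for the example; the server performs more checks and words its messages differently.

```python
VALID_KEY_BLOCK_SIZES = {1, 2, 4, 8, 16}


def check_key_block_size(key_block_size, file_per_table, file_format, strict):
    """Return a list of warnings, or raise ValueError in strict mode."""
    problems = []
    if key_block_size not in VALID_KEY_BLOCK_SIZES:
        problems.append(f"KEY_BLOCK_SIZE={key_block_size} is not 1, 2, 4, 8, or 16")
    if not file_per_table:
        problems.append("KEY_BLOCK_SIZE needs innodb_file_per_table enabled")
    if file_format.lower() != "barracuda":
        problems.append("KEY_BLOCK_SIZE needs innodb_file_format=Barracuda")
    if problems and strict:
        # Strict mode: the statement fails and no table is created.
        raise ValueError("; ".join(problems))
    # Non-strict mode: the table would still be created, possibly without
    # compression, and these messages would only be warnings.
    return problems
```

The same request that silently yields an uncompressed table in non-strict mode stops with an error in strict mode, which is exactly the troubleshooting session the paragraph above describes avoiding.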

InnoDB strict mode is set with the configuration parameter innodb_strict_mode, which can be specified as on or off. You can set the value on the command line when you start mysqld, or in the configuration file my.cnf or my.ini. You can also enable or disable InnoDB strict mode at run time with the statement SET [GLOBAL|SESSION] innodb_strict_mode=mode, where mode is either ON or OFF. Changing the GLOBAL setting requires the SUPER privilege and affects the operation of all clients that subsequently connect. Any client can change the SESSION setting for innodb_strict_mode, and the setting affects only that client.

14.4.8.5. Controlling Optimizer Statistics Estimation

The MySQL query optimizer uses estimated statistics about key distributions to choose the indexes for an execution plan, based on the relative selectivity of the index. Certain operations cause InnoDB to sample random pages from each index on a table to estimate the cardinality of the index. (This technique is known as random dives.) These operations include the ANALYZE TABLE statement, the SHOW TABLE STATUS statement, and accessing the table for the first time after a restart.

To give you control over the quality of the statistics estimate (and thus better information for the query optimizer), you can now change the number of sampled pages using the parameter innodb_stats_sample_pages. Previously, the number of sampled pages was always 8, which could be insufficient to produce an accurate estimate, leading to poor index choices by the query optimizer. This technique is especially important for large tables and tables used in joins. Unnecessary full table scans for such tables can be a substantial performance issue.

You can set the global parameter innodb_stats_sample_pages at run time. The default value for this parameter is 8, preserving the same behavior as in past releases.
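The effect of the sample size can be explored with a simplified model of random dives. This Python sketch picks `sample_pages` leaf pages at random, counts the distinct keys on each, and scales the total by the number of leaf pages. InnoDB's actual estimator includes further adjustments, so treat this (and the hypothetical `estimate_cardinality` name) as an approximation of the idea only. Keys that span page boundaries are counted once per page, which is one source of estimation error.

```python
import random


def estimate_cardinality(leaf_pages, sample_pages, rng=None):
    """Estimate the number of distinct keys in an index by random dives.

    leaf_pages: list of leaf pages, each a list of keys in index order.
    sample_pages: number of pages to sample (like innodb_stats_sample_pages).
    """
    if not leaf_pages:
        return 0.0
    if sample_pages < 1:
        raise ValueError("sample_pages must be at least 1")
    rng = rng if rng is not None else random.Random()
    k = min(sample_pages, len(leaf_pages))
    sampled = rng.sample(leaf_pages, k)
    # Count distinct keys on each sampled page, then scale up to all pages.
    distinct_in_sample = sum(len(set(page)) for page in sampled)
    return distinct_in_sample * len(leaf_pages) / k


pages = [[1, 1, 2], [2, 3, 3], [4, 4, 4], [5, 6, 7]]  # 7 distinct keys in total
full = estimate_cardinality(pages, sample_pages=4, rng=random.Random(0))
tiny = estimate_cardinality(pages, sample_pages=1, rng=random.Random(1))
```

Sampling all four pages gives 8 (key 2 is counted on two pages), while a single-page sample can land anywhere from 4 to 12, which mirrors the warning that very small sample sizes give very inaccurate estimates.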

Note

The value of innodb_stats_sample_pages affects the index sampling for all tables and indexes. There are the following potentially significant impacts when you change the index sample size:

Small values like 1 or 2 can result in very inaccurate estimates of cardinality.
Increasing the innodb_stats_sample_pages value might require more disk reads. Values much larger than 8 (say, 100) can cause a big slowdown in the time it takes to open a table or execute SHOW TABLE STATUS.
The optimizer might choose very different query plans based on different estimates of index selectivity.

To disable the cardinality estimation for metadata statements such as SHOW TABLE STATUS, execute the statement SET GLOBAL innodb_stats_on_metadata=OFF (or 0). The ability to set this option dynamically is also relatively new.

All InnoDB tables are opened, and the statistics are re-estimated for all associated indexes, when the mysql client starts if the auto-rehash setting is on (the default). To improve the startup time of the mysql client, you can turn auto-rehash off. The auto-rehash feature enables automatic name completion of database, table, and column names for interactive users.

Whatever value of innodb_stats_sample_pages works best for a system, set the option and leave it at that value. Choose a value that results in reasonably accurate estimates for all tables in your database without requiring excessive I/O. Because the statistics are automatically recalculated at various times other than on execution of ANALYZE TABLE, it does not make sense to increase the index sample size, run ANALYZE TABLE, then decrease sample size again. The more accurate statistics calculated by ANALYZE running with a high value of innodb_stats_sample_pages can be wiped away later.

Although it is not possible to specify the sample size on a per-table basis, smaller tables generally require fewer index samples than larger tables do. If your database has many large tables, consider using a higher value for innodb_stats_sample_pages than if you have mostly smaller tables.

14.4.8.6. Better Error Handling when Dropping Indexes

For optimal performance with DML statements, InnoDB requires an index to exist on foreign key columns, so that UPDATE and DELETE operations on a parent table can easily check whether corresponding rows exist in the child table. MySQL creates or drops such indexes automatically when needed, as a side-effect of CREATE TABLE, CREATE INDEX, and ALTER TABLE statements.

When you drop an index, InnoDB checks whether the index is needed to enforce a foreign key constraint. It is still OK to drop the index if there is another index that can be used to enforce the same constraint. InnoDB prevents you from dropping the last index that can enforce a particular referential constraint.

The message that reports this error condition is:

ERROR 1553 (HY000): Cannot drop index 'fooIdx': needed in a foreign key constraint

This message is friendlier than the earlier message it replaces:

ERROR 1025 (HY000): Error on rename of './db2/#sql-18eb_3' to './db2/foo' (errno: 150)
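The rule that decides whether the drop is allowed can be modeled in a few lines of Python. The sketch assumes the usual MySQL requirement that an index can support a foreign key only if the foreign key columns are its leading columns, in the same order; the function names and data layout are hypothetical.

```python
def supports_fk(index_cols, fk_cols):
    """True if fk_cols are the leading columns of the index, in order."""
    return tuple(index_cols[:len(fk_cols)]) == tuple(fk_cols)


def can_drop_index(name, indexes, fk_column_lists):
    """indexes: dict of index name -> tuple of columns.
    fk_column_lists: column tuples of the table's foreign keys."""
    remaining = [cols for n, cols in indexes.items() if n != name]
    for fk_cols in fk_column_lists:
        needed = supports_fk(indexes[name], fk_cols)
        covered = any(supports_fk(cols, fk_cols) for cols in remaining)
        if needed and not covered:
            return False  # this is the last index that can enforce fk_cols
    return True


indexes = {"fooIdx": ("a",), "ab": ("a", "b"), "b_only": ("b",)}
fks = [("a",)]
```

Here fooIdx can be dropped because index ab also starts with column a; once ab is gone, fooIdx becomes the last usable index and the drop is refused.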

A similar change in error reporting applies to an attempt to drop the primary key index. For tables without an explicit PRIMARY KEY, InnoDB creates an implicit clustered index using the first columns of the table that are declared UNIQUE and NOT NULL. When you drop such an index, InnoDB automatically copies the table and rebuilds the index using a different UNIQUE NOT NULL group of columns or a system-generated key. Since this operation changes the primary key, it uses the slow method of copying the table and re-creating the index, rather than the Fast Index Creation technique from Section 14.4.2.3, "Implementation Details of Fast Index Creation".

Previously, an attempt to drop an implicit clustered index (the first UNIQUE NOT NULL index) failed if the table did not contain a PRIMARY KEY:

ERROR 42000: This table type requires a primary key
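The choice of clustered index described above can be sketched as follows. This Python model picks the PRIMARY KEY if there is one, otherwise the first UNIQUE index whose columns are all NOT NULL, otherwise a system-generated key. The `choose_clustered_index` name and the `"<generated row ID>"` label are placeholders, and the function is an illustration rather than InnoDB code.

```python
def choose_clustered_index(primary_key, unique_indexes, not_null_columns):
    """primary_key: tuple of columns or None.
    unique_indexes: list of (name, columns) in definition order.
    not_null_columns: set of columns declared NOT NULL."""
    if primary_key:
        return ("PRIMARY", tuple(primary_key))
    for name, cols in unique_indexes:
        if all(col in not_null_columns for col in cols):
            return (name, tuple(cols))
    return ("<generated row ID>", ())  # placeholder for a system-generated key


uniques = [("u_email", ("email",)), ("u_id", ("id",))]
not_null = {"id"}
```

In this example u_id becomes the implicit clustered index because email is nullable; dropping u_id leaves no qualifying index, so the model falls back to a generated key, which is why such a drop requires copying the table.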

14.4.8.7. More Compact Output of SHOW ENGINE INNODB MUTEX

The statement SHOW ENGINE INNODB MUTEX displays information about InnoDB mutexes and rw-locks. Although this information is useful for tuning on multi-core systems, the amount of output can be overwhelming on systems with a big buffer pool. There is one mutex and one rw-lock in each 16K buffer pool block, and there are 65,536 blocks per gigabyte. It is unlikely that a single block mutex or rw-lock from the buffer pool could become a performance bottleneck.
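The block count follows from the page size, taking 1 GB as 2^30 bytes and 16K as 2^14 bytes, with one mutex and one rw-lock per block:

```latex
\[
\frac{1\,\text{GB}}{16\,\text{KB}} = \frac{2^{30}}{2^{14}} = 2^{16} = 65{,}536 \text{ blocks},
\qquad
65{,}536 \times 2 = 131{,}072 \text{ mutexes and rw-locks per GB}
\]
```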

SHOW ENGINE INNODB MUTEX now skips the mutexes and rw-locks of buffer pool blocks. It also does not list any mutexes or rw-locks that have never been waited on (os_waits=0). Thus, SHOW ENGINE INNODB MUTEX only displays information about mutexes and rw-locks outside of the buffer pool that have caused at least one OS-level wait.

14.4.8.8. More Read-Ahead Statistics

As described in Section 14.4.7.7, "Changes in the Read-Ahead Algorithm", a read-ahead request is an asynchronous I/O request issued in anticipation that a page will be used in the near future. Knowing how many pages are read through this read-ahead mechanism, and how many of them are evicted from the buffer pool without ever being accessed, can be useful to help fine-tune the parameter innodb_read_ahead_threshold.

The global status variables Innodb_buffer_pool_read_ahead and Innodb_buffer_pool_read_ahead_evicted, reported by SHOW GLOBAL STATUS, indicate the number of pages brought into the buffer pool by read-ahead requests and the number of such pages evicted from the buffer pool without ever being accessed, respectively. These counters provide global values since the last server restart.
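Because the counters are cumulative since the last restart, the fraction of wasted read-ahead over a period is easier to judge from the difference between two samples. The Python sketch below assumes you have collected the two status variables into dictionaries at two points in time (for example, from SHOW GLOBAL STATUS); `read_ahead_waste` is a hypothetical helper. A consistently high fraction suggests that read-ahead is bringing in pages that are not used, in which case a higher innodb_read_ahead_threshold (which makes read-ahead less aggressive) may be worth benchmarking.

```python
def read_ahead_waste(before, after):
    """Fraction of read-ahead pages evicted without access between two samples."""
    read_ahead = (after["Innodb_buffer_pool_read_ahead"]
                  - before["Innodb_buffer_pool_read_ahead"])
    evicted = (after["Innodb_buffer_pool_read_ahead_evicted"]
               - before["Innodb_buffer_pool_read_ahead_evicted"])
    return evicted / read_ahead if read_ahead else 0.0


sample_1 = {"Innodb_buffer_pool_read_ahead": 1000,
            "Innodb_buffer_pool_read_ahead_evicted": 100}
sample_2 = {"Innodb_buffer_pool_read_ahead": 3000,
            "Innodb_buffer_pool_read_ahead_evicted": 600}
```

Between these two samples, 500 of 2000 read-ahead pages were evicted unused, a waste fraction of 0.25.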

SHOW ENGINE INNODB STATUS also shows the rate at which read-ahead pages are read in and the rate at which such pages are evicted without being accessed. The per-second averages are based on the statistics collected since the last invocation of SHOW ENGINE INNODB STATUS and are displayed in the BUFFER POOL AND MEMORY section of the output.
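Because the two counters are cumulative, useful rates come from the difference between two samples, much like the per-second averages described above. The following Python sketch assumes you have already collected the counter values (for example with SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_ahead%') into dictionaries of integers; read_ahead_rates is a hypothetical helper and makes no database connection.

```python
READ_AHEAD = "Innodb_buffer_pool_read_ahead"
READ_AHEAD_EVICTED = "Innodb_buffer_pool_read_ahead_evicted"


def read_ahead_rates(before: dict, after: dict, elapsed_seconds: float) -> dict:
    """Derive read-ahead rates from two snapshots of the status counters.

    before/after map status variable names to integer counter values
    sampled elapsed_seconds apart.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    read = after[READ_AHEAD] - before[READ_AHEAD]
    evicted = after[READ_AHEAD_EVICTED] - before[READ_AHEAD_EVICTED]
    return {
        "read_ahead_per_sec": read / elapsed_seconds,
        "evicted_per_sec": evicted / elapsed_seconds,
        # Approximate share of read-ahead pages evicted without being accessed.
        "wasted_fraction": evicted / read if read else 0.0,
    }


if __name__ == "__main__":
    t0 = {READ_AHEAD: 1000, READ_AHEAD_EVICTED: 50}
    t1 = {READ_AHEAD: 4000, READ_AHEAD_EVICTED: 650}
    print(read_ahead_rates(t0, t1, 60.0))
```

The wasted fraction is only approximate, since a page evicted during the interval may have been read ahead before it started. A persistently high value suggests that read-ahead is fetching pages that are not used; raising innodb_read_ahead_threshold makes linear read-ahead less aggressive.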

Since the InnoDB read-ahead mechanism has been simplified to remove random read-ahead, the status variables Innodb_buffer_pool_read_ahead_rnd and Innodb_buffer_pool_read_ahead_seq are no longer part of the SHOW ENGINE INNODB STATUS output.

14.4.9. Installing the InnoDB Storage Engine

When you use the InnoDB storage engine 1.1 and above, with MySQL 5.5 and above, you do not need to do anything special to install: everything comes configured as part of the MySQL source and binary distributions. This is a change from earlier releases of the InnoDB Plugin, where you were required to match up MySQL and InnoDB version numbers and update your build and configuration processes.

The InnoDB Plugin is included in the MySQL distribution starting from MySQL 5.1.38. From MySQL 5.1.46 and up, this is the only download location for the InnoDB Plugin; it is not available from the InnoDB web site.

If you used any scripts or configuration files with the earlier InnoDB storage engine from the InnoDB web site, be aware that the filename of the shared library as supplied by MySQL is ha_innodb_plugin.so or ha_innodb_plugin.dll, as opposed to ha_innodb.so or ha_innodb.dll in the older Plugin downloaded from the InnoDB web site. You might need to change the applicable file names in your startup or configuration scripts.

Because the InnoDB storage engine has now replaced the built-in InnoDB, you no longer need to specify options like --ignore-builtin-innodb and --plugin-load during startup.

To take best advantage of current InnoDB features, we recommend specifying the following options in your configuration file:

innodb_file_per_table=1
innodb_file_format=barracuda
innodb_strict_mode=1

For information about these new features, see Section 14.4.8.2.1, "Dynamically Changing innodb_file_per_table", Section 14.4.4, "InnoDB File-Format Management", and Section 14.4.8.4, "InnoDB Strict Mode". You might need to continue to use the previous values for these parameters in some replication and similar configurations involving both new and older versions of MySQL.

14.4.10. Upgrading the InnoDB Storage Engine

Prior to MySQL 5.5, some upgrade scenarios involved upgrading the separate instance of InnoDB known as the InnoDB Plugin. In MySQL 5.5 and higher, the features of the InnoDB Plugin have been folded back into built-in InnoDB, so the upgrade procedure for InnoDB is the same as the one for the MySQL server. For details, see Section 2.12.1, "Upgrading MySQL".

14.4.11. Downgrading the InnoDB Storage Engine

14.4.11.1. Overview

14.4.11.1. Overview

Prior to MySQL 5.5, some downgrade scenarios involved switching the separate instance of InnoDB known as the InnoDB Plugin back to the built-in InnoDB storage engine. In MySQL 5.5 and higher, the features of the InnoDB Plugin have been folded back into built-in InnoDB, so the downgrade procedure for InnoDB is the same as the one for the MySQL server. For details, see Section 2.12.2, "Downgrading MySQL".

14.4.12. InnoDB Storage Engine Change History

14.4.12.1. Changes in InnoDB Storage Engine 1.x
14.4.12.2. Changes in InnoDB Storage Engine 1.1 (April 13, 2010)
14.4.12.3. Changes in InnoDB Plugin 1.0.x

14.4.12.1. Changes in InnoDB Storage Engine 1.x

Since InnoDB 1.1 is tightly integrated with MySQL 5.5, for changes after the initial InnoDB 1.1 release, see the MySQL 5.5 Release Notes.

14.4.12.2. Changes in InnoDB Storage Engine 1.1 (April 13, 2010)

For an overview of the changes, see this introduction article for MySQL 5.5 with InnoDB 1.1. The following is a condensed version of the change log.

Fix for Bug #52580: Crash in ha_innobase::open on executing INSERT with concurrent ALTER TABLE.

The fix for MySQL Bug #51557 releases the LOCK_open mutex before ha_innobase::open() is called, which introduced a race condition during creation of the index translation table. Fix it by protecting the operation with the dict_sys mutex.

Add support for multiple buffer pools.

Fix Bug #26590: MySQL does not allow more than 1023 open transactions. Create additional rollback segments on startup. Reduce the upper limit of total rollback segments from 256 to 128, because the sign bit cannot be used. This limit did not cause problems in the past because only one segment was ever created. InnoDB has always been able to use additional rollback segments, so this patch is backward compatible. The only requirement for maintaining backward compatibility is that the additional segments are created after the doublewrite buffer, to avoid breaking assumptions in the existing code.

Implement Performance Schema instrumentation in InnoDB. Objects in four InnoDB modules have been instrumented: mutexes, rw-locks, file I/O, and threads. The existing APIs are mostly preserved, but they point to instrumented function wrappers when Performance Schema support is defined. Four separate defines control the instrumentation of each module. The feature is off by default; it is compiled in with a special build option and requires a configuration option to turn it on when the server starts.

Implement the buf_pool_watch for DeleteBuffering in the page hash table. This serves two purposes: it allows multiple watches to be set at the same time (by multiple purge threads), and it removes a race condition that occurs when the read of a block completes at about the time the buffer pool watch is being set.

Introduce a new mutex to protect the flush_list. Redesign mtr_commit() so that the log_sys mutex is not held while all mtr_memos are popped, and is released just after the modified blocks are inserted into the flush_list. This should reduce contention on the log_sys mutex.

Implement the global variable innodb_change_buffering, with the following values:

none: buffer nothing
inserts: buffer inserts (like InnoDB so far)
deletes: buffer delete-marks
changes: buffer inserts and delete-marks
purges: buffer delete-marks and deletes
all: buffer all operations (insert, delete-mark, delete)

Provide support for native AIO on Linux.
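The innodb_change_buffering values listed in the change log above combine three kinds of operations. The following Python sketch restates that list as sets so that the relationships between the values are explicit (for example, changes is the union of inserts and deletes). The CHANGE_BUFFERING table and is_buffered function are illustrative only and are not part of InnoDB.

```python
INSERT = "insert"
DELETE_MARK = "delete-mark"
DELETE = "delete"

# Operations buffered by each innodb_change_buffering value, as listed above.
CHANGE_BUFFERING = {
    "none": frozenset(),
    "inserts": frozenset({INSERT}),
    "deletes": frozenset({DELETE_MARK}),
    "changes": frozenset({INSERT, DELETE_MARK}),
    "purges": frozenset({DELETE_MARK, DELETE}),
    "all": frozenset({INSERT, DELETE_MARK, DELETE}),
}


def is_buffered(setting: str, operation: str) -> bool:
    """Return True if the given setting buffers the given operation."""
    try:
        operations = CHANGE_BUFFERING[setting.lower()]  # normalize letter case
    except KeyError:
        raise ValueError(f"unknown innodb_change_buffering value: {setting!r}") from None
    return operation in operations


if __name__ == "__main__":
    for value, ops in CHANGE_BUFFERING.items():
        print(f"{value:<8} {sorted(ops)}")
```

The set view also makes it clear that purges is the only value other than all that buffers physical deletes.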

14.4.12.3. Changes in InnoDB Plugin 1.0.x

The InnoDB 1.0.x releases that accompany MySQL 5.1 have their own change history. Changes up to InnoDB 1.0.8 are listed at http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-changes.html. Changes from InnoDB 1.0.9 and up are incorporated in the MySQL 5.1 Release Notes.

14.4.13. Third-Party Software

14.4.13.1. Performance Patches from Google
14.4.13.2. Multiple Background I/O Threads Patch from Percona
14.4.13.3. Performance Patches from Sun Microsystems

Oracle acknowledges that certain Third Party and Open Source software has been used to develop or is incorporated in the InnoDB storage engine. This section includes required third-party license information.

14.4.13.1. Performance Patches from Google

Oracle gratefully acknowledges the following contributions from Google, Inc. to improve InnoDB performance:

Replacing InnoDB's use of Pthreads mutexes with calls to GCC atomic builtins, as discussed in Section 14.4.7.2, "Faster Locking for Improved Scalability". This change reduces the CPU time taken by InnoDB mutex and rw-lock operations and improves throughput on platforms where the atomic operations are available.
Controlling master thread I/O rate, as discussed in Section 14.4.7.11, "Controlling the Master Thread I/O Rate". The master thread in InnoDB is a thread that performs various tasks in the background. Historically, InnoDB has used a hard-coded value as the total I/O capacity of the server. With this change, users can control the number of I/O operations that can be performed per second based on their own workload.

Changes from the Google contributions were incorporated in the following source code files: btr0cur.c, btr0sea.c, buf0buf.c, buf0buf.ic, ha_innodb.cc, log0log.c, log0log.h, os0sync.h, row0sel.c, srv0srv.c, srv0srv.h, srv0start.c, sync0arr.c, sync0rw.c, sync0rw.h, sync0rw.ic, sync0sync.c, sync0sync.h, sync0sync.ic, and univ.i.

These contributions are incorporated subject to the conditions contained in the file COPYING.Google, which are reproduced here.

Copyright (c) 2008, 2009, Google Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the Google Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

14.4.13.2. Multiple Background I/O Threads Patch from Percona

Oracle gratefully acknowledges the contribution of Percona, Inc. to improve InnoDB performance by implementing configurable background threads, as discussed in Section 14.4.7.8, "Multiple Background I/O Threads". InnoDB uses background threads to service various types of I/O requests. The change provides another way to make InnoDB more scalable on high-end systems.

Changes from the Percona, Inc. contribution were incorporated in the following source code files: ha_innodb.cc, os0file.c, os0file.h, srv0srv.c, srv0srv.h, and srv0start.c.

This contribution is incorporated subject to the conditions contained in the file COPYING.Percona, which are reproduced here.

Copyright (c) 2008, 2009, Percona Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the Percona Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

14.4.13.3. Performance Patches from Sun Microsystems

Oracle gratefully acknowledges the following contributions from Sun Microsystems, Inc. to improve InnoDB performance:

Introducing the PAUSE instruction inside spin loops, as discussed in Section 14.4.7.13, "Using the PAUSE Instruction in InnoDB Spin Loops". This change increases performance in high concurrency, CPU-bound workloads.
Enabling inlining of functions and prefetch with Sun Studio.

Changes from the Sun Microsystems, Inc. contribution were incorporated in the following source code files: univ.i, ut0ut.c, and ut0ut.h.

This contribution is incorporated subject to the conditions contained in the file COPYING.Sun_Microsystems, which are reproduced here.

Copyright (c) 2009, Sun Microsystems, Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of Sun Microsystems, Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

14.4.14. List of Parameters Changed in InnoDB 1.1 and InnoDB Plugin 1.0

14.4.14.1. New Parameters
14.4.14.2. Deprecated Parameters
14.4.14.3. Parameters with New Defaults

14.4.14.1. New Parameters

Throughout the course of development, InnoDB 1.1 and its predecessor the InnoDB Plugin introduced new configuration parameters. The following table summarizes those parameters:

Table 14.8. InnoDB 1.1 New Parameter Summary

Name	Cmd-Line	Option File	System Var	Scope	Dynamic	Default
innodb_adaptive_flushing	YES	YES	YES	GLOBAL	YES	TRUE
innodb_buffer_pool_instances	YES	YES	YES	GLOBAL	NO	1
innodb_change_buffering	YES	YES	YES	GLOBAL	YES	all
innodb_file_format	YES	YES	YES	GLOBAL	YES	Antelope
innodb_file_format_check	YES	YES	YES	GLOBAL	NO	1
innodb_file_format_max	YES	YES	YES	GLOBAL	YES	Antelope for a new database; Barracuda if any tables using that file format exist in the database
innodb_io_capacity	YES	YES	YES	GLOBAL	YES	200
innodb_old_blocks_pct	YES	YES	YES	GLOBAL	YES	37
innodb_old_blocks_time	YES	YES	YES	GLOBAL	YES	0
innodb_purge_batch_size	YES	YES	YES	GLOBAL	YES	0
innodb_purge_threads	YES	YES	YES	GLOBAL	YES	0
innodb_read_ahead_threshold	YES	YES	YES	GLOBAL	YES	56
innodb_read_io_threads	YES	YES	YES	GLOBAL	NO	4
innodb_spin_wait_delay	YES	YES	YES	GLOBAL	YES	6
innodb_stats_sample_pages	YES	YES	YES	GLOBAL	YES	8
innodb_strict_mode	YES	YES	YES	GLOBAL | SESSION	YES	FALSE
innodb_use_native_aio	YES	YES	YES	GLOBAL	NO	TRUE
innodb_use_sys_malloc	YES	YES	YES	GLOBAL	NO	TRUE
innodb_write_io_threads	YES	YES	YES	GLOBAL	NO	4

innodb_adaptive_flushing
Whether InnoDB uses a new algorithm to estimate the required rate of flushing. The default value is TRUE. This parameter was added in InnoDB storage engine 1.0.4. See Section 14.4.7.12, "Controlling the Flushing Rate of Dirty Pages" for more information.
innodb_change_buffering
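Which kinds of operations InnoDB buffers in the change buffer (formerly the insert buffer). The default value is "all" (buffer all operations) in InnoDB 1.1; in the InnoDB Plugin for MySQL 5.1, the default was "inserts" (buffer insert operations only). This parameter was added in InnoDB storage engine 1.0.3. See Section 14.4.7.4, "Controlling InnoDB Change Buffering" for more information.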
innodb_file_format
The default file format for new InnoDB tables. The default is Antelope. To enable support for table compression, change it to Barracuda. This parameter was added in InnoDB storage engine 1.0.1. See Section 14.4.4.1, "Enabling File Formats" for more information.
innodb_file_format_check and innodb_file_format_max
Controls whether InnoDB performs file format compatibility checking when opening a database. The default value is innodb_file_format_check=1, with innodb_file_format_max set to the highest file format used in the database (either Barracuda or Antelope). See Section 14.4.4.2.1, "Compatibility Check When InnoDB Is Started" for more information.
innodb_io_capacity
The number of I/O operations that can be performed per second. The allowable value range is any number 100 or greater, and the default value is 200. This parameter was added in InnoDB storage engine 1.0.4. To reproduce the earlier behavior, use a value of 100. See Section 14.4.7.11, "Controlling the Master Thread I/O Rate" for more information.
innodb_old_blocks_pct
Controls the desired percentage of "old" blocks in the LRU list of the buffer pool. The default value is 37 and the allowable value range is 5 to 95. This parameter was added in InnoDB storage engine 1.0.5. See Section 14.4.7.15, "Making the Buffer Pool Scan Resistant" for more information.
innodb_old_blocks_time
The time in milliseconds since the first access to a block during which it can be accessed again without being made "young". The default value is 0 which means that blocks are moved to the "young" end of the LRU list at the first access. This parameter was added in InnoDB storage engine 1.0.5. See Section 14.4.7.15, "Making the Buffer Pool Scan Resistant" for more information.
innodb_read_ahead_threshold
Controls the sensitivity of linear read-ahead. The allowable value range is 0 to 64 and the default value is 56. This parameter was added in InnoDB storage engine 1.0.4. See Section 14.4.7.7, "Changes in the Read-Ahead Algorithm" for more information.
innodb_read_io_threads
The number of background I/O threads used for reads. The allowable value range is 1 to 64 and the default value is 4. This parameter was added in InnoDB storage engine 1.0.4. See Section 14.4.7.8, "Multiple Background I/O Threads" for more information.
innodb_spin_wait_delay
Maximum delay between polls for a spin lock. The allowable values are 0 (which disables the delay) and any positive integer, and the default value is 6. This parameter was added in InnoDB storage engine 1.0.4. See Section 14.4.7.14, "Control of Spin Lock Polling" for more information.
innodb_stats_sample_pages
The number of index pages to sample when calculating statistics. The allowable values are 1 or greater, and the default value is 8. This parameter was added in InnoDB storage engine 1.0.2. See Section 14.4.8.5, "Controlling Optimizer Statistics Estimation" for more information.
innodb_strict_mode
Whether InnoDB raises error conditions in certain cases, rather than issuing a warning. This parameter was added in InnoDB storage engine 1.0.2. See Section 14.4.8.4, "InnoDB Strict Mode" for more information.
innodb_use_sys_malloc
Whether InnoDB uses its own memory allocator or an allocator of the operating system. The default value is ON (use an allocator of the underlying system). This parameter was added in InnoDB storage engine 1.0.3. See Section 14.4.7.3, "Using Operating System Memory Allocators" for more information.
innodb_write_io_threads
The number of background I/O threads used for writes. The allowable value range is 1 to 64 and the default value is 4. This parameter was added in InnoDB storage engine 1.0.4. See Section 14.4.7.8, "Multiple Background I/O Threads" for more information.
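To make three of the buffer pool parameters above more concrete (innodb_old_blocks_pct, innodb_old_blocks_time, and innodb_read_ahead_threshold), the following Python sketch models the decisions they control. It is a simplified illustration under stated assumptions, not InnoDB's implementation: it assumes 16KB pages and 64-page extents, treats each boundary comparison as "greater than or equal", ignores the tolerance InnoDB allows around the old sublist size, and assumes, as Section 14.4.7.7 describes, that linear read-ahead of the following extent starts once enough pages of the current extent have been accessed. The function names are hypothetical.

```python
PAGE_SIZE = 16 * 1024   # assumed default page size (16KB)
PAGES_PER_EXTENT = 64   # assumed: 1MB extent / 16KB pages


def old_sublist_target(pool_bytes: int, old_blocks_pct: int = 37) -> int:
    """Approximate number of buffer pool pages kept in the "old" LRU sublist."""
    if not 5 <= old_blocks_pct <= 95:
        raise ValueError("innodb_old_blocks_pct must be between 5 and 95")
    total_pages = pool_bytes // PAGE_SIZE
    return total_pages * old_blocks_pct // 100


def makes_young(first_access_ms: int, now_ms: int, old_blocks_time: int = 0) -> bool:
    """Decide whether an access moves an old block to the young end of the LRU list.

    With old_blocks_time=0 every access qualifies (the default behavior);
    otherwise the access must come at least old_blocks_time milliseconds
    after the block's first access.
    """
    return now_ms - first_access_ms >= old_blocks_time


def triggers_linear_read_ahead(pages_accessed_in_extent: int, threshold: int = 56) -> bool:
    """Decide whether accesses within one extent trigger read-ahead of the next extent."""
    if not 0 <= threshold <= PAGES_PER_EXTENT:
        raise ValueError("innodb_read_ahead_threshold must be between 0 and 64")
    return pages_accessed_in_extent >= threshold


if __name__ == "__main__":
    pool = 128 * 1024 ** 2  # the 128MB default buffer pool size
    print("old sublist target:", old_sublist_target(pool), "of", pool // PAGE_SIZE, "pages")
    print("re-access after 500 ms with time=1000:", makes_young(0, 500, 1000))
    print("55 pages accessed with threshold=56:", triggers_linear_read_ahead(55))
```

With the default settings, a 128MB buffer pool targets about 3,031 of its 8,192 pages for the old sublist, and a table scan that touches each page only briefly stays in that sublist once innodb_old_blocks_time is raised above the interval between its accesses.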

14.4.14.2. Deprecated Parameters

Beginning in InnoDB storage engine 1.0.4, the following configuration parameter has been removed:

innodb_file_io_threads
This parameter has been replaced by two new parameters innodb_read_io_threads and innodb_write_io_threads. See Section 14.4.7.8, "Multiple Background I/O Threads" for more information.

14.4.14.3. Parameters with New Defaults

For better out-of-the-box performance, the following InnoDB configuration parameters have new default values since MySQL 5.1:

Table 14.9. InnoDB Parameters with New Defaults

Name	Old Default	New Default
innodb_additional_mem_pool_size	1MB	8MB
innodb_buffer_pool_size	8MB	128MB
innodb_change_buffering	inserts	all
innodb_file_format_check	ON	1
innodb_log_buffer_size	1MB	8MB
innodb_max_dirty_pages_pct	90	75
innodb_sync_spin_loops	20	30
innodb_thread_concurrency	8	0