    MySQL Reference Manual

13.2. Data Manipulation Statements

13.2.1. CALL Syntax

CALL sp_name([parameter[,...]])
CALL sp_name[()]

The CALL statement invokes a stored procedure that was defined previously with CREATE PROCEDURE.

Stored procedures that take no arguments can be invoked without parentheses. That is, CALL p() and CALL p are equivalent.

CALL can pass back values to its caller using parameters that are declared as OUT or INOUT parameters. When the procedure returns, a client program can also obtain the number of rows affected for the final statement executed within the routine: At the SQL level, call the ROW_COUNT() function; from the C API, call the mysql_affected_rows() function.

To get back a value from a procedure using an OUT or INOUT parameter, pass the parameter by means of a user variable, and then check the value of the variable after the procedure returns. (If you are calling the procedure from within another stored procedure or function, you can also pass a routine parameter or local routine variable as an IN or INOUT parameter.) For an INOUT parameter, initialize its value before passing it to the procedure. The following procedure has an OUT parameter that the procedure sets to the current server version, and an INOUT value that the procedure increments by one from its current value:

CREATE PROCEDURE p (OUT ver_param VARCHAR(25), INOUT incr_param INT)
BEGIN
  # Set value of OUT parameter
  SELECT VERSION() INTO ver_param;
  # Increment value of INOUT parameter
  SET incr_param = incr_param + 1;
END;

Before calling the procedure, initialize the variable to be passed as the INOUT parameter. After calling the procedure, the values of the two variables will have been set or modified:

mysql> SET @increment = 10;
mysql> CALL p(@version, @increment);
mysql> SELECT @version, @increment;
+--------------+------------+
| @version     | @increment |
+--------------+------------+
| 5.5.3-m3-log | 11         |
+--------------+------------+

In prepared CALL statements used with PREPARE and EXECUTE, placeholders can be used for IN parameters. For OUT and INOUT parameters, placeholder support is available as of MySQL 5.5.3. These types of parameters can be used as follows:

mysql> SET @increment = 10;
mysql> PREPARE s FROM 'CALL p(?, ?)';
mysql> EXECUTE s USING @version, @increment;
mysql> SELECT @version, @increment;
+--------------+------------+
| @version     | @increment |
+--------------+------------+
| 5.5.3-m3-log | 11         |
+--------------+------------+

Before MySQL 5.5.3, placeholder support is not available for OUT or INOUT parameters. To work around this limitation, forgo placeholders for those parameters; instead, refer to user variables in the CALL statement itself and do not specify them in the EXECUTE statement:

mysql> SET @increment = 10;
mysql> PREPARE s FROM 'CALL p(@version, @increment)';
mysql> EXECUTE s;
mysql> SELECT @version, @increment;
+--------------+------------+
| @version     | @increment |
+--------------+------------+
| 5.5.0-m2-log | 11         |
+--------------+------------+

To write C programs that use the CALL SQL statement to execute stored procedures that produce result sets, the CLIENT_MULTI_RESULTS flag must be enabled. This is because each CALL returns a result to indicate the call status, in addition to any result sets that might be returned by statements executed within the procedure. CLIENT_MULTI_RESULTS must also be enabled if CALL is used to execute any stored procedure that contains prepared statements. It cannot be determined when such a procedure is loaded whether those statements will produce result sets, so it is necessary to assume that they will.

CLIENT_MULTI_RESULTS can be enabled when you call mysql_real_connect(), either explicitly by passing the CLIENT_MULTI_RESULTS flag itself, or implicitly by passing CLIENT_MULTI_STATEMENTS (which also enables CLIENT_MULTI_RESULTS). As of MySQL 5.5.3, CLIENT_MULTI_RESULTS is enabled by default.

To process the result of a CALL statement executed using mysql_query() or mysql_real_query(), use a loop that calls mysql_next_result() to determine whether there are more results. For an example, see Section 22.8.13, "C API Support for Multiple Statement Execution".

For programs written in a language that provides a MySQL interface, there is no native method prior to MySQL 5.5.3 for directly retrieving the results of OUT or INOUT parameters from CALL statements. To get the parameter values, pass user-defined variables to the procedure in the CALL statement and then execute a SELECT statement to produce a result set containing the variable values. To handle an INOUT parameter, execute a statement prior to the CALL that sets the corresponding user variable to the value to be passed to the procedure.

The following example illustrates the technique (without error checking) for the stored procedure p described earlier that has an OUT parameter and an INOUT parameter:

mysql_query(mysql, "SET @increment = 10");
mysql_query(mysql, "CALL p(@version, @increment)");
mysql_query(mysql, "SELECT @version, @increment");
result = mysql_store_result(mysql);
row = mysql_fetch_row(result);
mysql_free_result(result);

After the preceding code executes, row[0] and row[1] contain the values of @version and @increment, respectively.

As of MySQL 5.5.3, C programs can use the prepared-statement interface to execute CALL statements and access OUT and INOUT parameters. This is done by processing the result of a CALL statement using a loop that calls mysql_stmt_next_result() to determine whether there are more results. For an example, see Section 22.8.16, "C API Support for Prepared CALL Statements". Languages that provide a MySQL interface can use prepared CALL statements to directly retrieve OUT and INOUT procedure parameters.

13.2.2. DELETE Syntax

Single-table syntax:

DELETE [LOW_PRIORITY] [QUICK] [IGNORE] FROM tbl_name
    [WHERE where_condition]
    [ORDER BY ...]
    [LIMIT row_count]

Multiple-table syntax:

DELETE [LOW_PRIORITY] [QUICK] [IGNORE]
    tbl_name[.*] [, tbl_name[.*]] ...
    FROM table_references
    [WHERE where_condition]

Or:

DELETE [LOW_PRIORITY] [QUICK] [IGNORE]
    FROM tbl_name[.*] [, tbl_name[.*]] ...
    USING table_references
    [WHERE where_condition]

For the single-table syntax, the DELETE statement deletes rows from tbl_name and returns a count of the number of deleted rows. This count can be obtained by calling the ROW_COUNT() function (see Section 12.14, "Information Functions"). The WHERE clause, if given, specifies the conditions that identify which rows to delete. With no WHERE clause, all rows are deleted. If the ORDER BY clause is specified, the rows are deleted in the order that is specified. The LIMIT clause places a limit on the number of rows that can be deleted.

For the multiple-table syntax, DELETE deletes from each tbl_name the rows that satisfy the conditions. In this case, ORDER BY and LIMIT cannot be used.

where_condition is an expression that evaluates to true for each row to be deleted. It is specified as described in Section 13.2.9, "SELECT Syntax".

Currently, you cannot delete from a table and select from the same table in a subquery.
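As an illustration of this restriction, the following sketch shows a statement of the disallowed form and one possible alternative that uses the multiple-table syntax described later in this section. The table `contacts` and its columns `id` and `email` are hypothetical names chosen for the example, and the self-join is only one way to express the condition:

```sql
-- Disallowed form: the subquery reads from the table being deleted from.
-- DELETE FROM contacts
--   WHERE id NOT IN (SELECT MIN(id) FROM contacts GROUP BY email);

-- Possible alternative (assumed schema): join the table to itself under
-- two aliases and delete only from the first alias. For each email value,
-- every row except the one with the smallest id is removed.
DELETE c1
  FROM contacts AS c1
  INNER JOIN contacts AS c2
  WHERE c1.email = c2.email
    AND c1.id > c2.id;
```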

You need the DELETE privilege on a table to delete rows from it. You need only the SELECT privilege for any columns that are only read, such as those named in the WHERE clause.

As stated, a DELETE statement with no WHERE clause deletes all rows. A faster way to do this, when you do not need to know the number of deleted rows, is to use TRUNCATE TABLE. However, within a transaction or if you have a lock on the table, TRUNCATE TABLE cannot be used whereas DELETE can. See Section 13.1.33, "TRUNCATE TABLE Syntax", and Section 13.3.5, "LOCK TABLES and UNLOCK TABLES Syntax".

If you delete the row containing the maximum value for an AUTO_INCREMENT column, the value is not reused for a MyISAM or InnoDB table. If you delete all rows in the table with DELETE FROM tbl_name (without a WHERE clause) in autocommit mode, the sequence starts over for all storage engines except InnoDB and MyISAM. There are some exceptions to this behavior for InnoDB tables, as discussed in Section 14.3.5.3, "AUTO_INCREMENT Handling in InnoDB".

For MyISAM tables, you can specify an AUTO_INCREMENT secondary column in a multiple-column key. In this case, reuse of values deleted from the top of the sequence occurs even for MyISAM tables. See Section 3.6.9, "Using AUTO_INCREMENT".

The DELETE statement supports the following modifiers:

  • If you specify LOW_PRIORITY, the server delays execution of the DELETE until no other clients are reading from the table. This affects only storage engines that use only table-level locking (such as MyISAM, MEMORY, and MERGE).

  • For MyISAM tables, if you use the QUICK keyword, the storage engine does not merge index leaves during delete, which may speed up some kinds of delete operations.

  • The IGNORE keyword causes MySQL to ignore all errors during the process of deleting rows. (Errors encountered during the parsing stage are processed in the usual manner.) Errors that are ignored due to the use of IGNORE are returned as warnings.

The speed of delete operations may also be affected by factors discussed in Section 8.2.2.3, "Speed of DELETE Statements".

In MyISAM tables, deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions. To reclaim unused space and reduce file sizes, use the OPTIMIZE TABLE statement or the myisamchk utility to reorganize tables. OPTIMIZE TABLE is easier to use, but myisamchk is faster. See Section 13.7.2.4, "OPTIMIZE TABLE Syntax", and Section 4.6.3, "myisamchk - MyISAM Table-Maintenance Utility".

The QUICK modifier affects whether index leaves are merged for delete operations. DELETE QUICK is most useful for applications where index values for deleted rows are replaced by similar index values from rows inserted later. In this case, the holes left by deleted values are reused.

DELETE QUICK is not useful when deleted values lead to underfilled index blocks spanning a range of index values for which new inserts occur again. In this case, use of QUICK can lead to wasted space in the index that remains unreclaimed. Here is an example of such a scenario:

  1. Create a table that contains an indexed AUTO_INCREMENT column.

  2. Insert many rows into the table. Each insert results in an index value that is added to the high end of the index.

  3. Delete a block of rows at the low end of the column range using DELETE QUICK.

In this scenario, the index blocks associated with the deleted index values become underfilled but are not merged with other index blocks due to the use of QUICK. They remain underfilled when new inserts occur, because new rows do not have index values in the deleted range. Furthermore, they remain underfilled even if you later use DELETE without QUICK, unless some of the deleted index values happen to lie in index blocks within or adjacent to the underfilled blocks. To reclaim unused index space under these circumstances, use OPTIMIZE TABLE.

If you are going to delete many rows from a table, it might be faster to use DELETE QUICK followed by OPTIMIZE TABLE. This rebuilds the index rather than performing many index block merge operations.

The MySQL-specific LIMIT row_count option to DELETE tells the server the maximum number of rows to be deleted before control is returned to the client. This can be used to ensure that a given DELETE statement does not take too much time. You can simply repeat the DELETE statement until the number of affected rows is less than the LIMIT value.
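The repeat-until-done approach can be sketched as a stored procedure. This is a minimal illustration rather than a recommended implementation: the procedure name `purge_somelog_in_batches`, the batch size of 1000, and the `somelog` table with its `user` column are assumptions made for the example. `DELIMITER` is a mysql client command, not an SQL statement.

```sql
DELIMITER //

CREATE PROCEDURE purge_somelog_in_batches()
BEGIN
  -- Number of rows removed by the most recent DELETE.
  DECLARE deleted INT DEFAULT 0;

  REPEAT
    -- Delete at most 1000 matching rows per statement.
    DELETE FROM somelog WHERE user = 'jcole' LIMIT 1000;
    SET deleted = ROW_COUNT();
  -- Fewer rows than the limit means no matching rows remain.
  UNTIL deleted < 1000 END REPEAT;
END//

DELIMITER ;

CALL purge_somelog_in_batches();
```

A client program can apply the same loop by reissuing the DELETE and checking the affected-rows value after each execution.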

If the DELETE statement includes an ORDER BY clause, rows are deleted in the order specified by the clause. This is useful primarily in conjunction with LIMIT. For example, the following statement finds rows matching the WHERE clause, sorts them by timestamp_column, and deletes the first (oldest) one:

DELETE FROM somelog WHERE user = 'jcole'
ORDER BY timestamp_column LIMIT 1;

ORDER BY may also be useful in some cases to delete rows in an order required to avoid referential integrity violations.

If you are deleting many rows from a large table, you may exceed the lock table size for an InnoDB table. To avoid this problem, or simply to minimize the time that the table remains locked, the following strategy (which does not use DELETE at all) might be helpful:

  1. Select the rows not to be deleted into an empty table that has the same structure as the original table:

    INSERT INTO t_copy SELECT * FROM t WHERE ... ;
  2. Use RENAME TABLE to atomically move the original table out of the way and rename the copy to the original name:

    RENAME TABLE t TO t_old, t_copy TO t;
  3. Drop the original table:

    DROP TABLE t_old;

No other sessions can access the tables involved while RENAME TABLE executes, so the rename operation is not subject to concurrency problems. See Section 13.1.32, "RENAME TABLE Syntax".

You can specify multiple tables in a DELETE statement to delete rows from one or more tables depending on the particular condition in the WHERE clause. However, you cannot use ORDER BY or LIMIT in a multiple-table DELETE. The table_references clause lists the tables involved in the join. Its syntax is described in Section 13.2.9.2, "JOIN Syntax".

For the first multiple-table syntax, only matching rows from the tables listed before the FROM clause are deleted. For the second multiple-table syntax, only matching rows from the tables listed in the FROM clause (before the USING clause) are deleted. The effect is that you can delete rows from many tables at the same time and have additional tables that are used only for searching:

DELETE t1, t2 FROM t1 INNER JOIN t2 INNER JOIN t3
WHERE t1.id=t2.id AND t2.id=t3.id;

Or:

DELETE FROM t1, t2 USING t1 INNER JOIN t2 INNER JOIN t3
WHERE t1.id=t2.id AND t2.id=t3.id;

These statements use all three tables when searching for rows to delete, but delete matching rows only from tables t1 and t2.

The preceding examples use INNER JOIN, but multiple-table DELETE statements can use other types of join permitted in SELECT statements, such as LEFT JOIN. For example, to delete rows that exist in t1 that have no match in t2, use a LEFT JOIN:

DELETE t1 FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.id IS NULL;

The syntax permits .* after each tbl_name for compatibility with Access.

If you use a multiple-table DELETE statement involving InnoDB tables for which there are foreign key constraints, the MySQL optimizer might process tables in an order that differs from that of their parent/child relationship. In this case, the statement fails and rolls back. Instead, you should delete from a single table and rely on the ON DELETE capabilities that InnoDB provides to cause the other tables to be modified accordingly.
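A minimal sketch of the single-table approach follows. The `parent` and `child` tables are hypothetical; the point is only that a foreign key declared with ON DELETE CASCADE lets one single-table DELETE remove the dependent rows as well:

```sql
CREATE TABLE parent (
  id INT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE child (
  id        INT NOT NULL PRIMARY KEY,
  parent_id INT NOT NULL,
  INDEX (parent_id),
  FOREIGN KEY (parent_id) REFERENCES parent (id)
    ON DELETE CASCADE
) ENGINE=InnoDB;

INSERT INTO parent VALUES (1), (2);
INSERT INTO child VALUES (10, 1), (11, 1), (20, 2);

-- Deleting from the parent only; InnoDB removes child rows 10 and 11.
DELETE FROM parent WHERE id = 1;
```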

Note

If you declare an alias for a table, you must use the alias when referring to the table:

DELETE t1 FROM test AS t1, test2 WHERE ...

Table aliases in a multiple-table DELETE should be declared only in the table_references part of the statement.

Correct:

DELETE a1, a2 FROM t1 AS a1 INNER JOIN t2 AS a2
WHERE a1.id=a2.id;

DELETE FROM a1, a2 USING t1 AS a1 INNER JOIN t2 AS a2
WHERE a1.id=a2.id;

Incorrect:

DELETE t1 AS a1, t2 AS a2 FROM t1 INNER JOIN t2
WHERE a1.id=a2.id;

DELETE FROM t1 AS a1, t2 AS a2 USING t1 INNER JOIN t2
WHERE a1.id=a2.id;

Declaration of aliases other than in the table_references part of the statement should be avoided because that can lead to ambiguous statements that have unexpected results such as deleting rows from the wrong table. This is such a statement:

DELETE t1 AS a2 FROM t1 AS a1 INNER JOIN t2 AS a2;

Before MySQL 5.5.3, alias declarations outside the table_references part of the statement are disallowed for the USING variant of multiple-table DELETE syntax. As of MySQL 5.5.3, alias declarations outside table_references are disallowed for all multiple-table DELETE statements.

Before MySQL 5.5.3, for alias references in the list of tables from which to delete rows in a multiple-table delete, the default database is used unless one is specified explicitly. For example, if the default database is db1, the following statement does not work because the unqualified alias reference a2 is interpreted as having a database of db1:

DELETE a1, a2 FROM db1.t1 AS a1 INNER JOIN db2.t2 AS a2
WHERE a1.id=a2.id;

To correctly match an alias that refers to a table outside the default database, you must explicitly qualify the reference with the name of the proper database:

DELETE a1, db2.a2 FROM db1.t1 AS a1 INNER JOIN db2.t2 AS a2
WHERE a1.id=a2.id;

As of MySQL 5.5.3, alias resolution does not require qualification and alias references should not be qualified with the database name. Qualified names are interpreted as referring to tables, not aliases.

13.2.3. DO Syntax

DO expr [, expr] ...

DO executes the expressions but does not return any results. In most respects, DO is shorthand for SELECT expr, ..., but has the advantage that it is slightly faster when you do not care about the result.

DO is useful primarily with functions that have side effects, such as RELEASE_LOCK().
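For instance, a session that acquired a named lock with GET_LOCK() can release it with DO, because the value returned by RELEASE_LOCK() is usually of no interest. The lock name `maintenance` and the 10-second timeout are arbitrary values chosen for this sketch:

```sql
-- Acquire the lock; SELECT is used here because the result
-- (1 on success, 0 on timeout) matters.
SELECT GET_LOCK('maintenance', 10);

-- ... statements that must run while the lock is held ...

-- Release the lock without producing a result set.
DO RELEASE_LOCK('maintenance');
```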

13.2.4. HANDLER Syntax

HANDLER tbl_name OPEN [ [AS] alias]

HANDLER tbl_name READ index_name { = | <= | >= | < | > } (value1,value2,...)
    [ WHERE where_condition ] [LIMIT ... ]
HANDLER tbl_name READ index_name { FIRST | NEXT | PREV | LAST }
    [ WHERE where_condition ] [LIMIT ... ]
HANDLER tbl_name READ { FIRST | NEXT }
    [ WHERE where_condition ] [LIMIT ... ]

HANDLER tbl_name CLOSE

The HANDLER statement provides direct access to table storage engine interfaces. It is available for InnoDB and MyISAM tables.

The HANDLER ... OPEN statement opens a table, making it accessible using subsequent HANDLER ... READ statements. This table object is not shared by other sessions and is not closed until the session calls HANDLER ... CLOSE or the session terminates. If you open the table using an alias, further references to the open table with other HANDLER statements must use the alias rather than the table name.

The first HANDLER ... READ syntax fetches a row where the index specified satisfies the given values and the WHERE condition is met. If you have a multiple-column index, specify the index column values as a comma-separated list. Either specify values for all the columns in the index, or specify values for a leftmost prefix of the index columns. Suppose that an index my_idx includes three columns named col_a, col_b, and col_c, in that order. The HANDLER statement can specify values for all three columns in the index, or for the columns in a leftmost prefix. For example:

HANDLER ... READ my_idx = (col_a_val,col_b_val,col_c_val) ...
HANDLER ... READ my_idx = (col_a_val,col_b_val) ...
HANDLER ... READ my_idx = (col_a_val) ...

To employ the HANDLER interface to refer to a table's PRIMARY KEY, use the quoted identifier `PRIMARY`:

HANDLER tbl_name READ `PRIMARY` ...

The second HANDLER ... READ syntax fetches a row from the table in index order that matches the WHERE condition.

The third HANDLER ... READ syntax fetches a row from the table in natural row order that matches the WHERE condition. It is faster than HANDLER tbl_name READ index_name when a full table scan is desired. Natural row order is the order in which rows are stored in a MyISAM table data file. This statement works for InnoDB tables as well, but there is no such concept because there is no separate data file.

Without a LIMIT clause, all forms of HANDLER ... READ fetch a single row if one is available. To return a specific number of rows, include a LIMIT clause. It has the same syntax as for the SELECT statement. See Section 13.2.9, "SELECT Syntax".

HANDLER ... CLOSE closes a table that was opened with HANDLER ... OPEN.
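The statements described above might be combined in a session such as the following. The table `city`, its columns, and the alias `h` are hypothetical names used only for illustration:

```sql
CREATE TABLE city (
  id   INT NOT NULL PRIMARY KEY,
  name VARCHAR(30)
) ENGINE=MyISAM;

INSERT INTO city VALUES (1, 'Oslo'), (2, 'Lima'), (3, 'Kyiv'), (4, 'Hanoi');

-- Open the table under an alias; later statements must use the alias.
HANDLER city OPEN AS h;

-- First form: fetch the first row whose primary key is >= 3.
HANDLER h READ `PRIMARY` >= (3);

-- Second form: walk the index from the start, two rows at a time.
HANDLER h READ `PRIMARY` FIRST LIMIT 2;
HANDLER h READ `PRIMARY` NEXT LIMIT 2;

-- Third form: read in natural row order, filtering with WHERE.
HANDLER h READ FIRST WHERE name LIKE 'L%';

HANDLER h CLOSE;
```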

There are several reasons to use the HANDLER interface instead of normal SELECT statements:

  • HANDLER is faster than SELECT:

    • A designated storage engine handler object is allocated for the HANDLER ... OPEN. The object is reused for subsequent HANDLER statements for that table; it need not be reinitialized for each one.

    • There is less parsing involved.

    • There is no optimizer or query-checking overhead.

    • The handler interface does not have to provide a consistent view of the data (for example, dirty reads are permitted), so the storage engine can use optimizations that SELECT does not normally permit.

  • HANDLER makes it easier to port applications that use a low-level ISAM-like interface to MySQL.

  • HANDLER enables you to traverse a database in a manner that is difficult (or even impossible) to accomplish with SELECT. The HANDLER interface is a more natural way to look at data when working with applications that provide an interactive user interface to the database.

HANDLER is a somewhat low-level statement. For example, it does not provide consistency. That is, HANDLER ... OPEN does not take a snapshot of the table, and does not lock the table. This means that after a HANDLER ... OPEN statement is issued, table data can be modified (by the current session or other sessions) and these modifications might be only partially visible to HANDLER ... NEXT or HANDLER ... PREV scans.

An open handler can be closed and marked for reopen, in which case the handler loses its position in the table. This occurs when both of the following circumstances are true:

  • Any session executes FLUSH TABLES or DDL statements on the handler's table.

  • The session in which the handler is open executes non-HANDLER statements that use tables.

TRUNCATE TABLE for a table closes all handlers for the table that were opened with HANDLER OPEN.

If a table that is flushed with FLUSH TABLES tbl_name WITH READ LOCK was opened with HANDLER, the handler is implicitly flushed and loses its position.

13.2.5. INSERT Syntax

INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
    [INTO] tbl_name [(col_name,...)]
    {VALUES | VALUE} ({expr | DEFAULT},...),(...),...
    [ ON DUPLICATE KEY UPDATE
      col_name=expr
        [, col_name=expr] ... ]

Or:

INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
    [INTO] tbl_name
    SET col_name={expr | DEFAULT}, ...
    [ ON DUPLICATE KEY UPDATE
      col_name=expr
        [, col_name=expr] ... ]

Or:

INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
    [INTO] tbl_name [(col_name,...)]
    SELECT ...
    [ ON DUPLICATE KEY UPDATE
      col_name=expr
        [, col_name=expr] ... ]

INSERT inserts new rows into an existing table. The INSERT ... VALUES and INSERT ... SET forms of the statement insert rows based on explicitly specified values. The INSERT ... SELECT form inserts rows selected from another table or tables. INSERT ... SELECT is discussed further in Section 13.2.5.1, "INSERT ... SELECT Syntax".

You can use REPLACE instead of INSERT to overwrite old rows. REPLACE is the counterpart to INSERT IGNORE in the treatment of new rows that contain unique key values that duplicate old rows: The new rows are used to replace the old rows rather than being discarded. See Section 13.2.8, "REPLACE Syntax".

tbl_name is the table into which rows should be inserted. The columns for which the statement provides values can be specified as follows:

  • You can provide a comma-separated list of column names following the table name. In this case, a value for each named column must be provided by the VALUES list or the SELECT statement.

  • If you do not specify a list of column names for INSERT ... VALUES or INSERT ... SELECT, values for every column in the table must be provided by the VALUES list or the SELECT statement. If you do not know the order of the columns in the table, use DESCRIBE tbl_name to find out.

  • The SET clause indicates the column names explicitly.

Column values can be given in several ways:

  • If you are not running in strict SQL mode, any column not explicitly given a value is set to its default (explicit or implicit) value. For example, if you specify a column list that does not name all the columns in the table, unnamed columns are set to their default values. Default value assignment is described in Section 11.5, "Data Type Default Values". See also Section 1.8.6.2, "Constraints on Invalid Data".

    If you want an INSERT statement to generate an error unless you explicitly specify values for all columns that do not have a default value, you should use strict mode. See Section 5.1.7, "Server SQL Modes".

  • Use the keyword DEFAULT to set a column explicitly to its default value. This makes it easier to write INSERT statements that assign values to all but a few columns, because it enables you to avoid writing an incomplete VALUES list that does not include a value for each column in the table. Otherwise, you would have to write out the list of column names corresponding to each value in the VALUES list.

    You can also use DEFAULT(col_name) as a more general form that can be used in expressions to produce a given column's default value.

  • If both the column list and the VALUES list are empty, INSERT creates a row with each column set to its default value:

    INSERT INTO tbl_name () VALUES();

    In strict mode, an error occurs if any column doesn't have a default value. Otherwise, MySQL uses the implicit default value for any column that does not have an explicitly defined default.

  • You can specify an expression expr to provide a column value. This might involve type conversion if the type of the expression does not match the type of the column, and conversion of a given value can result in different inserted values depending on the data type. For example, inserting the string '1999.0e-2' into an INT, FLOAT, DECIMAL(10,6), or YEAR column results in the values 1999, 19.9921, 19.992100, and 1999 being inserted, respectively. The reason the value stored in the INT and YEAR columns is 1999 is that the string-to-integer conversion looks only at as much of the initial part of the string as may be considered a valid integer or year. For the floating-point and fixed-point columns, the string-to-floating-point conversion considers the entire string a valid floating-point value.

    An expression expr can refer to any column that was set earlier in a value list. For example, you can do this because the value for col2 refers to col1, which has previously been assigned:

    INSERT INTO tbl_name (col1,col2) VALUES(15,col1*2);

    But the following is not legal, because the value for col1 refers to col2, which is assigned after col1:

    INSERT INTO tbl_name (col1,col2) VALUES(col2*2,15);

    One exception involves columns that contain AUTO_INCREMENT values. Because the AUTO_INCREMENT value is generated after other value assignments, any reference to an AUTO_INCREMENT column in the assignment returns a 0.
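The ways of supplying column values listed above can be seen side by side in a short sketch. The table `d` and its column defaults are assumptions made for the example:

```sql
CREATE TABLE d (
  a INT NOT NULL DEFAULT 5,
  b INT NOT NULL DEFAULT 10,
  c INT
);

-- DEFAULT keyword: b receives its default value, 10.
INSERT INTO d VALUES (1, DEFAULT, 3);

-- DEFAULT(col_name) inside an expression: a is set to 10 + 1.
-- b is not named, so it receives its default value.
INSERT INTO d (a, c) VALUES (DEFAULT(b) + 1, 0);

-- Empty column list and empty VALUES list: every column gets its default.
INSERT INTO d () VALUES ();

-- Referring to a column assigned earlier in the same value list:
-- b is set to 15 * 2.
INSERT INTO d (a, b) VALUES (15, a * 2);

SELECT a, b, c FROM d;
```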

INSERT statements that use VALUES syntax can insert multiple rows. To do this, include multiple lists of column values, each enclosed within parentheses and separated by commas. Example:

INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);

The values list for each row must be enclosed within parentheses. The following statement is illegal because the number of values in the list does not match the number of column names:

INSERT INTO tbl_name (a,b,c) VALUES(1,2,3,4,5,6,7,8,9);

VALUE is a synonym for VALUES in this context. Neither implies anything about the number of values lists, and either may be used whether there is a single values list or multiple lists.

The affected-rows value for an INSERT can be obtained using the ROW_COUNT() function (see Section 12.14, "Information Functions"), or the mysql_affected_rows() C API function (see Section 22.8.3.1, "mysql_affected_rows()").

If you use an INSERT ... VALUES statement with multiple value lists or INSERT ... SELECT, the statement returns an information string in this format:

Records: 100 Duplicates: 0 Warnings: 0

Records indicates the number of rows processed by the statement. (This is not necessarily the number of rows actually inserted because Duplicates can be nonzero.) Duplicates indicates the number of rows that could not be inserted because they would duplicate some existing unique index value. Warnings indicates the number of attempts to insert column values that were problematic in some way. Warnings can occur under any of the following conditions:

  • Inserting NULL into a column that has been declared NOT NULL. For multiple-row INSERT statements or INSERT INTO ... SELECT statements, the column is set to the implicit default value for the column data type. This is 0 for numeric types, the empty string ('') for string types, and the "zero" value for date and time types. INSERT INTO ... SELECT statements are handled the same way as multiple-row inserts because the server does not examine the result set from the SELECT to see whether it returns a single row. (For a single-row INSERT, no warning occurs when NULL is inserted into a NOT NULL column. Instead, the statement fails with an error.)

  • Setting a numeric column to a value that lies outside the column's range. The value is clipped to the closest endpoint of the range.

  • Assigning a value such as '10.34 a' to a numeric column. The trailing nonnumeric text is stripped off and the remaining numeric part is inserted. If the string value has no leading numeric part, the column is set to 0.

  • Inserting a string into a string column (CHAR, VARCHAR, TEXT, or BLOB) that exceeds the column's maximum length. The value is truncated to the column's maximum length.

  • Inserting a value into a date or time column that is illegal for the data type. The column is set to the appropriate zero value for the type.
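Several of these conditions can be provoked with one multiple-row INSERT, as in the following sketch. It assumes that strict SQL mode is not in effect (the example clears `sql_mode` for the session), and the table `w` is hypothetical. The comments describe what the rules above lead one to expect, not captured server output:

```sql
-- Assumption: run without strict mode so that problems become warnings.
SET SESSION sql_mode = '';

CREATE TABLE w (
  n TINYINT NOT NULL,
  s VARCHAR(3) NOT NULL
);

INSERT INTO w (n, s) VALUES
  (300, 'abcdef'),   -- n is out of range and clipped; s is truncated
  (NULL, 'ok'),      -- NULL in a NOT NULL column becomes the implicit default
  ('10.34 a', 'x');  -- trailing nonnumeric text is stripped from n

-- List the warnings generated by the INSERT, then inspect the stored rows.
SHOW WARNINGS;
SELECT n, s FROM w;
```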

If you are using the C API, the information string can be obtained by invoking the mysql_info() function. See Section 22.8.3.35, "mysql_info()".

If INSERT inserts a row into a table that has an AUTO_INCREMENT column, you can find the value used for that column by using the SQL LAST_INSERT_ID() function. From within the C API, use the mysql_insert_id() function. However, you should note that the two functions do not always behave identically. The behavior of INSERT statements with respect to AUTO_INCREMENT columns is discussed further in Section 12.14, "Information Functions", and Section 22.8.3.37, "mysql_insert_id()".
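A brief sketch of retrieving the generated value at the SQL level follows; the `items` table is a hypothetical example. Note that after a multiple-row INSERT, LAST_INSERT_ID() returns the value generated for the first inserted row, not the last:

```sql
CREATE TABLE items (
  id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  label VARCHAR(20)
);

INSERT INTO items (label) VALUES ('first');
-- Returns the id generated for 'first'.
SELECT LAST_INSERT_ID();

INSERT INTO items (label) VALUES ('second'), ('third');
-- Returns the id generated for 'second', the first row of this statement.
SELECT LAST_INSERT_ID();
```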

The INSERT statement supports the following modifiers:

  • If you use the DELAYED keyword, the server puts the row or rows to be inserted into a buffer, and the client issuing the INSERT DELAYED statement can then continue immediately. If the table is in use, the server holds the rows. When the table is free, the server begins inserting rows, checking periodically to see whether there are any new read requests for the table. If there are, the delayed row queue is suspended until the table becomes free again. See Section 13.2.5.2, "INSERT DELAYED Syntax".

    DELAYED is ignored with INSERT ... SELECT or INSERT ... ON DUPLICATE KEY UPDATE.

    DELAYED is also disregarded for an INSERT that uses functions accessing tables or triggers, or that is called from a function or a trigger.

  • If you use the LOW_PRIORITY keyword, execution of the INSERT is delayed until no other clients are reading from the table. This includes other clients that began reading while existing clients are reading, and while the INSERT LOW_PRIORITY statement is waiting. It is possible, therefore, for a client that issues an INSERT LOW_PRIORITY statement to wait for a very long time (or even forever) in a read-heavy environment. (This is in contrast to INSERT DELAYED, which lets the client continue at once.) Note that LOW_PRIORITY should normally not be used with MyISAM tables because doing so disables concurrent inserts. See Section 8.10.3, "Concurrent Inserts".

    If you specify HIGH_PRIORITY, it overrides the effect of the --low-priority-updates option if the server was started with that option. It also causes concurrent inserts not to be used. See Section 8.10.3, "Concurrent Inserts".

    LOW_PRIORITY and HIGH_PRIORITY affect only storage engines that use only table-level locking (such as MyISAM, MEMORY, and MERGE).

  • If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.

    IGNORE has a similar effect on inserts into partitioned tables where no partition matching a given value is found. Without IGNORE, such INSERT statements are aborted with an error; however, when INSERT IGNORE is used, the insert operation fails silently for the row containing the unmatched value, but any rows that are matched are inserted. For an example, see Section 18.2.2, "LIST Partitioning".

    Data conversions that would trigger errors abort the statement if IGNORE is not specified. With IGNORE, invalid values are adjusted to the closest values and inserted; warnings are produced but the statement does not abort. You can determine with the mysql_info() C API function how many rows were actually inserted into the table.

  • If you specify ON DUPLICATE KEY UPDATE, and a row is inserted that would cause a duplicate value in a UNIQUE index or PRIMARY KEY, an UPDATE of the old row is performed. The affected-rows value per row is 1 if the row is inserted as a new row and 2 if an existing row is updated. See Section 13.2.5.3, "INSERT ... ON DUPLICATE KEY UPDATE Syntax".

Inserting into a table requires the INSERT privilege for the table. If the ON DUPLICATE KEY UPDATE clause is used and a duplicate key causes an UPDATE to be performed instead, the statement requires the UPDATE privilege for the columns to be updated. For columns that are read but not modified you need only the SELECT privilege (such as for a column referenced only on the right-hand side of a col_name=expr assignment in an ON DUPLICATE KEY UPDATE clause).

An INSERT statement that acts on a partitioned table using a storage engine such as MyISAM that employs table-level locks locks all partitions of the table. This does not occur with tables using storage engines such as InnoDB that employ row-level locking. This issue is resolved in MySQL 5.6. See Section 18.5.4, "Partitioning and Table-Level Locking", for more information.

13.2.5.1. INSERT ... SELECT Syntax

INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
    [INTO] tbl_name [(col_name,...)]
    SELECT ...
    [ ON DUPLICATE KEY UPDATE col_name=expr, ... ]

With INSERT ... SELECT, you can quickly insert many rows into a table from one or many tables. For example:

INSERT INTO tbl_temp2 (fld_id)
  SELECT tbl_temp1.fld_order_id
  FROM tbl_temp1 WHERE tbl_temp1.fld_order_id > 100;

The following conditions hold for INSERT ... SELECT statements:

  • Specify IGNORE to ignore rows that would cause duplicate-key violations.

  • DELAYED is ignored with INSERT ... SELECT.

  • The target table of the INSERT statement may appear in the FROM clause of the SELECT part of the query. (This was not possible in some older versions of MySQL.) However, you cannot insert into a table and select from the same table in a subquery.

    When selecting from and inserting into a table at the same time, MySQL creates a temporary table to hold the rows from the SELECT and then inserts those rows into the target table. However, it remains true that you cannot use INSERT INTO t ... SELECT ... FROM t when t is a TEMPORARY table, because TEMPORARY tables cannot be referred to twice in the same statement (see Section C.5.7.2, "TEMPORARY Table Problems").

  • AUTO_INCREMENT columns work as usual.

  • To ensure that the binary log can be used to re-create the original tables, MySQL does not permit concurrent inserts for INSERT ... SELECT statements.

  • To avoid ambiguous column reference problems when the SELECT and the INSERT refer to the same table, provide a unique alias for each table used in the SELECT part, and qualify column names in that part with the appropriate alias.

In the values part of ON DUPLICATE KEY UPDATE, you can refer to columns in other tables, as long as you do not use GROUP BY in the SELECT part. One side effect is that you must qualify nonunique column names in the values part.

The order in which rows are returned by a SELECT statement with no ORDER BY clause is not determined. This means that, when using replication, there is no guarantee that such a SELECT returns rows in the same order on the master and the slave; this can lead to inconsistencies between them. To prevent this from occurring, you should always write INSERT ... SELECT statements that are to be replicated as INSERT ... SELECT ... ORDER BY column. The choice of column does not matter as long as the same order for returning the rows is enforced on both the master and the slave. See also Section 16.4.1.14, "Replication and LIMIT".

Due to this issue, beginning with MySQL 5.5.18, INSERT ... SELECT ON DUPLICATE KEY UPDATE and INSERT IGNORE ... SELECT statements are flagged as unsafe for statement-based replication. With this change, such statements produce a warning in the log when using statement-based mode and are logged using the row-based format when using MIXED mode. (Bug #11758262, Bug #50439)

See also Section 16.1.2.1, "Advantages and Disadvantages of Statement-Based and Row-Based Replication".

An INSERT ... SELECT statement that acts on partitioned tables using a storage engine such as MyISAM that employs table-level locks locks all partitions of the source and target tables. This does not occur with tables using storage engines such as InnoDB that employ row-level locking. This issue is resolved in MySQL 5.6. See Section 18.5.4, "Partitioning and Table-Level Locking", for more information.

13.2.5.2. INSERT DELAYED Syntax

INSERT DELAYED ...

The DELAYED option for the INSERT statement is a MySQL extension to standard SQL that is very useful if you have clients that cannot or need not wait for the INSERT to complete. This is a common situation when you use MySQL for logging and you also periodically run SELECT and UPDATE statements that take a long time to complete.

When a client uses INSERT DELAYED, it gets an okay from the server at once, and the row is queued to be inserted when the table is not in use by any other thread.

Another major benefit of using INSERT DELAYED is that inserts from many clients are bundled together and written in one block. This is much faster than performing many separate inserts.

Note that INSERT DELAYED is slower than a normal INSERT if the table is not otherwise in use. There is also the additional overhead for the server to handle a separate thread for each table for which there are delayed rows. This means that you should use INSERT DELAYED only when you are really sure that you need it.

The queued rows are held only in memory until they are inserted into the table. This means that if you terminate mysqld forcibly (for example, with kill -9) or if mysqld dies unexpectedly, any queued rows that have not been written to disk are lost.

There are some constraints on the use of DELAYED:

  • INSERT DELAYED works only with MyISAM, MEMORY, ARCHIVE, and BLACKHOLE tables. For engines that do not support DELAYED, an error occurs.

  • An error occurs for INSERT DELAYED if used with a table that has been locked with LOCK TABLES because the insert must be handled by a separate thread, not by the session that holds the lock.

  • For MyISAM tables, if there are no free blocks in the middle of the data file, concurrent SELECT and INSERT statements are supported. Under these circumstances, you very seldom need to use INSERT DELAYED with MyISAM.

  • INSERT DELAYED should be used only for INSERT statements that specify value lists. The server ignores DELAYED for INSERT ... SELECT or INSERT ... ON DUPLICATE KEY UPDATE statements.

  • Because the INSERT DELAYED statement returns immediately, before the rows are inserted, you cannot use LAST_INSERT_ID() to get the AUTO_INCREMENT value that the statement might generate.

  • DELAYED rows are not visible to SELECT statements until they actually have been inserted.

  • Prior to MySQL 5.5.7, INSERT DELAYED was treated as a normal INSERT if the statement inserted multiple rows, binary logging was enabled, and the global logging format was statement-based (that is, whenever binlog_format was set to STATEMENT). Beginning with MySQL 5.5.7, INSERT DELAYED is always handled as a simple INSERT (that is, without the DELAYED option) whenever the value of binlog_format is STATEMENT or MIXED. (In the latter case, the statement no longer triggers a switch to row-based logging, and so is logged using the statement-based format.)

    This does not apply when using row-based binary logging mode (binlog_format set to ROW), in which INSERT DELAYED statements are always executed using the DELAYED option as specified, and logged as row-update events.

  • DELAYED is ignored on slave replication servers, so that INSERT DELAYED is treated as a normal INSERT on slaves. This is because DELAYED could cause the slave to have different data than the master.

  • Pending INSERT DELAYED statements are lost if a table is write locked and ALTER TABLE is used to modify the table structure.

  • INSERT DELAYED is not supported for views.

  • INSERT DELAYED is not supported for partitioned tables.

The following describes in detail what happens when you use the DELAYED option to INSERT or REPLACE. In this description, the "thread" is the thread that received an INSERT DELAYED statement and "handler" is the thread that handles all INSERT DELAYED statements for a particular table.

  • When a thread executes a DELAYED statement for a table, a handler thread is created to process all DELAYED statements for the table, if no such handler already exists.

  • The thread checks whether the handler has previously acquired a DELAYED lock; if not, it tells the handler thread to do so. The DELAYED lock can be obtained even if other threads have a READ or WRITE lock on the table. However, the handler waits for all ALTER TABLE locks or FLUSH TABLES statements to finish, to ensure that the table structure is up to date.

  • The thread executes the INSERT statement, but instead of writing the row to the table, it puts a copy of the final row into a queue that is managed by the handler thread. Any syntax errors are noticed by the thread and reported to the client program.

  • The client cannot obtain from the server the number of duplicate rows or the AUTO_INCREMENT value for the resulting row, because the INSERT returns before the insert operation has been completed. (If you use the C API, the mysql_info() function does not return anything meaningful, for the same reason.)

  • The binary log is updated by the handler thread when the row is inserted into the table. In case of multiple-row inserts, the binary log is updated when the first row is inserted.

  • Each time that delayed_insert_limit rows are written, the handler checks whether any SELECT statements are still pending. If so, it permits these to execute before continuing.

  • When the handler has no more rows in its queue, the table is unlocked. If no new INSERT DELAYED statements are received within delayed_insert_timeout seconds, the handler terminates.

  • If more than delayed_queue_size rows are pending in a specific handler queue, the thread requesting INSERT DELAYED waits until there is room in the queue. This is done to ensure that mysqld does not use all memory for the delayed memory queue.

  • The handler thread shows up in the MySQL process list with delayed_insert in the Command column. It is killed if you execute a FLUSH TABLES statement or kill it with KILL thread_id. However, before exiting, it first stores all queued rows into the table. During this time it does not accept any new INSERT statements from other threads. If you execute an INSERT DELAYED statement after this, a new handler thread is created.

    Note that this means that INSERT DELAYED statements have higher priority than normal INSERT statements if there is an INSERT DELAYED handler running. Other update statements have to wait until the INSERT DELAYED queue is empty, someone terminates the handler thread (with KILL thread_id), or someone executes a FLUSH TABLES.

  • The following status variables provide information about INSERT DELAYED statements.

    Status Variable            Meaning
    Delayed_insert_threads     Number of handler threads
    Delayed_writes             Number of rows written with INSERT DELAYED
    Not_flushed_delayed_rows   Number of rows waiting to be written

    You can view these variables by issuing a SHOW STATUS statement or by executing a mysqladmin extended-status command.
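
The interplay between the row queue and delayed_insert_limit described in the list above can be modeled in a few lines of Python. This is only an illustrative sketch, not the server's implementation: the function drain_delayed_queue and its event tuples are hypothetical, and locking, delayed_insert_timeout, and delayed_queue_size are left out.

```python
def drain_delayed_queue(rows, delayed_insert_limit, pending_selects):
    """Model of a handler thread writing queued rows (hypothetical helper)."""
    events = []
    pending = list(pending_selects)  # copy so the caller's list is untouched
    for written, row in enumerate(rows, start=1):
        events.append(("write", row))
        # After every delayed_insert_limit rows, let waiting SELECTs run first.
        if written % delayed_insert_limit == 0 and pending:
            events.extend(("select", s) for s in pending)
            pending.clear()
    return events


events = drain_delayed_queue(["r1", "r2", "r3", "r4", "r5"], 2, ["s1"])
for event in events:
    print(event)
```

In this model the pending SELECT runs after the second row is written, and the remaining rows are written afterward.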

13.2.5.3. INSERT ... ON DUPLICATE KEY UPDATE Syntax

If you specify ON DUPLICATE KEY UPDATE, and a row is inserted that would cause a duplicate value in a UNIQUE index or PRIMARY KEY, MySQL performs an UPDATE of the old row. For example, if column a is declared as UNIQUE and contains the value 1, the following two statements have similar effect:

INSERT INTO table (a,b,c) VALUES (1,2,3)
  ON DUPLICATE KEY UPDATE c=c+1;

UPDATE table SET c=c+1 WHERE a=1;

(The effects are not identical for an InnoDB table where a is an auto-increment column. With an auto-increment column, an INSERT statement increases the auto-increment value but UPDATE does not.)

The ON DUPLICATE KEY UPDATE clause can contain multiple column assignments, separated by commas.

With ON DUPLICATE KEY UPDATE, the affected-rows value per row is 1 if the row is inserted as a new row, and 2 if an existing row is updated.

If column b is also unique, the INSERT is equivalent to this UPDATE statement instead:

UPDATE table SET c=c+1 WHERE a=1 OR b=2 LIMIT 1;

If a=1 OR b=2 matches several rows, only one row is updated. In general, you should try to avoid using an ON DUPLICATE KEY UPDATE clause on tables with multiple unique indexes.

You can use the VALUES(col_name) function in the UPDATE clause to refer to column values from the INSERT portion of the INSERT ... ON DUPLICATE KEY UPDATE statement. In other words, VALUES(col_name) in the ON DUPLICATE KEY UPDATE clause refers to the value of col_name that would be inserted, had no duplicate-key conflict occurred. This function is especially useful in multiple-row inserts. The VALUES() function is meaningful only in INSERT ... UPDATE statements and returns NULL otherwise. Example:

INSERT INTO table (a,b,c) VALUES (1,2,3),(4,5,6)
  ON DUPLICATE KEY UPDATE c=VALUES(a)+VALUES(b);

That statement is identical to the following two statements:

INSERT INTO table (a,b,c) VALUES (1,2,3)
  ON DUPLICATE KEY UPDATE c=3;
INSERT INTO table (a,b,c) VALUES (4,5,6)
  ON DUPLICATE KEY UPDATE c=9;
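
The per-row behavior can also be modeled outside the server. The following Python sketch is an assumption-laden illustration, not how MySQL implements the statement: the helper names are hypothetical, a table is a list of dictionaries, and when several rows conflict the model simply updates the first one it finds (the server's choice of row is not specified here). It covers only the two affected-rows cases described above.

```python
def insert_on_duplicate_key_update(table, values, unique_cols, update):
    """Insert `values`, or update one conflicting row; return affected rows."""
    for existing in table:
        # A conflict on any unique column selects the row to update.
        if any(existing[col] == values[col] for col in unique_cols):
            update(existing, values)  # `values` plays the role of VALUES(col)
            return 2  # an existing row was updated
    table.append(dict(values))
    return 1  # a new row was inserted


def set_c_to_a_plus_b(existing, values):
    # Models: ON DUPLICATE KEY UPDATE c = VALUES(a) + VALUES(b)
    existing["c"] = values["a"] + values["b"]


table = [{"a": 1, "b": 20, "c": 0}]  # column a is the unique key
affected = sum(
    insert_on_duplicate_key_update(table, row, ("a",), set_c_to_a_plus_b)
    for row in ({"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6})
)
print(affected, table)
```

The first input row conflicts on a=1, so the existing row's c becomes 1+2; the second row is inserted as new, giving a total affected-rows value of 2 + 1.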

If a table contains an AUTO_INCREMENT column and INSERT ... ON DUPLICATE KEY UPDATE inserts or updates a row, the LAST_INSERT_ID() function returns the AUTO_INCREMENT value.

The DELAYED option is ignored when you use ON DUPLICATE KEY UPDATE.

Because the results of INSERT ... SELECT statements depend on the ordering of rows from the SELECT and this order cannot always be guaranteed, it is possible when logging INSERT ... SELECT ON DUPLICATE KEY UPDATE statements for the master and the slave to diverge. Thus, in MySQL 5.5.18 and later, INSERT ... SELECT ON DUPLICATE KEY UPDATE statements are flagged as unsafe for statement-based replication. With this change, such statements produce a warning in the log when using statement-based mode and are logged using the row-based format when using MIXED mode. In addition, beginning with MySQL 5.5.24, an INSERT ... ON DUPLICATE KEY UPDATE statement against a table having more than one unique or primary key is also marked as unsafe. (Bug #11765650, Bug #58637) See also Section 16.1.2.1, "Advantages and Disadvantages of Statement-Based and Row-Based Replication".

An INSERT ... ON DUPLICATE KEY UPDATE on a partitioned table using a storage engine such as MyISAM that employs table-level locks locks all partitions of the table. This does not occur with tables using storage engines such as InnoDB that employ row-level locking. This issue is resolved in MySQL 5.6. See Section 18.5.4, "Partitioning and Table-Level Locking", for more information.

13.2.6. LOAD DATA INFILE Syntax

LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name'
    [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [CHARACTER SET charset_name]
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number {LINES | ROWS}]
    [(col_name_or_user_var,...)]
    [SET col_name = expr,...]

The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. The file name must be given as a literal string.

LOAD DATA INFILE is the complement of SELECT ... INTO OUTFILE. (See Section 13.2.9.1, "SELECT ... INTO Syntax".) To write data from a table to a file, use SELECT ... INTO OUTFILE. To read the file back into a table, use LOAD DATA INFILE. The syntax of the FIELDS and LINES clauses is the same for both statements. Both clauses are optional, but FIELDS must precede LINES if both are specified.

For more information about the efficiency of INSERT versus LOAD DATA INFILE and speeding up LOAD DATA INFILE, see Section 8.2.2.1, "Speed of INSERT Statements".

The character set indicated by the character_set_database system variable is used to interpret the information in the file. SET NAMES and the setting of character_set_client do not affect interpretation of input. If the contents of the input file use a character set that differs from the default, it is usually preferable to specify the character set of the file by using the CHARACTER SET clause. A character set of binary specifies "no conversion."

LOAD DATA INFILE interprets all fields in the file as having the same character set, regardless of the data types of the columns into which field values are loaded. For proper interpretation of file contents, you must ensure that it was written with the correct character set. For example, if you write a data file with mysqldump -T or by issuing a SELECT ... INTO OUTFILE statement in mysql, be sure to use a --default-character-set option with mysqldump or mysql so that output is written in the character set to be used when the file is loaded with LOAD DATA INFILE.

Note

It is not possible to load data files that use the ucs2, utf16, or utf32 character set.

The character_set_filesystem system variable controls the interpretation of the file name.

You can also load data files by using the mysqlimport utility; it operates by sending a LOAD DATA INFILE statement to the server. The --local option causes mysqlimport to read data files from the client host. You can specify the --compress option to get better performance over slow networks if the client and server support the compressed protocol. See Section 4.5.5, "mysqlimport - A Data Import Program".

If you use LOW_PRIORITY, execution of the LOAD DATA statement is delayed until no other clients are reading from the table. This affects only storage engines that use only table-level locking (such as MyISAM, MEMORY, and MERGE).

If you specify CONCURRENT with a MyISAM table that satisfies the condition for concurrent inserts (that is, it contains no free blocks in the middle), other threads can retrieve data from the table while LOAD DATA is executing. Using this option affects the performance of LOAD DATA a bit, even if no other thread is using the table at the same time.

Prior to MySQL 5.5.1, CONCURRENT was not replicated when using statement-based replication (see Bug #34628). However, it is replicated when using row-based replication, regardless of the version. See Section 16.4.1.15, "Replication and LOAD DATA INFILE", for more information.

The LOCAL keyword, if specified, is interpreted with respect to the client end of the connection:

  • If LOCAL is specified, the file is read by the client program on the client host and sent to the server. The file can be given as a full path name to specify its exact location. If given as a relative path name, the name is interpreted relative to the directory in which the client program was started.

    When using LOCAL with LOAD DATA, a copy of the file is created in the server's temporary directory. This is not the directory determined by the value of tmpdir or slave_load_tmpdir, but rather the operating system's temporary directory, and is not configurable in the MySQL Server. (Typically the system temporary directory is /tmp on Linux systems and C:\WINDOWS\TEMP on Windows.) Lack of sufficient space for the copy in this directory can cause the LOAD DATA LOCAL statement to fail.

  • If LOCAL is not specified, the file must be located on the server host and is read directly by the server. The server uses the following rules to locate the file:

    • If the file name is an absolute path name, the server uses it as given.

    • If the file name is a relative path name with one or more leading components, the server searches for the file relative to the server's data directory.

    • If a file name with no leading components is given, the server looks for the file in the database directory of the default database.

Note that, in the non-LOCAL case, these rules mean that a file named as ./myfile.txt is read from the server's data directory, whereas the file named as myfile.txt is read from the database directory of the default database. For example, if db1 is the default database, the following LOAD DATA statement reads the file data.txt from the database directory for db1, even though the statement explicitly loads the file into a table in the db2 database:

LOAD DATA INFILE 'data.txt' INTO TABLE db2.my_table;
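
The three lookup rules for the non-LOCAL case can be summarized as a small function. This Python sketch is illustrative only: locate_server_file is a hypothetical helper, paths are Unix-style, the data directory /mysql/data is just an example value, and the sketch assumes that a database directory is the subdirectory of the data directory named after the database.

```python
import posixpath


def locate_server_file(file_name, data_dir, default_db):
    """Return the path a non-LOCAL load would read, per the rules above."""
    if posixpath.isabs(file_name):
        return file_name  # absolute path: used as given
    if "/" in file_name:
        # Leading components (including "./"): relative to the data directory.
        return posixpath.normpath(posixpath.join(data_dir, file_name))
    # Bare file name: database directory of the default database (assumed
    # here to be data_dir/default_db).
    return posixpath.join(data_dir, default_db, file_name)


for name in ("/tmp/data.txt", "./myfile.txt", "myfile.txt"):
    print(name, "->", locate_server_file(name, "/mysql/data", "db1"))
```

Note that the target table plays no part in the lookup; only the default database matters for a bare file name.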

Windows path names are specified using forward slashes rather than backslashes. If you do use backslashes, you must double them.

For security reasons, when reading text files located on the server, the files must either reside in the database directory or be readable by all. Also, to use LOAD DATA INFILE on server files, you must have the FILE privilege. See Section 6.2.1, "Privileges Provided by MySQL". For non-LOCAL load operations, if the secure_file_priv system variable is set to a nonempty directory name, the file to be loaded must be located in that directory.

Using LOCAL is a bit slower than letting the server access the files directly, because the contents of the file must be sent over the connection by the client to the server. On the other hand, you do not need the FILE privilege to load local files.

With LOCAL, the default duplicate-key handling behavior is the same as if IGNORE is specified; this is because the server has no way to stop transmission of the file in the middle of the operation. IGNORE is explained further later in this section.

LOCAL works only if your server and your client both have been configured to permit it. For example, if mysqld was started with --local-infile=0, LOCAL does not work. See Section 6.1.6, "Security Issues with LOAD DATA LOCAL".

On Unix, if you need LOAD DATA to read from a pipe, you can use the following technique (the example loads a listing of the / directory into the table db1.t1):

mkfifo /mysql/data/db1/ls.dat
chmod 666 /mysql/data/db1/ls.dat
find / -ls > /mysql/data/db1/ls.dat &
mysql -e "LOAD DATA INFILE 'ls.dat' INTO TABLE t1" db1

Note that you must run the command that generates the data to be loaded and the mysql commands either on separate terminals, or run the data generation process in the background (as shown in the preceding example). If you do not do this, the pipe will block until data is read by the mysql process.

The REPLACE and IGNORE keywords control handling of input rows that duplicate existing rows on unique key values:

  • If you specify REPLACE, input rows replace existing rows that have the same value for a primary key or unique index. See Section 13.2.8, "REPLACE Syntax".

  • If you specify IGNORE, input rows that duplicate an existing row on a unique key value are skipped. If you do not specify either option, the behavior depends on whether the LOCAL keyword is specified. Without LOCAL, an error occurs when a duplicate key value is found, and the rest of the text file is ignored. With LOCAL, the default behavior is the same as if IGNORE is specified; this is because the server has no way to stop transmission of the file in the middle of the operation.
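
The difference between the two keywords can be sketched with a table modeled as a Python dictionary keyed on the unique column. The helper load_rows is hypothetical and simplified: it works on a copy of the table and raises an exception for the non-LOCAL default case, so it says nothing about what the server leaves in the table when a load stops with an error.

```python
def load_rows(table, input_rows, key, on_duplicate=None):
    """Return a new table dict after loading rows keyed by `key`.

    on_duplicate: "replace", "ignore", or None (error on first duplicate).
    """
    result = dict(table)  # work on a copy; the original is left unchanged
    for row in input_rows:
        k = row[key]
        if k in result:
            if on_duplicate == "ignore":
                continue  # skip the duplicate input row
            if on_duplicate != "replace":
                # Neither keyword (non-LOCAL case): stop at the first duplicate.
                raise ValueError(f"duplicate key {k!r}")
        result[k] = row  # new key, or REPLACE overwriting the existing row
    return result


existing = {1: {"id": 1, "name": "old"}}
incoming = [{"id": 1, "name": "new"}, {"id": 2, "name": "two"}]
print(load_rows(existing, incoming, "id", "replace"))
print(load_rows(existing, incoming, "id", "ignore"))
```

With "replace" the row for id 1 takes the new name; with "ignore" it keeps the old one. In both cases the nonconflicting row for id 2 is loaded.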

If you want to ignore foreign key constraints during the load operation, you can issue a SET foreign_key_checks = 0 statement before executing LOAD DATA.

If you use LOAD DATA INFILE on an empty MyISAM table, all nonunique indexes are created in a separate batch (as for REPAIR TABLE). Normally, this makes LOAD DATA INFILE much faster when you have many indexes. In some extreme cases, you can create the indexes even faster by turning them off with ALTER TABLE ... DISABLE KEYS before loading the file into the table and using ALTER TABLE ... ENABLE KEYS to re-create the indexes after loading the file. See Section 8.2.2.1, "Speed of INSERT Statements".

For both the LOAD DATA INFILE and SELECT ... INTO OUTFILE statements, the syntax of the FIELDS and LINES clauses is the same. Both clauses are optional, but FIELDS must precede LINES if both are specified.

If you specify a FIELDS clause, each of its subclauses (TERMINATED BY, [OPTIONALLY] ENCLOSED BY, and ESCAPED BY) is also optional, except that you must specify at least one of them.

If you specify no FIELDS or LINES clause, the defaults are the same as if you had written this:

FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
LINES TERMINATED BY '\n' STARTING BY ''

(Backslash is the MySQL escape character within strings in SQL statements, so to specify a literal backslash, you must specify two backslashes for the value to be interpreted as a single backslash. The escape sequences '\t' and '\n' specify tab and newline characters, respectively.)

In other words, the defaults cause LOAD DATA INFILE to act as follows when reading input:

  • Look for line boundaries at newlines.

  • Do not skip over any line prefix.

  • Break lines into fields at tabs.

  • Do not expect fields to be enclosed within any quoting characters.

  • Interpret characters preceded by the escape character "\" as escape sequences. For example, "\t", "\n", and "\\" signify tab, newline, and backslash, respectively. See the discussion of FIELDS ESCAPED BY later for the full list of escape sequences.

Conversely, the defaults cause SELECT ... INTO OUTFILE to act as follows when writing output:

  • Write tabs between fields.

  • Do not enclose fields within any quoting characters.

  • Use "\" to escape instances of tab, newline, or "\" that occur within field values.

  • Write newlines at the ends of lines.

Note

If you have generated the text file on a Windows system, you might have to use LINES TERMINATED BY '\r\n' to read the file properly, because Windows programs typically use two characters as a line terminator. Some programs, such as WordPad, might use \r as a line terminator when writing files. To read such files, use LINES TERMINATED BY '\r'.

If all the lines you want to read in have a common prefix that you want to ignore, you can use LINES STARTING BY 'prefix_string' to skip over the prefix, and anything before it. If a line does not include the prefix, the entire line is skipped. Suppose that you issue the following statement:

LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test
  FIELDS TERMINATED BY ','
  LINES STARTING BY 'xxx';

If the data file looks like this:

xxx"abc",1
something xxx"def",2
"ghi",3

The resulting rows will be ("abc",1) and ("def",2). The third row in the file is skipped because it does not contain the prefix.
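
The prefix rule can be reproduced with a short Python sketch. It is a model of the behavior described above rather than the server's parser: apply_lines_starting_by is a hypothetical helper, the input is assumed to be already split into lines, and quoting and escaping are ignored.

```python
def apply_lines_starting_by(lines, prefix, field_terminator=","):
    """Yield field lists for lines containing `prefix` (hypothetical helper)."""
    for line in lines:
        position = line.find(prefix)
        if position == -1:
            continue  # no prefix anywhere in the line: skip the whole line
        # Drop the prefix and everything before it, then split into fields.
        remainder = line[position + len(prefix):]
        yield remainder.split(field_terminator)


data = ['xxx"abc",1', 'something xxx"def",2', '"ghi",3']
rows = list(apply_lines_starting_by(data, "xxx"))
print(rows)
```

Applied to the three lines of the sample file, the sketch yields two rows and drops the third line.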

The IGNORE number LINES option can be used to ignore lines at the start of the file. For example, you can use IGNORE 1 LINES to skip over an initial header line containing column names:

LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test IGNORE 1 LINES;

When you use SELECT ... INTO OUTFILE in tandem with LOAD DATA INFILE to write data from a database into a file and then read the file back into the database later, the field- and line-handling options for both statements must match. Otherwise, LOAD DATA INFILE will not interpret the contents of the file properly. Suppose that you use SELECT ... INTO OUTFILE to write a file with fields delimited by commas:

SELECT * INTO OUTFILE 'data.txt'
  FIELDS TERMINATED BY ','
  FROM table2;

To read the comma-delimited file back in, the correct statement would be:

LOAD DATA INFILE 'data.txt' INTO TABLE table2
  FIELDS TERMINATED BY ',';

If instead you tried to read in the file with the statement shown following, it wouldn't work because it instructs LOAD DATA INFILE to look for tabs between fields:

LOAD DATA INFILE 'data.txt' INTO TABLE table2
  FIELDS TERMINATED BY '\t';

The likely result is that each input line would be interpreted as a single field.

LOAD DATA INFILE can be used to read files obtained from external sources. For example, many programs can export data in comma-separated values (CSV) format, such that lines have fields separated by commas and enclosed within double quotation marks, with an initial line of column names. If the lines in such a file are terminated by carriage return/newline pairs, the statement shown here illustrates the field- and line-handling options you would use to load the file:

LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\r\n'
  IGNORE 1 LINES;

If the input values are not necessarily enclosed within quotation marks, use OPTIONALLY before the ENCLOSED BY keywords.

Any of the field- or line-handling options can specify an empty string (''). If not empty, the FIELDS [OPTIONALLY] ENCLOSED BY and FIELDS ESCAPED BY values must be a single character. The FIELDS TERMINATED BY, LINES STARTING BY, and LINES TERMINATED BY values can be more than one character. For example, to write lines that are terminated by carriage return/linefeed pairs, or to read a file containing such lines, specify a LINES TERMINATED BY '\r\n' clause.

To read a file containing jokes that are separated by lines consisting of %%, you can do this:

CREATE TABLE jokes
  (a INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  joke TEXT NOT NULL);
LOAD DATA INFILE '/tmp/jokes.txt' INTO TABLE jokes
  FIELDS TERMINATED BY ''
  LINES TERMINATED BY '\n%%\n' (joke);

FIELDS [OPTIONALLY] ENCLOSED BY controls quoting of fields. For output (SELECT ... INTO OUTFILE), if you omit the word OPTIONALLY, all fields are enclosed by the ENCLOSED BY character. An example of such output (using a comma as the field delimiter) is shown here:

"1","a string","100.20"
"2","a string containing a , comma","102.20"
"3","a string containing a \" quote","102.20"
"4","a string containing a \", quote and comma","102.20"

If you specify OPTIONALLY, the ENCLOSED BY character is used only to enclose values from columns that have a string data type (such as CHAR, BINARY, TEXT, or ENUM):

1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a \" quote",102.20
4,"a string containing a \", quote and comma",102.20

Note that occurrences of the ENCLOSED BY character within a field value are escaped by prefixing them with the ESCAPED BY character. Also note that if you specify an empty ESCAPED BY value, it is possible to inadvertently generate output that cannot be read properly by LOAD DATA INFILE. For example, the preceding output just shown would appear as follows if the escape character is empty. Observe that the second field in the fourth line contains a comma following the quote, which (erroneously) appears to terminate the field:

1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a " quote",102.20
4,"a string containing a ", quote and comma",102.20
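
The three output variants shown above (all fields enclosed, OPTIONALLY, and OPTIONALLY with an empty escape character) can be reproduced with a small Python model. This is a sketch under stated assumptions, not the server's output routine: format_output_row is a hypothetical helper, the field terminator is fixed to a comma, Python str values stand in for string columns, and only the escape and enclosing characters are escaped.

```python
from decimal import Decimal


def format_output_row(values, enclosed_by='"', escaped_by="\\", optionally=False):
    """Format one row with ',' as the field terminator (hypothetical helper)."""
    fields = []
    for value in values:
        text = str(value)
        if escaped_by:
            # Escape the escape character first, then the enclosing character.
            text = text.replace(escaped_by, escaped_by * 2)
            if enclosed_by:
                text = text.replace(enclosed_by, escaped_by + enclosed_by)
        # With OPTIONALLY, only string values are enclosed.
        if not optionally or isinstance(value, str):
            text = enclosed_by + text + enclosed_by
        fields.append(text)
    return ",".join(fields)


row = [4, 'a string containing a ", quote and comma', Decimal("102.20")]
print(format_output_row(row))
print(format_output_row(row, optionally=True))
print(format_output_row(row, optionally=True, escaped_by=""))
```

The last call shows the hazard: with an empty escape character the embedded quotation mark is written unescaped, so the comma after it looks like a field terminator to a reader of the file.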

For input, the ENCLOSED BY character, if present, is stripped from the ends of field values. (This is true regardless of whether OPTIONALLY is specified; OPTIONALLY has no effect on input interpretation.) Occurrences of the ENCLOSED BY character preceded by the ESCAPED BY character are interpreted as part of the current field value.
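As a minimal sketch of the input rule just described, the following statement loads a comma-separated file in which only some values are quoted. The file /tmp/quotes.csv and the table quotes (with columns id and phrase) are hypothetical names chosen for illustration.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE quotes (
  id INT NOT NULL PRIMARY KEY,
  phrase VARCHAR(100)
);

-- Assumed contents of /tmp/quotes.csv (one record per line):
--   1,"a string"
--   2,another string
-- By the rule above, the quotation marks around the first value are
-- stripped on input, so both rows should store plain text in phrase.
LOAD DATA INFILE '/tmp/quotes.csv' INTO TABLE quotes
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n';
```

Writing OPTIONALLY ENCLOSED BY '"' instead should make no difference here, because OPTIONALLY has no effect on input interpretation.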

If the field begins with the ENCLOSED BY character, instances of that character are recognized as terminating a field value only if followed by the field or line TERMINATED BY sequence. To avoid ambiguity, occurrences of the ENCLOSED BY character within a field value can be doubled and are interpreted as a single instance of the character. For example, if ENCLOSED BY '"' is specified, quotation marks are handled as shown here:

"The ""BIG"" boss"  -> The "BIG" boss
The "BIG" boss      -> The "BIG" boss
The ""BIG"" boss    -> The ""BIG"" boss

FIELDS ESCAPED BY controls how to read or write special characters:

  • For input, if the FIELDS ESCAPED BY character is not empty, occurrences of that character are stripped and the following character is taken literally as part of a field value. Some two-character sequences are exceptions, where the first character is the escape character. These sequences are shown in the following table (using "\" for the escape character). The rules for NULL handling are described later in this section.

    Character   Escape Sequence
    \0          An ASCII NUL (0x00) character
    \b          A backspace character
    \n          A newline (linefeed) character
    \r          A carriage return character
    \t          A tab character
    \Z          ASCII 26 (Control+Z)
    \N          NULL

    For more information about "\"-escape syntax, see Section 9.1.1, "String Literals".

    If the FIELDS ESCAPED BY character is empty, escape-sequence interpretation does not occur.

  • For output, if the FIELDS ESCAPED BY character is not empty, it is used to prefix the following characters on output:

    • The FIELDS ESCAPED BY character

    • The FIELDS [OPTIONALLY] ENCLOSED BY character

    • The first character of the FIELDS TERMINATED BY and LINES TERMINATED BY values

    • ASCII 0 (what is actually written following the escape character is ASCII "0", not a zero-valued byte)

    If the FIELDS ESCAPED BY character is empty, no characters are escaped and NULL is output as NULL, not \N. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.
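The output rules above can be sketched with a SELECT ... INTO OUTFILE statement that spells out every option, including the escape character. The table quotes, its columns, and the output path are hypothetical and serve only as an illustration.

```sql
-- Hypothetical table name and output path. The server must be permitted
-- to write the file, and the file must not already exist.
SELECT id, phrase
  INTO OUTFILE '/tmp/quotes_out.csv'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
  LINES TERMINATED BY '\n'
  FROM quotes;
```

Keeping a nonempty ESCAPED BY character means that quotation marks, commas, and newlines inside phrase values are prefixed with a backslash on output, so the file can be read back with the same FIELDS and LINES options.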

In certain cases, field- and line-handling options interact:

  • If LINES TERMINATED BY is an empty string and FIELDS TERMINATED BY is nonempty, lines are also terminated with FIELDS TERMINATED BY.

  • If the FIELDS TERMINATED BY and FIELDS ENCLOSED BY values are both empty (''), a fixed-row (nondelimited) format is used. With fixed-row format, no delimiters are used between fields (but you can still have a line terminator). Instead, column values are read and written using a field width wide enough to hold all values in the field. For TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT, the field widths are 4, 6, 8, 11, and 20, respectively, no matter what the declared display width is.

    LINES TERMINATED BY is still used to separate lines. If a line does not contain all fields, the rest of the columns are set to their default values. If you do not have a line terminator, you should set this to ''. In this case, the text file must contain all fields for each row.

    Fixed-row format also affects handling of NULL values, as described later. Note that fixed-size format does not work if you are using a multi-byte character set.
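A minimal sketch of a fixed-row load follows. The table fixed_demo and the file path are hypothetical. With two INT columns, each field is expected to occupy 11 characters, as stated in the list above.

```sql
-- Hypothetical table: two INT columns, so each field is 11 characters wide.
CREATE TABLE fixed_demo (a INT, b INT);

-- Both FIELDS values are empty, which selects fixed-row format.
-- Lines are still separated by a newline.
LOAD DATA INFILE '/tmp/fixed_demo.txt' INTO TABLE fixed_demo
  FIELDS TERMINATED BY '' ENCLOSED BY ''
  LINES TERMINATED BY '\n';
```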

Handling of NULL values varies according to the FIELDS and LINES options in use:

  • For the default FIELDS and LINES values, NULL is written as a field value of \N for output, and a field value of \N is read as NULL for input (assuming that the ESCAPED BY character is "\").

  • If FIELDS ENCLOSED BY is not empty, a field containing the literal word NULL as its value is read as a NULL value. This differs from the word NULL enclosed within FIELDS ENCLOSED BY characters, which is read as the string 'NULL'.

  • If FIELDS ESCAPED BY is empty, NULL is written as the word NULL.

  • With fixed-row format (which is used when FIELDS TERMINATED BY and FIELDS ENCLOSED BY are both empty), NULL is written as an empty string. Note that this causes both NULL values and empty strings in the table to be indistinguishable when written to the file because both are written as empty strings. If you need to be able to tell the two apart when reading the file back in, you should not use fixed-row format.

An attempt to load NULL into a NOT NULL column causes assignment of the implicit default value for the column's data type and a warning, or an error in strict SQL mode. Implicit default values are discussed in Section 11.5, "Data Type Default Values".

Some cases are not supported by LOAD DATA INFILE:

  • Fixed-size rows (FIELDS TERMINATED BY and FIELDS ENCLOSED BY both empty) and BLOB or TEXT columns.

  • If you specify one separator that is the same as or a prefix of another, LOAD DATA INFILE cannot interpret the input properly. For example, the following FIELDS clause would cause problems:

    FIELDS TERMINATED BY '"' ENCLOSED BY '"'
  • If FIELDS ESCAPED BY is empty, a field value that contains an occurrence of FIELDS ENCLOSED BY or LINES TERMINATED BY followed by the FIELDS TERMINATED BY value causes LOAD DATA INFILE to stop reading a field or line too early. This happens because LOAD DATA INFILE cannot properly determine where the field or line value ends.

The following example loads all columns of the persondata table:

LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata;

By default, when no column list is provided at the end of the LOAD DATA INFILE statement, input lines are expected to contain a field for each table column. If you want to load only some of a table's columns, specify a column list:

LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata (col1,col2,...);

You must also specify a column list if the order of the fields in the input file differs from the order of the columns in the table. Otherwise, MySQL cannot tell how to match input fields with table columns.

The column list can contain either column names or user variables. With user variables, the SET clause enables you to perform transformations on their values before assigning the result to columns.

User variables in the SET clause can be used in several ways. The following example uses the first input column directly for the value of t1.column1, and assigns the second input column to a user variable that is subjected to a division operation before being used for the value of t1.column2:

LOAD DATA INFILE 'file.txt'
  INTO TABLE t1
  (column1, @var1)
  SET column2 = @var1/100;

The SET clause can be used to supply values not derived from the input file. The following statement sets column3 to the current date and time:

LOAD DATA INFILE 'file.txt'
  INTO TABLE t1
  (column1, column2)
  SET column3 = CURRENT_TIMESTAMP;

You can also discard an input value by assigning it to a user variable and not assigning the variable to a table column:

LOAD DATA INFILE 'file.txt'
  INTO TABLE t1
  (column1, @dummy, column2, @dummy, column3);

Use of the column/variable list and SET clause is subject to the following restrictions:

  • Assignments in the SET clause should have only column names on the left hand side of assignment operators.

  • You can use subqueries in the right hand side of SET assignments. A subquery that returns a value to be assigned to a column may be a scalar subquery only. Also, you cannot use a subquery to select from the table that is being loaded.

  • Lines ignored by an IGNORE clause are not processed for the column/variable list or SET clause.

  • User variables cannot be used when loading data with fixed-row format because user variables do not have a display width.
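As a sketch of the subquery restriction in the list above, the following statement reads a department name into a user variable and uses a scalar subquery to look up its identifier. The tables employees and departments, their columns, and the file path are hypothetical. The example assumes that departments.name is unique, so that the subquery returns at most one row.

```sql
-- Hypothetical tables: employees (name, dept_id) is the table being loaded;
-- departments (id, name) is a separate lookup table.
LOAD DATA INFILE '/tmp/employees.txt' INTO TABLE employees
  (name, @dept_name)
  SET dept_id = (SELECT id FROM departments WHERE name = @dept_name);
```

The subquery selects from departments rather than from employees, because a subquery cannot select from the table that is being loaded.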

When processing an input line, LOAD DATA splits it into fields and uses the values according to the column/variable list and the SET clause, if they are present. Then the resulting row is inserted into the table. If there are BEFORE INSERT or AFTER INSERT triggers for the table, they are activated before or after inserting the row, respectively.

If an input line has too many fields, the extra fields are ignored and the number of warnings is incremented.

If an input line has too few fields, the table columns for which input fields are missing are set to their default values. Default value assignment is described in Section 11.5, "Data Type Default Values".

An empty field value is interpreted differently from a missing field:

  • For string types, the column is set to the empty string.

  • For numeric types, the column is set to 0.

  • For date and time types, the column is set to the appropriate "zero" value for the type. See Section 11.3, "Date and Time Types".

These are the same values that result if you assign an empty string explicitly to a string, numeric, or date or time type in an INSERT or UPDATE statement.

TIMESTAMP columns are set to the current date and time only if there is a NULL value for the column (that is, \N) and the column is not declared to permit NULL values, or if the TIMESTAMP column's default value is the current timestamp and it is omitted from the field list when a field list is specified.

LOAD DATA INFILE regards all input as strings, so you cannot use numeric values for ENUM or SET columns the way you can with INSERT statements. All ENUM and SET values must be specified as strings.

BIT values cannot be loaded using binary notation (for example, b'011010'). To work around this, specify the values as regular integers and use the SET clause to convert them so that MySQL performs a numeric type conversion and loads them into the BIT column properly:

shell> cat /tmp/bit_test.txt
2
127
shell> mysql test
mysql> LOAD DATA INFILE '/tmp/bit_test.txt'
    -> INTO TABLE bit_test (@var1) SET b= CAST(@var1 AS UNSIGNED);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

mysql> SELECT BIN(b+0) FROM bit_test;
+----------+
| bin(b+0) |
+----------+
| 10       |
| 1111111  |
+----------+
2 rows in set (0.00 sec)

When the LOAD DATA INFILE statement finishes, it returns an information string in the following format:

Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

If you are using the C API, you can get information about the statement by calling the mysql_info() function. See Section 22.8.3.35, "mysql_info()".

Warnings occur under the same circumstances as when values are inserted using the INSERT statement (see Section 13.2.5, "INSERT Syntax"), except that LOAD DATA INFILE also generates warnings when there are too few or too many fields in the input row. The warnings are not stored anywhere; the number of warnings can be used only as an indication of whether everything went well.

You can use SHOW WARNINGS to get a list of the first max_error_count warnings as information about what went wrong. See Section 13.7.5.41, "SHOW WARNINGS Syntax".
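For instance, a load can be followed immediately by the diagnostic statements. This is only a sketch that reuses the persondata example from earlier in this section.

```sql
LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata;

-- Number of warnings generated by the preceding statement.
SHOW COUNT(*) WARNINGS;

-- Details for up to max_error_count of those warnings.
SHOW WARNINGS;
```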

For partitioned tables using storage engines that employ table locks, such as MyISAM, LOAD DATA locks all partitions of the table. This does not apply to tables using storage engines that employ row-level locking, such as InnoDB. For more information, see Section 18.5.4, "Partitioning and Table-Level Locking".

13.2.7. LOAD XML Syntax

LOAD XML [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name'
    [REPLACE | IGNORE]
    INTO TABLE [db_name.]tbl_name
    [CHARACTER SET charset_name]
    [ROWS IDENTIFIED BY '<tagname>']
    [IGNORE number {LINES | ROWS}]
    [(column_or_user_var,...)]
    [SET col_name = expr,...]

The LOAD XML statement reads data from an XML file into a table. The file_name must be given as a literal string. The tagname in the optional ROWS IDENTIFIED BY clause must also be given as a literal string, and must be surrounded by angle brackets (< and >).

LOAD XML acts as the complement of running the mysql client in XML output mode (that is, starting the client with the --xml option). To write data from a table to an XML file, use a command such as the following one from the system shell:

shell> mysql --xml -e 'SELECT * FROM mytable' > file.xml

To read the file back into a table, use LOAD XML INFILE. By default, the <row> element is considered to be the equivalent of a database table row; this can be changed using the ROWS IDENTIFIED BY clause.

This statement supports three different XML formats:

  • Column names as attributes and column values as attribute values:

    <row column1="value1" column2="value2" .../>
  • Column names as tags and column values as the content of these tags:

    <row>
      <column1>value1</column1>
      <column2>value2</column2>
    </row>
  • Column names are the name attributes of <field> tags, and values are the contents of these tags:

    <row>
      <field name='column1'>value1</field>
      <field name='column2'>value2</field>
    </row>

    This is the format used by other MySQL tools, such as mysqldump.

All three formats can be used in the same XML file; the import routine automatically detects the format for each row and interprets it correctly. Tags are matched based on the tag or attribute name and the column name.

The following clauses work essentially the same way for LOAD XML as they do for LOAD DATA:

  • LOW_PRIORITY or CONCURRENT

  • LOCAL

  • REPLACE or IGNORE

  • CHARACTER SET

  • (column_or_user_var,...)

  • SET

See Section 13.2.6, "LOAD DATA INFILE Syntax", for more information about these clauses.

The IGNORE number LINES or IGNORE number ROWS clause causes the first number rows in the XML file to be skipped. It is analogous to the LOAD DATA statement's IGNORE ... LINES clause.
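A brief sketch of the clause in use follows. The file items.xml, the table items, and the <item> element name are hypothetical.

```sql
-- Skip the first row of the hypothetical file, then import the rest.
LOAD XML LOCAL INFILE 'items.xml'
  INTO TABLE items
  ROWS IDENTIFIED BY '<item>'
  IGNORE 1 ROWS;
```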

To illustrate how this statement is used, suppose that we have a table created as follows:

USE test;

CREATE TABLE person (
    person_id INT NOT NULL PRIMARY KEY,
    fname VARCHAR(40) NULL,
    lname VARCHAR(40) NULL,
    created TIMESTAMP
);

Suppose further that this table is initially empty.

Now suppose that we have a simple XML file person.xml, whose contents are as shown here:

<?xml version="1.0"?>
<list>
  <person person_id="1" fname="Pekka" lname="Nousiainen"/>
  <person person_id="2" fname="Jonas" lname="Oreland"/>
  <person person_id="3"><fname>Mikael</fname><lname>Ronström</lname></person>
  <person person_id="4"><fname>Lars</fname><lname>Thalmann</lname></person>
  <person><field name="person_id">5</field><field name="fname">Tomas</field>
  <field name="lname">Ulin</field></person>
  <person><field name="person_id">6</field><field name="fname">Martin</field>
  <field name="lname">Sköld</field></person>
</list>

Each of the permissible XML formats discussed previously is represented in this example file.

To import the data in person.xml into the person table, you can use this statement:

mysql> LOAD XML LOCAL INFILE 'person.xml'
    ->   INTO TABLE person
    ->   ROWS IDENTIFIED BY '<person>';

Query OK, 6 rows affected (0.00 sec)
Records: 6  Deleted: 0  Skipped: 0  Warnings: 0

Here, we assume that person.xml is located in the MySQL data directory. If the file cannot be found, the following error results:

ERROR 2 (HY000): File '/person.xml' not found (Errcode: 2)

The ROWS IDENTIFIED BY '<person>' clause means that each <person> element in the XML file is considered equivalent to a row in the table into which the data is to be imported. In this case, this is the person table in the test database.

As can be seen by the response from the server, 6 rows were imported into the test.person table. This can be verified by a simple SELECT statement:

mysql> SELECT * FROM person;
+-----------+--------+------------+---------------------+
| person_id | fname  | lname      | created             |
+-----------+--------+------------+---------------------+
|         1 | Pekka  | Nousiainen | 2007-07-13 16:18:47 |
|         2 | Jonas  | Oreland    | 2007-07-13 16:18:47 |
|         3 | Mikael | Ronström   | 2007-07-13 16:18:47 |
|         4 | Lars   | Thalmann   | 2007-07-13 16:18:47 |
|         5 | Tomas  | Ulin       | 2007-07-13 16:18:47 |
|         6 | Martin | Sköld      | 2007-07-13 16:18:47 |
+-----------+--------+------------+---------------------+
6 rows in set (0.00 sec)

This shows, as stated earlier in this section, that any or all of the three permitted XML formats may appear in a single file and be read in using LOAD XML.

The inverse of the above operation (that is, dumping MySQL table data into an XML file) can be accomplished using the mysql client from the system shell, as shown here:

Note

The --xml option causes the mysql client to use XML formatting for its output; the -e option causes the client to execute the SQL statement immediately following the option.

shell> mysql --xml -e "SELECT * FROM test.person" > person-dump.xml
shell> cat person-dump.xml
<?xml version="1.0"?>

<resultset statement="SELECT * FROM test.person" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <row>
    <field name="person_id">1</field>
    <field name="fname">Pekka</field>
    <field name="lname">Nousiainen</field>
    <field name="created">2007-07-13 16:18:47</field>
  </row>

  <row>
    <field name="person_id">2</field>
    <field name="fname">Jonas</field>
    <field name="lname">Oreland</field>
    <field name="created">2007-07-13 16:18:47</field>
  </row>

  <row>
    <field name="person_id">3</field>
    <field name="fname">Mikael</field>
    <field name="lname">Ronström</field>
    <field name="created">2007-07-13 16:18:47</field>
  </row>

  <row>
    <field name="person_id">4</field>
    <field name="fname">Lars</field>
    <field name="lname">Thalmann</field>
    <field name="created">2007-07-13 16:18:47</field>
  </row>

  <row>
    <field name="person_id">5</field>
    <field name="fname">Tomas</field>
    <field name="lname">Ulin</field>
    <field name="created">2007-07-13 16:18:47</field>
  </row>

  <row>
    <field name="person_id">6</field>
    <field name="fname">Martin</field>
    <field name="lname">Sköld</field>
    <field name="created">2007-07-13 16:18:47</field>
  </row>
</resultset>

You can verify that the dump is valid by creating a copy of the person table and then importing the dump file into the new table, like this:

mysql> USE test;
mysql> CREATE TABLE person2 LIKE person;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD XML LOCAL INFILE 'person-dump.xml'
    ->   INTO TABLE person2;
Query OK, 6 rows affected (0.01 sec)
Records: 6  Deleted: 0  Skipped: 0  Warnings: 0

mysql> SELECT * FROM person2;
+-----------+--------+------------+---------------------+
| person_id | fname  | lname      | created             |
+-----------+--------+------------+---------------------+
|         1 | Pekka  | Nousiainen | 2007-07-13 16:18:47 |
|         2 | Jonas  | Oreland    | 2007-07-13 16:18:47 |
|         3 | Mikael | Ronström   | 2007-07-13 16:18:47 |
|         4 | Lars   | Thalmann   | 2007-07-13 16:18:47 |
|         5 | Tomas  | Ulin       | 2007-07-13 16:18:47 |
|         6 | Martin | Sköld      | 2007-07-13 16:18:47 |
+-----------+--------+------------+---------------------+
6 rows in set (0.00 sec)

Using a ROWS IDENTIFIED BY '<tagname>' clause, it is possible to import data from the same XML file into database tables with different definitions. For this example, suppose that you have a file named address.xml which contains the following XML:

<?xml version="1.0"?>

<list>
  <person person_id="1">
    <fname>Robert</fname>
    <lname>Jones</lname>
    <address address_id="1" street="Mill Creek Road" zip="45365" city="Sidney"/>
    <address address_id="2" street="Main Street" zip="28681" city="Taylorsville"/>
  </person>

  <person person_id="2">
    <fname>Mary</fname>
    <lname>Smith</lname>
    <address address_id="3" street="River Road" zip="80239" city="Denver"/>
    <!-- <address address_id="4" street="North Street" zip="37920" city="Knoxville"/> -->
  </person>
</list>

You can again use the test.person table as defined previously in this section, after clearing all the existing records from the table and then showing its structure as shown here:

mysql> TRUNCATE person;
Query OK, 0 rows affected (0.04 sec)

mysql> SHOW CREATE TABLE person\G
*************************** 1. row ***************************
       Table: person
Create Table: CREATE TABLE `person` (
  `person_id` int(11) NOT NULL,
  `fname` varchar(40) DEFAULT NULL,
  `lname` varchar(40) DEFAULT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`person_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Now create an address table in the test database using the following CREATE TABLE statement:

CREATE TABLE address (
    address_id INT NOT NULL PRIMARY KEY,
    person_id INT NULL,
    street VARCHAR(40) NULL,
    zip INT NULL,
    city VARCHAR(40) NULL,
    created TIMESTAMP
);

To import the data from the XML file into the person table, execute the following LOAD XML statement, which specifies that rows are to be identified by the <person> element, as shown here:

mysql> LOAD XML LOCAL INFILE 'address.xml'
    ->   INTO TABLE person
    ->   ROWS IDENTIFIED BY '<person>';
Query OK, 2 rows affected (0.00 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

You can verify that the records were imported using a SELECT statement:

mysql> SELECT * FROM person;
+-----------+--------+-------+---------------------+
| person_id | fname  | lname | created             |
+-----------+--------+-------+---------------------+
|         1 | Robert | Jones | 2007-07-24 17:37:06 |
|         2 | Mary   | Smith | 2007-07-24 17:37:06 |
+-----------+--------+-------+---------------------+
2 rows in set (0.00 sec)

Since the <address> elements in the XML file have no corresponding columns in the person table, they are skipped.

To import the data from the <address> elements into the address table, use the LOAD XML statement shown here:

mysql> LOAD XML LOCAL INFILE 'address.xml'
    ->   INTO TABLE address
    ->   ROWS IDENTIFIED BY '<address>';
Query OK, 3 rows affected (0.00 sec)
Records: 3  Deleted: 0  Skipped: 0  Warnings: 0

You can see that the data was imported using a SELECT statement such as this one:

mysql> SELECT * FROM address;
+------------+-----------+-----------------+-------+--------------+---------------------+
| address_id | person_id | street          | zip   | city         | created             |
+------------+-----------+-----------------+-------+--------------+---------------------+
|          1 |         1 | Mill Creek Road | 45365 | Sidney       | 2007-07-24 17:37:37 |
|          2 |         1 | Main Street     | 28681 | Taylorsville | 2007-07-24 17:37:37 |
|          3 |         2 | River Road      | 80239 | Denver       | 2007-07-24 17:37:37 |
+------------+-----------+-----------------+-------+--------------+---------------------+
3 rows in set (0.00 sec)

The data from the <address> element that is enclosed in XML comments is not imported. However, since there is a person_id column in the address table, the value of the person_id attribute from the parent <person> element for each <address> is imported into the address table.

Security Considerations. As with the LOAD DATA statement, the transfer of the XML file from the client host to the server host is initiated by the MySQL server. In theory, a patched server could be built that would tell the client program to transfer a file of the server's choosing rather than the file named by the client in the LOAD XML statement. Such a server could access any file on the client host to which the client user has read access.

In a Web environment, clients usually connect to MySQL from a Web server. A user that can run any command against the MySQL server can use LOAD XML LOCAL to read any files to which the Web server process has read access. In this environment, the client with respect to the MySQL server is actually the Web server, not the remote program being run by the user who connects to the Web server.

You can disable loading of XML files from clients by starting the server with --local-infile=0 or --local-infile=OFF. This option can also be used when starting the mysql client to disable LOAD XML for the duration of the client session.

To prevent a client from loading XML files from the server, do not grant the FILE privilege to the corresponding MySQL user account, or revoke this privilege if the client user account already has it.

Important

Revoking the FILE privilege (or not granting it in the first place) keeps the user only from executing the LOAD XML INFILE statement (as well as the LOAD_FILE() function); it does not prevent the user from executing LOAD XML LOCAL INFILE. To disallow this statement, you must start the server or the client with --local-infile=OFF.

In other words, the FILE privilege affects only whether the client can read files on the server; it has no bearing on whether the client can read files on the local file system.
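The two controls described above can be checked and applied from SQL. The following is a minimal sketch in which the account 'webuser'@'localhost' is a hypothetical name.

```sql
-- Check whether the server currently permits LOCAL loads from clients.
SHOW VARIABLES LIKE 'local_infile';

-- Remove the ability to read server-side files from a hypothetical account.
-- FILE is a global privilege, so it is revoked ON *.*.
REVOKE FILE ON *.* FROM 'webuser'@'localhost';
```

Revoking FILE affects only server-side reads. LOCAL loads remain governed by the local_infile setting.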

For partitioned tables using storage engines that employ table locks, such as MyISAM, LOAD XML locks all partitions of the table. This does not apply to tables using storage engines that employ row-level locking, such as InnoDB. For more information, see Section 18.5.4, "Partitioning and Table-Level Locking".

13.2.8. REPLACE Syntax

REPLACE [LOW_PRIORITY | DELAYED]
    [INTO] tbl_name [(col_name,...)]
    {VALUES | VALUE} ({expr | DEFAULT},...),(...),...

Or:

REPLACE [LOW_PRIORITY | DELAYED]
    [INTO] tbl_name
    SET col_name={expr | DEFAULT}, ...

Or:

REPLACE [LOW_PRIORITY | DELAYED]
    [INTO] tbl_name [(col_name,...)]
    SELECT ...

REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted. See Section 13.2.5, "INSERT Syntax".

REPLACE is a MySQL extension to the SQL standard. It either inserts, or deletes and inserts. For another MySQL extension to standard SQL (one that either inserts or updates), see Section 13.2.5.3, "INSERT ... ON DUPLICATE KEY UPDATE Syntax".

Note that unless the table has a PRIMARY KEY or UNIQUE index, using a REPLACE statement makes no sense. It becomes equivalent to INSERT, because there is no index to be used to determine whether a new row duplicates another.

Values for all columns are taken from the values specified in the REPLACE statement. Any missing columns are set to their default values, just as happens for INSERT. You cannot refer to values from the current row and use them in the new row. If you use an assignment such as SET col_name = col_name + 1, the reference to the column name on the right hand side is treated as DEFAULT(col_name), so the assignment is equivalent to SET col_name = DEFAULT(col_name) + 1.
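The following sketch illustrates the DEFAULT(col_name) rule. The table counters and its columns are hypothetical.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE counters (
  id INT NOT NULL PRIMARY KEY,
  hits INT NOT NULL DEFAULT 5
);

INSERT INTO counters (id, hits) VALUES (1, 100);

-- hits on the right-hand side is read as DEFAULT(hits), that is, 5,
-- so by the rule above the new row should hold hits = 6, not 101.
REPLACE INTO counters SET id = 1, hits = hits + 1;

SELECT id, hits FROM counters;
```

To increment an existing value instead, INSERT ... ON DUPLICATE KEY UPDATE is the statement that can refer to the current row.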

To use REPLACE, you must have both the INSERT and DELETE privileges for the table.

The REPLACE statement returns a count to indicate the number of rows affected. This is the sum of the rows deleted and inserted. If the count is 1 for a single-row REPLACE, a row was inserted and no rows were deleted. If the count is greater than 1, one or more old rows were deleted before the new row was inserted. It is possible for a single row to replace more than one old row if the table contains multiple unique indexes and the new row duplicates values for different old rows in different unique indexes.

The affected-rows count makes it easy to determine whether REPLACE only added a row or whether it also replaced any rows: Check whether the count is 1 (added) or greater (replaced).

If you are using the C API, the affected-rows count can be obtained using the mysql_affected_rows() function.
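At the SQL level, the same count is available from ROW_COUNT(). The sketch below uses a hypothetical table codes with a primary key and one additional unique index. The counts in the comments follow from the rules above and are not captured output.

```sql
-- Hypothetical table with two unique indexes.
CREATE TABLE codes (
  id INT NOT NULL PRIMARY KEY,
  code CHAR(1) NOT NULL,
  UNIQUE KEY (code)
);

-- No conflicting row: one row inserted, so the count should be 1.
REPLACE INTO codes VALUES (1, 'a');
SELECT ROW_COUNT();

REPLACE INTO codes VALUES (2, 'b');

-- Conflicts with id = 1 on the primary key and with code = 'b' on the
-- unique index: two rows deleted plus one inserted, so the count should be 3.
REPLACE INTO codes VALUES (1, 'b');
SELECT ROW_COUNT();
```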

Currently, you cannot replace into a table and select from the same table in a subquery.

MySQL uses the following algorithm for REPLACE (and LOAD DATA ... REPLACE):

  1. Try to insert the new row into the table

  2. While the insertion fails because a duplicate-key error occurs for a primary key or unique index:

    1. Delete from the table the conflicting row that has the duplicate key value

    2. Try again to insert the new row into the table

It is possible that in the case of a duplicate-key error, a storage engine may perform the REPLACE as an update rather than a delete plus insert, but the semantics are the same. There are no user-visible effects other than a possible difference in how the storage engine increments Handler_xxx status variables.

Because the results of REPLACE ... SELECT statements depend on the ordering of rows from the SELECT and this order cannot always be guaranteed, it is possible when logging these statements for the master and the slave to diverge. For this reason, in MySQL 5.5.18 and later, REPLACE ... SELECT statements are flagged as unsafe for statement-based replication. With this change, such statements produce a warning in the log when using the STATEMENT binary logging mode, and are logged using the row-based format when using MIXED mode. See also Section 16.1.2.1, "Advantages and Disadvantages of Statement-Based and Row-Based Replication".

A REPLACE statement that acts on a partitioned table using a storage engine such as MyISAM that employs table-level locks locks all partitions of the table. This does not occur with tables using storage engines such as InnoDB that employ row-level locking. This issue is resolved in MySQL 5.6. See Section 18.5.4, "Partitioning and Table-Level Locking", for more information.

13.2.9. SELECT Syntax

SELECT
    [ALL | DISTINCT | DISTINCTROW ]
      [HIGH_PRIORITY]
      [STRAIGHT_JOIN]
      [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
      [SQL_CACHE | SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
    select_expr [, select_expr ...]
    [FROM table_references
    [WHERE where_condition]
    [GROUP BY {col_name | expr | position}
      [ASC | DESC], ... [WITH ROLLUP]]
    [HAVING where_condition]
    [ORDER BY {col_name | expr | position}
      [ASC | DESC], ...]
    [LIMIT {[offset,] row_count | row_count OFFSET offset}]
    [PROCEDURE procedure_name(argument_list)]
    [INTO OUTFILE 'file_name'
        [CHARACTER SET charset_name]
        export_options
      | INTO DUMPFILE 'file_name'
      | INTO var_name [, var_name]]
    [FOR UPDATE | LOCK IN SHARE MODE]]

SELECT is used to retrieve rows selected from one or more tables, and can include UNION statements and subqueries. See Section 13.2.9.4, "UNION Syntax", and Section 13.2.10, "Subquery Syntax".

The most commonly used clauses of SELECT statements are these:

  • Each select_expr indicates a column that you want to retrieve. There must be at least one select_expr.

  • table_references indicates the table or tables from which to retrieve rows. Its syntax is described in Section 13.2.9.2, "JOIN Syntax".

  • The WHERE clause, if given, indicates the condition or conditions that rows must satisfy to be selected. where_condition is an expression that evaluates to true for each row to be selected. The statement selects all rows if there is no WHERE clause.

    In the WHERE expression, you can use any of the functions and operators that MySQL supports, except for aggregate (summary) functions. See Section 9.5, "Expression Syntax", and Chapter 12, Functions and Operators.
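Because aggregate functions are not permitted in WHERE, a condition on an aggregate value belongs in HAVING instead. The following sketch assumes a hypothetical table scores with columns student and score.

```sql
-- Not permitted: an aggregate function in the WHERE clause.
-- SELECT student FROM scores WHERE AVG(score) > 80 GROUP BY student;

-- Filter individual rows in WHERE, then filter groups in HAVING.
SELECT student, AVG(score) AS avg_score
  FROM scores
  WHERE score IS NOT NULL
  GROUP BY student
  HAVING AVG(score) > 80;
```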

SELECT can also be used to retrieve rows computed without reference to any table.

For example:

mysql> SELECT 1 + 1;
        -> 2

You are permitted to specify DUAL as a dummy table name in situations where no tables are referenced:

mysql> SELECT 1 + 1 FROM DUAL;
        -> 2

DUAL is purely for the convenience of people who require that all SELECT statements should have FROM and possibly other clauses. MySQL may ignore the clauses. MySQL does not require FROM DUAL if no tables are referenced.

In general, clauses used must be given in exactly the order shown in the syntax description. For example, a HAVING clause must come after any GROUP BY clause and before any ORDER BY clause. The exception is that the INTO clause can appear either as shown in the syntax description or immediately following the select_expr list. For more information about INTO, see Section 13.2.9.1, "SELECT ... INTO Syntax".
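The two permitted positions of the INTO clause can be sketched as follows, assuming a hypothetical table t1 with columns id and name.

```sql
-- INTO immediately following the select_expr list:
SELECT id, name INTO @id, @name FROM t1 LIMIT 1;

-- INTO in the position shown in the syntax description:
SELECT id, name FROM t1 LIMIT 1 INTO @id, @name;
```

LIMIT 1 is included because assigning to user variables requires the query to return at most one row.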

The list of select_expr terms comprises the select list that indicates which columns to retrieve. Terms specify a column or expression or can use *-shorthand:

  • A select list consisting only of a single unqualified * can be used as shorthand to select all columns from all tables:

    SELECT * FROM t1 INNER JOIN t2 ...
  • tbl_name.* can be used as a qualified shorthand to select all columns from the named table:

    SELECT t1.*, t2.* FROM t1 INNER JOIN t2 ...
  • Use of an unqualified * with other items in the select list may produce a parse error. To avoid this problem, use a qualified tbl_name.* reference:

    SELECT AVG(score), t1.* FROM t1 ...

The following list provides additional information about other SELECT clauses:

  • A select_expr can be given an alias using AS alias_name. The alias is used as the expression's column name and can be used in GROUP BY, ORDER BY, or HAVING clauses. For example:

    SELECT CONCAT(last_name,', ',first_name) AS full_name
      FROM mytable ORDER BY full_name;

    The AS keyword is optional when aliasing a select_expr with an identifier. The preceding example could have been written like this:

    SELECT CONCAT(last_name,', ',first_name) full_name
      FROM mytable ORDER BY full_name;

    However, because the AS is optional, a subtle problem can occur if you forget the comma between two select_expr expressions: MySQL interprets the second as an alias name. For example, in the following statement, columnb is treated as an alias name:

    SELECT columna columnb FROM mytable;

    For this reason, it is good practice to always use AS explicitly when specifying column aliases.

    It is not permissible to refer to a column alias in a WHERE clause, because the column value might not yet be determined when the WHERE clause is executed. See Section C.5.5.4, "Problems with Column Aliases".

  • The FROM table_references clause indicates the table or tables from which to retrieve rows. If you name more than one table, you are performing a join. For information on join syntax, see Section 13.2.9.2, "JOIN Syntax". For each table specified, you can optionally specify an alias.

    tbl_name [[AS] alias] [index_hint]

    The use of index hints provides the optimizer with information about how to choose indexes during query processing. For a description of the syntax for specifying these hints, see Section 13.2.9.3, "Index Hint Syntax".

    You can use SET max_seeks_for_key=value as an alternative way to force MySQL to prefer key scans instead of table scans. See Section 5.1.4, "Server System Variables".

  • You can refer to a table within the default database as tbl_name, or as db_name.tbl_name to specify a database explicitly. You can refer to a column as col_name, tbl_name.col_name, or db_name.tbl_name.col_name. You need not specify a tbl_name or db_name.tbl_name prefix for a column reference unless the reference would be ambiguous. See Section 9.2.1, "Identifier Qualifiers", for examples of ambiguity that require the more explicit column reference forms.

  • A table reference can be aliased using tbl_name AS alias_name or tbl_name alias_name:

    SELECT t1.name, t2.salary FROM employee AS t1, info AS t2
      WHERE t1.name = t2.name;

    SELECT t1.name, t2.salary FROM employee t1, info t2
      WHERE t1.name = t2.name;
  • Columns selected for output can be referred to in ORDER BY and GROUP BY clauses using column names, column aliases, or column positions. Column positions are integers and begin with 1:

    SELECT college, region, seed FROM tournament
      ORDER BY region, seed;

    SELECT college, region AS r, seed AS s FROM tournament
      ORDER BY r, s;

    SELECT college, region, seed FROM tournament
      ORDER BY 2, 3;

    To sort in reverse order, add the DESC (descending) keyword to the name of the column in the ORDER BY clause that you are sorting by. The default is ascending order; this can be specified explicitly using the ASC keyword.

    If ORDER BY occurs within a subquery and also is applied in the outer query, the outermost ORDER BY takes precedence. For example, results for the following statement are sorted in descending order, not ascending order:

    (SELECT ... ORDER BY a) ORDER BY a DESC;

    Use of column positions is deprecated because the syntax has been removed from the SQL standard.

  • If you use GROUP BY, output rows are sorted according to the GROUP BY columns as if you had an ORDER BY for the same columns. To avoid the overhead of sorting that GROUP BY produces, add ORDER BY NULL:

    SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
  • MySQL extends the GROUP BY clause so that you can also specify ASC and DESC after columns named in the clause:

    SELECT a, COUNT(b) FROM test_table GROUP BY a DESC;
  • MySQL extends the use of GROUP BY to permit selecting fields that are not mentioned in the GROUP BY clause. If you are not getting the results that you expect from your query, please read the description of GROUP BY found in Section 12.16, "Functions and Modifiers for Use with GROUP BY Clauses".

  • GROUP BY permits a WITH ROLLUP modifier. See Section 12.16.2, "GROUP BY Modifiers".

  • The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)

    The SQL standard requires that HAVING must reference only columns in the GROUP BY clause or columns used in aggregate functions. However, MySQL supports an extension to this behavior, and permits HAVING to refer to columns in the SELECT list and columns in outer subqueries as well.

    If the HAVING clause refers to a column that is ambiguous, a warning occurs. In the following statement, col2 is ambiguous because it is used as both an alias and a column name:

    SELECT COUNT(col1) AS col2 FROM t GROUP BY col2 HAVING col2 = 2;

    Preference is given to standard SQL behavior, so if a HAVING column name is used both in GROUP BY and as an aliased column in the output column list, preference is given to the column in the GROUP BY clause.

  • Do not use HAVING for items that should be in the WHERE clause. For example, do not write the following:

    SELECT col_name FROM tbl_name HAVING col_name > 0;

    Write this instead:

    SELECT col_name FROM tbl_name WHERE col_name > 0;
  • The HAVING clause can refer to aggregate functions, which the WHERE clause cannot:

    SELECT user, MAX(salary) FROM users
      GROUP BY user HAVING MAX(salary) > 10;

    (This did not work in some older versions of MySQL.)

  • MySQL permits duplicate column names. That is, there can be more than one select_expr with the same name. This is an extension to standard SQL. Because MySQL also permits GROUP BY and HAVING to refer to select_expr values, this can result in an ambiguity:

    SELECT 12 AS a, a FROM t GROUP BY a;

    In that statement, both columns have the name a. To ensure that the correct column is used for grouping, use different names for each select_expr.

  • MySQL resolves unqualified column or alias references in ORDER BY clauses by searching in the select_expr values, then in the columns of the tables in the FROM clause. For GROUP BY or HAVING clauses, it searches the FROM clause before searching in the select_expr values. (For GROUP BY and HAVING, this differs from the pre-MySQL 5.0 behavior that used the same rules as for ORDER BY.)

  • The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants, with these exceptions:

    • Within prepared statements, LIMIT parameters can be specified using ? placeholder markers.

    • Within stored programs, LIMIT parameters can be specified using integer-valued routine parameters or local variables as of MySQL 5.5.6.

    With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):

    SELECT * FROM tbl LIMIT 5,10;  # Retrieve rows 6-15

    To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:

    SELECT * FROM tbl LIMIT 95,18446744073709551615;

    With one argument, the value specifies the number of rows to return from the beginning of the result set:

    SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows

    In other words, LIMIT row_count is equivalent to LIMIT 0, row_count.

    For prepared statements, you can use placeholders. The following statements will return one row from the tbl table:

    SET @a=1;
    PREPARE STMT FROM 'SELECT * FROM tbl LIMIT ?';
    EXECUTE STMT USING @a;

    The following statements will return the second to sixth row from the tbl table:

    SET @skip=1; SET @numrows=5;
    PREPARE STMT FROM 'SELECT * FROM tbl LIMIT ?, ?';
    EXECUTE STMT USING @skip, @numrows;

    For compatibility with PostgreSQL, MySQL also supports the LIMIT row_count OFFSET offset syntax.

    If LIMIT occurs within a subquery and also is applied in the outer query, the outermost LIMIT takes precedence. For example, the following statement produces two rows, not one:

    (SELECT ... LIMIT 1) LIMIT 2;
  • A PROCEDURE clause names a procedure that should process the data in the result set. For an example, see Section 8.4.2.4, "Using PROCEDURE ANALYSE", which describes ANALYSE, a procedure that can be used to obtain suggestions for optimal column data types that may help reduce table sizes.

  • The SELECT ... INTO form of SELECT enables the query result to be written to a file or stored in variables. For more information, see Section 13.2.9.1, "SELECT ... INTO Syntax".

  • If you use FOR UPDATE with a storage engine that uses page or row locks, rows examined by the query are write-locked until the end of the current transaction. Using LOCK IN SHARE MODE sets a shared lock that permits other transactions to read the examined rows but not to update or delete them. See Section 14.3.9.3, "SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE Locking Reads".
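The locking reads described in the last item above might be used as in the following sketch. The accounts table, its columns, and the values are hypothetical. The example assumes a storage engine with row-level locking (InnoDB here).

```sql
-- Hypothetical InnoDB table, created only for this illustration.
CREATE TABLE accounts (
  id      INT PRIMARY KEY,
  balance DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;
INSERT INTO accounts VALUES (1, 100.00);

START TRANSACTION;
-- Write-lock the examined row until the transaction ends.
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 10.00 WHERE id = 1;
COMMIT;

START TRANSACTION;
-- Shared lock: other transactions can read the row but not update or delete it.
SELECT balance FROM accounts WHERE id = 1 LOCK IN SHARE MODE;
COMMIT;
```

The locks are held only within a transaction, which is why each locking read is wrapped in START TRANSACTION ... COMMIT rather than run in autocommit mode.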

Following the SELECT keyword, you can use a number of options that affect the operation of the statement. HIGH_PRIORITY, STRAIGHT_JOIN, and options beginning with SQL_ are MySQL extensions to standard SQL.

  • The ALL and DISTINCT options specify whether duplicate rows should be returned. ALL (the default) specifies that all matching rows should be returned, including duplicates. DISTINCT specifies removal of duplicate rows from the result set. It is an error to specify both options. DISTINCTROW is a synonym for DISTINCT.

  • HIGH_PRIORITY gives the SELECT higher priority than a statement that updates a table. You should use this only for queries that are very fast and must be done at once. A SELECT HIGH_PRIORITY query that is issued while the table is locked for reading runs even if there is an update statement waiting for the table to be free. This affects only storage engines that use only table-level locking (such as MyISAM, MEMORY, and MERGE).

    HIGH_PRIORITY cannot be used with SELECT statements that are part of a UNION.

  • STRAIGHT_JOIN forces the optimizer to join the tables in the order in which they are listed in the FROM clause. You can use this to speed up a query if the optimizer joins the tables in nonoptimal order. STRAIGHT_JOIN also can be used in the table_references list. See Section 13.2.9.2, "JOIN Syntax".

    STRAIGHT_JOIN does not apply to any table that the optimizer treats as a const or system table. Such a table produces a single row, is read during the optimization phase of query execution, and references to its columns are replaced with the appropriate column values before query execution proceeds. These tables will appear first in the query plan displayed by EXPLAIN. See Section 8.8.1, "Optimizing Queries with EXPLAIN". This exception may not apply to const or system tables that are used on the NULL-complemented side of an outer join (that is, the right-side table of a LEFT JOIN or the left-side table of a RIGHT JOIN).

  • SQL_BIG_RESULT or SQL_SMALL_RESULT can be used with GROUP BY or DISTINCT to tell the optimizer that the result set has many rows or is small, respectively. For SQL_BIG_RESULT, MySQL directly uses disk-based temporary tables if needed, and prefers sorting to using a temporary table with a key on the GROUP BY elements. For SQL_SMALL_RESULT, MySQL uses fast temporary tables to store the resulting table instead of using sorting. This should not normally be needed.

  • SQL_BUFFER_RESULT forces the result to be put into a temporary table. This helps MySQL free the table locks early and helps in cases where it takes a long time to send the result set to the client. This option can be used only for top-level SELECT statements, not for subqueries or following UNION.

  • SQL_CALC_FOUND_ROWS tells MySQL to calculate how many rows there would be in the result set, disregarding any LIMIT clause. The number of rows can then be retrieved with SELECT FOUND_ROWS(). See Section 12.14, "Information Functions".

  • The SQL_CACHE and SQL_NO_CACHE options affect caching of query results in the query cache (see Section 8.9.3, "The MySQL Query Cache"). SQL_CACHE tells MySQL to store the result in the query cache if it is cacheable and the value of the query_cache_type system variable is 2 or DEMAND. SQL_NO_CACHE tells MySQL not to store the result in the query cache.

    For views, SQL_NO_CACHE applies if it appears in any SELECT in the query. For a cacheable query, SQL_CACHE applies if it appears in the first SELECT of a view referred to by the query.

    As of MySQL 5.5.3, these two options are mutually exclusive and an error occurs if they are both specified. Also, these options are not permitted in subqueries (including subqueries in the FROM clause), and SELECT statements in unions other than the first SELECT.

    Before MySQL 5.5.3, for a query that uses UNION or subqueries, the following rules apply:

    • SQL_NO_CACHE applies if it appears in any SELECT in the query.

    • For a cacheable query, SQL_CACHE applies if it appears in the first SELECT of the query.
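As one illustration of the options above, the following sketch pairs SQL_CALC_FOUND_ROWS with FOUND_ROWS(). The table demo_rows and its contents are hypothetical.

```sql
-- Hypothetical table, created only for this illustration.
CREATE TABLE demo_rows (id INT PRIMARY KEY);
INSERT INTO demo_rows VALUES (1),(2),(3),(4),(5),(6),(7),(8);

-- Returns at most 3 rows, but asks the server to count every row that
-- satisfies the WHERE clause, disregarding LIMIT.
SELECT SQL_CALC_FOUND_ROWS id FROM demo_rows
  WHERE id > 2 ORDER BY id LIMIT 3;

-- Number of rows the preceding statement would have returned without LIMIT.
SELECT FOUND_ROWS();
```

With the sample data shown, the first statement returns three rows and FOUND_ROWS() reports 6, because six rows satisfy id > 2. FOUND_ROWS() should be called immediately after the SELECT it refers to.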

A SELECT from a partitioned table that uses a storage engine employing table-level locks, such as MyISAM, locks all partitions of the table. This does not occur with tables using storage engines such as InnoDB that employ row-level locking. This issue is resolved in MySQL 5.6. See Section 18.5.4, "Partitioning and Table-Level Locking", for more information.

13.2.9.1. SELECT ... INTO Syntax

The SELECT ... INTO form of SELECT enables a query result to be written to a file or stored in variables:

  • SELECT ... INTO OUTFILE writes the selected rows to a file. Column and line terminators can be specified to produce a specific output format.

  • SELECT ... INTO DUMPFILE writes a single row to a file without any formatting.

  • SELECT ... INTO var_list selects column values and stores them into variables.

The SELECT syntax description (see Section 13.2.9, "SELECT Syntax") shows the INTO clause near the end of the statement. It is also possible to use INTO immediately following the select_expr list.

The SELECT ... INTO OUTFILE 'file_name' form of SELECT writes the selected rows to a file. The file is created on the server host, so you must have the FILE privilege to use this syntax. file_name cannot be an existing file, which among other things prevents files such as /etc/passwd and database tables from being destroyed. The character_set_filesystem system variable controls the interpretation of the file name.

The SELECT ... INTO OUTFILE statement is intended primarily to let you very quickly dump a table to a text file on the server machine. If you want to create the resulting file on some other host than the server host, you normally cannot use SELECT ... INTO OUTFILE since there is no way to write a path to the file relative to the server host's file system.

However, if the MySQL client software is installed on the remote machine, you can instead use a client command such as mysql -e "SELECT ..." > file_name to generate the file on the client host.

It is also possible to create the resulting file on a host other than the server host, if the location of the file on the remote host can be accessed using a network-mapped path on the server's file system. In this case, the presence of mysql (or some other MySQL client program) is not required on the target host.

SELECT ... INTO OUTFILE is the complement of LOAD DATA INFILE. Column values are converted to the character set specified in the CHARACTER SET clause before they are written. If no such clause is present, values are dumped using the binary character set; in effect, there is no character set conversion. If a result set contains columns in several character sets, the output data file will as well, and you may not be able to reload the file correctly.

The syntax for the export_options part of the statement consists of the same FIELDS and LINES clauses that are used with the LOAD DATA INFILE statement. See Section 13.2.6, "LOAD DATA INFILE Syntax", for information about the FIELDS and LINES clauses, including their default values and permissible values.

FIELDS ESCAPED BY controls how to write special characters. If the FIELDS ESCAPED BY character is not empty, it is used when necessary to avoid ambiguity as a prefix that precedes following characters on output:

  • The FIELDS ESCAPED BY character

  • The FIELDS [OPTIONALLY] ENCLOSED BY character

  • The first character of the FIELDS TERMINATED BY and LINES TERMINATED BY values

  • ASCII NUL (the zero-valued byte; what is actually written following the escape character is ASCII "0", not a zero-valued byte)

The FIELDS TERMINATED BY, ENCLOSED BY, ESCAPED BY, or LINES TERMINATED BY characters must be escaped so that you can read the file back in reliably. ASCII NUL is escaped to make it easier to view with some pagers.

The resulting file does not have to conform to SQL syntax, so nothing else need be escaped.

If the FIELDS ESCAPED BY character is empty, no characters are escaped and NULL is output as NULL, not \N. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.

Here is an example that produces a file in the comma-separated values (CSV) format used by many programs:

SELECT a,b,a+b INTO OUTFILE '/tmp/result.txt'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM test_table;

If you use INTO DUMPFILE instead of INTO OUTFILE, MySQL writes only one row into the file, without any column or line termination and without performing any escape processing. This is useful if you want to store a BLOB value in a file.
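A minimal sketch of writing a single BLOB value with INTO DUMPFILE follows. The table demo_blobs, its columns, the sample bytes, and the path /tmp/payload.bin are assumptions made for illustration. The statement requires the FILE privilege, and the target file must not already exist.

```sql
-- Hypothetical table and data, created only for this illustration.
CREATE TABLE demo_blobs (id INT PRIMARY KEY, payload BLOB);
INSERT INTO demo_blobs VALUES (1, 0x89504E470D0A1A0A);

-- Write the raw bytes of one value to a file on the server host.
-- The WHERE clause restricts the result to a single row, as DUMPFILE requires.
SELECT payload INTO DUMPFILE '/tmp/payload.bin'
  FROM demo_blobs WHERE id = 1;
```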

Note

Any file created by INTO OUTFILE or INTO DUMPFILE is writable by all users on the server host. The reason for this is that the MySQL server cannot create a file that is owned by anyone other than the user under whose account it is running. (You should never run mysqld as root for this and other reasons.) The file thus must be world-writable so that you can manipulate its contents.

If the secure_file_priv system variable is set to a nonempty directory name, the file to be written must be located in that directory.

The INTO clause can name a list of one or more variables, which can be user-defined variables, stored procedure or function parameters, or stored program local variables (see Section 13.6.4, "Variables in Stored Programs"). The selected values are assigned to the variables. The number of variables must match the number of columns. The query should return a single row. If the query returns no rows, a warning with error code 1329 occurs (No data), and the variable values remain unchanged. If the query returns multiple rows, error 1172 occurs (Result consisted of more than one row). If it is possible that the statement may retrieve multiple rows, you can use LIMIT 1 to limit the result set to a single row.

SELECT id, data INTO @x, @y FROM test.t1 LIMIT 1;

User variable names are not case sensitive. See Section 9.4, "User-Defined Variables".
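The behavior of SELECT ... INTO with user variables can be observed with a short session such as the following. The table demo_vars and its contents are hypothetical.

```sql
-- Hypothetical table, created only for this illustration.
CREATE TABLE demo_vars (id INT, data VARCHAR(20));
INSERT INTO demo_vars VALUES (1, 'a'), (2, 'b');

-- Two rows exist, so LIMIT 1 keeps the result to a single row.
SELECT id, data INTO @x, @y FROM demo_vars ORDER BY id LIMIT 1;
SELECT @x, @y;

-- No rows match: a warning with code 1329 is raised and the
-- variables keep the values assigned by the previous statement.
SELECT id, data INTO @x, @y FROM demo_vars WHERE id = 99;
SHOW WARNINGS;

-- User variable names are not case sensitive, so @X refers to @x.
SELECT @X, @Y;
```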

In the context of such statements that occur as part of events executed by the Event Scheduler, diagnostics messages (not only errors, but also warnings) are written to the error log, and, on Windows, to the application event log. For additional information, see Section 19.4.5, "Event Scheduler Status".

An INTO clause should not be used in a nested SELECT because such a SELECT must return its result to the outer context.

13.2.9.2. JOIN Syntax

MySQL supports the following JOIN syntaxes for the table_references part of SELECT statements and multiple-table DELETE and UPDATE statements:

table_references:
    table_reference [, table_reference] ...

table_reference:
    table_factor
  | join_table

table_factor:
    tbl_name [[AS] alias] [index_hint_list]
  | table_subquery [AS] alias
  | ( table_references )
  | { OJ table_reference LEFT OUTER JOIN table_reference
        ON conditional_expr }

join_table:
    table_reference [INNER | CROSS] JOIN table_factor [join_condition]
  | table_reference STRAIGHT_JOIN table_factor
  | table_reference STRAIGHT_JOIN table_factor ON conditional_expr
  | table_reference {LEFT|RIGHT} [OUTER] JOIN table_reference join_condition
  | table_reference NATURAL [{LEFT|RIGHT} [OUTER]] JOIN table_factor

join_condition:
    ON conditional_expr
  | USING (column_list)

index_hint_list:
    index_hint [, index_hint] ...

index_hint:
    USE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])
  | IGNORE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)
  | FORCE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)

index_list:
    index_name [, index_name] ...

A table reference is also known as a join expression.

The syntax of table_factor is extended in comparison with the SQL Standard. The latter accepts only table_reference, not a list of them inside a pair of parentheses.

This is a conservative extension if we consider each comma in a list of table_reference items as equivalent to an inner join. For example:

SELECT * FROM t1 LEFT JOIN (t2, t3, t4) ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

is equivalent to:

SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4) ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

In MySQL, JOIN, CROSS JOIN, and INNER JOIN are syntactic equivalents (they can replace each other). In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause, CROSS JOIN is used otherwise.

In general, parentheses can be ignored in join expressions containing only inner join operations. MySQL also supports nested joins (see Section 8.13.7, "Nested Join Optimization").

Index hints can be specified to affect how the MySQL optimizer makes use of indexes. For more information, see Section 13.2.9.3, "Index Hint Syntax".

The following list describes general factors to take into account when writing joins.

  • A table reference can be aliased using tbl_name AS alias_name or tbl_name alias_name:

    SELECT t1.name, t2.salary
      FROM employee AS t1 INNER JOIN info AS t2 ON t1.name = t2.name;

    SELECT t1.name, t2.salary
      FROM employee t1 INNER JOIN info t2 ON t1.name = t2.name;
  • A table_subquery is also known as a subquery in the FROM clause. Such subqueries must include an alias to give the subquery result a table name. A trivial example follows; see also Section 13.2.10.8, "Subqueries in the FROM Clause".

    SELECT * FROM (SELECT 1, 2, 3) AS t1;
  • INNER JOIN and , (comma) are semantically equivalent in the absence of a join condition: both produce a Cartesian product between the specified tables (that is, each and every row in the first table is joined to each and every row in the second table).

    However, the precedence of the comma operator is less than that of INNER JOIN, CROSS JOIN, LEFT JOIN, and so on. If you mix comma joins with the other join types when there is a join condition, an error of the form Unknown column 'col_name' in 'on clause' may occur. Information about dealing with this problem is given later in this section.

  • The conditional_expr used with ON is any conditional expression of the form that can be used in a WHERE clause. Generally, you should use the ON clause for conditions that specify how to join tables, and the WHERE clause to restrict which rows you want in the result set.

  • If there is no matching row for the right table in the ON or USING part in a LEFT JOIN, a row with all columns set to NULL is used for the right table. You can use this fact to find rows in a table that have no counterpart in another table:

    SELECT left_tbl.*
      FROM left_tbl LEFT JOIN right_tbl ON left_tbl.id = right_tbl.id
      WHERE right_tbl.id IS NULL;

    This example finds all rows in left_tbl with an id value that is not present in right_tbl (that is, all rows in left_tbl with no corresponding row in right_tbl). This assumes that right_tbl.id is declared NOT NULL. See Section 8.13.5, "LEFT JOIN and RIGHT JOIN Optimization".

  • The USING(column_list) clause names a list of columns that must exist in both tables. If tables a and b both contain columns c1, c2, and c3, the following join compares corresponding columns from the two tables:

    a LEFT JOIN b USING (c1,c2,c3)
  • The NATURAL [LEFT] JOIN of two tables is defined to be semantically equivalent to an INNER JOIN or a LEFT JOIN with a USING clause that names all columns that exist in both tables.

  • RIGHT JOIN works analogously to LEFT JOIN. To keep code portable across databases, it is recommended that you use LEFT JOIN instead of RIGHT JOIN.

  • The { OJ ... LEFT OUTER JOIN ...} syntax shown in the join syntax description exists only for compatibility with ODBC. The curly braces in the syntax should be written literally; they are not metasyntax as used elsewhere in syntax descriptions.

    SELECT left_tbl.* FROM { OJ left_tbl LEFT OUTER JOIN right_tbl ON left_tbl.id = right_tbl.id } WHERE right_tbl.id IS NULL;

    You can use other types of joins within { OJ ... }, such as INNER JOIN or RIGHT OUTER JOIN. This helps with compatibility with some third-party applications, but is not official ODBC syntax.

    The parser does not permit nested { OJ ... } constructs (which are not legal ODBC syntax, anyway). Queries that use such constructs should be rewritten. For an example, see JOIN Syntax.

  • STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order.

Some join examples:

SELECT * FROM table1, table2;

SELECT * FROM table1 INNER JOIN table2 ON table1.id=table2.id;

SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id;

SELECT * FROM table1 LEFT JOIN table2 USING (id);

SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id
  LEFT JOIN table3 ON table2.id=table3.id;
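The LEFT JOIN technique for finding rows with no counterpart in another table, described in the list above, can be tried with a small data set. The table names left_tbl and right_tbl follow the earlier example. The note column and the sample rows are assumptions added for illustration.

```sql
-- Sample tables; right_tbl.id is declared NOT NULL, as the technique assumes.
CREATE TABLE left_tbl  (id INT NOT NULL, note VARCHAR(20));
CREATE TABLE right_tbl (id INT NOT NULL);
INSERT INTO left_tbl  VALUES (1, 'only in left'), (2, 'in both'), (3, 'in both');
INSERT INTO right_tbl VALUES (2), (3), (4);

-- Rows of left_tbl with no matching row in right_tbl.
-- Unmatched rows carry NULL for every right_tbl column.
SELECT left_tbl.*
  FROM left_tbl LEFT JOIN right_tbl ON left_tbl.id = right_tbl.id
  WHERE right_tbl.id IS NULL;
```

With the sample rows shown, only the row with id 1 is returned, because ids 2 and 3 have matches in right_tbl and id 4 does not occur in left_tbl.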

Join Processing Changes in MySQL 5.0.12

Note

Natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard. The goal was to align the syntax and semantics of MySQL with respect to NATURAL JOIN and JOIN ... USING according to SQL:2003. However, these changes in join processing can result in different output columns for some joins. Also, some queries that appeared to work correctly in older versions (prior to 5.0.12) must be rewritten to comply with the standard.

These changes have five main aspects:

  • The way that MySQL determines the result columns of NATURAL or USING join operations (and thus the result of the entire FROM clause).

  • Expansion of SELECT * and SELECT tbl_name.* into a list of selected columns.

  • Resolution of column names in NATURAL or USING joins.

  • Transformation of NATURAL or USING joins into JOIN ... ON.

  • Resolution of column names in the ON condition of a JOIN ... ON.

The following list provides more detail about several effects of current join processing versus join processing in older versions. The term "previously" means "prior to MySQL 5.0.12."

  • The columns of a NATURAL join or a USING join may be different from previously. Specifically, redundant output columns no longer appear, and the order of columns for SELECT * expansion may be different from before.

    Consider this set of statements:

    CREATE TABLE t1 (i INT, j INT);
    CREATE TABLE t2 (k INT, j INT);
    INSERT INTO t1 VALUES(1,1);
    INSERT INTO t2 VALUES(1,1);
    SELECT * FROM t1 NATURAL JOIN t2;
    SELECT * FROM t1 JOIN t2 USING (j);

    Previously, the statements produced this output:

    +------+------+------+------+
    | i    | j    | k    | j    |
    +------+------+------+------+
    |    1 |    1 |    1 |    1 |
    +------+------+------+------+
    +------+------+------+------+
    | i    | j    | k    | j    |
    +------+------+------+------+
    |    1 |    1 |    1 |    1 |
    +------+------+------+------+

    In the first SELECT statement, column j appears in both tables and thus becomes a join column, so, according to standard SQL, it should appear only once in the output, not twice. Similarly, in the second SELECT statement, column j is named in the USING clause and should appear only once in the output, not twice. But in both cases, the redundant column is not eliminated. Also, the order of the columns is not correct according to standard SQL.

    Now the statements produce this output:

    +------+------+------+
    | j    | i    | k    |
    +------+------+------+
    |    1 |    1 |    1 |
    +------+------+------+
    +------+------+------+
    | j    | i    | k    |
    +------+------+------+
    |    1 |    1 |    1 |
    +------+------+------+

    The redundant column is eliminated and the column order is correct according to standard SQL:

    • First, coalesced common columns of the two joined tables, in the order in which they occur in the first table

    • Second, columns unique to the first table, in order in which they occur in that table

    • Third, columns unique to the second table, in order in which they occur in that table

    The single result column that replaces two common columns is defined using the coalesce operation. That is, for two common columns t1.a and t2.a, the resulting single join column a is defined as a = COALESCE(t1.a, t2.a), where:

    COALESCE(x, y) = (CASE WHEN x IS NOT NULL THEN x ELSE y END)

    If the join operation is any other join, the result columns of the join consist of the concatenation of all columns of the joined tables. This is the same as previously.

    A consequence of the definition of coalesced columns is that, for outer joins, the coalesced column contains the value of the non-NULL column if one of the two columns is always NULL. If neither or both columns are NULL, both common columns have the same value, so it doesn't matter which one is chosen as the value of the coalesced column. A simple way to interpret this is to consider that a coalesced column of an outer join is represented by the common column of the inner table of a JOIN. Suppose that the tables t1(a,b) and t2(a,c) have the following contents:

    t1    t2
    ----  ----
    1 x   2 z
    2 y   3 w

    Then:

    mysql> SELECT * FROM t1 NATURAL LEFT JOIN t2;
    +------+------+------+
    | a    | b    | c    |
    +------+------+------+
    |    1 | x    | NULL |
    |    2 | y    | z    |
    +------+------+------+

    Here column a contains the values of t1.a.

    mysql> SELECT * FROM t1 NATURAL RIGHT JOIN t2;
    +------+------+------+
    | a    | c    | b    |
    +------+------+------+
    |    2 | z    | y    |
    |    3 | w    | NULL |
    +------+------+------+

    Here column a contains the values of t2.a.

    Compare these results to the otherwise equivalent queries with JOIN ... ON:

    mysql> SELECT * FROM t1 LEFT JOIN t2 ON (t1.a = t2.a);
    +------+------+------+------+
    | a    | b    | a    | c    |
    +------+------+------+------+
    |    1 | x    | NULL | NULL |
    |    2 | y    |    2 | z    |
    +------+------+------+------+

    mysql> SELECT * FROM t1 RIGHT JOIN t2 ON (t1.a = t2.a);
    +------+------+------+------+
    | a    | b    | a    | c    |
    +------+------+------+------+
    |    2 | y    |    2 | z    |
    | NULL | NULL |    3 | w    |
    +------+------+------+------+

  • Previously, a USING clause could be rewritten as an ON clause that compares corresponding columns. For example, the following two clauses were semantically identical:

    a LEFT JOIN b USING (c1,c2,c3)
    a LEFT JOIN b ON a.c1=b.c1 AND a.c2=b.c2 AND a.c3=b.c3

    Now the two clauses no longer are quite the same:

    • With respect to determining which rows satisfy the join condition, both joins remain semantically identical.

    • With respect to determining which columns to display for SELECT * expansion, the two joins are not semantically identical. The USING join selects the coalesced value of corresponding columns, whereas the ON join selects all columns from all tables. For the preceding USING join, SELECT * selects these values:

      COALESCE(a.c1,b.c1), COALESCE(a.c2,b.c2), COALESCE(a.c3,b.c3)

      For the ON join, SELECT * selects these values:

      a.c1, a.c2, a.c3, b.c1, b.c2, b.c3

      With an inner join, COALESCE(a.c1,b.c1) is the same as either a.c1 or b.c1 because both columns will have the same value. With an outer join (such as LEFT JOIN), one of the two columns can be NULL. That column will be omitted from the result.

  • The evaluation of multi-way natural joins differs in a very important way that affects the result of NATURAL or USING joins and that can require query rewriting. Suppose that you have three tables t1(a,b), t2(c,b), and t3(a,c) that each have one row: t1(1,2), t2(10,2), and t3(7,10). Suppose also that you have this NATURAL JOIN on the three tables:

    SELECT ... FROM t1 NATURAL JOIN t2 NATURAL JOIN t3;

    Previously, the left operand of the second join was considered to be t2, whereas it should be the nested join (t1 NATURAL JOIN t2). As a result, the columns of t3 are checked for common columns only in t2, and, if t3 has common columns with t1, these columns are not used as equi-join columns. Thus, previously, the preceding query was transformed to the following equi-join:

    SELECT ... FROM t1, t2, t3  WHERE t1.b = t2.b AND t2.c = t3.c;

    That join is missing one more equi-join predicate (t1.a = t3.a). As a result, it produces one row, not the empty result that it should. The correct equivalent query is this:

    SELECT ... FROM t1, t2, t3  WHERE t1.b = t2.b AND t2.c = t3.c AND t1.a = t3.a;

    If you require the same query result in current versions of MySQL as in older versions, rewrite the natural join as the first equi-join.

  • Previously, the comma operator (,) and JOIN both had the same precedence, so the join expression t1, t2 JOIN t3 was interpreted as ((t1, t2) JOIN t3). Now JOIN has higher precedence, so the expression is interpreted as (t1, (t2 JOIN t3)). This change affects statements that use an ON clause, because that clause can refer only to columns in the operands of the join, and the change in precedence changes interpretation of what those operands are.

    Example:

    CREATE TABLE t1 (i1 INT, j1 INT);
    CREATE TABLE t2 (i2 INT, j2 INT);
    CREATE TABLE t3 (i3 INT, j3 INT);
    INSERT INTO t1 VALUES(1,1);
    INSERT INTO t2 VALUES(1,1);
    INSERT INTO t3 VALUES(1,1);
    SELECT * FROM t1, t2 JOIN t3 ON (t1.i1 = t3.i3);

    Previously, the SELECT was legal due to the implicit grouping of t1,t2 as (t1,t2). Now the JOIN takes precedence, so the operands for the ON clause are t2 and t3. Because t1.i1 is not a column in either of the operands, the result is an Unknown column 't1.i1' in 'on clause' error. To allow the join to be processed, group the first two tables explicitly with parentheses so that the operands for the ON clause are (t1,t2) and t3:

    SELECT * FROM (t1, t2) JOIN t3 ON (t1.i1 = t3.i3);

    Alternatively, avoid the use of the comma operator and use JOIN instead:

    SELECT * FROM t1 JOIN t2 JOIN t3 ON (t1.i1 = t3.i3);

    This change also applies to statements that mix the comma operator with INNER JOIN, CROSS JOIN, LEFT JOIN, and RIGHT JOIN, all of which now have higher precedence than the comma operator.

  • Previously, the ON clause could refer to columns in tables named to its right. Now an ON clause can refer only to its operands.

    Example:

    CREATE TABLE t1 (i1 INT);
    CREATE TABLE t2 (i2 INT);
    CREATE TABLE t3 (i3 INT);
    SELECT * FROM t1 JOIN t2 ON (i1 = i3) JOIN t3;

    Previously, the SELECT statement was legal. Now the statement fails with an Unknown column 'i3' in 'on clause' error because i3 is a column in t3, which is not an operand of the ON clause. The statement should be rewritten as follows:

    SELECT * FROM t1 JOIN t2 JOIN t3 ON (i1 = i3);
  • Resolution of column names in NATURAL or USING joins is different than previously. For column names that are outside the FROM clause, MySQL now handles a superset of the queries compared to previously. That is, in cases when MySQL formerly issued an error that some column is ambiguous, the query now is handled correctly. This is due to the fact that MySQL now treats the common columns of NATURAL or USING joins as a single column, so when a query refers to such columns, the query compiler does not consider them as ambiguous.

    Example:

    SELECT * FROM t1 NATURAL JOIN t2 WHERE b > 1;

    Previously, this query would produce an error ERROR 1052 (23000): Column 'b' in where clause is ambiguous. Now the query produces the correct result:

    +------+------+------+
    | b    | c    | y    |
    +------+------+------+
    |    4 |    2 |    3 |
    +------+------+------+

    One extension of MySQL compared to the SQL:2003 standard is that MySQL enables you to qualify the common (coalesced) columns of NATURAL or USING joins (just as previously), while the standard disallows that.
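
The difference in SELECT * expansion between a USING join and the corresponding ON join can be observed with a small script. The following is a minimal sketch, not a MySQL session: it assumes SQLite (through Python's standard sqlite3 module) as a stand-in engine, because SQLite also coalesces the common column of a USING join. The tables reuse the t1(a,b) and t2(a,c) contents shown earlier; the helper name run is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for a MySQL server; only standard join syntax is used.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INT, b TEXT);
    CREATE TABLE t2 (a INT, c TEXT);
    INSERT INTO t1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO t2 VALUES (2, 'z'), (3, 'w');
""")

def run(sql):
    """Return (column names, rows) for a query."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()

# USING join: the common column a appears once (coalesced).
using_cols, using_rows = run(
    "SELECT * FROM t1 LEFT JOIN t2 USING (a) ORDER BY t1.a")

# ON join: every column of both tables appears, so a is listed twice.
on_cols, on_rows = run(
    "SELECT * FROM t1 LEFT JOIN t2 ON (t1.a = t2.a) ORDER BY t1.a")

print(using_cols, using_rows)
print(on_cols, on_rows)
```

Both queries match the same rows; only the set of displayed columns differs, which is the distinction drawn in the list above.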

13.2.9.3. Index Hint Syntax

You can provide hints to give the optimizer information about how to choose indexes during query processing. Section 13.2.9.2, "JOIN Syntax", describes the general syntax for specifying tables in a SELECT statement. The syntax for an individual table, including that for index hints, looks like this:

tbl_name [[AS] alias] [index_hint_list]

index_hint_list:
    index_hint [, index_hint] ...

index_hint:
    USE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])
  | IGNORE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)
  | FORCE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)

index_list:
    index_name [, index_name] ...

By specifying USE INDEX (index_list), you can tell MySQL to use only one of the named indexes to find rows in the table. The alternative syntax IGNORE INDEX (index_list) can be used to tell MySQL to not use some particular index or indexes. These hints are useful if EXPLAIN shows that MySQL is using the wrong index from the list of possible indexes.

You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.

Each hint requires the names of indexes, not the names of columns. The name of a PRIMARY KEY is PRIMARY. To see the index names for a table, use SHOW INDEX.

An index_name value need not be a full index name. It can be an unambiguous prefix of an index name. If a prefix is ambiguous, an error occurs.

Examples:

SELECT * FROM table1 USE INDEX (col1_index,col2_index)
  WHERE col1=1 AND col2=2 AND col3=3;

SELECT * FROM table1 IGNORE INDEX (col3_index)
  WHERE col1=1 AND col2=2 AND col3=3;

The syntax for index hints has the following characteristics:

  • It is syntactically valid to specify an empty index_list for USE INDEX, which means "use no indexes." Specifying an empty index_list for FORCE INDEX or IGNORE INDEX is a syntax error.

  • You can specify the scope of an index hint by adding a FOR clause to the hint. This provides more fine-grained control over the optimizer's selection of an execution plan for various phases of query processing. To affect only the indexes used when MySQL decides how to find rows in the table and how to process joins, use FOR JOIN. To influence index usage for sorting or grouping rows, use FOR ORDER BY or FOR GROUP BY. (However, if there is a covering index for the table and it is used to access the table, the optimizer will ignore IGNORE INDEX FOR {ORDER BY|GROUP BY} hints that disable that index.)

  • You can specify multiple index hints:

    SELECT * FROM t1 USE INDEX (i1) IGNORE INDEX FOR ORDER BY (i2) ORDER BY a;

    It is not an error to name the same index in several hints (even within the same hint):

    SELECT * FROM t1 USE INDEX (i1) USE INDEX (i1,i1);

    However, it is an error to mix USE INDEX and FORCE INDEX for the same table:

    SELECT * FROM t1 USE INDEX FOR JOIN (i1) FORCE INDEX FOR JOIN (i2);

If you specify no FOR clause for an index hint, the hint by default applies to all parts of the statement. For example, this hint:

IGNORE INDEX (i1)

is equivalent to this combination of hints:

IGNORE INDEX FOR JOIN (i1)
IGNORE INDEX FOR ORDER BY (i1)
IGNORE INDEX FOR GROUP BY (i1)

To cause the server to use the older behavior for hint scope when no FOR clause is present (so that hints apply only to row retrieval), enable the old system variable at server startup. Take care about enabling this variable in a replication setup. With statement-based binary logging, having different modes for the master and slaves might lead to replication errors.

When index hints are processed, they are collected in a single list by type (USE, FORCE, IGNORE) and by scope (FOR JOIN, FOR ORDER BY, FOR GROUP BY). For example:

SELECT * FROM t1  USE INDEX () IGNORE INDEX (i2) USE INDEX (i1) USE INDEX (i2);

is equivalent to:

SELECT * FROM t1   USE INDEX (i1,i2) IGNORE INDEX (i2);

The index hints then are applied for each scope in the following order:

  1. {USE|FORCE} INDEX is applied if present. (If not, the optimizer-determined set of indexes is used.)

  2. IGNORE INDEX is applied over the result of the previous step. For example, the following two queries are equivalent:

    SELECT * FROM t1 USE INDEX (i1) IGNORE INDEX (i2) USE INDEX (i2);
    SELECT * FROM t1 USE INDEX (i1);
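
The collect-then-apply order described above can be modeled in a few lines. This is only an illustrative sketch of the rule for a single scope, written in Python; it is not how the server is implemented, and the function name effective_indexes and the index names are hypothetical.

```python
def effective_indexes(optimizer_set, hints):
    """Model the candidate index set for one scope.

    optimizer_set: indexes the optimizer would consider without hints.
    hints: list of (kind, names), kind being "USE", "FORCE", or "IGNORE".
    """
    kinds = {kind for kind, _ in hints}
    if "USE" in kinds and "FORCE" in kinds:
        raise ValueError("USE INDEX and FORCE INDEX cannot be mixed for one table")

    # Collect the hints into one list per type, preserving first-seen order.
    positive, ignored, has_positive = [], set(), False
    for kind, names in hints:
        if kind == "IGNORE":
            ignored.update(names)
        else:
            has_positive = True  # even an empty USE INDEX () counts
            for name in names:
                if name not in positive:
                    positive.append(name)

    # Step 1: {USE|FORCE} INDEX if present, else the optimizer-determined set.
    candidates = positive if has_positive else list(optimizer_set)
    # Step 2: IGNORE INDEX is applied over the result of step 1.
    return [name for name in candidates if name not in ignored]


all_indexes = ["i1", "i2", "i3"]
mixed = effective_indexes(
    all_indexes, [("USE", ["i1"]), ("IGNORE", ["i2"]), ("USE", ["i2"])])
simple = effective_indexes(all_indexes, [("USE", ["i1"])])
print(mixed, simple)
```

In this model, USE INDEX (i1) IGNORE INDEX (i2) USE INDEX (i2) and USE INDEX (i1) yield the same candidate list, matching the equivalence stated above.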

For FULLTEXT searches, index hints work as follows:

  • For natural language mode searches, index hints are silently ignored. For example, IGNORE INDEX(i) is ignored with no warning and the index is still used.

  • For boolean mode searches, index hints with FOR ORDER BY or FOR GROUP BY are silently ignored. Index hints with FOR JOIN or no FOR modifier are honored. In contrast to how hints apply for non-FULLTEXT searches, the hint is used for all phases of query execution (finding rows and retrieval, grouping, and ordering). This is true even if the hint is given for a non-FULLTEXT index.

For example, the following two queries are equivalent:

SELECT * FROM t
  USE INDEX (index1)
  IGNORE INDEX FOR ORDER BY (index1)
  IGNORE INDEX FOR GROUP BY (index1)
  WHERE ... IN BOOLEAN MODE ... ;

SELECT * FROM t
  USE INDEX (index1)
  WHERE ... IN BOOLEAN MODE ... ;

13.2.9.4. UNION Syntax

SELECT ...
UNION [ALL | DISTINCT] SELECT ...
[UNION [ALL | DISTINCT] SELECT ...]

UNION is used to combine the result from multiple SELECT statements into a single result set.

The column names from the first SELECT statement are used as the column names for the results returned. Selected columns listed in corresponding positions of each SELECT statement should have the same data type. (For example, the first column selected by the first statement should have the same type as the first column selected by the other statements.)

If the data types of corresponding SELECT columns do not match, the types and lengths of the columns in the UNION result take into account the values retrieved by all of the SELECT statements. For example, consider the following:

mysql> SELECT REPEAT('a',1) UNION SELECT REPEAT('b',10);
+---------------+
| REPEAT('a',1) |
+---------------+
| a             |
| bbbbbbbbbb    |
+---------------+

The SELECT statements are normal select statements, but with the following restrictions:

  • Only the last SELECT statement can use INTO OUTFILE. (However, the entire UNION result is written to the file.)

  • HIGH_PRIORITY cannot be used with SELECT statements that are part of a UNION. If you specify it for the first SELECT, it has no effect. If you specify it for any subsequent SELECT statements, a syntax error results.

The default behavior for UNION is that duplicate rows are removed from the result. The optional DISTINCT keyword has no effect other than the default because it also specifies duplicate-row removal. With the optional ALL keyword, duplicate-row removal does not occur and the result includes all matching rows from all the SELECT statements.

You can mix UNION ALL and UNION DISTINCT in the same query. Mixed UNION types are treated such that a DISTINCT union overrides any ALL union to its left. A DISTINCT union can be produced explicitly by using UNION DISTINCT or implicitly by using UNION with no following DISTINCT or ALL keyword.
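
The effect of mixing the two UNION types can be checked with a short script. This is a minimal sketch that assumes SQLite (Python's standard sqlite3 module) as a stand-in for a MySQL server. SQLite has no UNION DISTINCT keyword, so the sketch relies on plain UNION, which is the implicit DISTINCT form described above. The helper name rows is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; plain UNION is the implicit DISTINCT form.
conn = sqlite3.connect(":memory:")

def rows(sql):
    """Run a query and return all result rows."""
    return conn.execute(sql).fetchall()

# A DISTINCT union overrides the ALL union to its left: duplicates are removed.
all_then_distinct = rows("SELECT 1 UNION ALL SELECT 1 UNION SELECT 1")

# An ALL union to the right of a DISTINCT union still keeps its own duplicates.
distinct_then_all = rows("SELECT 1 UNION SELECT 1 UNION ALL SELECT 1")

print(all_then_distinct)
print(distinct_then_all)
```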

To apply ORDER BY or LIMIT to an individual SELECT, place the clause inside the parentheses that enclose the SELECT:

(SELECT a FROM t1 WHERE a=10 AND B=1 ORDER BY a LIMIT 10)
UNION
(SELECT a FROM t2 WHERE a=11 AND B=2 ORDER BY a LIMIT 10);

However, use of ORDER BY for individual SELECT statements implies nothing about the order in which the rows appear in the final result because UNION by default produces an unordered set of rows. Therefore, the use of ORDER BY in this context is typically in conjunction with LIMIT, so that it is used to determine the subset of the selected rows to retrieve for the SELECT, even though it does not necessarily affect the order of those rows in the final UNION result. If ORDER BY appears without LIMIT in a SELECT, it is optimized away because it will have no effect anyway.

To use an ORDER BY or LIMIT clause to sort or limit the entire UNION result, parenthesize the individual SELECT statements and place the ORDER BY or LIMIT after the last one. The following example uses both clauses:

(SELECT a FROM t1 WHERE a=10 AND B=1)
UNION
(SELECT a FROM t2 WHERE a=11 AND B=2)
ORDER BY a LIMIT 10;

A statement without parentheses is equivalent to one parenthesized as just shown.

This kind of ORDER BY cannot use column references that include a table name (that is, names in tbl_name.col_name format). Instead, provide a column alias in the first SELECT statement and refer to the alias in the ORDER BY. (Alternatively, refer to the column in the ORDER BY using its column position. However, use of column positions is deprecated.)

Also, if a column to be sorted is aliased, the ORDER BY clause must refer to the alias, not the column name. The first of the following statements will work, but the second will fail with an Unknown column 'a' in 'order clause' error:

(SELECT a AS b FROM t) UNION (SELECT ...) ORDER BY b;
(SELECT a AS b FROM t) UNION (SELECT ...) ORDER BY a;

To cause rows in a UNION result to consist of the sets of rows retrieved by each SELECT one after the other, select an additional column in each SELECT to use as a sort column and add an ORDER BY following the last SELECT:

(SELECT 1 AS sort_col, col1a, col1b, ... FROM t1)
UNION
(SELECT 2, col2a, col2b, ... FROM t2) ORDER BY sort_col;

To additionally maintain sort order within individual SELECT results, add a secondary column to the ORDER BY clause:

(SELECT 1 AS sort_col, col1a, col1b, ... FROM t1)
UNION
(SELECT 2, col2a, col2b, ... FROM t2) ORDER BY sort_col, col1a;

Use of an additional column also enables you to determine which SELECT each row comes from. Extra columns can provide other identifying information as well, such as a string that indicates a table name.
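
The sort-column technique can be tried end to end with a small script. This sketch assumes SQLite (Python's standard sqlite3 module) in place of a MySQL server and uses the unparenthesized form of the statement, because SQLite does not accept parentheses around the individual SELECT statements. The column name name and the sample rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table contents are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (name TEXT);
    CREATE TABLE t2 (name TEXT);
    INSERT INTO t1 VALUES ('pear'), ('apple');
    INSERT INTO t2 VALUES ('banana'), ('apple');
""")

# sort_col keeps each SELECT's rows together; name orders rows within each group.
result = conn.execute("""
    SELECT 1 AS sort_col, name FROM t1
    UNION
    SELECT 2, name FROM t2
    ORDER BY sort_col, name
""").fetchall()

for sort_col, name in result:
    print(sort_col, name)
```

Because sort_col differs between the two SELECT statements, the row 'apple' survives duplicate removal once per source table, and the first column also identifies which SELECT each row came from.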

13.2.10. Subquery Syntax

A subquery is a SELECT statement within another statement.

Starting with MySQL 4.1, all subquery forms and operations that the SQL standard requires are supported, as well as a few features that are MySQL-specific.

Here is an example of a subquery:

SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);

In this example, SELECT * FROM t1 ... is the outer query (or outer statement), and (SELECT column1 FROM t2) is the subquery. We say that the subquery is nested within the outer query, and in fact it is possible to nest subqueries within other subqueries, to a considerable depth. A subquery must always appear within parentheses.

The main advantages of subqueries are:

  • They allow queries that are structured so that it is possible to isolate each part of a statement.

  • They provide alternative ways to perform operations that would otherwise require complex joins and unions.

  • Many people find subqueries more readable than complex joins or unions. Indeed, it was the innovation of subqueries that gave people the original idea of calling the early SQL "Structured Query Language."

Here is an example statement that shows the major points about subquery syntax as specified by the SQL standard and supported in MySQL:

DELETE FROM t1
WHERE s11 > ANY
 (SELECT COUNT(*) /* no hint */ FROM t2
  WHERE NOT EXISTS
   (SELECT * FROM t3
    WHERE ROW(5*t2.s1,77)=
     (SELECT 50,11*s1 FROM t4 UNION SELECT 50,77 FROM
      (SELECT * FROM t5) AS t5)));

A subquery can return a scalar (a single value), a single column, a single row, or a table (one or more rows of one or more columns). These are called scalar, column, row, and table subqueries. Subqueries that return a particular kind of result often can be used only in certain contexts, as described in the following sections.

There are few restrictions on the type of statements in which subqueries can be used. A subquery can contain many of the keywords or clauses that an ordinary SELECT can contain: DISTINCT, GROUP BY, ORDER BY, LIMIT, joins, index hints, UNION constructs, comments, functions, and so on.

A subquery's outer statement can be any one of: SELECT, INSERT, UPDATE, DELETE, SET, or DO.

In MySQL, you cannot modify a table and select from the same table in a subquery. This applies to statements such as DELETE, INSERT, REPLACE, UPDATE, and (because subqueries can be used in the SET clause) LOAD DATA INFILE.

For information about how the optimizer handles subqueries, see Section 8.13.12, "Optimizing Subqueries with EXISTS Strategy". For a discussion of restrictions on subquery use, including performance issues for certain forms of subquery syntax, see Section E.4, "Restrictions on Subqueries".

13.2.10.1. The Subquery as Scalar Operand

In its simplest form, a subquery is a scalar subquery that returns a single value. A scalar subquery is a simple operand, and you can use it almost anywhere a single column value or literal is legal, and you can expect it to have those characteristics that all operands have: a data type, a length, an indication that it can be NULL, and so on. For example:

CREATE TABLE t1 (s1 INT, s2 CHAR(5) NOT NULL);
INSERT INTO t1 VALUES(100, 'abcde');
SELECT (SELECT s2 FROM t1);

The subquery in this SELECT returns a single value ('abcde') that has a data type of CHAR, a length of 5, a character set and collation equal to the defaults in effect at CREATE TABLE time, and an indication that the value in the column can be NULL. Nullability of the value selected by a scalar subquery is not copied because if the subquery result is empty, the result is NULL. For the subquery just shown, if t1 were empty, the result would be NULL even though s2 is NOT NULL.
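
The nullability point can be illustrated with a short script. This is a sketch that assumes SQLite (Python's standard sqlite3 module) as a stand-in for a MySQL server; it runs the same scalar subquery against t1 while the table is still empty and again after the row is inserted.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL for this standard scalar-subquery behavior.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (s1 INT, s2 CHAR(5) NOT NULL)")

# Empty table: the scalar subquery yields NULL (None in Python),
# even though column s2 itself is declared NOT NULL.
empty_result = conn.execute("SELECT (SELECT s2 FROM t1)").fetchone()[0]

conn.execute("INSERT INTO t1 VALUES (100, 'abcde')")

# One row: the scalar subquery yields that row's s2 value.
filled_result = conn.execute("SELECT (SELECT s2 FROM t1)").fetchone()[0]

print(empty_result, filled_result)
```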

There are a few contexts in which a scalar subquery cannot be used. If a statement permits only a literal value, you cannot use a subquery. For example, LIMIT requires literal integer arguments, and LOAD DATA INFILE requires a literal string file name. You cannot use subqueries to supply these values.

When you see examples in the following sections that contain the rather spartan construct (SELECT column1 FROM t1), imagine that your own code contains much more diverse and complex constructions.

Suppose that we make two tables:

CREATE TABLE t1 (s1 INT);
INSERT INTO t1 VALUES (1);
CREATE TABLE t2 (s1 INT);
INSERT INTO t2 VALUES (2);

Then perform a SELECT:

SELECT (SELECT s1 FROM t2) FROM t1;

The result is 2 because there is a row in t2 containing a column s1 that has a value of 2.

A scalar subquery can be part of an expression, but remember the parentheses, even if the subquery is an operand that provides an argument for a function. For example:

SELECT UPPER((SELECT s1 FROM t1)) FROM t2;

13.2.10.2. Comparisons Using Subqueries

The most common use of a subquery is in the form:

non_subquery_operand comparison_operator (subquery)

Where comparison_operator is one of these operators:

=  >  <  >=  <=  <>  !=  <=>

For example:

... WHERE 'a' = (SELECT column1 FROM t1)

MySQL also permits this construct:

non_subquery_operand LIKE (subquery)

At one time the only legal place for a subquery was on the right side of a comparison, and you might still find some old DBMSs that insist on this.

Here is an example of a common-form subquery comparison that you cannot do with a join. It finds all the rows in table t1 for which the column1 value is equal to a maximum value in table t2:

SELECT * FROM t1  WHERE column1 = (SELECT MAX(column2) FROM t2);

Here is another example, which again is impossible with a join because it involves aggregating for one of the tables. It finds all rows in table t1 containing a value that occurs twice in a given column:

SELECT * FROM t1 AS t  WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);

For a comparison of the subquery to a scalar, the subquery must return a scalar. For a comparison of the subquery to a row constructor, the subquery must be a row subquery that returns a row with the same number of values as the row constructor. See Section 13.2.10.5, "Row Subqueries".

13.2.10.3. Subqueries with ANY, IN, or SOME

Syntax:

operand comparison_operator ANY (subquery)
operand IN (subquery)
operand comparison_operator SOME (subquery)

Where comparison_operator is one of these operators:

=  >  <  >=  <=  <>  !=

The ANY keyword, which must follow a comparison operator, means "return TRUE if the comparison is TRUE for ANY of the values in the column that the subquery returns." For example:

SELECT s1 FROM t1 WHERE s1 > ANY (SELECT s1 FROM t2);

Suppose that there is a row in table t1 containing (10). The expression is TRUE if table t2 contains (21,14,7) because there is a value 7 in t2 that is less than 10. The expression is FALSE if table t2 contains (20,10), or if table t2 is empty. The expression is unknown (that is, NULL) if table t2 contains (NULL,NULL,NULL).
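
The four outcomes just listed follow from SQL's three-valued logic. The following pure-Python sketch models that logic for ANY, using None to represent NULL (unknown). It is an illustration of the rule only, not server code, and the function name compare_any is hypothetical.

```python
import operator

def compare_any(value, op, column):
    """Model `value op ANY (column)`; None stands for SQL NULL / unknown."""
    saw_unknown = False
    for item in column:
        if value is None or item is None:
            saw_unknown = True   # comparison with NULL is unknown
        elif op(value, item):
            return True          # one TRUE comparison decides the result
    # No TRUE comparison: unknown if any comparison was unknown, else FALSE.
    return None if saw_unknown else False

true_case = compare_any(10, operator.gt, [21, 14, 7])           # 10 > 7
false_case = compare_any(10, operator.gt, [20, 10])             # no smaller value
empty_case = compare_any(10, operator.gt, [])                   # empty t2
unknown_case = compare_any(10, operator.gt, [None, None, None])
print(true_case, false_case, empty_case, unknown_case)
```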

When used with a subquery, the word IN is an alias for = ANY. Thus, these two statements are the same:

SELECT s1 FROM t1 WHERE s1 = ANY (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 IN (SELECT s1 FROM t2);

IN and = ANY are not synonyms when used with an expression list. IN can take an expression list, but = ANY cannot. See Section 12.3.2, "Comparison Functions and Operators".

NOT IN is not an alias for <> ANY, but for <> ALL. See Section 13.2.10.4, "Subqueries with ALL".

The word SOME is an alias for ANY. Thus, these two statements are the same:

SELECT s1 FROM t1 WHERE s1 <> ANY  (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 <> SOME (SELECT s1 FROM t2);

Use of the word SOME is rare, but this example shows why it might be useful. To most people, the English phrase "a is not equal to any b" means "there is no b which is equal to a," but that is not what is meant by the SQL syntax. The syntax means "there is some b to which a is not equal." Using <> SOME instead helps ensure that everyone understands the true meaning of the query.

13.2.10.4. Subqueries with ALL

Syntax:

operand comparison_operator ALL (subquery)

The word ALL, which must follow a comparison operator, means "return TRUE if the comparison is TRUE for ALL of the values in the column that the subquery returns." For example:

SELECT s1 FROM t1 WHERE s1 > ALL (SELECT s1 FROM t2);

Suppose that there is a row in table t1 containing (10). The expression is TRUE if table t2 contains (-5,0,+5) because 10 is greater than all three values in t2. The expression is FALSE if table t2 contains (12,6,NULL,-100) because there is a single value 12 in table t2 that is greater than 10. The expression is unknown (that is, NULL) if table t2 contains (0,NULL,1).

Finally, the expression is TRUE if table t2 is empty. So, the following expression is TRUE when table t2 is empty:

SELECT * FROM t1 WHERE 1 > ALL (SELECT s1 FROM t2);

But this expression is NULL when table t2 is empty:

SELECT * FROM t1 WHERE 1 > (SELECT s1 FROM t2);

In addition, the following expression is NULL when table t2 is empty:

SELECT * FROM t1 WHERE 1 > ALL (SELECT MAX(s1) FROM t2);

In general, tables containing NULL values and empty tables are "edge cases." When writing subqueries, always consider whether you have taken those two possibilities into account.
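
These edge cases can be reproduced with a small model. The sketch below implements the three-valued rule for ALL in plain Python (None represents NULL) and, as an assumption, uses SQLite through Python's standard sqlite3 module only to fetch the subquery rows from an empty t2. The function name compare_all is hypothetical.

```python
import operator
import sqlite3

def compare_all(value, op, column):
    """Model `value op ALL (column)`; None stands for SQL NULL / unknown."""
    saw_unknown = False
    for item in column:
        if value is None or item is None:
            saw_unknown = True   # comparison with NULL is unknown
        elif not op(value, item):
            return False         # one FALSE comparison decides the result
    # No FALSE comparison: unknown if any comparison was unknown, else TRUE.
    return None if saw_unknown else True

true_case = compare_all(10, operator.gt, [-5, 0, 5])
false_case = compare_all(10, operator.gt, [12, 6, None, -100])
unknown_case = compare_all(10, operator.gt, [0, None, 1])

# Assumption: SQLite stands in for MySQL to produce the subquery rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t2 (s1 INT)")
plain_rows = [r[0] for r in conn.execute("SELECT s1 FROM t2")]     # no rows
max_rows = [r[0] for r in conn.execute("SELECT MAX(s1) FROM t2")]  # one NULL row

empty_all = compare_all(1, operator.gt, plain_rows)  # TRUE: nothing to compare
max_all = compare_all(1, operator.gt, max_rows)      # unknown: compared with NULL
print(true_case, false_case, unknown_case, empty_all, max_all)
```

The last two results differ because SELECT MAX(s1) on an empty table still returns one row, whose only value is NULL.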

NOT IN is an alias for <> ALL. Thus, these two statements are the same:

SELECT s1 FROM t1 WHERE s1 <> ALL (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 NOT IN (SELECT s1 FROM t2);

13.2.10.5. Row Subqueries

The discussion to this point has been of scalar or column subqueries; that is, subqueries that return a single value or a column of values. A row subquery is a subquery variant that returns a single row and can thus return more than one column value. Legal operators for row subquery comparisons are:

=  >  <  >=  <=  <>  !=  <=>

Here are two examples:

SELECT * FROM t1
  WHERE (col1,col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);
SELECT * FROM t1
  WHERE ROW(col1,col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);

For both queries, if the table t2 contains a single row with id = 10, the subquery returns a single row. If this row has col3 and col4 values equal to the col1 and col2 values of any rows in t1, the WHERE expression is TRUE and each query returns those t1 rows. If the t2 row col3 and col4 values are not equal to the col1 and col2 values of any t1 row, the expression is FALSE and the query returns an empty result set. The expression is unknown (that is, NULL) if the subquery produces no rows. An error occurs if the subquery produces multiple rows because a row subquery can return at most one row.

The expressions (1,2) and ROW(1,2) are sometimes called row constructors. The two are equivalent. The row constructor and the row returned by the subquery must contain the same number of values.

A row constructor is used for comparisons with subqueries that return two or more columns. When a subquery returns a single column, this is regarded as a scalar value and not as a row, so a row constructor cannot be used with a subquery that does not return at least two columns. Thus, the following query fails with a syntax error:

SELECT * FROM t1 WHERE ROW(1) = (SELECT column1 FROM t2)

Row constructors are legal in other contexts. For example, the following two statements are semantically equivalent (and are handled in the same way by the optimizer):

SELECT * FROM t1 WHERE (column1,column2) = (1,1);
SELECT * FROM t1 WHERE column1 = 1 AND column2 = 1;

The following query answers the request, "find all rows in table t1 that also exist in table t2":

SELECT column1,column2,column3
  FROM t1
  WHERE (column1,column2,column3) IN
    (SELECT column1,column2,column3 FROM t2);
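
The row-constructor comparisons above can be exercised with a short script. This sketch assumes SQLite (Python's standard sqlite3 module) as a stand-in for a MySQL server; it needs a SQLite build with row-value support (3.15 or later) and uses the parenthesized form rather than ROW(...), which SQLite does not provide. The sample rows and the helper name rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite (3.15+) stands in for MySQL; data values are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (column1 INT, column2 INT, column3 INT);
    CREATE TABLE t2 (column1 INT, column2 INT, column3 INT);
    INSERT INTO t1 VALUES (1, 1, 1), (1, 2, 3), (4, 5, 6);
    INSERT INTO t2 VALUES (1, 2, 3), (4, 5, 7);
""")

def rows(sql):
    """Run a query and return all result rows."""
    return conn.execute(sql).fetchall()

# A row constructor comparison and its column-by-column equivalent.
row_form = rows("SELECT * FROM t1 WHERE (column1, column2) = (1, 1)")
and_form = rows("SELECT * FROM t1 WHERE column1 = 1 AND column2 = 1")

# Rows of t1 that also exist in t2.
in_both = rows("""
    SELECT column1, column2, column3 FROM t1
    WHERE (column1, column2, column3) IN
          (SELECT column1, column2, column3 FROM t2)
""")
print(row_form, and_form, in_both)
```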

13.2.10.6. Subqueries with EXISTS or NOT EXISTS

If a subquery returns any rows at all, EXISTS subquery is TRUE, and NOT EXISTS subquery is FALSE. For example:

SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);

Traditionally, an EXISTS subquery starts with SELECT *, but it could begin with SELECT 5 or SELECT column1 or anything at all. MySQL ignores the SELECT list in such a subquery, so it makes no difference.

For the preceding example, if t2 contains any rows, even rows with nothing but NULL values, the EXISTS condition is TRUE. This is actually an unlikely example because a [NOT] EXISTS subquery almost always contains correlations. Here are some more realistic examples:

  • What kind of store is present in one or more cities?

    SELECT DISTINCT store_type FROM stores
      WHERE EXISTS (SELECT * FROM cities_stores
                    WHERE cities_stores.store_type = stores.store_type);

  • What kind of store is present in no cities?

    SELECT DISTINCT store_type FROM stores
      WHERE NOT EXISTS (SELECT * FROM cities_stores
                        WHERE cities_stores.store_type = stores.store_type);

  • What kind of store is present in all cities?

    SELECT DISTINCT store_type FROM stores s1
      WHERE NOT EXISTS (
        SELECT * FROM cities WHERE NOT EXISTS (
          SELECT * FROM cities_stores
           WHERE cities_stores.city = cities.city
           AND cities_stores.store_type = s1.store_type));

The last example is a double-nested NOT EXISTS query. That is, it has a NOT EXISTS clause within a NOT EXISTS clause. Formally, it answers the question "does a city exist with a store that is not in Stores?" But it is easier to say that a nested NOT EXISTS answers the question "is x TRUE for all y?"
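
The three store questions can be run against a small data set to see what each form returns. This sketch assumes SQLite (Python's standard sqlite3 module) as a stand-in for a MySQL server. The city names, the store types, and the helper name store_types are made up for illustration, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (store_type TEXT);
    CREATE TABLE cities (city TEXT);
    CREATE TABLE cities_stores (city TEXT, store_type TEXT);
    INSERT INTO stores VALUES ('bakery'), ('bookshop'), ('florist');
    INSERT INTO cities VALUES ('Oslo'), ('Lima');
    INSERT INTO cities_stores VALUES
        ('Oslo', 'bakery'), ('Lima', 'bakery'), ('Oslo', 'bookshop');
""")

def store_types(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

in_some = store_types("""
    SELECT DISTINCT store_type FROM stores
    WHERE EXISTS (SELECT * FROM cities_stores
                  WHERE cities_stores.store_type = stores.store_type)
    ORDER BY store_type""")

in_none = store_types("""
    SELECT DISTINCT store_type FROM stores
    WHERE NOT EXISTS (SELECT * FROM cities_stores
                      WHERE cities_stores.store_type = stores.store_type)
    ORDER BY store_type""")

# Double NOT EXISTS: no city exists that lacks this store type.
in_all = store_types("""
    SELECT DISTINCT store_type FROM stores s1
    WHERE NOT EXISTS (
        SELECT * FROM cities WHERE NOT EXISTS (
            SELECT * FROM cities_stores
            WHERE cities_stores.city = cities.city
              AND cities_stores.store_type = s1.store_type))
    ORDER BY store_type""")

print(in_some, in_none, in_all)
```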

13.2.10.7. Correlated Subqueries

A correlated subquery is a subquery that contains a reference to a table that also appears in the outer query. For example:

SELECT * FROM t1
  WHERE column1 = ANY (SELECT column1 FROM t2
                       WHERE t2.column2 = t1.column2);

Notice that the subquery contains a reference to a column of t1, even though the subquery's FROM clause does not mention a table t1. So, MySQL looks outside the subquery, and finds t1 in the outer query.

Suppose that table t1 contains a row where column1 = 5 and column2 = 6; meanwhile, table t2 contains a row where column1 = 5 and column2 = 7. The simple expression ... WHERE column1 = ANY (SELECT column1 FROM t2) would be TRUE, but in this example, the WHERE clause within the subquery is FALSE (because (5,6) is not equal to (5,7)), so the expression as a whole is FALSE.
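
The (5,6) and (5,7) scenario can be checked directly. This sketch assumes SQLite (Python's standard sqlite3 module) as a stand-in for a MySQL server. SQLite has no ANY operator, so the queries use IN, which is equivalent to = ANY when used with a subquery. The helper name rows is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; IN replaces = ANY.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (column1 INT, column2 INT);
    CREATE TABLE t2 (column1 INT, column2 INT);
    INSERT INTO t1 VALUES (5, 6);
    INSERT INTO t2 VALUES (5, 7);
""")

def rows(sql):
    """Run a query and return all result rows."""
    return conn.execute(sql).fetchall()

# Without the correlation, column1 = 5 is found in t2, so the row qualifies.
uncorrelated = rows(
    "SELECT * FROM t1 WHERE column1 IN (SELECT column1 FROM t2)")

# With the correlation, t2.column2 (7) never equals t1.column2 (6),
# so the subquery is empty and no row qualifies.
correlated = rows("""
    SELECT * FROM t1
    WHERE column1 IN (SELECT column1 FROM t2
                      WHERE t2.column2 = t1.column2)
""")
print(uncorrelated, correlated)
```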

Scoping rule: MySQL evaluates from inside to outside. For example:

SELECT column1 FROM t1 AS x
  WHERE x.column1 = (SELECT column1 FROM t2 AS x
    WHERE x.column1 = (SELECT column1 FROM t3
      WHERE x.column2 = t3.column1));

In this statement, x.column2 must be a column in table t2 because SELECT column1 FROM t2 AS x ... renames t2. It is not a column in table t1 because SELECT column1 FROM t1 ... is an outer query that is farther out.

For subqueries in HAVING or ORDER BY clauses, MySQL also looks for column names in the outer select list.

For certain cases, a correlated subquery is optimized. For example:

val IN (SELECT key_val FROM tbl_name WHERE correlated_condition)

Correlated subqueries that are not optimized in this way are inefficient and likely to be slow. Rewriting the query as a join might improve performance.

Aggregate functions in correlated subqueries may contain outer references, provided the function contains nothing but outer references, and provided the function is not contained in another function or expression.

13.2.10.8. Subqueries in the FROM Clause

Subqueries are legal in a SELECT statement's FROM clause. The actual syntax is:

SELECT ... FROM (subquery) [AS] name ...

The [AS] name clause is mandatory, because every table in a FROM clause must have a name. Any columns in the subquery select list must have unique names.

For the sake of illustration, assume that you have this table:

CREATE TABLE t1 (s1 INT, s2 CHAR(5), s3 FLOAT);

Here is how to use a subquery in the FROM clause, using the example table:

INSERT INTO t1 VALUES (1,'1',1.0);
INSERT INTO t1 VALUES (2,'2',2.0);
SELECT sb1,sb2,sb3
  FROM (SELECT s1 AS sb1, s2 AS sb2, s3*2 AS sb3 FROM t1) AS sb
  WHERE sb1 > 1;

Result: 2, '2', 4.0.

Here is another example: Suppose that you want to know the average of a set of sums for a grouped table. This does not work:

SELECT AVG(SUM(column1)) FROM t1 GROUP BY column1;

However, this query provides the desired information:

SELECT AVG(sum_column1)
  FROM (SELECT SUM(column1) AS sum_column1
        FROM t1 GROUP BY column1) AS t1;

Notice that the column name used within the subquery (sum_column1) is recognized in the outer query.
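
The average-of-sums example can be run on sample data. This sketch assumes SQLite (Python's standard sqlite3 module) as a stand-in for a MySQL server; SQLite likewise rejects the nested aggregate, although its error message differs from MySQL's. The sample values and the derived-table alias sums are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the data values are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (column1 INT);
    INSERT INTO t1 VALUES (1), (1), (2), (2), (3), (3);
""")

# Nesting one aggregate directly inside another is rejected.
try:
    conn.execute("SELECT AVG(SUM(column1)) FROM t1 GROUP BY column1").fetchall()
    nested_failed = False
except sqlite3.Error:
    nested_failed = True

# The derived table computes one sum per group (2, 4, 6);
# the outer query then averages those sums.
avg_of_sums = conn.execute("""
    SELECT AVG(sum_column1)
    FROM (SELECT SUM(column1) AS sum_column1
          FROM t1 GROUP BY column1) AS sums
""").fetchone()[0]

print(nested_failed, avg_of_sums)
```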

Subqueries in the FROM clause can return a scalar, column, row, or table. Subqueries in the FROM clause cannot be correlated subqueries, unless used within the ON clause of a JOIN operation.

Subqueries in the FROM clause are executed even for the EXPLAIN statement (that is, derived temporary tables are materialized). This occurs because upper-level queries need information about all tables during the optimization phase, and the table represented by a subquery in the FROM clause is unavailable unless the subquery is executed.

It is possible under certain circumstances to modify table data using EXPLAIN SELECT. This can occur if the outer query accesses any tables and an inner query invokes a stored function that changes one or more rows of a table. Suppose that there are two tables t1 and t2 in database d1, created as shown here:

mysql> CREATE DATABASE d1;
Query OK, 1 row affected (0.00 sec)

mysql> USE d1;
Database changed

mysql> CREATE TABLE t1 (c1 INT);
Query OK, 0 rows affected (0.15 sec)

mysql> CREATE TABLE t2 (c1 INT);
Query OK, 0 rows affected (0.08 sec)

Now we create a stored function f1 which modifies t2:

mysql> DELIMITER //
mysql> CREATE FUNCTION f1(p1 INT) RETURNS INT
mysql>   BEGIN
mysql>     INSERT INTO t2 VALUES (p1);
mysql>     RETURN p1;
mysql>   END //
Query OK, 0 rows affected (0.01 sec)

mysql> DELIMITER ;

Referencing the function directly in an EXPLAIN SELECT does not have any effect on t2, as shown here:

mysql> SELECT * FROM t2;
Empty set (0.00 sec)

mysql> EXPLAIN SELECT f1(5);
+--+-----------+-----+----+-------------+----+-------+----+----+--------------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra         |
+--+-----------+-----+----+-------------+----+-------+----+----+--------------+
| 1|SIMPLE     |NULL |NULL|NULL         |NULL|NULL   |NULL|NULL|No tables used|
+--+-----------+-----+----+-------------+----+-------+----+----+--------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM t2;
Empty set (0.00 sec)

This is because the SELECT statement did not reference any tables, as can be seen in the table and Extra columns of the output. This is also true of the following nested SELECT:

mysql> EXPLAIN SELECT NOW() AS a1, (SELECT f1(5)) AS a2;
+--+-----------+-----+----+-------------+----+-------+----+----+--------------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra         |
+--+-----------+-----+----+-------------+----+-------+----+----+--------------+
| 1|PRIMARY    |NULL |NULL|NULL         |NULL|NULL   |NULL|NULL|No tables used|
+--+-----------+-----+----+-------------+----+-------+----+----+--------------+
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+-------+------+------------------------------------------+
| Level | Code | Message                                  |
+-------+------+------------------------------------------+
| Note  | 1249 | Select 2 was reduced during optimization |
+-------+------+------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM t2;
Empty set (0.00 sec)

However, if the outer SELECT references any tables, the optimizer executes the statement in the subquery as well:

mysql> EXPLAIN SELECT * FROM t1 AS a1, (SELECT f1(5)) AS a2;
+----+-------------+------------+--------+---------------+------+---------+------+------+---------------------+
| id | select_type | table      | type   | possible_keys | key  | key_len | ref  | rows | Extra               |
+----+-------------+------------+--------+---------------+------+---------+------+------+---------------------+
|  1 | PRIMARY     | a1         | system | NULL          | NULL | NULL    | NULL |    0 | const row not found |
|  1 | PRIMARY     | <derived2> | system | NULL          | NULL | NULL    | NULL |    1 |                     |
|  2 | DERIVED     | NULL       | NULL   | NULL          | NULL | NULL    | NULL | NULL | No tables used      |
+----+-------------+------------+--------+---------------+------+---------+------+------+---------------------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM t2;
+------+
| c1   |
+------+
|    5 |
+------+
1 row in set (0.00 sec)

This also means that an EXPLAIN SELECT statement such as the one shown here may take a long time to execute because the BENCHMARK() function is executed once for each row in t1:

EXPLAIN SELECT * FROM t1 AS a1, (SELECT BENCHMARK(1000000, MD5(NOW()))) AS a2;

13.2.10.9. Subquery Errors

There are some errors that apply only to subqueries. This section describes them.

  • Unsupported subquery syntax:

    ERROR 1235 (ER_NOT_SUPPORTED_YET)
    SQLSTATE = 42000
    Message = "This version of MySQL doesn't yet support
    'LIMIT & IN/ALL/ANY/SOME subquery'"

    This means that MySQL does not support statements of the following form:

    SELECT * FROM t1 WHERE s1 IN (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1)
  • Incorrect number of columns from subquery:

    ERROR 1241 (ER_OPERAND_COL)
    SQLSTATE = 21000
    Message = "Operand should contain 1 column(s)"

    This error occurs in cases like this:

    SELECT (SELECT column1, column2 FROM t2) FROM t1;

    You may use a subquery that returns multiple columns, if the purpose is row comparison. In other contexts, the subquery must be a scalar operand. See Section 13.2.10.5, "Row Subqueries".

  • Incorrect number of rows from subquery:

    ERROR 1242 (ER_SUBSELECT_NO_1_ROW)
    SQLSTATE = 21000
    Message = "Subquery returns more than 1 row"

    This error occurs for statements where the subquery must return at most one row but returns multiple rows. Consider the following example:

    SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);

    If SELECT column1 FROM t2 returns just one row, the previous query will work. If the subquery returns more than one row, error 1242 will occur. In that case, the query should be rewritten as:

    SELECT * FROM t1 WHERE column1 = ANY (SELECT column1 FROM t2);
  • Incorrectly used table in subquery:

    ERROR 1093 (ER_UPDATE_TABLE_USED)
    SQLSTATE = HY000
    Message = "You can't specify target table 'x'
    for update in FROM clause"

    This error occurs in cases such as the following, which attempts to modify a table and select from the same table in the subquery:

    UPDATE t1 SET column2 = (SELECT MAX(column1) FROM t1);

    You can use a subquery for assignment within an UPDATE statement because subqueries are legal in UPDATE and DELETE statements as well as in SELECT statements. However, you cannot use the same table (in this case, table t1) for both the subquery FROM clause and the update target.
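A commonly used workaround, sketched here under the assumption that t1 has integer columns column1 and column2, is to wrap the inner SELECT in a subquery in the FROM clause. Because such a derived table is materialized as a temporary table (as described earlier for subqueries in the FROM clause), the update target is no longer read directly by the subquery. The alias names max_c1 and tmp are hypothetical, and whether this form is accepted may depend on the server version, so test it on your own server:

```sql
-- Assumed table definition, for illustration only
CREATE TABLE t1 (column1 INT, column2 INT);
INSERT INTO t1 VALUES (1, 0), (7, 0), (3, 0);

-- The inner derived table (tmp) is evaluated into a temporary table
-- first, so the UPDATE does not select directly from its own target.
UPDATE t1
  SET column2 = (SELECT max_c1
                 FROM (SELECT MAX(column1) AS max_c1 FROM t1) AS tmp);

-- Every row should now have column2 = 7
SELECT column1, column2 FROM t1;
```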

For transactional storage engines, the failure of a subquery causes the entire statement to fail. For nontransactional storage engines, data modifications made before the error was encountered are preserved.

13.2.10.10. Optimizing Subqueries

Development is ongoing, so no optimization tip is reliable for the long term. The following list provides some interesting tricks that you might want to play with:

  • Use subquery clauses that affect the number or order of the rows in the subquery. For example:

    SELECT * FROM t1 WHERE t1.column1 IN
      (SELECT column1 FROM t2 ORDER BY column1);
    SELECT * FROM t1 WHERE t1.column1 IN
      (SELECT DISTINCT column1 FROM t2);
    SELECT * FROM t1 WHERE EXISTS
      (SELECT * FROM t2 LIMIT 1);
  • Replace a join with a subquery. For example, try this:

    SELECT DISTINCT column1 FROM t1 WHERE t1.column1 IN (
      SELECT column1 FROM t2);

    Instead of this:

    SELECT DISTINCT t1.column1 FROM t1, t2
      WHERE t1.column1 = t2.column1;
  • Some subqueries can be transformed to joins for compatibility with older versions of MySQL that do not support subqueries. However, in some cases, converting a subquery to a join may improve performance. See Section 13.2.10.11, "Rewriting Subqueries as Joins".

  • Move clauses from outside to inside the subquery. For example, use this query:

    SELECT * FROM t1
      WHERE s1 IN (SELECT s1 FROM t1 UNION ALL SELECT s1 FROM t2);

    Instead of this query:

    SELECT * FROM t1
      WHERE s1 IN (SELECT s1 FROM t1) OR s1 IN (SELECT s1 FROM t2);

    For another example, use this query:

    SELECT (SELECT column1 + 5 FROM t1) FROM t2;

    Instead of this query:

    SELECT (SELECT column1 FROM t1) + 5 FROM t2;
  • Use a row subquery instead of a correlated subquery. For example, use this query:

    SELECT * FROM t1
      WHERE (column1,column2) IN (SELECT column1,column2 FROM t2);

    Instead of this query:

    SELECT * FROM t1
      WHERE EXISTS (SELECT * FROM t2 WHERE t2.column1=t1.column1
                    AND t2.column2=t1.column2);
  • Use NOT (a = ANY (...)) rather than a <> ALL (...).

  • Use x = ANY (table containing (1,2)) rather than x=1 OR x=2.

  • Use = ANY rather than EXISTS.

  • For uncorrelated subqueries that always return one row, IN is always slower than =. For example, use this query:

    SELECT * FROM t1
      WHERE t1.col_name = (SELECT a FROM t2 WHERE b = some_const);

    Instead of this query:

    SELECT * FROM t1
      WHERE t1.col_name IN (SELECT a FROM t2 WHERE b = some_const);

These tricks might cause programs to go faster or slower. Using MySQL facilities like the BENCHMARK() function, you can get an idea about what helps in your own situation. See Section 12.14, "Information Functions".

Some optimizations that MySQL itself makes are:

  • MySQL executes uncorrelated subqueries only once. Use EXPLAIN to make sure that a given subquery really is uncorrelated.

  • MySQL rewrites IN, ALL, ANY, and SOME subqueries in an attempt to take advantage of the possibility that the select-list columns in the subquery are indexed.

  • MySQL replaces subqueries of the following form with an index-lookup function, which EXPLAIN describes as a special join type (unique_subquery or index_subquery):

    ... IN (SELECT indexed_column FROM single_table ...)
  • MySQL enhances expressions of the following form with an expression involving MIN() or MAX(), unless NULL values or empty sets are involved:

    value {ALL|ANY|SOME} {> | < | >= | <=} (uncorrelated subquery)

    For example, this WHERE clause:

    WHERE 5 > ALL (SELECT x FROM t)

    might be treated by the optimizer like this:

    WHERE 5 > (SELECT MAX(x) FROM t)
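The empty-set exception mentioned above matters because the two forms are not equivalent when the subquery returns no rows. A minimal sketch, assuming a scratch table t with a single integer column x:

```sql
-- Assumed table, for illustration only; it starts out empty
CREATE TABLE t (x INT);

-- With no rows, a comparison against ALL is true for every value,
-- whereas MAX(x) is NULL and so the comparison yields NULL.
SELECT 5 > ALL (SELECT x FROM t);     -- 1 (true)
SELECT 5 > (SELECT MAX(x) FROM t);    -- NULL

-- Once the table holds non-NULL values, both forms agree.
INSERT INTO t VALUES (1), (3);
SELECT 5 > ALL (SELECT x FROM t);     -- 1
SELECT 5 > (SELECT MAX(x) FROM t);    -- 1
```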

See also MySQL Internals: How MySQL Transforms Subqueries.

13.2.10.11. Rewriting Subqueries as Joins

Sometimes there are ways to test membership in a set of values other than by using a subquery. On some occasions it is not only possible to rewrite a query without a subquery, but doing so can also be more efficient. One candidate for such rewriting is the IN() construct.

For example, this query:

SELECT * FROM t1 WHERE id IN (SELECT id FROM t2);

Can be rewritten as:

SELECT DISTINCT t1.* FROM t1, t2 WHERE t1.id=t2.id;

The queries:

SELECT * FROM t1 WHERE id NOT IN (SELECT id FROM t2);
SELECT * FROM t1 WHERE NOT EXISTS (SELECT id FROM t2 WHERE t1.id=t2.id);

Can be rewritten as:

SELECT t1.*
  FROM t1 LEFT JOIN t2 ON t1.id=t2.id
  WHERE t2.id IS NULL;
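The equivalence can be checked on a small data set. The following sketch uses hypothetical tables named left_ids and right_ids so that it does not collide with the tables in the queries above. The id columns are declared NOT NULL because the NOT IN form and the LEFT JOIN form can return different rows when NULL values are present:

```sql
-- Hypothetical tables and data, for illustration only
CREATE TABLE left_ids  (id INT NOT NULL);
CREATE TABLE right_ids (id INT NOT NULL);
INSERT INTO left_ids  VALUES (1), (2), (3);
INSERT INTO right_ids VALUES (2), (3), (4);

-- Subquery form: rows of left_ids with no match in right_ids
SELECT * FROM left_ids
  WHERE id NOT IN (SELECT id FROM right_ids);

-- Join form: unmatched rows have NULL in the right-hand columns
SELECT left_ids.*
  FROM left_ids LEFT JOIN right_ids ON left_ids.id = right_ids.id
  WHERE right_ids.id IS NULL;
```

Both statements should return the single row with id 1.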

A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better, a fact that is not specific to MySQL Server alone. Prior to SQL-92, outer joins did not exist, so subqueries were the only way to do certain things. Today, MySQL Server and many other modern database systems offer a wide range of outer join types.

MySQL Server supports multiple-table DELETE statements that can be used to efficiently delete rows based on information from one table or even from many tables at the same time. Multiple-table UPDATE statements are also supported. See Section 13.2.2, "DELETE Syntax", and Section 13.2.11, "UPDATE Syntax".
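For instance, the anti-join pattern shown earlier carries over to a multiple-table DELETE. The sketch below assumes hypothetical tables parent_rows and child_rows, each with an id column, and deletes the rows of parent_rows that have no counterpart in child_rows:

```sql
-- Hypothetical tables and data, for illustration only
CREATE TABLE parent_rows (id INT NOT NULL);
CREATE TABLE child_rows  (id INT NOT NULL);
INSERT INTO parent_rows VALUES (1), (2), (3);
INSERT INTO child_rows  VALUES (2), (3);

-- Only the table named before FROM has rows deleted;
-- child_rows is used solely to find unmatched rows.
DELETE parent_rows
  FROM parent_rows LEFT JOIN child_rows
    ON parent_rows.id = child_rows.id
  WHERE child_rows.id IS NULL;

-- parent_rows should now contain only ids 2 and 3
SELECT id FROM parent_rows;
```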

13.2.11. UPDATE Syntax

Single-table syntax:

UPDATE [LOW_PRIORITY] [IGNORE] table_reference
    SET col_name1={expr1|DEFAULT} [, col_name2={expr2|DEFAULT}] ...
    [WHERE where_condition]
    [ORDER BY ...]
    [LIMIT row_count]

Multiple-table syntax:

UPDATE [LOW_PRIORITY] [IGNORE] table_references
    SET col_name1={expr1|DEFAULT} [, col_name2={expr2|DEFAULT}] ...
    [WHERE where_condition]

For the single-table syntax, the UPDATE statement updates columns of existing rows in the named table with new values. The SET clause indicates which columns to modify and the values they should be given. Each value can be given as an expression, or the keyword DEFAULT to set a column explicitly to its default value. The WHERE clause, if given, specifies the conditions that identify which rows to update. With no WHERE clause, all rows are updated. If the ORDER BY clause is specified, the rows are updated in the order that is specified. The LIMIT clause places a limit on the number of rows that can be updated.

For the multiple-table syntax, UPDATE updates rows in each table named in table_references that satisfy the conditions. In this case, ORDER BY and LIMIT cannot be used.

where_condition is an expression that evaluates to true for each row to be updated. For expression syntax, see Section 9.5, "Expression Syntax".

table_references and where_condition are specified as described in Section 13.2.9, "SELECT Syntax".

You need the UPDATE privilege only for columns referenced in an UPDATE that are actually updated. You need only the SELECT privilege for any columns that are read but not modified.

The UPDATE statement supports the following modifiers:

  • With the LOW_PRIORITY keyword, execution of the UPDATE is delayed until no other clients are reading from the table. This affects only storage engines that use only table-level locking (such as MyISAM, MEMORY, and MERGE).

  • With the IGNORE keyword, the update statement does not abort even if errors occur during the update. Rows for which duplicate-key conflicts occur are not updated. Rows for which columns are updated to values that would cause data conversion errors are updated to the closest valid values instead.

In MySQL 5.5.18 and later, UPDATE IGNORE statements, including those having an ORDER BY clause, are flagged as unsafe for statement-based replication. (This is because the order in which the rows are updated determines which rows are ignored.) With this change, such statements produce a warning in the log when using statement-based mode and are logged using the row-based format when using MIXED mode. (Bug #11758262, Bug #50439) See Section 16.1.2.3, "Determination of Safe and Unsafe Statements in Binary Logging", for more information.

If you access a column from the table to be updated in an expression, UPDATE uses the current value of the column. For example, the following statement sets col1 to one more than its current value:

UPDATE t1 SET col1 = col1 + 1;

The second assignment in the following statement sets col2 to the current (updated) col1 value, not the original col1 value. The result is that col1 and col2 have the same value. This behavior differs from standard SQL.

UPDATE t1 SET col1 = col1 + 1, col2 = col1;

Single-table UPDATE assignments are generally evaluated from left to right. For multiple-table updates, there is no guarantee that assignments are carried out in any particular order.
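The effect of left-to-right evaluation can be seen by reversing the order of the assignments. This sketch assumes a hypothetical scratch table named assign_demo with one row:

```sql
-- Hypothetical table and data, for illustration only
CREATE TABLE assign_demo (col1 INT, col2 INT);
INSERT INTO assign_demo VALUES (1, 0);

-- col1 is incremented first, so col2 receives the updated value:
-- the row becomes (2, 2).
UPDATE assign_demo SET col1 = col1 + 1, col2 = col1;

-- With the assignments reversed, col2 receives the value col1 had
-- before this statement incremented it: the row becomes (3, 2).
UPDATE assign_demo SET col2 = col1, col1 = col1 + 1;

SELECT col1, col2 FROM assign_demo;
```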

If you set a column to the value it currently has, MySQL notices this and does not update it.

If you update a column that has been declared NOT NULL by setting it to NULL, an error occurs if strict SQL mode is enabled; otherwise, the column is set to the implicit default value for the column data type and the warning count is incremented. The implicit default value is 0 for numeric types, the empty string ('') for string types, and the "zero" value for date and time types. See Section 11.5, "Data Type Default Values".
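The difference between the two modes can be sketched as follows. The table name nn_demo is hypothetical, and the example changes sql_mode for the current session only:

```sql
-- Hypothetical table and data, for illustration only
CREATE TABLE nn_demo (n INT NOT NULL);
INSERT INTO nn_demo VALUES (5);

-- Nonstrict mode: the statement succeeds with a warning and the
-- column receives the implicit default for INT, which is 0.
SET SESSION sql_mode = '';
UPDATE nn_demo SET n = NULL;
SHOW WARNINGS;
SELECT n FROM nn_demo;

-- Strict mode: the same statement is rejected with an error
-- and the stored value is left as it was.
SET SESSION sql_mode = 'STRICT_ALL_TABLES';
UPDATE nn_demo SET n = NULL;
```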

UPDATE returns the number of rows that were actually changed. The mysql_info() C API function returns the number of rows that were matched and updated and the number of warnings that occurred during the UPDATE.

You can use LIMIT row_count to restrict the scope of the UPDATE. A LIMIT clause is a rows-matched restriction. The statement stops as soon as it has found row_count rows that satisfy the WHERE clause, whether or not they actually were changed.
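The distinction between rows matched and rows changed can be illustrated with a hypothetical table named limit_demo. The ORDER BY clause is included so that the rows examined first are predictable:

```sql
-- Hypothetical table and data, for illustration only
CREATE TABLE limit_demo (id INT, flag INT);
INSERT INTO limit_demo VALUES (1, 1), (2, 0), (3, 0);

-- The statement stops after the first two rows in id order.
-- The row with id 1 already has flag = 1, so it counts toward the
-- limit but is not changed; only the row with id 2 is modified.
-- The mysql client should report: Rows matched: 2  Changed: 1
UPDATE limit_demo SET flag = 1 ORDER BY id LIMIT 2;

-- The row with id 3 still has flag = 0
SELECT id, flag FROM limit_demo;
```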

If an UPDATE statement includes an ORDER BY clause, the rows are updated in the order specified by the clause. This can be useful in certain situations that might otherwise result in an error. Suppose that a table t contains a column id that has a unique index. The following statement could fail with a duplicate-key error, depending on the order in which rows are updated:

UPDATE t SET id = id + 1;

For example, if the table contains 1 and 2 in the id column and 1 is updated to 2 before 2 is updated to 3, an error occurs. To avoid this problem, add an ORDER BY clause to cause the rows with larger id values to be updated before those with smaller values:

UPDATE t SET id = id + 1 ORDER BY id DESC;

You can also perform UPDATE operations covering multiple tables. However, you cannot use ORDER BY or LIMIT with a multiple-table UPDATE. The table_references clause lists the tables involved in the join. Its syntax is described in Section 13.2.9.2, "JOIN Syntax". Here is an example:

UPDATE items,month SET items.price=month.price
WHERE items.id=month.id;

The preceding example shows an inner join that uses the comma operator, but multiple-table UPDATE statements can use any type of join permitted in SELECT statements, such as LEFT JOIN.
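As a sketch of the LEFT JOIN variant, the following statement assumes the same items and month tables as the preceding example, each with id and price columns. Using 0 as the price for items that have no row in month is an arbitrary choice made only for illustration:

```sql
-- Items with a matching row in month take that row's price;
-- items without a match have month.price = NULL in the join
-- result, and COALESCE() substitutes 0 for them.
UPDATE items LEFT JOIN month ON items.id = month.id
  SET items.price = COALESCE(month.price, 0);
```

With the comma (inner) join shown earlier, items that have no row in month are left untouched; with the LEFT JOIN, every row of items is a candidate for update.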

If you use a multiple-table UPDATE statement involving InnoDB tables for which there are foreign key constraints, the MySQL optimizer might process tables in an order that differs from that of their parent/child relationship. In this case, the statement fails and rolls back. Instead, update a single table and rely on the ON UPDATE capabilities that InnoDB provides to cause the other tables to be modified accordingly. See Section 14.3.5.4, "FOREIGN KEY Constraints".
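A minimal sketch of the single-table approach follows. The table names fk_parent and fk_child are hypothetical; the point is that only the parent table is named in the UPDATE, and the ON UPDATE CASCADE clause propagates the new key value to the child table:

```sql
-- Hypothetical tables and data, for illustration only
CREATE TABLE fk_parent (
  id INT PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE fk_child (
  id INT PRIMARY KEY,
  parent_id INT,
  FOREIGN KEY (parent_id) REFERENCES fk_parent (id)
    ON UPDATE CASCADE
) ENGINE=InnoDB;

INSERT INTO fk_parent VALUES (1);
INSERT INTO fk_child  VALUES (10, 1);

-- Update only the parent; InnoDB cascades the change.
UPDATE fk_parent SET id = 2 WHERE id = 1;

-- parent_id in fk_child should now be 2
SELECT id, parent_id FROM fk_child;
```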

Currently, you cannot update a table and select from the same table in a subquery.

Index hints (see Section 13.2.9.3, "Index Hint Syntax") are accepted but ignored for UPDATE statements.

An UPDATE on a partitioned table that uses a storage engine employing table-level locks, such as MyISAM, locks all partitions of the table. This does not occur with tables using storage engines such as InnoDB that employ row-level locking. This issue is resolved in MySQL 5.6. See Section 18.5.4, "Partitioning and Table-Level Locking", for more information.

Copyright © 1997, 2013, Oracle and/or its affiliates. All rights reserved. Legal Notices