12.9. Full-Text Search Functions

MATCH (col1,col2,...) AGAINST (expr [search_modifier])

search_modifier:
  {
       IN NATURAL LANGUAGE MODE
     | IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
     | IN BOOLEAN MODE
     | WITH QUERY EXPANSION
  }

MySQL has support for full-text indexing and searching:

  • A full-text index in MySQL is an index of type FULLTEXT.

  • Full-text indexes can be used only with MyISAM tables, and can be created only for CHAR, VARCHAR, or TEXT columns.

  • A FULLTEXT index definition can be given in the CREATE TABLE statement when a table is created, or added later using ALTER TABLE or CREATE INDEX.

  • For large data sets, it is much faster to load your data into a table that has no FULLTEXT index and then create the index after that, than to load data into a table that has an existing FULLTEXT index.
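As a rough sketch of the load-then-index approach described in the last item, the following statements create a table without a FULLTEXT index, load it, and add the index afterward. The table name docs, the index name ft_title_body, and the sample row are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table, created without a FULLTEXT index so that loading stays fast.
CREATE TABLE docs (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  body TEXT
) ENGINE=MyISAM;

-- Load the data first (one illustrative row stands in for a bulk load).
INSERT INTO docs (title, body) VALUES
  ('Sample title', 'Sample body text ...');

-- Then build the index in a single pass.
ALTER TABLE docs ADD FULLTEXT INDEX ft_title_body (title, body);

-- Equivalent alternative using CREATE INDEX:
-- CREATE FULLTEXT INDEX ft_title_body ON docs (title, body);
```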

Full-text searching is performed using MATCH() ... AGAINST syntax. MATCH() takes a comma-separated list that names the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row.

There are three types of full-text searches:

  • A natural language search interprets the search string as a phrase in natural human language (a phrase in free text). There are no special operators. The stopword list applies. In addition, words that are present in 50% or more of the rows are considered common and do not match.

    Full-text searches are natural language searches if the IN NATURAL LANGUAGE MODE modifier is given or if no modifier is given. For more information, see Section 12.9.1, "Natural Language Full-Text Searches".

  • A boolean search interprets the search string using the rules of a special query language. The string contains the words to search for. It can also contain operators that specify requirements such that a word must be present or absent in matching rows, or that it should be weighted higher or lower than usual. Common words such as "some" or "then" are stopwords and do not match if present in the search string. The IN BOOLEAN MODE modifier specifies a boolean search. For more information, see Section 12.9.2, "Boolean Full-Text Searches".

  • A query expansion search is a modification of a natural language search. The search string is used to perform a natural language search. Then words from the most relevant rows returned by the search are added to the search string and the search is done again. The query returns the rows from the second search. The IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION or WITH QUERY EXPANSION modifier specifies a query expansion search. For more information, see Section 12.9.3, "Full-Text Searches with Query Expansion".

Constraints on full-text searching are listed in Section 12.9.5, "Full-Text Restrictions".

The myisam_ftdump utility can be used to dump the contents of a full-text index. This may be helpful for debugging full-text queries. See Section 4.6.2, "myisam_ftdump - Display Full-Text Index information".

12.9.1. Natural Language Full-Text Searches

By default or with the IN NATURAL LANGUAGE MODE modifier, the MATCH() function performs a natural language search for a string against a text collection. A collection is a set of one or more columns included in a FULLTEXT index. The search string is given as the argument to AGAINST(). For each row in the table, MATCH() returns a relevance value; that is, a similarity measure between the search string and the text in that row in the columns named in the MATCH() list.

mysql> CREATE TABLE articles (
    ->   id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    ->   title VARCHAR(200),
    ->   body TEXT,
    ->   FULLTEXT (title,body)
    -> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO articles (title,body) VALUES
    -> ('MySQL Tutorial','DBMS stands for DataBase ...'),
    -> ('How To Use MySQL Well','After you went through a ...'),
    -> ('Optimizing MySQL','In this tutorial we will show ...'),
    -> ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
    -> ('MySQL vs. YourSQL','In the following database comparison ...'),
    -> ('MySQL Security','When configured properly, MySQL ...');
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body)
    -> AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)

By default, the search is performed in case-insensitive fashion. However, you can perform a case-sensitive full-text search by using a binary collation for the indexed columns. For example, a column that uses the latin1 character set can be assigned a collation of latin1_bin to make it case sensitive for full-text searches.
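A minimal sketch of the collation change, assuming the articles table from the earlier example and that latin1 is an acceptable character set for its data. Both indexed columns are changed in the same statement because all columns in a FULLTEXT index must share one character set and collation.

```sql
-- Assumption: the articles table defined earlier; latin1 suits the stored text.
-- Changing both columns together keeps the FULLTEXT index definition consistent.
ALTER TABLE articles
  MODIFY title VARCHAR(200) CHARACTER SET latin1 COLLATE latin1_bin,
  MODIFY body  TEXT         CHARACTER SET latin1 COLLATE latin1_bin;

-- With a binary collation, 'MySQL' and 'mysql' are treated as different words.
SELECT id, title FROM articles
WHERE MATCH (title,body) AGAINST ('Tutorial' IN NATURAL LANGUAGE MODE);
```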

When MATCH() is used in a WHERE clause, as in the example shown earlier, the rows returned are automatically sorted with the highest relevance first. Relevance values are nonnegative floating-point numbers. Zero relevance means no similarity. Relevance is computed based on the number of words in the row, the number of unique words in that row, the total number of words in the collection, and the number of documents (rows) that contain a particular word.

To simply count matches, you could use a query like this:

mysql> SELECT COUNT(*) FROM articles
    -> WHERE MATCH (title,body)
    -> AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----------+
| COUNT(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

However, you might find it quicker to rewrite the query as follows:

mysql> SELECT
    -> COUNT(IF(MATCH (title,body) AGAINST ('database' IN NATURAL LANGUAGE MODE), 1, NULL))
    -> AS count
    -> FROM articles;
+-------+
| count |
+-------+
|     2 |
+-------+
1 row in set (0.03 sec)

The first query sorts the results by relevance whereas the second does not. However, the second query performs a full table scan and the first does not. The first may be faster if the search matches few rows; otherwise, the second may be faster because it would read many rows anyway.

For natural-language full-text searches, it is a requirement that the columns named in the MATCH() function be the same columns included in some FULLTEXT index in your table. For the preceding query, note that the columns named in the MATCH() function (title and body) are the same as those named in the definition of the articles table's FULLTEXT index. If you wanted to search the title or body separately, you would need to create separate FULLTEXT indexes for each column.
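For instance, separate single-column indexes could be added as sketched here. The index names ft_title and ft_body are hypothetical; the articles table is the one created earlier.

```sql
-- Hypothetical index names; one FULLTEXT index per column.
ALTER TABLE articles
  ADD FULLTEXT INDEX ft_title (title),
  ADD FULLTEXT INDEX ft_body (body);

-- MATCH() can now name a single column, because a FULLTEXT index
-- with exactly that column list exists.
SELECT id, title FROM articles
WHERE MATCH (title) AGAINST ('tutorial' IN NATURAL LANGUAGE MODE);

SELECT id, title FROM articles
WHERE MATCH (body) AGAINST ('database' IN NATURAL LANGUAGE MODE);
```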

It is also possible to perform a boolean search or a search with query expansion. These search types are described in Section 12.9.2, "Boolean Full-Text Searches", and Section 12.9.3, "Full-Text Searches with Query Expansion".

A full-text search that uses an index can name columns only from a single table in the MATCH() clause because an index cannot span multiple tables. A boolean search can be done in the absence of an index (albeit more slowly), in which case it is possible to name columns from multiple tables.

The preceding example is a basic illustration that shows how to use the MATCH() function where rows are returned in order of decreasing relevance. The next example shows how to retrieve the relevance values explicitly. Returned rows are not ordered because the SELECT statement includes neither WHERE nor ORDER BY clauses:

mysql> SELECT id, MATCH (title,body)
    -> AGAINST ('Tutorial' IN NATURAL LANGUAGE MODE) AS score
    -> FROM articles;
+----+------------------+
| id | score            |
+----+------------------+
|  1 | 0.65545833110809 |
|  2 |                0 |
|  3 | 0.66266459226608 |
|  4 |                0 |
|  5 |                0 |
|  6 |                0 |
+----+------------------+
6 rows in set (0.00 sec)

The following example is more complex. The query returns the relevance values and it also sorts the rows in order of decreasing relevance. To achieve this result, you should specify MATCH() twice: once in the SELECT list and once in the WHERE clause. This causes no additional overhead, because the MySQL optimizer notices that the two MATCH() calls are identical and invokes the full-text search code only once.

mysql> SELECT id, body, MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root'
    -> IN NATURAL LANGUAGE MODE) AS score
    -> FROM articles WHERE MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root'
    -> IN NATURAL LANGUAGE MODE);
+----+-------------------------------------+-----------------+
| id | body                                | score           |
+----+-------------------------------------+-----------------+
|  4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
|  6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)

The MySQL FULLTEXT implementation regards any sequence of true word characters (letters, digits, and underscores) as a word. That sequence may also contain apostrophes ("'"), but not more than one in a row. This means that aaa'bbb is regarded as one word, but aaa''bbb is regarded as two words. Apostrophes at the beginning or the end of a word are stripped by the FULLTEXT parser; 'aaa'bbb' would be parsed as aaa'bbb.

The FULLTEXT parser determines where words start and end by looking for certain delimiter characters; for example, " " (space), "," (comma), and "." (period). If words are not separated by delimiters (as in, for example, Chinese), the FULLTEXT parser cannot determine where a word begins or ends. To be able to add words or other indexed terms in such languages to a FULLTEXT index, you must preprocess them so that they are separated by some arbitrary delimiter such as a double quotation mark (").

In MySQL 5.5, it is possible to write a plugin that replaces the built-in full-text parser. For details, see Section 23.2, "The MySQL Plugin API". For example parser plugin source code, see the plugin/fulltext directory of a MySQL source distribution.

Some words are ignored in full-text searches:

  • Any word that is too short is ignored. The default minimum length of words that are found by full-text searches is four characters.

  • Words in the stopword list are ignored. A stopword is a word such as "the" or "some" that is so common that it is considered to have zero semantic value. There is a built-in stopword list, but it can be overwritten by a user-defined list.

The default stopword list is given in Section 12.9.4, "Full-Text Stopwords". The default minimum word length and stopword list can be changed as described in Section 12.9.6, "Fine-Tuning MySQL Full-Text Search".
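To see which word-length limits and stopword file are currently in effect on a server, the full-text system variables can be inspected. This is only a quick check; the values returned depend on the server configuration.

```sql
-- Lists the full-text system variables, including ft_min_word_len,
-- ft_max_word_len, and ft_stopword_file.
SHOW VARIABLES LIKE 'ft%';
```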

Every correct word in the collection and in the query is weighted according to its significance in the collection or query. Consequently, a word that is present in many documents has a lower weight (and may even have a zero weight), because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to compute the relevance of the row.

Such a technique works best with large collections (in fact, it was carefully tuned this way). For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results. For example, although the word "MySQL" is present in every row of the articles table shown earlier, a search for the word produces no results:

mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body)
    -> AGAINST ('MySQL' IN NATURAL LANGUAGE MODE);
Empty set (0.00 sec)

The search result is empty because the word "MySQL" is present in at least 50% of the rows. As such, it is effectively treated as a stopword. For large data sets, this is the most desirable behavior: A natural language query should not return every second row from a 1GB table. For small data sets, it may be less desirable.

A word that matches half of the rows in a table is less likely to locate relevant documents. In fact, it most likely finds plenty of irrelevant documents. We all know this happens far too often when we are trying to find something on the Internet with a search engine. It is with this reasoning that rows containing the word are assigned a low semantic value for the particular data set in which they occur. A given word may reach the 50% threshold in one data set but not another.

The 50% threshold has a significant implication when you first try full-text searching to see how it works: If you create a table and insert only one or two rows of text into it, every word in the text occurs in at least 50% of the rows. As a result, no search returns any results. Be sure to insert at least three rows, and preferably many more. Users who need to bypass the 50% limitation can use the boolean search mode; see Section 12.9.2, "Boolean Full-Text Searches".
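As a sketch of that workaround, the same search for "MySQL" against the articles table can be issued in boolean mode, where the 50% threshold is not applied, so rows containing the word are eligible to match:

```sql
-- Boolean mode ignores the 50% threshold, so a word present in
-- most or all rows can still match.
SELECT id, title FROM articles
WHERE MATCH (title,body) AGAINST ('MySQL' IN BOOLEAN MODE);
```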

12.9.2. Boolean Full-Text Searches

MySQL can perform boolean full-text searches using the IN BOOLEAN MODE modifier. With this modifier, certain characters have special meaning at the beginning or end of words in the search string. In the following query, the + and - operators indicate that a word is required to be present or absent, respectively, for a match to occur. Thus, the query retrieves all the rows that contain the word "MySQL" but that do not contain the word "YourSQL":

mysql> SELECT * FROM articles WHERE MATCH (title,body)
    -> AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);
+----+-----------------------+-------------------------------------+
| id | title                 | body                                |
+----+-----------------------+-------------------------------------+
|  1 | MySQL Tutorial        | DBMS stands for DataBase ...        |
|  2 | How To Use MySQL Well | After you went through a ...        |
|  3 | Optimizing MySQL      | In this tutorial we will show ...   |
|  4 | 1001 MySQL Tricks     | 1. Never run mysqld as root. 2. ... |
|  6 | MySQL Security        | When configured properly, MySQL ... |
+----+-----------------------+-------------------------------------+

Note

In implementing this feature, MySQL uses what is sometimes referred to as implied Boolean logic, in which

  • + stands for AND

  • - stands for NOT

  • [no operator] implies OR

Boolean full-text searches have these characteristics:

  • They do not use the 50% threshold.

  • They do not automatically sort rows in order of decreasing relevance. You can see this from the preceding query result: The row with the highest relevance is the one that contains "MySQL" twice, but it is listed last, not first.

  • They can work even without a FULLTEXT index, although a search executed in this fashion would be quite slow.

  • The minimum and maximum word length full-text parameters apply.

  • The stopword list applies.
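Because boolean searches do not sort by relevance on their own, one way to get ranked output is to select the relevance value and order by it explicitly. The following is a sketch against the articles table; the alias score is arbitrary.

```sql
-- Repeat the MATCH() expression in the select list to expose the relevance,
-- then sort on it, since boolean mode does not order rows automatically.
SELECT id, title,
       MATCH (title,body) AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE) AS score
FROM articles
WHERE MATCH (title,body) AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE)
ORDER BY score DESC;
```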

The boolean full-text search capability supports the following operators:

  • +

    A leading plus sign indicates that this word must be present in each row that is returned.

  • -

    A leading minus sign indicates that this word must not be present in any of the rows that are returned.

    Note: The - operator acts only to exclude rows that are otherwise matched by other search terms. Thus, a boolean-mode search that contains only terms preceded by - returns an empty result. It does not return "all rows except those containing any of the excluded terms."

  • (no operator)

    By default (when neither + nor - is specified) the word is optional, but the rows that contain it are rated higher. This mimics the behavior of MATCH() ... AGAINST() without the IN BOOLEAN MODE modifier.

  • > <

    These two operators are used to change a word's contribution to the relevance value that is assigned to a row. The > operator increases the contribution and the < operator decreases it. See the example following this list.

  • ( )

    Parentheses group words into subexpressions. Parenthesized groups can be nested.

  • ~

    A leading tilde acts as a negation operator, causing the word's contribution to the row's relevance to be negative. This is useful for marking "noise" words. A row containing such a word is rated lower than others, but is not excluded altogether, as it would be with the - operator.

  • *

    The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it should be appended to the word to be affected. Words match if they begin with the word preceding the * operator.

    If a word is specified with the truncation operator, it is not stripped from a boolean query, even if it is too short (as determined from the ft_min_word_len setting) or a stopword. This occurs because the word is not seen as too short or a stopword, but as a prefix that must be present in the document in the form of a word that begins with the prefix. Suppose that ft_min_word_len=4. Then a search for '+word +the*' will likely return fewer rows than a search for '+word +the':

    • The former query remains as is and requires both word and the* (a word starting with the) to be present in the document.

    • The latter query is transformed to +word (requiring only word to be present). the is both too short and a stopword, and either condition is enough to cause it to be ignored.

  • "

    A phrase that is enclosed within double quote (") characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words and performs a search in the FULLTEXT index for the words. Nonword characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase".

    If the phrase contains no words that are in the index, the result is empty. For example, if all words are either stopwords or shorter than the minimum length of indexed words, the result is empty.

The following examples demonstrate some search strings that use boolean full-text operators:

  • 'apple banana'

    Find rows that contain at least one of the two words.

  • '+apple +juice'

    Find rows that contain both words.

  • '+apple macintosh'

    Find rows that contain the word "apple", but rank rows higher if they also contain "macintosh".

  • '+apple -macintosh'

    Find rows that contain the word "apple" but not "macintosh".

  • '+apple ~macintosh'

    Find rows that contain the word "apple", but if the row also contains the word "macintosh", rate it lower than if row does not. This is "softer" than a search for '+apple -macintosh', for which the presence of "macintosh" causes the row not to be returned at all.

  • '+apple +(>turnover <strudel)'

    Find rows that contain the words "apple" and "turnover", or "apple" and "strudel" (in any order), but rank "apple turnover" higher than "apple strudel".

  • 'apple*'

    Find rows that contain words such as "apple", "apples", "applesauce", or "applet".

  • '"some words"'

    Find rows that contain the exact phrase "some words" (for example, rows that contain "some words of wisdom" but not "some noise words"). Note that the " characters that enclose the phrase are operator characters that delimit the phrase. They are not the quotation marks that enclose the search string itself.
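The search strings above are placed inside AGAINST() as ordinary single-quoted SQL strings. The following sketch applies a few of the operators to the articles table from Section 12.9.1; the search words are chosen only to fit that sample data.

```sql
-- Both words required, in any order.
SELECT id, title FROM articles
WHERE MATCH (title,body) AGAINST ('+mysql +tutorial' IN BOOLEAN MODE);

-- Exact phrase: the double quotes sit inside the single-quoted SQL string.
SELECT id, title FROM articles
WHERE MATCH (title,body) AGAINST ('"mysql tutorial"' IN BOOLEAN MODE);

-- Prefix match using the truncation operator.
SELECT id, title FROM articles
WHERE MATCH (title,body) AGAINST ('optimiz*' IN BOOLEAN MODE);

-- Require "mysql" plus one of two words, weighting "tutorial" above "security".
SELECT id, title,
       MATCH (title,body)
         AGAINST ('+mysql +(>tutorial <security)' IN BOOLEAN MODE) AS score
FROM articles
WHERE MATCH (title,body)
        AGAINST ('+mysql +(>tutorial <security)' IN BOOLEAN MODE)
ORDER BY score DESC;
```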

12.9.3. Full-Text Searches with Query Expansion

Full-text search supports query expansion (and in particular, its variant "blind query expansion"). This is generally useful when a search phrase is too short, which often means that the user is relying on implied knowledge that the full-text search engine lacks. For example, a user searching for "database" may really mean that "MySQL", "Oracle", "DB2", and "RDBMS" all are phrases that should match "databases" and should be returned, too. This is implied knowledge.

Blind query expansion (also known as automatic relevance feedback) is enabled by adding WITH QUERY EXPANSION or IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION following the search phrase. It works by performing the search twice, where the search phrase for the second search is the original search phrase concatenated with the few most highly relevant documents from the first search. Thus, if one of these documents contains the word "databases" and the word "MySQL", the second search finds the documents that contain the word "MySQL" even if they do not contain the word "database". The following example shows this difference:

mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body)
    -> AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body)
    -> AGAINST ('database' WITH QUERY EXPANSION);
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
|  3 | Optimizing MySQL  | In this tutorial we will show ...        |
+----+-------------------+------------------------------------------+
3 rows in set (0.00 sec)

Another example could be searching for books by Georges Simenon about Maigret, when a user is not sure how to spell "Maigret". A search for "Megre and the reluctant witnesses" finds only "Maigret and the Reluctant Witnesses" without query expansion. A search with query expansion finds all books with the word "Maigret" on the second pass.

Note

Because blind query expansion tends to increase noise significantly by returning nonrelevant documents, it is meaningful to use only when a search phrase is rather short.

12.9.4. Full-Text Stopwords

The stopword list is loaded and searched for full-text queries using the server character set and collation (the values of the character_set_server and collation_server system variables). False hits or misses may occur for stopword lookups if the stopword file or columns used for full-text indexing or searches have a character set or collation different from character_set_server or collation_server.

Case sensitivity of stopword lookups depends on the server collation. For example, lookups are case insensitive if the collation is latin1_swedish_ci, whereas lookups are case sensitive if the collation is latin1_general_cs or latin1_bin.
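To check for a possible mismatch, the server settings can be compared with the collation of the indexed columns. The sketch below uses the articles table as a stand-in for any table that has a FULLTEXT index.

```sql
-- Character set and collation used for loading and searching the stopword list.
SHOW VARIABLES LIKE 'character_set_server';
SHOW VARIABLES LIKE 'collation_server';

-- The Collation column shows what each indexed column actually uses.
SHOW FULL COLUMNS FROM articles;
```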

As of MySQL 5.5.6, the stopword file is loaded and searched using latin1 if character_set_server is ucs2, utf16, or utf32. If any table was created with FULLTEXT indexes while the server character set was ucs2, utf16, or utf32, it should be repaired using this statement:

REPAIR TABLE tbl_name QUICK;

The following table shows the default list of full-text stopwords. In a MySQL source distribution, you can find this list in the storage/myisam/ft_static.c file.

a's            able           about          above          according
accordingly    across         actually       after          afterwards
again          against        ain't          all            allow
allows         almost         alone          along          already
also           although       always         am             among
amongst        an             and            another        any
anybody        anyhow         anyone         anything       anyway
anyways        anywhere       apart          appear         appreciate
appropriate    are            aren't         around         as
aside          ask            asking         associated     at
available      away           awfully        be             became
because        become         becomes        becoming       been
before         beforehand     behind         being          believe
below          beside         besides        best           better
between        beyond         both           brief          but
by             c'mon          c's            came           can
can't          cannot         cant           cause          causes
certain        certainly      changes        clearly        co
com            come           comes          concerning     consequently
consider       considering    contain        containing     contains
corresponding  could          couldn't       course         currently
definitely     described      despite        did            didn't
different      do             does           doesn't        doing
don't          done           down           downwards      during
each           edu            eg             eight          either
else           elsewhere      enough         entirely       especially
et             etc            even           ever           every
everybody      everyone       everything     everywhere     ex
exactly        example        except         far            few
fifth          first          five           followed       following
follows        for            former         formerly       forth
four           from           further        furthermore    get
gets           getting        given          gives          go
goes           going          gone           got            gotten
greetings      had            hadn't         happens        hardly
has            hasn't         have           haven't        having
he             he's           hello          help           hence
her            here           here's         hereafter      hereby
herein         hereupon       hers           herself        hi
him            himself        his            hither         hopefully
how            howbeit        however        i'd            i'll
i'm            i've           ie             if             ignored
immediate      in             inasmuch       inc            indeed
indicate       indicated      indicates      inner          insofar
instead        into           inward         is             isn't
it             it'd           it'll          it's           its
itself         just           keep           keeps          kept
know           known          knows          last           lately
later          latter         latterly       least          less
lest           let            let's          like           liked
likely         little         look           looking        looks
ltd            mainly         many           may            maybe
me             mean           meanwhile      merely         might
more           moreover       most           mostly         much
must           my             myself         name           namely
nd             near           nearly         necessary      need
needs          neither        never          nevertheless   new
next           nine           no             nobody         non
none           noone          nor            normally       not
nothing        novel          now            nowhere        obviously
of             off            often          oh             ok
okay           old            on             once           one
ones           only           onto           or             other
others         otherwise      ought          our            ours
ourselves      out            outside        over           overall
own            particular     particularly   per            perhaps
placed         please         plus           possible       presumably
probably       provides       que            quite          qv
rather         rd             re             really         reasonably
regarding      regardless     regards        relatively     respectively
right          said           same           saw            say
saying         says           second         secondly       see
seeing         seem           seemed         seeming        seems
seen           self           selves         sensible       sent
serious        seriously      seven          several        shall
she            should         shouldn't      since          six
so             some           somebody       somehow        someone
something      sometime       sometimes      somewhat       somewhere
soon           sorry          specified      specify        specifying
still          sub            such           sup            sure
t's            take           taken          tell           tends
th             than           thank          thanks         thanx
that           that's         thats          the            their
theirs         them           themselves     then           thence
there          there's        thereafter     thereby        therefore
therein        theres         thereupon      these          they
they'd         they'll        they're        they've        think
third          this           thorough       thoroughly     those
though         three          through        throughout     thru
thus           to             together       too            took
toward         towards        tried          tries          truly
try            trying         twice          two            un
under          unfortunately  unless         unlikely       until
unto           up             upon           us             use
used           useful         uses           using          usually
value          various        very           via            viz
vs             want           wants          was            wasn't
way            we             we'd           we'll          we're
we've          welcome        well           went           were
weren't        what           what's         whatever       when
whence         whenever       where          where's        whereafter
whereas        whereby        wherein        whereupon      wherever
whether        which          while          whither        who
who's          whoever        whole          whom           whose
why            will           willing        wish           with
within         without        won't          wonder         would
wouldn't       yes            yet            you            you'd
you'll         you're         you've         your           yours
yourself       yourselves     zero

12.9.5. Full-Text Restrictions

  • Full-text searches are supported for MyISAM tables only.

  • Full-text searches are not supported for partitioned tables. See Section 18.5, "Restrictions and Limitations on Partitioning".

  • Full-text searches can be used with most multi-byte character sets. The exception is that for Unicode, the utf8 character set can be used, but not the ucs2 character set. However, although FULLTEXT indexes on ucs2 columns cannot be used, you can perform IN BOOLEAN MODE searches on a ucs2 column that has no such index.

    The remarks for utf8 also apply to utf8mb4, and the remarks for ucs2 also apply to utf16 and utf32.

  • Ideographic languages such as Chinese and Japanese do not have word delimiters. Therefore, the FULLTEXT parser cannot determine where words begin and end in these and other such languages. The implications of this and some workarounds for the problem are described in Section 12.9, "Full-Text Search Functions".

  • Although the use of multiple character sets within a single table is supported, all columns in a FULLTEXT index must use the same character set and collation.

  • The MATCH() column list must match exactly the column list in some FULLTEXT index definition for the table, unless this MATCH() is IN BOOLEAN MODE. Boolean-mode searches can be done on nonindexed columns, although they are likely to be slow.

  • The argument to AGAINST() must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row.

  • Index hints are more limited for FULLTEXT searches than for non-FULLTEXT searches. See Section 13.2.9.3, "Index Hint Syntax".
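Regarding the requirement that the AGAINST() argument be constant during query evaluation: one way to supply the search string at run time while still satisfying it is a prepared statement, since the placeholder value is fixed for each execution. This is a sketch against the articles table; the statement name stmt and the variable @term are hypothetical.

```sql
-- The placeholder is bound once per execution, so the search string
-- is constant while the query is evaluated.
PREPARE stmt FROM
  'SELECT id, title FROM articles
   WHERE MATCH (title,body) AGAINST (? IN NATURAL LANGUAGE MODE)';

SET @term = 'database';
EXECUTE stmt USING @term;

DEALLOCATE PREPARE stmt;
```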

12.9.6. Fine-Tuning MySQL Full-Text Search

MySQL's full-text search capability has few user-tunable parameters. You can exert more control over full-text searching behavior if you have a MySQL source distribution because some changes require source code modifications. See Section 2.10, "Installing MySQL from Source".

Note that full-text search is carefully tuned for maximum effectiveness. Modifying the default behavior can, in most cases, actually decrease effectiveness. Do not alter the MySQL sources unless you know what you are doing.

Most full-text variables described in this section must be set at server startup time. A server restart is required to change them; they cannot be modified while the server is running.

Some variable changes require that you rebuild the FULLTEXT indexes in your tables. Instructions for doing so are given later in this section.

  • The minimum and maximum lengths of words to be indexed are defined by the ft_min_word_len and ft_max_word_len system variables. (See Section 5.1.4, "Server System Variables".) The default minimum value is four characters; the default maximum is version dependent. If you change either value, you must rebuild your FULLTEXT indexes. For example, if you want three-character words to be searchable, you can set the ft_min_word_len variable by putting the following lines in an option file:

    [mysqld]
    ft_min_word_len=3

    Then restart the server and rebuild your FULLTEXT indexes. Note particularly the remarks regarding myisamchk in the instructions following this list.

  • To override the default stopword list, set the ft_stopword_file system variable. (See Section 5.1.4, "Server System Variables".) The variable value should be the path name of the file containing the stopword list, or the empty string to disable stopword filtering. The server looks for the file in the data directory unless an absolute path name is given to specify a different directory. After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes.

    The stopword list is free-form. That is, you may use any nonalphanumeric character such as newline, space, or comma to separate stopwords. Exceptions are the underscore character ("_") and a single apostrophe ("'") which are treated as part of a word. The character set of the stopword list is the server's default character set; see Section 10.1.3.1, "Server Character Set and Collation".

  • The 50% threshold for natural language searches is determined by the particular weighting scheme chosen. To disable it, look for the following line in storage/myisam/ftdefs.h:

    #define GWS_IN_USE GWS_PROB

    Change that line to this:

    #define GWS_IN_USE GWS_FREQ

    Then recompile MySQL. There is no need to rebuild the indexes in this case.

    Note

    By making this change, you severely decrease MySQL's ability to provide adequate relevance values for the MATCH() function. If you really need to search for such common words, it would be better to search using IN BOOLEAN MODE instead, which does not observe the 50% threshold.

  • To change the operators used for boolean full-text searches, set the ft_boolean_syntax system variable. This variable can be changed while the server is running, but you must have the SUPER privilege to do so. No rebuilding of indexes is necessary in this case. See Section 5.1.4, "Server System Variables", which describes the rules governing how to set this variable.

  • If you want to change the set of characters that are considered word characters, you can do so in several ways, as described in the following list. After making the modification, you must rebuild the indexes for each table that contains any FULLTEXT indexes. Suppose that you want to treat the hyphen character ('-') as a word character. Use one of these methods:

    • Modify the MySQL source: In storage/myisam/ftdefs.h, see the true_word_char() and misc_word_char() macros. Add '-' to one of those macros and recompile MySQL.

    • Modify a character set file: This requires no recompilation. The true_word_char() macro uses a "character type" table to distinguish letters and numbers from other characters. You can edit the contents of the <ctype><map> array in one of the character set XML files to specify that '-' is a "letter." Then use the given character set for your FULLTEXT indexes. For information about the <ctype><map> array format, see Section 10.3.1, "Character Definition Arrays".

    • Add a new collation for the character set used by the indexed columns, and alter the columns to use that collation. For general information about adding collations, see Section 10.4, "Adding a Collation to a Character Set". For an example specific to full-text indexing, see Section 12.9.7, "Adding a Collation for Full-Text Indexing".

If you modify full-text variables that affect indexing (ft_min_word_len, ft_max_word_len, or ft_stopword_file), or if you change the stopword file itself, you must rebuild your FULLTEXT indexes after making the changes and restarting the server. To rebuild the indexes in this case, it is sufficient to do a QUICK repair operation:

mysql> REPAIR TABLE tbl_name QUICK;

Alternatively, use ALTER TABLE with the DROP INDEX and ADD INDEX options to drop and re-create each FULLTEXT index. In some cases, this may be faster than a repair operation.

Each table that contains any FULLTEXT index must be repaired as just shown. Otherwise, queries for the table may yield incorrect results, and modifications to the table will cause the server to see the table as corrupt and in need of repair.

Note that if you use myisamchk to perform an operation that modifies table indexes (such as repair or analyze), the FULLTEXT indexes are rebuilt using the default full-text parameter values for minimum word length, maximum word length, and stopword file unless you specify otherwise. This can result in queries failing.

The problem occurs because these parameters are known only by the server. They are not stored in MyISAM index files. To avoid the problem if you have modified the minimum or maximum word length or stopword file values used by the server, specify the same ft_min_word_len, ft_max_word_len, and ft_stopword_file values for myisamchk that you use for mysqld. For example, if you have set the minimum word length to 3, you can repair a table with myisamchk like this:

shell> myisamchk --recover --ft_min_word_len=3 tbl_name.MYI

To ensure that myisamchk and the server use the same values for full-text parameters, place each one in both the [mysqld] and [myisamchk] sections of an option file:

[mysqld]
ft_min_word_len=3

[myisamchk]
ft_min_word_len=3
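A mismatch between the two sections is easy to miss in a long option file. As a rough illustration (this is not a MySQL tool), the following Python sketch compares the full-text settings found in the [mysqld] and [myisamchk] sections. The function name ft_mismatches is hypothetical, and the sketch assumes an option file that Python's configparser can read: option names spelled with underscores and no !include directives.

```python
import configparser

# Full-text parameters that affect indexing, as named in this section.
FT_OPTIONS = ("ft_min_word_len", "ft_max_word_len", "ft_stopword_file")


def ft_mismatches(option_text):
    """Return {option: (mysqld_value, myisamchk_value)} for settings that differ."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # option files may hold bare flags without "=value"
        interpolation=None,    # treat "%" in values literally
        strict=False,          # tolerate repeated options
    )
    parser.read_string(option_text)
    mismatches = {}
    for name in FT_OPTIONS:
        # A missing section or option is reported as None.
        server = parser.get("mysqld", name, fallback=None)
        tool = parser.get("myisamchk", name, fallback=None)
        if server != tool:
            mismatches[name] = (server, tool)
    return mismatches


sample = """
[mysqld]
ft_min_word_len=3

[myisamchk]
ft_min_word_len=3
"""
print(ft_mismatches(sample))  # an empty dict means the two sections agree
```

A nonempty result lists each option whose values differ, which is the situation in which myisamchk would rebuild FULLTEXT indexes with parameters the server does not expect.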

An alternative to using myisamchk for index modification is to use the REPAIR TABLE, ANALYZE TABLE, OPTIMIZE TABLE, or ALTER TABLE statements. These statements are performed by the server, which knows the proper full-text parameter values to use.

12.9.7. Adding a Collation for Full-Text Indexing

This section describes how to add a new collation for full-text searches. The sample collation is like latin1_swedish_ci but treats the '-' character as a letter rather than as a punctuation character so that it can be indexed as a word character. General information about adding collations is given in Section 10.4, "Adding a Collation to a Character Set"; it is assumed that you have read it and are familiar with the files involved.

To add a collation for full-text indexing, use this procedure:

  1. Add a collation to the Index.xml file. The collation ID must be unused, so choose a value different from 62 if that ID is already taken on your system.

    <charset name="latin1">
    ...
    <collation name="latin1_fulltext_ci" id="62"/>
    </charset>
  2. Declare the sort order for the collation in the latin1.xml file. In this case, the order can be copied from latin1_swedish_ci:

    <collation name="latin1_fulltext_ci">
    <map>
    00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
    10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
    20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
    30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
    40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
    50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
    60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
    50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
    80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
    90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
    A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
    B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
    41 41 41 41 5C 5B 5C 43 45 45 45 45 49 49 49 49
    44 4E 4F 4F 4F 4F 5D D7 D8 55 55 55 59 59 DE DF
    41 41 41 41 5C 5B 5C 43 45 45 45 45 49 49 49 49
    44 4E 4F 4F 4F 4F 5D F7 D8 55 55 55 59 59 DE FF
    </map>
    </collation>
  3. Modify the ctype array in latin1.xml. Change the value corresponding to 0x2D (which is the code for the '-' character) from 10 (punctuation) to 01 (uppercase letter). In the following array, this is the element in the fourth row down, third value from the end.

    <ctype>
    <map>
    00
    20 20 20 20 20 20 20 20 20 28 28 28 28 28 20 20
    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    48 10 10 10 10 10 10 10 10 10 10 10 10 01 10 10
    84 84 84 84 84 84 84 84 84 84 10 10 10 10 10 10
    10 81 81 81 81 81 81 01 01 01 01 01 01 01 01 01
    01 01 01 01 01 01 01 01 01 01 01 10 10 10 10 10
    10 82 82 82 82 82 82 02 02 02 02 02 02 02 02 02
    02 02 02 02 02 02 02 02 02 02 02 10 10 10 10 20
    10 00 10 02 10 10 10 10 10 10 01 10 01 00 01 00
    00 10 10 10 10 10 10 10 10 10 02 10 02 00 02 01
    48 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
    10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
    01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
    01 01 01 01 01 01 01 10 01 01 01 01 01 01 01 02
    02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
    02 02 02 02 02 02 02 10 02 02 02 02 02 02 02 02
    </map>
    </ctype>
  4. Restart the server.

  5. To employ the new collation, include it in the definition of columns that are to use it:

    mysql> DROP TABLE IF EXISTS t1;
    Query OK, 0 rows affected (0.13 sec)

    mysql> CREATE TABLE t1 (
        ->   a TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci,
        ->   FULLTEXT INDEX(a)
        -> ) ENGINE=MyISAM;
    Query OK, 0 rows affected (0.47 sec)
  6. Test the collation to verify that hyphen is considered as a word character:

    mysql> INSERT INTO t1 VALUES ('----'),('....'),('abcd');
    Query OK, 3 rows affected (0.22 sec)
    Records: 3  Duplicates: 0  Warnings: 0

    mysql> SELECT * FROM t1 WHERE MATCH a AGAINST ('----' IN BOOLEAN MODE);
    +------+
    | a    |
    +------+
    | ---- |
    +------+
    1 row in set (0.00 sec)

12.10. Cast Functions and Operators

Table 12.14. Cast Functions

Name       Description
BINARY     Cast a string to a binary string
CAST()     Cast a value as a certain type
CONVERT()  Cast a value as a certain type

  • BINARY

    The BINARY operator casts the string following it to a binary string. This is an easy way to force a column comparison to be done byte by byte rather than character by character. This causes the comparison to be case sensitive even if the column is not defined as BINARY or BLOB. BINARY also causes trailing spaces to be significant.

    mysql> SELECT 'a' = 'A';
            -> 1
    mysql> SELECT BINARY 'a' = 'A';
            -> 0
    mysql> SELECT 'a' = 'a ';
            -> 1
    mysql> SELECT BINARY 'a' = 'a ';
            -> 0

    In a comparison, BINARY affects the entire operation; it can be given before either operand with the same result.

    BINARY str is shorthand for CAST(str AS BINARY).

    Note that in some contexts, if you cast an indexed column to BINARY, MySQL is not able to use the index efficiently.

  • CAST(expr AS type)

    The CAST() function takes an expression of any type and produces a result value of a specified type, similar to CONVERT(). See the description of CONVERT() for more information.

  • CONVERT(expr,type), CONVERT(expr USING transcoding_name)

    The CONVERT() and CAST() functions take an expression of any type and produce a result value of a specified type.

    The type for the result can be one of the following values:

    • BINARY[(N)]

    • CHAR[(N)]

    • DATE

    • DATETIME

    • DECIMAL[(M[,D])]

    • SIGNED [INTEGER]

    • TIME

    • UNSIGNED [INTEGER]

    BINARY produces a string with the BINARY data type. See Section 11.4.2, "The BINARY and VARBINARY Types" for a description of how this affects comparisons. If the optional length N is given, BINARY(N) causes the cast to use no more than N bytes of the argument. Values shorter than N bytes are padded with 0x00 bytes to a length of N.

    CHAR(N) causes the cast to use no more than N characters of the argument.

    CAST() and CONVERT(... USING ...) are standard SQL syntax. The non-USING form of CONVERT() is ODBC syntax.

    CONVERT() with USING is used to convert data between different character sets. In MySQL, transcoding names are the same as the corresponding character set names. For example, this statement converts the string 'abc' in the default character set to the corresponding string in the utf8 character set:

    SELECT CONVERT('abc' USING utf8);

Normally, you cannot compare a BLOB value or other binary string in case-insensitive fashion because binary strings have no character set, and thus no concept of lettercase. To perform a case-insensitive comparison, use the CONVERT() function to convert the value to a nonbinary string. Comparisons of the result use the string collation. For example, if the character set of the result has a case-insensitive collation, a LIKE operation is not case sensitive:

SELECT 'A' LIKE CONVERT(blob_col USING latin1) FROM tbl_name;

To use a different character set, substitute its name for latin1 in the preceding statement. To specify a particular collation for the converted string, use a COLLATE clause following the CONVERT() call, as described in Section 10.1.9.2, "CONVERT() and CAST()". For example, to use latin1_german1_ci:

SELECT 'A' LIKE CONVERT(blob_col USING latin1) COLLATE latin1_german1_ci  FROM tbl_name;

CONVERT() can be used more generally for comparing strings that are represented in different character sets.

LOWER() (and UPPER()) are ineffective when applied to binary strings (BINARY, VARBINARY, BLOB). To perform lettercase conversion, convert the string to a nonbinary string:

mysql> SET @str = BINARY 'New York';
mysql> SELECT LOWER(@str), LOWER(CONVERT(@str USING latin1));
+-------------+-----------------------------------+
| LOWER(@str) | LOWER(CONVERT(@str USING latin1)) |
+-------------+-----------------------------------+
| New York    | new york                          |
+-------------+-----------------------------------+

The cast functions are useful when you want to create a column with a specific type in a CREATE TABLE ... SELECT statement:

CREATE TABLE new_table SELECT CAST('2000-01-01' AS DATE);

The functions also can be useful for sorting ENUM columns in lexical order. Normally, sorting of ENUM columns occurs using the internal numeric values. Casting the values to CHAR results in a lexical sort:

SELECT enum_col FROM tbl_name ORDER BY CAST(enum_col AS CHAR);
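To see why the two orderings can differ, consider a hypothetical column defined as ENUM('small','medium','large'). The Python sketch below models the internal numeric value as the 1-based position of each member in the definition. The column definition and the sample rows are assumptions for illustration, and plain code-point ordering stands in for the column's collation.

```python
# Members in definition order; the internal numeric value is the 1-based position.
members = ["small", "medium", "large"]
index_of = {name: pos for pos, name in enumerate(members, start=1)}

rows = ["large", "small", "medium", "small"]

# ORDER BY enum_col: sorts by the internal numeric value.
by_index = sorted(rows, key=index_of.__getitem__)

# ORDER BY CAST(enum_col AS CHAR): sorts by the member's text.
by_text = sorted(rows)

print(by_index)  # ['small', 'small', 'medium', 'large']
print(by_text)   # ['large', 'medium', 'small', 'small']
```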

CAST(str AS BINARY) is the same thing as BINARY str. CAST(expr AS CHAR) treats the expression as a string with the default character set.

CAST() also changes the result if you use it as part of a more complex expression such as CONCAT('Date: ',CAST(NOW() AS DATE)).

You should not use CAST() to extract data in different formats but instead use string functions like LEFT() or EXTRACT(). See Section 12.7, "Date and Time Functions".

To cast a string to a numeric value in numeric context, you normally do not have to do anything other than to use the string value as though it were a number:

mysql> SELECT 1+'1';
        -> 2

If you use a string in an arithmetic operation, it is converted to a floating-point number during expression evaluation.

If you use a number in string context, the number automatically is converted to a string:

mysql> SELECT CONCAT('hello you ',2);
        -> 'hello you 2'

For information about implicit conversion of numbers to strings, see Section 12.2, "Type Conversion in Expression Evaluation".

MySQL supports arithmetic with both signed and unsigned 64-bit values. If you are using numeric operators (such as + or -) and one of the operands is an unsigned integer, the result is unsigned by default (see Section 12.6.1, "Arithmetic Operators"). You can override this by using the SIGNED or UNSIGNED cast operator to cast a value to a signed or unsigned 64-bit integer, respectively.

mysql> SELECT CAST(1-2 AS UNSIGNED);
        -> 18446744073709551615
mysql> SELECT CAST(CAST(1-2 AS UNSIGNED) AS SIGNED);
        -> -1

If either operand is a floating-point value, the result is a floating-point value and is not affected by the preceding rule. (In this context, DECIMAL column values are regarded as floating-point values.)

mysql> SELECT CAST(1 AS UNSIGNED) - 2.0;
        -> -1.0
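The SIGNED and UNSIGNED integer casts shown earlier can be understood as two readings of the same 64 bits. The following Python sketch models that reinterpretation for integer values that fit in 64 bits. The helper names are hypothetical, and the sketch does not reproduce MySQL's warnings or its handling of non-integer or out-of-range input.

```python
MODULUS = 2 ** 64        # number of distinct 64-bit patterns
SIGN_BIT = 2 ** 63       # patterns at or above this are negative when read as signed


def cast_unsigned(n):
    """Model of CAST(n AS UNSIGNED) for an integer n that fits in 64 bits."""
    return n % MODULUS


def cast_signed(n):
    """Model of CAST(n AS SIGNED) for an integer n that fits in 64 bits."""
    n %= MODULUS
    return n - MODULUS if n >= SIGN_BIT else n


wrapped = cast_unsigned(1 - 2)
print(wrapped)               # 18446744073709551615, matching CAST(1-2 AS UNSIGNED)
print(cast_signed(wrapped))  # -1
```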

The SQL mode affects the result of conversion operations. Examples:

  • If you convert a "zero" date string to a date, CONVERT() and CAST() return NULL and produce a warning when the NO_ZERO_DATE SQL mode is enabled.

  • For integer subtraction, if the NO_UNSIGNED_SUBTRACTION SQL mode is enabled, the subtraction result is signed even if any operand is unsigned.

For more information, see Section 5.1.7, "Server SQL Modes".

12.11. XML Functions

Table 12.15. XML Functions

Name            Description
ExtractValue()  Extracts a value from an XML string using XPath notation
UpdateXML()     Return replaced XML fragment

This section discusses XML and related functionality in MySQL.

Note

It is possible to obtain XML-formatted output from MySQL in the mysql and mysqldump clients by invoking them with the --xml option. See Section 4.5.1, "mysql - The MySQL Command-Line Tool", and Section 4.5.4, "mysqldump - A Database Backup Program".

Two functions providing basic XPath 1.0 (XML Path Language, version 1.0) capabilities are available. Some basic information about XPath syntax and usage is provided later in this section; however, an in-depth discussion of these topics is beyond the scope of this Manual, and you should refer to the XML Path Language (XPath) 1.0 standard for definitive information. A useful resource for those new to XPath or who desire a refresher in the basics is the Zvon.org XPath Tutorial, which is available in several languages.

Note

These functions remain under development. We continue to improve these and other aspects of XML and XPath functionality in MySQL 5.5 and onwards. You may discuss these, ask questions about them, and obtain help from other users with them in the MySQL XML User Forum.

XPath expressions used with these functions support user variables and local stored program variables. User variables are weakly checked; variables local to stored programs are strongly checked (see also Bug #26518):

  • User variables (weak checking). Variables using the syntax $@variable_name (that is, user variables) are not checked. No warnings or errors are issued by the server if a variable has the wrong type or has previously not been assigned a value. This also means the user is fully responsible for any typographical errors, since no warnings will be given if (for example) $@myvairable is used where $@myvariable was intended.

    Example:

    mysql> SET @xml = '<a><b>X</b><b>Y</b></a>';
    Query OK, 0 rows affected (0.00 sec)

    mysql> SET @i =1, @j = 2;
    Query OK, 0 rows affected (0.00 sec)

    mysql> SELECT @i, ExtractValue(@xml, '//b[$@i]');
    +------+--------------------------------+
    | @i   | ExtractValue(@xml, '//b[$@i]') |
    +------+--------------------------------+
    | 1    | X                              |
    +------+--------------------------------+
    1 row in set (0.00 sec)

    mysql> SELECT @j, ExtractValue(@xml, '//b[$@j]');
    +------+--------------------------------+
    | @j   | ExtractValue(@xml, '//b[$@j]') |
    +------+--------------------------------+
    | 2    | Y                              |
    +------+--------------------------------+
    1 row in set (0.00 sec)

    mysql> SELECT @k, ExtractValue(@xml, '//b[$@k]');
    +------+--------------------------------+
    | @k   | ExtractValue(@xml, '//b[$@k]') |
    +------+--------------------------------+
    | NULL |                                |
    +------+--------------------------------+
    1 row in set (0.00 sec)

  • Variables in stored programs (strong checking). Variables using the syntax $variable_name can be declared and used with these functions when they are called inside stored programs. Such variables are local to the stored program in which they are defined, and are strongly checked for type and value.

    Example:

    mysql> DELIMITER |

    mysql> CREATE PROCEDURE myproc ()
        -> BEGIN
        ->   DECLARE i INT DEFAULT 1;
        ->   DECLARE xml VARCHAR(25) DEFAULT '<a>X</a><a>Y</a><a>Z</a>';
        ->
        ->   WHILE i < 4 DO
        ->     SELECT xml, i, ExtractValue(xml, '//a[$i]');
        ->     SET i = i+1;
        ->   END WHILE;
        -> END |
    Query OK, 0 rows affected (0.01 sec)

    mysql> DELIMITER ;

    mysql> CALL myproc;
    +--------------------------+---+------------------------------+
    | xml                      | i | ExtractValue(xml, '//a[$i]') |
    +--------------------------+---+------------------------------+
    | <a>X</a><a>Y</a><a>Z</a> | 1 | X                            |
    +--------------------------+---+------------------------------+
    1 row in set (0.00 sec)

    +--------------------------+---+------------------------------+
    | xml                      | i | ExtractValue(xml, '//a[$i]') |
    +--------------------------+---+------------------------------+
    | <a>X</a><a>Y</a><a>Z</a> | 2 | Y                            |
    +--------------------------+---+------------------------------+
    1 row in set (0.01 sec)

    +--------------------------+---+------------------------------+
    | xml                      | i | ExtractValue(xml, '//a[$i]') |
    +--------------------------+---+------------------------------+
    | <a>X</a><a>Y</a><a>Z</a> | 3 | Z                            |
    +--------------------------+---+------------------------------+
    1 row in set (0.01 sec)

    Parameters. Variables used in XPath expressions inside stored routines that are passed in as parameters are also subject to strong checking.

Expressions containing user variables or variables local to stored programs must otherwise (except for notation) conform to the rules for XPath expressions containing variables as given in the XPath 1.0 specification.

Note

Currently, a user variable used to store an XPath expression is treated as an empty string. Because of this, it is not possible to store an XPath expression as a user variable. (Bug #32911)

  • ExtractValue(xml_frag, xpath_expr)

    ExtractValue() takes two string arguments, a fragment of XML markup xml_frag and an XPath expression xpath_expr (also known as a locator); it returns the text (CDATA) of the first text node which is a child of the element or elements matched by the XPath expression. In MySQL 5.5, the XPath expression can contain at most 127 characters. (This limitation is lifted in MySQL 5.6.)

    Using this function is the equivalent of performing a match using the xpath_expr after appending /text(). In other words, ExtractValue('<a><b>Sakila</b></a>', '/a/b') and ExtractValue('<a><b>Sakila</b></a>', '/a/b/text()') produce the same result.

    If multiple matches are found, the content of the first child text node of each matching element is returned (in the order matched) as a single, space-delimited string.

    If no matching text node is found for the expression (including the implicit /text()), an empty string is returned. This holds whatever the reason for the lack of a match, as long as xpath_expr is valid and xml_frag consists of elements which are properly nested and closed. No distinction is made between a match on an empty element and no match at all. This is by design.

    If you need to determine whether no matching element was found in xml_frag or such an element was found but contained no child text nodes, you should test the result of an expression that uses the XPath count() function. For example, both of these statements return an empty string, as shown here:

    mysql> SELECT ExtractValue('<a><b/></a>', '/a/b');
    +-------------------------------------+
    | ExtractValue('<a><b/></a>', '/a/b') |
    +-------------------------------------+
    |                                     |
    +-------------------------------------+
    1 row in set (0.00 sec)

    mysql> SELECT ExtractValue('<a><c/></a>', '/a/b');
    +-------------------------------------+
    | ExtractValue('<a><c/></a>', '/a/b') |
    +-------------------------------------+
    |                                     |
    +-------------------------------------+
    1 row in set (0.00 sec)

    However, you can determine whether there was actually a matching element using the following:

    mysql> SELECT ExtractValue('<a><b/></a>', 'count(/a/b)');
    +--------------------------------------------+
    | ExtractValue('<a><b/></a>', 'count(/a/b)') |
    +--------------------------------------------+
    | 1                                          |
    +--------------------------------------------+
    1 row in set (0.00 sec)

    mysql> SELECT ExtractValue('<a><c/></a>', 'count(/a/b)');
    +--------------------------------------------+
    | ExtractValue('<a><c/></a>', 'count(/a/b)') |
    +--------------------------------------------+
    | 0                                          |
    +--------------------------------------------+
    1 row in set (0.01 sec)

    Important

    ExtractValue() returns only CDATA, and does not return any tags that might be contained within a matching tag, nor any of their content (see the result returned as val1 in the following example).

    mysql> SELECT
        ->   ExtractValue('<a>ccc<b>ddd</b></a>', '/a') AS val1,
        ->   ExtractValue('<a>ccc<b>ddd</b></a>', '/a/b') AS val2,
        ->   ExtractValue('<a>ccc<b>ddd</b></a>', '//b') AS val3,
        ->   ExtractValue('<a>ccc<b>ddd</b></a>', '/b') AS val4,
        ->   ExtractValue('<a>ccc<b>ddd</b><b>eee</b></a>', '//b') AS val5;
    +------+------+------+------+---------+
    | val1 | val2 | val3 | val4 | val5    |
    +------+------+------+------+---------+
    | ccc  | ddd  | ddd  |      | ddd eee |
    +------+------+------+------+---------+

    This function uses the current SQL collation for making comparisons with contains(), performing the same collation aggregation as other string functions (such as CONCAT()), in taking into account the collation coercibility of their arguments; see Section 10.1.7.5, "Collation of Expressions", for an explanation of the rules governing this behavior.

    Previously, a binary (that is, case-sensitive) comparison was always used.

    NULL is returned if xml_frag contains elements which are not properly nested or closed, and a warning is generated, as shown in this example:

    mysql> SELECT ExtractValue('<a>c</a><b', '//a');
    +-----------------------------------+
    | ExtractValue('<a>c</a><b', '//a') |
    +-----------------------------------+
    | NULL                              |
    +-----------------------------------+
    1 row in set, 1 warning (0.00 sec)

    mysql> SHOW WARNINGS;
    +---------+------+-----------------------------------------------------+
    | Level   | Code | Message                                             |
    +---------+------+-----------------------------------------------------+
    | Warning | 1523 | Incorrect XML value: 'parse error at line 1 pos 11: |
    |         |      | END-OF-INPUT unexpected ('>' wanted)'               |
    +---------+------+-----------------------------------------------------+
    1 row in set (0.00 sec)

    mysql> SELECT ExtractValue('<a>c</a><b/>', '//a');
    +-------------------------------------+
    | ExtractValue('<a>c</a><b/>', '//a') |
    +-------------------------------------+
    | c                                   |
    +-------------------------------------+
    1 row in set (0.00 sec)

  • UpdateXML(xml_target, xpath_expr, new_xml)

    This function replaces a single portion of a given fragment of XML markup xml_target with a new XML fragment new_xml, and then returns the changed XML. The portion of xml_target that is replaced matches an XPath expression xpath_expr supplied by the user. In MySQL 5.5, the XPath expression can contain at most 127 characters. (This limitation is lifted in MySQL 5.6.)

    If no expression matching xpath_expr is found, or if multiple matches are found, the function returns the original xml_target XML fragment. All three arguments should be strings.

    mysql> SELECT
        ->   UpdateXML('<a><b>ccc</b><d></d></a>', '/a', '<e>fff</e>') AS val1,
        ->   UpdateXML('<a><b>ccc</b><d></d></a>', '/b', '<e>fff</e>') AS val2,
        ->   UpdateXML('<a><b>ccc</b><d></d></a>', '//b', '<e>fff</e>') AS val3,
        ->   UpdateXML('<a><b>ccc</b><d></d></a>', '/a/d', '<e>fff</e>') AS val4,
        ->   UpdateXML('<a><d></d><b>ccc</b><d></d></a>', '/a/d', '<e>fff</e>') AS val5
        -> \G

    *************************** 1. row ***************************
    val1: <e>fff</e>
    val2: <a><b>ccc</b><d></d></a>
    val3: <a><e>fff</e><d></d></a>
    val4: <a><b>ccc</b><e>fff</e></a>
    val5: <a><d></d><b>ccc</b><d></d></a>

Note

An in-depth discussion of XPath syntax and usage is beyond the scope of this Manual. Please see the XML Path Language (XPath) 1.0 specification for definitive information. A useful resource for those new to XPath, or for those who want a refresher in the basics, is the Zvon.org XPath Tutorial, which is available in several languages.

Descriptions and examples of some basic XPath expressions follow:

  • /tag

    Matches <tag/> if and only if <tag/> is the root element.

    Example: /a has a match in <a><b/></a> because it matches the outermost (root) tag. It does not match the inner a element in <b><a/></b> because in this instance it is the child of another element.

  • /tag1/tag2

    Matches <tag2/> if and only if it is a child of <tag1/>, and <tag1/> is the root element.

    Example: /a/b matches the b element in the XML fragment <a><b/></a> because it is a child of the root element a. It does not have a match in <b><a/></b> because in this case, b is the root element (and hence the child of no other element). Nor does the XPath expression have a match in <a><c><b/></c></a>; here, b is a descendant of a, but not actually a child of a.

    This construct is extendable to three or more elements. For example, the XPath expression /a/b/c matches the c element in the fragment <a><b><c/></b></a>.

  • //tag

    Matches any instance of <tag>.

    Example: //a matches the a element in any of the following: <a><b><c/></b></a>; <c><a><b/></a></c>; <c><b><a/></b></c>.

    // can be combined with /. For example, //a/b matches the b element in either of the fragments <a><b/></a> or <a><b><c/></b></a>.

    Note

    //tag is the equivalent of /descendant-or-self::*/tag. A common error is to confuse this with /descendant-or-self::tag, although the latter expression can actually lead to very different results, as can be seen here:

    mysql> SET @xml = '<a><b><c>w</c><b>x</b><d>y</d>z</b></a>';
    Query OK, 0 rows affected (0.00 sec)

    mysql> SELECT @xml;
    +-----------------------------------------+
    | @xml                                    |
    +-----------------------------------------+
    | <a><b><c>w</c><b>x</b><d>y</d>z</b></a> |
    +-----------------------------------------+
    1 row in set (0.00 sec)

    mysql> SELECT ExtractValue(@xml, '//b[1]');
    +------------------------------+
    | ExtractValue(@xml, '//b[1]') |
    +------------------------------+
    | x z                          |
    +------------------------------+
    1 row in set (0.00 sec)

    mysql> SELECT ExtractValue(@xml, '//b[2]');
    +------------------------------+
    | ExtractValue(@xml, '//b[2]') |
    +------------------------------+
    |                              |
    +------------------------------+
    1 row in set (0.01 sec)

    mysql> SELECT ExtractValue(@xml, '/descendant-or-self::*/b[1]');
    +---------------------------------------------------+
    | ExtractValue(@xml, '/descendant-or-self::*/b[1]') |
    +---------------------------------------------------+
    | x z                                               |
    +---------------------------------------------------+
    1 row in set (0.06 sec)

    mysql> SELECT ExtractValue(@xml, '/descendant-or-self::*/b[2]');
    +---------------------------------------------------+
    | ExtractValue(@xml, '/descendant-or-self::*/b[2]') |
    +---------------------------------------------------+
    |                                                   |
    +---------------------------------------------------+
    1 row in set (0.00 sec)

    mysql> SELECT ExtractValue(@xml, '/descendant-or-self::b[1]');
    +-------------------------------------------------+
    | ExtractValue(@xml, '/descendant-or-self::b[1]') |
    +-------------------------------------------------+
    | z                                               |
    +-------------------------------------------------+
    1 row in set (0.00 sec)

    mysql> SELECT ExtractValue(@xml, '/descendant-or-self::b[2]');
    +-------------------------------------------------+
    | ExtractValue(@xml, '/descendant-or-self::b[2]') |
    +-------------------------------------------------+
    | x                                               |
    +-------------------------------------------------+
    1 row in set (0.00 sec)

  • The * operator acts as a "wildcard" that matches any element. For example, the expression /*/b matches the b element in either of the XML fragments <a><b/></a> or <c><b/></c>. However, the expression does not produce a match in the fragment <b><a/></b> because b must be a child of some other element. The wildcard may be used in any position: The expression /*/b/* will match any child of a b element that is itself not the root element.

  • You can match any of several locators using the | (UNION) operator. For example, the expression //b|//c matches all b and c elements in the XML target.

  • It is also possible to match an element based on the value of one or more of its attributes. This is done using the syntax tag[@attribute="value"]. For example, the expression //b[@id="idB"] matches the second b element in the fragment <a><b id="idA"/><c/><b id="idB"/></a>. To match against any element having attribute="value", use the XPath expression //*[@attribute="value"].

    To filter multiple attribute values, simply use multiple attribute-comparison clauses in succession. For example, the expression //b[@c="x"][@d="y"] matches the element <b c="x" d="y"/> occurring anywhere in a given XML fragment.

    To find elements for which the same attribute matches any of several values, you can use multiple locators joined by the | operator. For example, to match all b elements whose c attributes have either of the values 23 or 17, use the expression //b[@c="23"]|//b[@c="17"]. You can also use the logical or operator for this purpose: //b[@c="23" or @c="17"].

    Note

    The difference between or and | is that or joins conditions, while | joins result sets.
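The distinction between joining result sets and joining conditions can be sketched outside MySQL. Python's xml.etree.ElementTree implements only a subset of XPath and supports neither | nor or, so the example below emulates both: it merges two result sets for |, and tests a combined condition on each element for or. The sample fragment is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<a><b c="23"/><b c="5"/><b c="17"/></a>')

# "|" joins result sets: run two locators, then merge what they return.
union = root.findall(".//b[@c='23']") + root.findall(".//b[@c='17']")

# "or" joins conditions: visit each b once and test both comparisons on it.
either = [b for b in root.iter("b") if b.get("c") == "23" or b.get("c") == "17"]

# Both approaches select the same elements.
print({id(e) for e in union} == {id(e) for e in either})  # True
print([b.get("c") for b in either])                        # ['23', '17']
```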

XPath Limitations. The XPath syntax supported by these functions is currently subject to the following limitations:

  • Nodeset-to-nodeset comparison (such as '/a/b[@c=@d]') is not supported.

  • All of the standard XPath comparison operators are supported. (Bug #22823)

  • Relative locator expressions are resolved in the context of the root node. For example, consider the following query and result:

    mysql> SELECT ExtractValue(
        ->   '<a><b c="1">X</b><b c="2">Y</b></a>',
        ->   'a/b'
        -> ) AS result;
    +--------+
    | result |
    +--------+
    | X Y    |
    +--------+
    1 row in set (0.03 sec)

    In this case, the locator a/b resolves to /a/b.

    Relative locators are also supported within predicates. In the following example, d[../@c="1"] is resolved as /a/b[@c="1"]/d:

    mysql> SELECT ExtractValue(
        ->  '<a>
        ->    <b c="1"><d>X</d></b>
        ->    <b c="2"><d>X</d></b>
        ->  </a>',
        ->  'a/b/d[../@c="1"]')
        -> AS result;
    +--------+
    | result |
    +--------+
    | X      |
    +--------+
    1 row in set (0.00 sec)

  • Locators prefixed with expressions that evaluate as scalar values (including variable references, literals, numbers, and scalar function calls) are not permitted, and their use results in an error.

  • The :: operator is not supported in combination with node types such as the following:

    • axis::comment()

    • axis::text()

    • axis::processing-instruction()

    • axis::node()

    However, name tests (such as axis::name and axis::*) are supported, as shown in these examples:

    mysql> SELECT ExtractValue('<a><b>x</b><c>y</c></a>','/a/child::b');
    +-------------------------------------------------------+
    | ExtractValue('<a><b>x</b><c>y</c></a>','/a/child::b') |
    +-------------------------------------------------------+
    | x                                                     |
    +-------------------------------------------------------+
    1 row in set (0.02 sec)

    mysql> SELECT ExtractValue('<a><b>x</b><c>y</c></a>','/a/child::*');
    +-------------------------------------------------------+
    | ExtractValue('<a><b>x</b><c>y</c></a>','/a/child::*') |
    +-------------------------------------------------------+
    | x y                                                   |
    +-------------------------------------------------------+
    1 row in set (0.01 sec)

  • "Up-and-down" navigation is not supported in cases where the path would lead "above" the root element. That is, you cannot use expressions which match on descendants of ancestors of a given element, where one or more of the ancestors of the current element is also an ancestor of the root element (see Bug #16321).

  • The following XPath functions are not supported, or have known issues as indicated:

    • id()

    • lang()

    • local-name()

    • name()

    • namespace-uri()

    • normalize-space()

    • starts-with()

    • string()

    • substring-after()

    • substring-before()

    • translate()

  • The following axes are not supported:

    • following-sibling

    • following

    • preceding-sibling

    • preceding

XPath expressions passed as arguments to ExtractValue() and UpdateXML() may contain the colon character (":") in element selectors, which enables their use with markup employing XML namespaces notation. For example:

mysql> SET @xml = '<a>111<b:c>222<d>333</d><e:f>444</e:f></b:c></a>';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT ExtractValue(@xml, '//e:f');
+-----------------------------+
| ExtractValue(@xml, '//e:f') |
+-----------------------------+
| 444                         |
+-----------------------------+
1 row in set (0.00 sec)

mysql> SELECT UpdateXML(@xml, '//b:c', '<g:h>555</g:h>');
+--------------------------------------------+
| UpdateXML(@xml, '//b:c', '<g:h>555</g:h>') |
+--------------------------------------------+
| <a>111<g:h>555</g:h></a>                   |
+--------------------------------------------+
1 row in set (0.00 sec)

This is similar in some respects to what is permitted by Apache Xalan and some other parsers, and is much simpler than requiring namespace declarations or the use of the namespace-uri() and local-name() functions.

Error handling. For both ExtractValue() and UpdateXML(), the XPath locator used must be valid and the XML to be searched must consist of elements which are properly nested and closed. If the locator is invalid, an error is generated:

mysql> SELECT ExtractValue('<a>c</a><b/>', '/&a');
ERROR 1105 (HY000): XPATH syntax error: '&a'

If xml_frag does not consist of elements which are properly nested and closed, NULL is returned and a warning is generated, as shown in this example:

mysql> SELECT ExtractValue('<a>c</a><b', '//a');
+-----------------------------------+
| ExtractValue('<a>c</a><b', '//a') |
+-----------------------------------+
| NULL                              |
+-----------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+---------+------+-------------------------------------------------------------+
| Level   | Code | Message                                                     |
+---------+------+-------------------------------------------------------------+
| Warning | 1523 | Incorrect XML value: 'parse error at line 1 pos 11:         |
|         |      | END-OF-INPUT unexpected ('>' wanted)'                       |
+---------+------+-------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT ExtractValue('<a>c</a><b/>', '//a');
+-------------------------------------+
| ExtractValue('<a>c</a><b/>', '//a') |
+-------------------------------------+
| c                                   |
+-------------------------------------+
1 row in set (0.00 sec)

Important

The replacement XML used as the third argument to UpdateXML() is not checked to determine whether it consists solely of elements which are properly nested and closed.

XPath Injection. Code injection occurs when malicious code is introduced into the system to gain unauthorized access to privileges and data. It is based on exploiting assumptions made by developers about the type and content of data input from users. XPath is no exception in this regard.

A common scenario in which this can happen is the case of an application that handles authorization by matching the combination of a login name and password with those found in an XML file, using an XPath expression like this one:

//user[login/text()='neapolitan' and password/text()='1c3cr34m']/attribute::id

This is the XPath equivalent of an SQL statement like this one:

SELECT id FROM users WHERE login='neapolitan' AND password='1c3cr34m';

A PHP application employing XPath might handle the login process like this:

<?php

  $file     =   "users.xml";

  $login    =   $_POST["login"];
  $password =   $_POST["password"];

  $xpath = "//user[login/text()=$login and password/text()=$password]/attribute::id";

  if( file_exists($file) )
  {
    $xml = simplexml_load_file($file);

    if($result = $xml->xpath($xpath))
      echo "You are now logged in as user $result[0].";
    else
      echo "Invalid login name or password.";
  }
  else
    exit("Failed to open $file.");

?>

No checks are performed on the input. This means that a malevolent user can "short-circuit" the test by entering ' or 1=1 for both the login name and password, resulting in $xpath being evaluated as shown here:

//user[login/text()='' or 1=1 and password/text()='' or 1=1]/attribute::id

Since the expression inside the square brackets always evaluates as true, it is effectively the same as this one, which matches the id attribute of every user element in the XML document:

//user/attribute::id

One way in which this particular attack can be circumvented is simply by quoting the variable names to be interpolated in the definition of $xpath, forcing the values passed from a Web form to be converted to strings:

$xpath = "//user[login/text()='$login' and password/text()='$password']/attribute::id";

This is the same strategy that is often recommended for preventing SQL injection attacks. In general, the practices you should follow for preventing XPath injection attacks are the same as for preventing SQL injection:

  • Never accept untested data from users in your application.

  • Check all user-submitted data for type; reject or convert data that is of the wrong type.

  • Test numeric data for out of range values; truncate, round, or reject values that are out of range. Test strings for illegal characters and either strip them out or reject input containing them.

  • Do not output explicit error messages that might provide an unauthorized user with clues that could be used to compromise the system; log these to a file or database table instead.
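The practices above can be sketched outside MySQL. The following Python example is only an illustration under stated assumptions: the sample document, the allow-list pattern, and the function name find_user_id are hypothetical and do not come from this manual. It rejects input that fails a strict allow-list check and compares values in code rather than interpolating them into an XPath string.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for users.xml.
USERS_XML = """
<users>
  <user id="00327"><login>neapolitan</login><password>1c3cr34m</password></user>
  <user id="13579"><login>spumoni</login><password>p1st4ch10</password></user>
</users>
"""

# Assumed policy: only letters, digits, and underscore, 1 to 32 characters.
SAFE_VALUE = re.compile(r"[A-Za-z0-9_]{1,32}")


def find_user_id(xml_text, login, password):
    """Return the id attribute of the matching user, or None."""
    # Check type and permitted characters before the values are used at all.
    for value in (login, password):
        if not isinstance(value, str) or not SAFE_VALUE.fullmatch(value):
            return None
    root = ET.fromstring(xml_text)
    # Compare in code; no user input is ever spliced into an XPath expression.
    for user in root.iter("user"):
        if (user.findtext("login") == login
                and user.findtext("password") == password):
            return user.get("id")
    return None
```

With this approach the string ' or 1=1 is rejected by the allow-list before any lookup takes place, and the caller sees only a generic "no match" result.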

Just as SQL injection attacks can be used to obtain information about database schemas, so can XPath injection be used to traverse XML files to uncover their structure, as discussed in Amit Klein's paper Blind XPath Injection (PDF file, 46KB).

It is also important to check the output being sent back to the client. Consider what can happen when we use the MySQL ExtractValue() function:

mysql> SELECT ExtractValue(
    ->     LOAD_FILE('users.xml'),
    ->     '//user[login/text()="" or 1=1 and password/text()="" or 1=1]/attribute::id'
    -> ) AS id;
+-------------------------------+
| id                            |
+-------------------------------+
| 00327 13579 02403 42354 28570 |
+-------------------------------+
1 row in set (0.01 sec)

Because ExtractValue() returns multiple matches as a single space-delimited string, this injection attack provides every valid ID contained within users.xml to the user as a single row of output. As an extra safeguard, you should also test output before returning it to the user. Here is a simple example:

mysql> SET @id = ExtractValue(
    ->     LOAD_FILE('users.xml'),
    ->     '//user[login/text()="" or 1=1 and password/text()="" or 1=1]/attribute::id'
    -> );
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT IF(
    ->     INSTR(@id, ' ') = 0,
    ->     @id,
    ->     'Unable to retrieve user ID')
    -> AS singleID;
+----------------------------+
| singleID                   |
+----------------------------+
| Unable to retrieve user ID |
+----------------------------+
1 row in set (0.00 sec)

In general, the guidelines for returning data to users securely are the same as for accepting user input. These can be summed up as:

  • Always test outgoing data for type and permissible values.

  • Never permit unauthorized users to view error messages that might provide information about the application that could be used to exploit it.
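The same output check can also be applied in application code before a value is shown to the user. The following Python sketch is an illustration only; the function name single_id_or_none is hypothetical. Unlike the INSTR() test shown earlier, it also treats an empty result as a failure.

```python
def single_id_or_none(extracted):
    """Return the value only if it is exactly one whitespace-free token."""
    if extracted is None:
        return None
    tokens = extracted.split()
    # More than one token means several nodes matched; zero means none did.
    if len(tokens) != 1 or tokens[0] != extracted:
        return None
    return extracted
```

A caller would display a generic message such as "Unable to retrieve user ID" whenever the function returns None.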

12.12. Bit Functions

Table 12.16. Bitwise Functions

Name         Description
BIT_COUNT()  Return the number of bits that are set
&            Bitwise AND
~            Invert bits
|            Bitwise OR
^            Bitwise XOR
<<           Left shift
>>           Right shift

MySQL uses BIGINT (64-bit) arithmetic for bit operations, so these operators have a maximum range of 64 bits.

  • |

    Bitwise OR:

    mysql> SELECT 29 | 15; -> 31

    The result is an unsigned 64-bit integer.

  • &

    Bitwise AND:

    mysql> SELECT 29 & 15; -> 13

    The result is an unsigned 64-bit integer.

  • ^

    Bitwise XOR:

    mysql> SELECT 1 ^ 1; -> 0
    mysql> SELECT 1 ^ 0; -> 1
    mysql> SELECT 11 ^ 3; -> 8

    The result is an unsigned 64-bit integer.

  • <<

    Shifts a longlong (BIGINT) number to the left.

    mysql> SELECT 1 << 2; -> 4

    The result is an unsigned 64-bit integer. The value is truncated to 64 bits. In particular, if the shift count is greater than or equal to the width of an unsigned 64-bit number, the result is zero.

  • >>

    Shifts a longlong (BIGINT) number to the right.

    mysql> SELECT 4 >> 2; -> 1

    The result is an unsigned 64-bit integer. The value is truncated to 64 bits. In particular, if the shift count is greater than or equal to the width of an unsigned 64-bit number, the result is zero.

  • ~

    Invert all bits.

    mysql> SELECT 5 & ~1; -> 4

    The result is an unsigned 64-bit integer.

  • BIT_COUNT(N)

    Returns the number of bits that are set in the argument N.

    mysql> SELECT BIT_COUNT(29), BIT_COUNT(b'101010'); -> 4, 3
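The unsigned 64-bit behavior described in this section can be mimicked outside MySQL. The following Python sketch is an approximation, not the server's implementation: the helper names are hypothetical, shift counts are assumed to be non-negative, and a mask stands in for the 64-bit truncation because Python integers have no fixed width.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # 2**64 - 1, the largest unsigned 64-bit value


def bit_or(a, b):
    return (a | b) & MASK64


def bit_and(a, b):
    return (a & b) & MASK64


def bit_xor(a, b):
    return (a ^ b) & MASK64


def bit_not(a):
    # Invert all 64 bits; the mask keeps the result unsigned.
    return ~a & MASK64


def shift_left(a, count):
    # A shift count of 64 or more yields zero, as described above.
    return 0 if count >= 64 else (a << count) & MASK64


def shift_right(a, count):
    return 0 if count >= 64 else (a & MASK64) >> count


def bit_count(a):
    # Count the bits that are set in the 64-bit representation.
    return bin(a & MASK64).count("1")
```

For example, bit_and(5, bit_not(1)) gives 4 and shift_left(1, 64) gives 0, matching the results described for the corresponding operators.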

12.13. Encryption and Compression Functions

Table 12.17. Encryption Functions

Name                   Description
AES_DECRYPT()          Decrypt using AES
AES_ENCRYPT()          Encrypt using AES
COMPRESS()             Return result as a binary string
DECODE()               Decodes a string encrypted using ENCODE()
DES_DECRYPT()          Decrypt a string
DES_ENCRYPT()          Encrypt a string
ENCODE()               Encode a string
ENCRYPT()              Encrypt a string
MD5()                  Calculate MD5 checksum
OLD_PASSWORD()         Return the value of the pre-4.1 implementation of PASSWORD
PASSWORD()             Calculate and return a password string
SHA1(), SHA()          Calculate an SHA-1 160-bit checksum
SHA2()                 Calculate an SHA-2 checksum
UNCOMPRESS()           Uncompress a string compressed
UNCOMPRESSED_LENGTH()  Return the length of a string before compression

Many encryption and compression functions return strings for which the result might contain arbitrary byte values. If you want to store these results, use a column with a VARBINARY or BLOB binary string data type. This will avoid potential problems with trailing space removal or character set conversion that would change data values, such as may occur if you use a nonbinary string data type (CHAR, VARCHAR, TEXT).

Some encryption functions return strings of ASCII characters: MD5(), OLD_PASSWORD(), PASSWORD(), SHA(), SHA1(). As of MySQL 5.5.3, their return value is a nonbinary string that has a character set and collation determined by the character_set_connection and collation_connection system variables. Before 5.5.3, these functions return binary strings. The same change was made for SHA2() in MySQL 5.5.6.

For versions in which functions such as MD5() or SHA1() return a string of hex digits as a binary string, the return value cannot be converted to uppercase or compared in case-insensitive fashion as is. You must convert the value to a nonbinary string. See the discussion of binary string conversion in Section 12.10, "Cast Functions and Operators".

If an application stores values from a function such as MD5() or SHA1() that returns a string of hex digits, more efficient storage and comparisons can be obtained by converting the hex representation to binary using UNHEX() and storing the result in a BINARY(N) column. Each pair of hex digits requires one byte in binary form, so the value of N depends on the length of the hex string. N is 16 for an MD5() value and 20 for a SHA1() value. For SHA2(), N ranges from 28 to 64 depending on the argument specifying the desired bit length of the result (28 for a 224-bit result, 64 for a 512-bit result).

The size penalty for storing the hex string in a CHAR column is at least two times, up to eight times if the value is stored in a column that uses the utf8 character set (where each character uses 4 bytes). Storing the string also results in slower comparisons because of the larger values and the need to take character set collation rules into account.

Suppose that an application stores MD5() string values in a CHAR(32) column:

CREATE TABLE md5_tbl (md5_val CHAR(32), ...);
INSERT INTO md5_tbl (md5_val, ...) VALUES(MD5('abcdef'), ...);

To convert hex strings to more compact form, modify the application to use UNHEX() and BINARY(16) instead as follows:

CREATE TABLE md5_tbl (md5_val BINARY(16), ...);
INSERT INTO md5_tbl (md5_val, ...) VALUES(UNHEX(MD5('abcdef')), ...);
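The relationship between the hex form and the binary form can be checked outside MySQL. The following Python sketch uses the standard hashlib module; the helper name hex_and_binary is hypothetical, and bytes.fromhex() plays the role that UNHEX() plays in the statements above.

```python
import hashlib


def hex_and_binary(data, algorithm):
    """Return (hex digest string, raw digest bytes) for the given input."""
    hex_digest = hashlib.new(algorithm, data).hexdigest()
    # Each pair of hex digits becomes one byte, as UNHEX() does in MySQL.
    raw = bytes.fromhex(hex_digest)
    return hex_digest, raw


# Length of the hex string and of the binary form for each algorithm.
for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512"):
    hex_digest, raw = hex_and_binary(b"abcdef", name)
    print(name, len(hex_digest), len(raw))
```

The binary form is always half the length of the hex string, which is the source of the storage saving described above.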

Applications should be prepared to handle the very rare case that a hashing function produces the same value for two different input values. One way to make collisions detectable is to make the hash column a primary key.

Note

Exploits for the MD5 and SHA-1 algorithms have become known. You may wish to consider using one of the other encryption functions described in this section instead, such as SHA2().

Caution

Passwords or other sensitive values supplied as arguments to encryption functions are sent in plaintext to the MySQL server unless an SSL connection is used. Also, such values will appear in any MySQL logs to which they are written. To avoid these types of exposure, applications can encrypt sensitive values on the client side before sending them to the server. The same considerations apply to encryption keys. To avoid exposing these, applications can use stored procedures to encrypt and decrypt values on the server side.

  • AES_DECRYPT(crypt_str,key_str)

    This function decrypts data using the official AES (Advanced Encryption Standard) algorithm. For more information, see the description of AES_ENCRYPT().

  • AES_ENCRYPT(str,key_str)

    AES_ENCRYPT() and AES_DECRYPT() enable encryption and decryption of data using the official AES (Advanced Encryption Standard) algorithm, previously known as "Rijndael." Encoding with a 128-bit key length is used, but you can extend it up to 256 bits by modifying the source. We chose 128 bits because it is much faster and it is secure enough for most purposes.

    AES_ENCRYPT() encrypts a string and returns a binary string. AES_DECRYPT() decrypts the encrypted string and returns the original string. The input arguments may be any length. If either argument is NULL, the result of this function is also NULL.

    Because AES is a block-level algorithm, padding is used to encode uneven length strings and so the result string length may be calculated using this formula:

    16 * (trunc(string_length / 16) + 1)

    If AES_DECRYPT() detects invalid data or incorrect padding, it returns NULL. However, it is possible for AES_DECRYPT() to return a non-NULL value (possibly garbage) if the input data or the key is invalid.

    You can use the AES functions to store data in an encrypted form by modifying your queries:

    INSERT INTO t VALUES (1,AES_ENCRYPT('text','password'));

    AES_ENCRYPT() and AES_DECRYPT() can be considered the most cryptographically secure encryption functions currently available in MySQL.
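The length formula given above for the AES_ENCRYPT() result is simple to evaluate when sizing a column. The following Python sketch restates it; the function name aes_encrypt_length is hypothetical, and lengths are assumed to be measured in bytes.

```python
def aes_encrypt_length(string_length):
    """Padded result length: 16 * (trunc(string_length / 16) + 1)."""
    if string_length < 0:
        raise ValueError("string_length must be non-negative")
    # Integer division plays the role of trunc() for non-negative lengths.
    return 16 * (string_length // 16 + 1)


for n in (0, 4, 15, 16, 17, 32):
    print(n, aes_encrypt_length(n))
```

Note that an input whose length is already a multiple of 16 still gains a full block of padding, so a 16-byte value encrypts to 32 bytes.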

  • COMPRESS(string_to_compress)

    Compresses a string and returns the result as a binary string. This function requires MySQL to have been compiled with a compression library such as zlib. Otherwise, the return value is always NULL. The compressed string can be uncompressed with UNCOMPRESS().

    mysql> SELECT LENGTH(COMPRESS(REPEAT('a',1000))); -> 21
    mysql> SELECT LENGTH(COMPRESS('')); -> 0
    mysql> SELECT LENGTH(COMPRESS('a')); -> 13
    mysql> SELECT LENGTH(COMPRESS(REPEAT('a',16))); -> 15

    The compressed string contents are stored the following way:

    • Empty strings are stored as empty strings.

    • Nonempty strings are stored as a 4-byte length of the uncompressed string (low byte first), followed by the compressed string. If the string ends with space, an extra "." character is added to avoid problems with endspace trimming should the result be stored in a CHAR or VARCHAR column. (However, use of nonbinary string data types such as CHAR or VARCHAR to store compressed strings is not recommended anyway because character set conversion may occur. Use a VARBINARY or BLOB binary string column instead.)
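The storage layout described above can be sketched in Python. This is an illustration under assumptions, not the server's code: it assumes the compressed part is an ordinary zlib stream, it does not model the extra "." appended for values ending in a space, and all three function names are hypothetical.

```python
import struct
import zlib


def compress_like(data: bytes) -> bytes:
    """Lay out a value as a 4-byte length header plus compressed data."""
    if not data:
        return b""  # empty strings are stored as empty strings
    # "<I" packs an unsigned 32-bit integer with the low byte first.
    return struct.pack("<I", len(data)) + zlib.compress(data)


def uncompressed_length_like(blob: bytes) -> int:
    """Read the original length back from the 4-byte header."""
    if not blob:
        return 0  # assumed result for an empty value
    return struct.unpack("<I", blob[:4])[0]


def uncompress_like(blob: bytes) -> bytes:
    """Recover the original bytes by skipping the header."""
    if not blob:
        return b""
    return zlib.decompress(blob[4:])
```

Because the original length sits in the header, it can be read without decompressing the data at all.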

  • DECODE(crypt_str,pass_str)

    Decrypts the encrypted string crypt_str using pass_str as the password. crypt_str should be a string returned from ENCODE().

  • DES_DECRYPT(crypt_str[,key_str])

    Decrypts a string encrypted with DES_ENCRYPT(). If an error occurs, this function returns NULL.

    This function works only if MySQL has been configured with SSL support. See Section 6.3.8, "Using SSL for Secure Connections".

    If no key_str argument is given, DES_DECRYPT() examines the first byte of the encrypted string to determine the DES key number that was used to encrypt the original string, and then reads the key from the DES key file to decrypt the message. For this to work, the user must have the SUPER privilege. The key file can be specified with the --des-key-file server option.

    If you pass this function a key_str argument, that string is used as the key for decrypting the message.

    If the crypt_str argument does not appear to be an encrypted string, MySQL returns the given crypt_str.

  • DES_ENCRYPT(str[,{key_num|key_str}])

    Encrypts the string with the given key using the Triple-DES algorithm.

    This function works only if MySQL has been configured with SSL support. See Section 6.3.8, "Using SSL for Secure Connections".

    The encryption key to use is chosen based on the second argument to DES_ENCRYPT(), if one was given. With no argument, the first key from the DES key file is used. With a key_num argument, the given key number (0 to 9) from the DES key file is used. With a key_str argument, the given key string is used to encrypt str.

    The key file can be specified with the --des-key-file server option.

    The return string is a binary string where the first character is CHAR(128 | key_num). If an error occurs, DES_ENCRYPT() returns NULL.

    The 128 is added to make it easier to recognize an encrypted key. If you use a string key, key_num is 127.

    The string length for the result is given by this formula:

    new_len = orig_len + (8 - (orig_len % 8)) + 1

    Each line in the DES key file has the following format:

    key_num des_key_str

    Each key_num value must be a number in the range from 0 to 9. Lines in the file may be in any order. des_key_str is the string that is used to encrypt the message. There should be at least one space between the number and the key. The first key is the default key that is used if you do not specify any key argument to DES_ENCRYPT().

    You can tell MySQL to read new key values from the key file with the FLUSH DES_KEY_FILE statement. This requires the RELOAD privilege.

    One benefit of having a set of default keys is that it gives applications a way to check for the existence of encrypted column values, without giving the end user the right to decrypt those values.

    mysql> SELECT customer_address FROM customer_table
        -> WHERE crypted_credit_card = DES_ENCRYPT('credit_card_number');

  • ENCODE(str,pass_str)

    Encrypt str using pass_str as the password. To decrypt the result, use DECODE().

    The result is a binary string of the same length as str.

    The strength of the encryption is based on how good the random generator is. It should suffice for short strings.

  • ENCRYPT(str[,salt])

    Encrypts str using the Unix crypt() system call and returns a binary string. The salt argument must be a string with at least two characters or the result will be NULL. If no salt argument is given, a random value is used.

    mysql> SELECT ENCRYPT('hello'); -> 'VxuFAJXVARROc'

    ENCRYPT() ignores all but the first eight characters of str, at least on some systems. This behavior is determined by the implementation of the underlying crypt() system call.

    The use of ENCRYPT() with the ucs2, utf16, or utf32 multi-byte character sets is not recommended because the system call expects a string terminated by a zero byte.

    If crypt() is not available on your system (as is the case with Windows), ENCRYPT() always returns NULL.

  • MD5(str)

    Calculates an MD5 128-bit checksum for the string. The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. The return value can, for example, be used as a hash key. See the notes at the beginning of this section about storing hash values efficiently.

    As of MySQL 5.5.3, the return value is a nonbinary string in the connection character set. Before 5.5.3, the return value is a binary string; see the notes at the beginning of this section about using the value as a nonbinary string.

    mysql> SELECT MD5('testing'); -> 'ae2b1fca515949e5d54fb22b8ed95575'

    This is the "RSA Data Security, Inc. MD5 Message-Digest Algorithm."

    See the note regarding the MD5 algorithm at the beginning of this section.

  • OLD_PASSWORD(str)

    OLD_PASSWORD() was added when the implementation of PASSWORD() was changed in MySQL 4.1 to improve security. OLD_PASSWORD() returns the value of the pre-4.1 implementation of PASSWORD() as a string, and is intended to permit you to reset passwords for any pre-4.1 clients that need to connect to your version 5.5 MySQL server without locking them out. See Section 6.1.2.4, "Password Hashing in MySQL".

    As of MySQL 5.5.3, the return value is a nonbinary string in the connection character set. Before 5.5.3, the return value is a binary string.

  • PASSWORD(str)

    Calculates and returns a hashed password string from the plaintext password str and returns a nonbinary string in the connection character set (a binary string before MySQL 5.5.3), or NULL if the argument is NULL. This function is the SQL interface to the algorithm used by the server to encrypt MySQL passwords for storage in the mysql.user grant table.

    The password hashing method used by PASSWORD() depends on the value of the old_passwords system variable:

    mysql> SET old_passwords = 0;
    mysql> SELECT PASSWORD('mypass');
    +-------------------------------------------+
    | PASSWORD('mypass')                        |
    +-------------------------------------------+
    | *6C8989366EAF75BB670AD8EA7A7FC1176A95CEF4 |
    +-------------------------------------------+

    mysql> SET old_passwords = 1;
    mysql> SELECT PASSWORD('mypass');
    +--------------------+
    | PASSWORD('mypass') |
    +--------------------+
    | 6f8c114b58f2ce9e   |
    +--------------------+

    If old_passwords=1, PASSWORD('str') returns the same value as OLD_PASSWORD('str').

    For descriptions of the permitted values of old_passwords, see Section 5.1.4, "Server System Variables".

    Encryption performed by PASSWORD() is one-way (not reversible). It is not the same type of encryption as used for Unix passwords; for that, use ENCRYPT().

    Note

    The PASSWORD() function is used by the authentication system in MySQL Server; you should not use it in your own applications. For that purpose, consider MD5() or SHA2() instead. Also see RFC 2195, section 2 (Challenge-Response Authentication Mechanism (CRAM)), for more information about handling passwords and authentication securely in your applications.

    Important

    Statements that invoke PASSWORD() may be recorded in server logs or in a history file such as ~/.mysql_history, which means that cleartext passwords may be read by anyone having read access to that information. See Section 6.1.2, "Keeping Passwords Secure".

  • SHA1(str), SHA(str)

    Calculates an SHA-1 160-bit checksum for the string, as described in RFC 3174 (Secure Hash Algorithm). The value is returned as a string of 40 hex digits, or NULL if the argument was NULL. One of the possible uses for this function is as a hash key. See the notes at the beginning of this section about storing hash values efficiently. You can also use SHA1() as a cryptographic function for storing passwords. SHA() is synonymous with SHA1().

    As of MySQL 5.5.3, the return value is a nonbinary string in the connection character set. Before 5.5.3, the return value is a binary string; see the notes at the beginning of this section about using the value as a nonbinary string.

    mysql> SELECT SHA1('abc'); -> 'a9993e364706816aba3e25717850c26c9cd0d89d'

    SHA1() can be considered a cryptographically more secure equivalent of MD5(). However, see the note regarding the MD5 and SHA-1 algorithms at the beginning of this section.

  • SHA2(str, hash_length)

    Calculates the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). The first argument is the cleartext string to be hashed. The second argument indicates the desired bit length of the result, which must have a value of 224, 256, 384, 512, or 0 (which is equivalent to 256). If either argument is NULL or the hash length is not one of the permitted values, the return value is NULL. Otherwise, the function result is a hash value containing the desired number of bits. See the notes at the beginning of this section about storing hash values efficiently.

    As of MySQL 5.5.6, the return value is a nonbinary string in the connection character set. Before 5.5.6, the return value is a binary string; see the notes at the beginning of this section about using the value as a nonbinary string.

    mysql> SELECT SHA2('abc', 224); -> '23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7'

    This function works only if MySQL has been configured with SSL support. See Section 6.3.8, "Using SSL for Secure Connections".

    SHA2() can be considered cryptographically more secure than MD5() or SHA1().

    SHA2() was added in MySQL 5.5.5.
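The handling of the hash_length argument described above can be mirrored with Python's hashlib module. This sketch is an illustration only: the function name sha2 is reused for readability, None stands in for NULL, and the input is assumed to be a byte string.

```python
import hashlib

# Permitted bit lengths and the hashlib constructor for each.
_SHA2_BY_LENGTH = {
    224: hashlib.sha224,
    256: hashlib.sha256,
    384: hashlib.sha384,
    512: hashlib.sha512,
}


def sha2(data, hash_length):
    """Return the hex digest, or None for NULL or unsupported arguments."""
    if data is None or hash_length is None:
        return None
    if hash_length == 0:
        hash_length = 256  # 0 is equivalent to 256
    constructor = _SHA2_BY_LENGTH.get(hash_length)
    if constructor is None:
        return None  # not one of the permitted values
    return constructor(data).hexdigest()


print(sha2(b"abc", 224))
```

The hex string holds hash_length / 4 digits, so the binary form needs hash_length / 8 bytes.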

  • UNCOMPRESS(string_to_uncompress)

    Uncompresses a string compressed by the COMPRESS() function. If the argument is not a compressed value, the result is NULL. This function requires MySQL to have been compiled with a compression library such as zlib. Otherwise, the return value is always NULL.

    mysql> SELECT UNCOMPRESS(COMPRESS('any string')); -> 'any string'
    mysql> SELECT UNCOMPRESS('any string'); -> NULL

  • UNCOMPRESSED_LENGTH(compressed_string)

    Returns the length that the compressed string had before being compressed.

    mysql> SELECT UNCOMPRESSED_LENGTH(COMPRESS(REPEAT('a',30))); -> 30

12.14. Information Functions

Table 12.18. Information Functions

Name                          Description
BENCHMARK()                   Repeatedly execute an expression
CHARSET()                     Return the character set of the argument
COERCIBILITY()                Return the collation coercibility value of the string argument
COLLATION()                   Return the collation of the string argument
CONNECTION_ID()               Return the connection ID (thread ID) for the connection
CURRENT_USER(), CURRENT_USER  The authenticated user name and host name
DATABASE()                    Return the default (current) database name
FOUND_ROWS()                  For a SELECT with a LIMIT clause, the number of rows that would be returned were there no LIMIT clause
LAST_INSERT_ID()              Value of the AUTO_INCREMENT column for the last INSERT
ROW_COUNT()                   The number of rows updated
SCHEMA()                      A synonym for DATABASE()
SESSION_USER()                Synonym for USER()
SYSTEM_USER()                 Synonym for USER()
USER()                        The user name and host name provided by the client
VERSION()                     Returns a string that indicates the MySQL server version

  • BENCHMARK(count,expr)

    The BENCHMARK() function executes the expression expr repeatedly count times. It may be used to time how quickly MySQL processes the expression. The result value is always 0. The intended use is from within the mysql client, which reports query execution times:

    mysql> SELECT BENCHMARK(1000000,ENCODE('hello','goodbye'));
    +----------------------------------------------+
    | BENCHMARK(1000000,ENCODE('hello','goodbye')) |
    +----------------------------------------------+
    |                                            0 |
    +----------------------------------------------+
    1 row in set (4.74 sec)

    The time reported is elapsed time on the client end, not CPU time on the server end. It is advisable to execute BENCHMARK() several times, and to interpret the result with regard to how heavily loaded the server machine is.

    BENCHMARK() is intended for measuring the runtime performance of scalar expressions, which has some significant implications for the way that you use it and interpret the results:

    • Only scalar expressions can be used. Although the expression can be a subquery, it must return a single column and at most a single row. For example, BENCHMARK(10, (SELECT * FROM t)) will fail if the table t has more than one column or more than one row.

    • Executing a SELECT expr statement N times differs from executing SELECT BENCHMARK(N, expr) in terms of the amount of overhead involved. The two have very different execution profiles and you should not expect them to take the same amount of time. The former involves the parser, optimizer, table locking, and runtime evaluation N times each. The latter involves only runtime evaluation N times, and all the other components just once. Memory structures already allocated are reused, and runtime optimizations such as local caching of results already evaluated for aggregate functions can alter the results. Use of BENCHMARK() thus measures performance of the runtime component by giving more weight to that component and removing the "noise" introduced by the network, parser, optimizer, and so forth.

  • CHARSET(str)

    Returns the character set of the string argument.

    mysql> SELECT CHARSET('abc'); -> 'latin1'
    mysql> SELECT CHARSET(CONVERT('abc' USING utf8)); -> 'utf8'
    mysql> SELECT CHARSET(USER()); -> 'utf8'

  • COERCIBILITY(str)

    Returns the collation coercibility value of the string argument.

    mysql> SELECT COERCIBILITY('abc' COLLATE latin1_swedish_ci); -> 0
    mysql> SELECT COERCIBILITY(USER()); -> 3
    mysql> SELECT COERCIBILITY('abc'); -> 4

    The return values have the meanings shown in the following table. Lower values have higher precedence.

    Coercibility  Meaning             Example
    0             Explicit collation  Value with COLLATE clause
    1             No collation        Concatenation of strings with different collations
    2             Implicit collation  Column value, stored routine parameter or local variable
    3             System constant     USER() return value
    4             Coercible           Literal string
    5             Ignorable           NULL or an expression derived from NULL

  • COLLATION(str)

    Returns the collation of the string argument.

    mysql> SELECT COLLATION('abc'); -> 'latin1_swedish_ci'
    mysql> SELECT COLLATION(_utf8'abc'); -> 'utf8_general_ci'

  • CONNECTION_ID()

    Returns the connection ID (thread ID) for the connection. Every connection has an ID that is unique among the set of currently connected clients.

    mysql> SELECT CONNECTION_ID(); -> 23786

  • CURRENT_USER, CURRENT_USER()

    Returns the user name and host name combination for the MySQL account that the server used to authenticate the current client. This account determines your access privileges. The return value is a string in the utf8 character set.

    The value of CURRENT_USER() can differ from the value of USER().

    mysql> SELECT USER(); -> 'davida@localhost'
    mysql> SELECT * FROM mysql.user;
    ERROR 1044: Access denied for user ''@'localhost' to database 'mysql'
    mysql> SELECT CURRENT_USER(); -> '@localhost'

    The example illustrates that although the client specified a user name of davida (as indicated by the value of the USER() function), the server authenticated the client using an anonymous user account (as seen by the empty user name part of the CURRENT_USER() value). One way this might occur is that there is no account listed in the grant tables for davida.

    Within a stored program or view, CURRENT_USER() returns the account for the user who defined the object (as given by its DEFINER value). For stored procedures and functions and views defined with the SQL SECURITY INVOKER characteristic, CURRENT_USER() returns the object's invoker.

    Some statements support use of the CURRENT_USER() function to take the place of the name of (and, possibly, a host for) an affected user or a definer; in such cases, CURRENT_USER() is expanded where and as needed.

    For information about the implications that this expansion of CURRENT_USER() has for replication in different releases of MySQL 5.5, see Section 16.4.1.6, "Replication of CURRENT_USER()".

  • DATABASE()

    Returns the default (current) database name as a string in the utf8 character set. If there is no default database, DATABASE() returns NULL. Within a stored routine, the default database is the database that the routine is associated with, which is not necessarily the same as the database that is the default in the calling context.

    mysql> SELECT DATABASE();
            -> 'test'

  • FOUND_ROWS()

    A SELECT statement may include a LIMIT clause to restrict the number of rows the server returns to the client. In some cases, it is desirable to know how many rows the statement would have returned without the LIMIT, but without running the statement again. To obtain this row count, include a SQL_CALC_FOUND_ROWS option in the SELECT statement, and then invoke FOUND_ROWS() afterward:

    mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name
        -> WHERE id > 100 LIMIT 10;
    mysql> SELECT FOUND_ROWS();

    The second SELECT returns a number indicating how many rows the first SELECT would have returned had it been written without the LIMIT clause.

    In the absence of the SQL_CALC_FOUND_ROWS option in the most recent successful SELECT statement, FOUND_ROWS() returns the number of rows in the result set returned by that statement. If the statement includes a LIMIT clause, FOUND_ROWS() returns the number of rows up to the limit. For example, FOUND_ROWS() returns 10 or 60, respectively, if the statement includes LIMIT 10 or LIMIT 50, 10.

    The row count available through FOUND_ROWS() is transient and not intended to be available past the statement following the SELECT SQL_CALC_FOUND_ROWS statement. If you need to refer to the value later, save it:

    mysql> SELECT SQL_CALC_FOUND_ROWS * FROM ... ;
    mysql> SET @rows = FOUND_ROWS();

    If you are using SELECT SQL_CALC_FOUND_ROWS, MySQL must calculate how many rows are in the full result set. However, this is faster than running the query again without LIMIT, because the result set need not be sent to the client.

    SQL_CALC_FOUND_ROWS and FOUND_ROWS() can be useful in situations when you want to restrict the number of rows that a query returns, but also determine the number of rows in the full result set without running the query again. An example is a Web script that presents a paged display containing links to the pages that show other sections of a search result. Using FOUND_ROWS() enables you to determine how many other pages are needed for the rest of the result.
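The paging arithmetic described above can be sketched in Python. This is only an illustration: the function name `page_count` and the sample numbers are assumptions made for this example and are not part of MySQL.

```python
def page_count(found_rows: int, page_size: int) -> int:
    """Pages needed to display found_rows rows at page_size rows per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Integer ceiling division: a partially filled last page still counts.
    return (found_rows + page_size - 1) // page_size

# Hypothetical values: FOUND_ROWS() reported 95 rows and each page uses LIMIT 10.
total_pages = page_count(95, 10)
print(total_pages)  # 10
```

A paged display would then link to pages 1 through `total_pages`, issuing one LIMIT query per page.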

    The use of SQL_CALC_FOUND_ROWS and FOUND_ROWS() is more complex for UNION statements than for simple SELECT statements, because LIMIT may occur at multiple places in a UNION. It may be applied to individual SELECT statements in the UNION, or global to the UNION result as a whole.

    The intent of SQL_CALC_FOUND_ROWS for UNION is that it should return the row count that would be returned without a global LIMIT. The conditions for use of SQL_CALC_FOUND_ROWS with UNION are:

    • The SQL_CALC_FOUND_ROWS keyword must appear in the first SELECT of the UNION.

    • The value of FOUND_ROWS() is exact only if UNION ALL is used. If UNION without ALL is used, duplicate removal occurs and the value of FOUND_ROWS() is only approximate.

    • If no LIMIT is present in the UNION, SQL_CALC_FOUND_ROWS is ignored and FOUND_ROWS() returns the number of rows in the temporary table that is created to process the UNION.

    Beyond the cases described here, the behavior of FOUND_ROWS() is undefined (for example, its value following a SELECT statement that fails with an error).

    Important

    FOUND_ROWS() is not replicated reliably using statement-based replication. This function is automatically replicated using row-based replication.

  • LAST_INSERT_ID(), LAST_INSERT_ID(expr)

    LAST_INSERT_ID() (with no argument) returns a BIGINT (64-bit) value representing the first automatically generated value successfully inserted for an AUTO_INCREMENT column as a result of the most recently executed INSERT statement. The value of LAST_INSERT_ID() remains unchanged if no rows are successfully inserted.

    For example, after inserting a row that generates an AUTO_INCREMENT value, you can get the value like this:

    mysql> SELECT LAST_INSERT_ID();
            -> 195

    The currently executing statement does not affect the value of LAST_INSERT_ID(). Suppose that you generate an AUTO_INCREMENT value with one statement, and then refer to LAST_INSERT_ID() in a multiple-row INSERT statement that inserts rows into a table with its own AUTO_INCREMENT column. The value of LAST_INSERT_ID() will remain stable in the second statement; its value for the second and later rows is not affected by the earlier row insertions. (However, if you mix references to LAST_INSERT_ID() and LAST_INSERT_ID(expr), the effect is undefined.)

    If the previous statement returned an error, the value of LAST_INSERT_ID() is undefined. For transactional tables, if the statement is rolled back due to an error, the value of LAST_INSERT_ID() is left undefined. For manual ROLLBACK, the value of LAST_INSERT_ID() is not restored to that before the transaction; it remains as it was at the point of the ROLLBACK.

    Within the body of a stored routine (procedure or function) or a trigger, the value of LAST_INSERT_ID() changes the same way as for statements executed outside the body of these kinds of objects. The effect of a stored routine or trigger upon the value of LAST_INSERT_ID() that is seen by following statements depends on the kind of routine:

    • If a stored procedure executes statements that change the value of LAST_INSERT_ID(), the changed value is seen by statements that follow the procedure call.

    • For stored functions and triggers that change the value, the value is restored when the function or trigger ends, so following statements will not see a changed value.

    The ID that was generated is maintained in the server on a per-connection basis. This means that the value returned by the function to a given client is the first AUTO_INCREMENT value generated for the most recent statement affecting an AUTO_INCREMENT column by that client. This value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own. This behavior ensures that each client can retrieve its own ID without concern for the activity of other clients, and without the need for locks or transactions.

    The value of LAST_INSERT_ID() is not changed if you set the AUTO_INCREMENT column of a row to a non-"magic" value (that is, a value that is not NULL and not 0).

    Important

    If you insert multiple rows using a single INSERT statement, LAST_INSERT_ID() returns the value generated for the first inserted row only. The reason for this is to make it possible to reproduce easily the same INSERT statement against some other server.

    For example:

    mysql> USE test;
    Database changed
    mysql> CREATE TABLE t (
        ->   id INT AUTO_INCREMENT NOT NULL PRIMARY KEY,
        ->   name VARCHAR(10) NOT NULL
        -> );
    Query OK, 0 rows affected (0.09 sec)

    mysql> INSERT INTO t VALUES (NULL, 'Bob');
    Query OK, 1 row affected (0.01 sec)

    mysql> SELECT * FROM t;
    +----+------+
    | id | name |
    +----+------+
    |  1 | Bob  |
    +----+------+
    1 row in set (0.01 sec)

    mysql> SELECT LAST_INSERT_ID();
    +------------------+
    | LAST_INSERT_ID() |
    +------------------+
    |                1 |
    +------------------+
    1 row in set (0.00 sec)

    mysql> INSERT INTO t VALUES
        -> (NULL, 'Mary'), (NULL, 'Jane'), (NULL, 'Lisa');
    Query OK, 3 rows affected (0.00 sec)
    Records: 3  Duplicates: 0  Warnings: 0

    mysql> SELECT * FROM t;
    +----+------+
    | id | name |
    +----+------+
    |  1 | Bob  |
    |  2 | Mary |
    |  3 | Jane |
    |  4 | Lisa |
    +----+------+
    4 rows in set (0.01 sec)

    mysql> SELECT LAST_INSERT_ID();
    +------------------+
    | LAST_INSERT_ID() |
    +------------------+
    |                2 |
    +------------------+
    1 row in set (0.00 sec)

    Although the second INSERT statement inserted three new rows into t, the ID generated for the first of these rows was 2, and it is this value that is returned by LAST_INSERT_ID() for the following SELECT statement.

    If you use INSERT IGNORE and the row is ignored, the AUTO_INCREMENT counter is not incremented and LAST_INSERT_ID() returns 0, which reflects that no row was inserted.

    If expr is given as an argument to LAST_INSERT_ID(), the value of the argument is returned by the function and is remembered as the next value to be returned by LAST_INSERT_ID(). This can be used to simulate sequences:

    1. Create a table to hold the sequence counter and initialize it:

      mysql> CREATE TABLE sequence (id INT NOT NULL);
      mysql> INSERT INTO sequence VALUES (0);

    2. Use the table to generate sequence numbers like this:

      mysql> UPDATE sequence SET id=LAST_INSERT_ID(id+1);
      mysql> SELECT LAST_INSERT_ID();

      The UPDATE statement increments the sequence counter and causes the next call to LAST_INSERT_ID() to return the updated value. The SELECT statement retrieves that value. The mysql_insert_id() C API function can also be used to get the value. See Section 22.8.3.37, "mysql_insert_id()".

    You can generate sequences without calling LAST_INSERT_ID(), but the utility of using the function this way is that the ID value is maintained in the server as the last automatically generated value. It is multi-user safe because multiple clients can issue the UPDATE statement and get their own sequence value with the SELECT statement (or mysql_insert_id()), without affecting or being affected by other clients that generate their own sequence values.
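The multi-user behavior described above can be modeled with a small Python sketch. It is a toy model, not MySQL code: the class `SequenceServer`, its method names, and the client labels are all hypothetical, and the value 0 for a connection that has not generated anything yet is an assumption of this model.

```python
class SequenceServer:
    """Toy model: one shared counter row plus per-connection LAST_INSERT_ID state."""

    def __init__(self):
        self.counter = 0          # stands in for the single row in the sequence table
        self.last_insert_id = {}  # connection label -> value remembered for it

    def update_sequence(self, conn):
        # Models: UPDATE sequence SET id=LAST_INSERT_ID(id+1);
        self.counter += 1
        self.last_insert_id[conn] = self.counter

    def select_last_insert_id(self, conn):
        # Models: SELECT LAST_INSERT_ID();
        return self.last_insert_id.get(conn, 0)


server = SequenceServer()
server.update_sequence("client_a")  # client A takes 1
server.update_sequence("client_b")  # client B takes 2
# Each client still reads its own value, regardless of what the other did in between.
value_a = server.select_last_insert_id("client_a")
value_b = server.select_last_insert_id("client_b")
print(value_a, value_b)  # 1 2
```

The point of the model is that the counter is shared while the remembered value is kept per connection, which is why no explicit locking is needed between the UPDATE and the SELECT.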

    Note that mysql_insert_id() is only updated after INSERT and UPDATE statements, so you cannot use the C API function to retrieve the value for LAST_INSERT_ID(expr) after executing other SQL statements like SELECT or SET.

  • ROW_COUNT()

    Before MySQL 5.5.5, ROW_COUNT() returns the number of rows changed, deleted, or inserted by the last statement if it was an UPDATE, DELETE, or INSERT. For other statements, the value may not be meaningful.

    As of MySQL 5.5.5, ROW_COUNT() returns a value as follows:

    • DDL statements: 0. This applies to statements such as CREATE TABLE or DROP TABLE.

    • DML statements other than SELECT: The number of affected rows. This applies to statements such as UPDATE, INSERT, or DELETE (as before), but now also to statements such as ALTER TABLE and LOAD DATA INFILE.

    • SELECT: -1 if the statement returns a result set, or the number of rows "affected" if it does not. For example, for SELECT * FROM t1, ROW_COUNT() returns -1. For SELECT * FROM t1 INTO OUTFILE 'file_name', ROW_COUNT() returns the number of rows written to the file.

    • SIGNAL statements: 0.

    For UPDATE statements, the affected-rows value by default is the number of rows actually changed. If you specify the CLIENT_FOUND_ROWS flag to mysql_real_connect() when connecting to mysqld, the affected-rows value is the number of rows "found"; that is, matched by the WHERE clause.

    For REPLACE statements, the affected-rows value is 2 if the new row replaced an old row, because in this case, one row was inserted after the duplicate was deleted.

    For INSERT ... ON DUPLICATE KEY UPDATE statements, the affected-rows value is 1 if the row is inserted as a new row and 2 if an existing row is updated.

    The ROW_COUNT() value is similar to the value from the mysql_affected_rows() C API function and the row count that the mysql client displays following statement execution.

    mysql> INSERT INTO t VALUES(1),(2),(3);
    Query OK, 3 rows affected (0.00 sec)
    Records: 3  Duplicates: 0  Warnings: 0

    mysql> SELECT ROW_COUNT();
    +-------------+
    | ROW_COUNT() |
    +-------------+
    |           3 |
    +-------------+
    1 row in set (0.00 sec)

    mysql> DELETE FROM t WHERE i IN(1,2);
    Query OK, 2 rows affected (0.00 sec)

    mysql> SELECT ROW_COUNT();
    +-------------+
    | ROW_COUNT() |
    +-------------+
    |           2 |
    +-------------+
    1 row in set (0.00 sec)

    Important

    ROW_COUNT() is not replicated reliably using statement-based replication. This function is automatically replicated using row-based replication.

  • SCHEMA()

    This function is a synonym for DATABASE().

  • SESSION_USER()

    SESSION_USER() is a synonym for USER().

  • SYSTEM_USER()

    SYSTEM_USER() is a synonym for USER().

  • USER()

    Returns the current MySQL user name and host name as a string in the utf8 character set.

    mysql> SELECT USER();
            -> 'davida@localhost'

    The value indicates the user name you specified when connecting to the server, and the client host from which you connected. The value can be different from that of CURRENT_USER().

    You can extract only the user name part like this:

    mysql> SELECT SUBSTRING_INDEX(USER(),'@',1);
            -> 'davida'

  • VERSION()

    Returns a string that indicates the MySQL server version. The string uses the utf8 character set. The value might have a suffix in addition to the version number. See the description of the version system variable in Section 5.1.4, "Server System Variables".

    This function is unsafe for statement-based replication. Beginning with MySQL 5.5.1, a warning is logged if you use this function when binlog_format is set to STATEMENT. (Bug #47995)

    mysql> SELECT VERSION();
            -> '5.5.31-standard'

12.15. Miscellaneous Functions

Table 12.19. Miscellaneous Functions

Name               Description
DEFAULT()          Return the default value for a table column
GET_LOCK()         Get a named lock
INET_ATON()        Return the numeric value of an IP address
INET_NTOA()        Return the IP address from a numeric value
IS_FREE_LOCK()     Checks whether the named lock is free
IS_USED_LOCK()     Checks whether the named lock is in use. Return connection identifier if true.
MASTER_POS_WAIT()  Block until the slave has read and applied all updates up to the specified position
NAME_CONST()       Causes the column to have the given name
RAND()             Return a random floating-point value
RELEASE_LOCK()     Releases the named lock
SLEEP()            Sleep for a number of seconds
UUID_SHORT()       Return an integer-valued universal identifier
UUID()             Return a Universal Unique Identifier (UUID)
VALUES()           Defines the values to be used during an INSERT

  • DEFAULT(col_name)

    Returns the default value for a table column. An error results if the column has no default value.

    mysql> UPDATE t SET i = DEFAULT(i)+1 WHERE id < 100;

  • FORMAT(X,D)

    Formats the number X to a format like '#,###,###.##', rounded to D decimal places, and returns the result as a string. For details, see Section 12.5, "String Functions".

  • GET_LOCK(str,timeout)

    Tries to obtain a lock with a name given by the string str, using a timeout of timeout seconds. Returns 1 if the lock was obtained successfully, 0 if the attempt timed out (for example, because another client has previously locked the name), or NULL if an error occurred (such as running out of memory or the thread was killed with mysqladmin kill). If you have a lock obtained with GET_LOCK(), it is released when you execute RELEASE_LOCK(), execute a new GET_LOCK(), or your connection terminates (either normally or abnormally). Locks obtained with GET_LOCK() do not interact with transactions. That is, committing a transaction does not release any such locks obtained during the transaction.

    This function can be used to implement application locks or to simulate record locks. Names are locked on a server-wide basis. If a name has been locked by one client, GET_LOCK() blocks any request by another client for a lock with the same name. This enables clients that agree on a given lock name to use the name to perform cooperative advisory locking. But be aware that it also enables a client that is not among the set of cooperating clients to lock a name, either inadvertently or deliberately, and thus prevent any of the cooperating clients from locking that name. One way to reduce the likelihood of this is to use lock names that are database-specific or application-specific. For example, use lock names of the form db_name.str or app_name.str.

    mysql> SELECT GET_LOCK('lock1',10);
            -> 1
    mysql> SELECT IS_FREE_LOCK('lock2');
            -> 1
    mysql> SELECT GET_LOCK('lock2',10);
            -> 1
    mysql> SELECT RELEASE_LOCK('lock2');
            -> 1
    mysql> SELECT RELEASE_LOCK('lock1');
            -> NULL

    The second RELEASE_LOCK() call returns NULL because the lock 'lock1' was automatically released by the second GET_LOCK() call.
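The return values in the example above follow from the rule that a connection holds at most one named lock at a time. The Python sketch below models that rule; it is not server code. The class `LockTable` and its method names are hypothetical, and waiting is deliberately simplified: a request for a name held by another connection returns 0 at once instead of blocking for the timeout.

```python
class LockTable:
    """Toy model of named locks where each connection holds at most one lock."""

    def __init__(self):
        self.owner = {}  # lock name -> connection that holds it
        self.held = {}   # connection -> the one lock name it holds

    def get_lock(self, conn, name):
        holder = self.owner.get(name)
        if holder is not None and holder != conn:
            return 0  # simplification: treated as an immediate timeout
        previous = self.held.get(conn)
        if previous is not None and previous != name:
            del self.owner[previous]  # a new GET_LOCK() releases the earlier lock
        self.owner[name] = conn
        self.held[conn] = name
        return 1

    def is_free_lock(self, name):
        return 0 if name in self.owner else 1

    def release_lock(self, conn, name):
        holder = self.owner.get(name)
        if holder is None:
            return None  # the named lock does not exist
        if holder != conn:
            return 0     # held by another connection, so it is not released
        del self.owner[name]
        del self.held[conn]
        return 1


locks = LockTable()
results = [
    locks.get_lock(1, "lock1"),
    locks.is_free_lock("lock2"),
    locks.get_lock(1, "lock2"),      # implicitly releases lock1
    locks.release_lock(1, "lock2"),
    locks.release_lock(1, "lock1"),  # already gone, so None (NULL)
]
print(results)  # [1, 1, 1, 1, None]
```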

    If multiple clients are waiting for a lock, the order in which they will acquire it is undefined and depends on factors such as the thread library in use. In particular, applications should not assume that clients will acquire the lock in the same order that they issued the lock requests.

    Note

    Before MySQL 5.5.3, if a client attempts to acquire a lock that is already held by another client, it blocks according to the timeout argument. If the blocked client terminates, its thread does not die until the lock request times out.

    This function is unsafe for statement-based replication. Beginning with MySQL 5.5.1, a warning is logged if you use this function when binlog_format is set to STATEMENT. (Bug #47995)

  • INET_ATON(expr)

    Given the dotted-quad representation of an IPv4 network address as a string, returns an integer that represents the numeric value of the address in network byte order (big endian). INET_ATON() returns NULL if it does not understand its argument.

    mysql> SELECT INET_ATON('10.0.5.9');
            -> 167773449

    For this example, the return value is calculated as 10×256³ + 0×256² + 5×256 + 9.

    INET_ATON() may or may not return a non-NULL result for short-form IP addresses (such as '127.1' as a representation of '127.0.0.1'). Because of this, INET_ATON() should not be used for such addresses.

    Note

    To store values generated by INET_ATON(), use an INT UNSIGNED column rather than INT, which is signed. If you use a signed column, values corresponding to IP addresses for which the first octet is greater than 127 cannot be stored correctly. See Section 11.2.6, "Out-of-Range and Overflow Handling".
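The calculation and the signed-column limit described above can be checked with a short Python sketch. The helper name `inet_aton` is chosen here only to mirror the SQL function; it is an illustration that accepts full four-part addresses only, and it is not MySQL's implementation.

```python
import ipaddress


def inet_aton(dotted: str) -> int:
    """Numeric value of a full dotted-quad IPv4 address (no short forms)."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return a * 256**3 + b * 256**2 + c * 256 + d


value = inet_aton("10.0.5.9")
print(value)  # 167773449

# Cross-check against the standard library's own conversion.
assert value == int(ipaddress.IPv4Address("10.0.5.9"))

# A signed 32-bit INT tops out at 2**31 - 1, so any address whose first
# octet is greater than 127 does not fit; INT UNSIGNED covers the full range.
INT_MAX = 2**31 - 1
fits_127 = inet_aton("127.255.255.255") <= INT_MAX
fits_128 = inet_aton("128.0.0.0") <= INT_MAX
print(fits_127, fits_128)  # True False
```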

  • INET_NTOA(expr)

    Given a numeric IPv4 network address in network byte order, returns the dotted-quad representation of the address as a string. INET_NTOA() returns NULL if it does not understand its argument.

    As of MySQL 5.5.3, the return value is a nonbinary string in the connection character set. Before 5.5.3, the return value is a binary string.

    mysql> SELECT INET_NTOA(167773449);
            -> '10.0.5.9'

  • IS_FREE_LOCK(str)

    Checks whether the lock named str is free to use (that is, not locked). Returns 1 if the lock is free (no one is using the lock), 0 if the lock is in use, and NULL if an error occurs (such as an incorrect argument).

    This function is unsafe for statement-based replication. Beginning with MySQL 5.5.1, a warning is logged if you use this function when binlog_format is set to STATEMENT. (Bug #47995)

  • IS_USED_LOCK(str)

    Checks whether the lock named str is in use (that is, locked). If so, it returns the connection identifier of the client that holds the lock. Otherwise, it returns NULL.

    This function is unsafe for statement-based replication. Beginning with MySQL 5.5.1, a warning is logged if you use this function when binlog_format is set to STATEMENT. (Bug #47995)

  • MASTER_POS_WAIT(log_name,log_pos[,timeout])

    This function is useful for control of master/slave synchronization. It blocks until the slave has read and applied all updates up to the specified position in the master log. The return value is the number of log events the slave had to wait for to advance to the specified position. The function returns NULL if the slave SQL thread is not started, the slave's master information is not initialized, the arguments are incorrect, or an error occurs. It returns -1 if the timeout has been exceeded. If the slave SQL thread stops while MASTER_POS_WAIT() is waiting, the function returns NULL. If the slave is past the specified position, the function returns immediately.

    If a timeout value is specified, MASTER_POS_WAIT() stops waiting when timeout seconds have elapsed. timeout must be greater than 0; a zero or negative timeout means no timeout.

    This function is unsafe for statement-based replication. Beginning with MySQL 5.5.1, a warning is logged if you use this function when binlog_format is set to STATEMENT. (Bug #47995)

  • NAME_CONST(name,value)

    Returns the given value. When used to produce a result set column, NAME_CONST() causes the column to have the given name. The arguments should be constants.

    mysql> SELECT NAME_CONST('myname', 14);
    +--------+
    | myname |
    +--------+
    |     14 |
    +--------+

    This function is for internal use only. The server uses it when writing statements from stored programs that contain references to local program variables, as described in Section 19.7, "Binary Logging of Stored Programs". You might see this function in the output from mysqlbinlog.

  • RELEASE_LOCK(str)

    Releases the lock named by the string str that was obtained with GET_LOCK(). Returns 1 if the lock was released, 0 if the lock was not established by this thread (in which case the lock is not released), and NULL if the named lock did not exist. The lock does not exist if it was never obtained by a call to GET_LOCK() or if it has previously been released.

    The DO statement is convenient to use with RELEASE_LOCK(). See Section 13.2.3, "DO Syntax".

    This function is unsafe for statement-based replication. Beginning with MySQL 5.5.1, a warning is logged if you use this function when binlog_format is set to STATEMENT. (Bug #47995)

  • SLEEP(duration)

    Sleeps (pauses) for the number of seconds given by the duration argument, then returns 0. If SLEEP() is interrupted, it returns 1. The duration may have a fractional part given in microseconds.

    This function is unsafe for statement-based replication. Beginning with MySQL 5.5.1, a warning is logged if you use this function when binlog_format is set to STATEMENT. (Bug #47995)

  • UUID()

    Returns a Universal Unique Identifier (UUID) generated according to "DCE 1.1: Remote Procedure Call" (Appendix A) CAE (Common Applications Environment) Specifications published by The Open Group in October 1997 (Document Number C706, http://www.opengroup.org/public/pubs/catalog/c706.htm).

    A UUID is designed as a number that is globally unique in space and time. Two calls to UUID() are expected to generate two different values, even if these calls are performed on two separate computers that are not connected to each other.

    A UUID is a 128-bit number represented by a utf8 string of five hexadecimal numbers in aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee format:

    • The first three numbers are generated from a timestamp.

    • The fourth number preserves temporal uniqueness in case the timestamp value loses monotonicity (for example, due to daylight saving time).

    • The fifth number is an IEEE 802 node number that provides spatial uniqueness. A random number is substituted if the latter is not available (for example, because the host computer has no Ethernet card, or we do not know how to find the hardware address of an interface on your operating system). In this case, spatial uniqueness cannot be guaranteed. Nevertheless, a collision should have very low probability.

      Currently, the MAC address of an interface is taken into account only on FreeBSD and Linux. On other operating systems, MySQL uses a randomly generated 48-bit number.
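The five-part layout described above can be inspected with Python's standard `uuid` module, using the sample value that appears in this section. This is a sketch that assumes the string follows the usual time-based (version 1) field layout as parsed by that module; it does not show how the server generates the value.

```python
import uuid

u = uuid.UUID("6ccd780c-baba-1026-9564-0040f4311e29")

# First three numbers: timestamp fields (the third also carries the version).
print(hex(u.time_low), hex(u.time_mid), hex(u.time_hi_version))
# 0x6ccd780c 0xbaba 0x1026

# Version nibble 1 marks a time-based UUID.
print(u.version)  # 1

# Fourth number: clock sequence, with the variant bits masked off by the module.
print(hex(u.clock_seq))  # 0x1564

# Fifth number: the 48-bit node value.
print(hex(u.node))  # 0x40f4311e29
```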

    mysql> SELECT UUID();
            -> '6ccd780c-baba-1026-9564-0040f4311e29'

    Warning

    Although UUID() values are intended to be unique, they are not necessarily unguessable or unpredictable. If unpredictability is required, UUID values should be generated some other way.

    Note

    UUID() does not work with statement-based replication.

  • UUID_SHORT()

    Returns a "short" universal identifier as a 64-bit unsigned integer (rather than a string-form 128-bit identifier as returned by the UUID() function).

    The value of UUID_SHORT() is guaranteed to be unique if the following conditions hold:

    • The server_id of the current host is unique among your set of master and slave servers

    • server_id is between 0 and 255

    • You do not set back your system time for your server between mysqld restarts

    • You do not invoke UUID_SHORT() on average more than 16 million times per second between mysqld restarts

    The UUID_SHORT() return value is constructed this way:

      (server_id & 255) << 56
    + (server_startup_time_in_seconds << 24)
    + incremented_variable++;

    mysql> SELECT UUID_SHORT();
            -> 92395783831158784
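The construction above can be reproduced in Python to see how the three parts share the 64 bits. The function names are hypothetical, and the split assumes that the startup time fits in 32 bits and the counter in 24 bits, which is an inference from the shift amounts rather than something this section states. Under those assumptions, the sample value decomposes into server_id 1, a startup time of 1212250578 seconds, and a counter of 0.

```python
def uuid_short(server_id: int, startup_time: int, counter: int) -> int:
    """Combine the three parts as in the formula: 8 + 32 + 24 bits (assumed widths)."""
    return ((server_id & 255) << 56) + (startup_time << 24) + counter


def split_uuid_short(value: int):
    """Invert uuid_short() under the same assumed field widths."""
    server_id = value >> 56
    startup_time = (value >> 24) & 0xFFFFFFFF
    counter = value & 0xFFFFFF
    return server_id, startup_time, counter


parts = split_uuid_short(92395783831158784)
print(parts)  # (1, 1212250578, 0)

# The low 24 bits hold the counter, which is where the limit of roughly
# 16 million calls per second comes from.
print(2**24)  # 16777216
```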

    Note that UUID_SHORT() does not work with statement-based replication.

  • VALUES(col_name)

    In an INSERT ... ON DUPLICATE KEY UPDATE statement, you can use the VALUES(col_name) function in the UPDATE clause to refer to column values from the INSERT portion of the statement. In other words, VALUES(col_name) in the UPDATE clause refers to the value of col_name that would be inserted, had no duplicate-key conflict occurred. This function is especially useful in multiple-row inserts. The VALUES() function is meaningful only in the ON DUPLICATE KEY UPDATE clause of INSERT statements and returns NULL otherwise. See Section 13.2.5.3, "INSERT ... ON DUPLICATE KEY UPDATE Syntax".

    mysql> INSERT INTO tbl_name (a,b,c) VALUES (1,2,3),(4,5,6)
        -> ON DUPLICATE KEY UPDATE c=VALUES(a)+VALUES(b);
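The effect of VALUES() in the statement above can be modeled in Python. This is a toy model under stated assumptions: column `a` is taken to be the unique key (the example does not say which column is), the table starts with one hypothetical row, and the function name is invented for the illustration.

```python
def insert_on_duplicate_key_update(table, rows):
    """table maps the unique key a -> row dict; rows are (a, b, c) tuples."""
    for a, b, c in rows:
        if a in table:
            # Duplicate key: c = VALUES(a) + VALUES(b), computed from the values
            # this row would have inserted; other columns keep their old values.
            table[a]["c"] = a + b
        else:
            table[a] = {"a": a, "b": b, "c": c}
    return table


table = {1: {"a": 1, "b": 20, "c": 30}}  # hypothetical existing row with a = 1
insert_on_duplicate_key_update(table, [(1, 2, 3), (4, 5, 6)])
print(table[1])  # {'a': 1, 'b': 20, 'c': 3}
print(table[4])  # {'a': 4, 'b': 5, 'c': 6}
```

The first row collides with the existing key, so only `c` changes (to 1 + 2); the second row has no conflict and is inserted unchanged.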

12.16. Functions and Modifiers for Use with GROUP BY Clauses

12.16.1. GROUP BY (Aggregate) Functions

Table 12.20. Aggregate (GROUP BY) Functions

Name             Description
AVG()            Return the average value of the argument
BIT_AND()        Return bitwise and
BIT_OR()         Return bitwise or
BIT_XOR()        Return bitwise xor
COUNT(DISTINCT)  Return the count of a number of different values
COUNT()          Return a count of the number of rows returned
GROUP_CONCAT()   Return a concatenated string
MAX()            Return the maximum value
MIN()            Return the minimum value
STD()            Return the population standard deviation
STDDEV_POP()     Return the population standard deviation
STDDEV_SAMP()    Return the sample standard deviation
STDDEV()         Return the population standard deviation
SUM()            Return the sum
VAR_POP()        Return the population standard variance
VAR_SAMP()       Return the sample variance
VARIANCE()       Return the population standard variance

This section describes group (aggregate) functions that operate on sets of values. Unless otherwise stated, group functions ignore NULL values.

If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows. For more information, see Section 12.16.3, "MySQL Extensions to GROUP BY".

For numeric arguments, the variance and standard deviation functions return a DOUBLE value. The SUM() and AVG() functions return a DECIMAL value for exact-value arguments (integer or DECIMAL), and a DOUBLE value for approximate-value arguments (FLOAT or DOUBLE).

The SUM() and AVG() aggregate functions do not work with temporal values. (They convert the values to numbers, losing everything after the first nonnumeric character.) To work around this problem, convert to numeric units, perform the aggregate operation, and convert back to a temporal value. Examples:

SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(time_col))) FROM tbl_name;
SELECT FROM_DAYS(SUM(TO_DAYS(date_col))) FROM tbl_name;
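The convert, aggregate, convert-back pattern used in the first statement can be traced in Python. The helper names below only mirror the SQL functions for readability; they are simplified stand-ins that handle non-negative HH:MM:SS values, and the sample times are made up for the illustration.

```python
def time_to_sec(t: str) -> int:
    """Seconds represented by an HH:MM:SS string."""
    h, m, s = (int(x) for x in t.split(":"))
    return h * 3600 + m * 60 + s


def sec_to_time(seconds: int) -> str:
    """HH:MM:SS string for a non-negative number of seconds."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


times = ["01:30:00", "00:45:30", "02:00:15"]  # hypothetical time_col values
total = sec_to_time(sum(time_to_sec(t) for t in times))
print(total)  # 04:15:45
```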

Functions such as SUM() or AVG() that expect a numeric argument cast the argument to a number if necessary. For SET or ENUM values, the cast operation causes the underlying numeric value to be used.

  • AVG([DISTINCT] expr)

    Returns the average value of expr. The DISTINCT option can be used to return the average of the distinct values of expr.

    AVG() returns NULL if there were no matching rows.

    mysql> SELECT student_name, AVG(test_score)
        ->        FROM student
        ->        GROUP BY student_name;

  • BIT_AND(expr)

    Returns the bitwise AND of all bits in expr. The calculation is performed with 64-bit (BIGINT) precision.

    This function returns 18446744073709551615 if there were no matching rows. (This is the value of an unsigned BIGINT value with all bits set to 1.)

  • BIT_OR(expr)

    Returns the bitwise OR of all bits in expr. The calculation is performed with 64-bit (BIGINT) precision.

    This function returns 0 if there were no matching rows.

  • BIT_XOR(expr)

    Returns the bitwise XOR of all bits in expr. The calculation is performed with 64-bit (BIGINT) precision.

    This function returns 0 if there were no matching rows.
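The values that BIT_AND(), BIT_OR(), and BIT_XOR() return for an empty set are the identity elements of the respective operations, which a Python sketch makes visible. The function names mirror the SQL ones for readability only; the sketch works on a plain list of non-NULL integers and is not how the server evaluates these aggregates.

```python
from functools import reduce
from operator import and_, or_, xor

MASK64 = 2**64 - 1  # unsigned 64-bit value with all bits set to 1


def bit_and(values):
    return reduce(and_, values, MASK64) & MASK64


def bit_or(values):
    return reduce(or_, values, 0) & MASK64


def bit_xor(values):
    return reduce(xor, values, 0) & MASK64


print(bit_and([]), bit_or([]), bit_xor([]))  # 18446744073709551615 0 0
print(bit_and([12, 10]), bit_or([12, 10]), bit_xor([12, 10]))  # 8 14 6
```

Starting the AND from all ones and the OR or XOR from zero means that the first real value passes through unchanged, so the empty-set results are the natural starting points rather than special cases.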

  • COUNT(expr)

    Returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement. The result is a BIGINT value.

    COUNT() returns 0 if there were no matching rows.

    mysql> SELECT student.student_name,COUNT(*)
        ->        FROM student,course
        ->        WHERE student.student_id=course.student_id
        ->        GROUP BY student_name;

    COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values.

    COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. For example:

    mysql> SELECT COUNT(*) FROM student;

    This optimization applies only to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly. For transactional storage engines such as InnoDB, storing an exact row count is more problematic because multiple transactions may be occurring, each of which may affect the count.

  • COUNT(DISTINCT expr,[expr...])

    Returns a count of the number of rows with different non-NULL expr values.

    COUNT(DISTINCT) returns 0 if there were no matching rows.

    mysql> SELECT COUNT(DISTINCT results) FROM student;

    In MySQL, you can obtain the number of distinct expression combinations that do not contain NULL by giving a list of expressions. In standard SQL, you would have to do a concatenation of all expressions inside COUNT(DISTINCT ...).
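The difference between COUNT(*), COUNT(expr), and the two forms of COUNT(DISTINCT ...) comes down to how NULL is treated. The Python sketch below uses `None` for NULL and a small made-up set of rows; the column names are borrowed from the examples in this section, and the data is hypothetical.

```python
rows = [
    {"student_name": "Ann", "results": "pass"},
    {"student_name": "Bob", "results": None},
    {"student_name": "Cy", "results": "pass"},
    {"student_name": None, "results": "fail"},
]

# COUNT(*): every retrieved row, NULLs included.
count_star = len(rows)

# COUNT(results): only rows where the expression is not NULL.
count_results = sum(1 for r in rows if r["results"] is not None)

# COUNT(DISTINCT results): different non-NULL values.
count_distinct_results = len({r["results"] for r in rows if r["results"] is not None})

# COUNT(DISTINCT student_name, results): distinct combinations in which
# no member is NULL.
count_distinct_pairs = len({
    (r["student_name"], r["results"])
    for r in rows
    if r["student_name"] is not None and r["results"] is not None
})

print(count_star, count_results, count_distinct_results, count_distinct_pairs)
# 4 3 2 2
```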

  • GROUP_CONCAT(expr)

    This function returns a string result with the concatenated non-NULL values from a group. It returns NULL if there are no non-NULL values. The full syntax is as follows:

    GROUP_CONCAT([DISTINCT] expr [,expr ...]
                 [ORDER BY {unsigned_integer | col_name | expr}
                     [ASC | DESC] [,col_name ...]]
                 [SEPARATOR str_val])

    mysql> SELECT student_name,
        ->     GROUP_CONCAT(test_score)
        ->     FROM student
        ->     GROUP BY student_name;

    Or:

    mysql> SELECT student_name,
        ->     GROUP_CONCAT(DISTINCT test_score
        ->               ORDER BY test_score DESC SEPARATOR ' ')
        ->     FROM student
        ->     GROUP BY student_name;

    In MySQL, you can get the concatenated values of expression combinations. To eliminate duplicate values, use the DISTINCT clause. To sort values in the result, use the ORDER BY clause. To sort in reverse order, add the DESC (descending) keyword to the name of the column you are sorting by in the ORDER BY clause. The default is ascending order; this may be specified explicitly using the ASC keyword. The default separator between values in a group is comma (","). To specify a separator explicitly, use SEPARATOR followed by the string literal value that should be inserted between group values. To eliminate the separator altogether, specify SEPARATOR ''.

    The result is truncated to the maximum length that is given by the group_concat_max_len system variable, which has a default value of 1024. The value can be set higher, although the effective maximum length of the return value is constrained by the value of max_allowed_packet. The syntax to change the value of group_concat_max_len at runtime is as follows, where val is an unsigned integer:

    SET [GLOBAL | SESSION] group_concat_max_len = val;

    The return value is a nonbinary or binary string, depending on whether the arguments are nonbinary or binary strings. The result type is TEXT or BLOB unless group_concat_max_len is less than or equal to 512, in which case the result type is VARCHAR or VARBINARY.
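The DISTINCT, ORDER BY, SEPARATOR, and truncation rules described for GROUP_CONCAT() can be mimicked for a single group in Python. This is a sketch with an invented function signature: `None` stands for NULL, sorting is by the value itself only, and truncation counts characters, which is a simplification made for this example.

```python
def group_concat(values, distinct=False, descending=None, separator=",", max_len=1024):
    """Concatenate the non-NULL values of one group, roughly as GROUP_CONCAT() does."""
    items = [v for v in values if v is not None]
    if not items:
        return None  # no non-NULL values in the group
    if distinct:
        items = list(dict.fromkeys(items))  # drop duplicates, keep first-seen order
    if descending is not None:
        items = sorted(items, reverse=descending)
    return separator.join(str(v) for v in items)[:max_len]


scores = [91, 85, 91, None, 78]  # hypothetical test_score values for one student
print(group_concat(scores))  # 91,85,91,78
print(group_concat(scores, distinct=True, descending=True, separator=" "))  # 91 85 78
print(group_concat([None, None]))  # None
```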

    See also CONCAT() and CONCAT_WS(): Section 12.5, "String Functions".

  • MAX([DISTINCT] expr)

    Returns the maximum value of expr. MAX() may take a string argument; in such cases, it returns the maximum string value. See Section 8.3.1, "How MySQL Uses Indexes". The DISTINCT keyword can be used to find the maximum of the distinct values of expr; however, this produces the same result as omitting DISTINCT.

    MAX() returns NULL if there were no matching rows.

    mysql> SELECT student_name, MIN(test_score), MAX(test_score)
        ->     FROM student
        ->     GROUP BY student_name;

    For MAX(), MySQL currently compares ENUM and SET columns by their string value rather than by the string's relative position in the set. This differs from how ORDER BY compares them. This is expected to be rectified in a future MySQL release.

  • MIN([DISTINCT] expr)

    Returns the minimum value of expr. MIN() may take a string argument; in such cases, it returns the minimum string value. See Section 8.3.1, "How MySQL Uses Indexes". The DISTINCT keyword can be used to find the minimum of the distinct values of expr; however, this produces the same result as omitting DISTINCT.

    MIN() returns NULL if there were no matching rows.

    mysql> SELECT student_name, MIN(test_score), MAX(test_score)
        ->     FROM student
        ->     GROUP BY student_name;

    For MIN(), MySQL currently compares ENUM and SET columns by their string value rather than by the string's relative position in the set. This differs from how ORDER BY compares them. This is expected to be rectified in a future MySQL release.

  • STD(expr)

    Returns the population standard deviation of expr. This is an extension to standard SQL. The standard SQL function STDDEV_POP() can be used instead.

    This function returns NULL if there were no matching rows.

  • STDDEV(expr)

    Returns the population standard deviation of expr. This function is provided for compatibility with Oracle. The standard SQL function STDDEV_POP() can be used instead.

    This function returns NULL if there were no matching rows.

  • STDDEV_POP(expr)

    Returns the population standard deviation of expr (the square root of VAR_POP()). You can also use STD() or STDDEV(), which are equivalent but not standard SQL.

    STDDEV_POP() returns NULL if there were no matching rows.

  • STDDEV_SAMP(expr)

    Returns the sample standard deviation of expr (the square root of VAR_SAMP()).

    STDDEV_SAMP() returns NULL if there were no matching rows.

  • SUM([DISTINCT] expr)

    Returns the sum of expr. The DISTINCT keyword can be used to sum only the distinct values of expr.

    SUM() returns NULL if there were no matching rows.

  • VAR_POP(expr)

    Returns the population standard variance of expr. It considers rows as the whole population, not as a sample, so it has the number of rows as the denominator. You can also use VARIANCE(), which is equivalent but is not standard SQL.

    VAR_POP() returns NULL if there were no matching rows.

  • VAR_SAMP(expr)

    Returns the sample variance of expr. That is, the denominator is the number of rows minus one.

    VAR_SAMP() returns NULL if there were no matching rows.

  • VARIANCE(expr)

    Returns the population standard variance of expr. This is an extension to standard SQL. The standard SQL function VAR_POP() can be used instead.

    VARIANCE() returns NULL if there were no matching rows.
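The difference between the population and sample functions above comes down to the denominator. The following is a minimal Python sketch, not MySQL code, that uses the standard library statistics module as a stand-in: pvariance() and pstdev() divide by the number of values, as described for VAR_POP() and STDDEV_POP(), while variance() and stdev() divide by the number of values minus one, as described for VAR_SAMP() and STDDEV_SAMP(). The scores list is made-up illustration data.

```python
import math
import statistics

# Hypothetical test scores; not taken from the manual.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Denominator N: the rows are treated as the whole population.
var_pop = statistics.pvariance(scores)   # analogous to VAR_POP() / VARIANCE()
std_pop = statistics.pstdev(scores)      # analogous to STDDEV_POP() / STD() / STDDEV()

# Denominator N - 1: the rows are treated as a sample.
var_samp = statistics.variance(scores)   # analogous to VAR_SAMP()
std_samp = statistics.stdev(scores)      # analogous to STDDEV_SAMP()

print("population:", var_pop, std_pop)
print("sample:    ", var_samp, std_samp)
```

For the same input, the sample values are always somewhat larger than the population values, because the same sum of squared deviations is divided by a smaller number.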

12.16.2. GROUP BY Modifiers

The GROUP BY clause permits a WITH ROLLUP modifier that causes extra rows to be added to the summary output. These rows represent higher-level (or super-aggregate) summary operations. ROLLUP thus enables you to answer questions at multiple levels of analysis with a single query. It can be used, for example, to provide support for OLAP (Online Analytical Processing) operations.

Suppose that a table named sales has year, country, product, and profit columns for recording sales profitability:

CREATE TABLE sales
(
    year    INT NOT NULL,
    country VARCHAR(20) NOT NULL,
    product VARCHAR(32) NOT NULL,
    profit  INT
);

The table's contents can be summarized per year with a simple GROUP BY like this:

mysql> SELECT year, SUM(profit) FROM sales GROUP BY year;
+------+-------------+
| year | SUM(profit) |
+------+-------------+
| 2000 |        4525 |
| 2001 |        3010 |
+------+-------------+

This output shows the total profit for each year, but if you also want to determine the total profit summed over all years, you must add up the individual values yourself or run an additional query.

Or you can use ROLLUP, which provides both levels of analysis with a single query. Adding a WITH ROLLUP modifier to the GROUP BY clause causes the query to produce another row that shows the grand total over all year values:

mysql> SELECT year, SUM(profit) FROM sales GROUP BY year WITH ROLLUP;
+------+-------------+
| year | SUM(profit) |
+------+-------------+
| 2000 |        4525 |
| 2001 |        3010 |
| NULL |        7535 |
+------+-------------+

The grand total super-aggregate line is identified by the value NULL in the year column.

ROLLUP has a more complex effect when there are multiple GROUP BY columns. In this case, each time there is a "break" (change in value) in any but the last grouping column, the query produces an extra super-aggregate summary row.

For example, without ROLLUP, a summary on the sales table based on year, country, and product might look like this:

mysql> SELECT year, country, product, SUM(profit)
    ->     FROM sales
    ->     GROUP BY year, country, product;
+------+---------+------------+-------------+
| year | country | product    | SUM(profit) |
+------+---------+------------+-------------+
| 2000 | Finland | Computer   |        1500 |
| 2000 | Finland | Phone      |         100 |
| 2000 | India   | Calculator |         150 |
| 2000 | India   | Computer   |        1200 |
| 2000 | USA     | Calculator |          75 |
| 2000 | USA     | Computer   |        1500 |
| 2001 | Finland | Phone      |          10 |
| 2001 | USA     | Calculator |          50 |
| 2001 | USA     | Computer   |        2700 |
| 2001 | USA     | TV         |         250 |
+------+---------+------------+-------------+

The output indicates summary values only at the year/country/product level of analysis. When ROLLUP is added, the query produces several extra rows:

mysql> SELECT year, country, product, SUM(profit)
    ->     FROM sales
    ->     GROUP BY year, country, product WITH ROLLUP;
+------+---------+------------+-------------+
| year | country | product    | SUM(profit) |
+------+---------+------------+-------------+
| 2000 | Finland | Computer   |        1500 |
| 2000 | Finland | Phone      |         100 |
| 2000 | Finland | NULL       |        1600 |
| 2000 | India   | Calculator |         150 |
| 2000 | India   | Computer   |        1200 |
| 2000 | India   | NULL       |        1350 |
| 2000 | USA     | Calculator |          75 |
| 2000 | USA     | Computer   |        1500 |
| 2000 | USA     | NULL       |        1575 |
| 2000 | NULL    | NULL       |        4525 |
| 2001 | Finland | Phone      |          10 |
| 2001 | Finland | NULL       |          10 |
| 2001 | USA     | Calculator |          50 |
| 2001 | USA     | Computer   |        2700 |
| 2001 | USA     | TV         |         250 |
| 2001 | USA     | NULL       |        3000 |
| 2001 | NULL    | NULL       |        3010 |
| NULL | NULL    | NULL       |        7535 |
+------+---------+------------+-------------+

For this query, adding ROLLUP causes the output to include summary information at four levels of analysis, not just one. Here is how to interpret the ROLLUP output:

  • Following each set of product rows for a given year and country, an extra summary row is produced showing the total for all products. These rows have the product column set to NULL.

  • Following each set of rows for a given year, an extra summary row is produced showing the total for all countries and products. These rows have the country and product columns set to NULL.

  • Finally, following all other rows, an extra summary row is produced showing the grand total for all years, countries, and products. This row has the year, country, and product columns set to NULL.
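The levels of analysis described above can be sketched outside the server. The following Python code is only an illustration of what the super-aggregate rows contain, not a description of how MySQL computes them. It assumes one input row per year/country/product group, copied from the summary output shown earlier (the real sales table may hold more rows), and uses None to stand in for NULL. The rollup() function name is hypothetical.

```python
from collections import defaultdict

# Assumption: one input row per (year, country, product) group, taken from
# the non-ROLLUP summary output. None stands in for SQL NULL.
sales = [
    (2000, "Finland", "Computer", 1500),
    (2000, "Finland", "Phone", 100),
    (2000, "India", "Calculator", 150),
    (2000, "India", "Computer", 1200),
    (2000, "USA", "Calculator", 75),
    (2000, "USA", "Computer", 1500),
    (2001, "Finland", "Phone", 10),
    (2001, "USA", "Calculator", 50),
    (2001, "USA", "Computer", 2700),
    (2001, "USA", "TV", 250),
]


def rollup(rows):
    """Return grouped totals plus super-aggregate rows for three columns."""
    totals = defaultdict(int)
    for year, country, product, profit in rows:
        # Every input row contributes to one total per level of analysis.
        for key in (
            (year, country, product),  # year/country/product level
            (year, country, None),     # all products for a year and country
            (year, None, None),        # all countries and products for a year
            (None, None, None),        # grand total
        ):
            totals[key] += profit

    def sort_key(key):
        # Place each summary (None) row after the rows it summarizes.
        return tuple((part is None, part) for part in key)

    return [key + (totals[key],) for key in sorted(totals, key=sort_key)]


result = rollup(sales)
for row in result:
    print(row)
```

With this input, the sketch yields 18 rows in the same order as the WITH ROLLUP output above, ending with the grand total row (None, None, None, 7535).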

Other Considerations When Using ROLLUP

The following items list some behaviors specific to the MySQL implementation of ROLLUP:

  • When you use ROLLUP, you cannot also use an ORDER BY clause to sort the results. In other words, ROLLUP and ORDER BY are mutually exclusive. However, you still have some control over sort order. GROUP BY in MySQL sorts results, and you can use explicit ASC and DESC keywords with columns named in the GROUP BY list to specify sort order for individual columns. (The higher-level summary rows added by ROLLUP still appear after the rows from which they are calculated, regardless of the sort order.)

  • LIMIT can be used to restrict the number of rows returned to the client. LIMIT is applied after ROLLUP, so the limit applies against the extra rows added by ROLLUP. For example:

    mysql> SELECT year, country, product, SUM(profit)
        ->     FROM sales
        ->     GROUP BY year, country, product WITH ROLLUP
        ->     LIMIT 5;
    +------+---------+------------+-------------+
    | year | country | product    | SUM(profit) |
    +------+---------+------------+-------------+
    | 2000 | Finland | Computer   |        1500 |
    | 2000 | Finland | Phone      |         100 |
    | 2000 | Finland | NULL       |        1600 |
    | 2000 | India   | Calculator |         150 |
    | 2000 | India   | Computer   |        1200 |
    +------+---------+------------+-------------+

    Using LIMIT with ROLLUP may produce results that are more difficult to interpret, because you have less context for understanding the super-aggregate rows.

  • The NULL indicators in each super-aggregate row are produced when the row is sent to the client. The server looks at the columns named in the GROUP BY clause following the leftmost one that has changed value. For any column in the result set with a name that is a lexical match to any of those names, its value is set to NULL. (If you specify grouping columns by column number, the server identifies which columns to set to NULL by number.)

    Because the NULL values in the super-aggregate rows are placed into the result set at such a late stage in query processing, you cannot test them as NULL values within the query itself. For example, you cannot add HAVING product IS NULL to the query to eliminate from the output all but the super-aggregate rows.

    On the other hand, the NULL values do appear as NULL on the client side and can be tested as such using any MySQL client programming interface.
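As an illustration of testing for NULL on the client side, here is a minimal Python sketch. It does not connect to a server: the rows list is hard-coded to the five rows of the LIMIT 5 example, in the shape a Python DB-API cursor would typically return them, with SQL NULL represented as None. The super_aggregates() helper is a hypothetical name.

```python
# Assumption: rows as a client library might return them for the LIMIT 5
# ROLLUP query, with SQL NULL mapped to None.
rows = [
    (2000, "Finland", "Computer", 1500),
    (2000, "Finland", "Phone", 100),
    (2000, "Finland", None, 1600),
    (2000, "India", "Calculator", 150),
    (2000, "India", "Computer", 1200),
]


def super_aggregates(rows, product_index=2):
    """Keep only rows whose product column is NULL (None) on the client."""
    return [row for row in rows if row[product_index] is None]


summary_rows = super_aggregates(rows)
print(summary_rows)
```

This works for the sales table because product is declared NOT NULL, so a None in that column can only come from ROLLUP. For a nullable grouping column, a stored NULL and a ROLLUP NULL would look the same to the client.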

12.16.3. MySQL Extensions to GROUP BY

In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:

SELECT o.custid, c.name, MAX(o.payment)
  FROM orders AS o, customers AS c
  WHERE o.custid = c.custid
  GROUP BY o.custid;

For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses.

A similar MySQL extension applies to the HAVING clause. In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the HAVING clause that are not named in the GROUP BY clause. A MySQL extension permits references to such columns to simplify calculations. This extension assumes that the nongrouped columns will have the same group-wise values. Otherwise, the result is indeterminate.

To disable the MySQL GROUP BY extension, enable the ONLY_FULL_GROUP_BY SQL mode. This enables standard SQL behavior: Columns not named in the GROUP BY clause cannot be used in the select list or HAVING clause unless enclosed in an aggregate function.

ONLY_FULL_GROUP_BY also affects the use of aliases in the HAVING clause. For example, the following query returns name values that occur only once in table orders:

SELECT name, COUNT(name) FROM orders
  GROUP BY name
  HAVING COUNT(name) = 1;

MySQL extends this behavior to permit the use of an alias in the HAVING clause for the aggregated column:

SELECT name, COUNT(name) AS c FROM orders
  GROUP BY name
  HAVING c = 1;

Enabling ONLY_FULL_GROUP_BY disables this MySQL extension, and the query fails with the error "non-grouping field 'c' is used in HAVING clause" because the column c in the HAVING clause is not enclosed in an aggregate function (instead, it is an aggregate function).

The select list extension also applies to ORDER BY. That is, you can refer to nonaggregated columns in the ORDER BY clause that do not appear in the GROUP BY clause. (However, as mentioned previously, ORDER BY does not affect which values are chosen from nonaggregated columns; it only sorts them after they have been chosen.) This extension does not apply if the ONLY_FULL_GROUP_BY SQL mode is enabled.

In some cases, you can use MIN() and MAX() to obtain a specific column value even if it is not unique. If the sort column contains integers no larger than 6 digits, the following query gives the value of column from the row containing the smallest sort value:

SUBSTR(MIN(CONCAT(LPAD(sort,6,'0'),column)),7)

See Section 3.6.4, "The Rows Holding the Group-wise Maximum of a Certain Column".
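The padding trick in the expression above is easier to follow with concrete values. The Python sketch below mirrors it step by step; it is an emulation, not MySQL code. The (sort, column) pairs are made-up data, value_at_min_sort() is a hypothetical name, and the sort values are assumed to be non-negative integers of at most 6 digits, as the text requires.

```python
# Hypothetical (sort, column) pairs belonging to one group.
rows = [(120, "pear"), (7, "apple"), (3050, "fig")]


def value_at_min_sort(rows, width=6):
    """Emulate SUBSTR(MIN(CONCAT(LPAD(sort, 6, '0'), column)), 7)."""
    # LPAD(sort, 6, '0') followed by CONCAT: zero-padding makes the string
    # order of the packed values agree with the numeric order of sort.
    packed = [str(sort).rjust(width, "0") + column for sort, column in rows]
    # MIN() over strings picks the packed value with the smallest sort prefix.
    smallest = min(packed)
    # SUBSTR(..., 7) is 1-based in SQL, which is a slice from index 6 here.
    return smallest[width:]


picked = value_at_min_sort(rows)

# Without padding, string comparison picks the wrong row: "120pear" sorts
# before "7apple" even though 7 is the smaller number.
unpadded_min = min(str(sort) + column for sort, column in rows)

print(picked)
print(unpadded_min)
```

The fixed width is what makes the final SUBSTR safe: every packed value starts with exactly 6 characters of sort data, so the column value always begins at position 7.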

If you are trying to follow standard SQL, you cannot use expressions in GROUP BY clauses. As a workaround, use an alias for the expression:

SELECT id, FLOOR(value/100) AS val
  FROM tbl_name
  GROUP BY id, val;

MySQL permits expressions in GROUP BY clauses, so the alias is unnecessary:

SELECT id, FLOOR(value/100)
  FROM tbl_name
  GROUP BY id, FLOOR(value/100);
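To show what grouping by an expression produces, here is a small Python sketch of the same idea. Because the query has no aggregate function, its result is one row per distinct (id, FLOOR(value/100)) pair. The input rows and the group_keys() name are hypothetical, and the sorting is only there to make the output stable.

```python
import math

# Hypothetical (id, value) rows standing in for tbl_name.
rows = [(1, 120), (1, 180), (1, 250), (2, 99), (2, 5)]


def group_keys(rows):
    """Return the distinct (id, FLOOR(value/100)) pairs, one per group."""
    return sorted({(row_id, math.floor(value / 100)) for row_id, value in rows})


groups = group_keys(rows)
print(groups)
```

Rows (1, 120) and (1, 180) fall into the same group because both values floor to 1, which is the effect the expression in the GROUP BY clause is meant to achieve.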
Copyright © 1997, 2013, Oracle and/or its affiliates. All rights reserved. Legal Notices