您的位置 » 首页 » 编程开发 » mysql正则:必须学习的mysql正则(rlike/regexp)参考资料

mysql正则:必须学习的mysql正则(rlike/regexp)参考资料

发表于5年前 | 作者: seay | 分类: 编程开发 | 孵化于:2013年03月22日 | 文章热度:37,247 次 全屏阅读

显示不全请点击全屏阅读

Table 12.9. String Regular Expression Operators

Name Description
NOT REGEXP Negation of REGEXP
REGEXP Pattern matching using regular expressions
RLIKE Synonym for REGEXP


A regular expression is a powerful way of specifying a pattern for a complex search.

MySQL uses Henry Spencer’s implementation of regular expressions, which is aimed at conformance with POSIX 1003.2. MySQL uses the extended version to support pattern-matching operations performed with the REGEXPoperator in SQL statements.

This section summarizes, with examples, the special characters and constructs that can be used in MySQL forREGEXP operations. It does not contain all the details that can be found in Henry Spencer’s regex(7) manual page. That manual page is included in MySQL source distributions, in the regex.7 file under the regex directory. See also Section 3.3.4.7, “Pattern Matching”.

  •  expr NOT REGEXP pat, expr NOT RLIKE pat

    This is the same as NOT (expr REGEXP pat).

  •  expr REGEXP pat, expr RLIKE pat

    Performs a pattern match of a string expression expr against a pattern pat. The pattern can be an extended regular expression. The syntax for regular expressions is discussed in Section 12.5.2, “Regular Expressions”. Returns 1 if expr matches pat; otherwise it returns 0. If either expr or pat is NULL, the result is NULL. RLIKE is a synonym for REGEXP, provided for mSQL compatibility.

    The pattern need not be a literal string. For example, it can be specified as a string expression or table column.

    Note

    Because MySQL uses the C escape syntax in strings (for example, \n to represent the newline character), you must double any \ that you use in your REGEXP strings.

    REGEXP is not case sensitive, except when used with binary strings.

    mysql> SELECT 'Monty!' REGEXP '.*'; -> 1
    mysql> SELECT 'new*\n*line' REGEXP 'new\\*.\\*line'; -> 1
    mysql> SELECT 'a' REGEXP 'A', 'a' REGEXP BINARY 'A'; -> 1  0
    mysql> SELECT 'a' REGEXP '^[a-d]'; -> 1
    

    REGEXP and RLIKE use the character set and collations of the arguments when deciding the type of a character and performing the comparison. If the arguments have different character sets or collations, coercibility rules apply as described in Section 10.1.7.5, “Collation of Expressions”.

    Warning

    The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.

A regular expression describes a set of strings. The simplest regular expression is one that has no special characters in it. For example, the regular expression hello matches hello and nothing else.

Nontrivial regular expressions use certain special constructs so that they can match more than one string. For example, the regular expression hello|word matches either the string hello or the string word.

As a more complex example, the regular expression B[an]*s matches any of the strings Bananas, Baaaaas, Bs, and any other string starting with a B, ending with an s, and containing any number of a or n characters in between.

A regular expression for the REGEXP operator may use any of the following special characters and constructs:

  • ^

    Match the beginning of a string.

    mysql> SELECT 'fo\nfo' REGEXP '^fo$'; -> 0
    mysql> SELECT 'fofo' REGEXP '^fo'; -> 1
    
  • $

    Match the end of a string.

    mysql> SELECT 'fo\no' REGEXP '^fo\no$'; -> 1
    mysql> SELECT 'fo\no' REGEXP '^fo$'; -> 0
    
  • .

    Match any character (including carriage return and newline).

    mysql> SELECT 'fofo' REGEXP '^f.*$'; -> 1
    mysql> SELECT 'fo\r\nfo' REGEXP '^f.*$'; -> 1
    
  • a*

    Match any sequence of zero or more a characters.

    mysql> SELECT 'Ban' REGEXP '^Ba*n'; -> 1
    mysql> SELECT 'Baaan' REGEXP '^Ba*n'; -> 1
    mysql> SELECT 'Bn' REGEXP '^Ba*n'; -> 1
    
  • a+

    Match any sequence of one or more a characters.

    mysql> SELECT 'Ban' REGEXP '^Ba+n'; -> 1
    mysql> SELECT 'Bn' REGEXP '^Ba+n'; -> 0
    
  • a?

    Match either zero or one a character.

    mysql> SELECT 'Bn' REGEXP '^Ba?n'; -> 1
    mysql> SELECT 'Ban' REGEXP '^Ba?n'; -> 1
    mysql> SELECT 'Baan' REGEXP '^Ba?n'; -> 0
    
  • de|abc

    Match either of the sequences de or abc.

    mysql> SELECT 'pi' REGEXP 'pi|apa'; -> 1
    mysql> SELECT 'axe' REGEXP 'pi|apa'; -> 0
    mysql> SELECT 'apa' REGEXP 'pi|apa'; -> 1
    mysql> SELECT 'apa' REGEXP '^(pi|apa)$'; -> 1
    mysql> SELECT 'pi' REGEXP '^(pi|apa)$'; -> 1
    mysql> SELECT 'pix' REGEXP '^(pi|apa)$'; -> 0
    
  • (abc)*

    Match zero or more instances of the sequence abc.

    mysql> SELECT 'pi' REGEXP '^(pi)*$'; -> 1
    mysql> SELECT 'pip' REGEXP '^(pi)*$'; -> 0
    mysql> SELECT 'pipi' REGEXP '^(pi)*$'; -> 1
    
  • {1}, {2,3}

    {n} or {m,n} notation provides a more general way of writing regular expressions that match many occurrences of the previous atom (or piece) of the pattern. m and n are integers.

    • a*

      Can be written as a{0,}.

    • a+

      Can be written as a{1,}.

    • a?

      Can be written as a{0,1}.

    To be more precise, a{n} matches exactly n instances of a. a{n,} matches n or more instances of a. a{m,n}matches m through n instances of a, inclusive.

    m and n must be in the range from 0 to RE_DUP_MAX (default 255), inclusive. If both m and n are given, m must be less than or equal to n.

    mysql> SELECT 'abcde' REGEXP 'a[bcd]{2}e'; -> 0
    mysql> SELECT 'abcde' REGEXP 'a[bcd]{3}e'; -> 1
    mysql> SELECT 'abcde' REGEXP 'a[bcd]{1,10}e'; -> 1
    
  • [a-dX], [^a-dX]

    Matches any character that is (or is not, if ^ is used) either a, b, c, d or X. A  character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9]matches any decimal digit. To include a literal ] character, it must immediately follow the opening bracket [. To include a literal  character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself.

    mysql> SELECT 'aXbc' REGEXP '[a-dXYZ]'; -> 1
    mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]$'; -> 0
    mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]+$'; -> 1
    mysql> SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$'; -> 0
    mysql> SELECT 'gheis' REGEXP '^[^a-dXYZ]+$'; -> 1
    mysql> SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$'; -> 0
    
  • [.characters.]

    Within a bracket expression (written using [ and ]), matches the sequence of characters of that collating element.characters is either a single character or a character name like newline. The following table lists the permissible character names.

    The following table shows the permissible character names and the characters that they match. For characters given as numeric values, the values are represented in octal.

    Name Character Name Character
    NUL 0 SOH 001
    STX 002 ETX 003
    EOT 004 ENQ 005
    ACK 006 BEL 007
    alert 007 BS 010
    backspace ‘\b’ HT 011
    tab ‘\t’ LF 012
    newline ‘\n’ VT 013
    vertical-tab ‘\v’ FF 014
    form-feed ‘\f’ CR 015
    carriage-return ‘\r’ SO 016
    SI 017 DLE 020
    DC1 021 DC2 022
    DC3 023 DC4 024
    NAK 025 SYN 026
    ETB 027 CAN 030
    EM 031 SUB 032
    ESC 033 IS4 034
    FS 034 IS3 035
    GS 035 IS2 036
    RS 036 IS1 037
    US 037 space ‘ ‘
    exclamation-mark ‘!’ quotation-mark ‘”‘
    number-sign ‘#’ dollar-sign ‘$’
    percent-sign ‘%’ ampersand ‘&’
    apostrophe ‘\” left-parenthesis ‘(‘
    right-parenthesis ‘)’ asterisk ‘*’
    plus-sign ‘+’ comma ‘,’
    hyphen ‘-‘ hyphen-minus ‘-‘
    period ‘.’ full-stop ‘.’
    slash ‘/’ solidus ‘/’
    zero ‘0’ one ‘1’
    two ‘2’ three ‘3’
    four ‘4’ five ‘5’
    six ‘6’ seven ‘7’
    eight ‘8’ nine ‘9’
    colon ‘:’ semicolon ‘;’
    less-than-sign ‘<‘ equals-sign ‘=’
    greater-than-sign ‘>’ question-mark ‘?’
    commercial-at ‘@’ left-square-bracket ‘[‘
    backslash ‘\\’ reverse-solidus ‘\\’
    right-square-bracket ‘]’ circumflex ‘^’
    circumflex-accent ‘^’ underscore ‘_’
    low-line ‘_’ grave-accent ‘`’
    left-brace ‘{‘ left-curly-bracket ‘{‘
    vertical-line ‘|’ right-brace ‘}’
    right-curly-bracket ‘}’ tilde ‘~’
    DEL 177    
    mysql> SELECT '~' REGEXP '[[.~.]]'; -> 1
    mysql> SELECT '~' REGEXP '[[.tilde.]]'; -> 1
    
  • [=character_class=]

    Within a bracket expression (written using [ and ]), [=character_class=] represents an equivalence class. It matches all characters with the same collation value, including itself. For example, if o and (+) are the members of an equivalence class, [[=o=]], [[=(+)=]], and [o(+)] are all synonymous. An equivalence class may not be used as an endpoint of a range.

  • [:character_class:]

    Within a bracket expression (written using [ and ]), [:character_class:] represents a character class that matches all characters belonging to that class. The following table lists the standard class names. These names stand for the character classes defined in the ctype(3) manual page. A particular locale may provide other class names. A character class may not be used as an endpoint of a range.

    Character Class Name Meaning
    alnum Alphanumeric characters
    alpha Alphabetic characters
    blank Whitespace characters
    cntrl Control characters
    digit Digit characters
    graph Graphic characters
    lower Lowercase alphabetic characters
    print Graphic or space characters
    punct Punctuation characters
    space Space, tab, newline, and carriage return
    upper Uppercase alphabetic characters
    xdigit Hexadecimal digit characters
    mysql> SELECT 'justalnums' REGEXP '[[:alnum:]]+'; -> 1
    mysql> SELECT '!!' REGEXP '[[:alnum:]]+'; -> 0
    
  • [[:<:]], [[:>:]]

    These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).

    mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1
    mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
    

To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. For example, to match the string 1+2 that contains the special + character, only the last of the following regular expressions is the correct one:

mysql> SELECT '1+2' REGEXP '1+2'; -> 0
mysql> SELECT '1+2' REGEXP '1\+2'; -> 0
mysql> SELECT '1+2' REGEXP '1\\+2'; -> 1

Tags:

mysql,

如果您喜欢我的博客,欢迎点击图片定订阅到邮箱填写您的邮件地址,订阅我们的精彩内容: 也可以点击链接【订阅到鲜果】

如果我的想法或工具帮助到了你,也可微信扫下方二维码打赏本人一杯咖啡


来自 Seay互联网安全博客
本文地址:http://www.cnseay.com/2458/
文章版权说明请看置顶文章,尊重作者,转载请以链接形式标明原文地址

马上分享给你的朋友吧~

发表评论

你的大名(必填)

你的邮箱(必填)

评论内容(必填)