I spent three hours learning about regular expressions. The lesson comes from The Linux Command Line, an open source e-book about the Linux bash command line. To demonstrate regular expressions, the book uses the grep command:
The name grep is actually derived from the phrase "global regular expression print".
Metacharacters
Regular expression metacharacters consist of the following:
^ $ . [] {} - ? * + () | \
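To make a few of these metacharacters concrete, here is a minimal sketch using grep. The word list is made up for illustration and is not taken from the book; only the anchors, the dot, and the backslash escape are shown.

```shell
# Sample input: one word per line (made-up data for illustration).
words=$(printf '%s\n' zip gzip bzip2 unzip zipinfo)

# '^' anchors the match at the start of the line.
starts_with_zip=$(printf '%s\n' "$words" | grep '^zip')

# '$' anchors the match at the end of the line.
ends_with_zip=$(printf '%s\n' "$words" | grep 'zip$')

# '.' matches any single character, so '.zip' needs one character before "zip".
any_char_then_zip=$(printf '%s\n' "$words" | grep '.zip')

# A backslash makes a metacharacter literal: only a real dot matches here.
literal_dot=$(printf '%s\n' 'a.b' 'axb' | grep 'a\.b')

printf '%s\n' "$starts_with_zip" "$ends_with_zip" "$any_char_then_zip" "$literal_dot"
```

Note that plain "zip" is not matched by '.zip', because the dot must consume a character of its own.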
POSIX Character Classes
First, we need to understand a little about the history of character encoding:
Back when Unix was first developed, it only knew about ASCII characters, and this feature reflects that fact. In ASCII, the first 32 characters (numbers 0-31) are control codes (things like tabs, backspaces, and carriage returns). The next 32 (32-63) contain printable characters, including most punctuation characters and the numerals zero through nine. The next 32 (numbers 64-95) contain the uppercase letters and a few more punctuation symbols. The final 31 (numbers 96-127) contain the lowercase letters and yet more punctuation symbols. Based on this arrangement, systems using ASCII used a collation order that looked like this:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
This differs from proper dictionary order, which is like this:
aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
As the popularity of Unix spread beyond the United States, there grew a need to support characters not found in U.S. English. The ASCII table was expanded to use a full eight bits, adding characters numbers 128-255, which accommodated many more languages. To support this ability, the POSIX standards introduced a concept called a locale, which could be adjusted to select the character set needed for a particular location.
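The effect of the locale on collation can be seen with sort. This is a small sketch, assuming a system where the C locale is available (it is required by POSIX); the word list is invented for the example. Setting LC_ALL=C asks for the traditional ASCII order described above.

```shell
# A few made-up words with mixed case.
names=$(printf '%s\n' banana Apple cherry Banana apple Cherry)

# LC_ALL=C selects the traditional byte-value (ASCII) order:
# every uppercase letter sorts before any lowercase letter.
ascii_sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)

printf '%s\n' "$ascii_sorted"
```

Under a locale such as en_US.UTF-8, the same command would normally produce a dictionary-like order instead, but the exact result depends on the locale data installed on the system.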
We also need to keep in mind the difference between pathname expansion and regular expressions, although POSIX character classes can be used in both.
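The following sketch uses the same POSIX character class in both contexts. The file names are hypothetical and are created in a temporary directory, so nothing outside it is touched.

```shell
# Work in a throwaway directory so the glob only sees our made-up files.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch Alpha beta Gamma delta 9lives

# Pathname expansion: the shell itself expands the pattern into file names.
# [[:upper:]]* means "names that start with an uppercase letter".
glob_matches=$(printf '%s\n' [[:upper:]]*)

# Regular expression: grep tests the pattern against each line of its input.
# '^[[:upper:]]' means "lines that start with an uppercase letter".
regex_matches=$(ls | grep '^[[:upper:]]')

# Another class, used as a regular expression: lines starting with a digit.
digit_matches=$(ls | grep '^[[:digit:]]')

printf '%s\n' "$glob_matches" "$regex_matches" "$digit_matches"

# Clean up the temporary directory.
cd / && rm -rf "$tmpdir"
```

The class is written the same way in both cases, but the rest of the syntax differs: in the glob, * means "any string", while the regular expression needs the ^ anchor to tie the match to the start of the line.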
BRE and ERE
BRE: basic regular expressions. The following metacharacters are recognized:
^ $ . [] *
ERE: extended regular expressions. Besides the BRE metacharacters, the following metacharacters (and their associated functions) are added:
() {} ? + |
The “(”, “)”, “{”, and “}” characters are treated as metacharacters in BRE if they are escaped with a backslash, whereas with ERE, preceding any metacharacter with a backslash causes it to be treated as a literal.
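The escaping rule can be checked with a short experiment. This is a sketch with made-up input lines: plain grep is assumed to use BRE and grep -E to use ERE, which is the behavior POSIX specifies for grep.

```shell
# Made-up sample lines.
lines=$(printf '%s\n' 'aa' 'a{2}' 'b')

# BRE (plain grep): escaped braces form an interval,
# so 'a\{2\}' means "two consecutive a characters".
bre_interval=$(printf '%s\n' "$lines" | grep 'a\{2\}')

# BRE: unescaped braces are ordinary characters, so this matches the text "a{2}".
bre_literal=$(printf '%s\n' "$lines" | grep 'a{2}')

# ERE (grep -E): unescaped braces form the interval.
ere_interval=$(printf '%s\n' "$lines" | grep -E 'a{2}')

# ERE: escaping the braces makes them literal again.
ere_literal=$(printf '%s\n' "$lines" | grep -E 'a\{2\}')

printf '%s\n' "$bre_interval" "$bre_literal" "$ere_interval" "$ere_literal"
```

The same pattern text therefore selects different lines depending on whether grep is run with or without -E.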
Finally, what does POSIX stand for? POSIX is the Portable Operating System Interface (with the "X" added to the end for extra snappiness).