Unix Regular Expressions (Pattern Matching)

Some Unix utilities, such as grep, and editors, such as vi, use regular expressions as search strings. A regular expression is a sequence of characters which follows certain rules of interpretation. Some of the characters are taken literally, while others are taken to mean special things.

A regular expression matches the longest possible string, starting as close as possible to the beginning of the line. Expressions which include spaces need delimiters.
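
The leftmost-longest rule can be seen with a small sed substitution. This is only a sketch: the sample text abcabcabc and the variable name result are made up for illustration.

```shell
# The match starts at the first "b" (as close to the start of the line as
# possible) and extends to the last "c" (the longest possible string).
result=$(printf 'abcabcabc\n' | sed 's/b.*c/X/')
printf '%s\n' "$result"    # prints: aX
```

Only the leading a survives, because everything from the first b to the final c was treated as one match.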

Delimiter: A character which marks the start and end of a regular expression. The "/" is frequently used as a delimiter, but almost any other character may also serve as a delimiter.
Example: /this is a regular expression/
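
As a rough illustration of choosing a different delimiter, the sketch below runs the same sed substitution twice, once with "/" and once with "|". The path and the variable names are invented for the example.

```shell
# With "/" as the delimiter, every "/" inside the pattern must be quoted.
with_slash=$(printf '/usr/local/bin\n' | sed 's/\/usr\/local/\/opt/')

# With "|" as the delimiter, the slashes are ordinary characters.
with_bar=$(printf '/usr/local/bin\n' | sed 's|/usr/local|/opt|')

printf '%s\n%s\n' "$with_slash" "$with_bar"    # both lines read: /opt/bin
```

Picking a delimiter that does not occur in the pattern usually keeps the expression easier to read.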

The special characters:

. (period) Match any single character.
Example: /c.t/ matches cat, cot, cut
Example: /t.k.n/ matches token, taken, tikin, etc.
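
A quick way to try the period is with grep, which takes the expression without delimiters. The word list below is made up for illustration.

```shell
# "ct" has nothing between c and t, and "coat" has two characters there,
# so neither line is selected.
matches=$(printf 'cat\ncot\ncut\nct\ncoat\n' | grep 'c.t')
printf '%s\n' "$matches"    # prints cat, cot, cut on separate lines
```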
 
* Repeat the preceding regular expression zero or more times. In this case, regular expression means the character (regular or special) immediately before the *, or a larger expression if the * follows a group enclosed in \( and \).
Example: /yz*/ matches y, yz, yzz, yzzz, etc.
Example: /(.*)/ matches as long a string as possible between parentheses, from the first ( on the line to the last ).
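
The sketch below tries the * with grep and sed. The sample lines and variable names are assumptions for illustration; the -x option of grep requires the whole line to match.

```shell
# With -x the whole line must match, so only y followed by z's is accepted.
whole=$(printf 'y\nyz\nyzz\nyazz\nz\n' | grep -x 'yz*')

# Without -x, "yazz" is also selected: its leading y matches yz* with zero z's.
anywhere=$(printf 'y\nyz\nyzz\nyazz\nz\n' | grep 'yz*')

# .* takes as much as it can, so the match runs from the first ( to the last ).
greedy=$(printf 'f(a) + g(b)\n' | sed 's/(.*)/<&>/')

printf '%s\n' "$greedy"    # prints: f<(a) + g(b)>
```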
 
^ Force the match to occur only at the beginning of the line.
Example: /^function/ matches function as the first item on a line.
 
$ Force the match to occur only at the end of a line.
Example: /terminator$/ matches terminator at end of a line.
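
Both anchors can be checked with grep. The four sample lines below are invented for the example.

```shell
lines=$(printf 'function foo\ncall function\nthe terminator\nterminator here\n')

# ^ selects only the line that begins with "function".
at_start=$(printf '%s\n' "$lines" | grep '^function')

# $ selects only the line that ends with "terminator".
at_end=$(printf '%s\n' "$lines" | grep 'terminator$')

printf '%s\n%s\n' "$at_start" "$at_end"
# prints:
# function foo
# the terminator
```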
 
[ ] Define a character class (a set of characters) as acceptable matches for any single character. One occurrence of the character class will match one character in the text; if you want to match a series of letters, use the character class followed by an *.
Example: /[Aa][Bb][Cc]/ matches ABC, abc, Abc, aBc, abC, ABc, aBC but does not match AaBbCc
Example: /[Aa]*[Bb]*[Cc]*/ matches AAaaBC, AbbbBc, AAbbCcccc, etc. That is, any number of A's, followed by any number of B's, followed by any number of C's, regardless of case.

A group of characters with contiguous ASCII codes, such as a-z, A-Z, or 0-9, can be written as a range inside a character class: [a-z] [A-Z] [0-9]. Ranges may be combined with an explicit list of other characters: [a-zA-Z{}0-9] defines a character class containing capital and small letters, digits, and the left and right curly brackets.
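
A short grep sketch of character classes and ranges follows. The sample lines are made up, and LC_ALL=C is set on the assumption that ranges should follow ASCII order; other locales may order characters differently.

```shell
# Exactly one of A/a, then one of B/b, then one of C/c, somewhere on the line.
three=$(printf 'ABC\naBc\nAaBbCc\n' | LC_ALL=C grep '[Aa][Bb][Cc]')

# Ranges plus explicit characters: with -x, keep only lines made up entirely
# of letters, digits, and curly brackets.
only_class=$(printf 'x{1}\nfoo bar\nA9\n' | LC_ALL=C grep -x '[a-zA-Z{}0-9]*')

printf '%s\n' "$three"         # prints ABC and aBc
printf '%s\n' "$only_class"    # prints x{1} and A9
```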

When you want to find a special character as itself, you must quote it with a backslash.

Example: /[0-9][0-9]\.[0-9][0-9]/ will match 03.98, 45.76, etc.
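
The effect of the backslash can be seen by running the expression with and without it. The two sample lines are invented for illustration.

```shell
prices=$(printf '03.98\n45x76\n')

# Unquoted, the period matches any character, so both lines are selected.
unescaped=$(printf '%s\n' "$prices" | grep '[0-9][0-9].[0-9][0-9]')

# Quoted with a backslash, the period matches only a literal period.
escaped=$(printf '%s\n' "$prices" | grep '[0-9][0-9]\.[0-9][0-9]')

printf '%s\n' "$escaped"    # prints: 03.98
```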

To repeat a regular expression longer than a single character, enclose it in \( and \) and follow it with a *:

Example: /\(th[ai][ts]\)*/ matches this, that, or thisthatthisthat
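
The following sed sketch marks what the repeated group matches by wrapping it in square brackets. The input lines are assumptions for illustration. Note the second case: because * allows zero repetitions, the expression can also match an empty string.

```shell
# The group repeats four times, so the whole run is one match.
grouped=$(printf 'thisthatthisthat end\n' | sed 's/\(th[ai][ts]\)*/[&]/')
printf '%s\n' "$grouped"    # prints: [thisthatthisthat] end

# The line does not start with this/that, so the leftmost match is the
# empty string at the beginning of the line.
empty=$(printf 'see this\n' | sed 's/\(th[ai][ts]\)*/[&]/')
printf '%s\n' "$empty"      # prints: []see this
```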

In replacement strings in vi and sed substitute commands, an & means the string that the search matched, and \n (where n is a digit from 1 to 9) means the text matched by the bracketed regular expression beginning with the nth \(.

Example: s/dollar/&s/ changes dollar to dollars
Example: s/\([0-9]\)\(Cost\)/\2\1/ changes 3Cost to Cost3, 5Cost to Cost5, etc.
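
Both replacement forms can be tried from the shell with sed. The input strings and variable names below are chosen only for illustration.

```shell
# & stands for the whole matched string, here "dollar".
plural=$(printf 'one dollar\n' | sed 's/dollar/&s/')
printf '%s\n' "$plural"     # prints: one dollars

# \1 is the digit captured by the first \( \), \2 is "Cost" from the second.
swapped=$(printf '3Cost\n' | sed 's/\([0-9]\)\(Cost\)/\2\1/')
printf '%s\n' "$swapped"    # prints: Cost3
```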

