The purpose of this post is to explain, with some examples, how to use the power of extended patterns in your ksh93 scripts. It is assumed that you are reasonably familiar with regular expressions (BREs and EREs) as implemented in sed, grep or awk. If you need an introductory tutorial on regular expressions, there is one at IBM developerWorks.
The following table shows the basic pattern-matching operators in ksh93.
?(pattern) | Match if found 0 or 1 times |
*(pattern) | Match if found 0 or more times |
+(pattern) | Match if found 1 or more times |
@(pattern1|...) | Match if exactly one of the patterns is found |
!(pattern) | Match anything except the pattern |
Note that in ksh93 the operator must precede the pattern, whereas in egrep, sed and awk the operator follows the pattern. For example, +(ab) in ksh93 corresponds to (ab)+ in an ERE.
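To make the difference in operator placement concrete, here is a minimal sketch that checks the same strings against the ksh93 pattern +(ab)c and the equivalent ERE ^(ab)+c$ passed to grep -E. The function names ksh_match and ere_match are hypothetical, chosen only for this illustration.

```shell
#!/bin/ksh93
# Hypothetical helpers comparing operator placement in ksh93 patterns and EREs.

# ksh93: the operator (+) comes before the parenthesized pattern.
# [[ == ]] always matches the pattern against the whole string.
function ksh_match {
    [[ $1 == +(ab)c ]]
}

# ERE: the operator (+) comes after the group. grep matches substrings,
# so ^ and $ are needed to require a whole-line match.
function ere_match {
    printf '%s\n' "$1" | grep -Eq '^(ab)+c$'
}

for s in abc ababc abab xabc; do
    ksh_match "$s" && k=yes || k=no
    ere_match "$s" && e=yes || e=no
    printf '%-6s ksh93=%s ERE=%s\n' "$s" "$k" "$e"
done
```

Both columns should agree for every test string; the only differences are where the + goes and the explicit anchors the ERE needs.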
Here is an example of how to use the above operators with the ${var//pattern/replacement} expansion, which replaces every match of pattern in the value of str (str itself is not modified):
str="Joe Mike and Dave are all good friends"
print ${str//a?(re)/_}
# output: Joe Mike _nd D_ve _ _ll good friends
print ${str//g*(o)/_}
# output: Joe Mike and Dave are all _d friends
print ${str//+(o)/_}
# output: J_e Mike and Dave are all g_d friends
print ${str//@(Joe|Mike|Dave)/_}
# output: _ _ and _ are all good friends
print ${str//@(Joe|Mike|g*(o))/_}
# output: _ _ and Dave are all _d friends
print ${str//!(Joe)/_}
# output: _
print ${str//!(Joe|Mike|Dave)/_}
# output: _
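The last two substitutions replace the entire string because !(Joe) matches any string that is not exactly Joe, and the longest such match starting at the first character is the whole of str. The following sketch, using a hypothetical function not_joe, makes this whole-string behavior visible with [[ ... ]]:

```shell
#!/bin/ksh93
# Hypothetical helper: succeeds when the whole argument is anything but "Joe".
function not_joe {
    [[ $1 == !(Joe) ]]
}

for w in Joe Joey Mike "Joe Mike"; do
    if not_joe "$w"; then
        printf '%s: matches !(Joe)\n' "$w"
    else
        printf '%s: does not match !(Joe)\n' "$w"
    fi
done
```

Only the exact string Joe fails; Joey and "Joe Mike" merely contain Joe, so they still match !(Joe).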
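The next example applies a plain character, a bracket expression and several operators to the single character e. Note that ?(e) and *(e) can match the empty string, so they also match between characters.
str='An extended pattern expression'
print ${str//e/#}
# output: An #xt#nd#d patt#rn #xpr#ssion
print ${str//[^e]/#}
# output: ###e##e##e######e###e###e#####
print ${str//+(e)/#}
# output: An #xt#nd#d patt#rn #xpr#ssion
print ${str//-(e)/#}
# output: An extended pattern expression
# (-(e) is not an operator and str contains no "-", so nothing is replaced)
print ${str//?(e)/#}
# output: #A#n# #x#t#n#d#d# #p#a#t#t#r#n# #x#p#r#s#s#i#o#n#
print ${str//*(e)/#}
# output: #A#n# #x#t#n#d#d# #p#a#t#t#r#n# #x#p#r#s#s#i#o#n#
print ${str//!(e)/#}
# output: #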
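ksh93 also supports interval quantifiers, which specify how many times a pattern must occur:
{n}(pattern) | Match if found exactly n times, where n is a non-negative integer |
{n,m}(pattern) | Match if found between n and m times, where n and m are non-negative integers and n <= m; if n is omitted, 0 is used, and if m is omitted, there is no upper limit |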
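The following example applies these interval quantifiers to various strings. Each [[ ... ]] test prints its exit status only if the match fails; because all ten patterns match, the command prints nothing but an empty line.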
print $(
[[ aaaa == {4}(a) ]] || print $?
[[ aaaa == {,4}(a) ]] || print $?
[[ aaaa == {3,}(a) ]] || print $?
[[ aaaa == {2,4}(a) ]] || print $?
[[ abc == {1,4}(ab)c ]] || print $?
[[ abcabc == {,2}(abc) ]] || print $?
[[ abababcc == {1,4}(ab){1,2}(c) ]] || print $?
[[ abc == {1,4}(ab){1,2}(c) ]] || print $?
[[ abcdcdabcd == {3,6}(ab|cd) ]] || print $?
[[ abcdcdabcde == {5}(ab|cd)e ]] || print $?
)
You can add the '-' modifier after an operator to tell the shell to use non-greedy (shortest) matching, as shown in the table below.
?-(pattern) | Shortest match if found 0 or 1 times |
*-(pattern) | Shortest match if found 0 or more times |
+-(pattern) | Shortest match if found 1 or more times |
@-(pattern1|…) | Shortest match if any of the patterns found |
{n,m}-(pattern) | Shortest match if found between n and m times |
Alternatively, you can use the ~(-g) sub-pattern to turn off greedy matching for the rest of the pattern. The following examples show both methods.
# One or more repetitions of "anything ending in ab"
str="bcdabdcbabcd"
print "    Greedy: ${str/+(*ab)/_}"
print "Non-greedy: ${str/+-(*ab)/_}"

# The same comparisons using the - modifier and the ~(-g) sub-pattern
str="heleelloo hello"
print "    Greedy: ${str//he*l/_}"
# ~(-g) turns off greedy matching for the rest of the pattern
print "Non-greedy: ${str//~(-g)he*l/_}"
print "    Greedy: ${str//?(he*ll)/_}"
print "Non-greedy: ${str//~(-g)?(he*ll)/_}"
print "Non-greedy: ${str//?-(he*ll)/_}"
print "    Greedy: ${str//+(he*ll)/_}"
print "Non-greedy: ${str//+-(he*ll)/_}"
print "    Greedy: ${str//*(he*ll)/_}"
print "Non-greedy: ${str//*-(he*ll)/_}"
print "    Greedy: ${str//{1,2}(he*ll)/_}"
print "Non-greedy: ${str//~(-g){1,2}(he*ll)/_}"
More generally, each operator takes a pattern-list, that is, one or more patterns separated by |:
?(pattern-list) | Optionally matches any one of the patterns |
*(pattern-list) | Matches zero or more occurrences of the patterns. |
+(pattern-list) | Matches one or more occurrences of the patterns. |
{n}(pattern-list) | Matches exactly n occurrences of the patterns. |
{m,n}(pattern-list) | Matches m to n occurrences of the patterns. If m is omitted, 0 is used. If n is omitted at least m occurrences are matched. |
@(pattern-list) | Matches exactly one of the patterns. |
!(pattern-list) | Matches anything except one of the patterns. |
Again, by default, matching is greedy. Each pattern in the pattern-list attempts to match the longest string possible consistent with generating the longest overall match. If more than one match is possible, the match starting closest to the beginning of the string will be chosen. However, for each of the above compound patterns, a - can be inserted in front of the ( to specify that the shortest match to the specified pattern-list should be used.
Finer-grained control of extended pattern matching is possible using sub-patterns of the form ~(options:pattern-list), where :pattern-list is optional and options consists of one or more of the following option flags:
+ | Enable following options (default) |
- | Disable following options |
E | Remainder of the pattern uses ERE (egrep-like) pattern syntax |
F | Remainder of the pattern uses fgrep-like pattern syntax |
G | Remainder of the pattern uses BRE pattern syntax |
K | Remainder of the pattern uses ksh93 pattern syntax (default) |
i | Case-insensitive match |
g | Greedy match (default) |
l | Left-anchor the pattern |
r | Right-anchor the pattern |
If both options and :pattern-list are specified, then the specified options apply only to :pattern-list. Otherwise, the specified options remain in effect until disabled by a subsequent ~(...) sub-pattern or at the end of the sub-pattern containing ~(...).
ksh93 can also translate extended patterns into regular expressions and vice versa by means of two printf format specifiers: %R converts a shell pattern into an ERE, and %P converts an ERE into a shell pattern.
$ printf "%R\n" "*[!0-9]*"
[^0-9]
$ printf "%P\n" "([0-9]+\.){3}"
*{3}(+([0-9])\.)*
$
Enjoy!
P.S. All the examples included in this post were tested on ksh93t+ 12/10/2008.