KSH93 Extended I/O

The ksh93 exec command is a special overloaded built-in command that can be used to manipulate file descriptors or to replace the current shell with a new command.  Starting with version s of ksh93, which was released in 2006, ksh93 contains a number of very useful extensions to exec relating to file descriptors and input/output which most shell programmers are unaware of.

From the current ksh93 man page:
<#((expr))    Evaluate arithmetic expression expr and position file
              descriptor 0 to the resulting value bytes from the start
              of the file. The variables CUR and EOF evaluate to the
              current offset and end-of-file offset respectively when
              evaluating expr.

>#((offset))  The same as <# except applies to file descriptor 1.

<#pattern     Seeks forward to the beginning of the next line
              containing pattern.

<##pattern    The same as <# except that the portion of the file that
              is skipped is copied to standard output.
This post is about these four extensions.  It is assumed that you are familiar with ksh93 and the standard shell redirection facilities that are available in most modern shells, e.g.
exec 1>file          # redirect stdout to file
exec 2>&1 # redirect stderr to same
exec 1>&- # close file descriptor
If not, a quick Web search will return plenty of tutorials on the subject.

As with most Unix and GNU/Linux man pages, the ksh93 man page is concise to the point of being cryptic, so before we go any further I will attempt to describe what each extension can be used for.  The first extension is useful for determining the current offset within a file, or for returning the size of a file by returning the offset of the end of the file.  (You can also get the current file offset with offset=$(fd<#).)  The second extension can be used to position a file descriptor anywhere within a file.  The third extension seeks forward to the start of the next line containing a pattern that you specify.  The final extension outputs everything between the current offset and the start of the next line containing a pattern that you specify.

A number of things should be noted.  First, these extensions only work with text files.  Second, newlines count towards the offset from the start of the file.  Third, although it is not clear from the ksh93 man page, it appears that you must open a file for both read and write if you want to perform random access.  Why this is so is unclear to me, but you get odd results if you seek backwards in a file that was not opened this way, even if you are only reading the file.

Because exec is a special command, any failure in I/O redirection will cause the script that invokes it to terminate immediately.  This can be prevented by invoking exec from the command built-in.  Incidentally, redirect is a predefined alias for command exec.
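
A minimal sketch of that difference follows.  The path /nonexistent/dir/file is a hypothetical name chosen only because it cannot be opened, and the status variable is likewise just for illustration.

```shell
#!/bin/ksh93

# assume success until the redirection tells us otherwise
status=opened

# /nonexistent/dir/file is a hypothetical path that cannot be opened.
# Running exec through the command built-in turns the failed redirection
# into an ordinary non-zero exit status instead of ending the script.
command exec 3< /nonexistent/dir/file || status=failed

# the script gets this far, so the failure can be handled here
echo "open status: $status"
```

With a bare exec in place of command exec, the script would be expected to stop at the failed redirection and never reach the final echo.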

For our first example, we will demonstrate how to use these extensions to perform some basic I/O operations on a simple text file.
#!/bin/ksh93

TMP=file.$$

# create the temporary file used by example
cat <<EOT >$TMP
aaa
bbb
ccc
ddd
eee
fff
EOT

# open file descriptor 3 for read/write
command exec 3<> $TMP || exit 1

# check file descriptor 3 position
print
print "At offset: $(3<#)"
if (($(3<#) != 0))
then
    print "Not at offset 0"
    exit 1
fi

# read in the first line and print it
read -u3
print $REPLY
print "At offset $(3<#) after reading line"
print

# search forward for string "ddd"
3<#"ddd"
print "At offset $(3<#) after search forward for 'ddd'"
read -u3
print $REPLY
print

# seek to absolute offset 8, check that we got there and, if so, read line
if (( $(3<# ((8))) != 8))
then
    print "Not at offset 8"
    exit 1
fi
print "At offset $(3<#) after specifying absolute offset of 8"
read -u3
print $REPLY
print

# go to end of file. We know by inspection that this is at offset 24, so check.
if (( $(3<#((EOF))) != 4*6 ))
then
    print "Not at EOF"
    exit 1
fi
print "At offset $(3<#) after specifying EOF"
print

# back up one line, i.e. 4 characters
3<#((CUR - 4))
print "At offset $(3<#) after backing up 4 characters"
read -u3
print $REPLY
print

redirect 3<&- || echo 'cannot close FD 3'

rm $TMP
The following is the output from this example
$ ./example1

At offset: 0
aaa
At offset 4 after reading line

At offset 12 after search forward for 'ddd'
ddd

At offset 8 after specifying absolute offset of 8
ccc

At offset 24 after specifying EOF

At offset 20 after backing up 4 characters
fff
$
Our next example selectively outputs the lines of a file that lie between two search strings, 'test' and '**', and displays these lines as a single line.
#!/bin/ksh93

TMP=file1.$$
OUT=file2.$$

# create the temporary file used by example
cat <<'EOT' >$TMP
aaa
test
bbb
ccc
ddd
eee
fff
**
ggg
EOT

# save stdout on file descriptor 4 so that it can be restored later
exec 4>&1

exec 3< $TMP > $OUT

# seek to start of line containing "test"
3<#'test'
# advance 5 characters i.e. the start of next line
3<# ((CUR + 5))
# output all lines until EOF or "**" found
3<##'\*\**'

exec 3<&-

# swap back stdout
exec 1>&-
exec 1>&4
exec 4>&-

rm $TMP

# hack to output as a single line
out=$(<$OUT)
echo $out

rm $OUT
The following is the output when we run this shell script
$ ./example2
bbb ccc ddd eee fff
$
For our next example, consider the following fictitious file which contains the results of a series of pings to local systems.
Wed Nov  4 15:25:14 EST 2008
PING Host11 (10.0.201.51) 56(84) bytes of data.
--- Host11 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4010ms
rtt min/avg/max/mdev = 7.400/41.641/59.604/17.946 ms
.
Wed Nov 4 15:25:18 EST 2008
PING Host12 (10.0.202.51) 56(84) bytes of data.
--- Host12 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4016ms
rtt min/avg/max/mdev = 43.804/55.694/67.728/9.359 ms
.
Wed Nov 4 15:25:22 EST 2008
PING Host13 (10.0.205.51) 56(84) bytes of data.
--- Host13 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4007ms
rtt min/avg/max/mdev = 40.237/56.283/68.973/10.433 ms
.
Wed Nov 4 15:25:26 EST 2008
PING Host14 (10.0.201.52) 56(84) bytes of data.
--- Host14 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4007ms
rtt min/avg/max/mdev = 36.053/65.964/96.782/20.642 ms
The requirement is to create a table showing min, loss and mdev values for each host pinged.  Here is our shell script
#!/bin/ksh93

redirect 3< file

# get and store value of EOF
eof=$(3<#((EOF)))

# check that we are at start of file i.e. offset 0
if (( $(3<#((0))) != 0 ))
then
    print "ERROR: not at offset 0"
    exit 1
fi

# loop through the file and build one line of output per host
while (( $(3<#((CUR))) < $eof ))
do
    3<#'PING*'
    read -A -u3 arr1
    3<#'*packet loss*'
    IFS=" %" read -A -u3 arr2
    IFS=" /" read -A -u3 arr3
    printf "%s %6.3f %3d %6.3f\n" ${arr1[1]} ${arr3[6]} ${arr2[5]} ${arr3[9]}
done

redirect 3<&-
Here is the output after this script is run against the above file
$ ./example3
Host11  7.400   0 17.946
Host12 43.804   0  9.359
Host13 40.237   0 10.433
Host14 36.053   0 20.642
Our final example will demonstrate how to replace a line and append additional lines to an existing file.  Note also that we specify the use of a file descriptor > 10 by means of the extended file descriptor {variable} syntax.
#!/bin/ksh93
TMP=file.$$

# create temporary file
cat <<EOT >$TMP
aaa
bbb
ccc
ddd
eee
fff
EOT

# open FD > 10 for read/write
command exec {n}<> $TMP || exit 1
print "File descriptor assigned is: ${n}"
print

# search forward for string "ddd"
{n}<#"ddd"

# replace this string with "DDD"
print -u${n} -f "%.3c\n" D

# go to end of file
redirect {n}>#((EOF))

# add 2 lines of text
print -u${n} -f "%.6c\n" G
print -u${n} -f "%.6c\n" H

redirect {n}>&- || echo 'cannot close FD'

cat $TMP

rm $TMP
Here is the output for this script.
$ ./example4
File descriptor assigned is: 11

aaa
bbb
ccc
DDD
eee
fff
GGGGGG
HHHHHH
Note that ksh93 does not support syntax of the form exec {n}<>&- since <&- and >&- both close the file descriptor n.  If you do the following in ksh93
exec {n}<&-
exec {n}>&-
you do not get an error.  On the other hand, if you do the same in zsh an error message is issued saying file descriptor n is already closed.
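
The double close described above can be tried with a minimal sketch like the following.  The use of /dev/null as the file to open and the variable name rc are my own choices for illustration; the descriptor number the shell picks may differ from system to system.

```shell
#!/bin/ksh93

# let the shell choose a free descriptor and store its number in n
exec {n}</dev/null
echo "File descriptor assigned is: ${n}"

# first close, using the input form
exec {n}<&-

# second close, using the output form; run through the command built-in
# so that a failure, if there were one, would show up in $? rather than
# terminating the script
command exec {n}>&-
rc=$?

echo "exit status of second close: $rc"
```

If the behaviour is as described above, the second close reports success rather than an error.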

As you can see from these examples, the extended capabilities of exec in ksh93 can be used to do file manipulation and transformation operations that were previously only possible using utilities such as sed, grep and awk.  Enjoy!
