Scripting HAL

Recent releases of Fedora and other GNU/Linux distributions include a Hardware Abstraction Layer (HAL) which is used to support device plug-and-play capabilities.  In this post I will show you how your shell scripts can use HAL to retrieve device and system information.

The term HAL is overloaded as it is used to refer both to a specification and to the actual software which implements that specification.  From an application developer's viewpoint, HAL is a way to enumerate the capabilities and features of hardware attached to a system and to receive notification when something about the hardware changes.

First, a very quick overview of HAL.  Each item of physical hardware in a computer is regarded as a device object which is identified by a Unique Device Identifier (UDI).  Associated with each device object is a variable set of well-defined, typed key-value pairs (metadata) called device properties, which describe what the device object represents and what features it has.  Some device properties are derived from the actual physical hardware, some are merged from XML-formatted files known as Device Information Files, and some are derived from the actual device configuration.  Mandatory device properties are defined in the HAL specification.
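
For example, the root device object for the system always has the well-known UDI /org/freedesktop/Hal/devices/computer.  The following two commands are a minimal sketch of querying a couple of its properties; the key names are taken from the HAL specification as I recall it, so verify them against lshal output on your own system.
hal-get-property --udi /org/freedesktop/Hal/devices/computer --key info.product
hal-get-property --udi /org/freedesktop/Hal/devices/computer --key system.kernel.version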



A HAL daemon is used to maintain and update the list of device objects forming a hardware configuration and is notified when devices are added or removed.  HAL also provides callbacks so that the operating system and applications can react to hardware configuration changes in order to maintain system policy.  An example of such a callback and policy is that when you plug in a USB memory stick, HAL automatically creates a mount point and mounts the device.

For a good backgrounder on HAL, see this article Making Hardware Just Work by Havoc Pennington.  For more detailed information, you should read the HAL Specification and the HAL project page at freedesktop.org.

A number of command line utilities are provided for accessing and manipulating device objects.

hal-disable-polling       disable polling on drives with removable media
hal-find-by-capability    find device objects by capability matching
hal-find-by-property      find device objects by property matching
hal-get-property          get a property from a device object
hal-is-caller-locked-out  determine if caller is locked out
hal-is-caller-privileged  determine if caller is privileged
hal-lock                  lock an interface
hal-set-property          set a property on a device object
lshal                     list device objects and properties

For the purposes of this post, we are only interested in a small number of the above utilities, i.e. those utilities which locate and return information about device objects.

Here is a simple use of the hal-find-by-property utility to retrieve a list of network interfaces.
$ hal-find-by-property --key linux.subsystem --string net
/org/freedesktop/Hal/devices/net_3a_67_56_92_fb_73
/org/freedesktop/Hal/devices/net_computer_loopback
/org/freedesktop/Hal/devices/net_00_1c_c0_6d_8a_f7
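Each returned UDI can then be fed back into hal-get-property.  For example, the following sketch (which assumes the net.interface and net.address properties defined for networking device objects in the HAL specification) prints the interface name and MAC address of each interface.
for udi in $(hal-find-by-property --key linux.subsystem --string net)
do
    interface=$(hal-get-property --udi $udi --key net.interface)
    address=$(hal-get-property --udi $udi --key net.address)
    echo "$interface $address"
done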
The HAL specification defines a system namespace which contains information about a system and its currently running kernel.  The following script retrieves certain information from the system namespace.
udi=$(hal-find-by-property --key info.product --string Computer)
printf "Board Manufacturer: $(hal-get-property --udi $udi --key system.board.vendor)\n"
printf " Model: $(hal-get-property --udi $udi --key system.board.product)\n"
printf " Version: $(hal-get-property --udi $udi --key system.board.version)\n"
printf " Serial No.: $(hal-get-property --udi $udi --key system.board.serial)\n"
printf " Release Date: $(hal-get-property --udi $udi --key system.firmware.release_date)\n"
printf " Bios Version: $(hal-get-property --udi $udi --key system.firmware.version)\n"
printf " Motherboard UUID: $(hal-get-property --udi $udi --key system.hardware.uuid)\n"
Here is the output from the computer which I used to write this post.
Board Manufacturer: Intel Corporation
Model: DX48BT2
Version: AAE26191-205
Serial No.: BQBQ8280004G
Release Date: 08/07/2008
Bios Version: BTX3810J.86A.1814.2008.0807.2334
Motherboard UUID: 2237F666-4F75-11DD-894F-0007E9747DB3
The next example retrieves certain information about all the storage devices on a system using the volume capability.  The volume namespace is for device objects that represent storage devices with a filesystem that are mountable.  Such device objects have the capability volume.
for udi in $(/usr/bin/hal-find-by-capability --capability volume)
do
    mount=$(hal-get-property --udi $udi --key volume.mount_point)
    device=$(hal-get-property --udi $udi --key block.device)
    label=$(hal-get-property --udi $udi --key volume.label)
    fstype=$(hal-get-property --udi $udi --key volume.fstype)
    echo "$device $mount $fstype $label"
done
Here is sample output from this script.
/dev/sda1  /boot      ext3  /boot
/dev/sdb2 /abac ext3 /abac
/dev/sdb1 /home/fpm ext3 fpm
/dev/sda2 LVM2_member
/dev/sdc1 LVM2_member
/dev/sdc2 LVM2_member
You can also drill down and return information about specific devices such as an optical drive, i.e. device objects whose storage.drive_type property is cdrom.
for i in $(hal-find-by-property --key storage.drive_type --string cdrom)
do
    printf "Manufacturer: $(hal-get-property --udi $i --key storage.vendor)\n"
    printf "       Model: $(hal-get-property --udi $i --key storage.model)\n"
    printf "         Bus: $(hal-get-property --udi $i --key storage.bus)\n"
    printf "      Device: $(hal-get-property --udi $i --key block.device)\n"
done
Here is the output for the computer which I am using to write this post.  It contains a single DVD RW drive.
Manufacturer: HP
       Model: DVD Writer 1070d
         Bus: pci
      Device: /dev/sr0
In many cases there are multiple ways of retrieving the information you want.  For example, here is another way of retrieving the same information, together with details of the volume label and mount point if there is a disk in the drive, using the storage.cdrom capability.
for udi in $(/usr/bin/hal-find-by-capability --capability storage.cdrom)
do
    device=$(hal-get-property --udi $udi --key block.device)
    vendor=$(hal-get-property --udi $udi --key storage.vendor)
    model=$(hal-get-property --udi $udi --key storage.model)
    if [[ $(hal-get-property --udi $udi --key storage.removable.media_available) = true ]]
    then
        parent_udi=$(hal-find-by-property --key block.storage_device --string $udi)
        mount=$(hal-get-property --udi $parent_udi --key volume.mount_point)
        label=$(hal-get-property --udi $parent_udi --key volume.label)
    fi
    printf "$vendor $model $device $mount $label\n"
done
Here is the output from this script when there is a DVD in the drive followed by when there is not a DVD in the drive.
$ ./findodrive
HP DVD Writer 1070d /dev/sr0 /media/Kota_Kinabalu1 Kota_Kinabalu1
$ ./findodrive
HP DVD Writer 1070d /dev/sr0
Here is a script to list details about all removable USB storage drives that are currently installed on your system.
#!/bin/ksh93
#
# list attached USB storage devices
#

for udi in $(/usr/bin/hal-find-by-capability --capability storage)
do
    device=$(hal-get-property --udi $udi --key block.device)
    vendor=$(hal-get-property --udi $udi --key storage.vendor)
    model=$(hal-get-property --udi $udi --key storage.model)
    if [[ $(hal-get-property --udi $udi --key storage.bus) = "usb" ]]
    then
        parent_udi=$(hal-find-by-property --key block.storage_device --string $udi)
        mount=$(hal-get-property --udi $parent_udi --key volume.mount_point)
        label=$(hal-get-property --udi $parent_udi --key volume.label)
        media_size=$(hal-get-property --udi $udi --key storage.removable.media_size)
        size=$(( ceil(media_size/(1000*1000*1000)) ))
        printf "$vendor $model $device $mount $label ${size}GB\n"
    fi
done
And here is the output when two USB thumb drives are present.
$ ./listusb
Kingston DataTraveler 2.0 /dev/sdd /media/KINGSTON KINGSTON 1GB
Kingston DataTraveler 2.0 /dev/sdc /media/USB-4GB USB-4GB 4GB
$
Well that is about all for now.  You should have gained sufficient information from this post to go away and experiment on your own system. 

Remember, however, that the HAL specification and its implementation in GNU/Linux distributions are still somewhat in a state of flux.  The above examples worked on Fedora 10 as of the date of this post.  If an example does not work on your system, I recommend that you first check the output of the lshal utility to see whether the device object and associated device properties are instantiated on your system.  You can use udevadm monitor to see what kernel events are being pushed out to user space; HAL monitors these events.  You can also use lshal --monitor to see what events HAL emits to user space.
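
If you want a script to degrade gracefully on systems where a given device object or property is missing, a small wrapper along the following lines can help.  This is a sketch of my own (the function name is not part of HAL); it simply discards the error message and returns an empty string when the lookup fails.
function hal_prop
{
    hal-get-property --udi "$1" --key "$2" 2>/dev/null
}

udi=$(hal-find-by-property --key info.product --string Computer)
vendor=$(hal_prop "$udi" system.board.vendor)
[[ -n $vendor ]] && printf "Board Manufacturer: %s\n" "$vendor"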
 

KSH93 Extended Patterns

According to David Korn, a shell is primarily a string processing language.  Pattern matching is an important component of any such language and indeed Korn Shell 93 (ksh93) has excellent support for extended patterns as well as regular expressions.  Extended patterns can be thought of as a class or type of extended regular expression.  Both the bash and zsh shells have something similar but not as comprehensive.  However, as usual, extended patterns are documented quite tersely in the ksh93 man page.

The purpose of this post is to explain, with some examples, how to use the power of extended patterns in your ksh93 scripts.  It is assumed that you are reasonably familiar with basic regular expressions (BREs) as implemented in sed, grep or awk.  If you need an introductory tutorial on regular expressions, here is one at IBM developerWorks.

The following table shows the basic pattern matching operators in ksh93.

?(pattern)        Match if found 0 or 1 times
*(pattern)        Match if found 0 or more times
+(pattern)        Match if found 1 or more times
@(pattern1|...)   Match if any of the patterns found
!(pattern)        Match if no pattern found

Note that an operator must precede a pattern in ksh93 whereas in egrep, sed and awk the operator is placed after the pattern.
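
To make the difference concrete, here is the same digits-only match written both ways (a trivial example of my own).
# ksh93 extended pattern: the operator comes before the pattern
[[ 20090112 == +([0-9]) ]] && print "ksh93 pattern matched"

# equivalent ERE, where the quantifier follows the pattern
print 20090112 | grep -E -q '^[0-9]+$' && print "egrep ERE matched"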

Here is an example of how to use the above operators to modify the contents of the string str.
str="Joe Mike and Dave are all good friends"

print ${str//a?(re)/_}
# output: Joe Mike _nd D_ve _ _ll good friends

print ${str//g*(o)/_}
# output: Joe Mike and Dave are all _d friends

print ${str//+(o)/_}
# output: J_ Mike and Dave are all g_d friends

print ${str//@(Joe|Mike|Dave)/_}
# output: _ _ and _ are all good friends

print ${str//@(Joe|Mike|g*(o))/_}
# output: _ _ and Dave are all _d friends

print ${str//!(Joe)/_}
# output: _

print ${str//!(Joe|Mike|Dave)/_}
# output: _
In the above example, I have shown the expected output as a comment below each print statement.  Here is another example which should further clarify your understanding of these pattern operators.  Note particularly the output of the last print statement.
str='An extended pattern expression'

print ${str//e/#}
print ${str//[^e]/#}
print ${str//+(e)/#}
print ${str//-(e)/#}
print ${str//?(e)/#}
print ${str//*(e)/#}
print ${str//!(e)/#}
which produces the following output.
An #xt#nd#d patt#rn #xpr#ssion
###e##e##e######e###e###e######
An #xt#nd#d patt#rn #xpr#ssion
An extended pattern expression
#A#n# #x#t#n#d#d# #p#a#t#t#r#n# #x#p#r#s#s#i#o#n#
#A#n# #x#t#n#d#d# #p#a#t#t#r#n# #x#p#r#s#s#i#o#n#
#
The following table shows a number of pattern matching interval quantifiers.

{n}(pattern)      Match if found exactly n times where n is a non-negative integer
{n,m}(pattern)    Match if found between n and m times where n and m are non-negative integers and n <= m

Here is an example of how to use the above interval quantifiers to match various strings.
print $(
[[ aaaa == {4}(a) ]] || print $?
[[ aaaa == {,4}(a) ]] || print $?
[[ aaaa == {3,}(a) ]] || print $?
[[ aaaa == {2,4}(a) ]] || print $?
[[ abc == {1,4}(ab)c ]] || print $?
[[ abcabc == {,2}(abc) ]] || print $?
[[ abababcc == {1,4}(ab){1,2}(c) ]] || print $?
[[ abc == {1,4}(ab){1,2}(c) ]] || print $?
[[ abcdcdabcd == {3,6}(ab|cd) ]] || print $?
[[ abcdcdabcde == {5}(ab|cd)e ]] || print $?
)
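Each of the above tests should succeed, so nothing but a blank line is printed.  As a more practical sketch of my own (deliberately loose, since it does not range-check each octet), an interval quantifier makes short work of matching a dotted-quad IPv4 address.
addr=192.168.10.1
if [[ $addr == {3}(+([0-9])\.)+([0-9]) ]]
then
    print "$addr looks like a dotted quad"
fi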
By default an extended pattern attempts to match the longest possible string consistent with generating the longest overall match.  This is known as a greedy or maximal match.  A non-greedy (or minimal) match is one that matches the shortest possible string.  Perl was the first scripting language to popularize non-greedy matching.  By the way, ksh93 and zsh are the only shells that support non-greedy matching.

You can use the '-' qualifier to indicate to the shell that you want to use non-greedy matching, as shown in the table below.

?-(pattern)        Shortest match if found 0 or 1 times
*-(pattern)        Shortest match if found 0 or more times
+-(pattern)        Shortest match if found 1 or more times
@-(pattern1|...)   Shortest match if any of the patterns found
{n,m}-(pattern)    Shortest match if found between n and m times

Alternatively, you can use the ~(-g) sub-pattern to indicate to ksh93 that you want to use non-greedy matching.  The following examples show both methods.
str="bcdabdcbabcd"

print " Greedy: ${str/+(*ab)/_}"
print "Non-greedy: ${str/+-(*ab)/_}"

str="heleelloo hello"

print " Greedy: ${str//he*l/_}"
print "Non-greedy: ${str//~(-g)he*l/_}"
print " Greedy: ${str//?(he*ll)/_}"
print "Non-greedy: ${str//~(-g)?(he*ll)/_}"
print "Non-greedy: ${str//?-(he*ll)/_}"
print " Greedy: ${str//+(he*ll)/_}"
print "Non-greedy: ${str//+-(he*ll)/_}"
print " Greedy: ${str//*(he*ll)/_}"
print "Non-greedy: ${str//*-(he*ll)/_}"
print " Greedy: ${str//{1,2}(he*ll)/_}"
print "Non-greedy: ${str//~(-g){1,2}(he*l)/_}"
A pattern-list is a list of one or more patterns separated from each other by either a & or a |.  A & (denoting logical AND) means that all patterns must be matched whereas | (denoting logical OR) means that only one pattern need be matched.  Composite patterns can also be created as shown below.

?(pattern-list)        Optionally matches any one of the patterns
*(pattern-list)        Matches zero or more occurrences of the patterns
+(pattern-list)        Matches one or more occurrences of the patterns
{n}(pattern-list)      Matches exactly n occurrences of the patterns
{m,n}(pattern-list)    Matches m to n occurrences of the patterns.  If m is omitted, 0 is used.  If n is omitted, at least m occurrences are matched
@(pattern-list)        Matches exactly one of the patterns
!(pattern-list)        Matches anything except one of the patterns
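
The & separator is less familiar than |, so here is a short sketch of it (my own example; I believe it behaves as described, but it is worth verifying on your own build).
# match words that contain both an 'a' and a 'b', in either order
for word in alphabet batch cab aaa bbb
do
    [[ $word == @(*a*&*b*) ]] && print "$word contains both a and b"
done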

Again, by default, matching is greedy.  Each pattern in the pattern-list attempts to match the longest string possible consistent with generating the longest overall match.  If more than one match is possible, the match starting closest to the beginning of the string will be chosen.  However, for each of the above compound patterns, a - can be inserted in front of the ( to specify that the shortest match to the specified pattern-list should be used.

Finer grained control of extended pattern matching is possible using sub-patterns of the form ~(options:pattern-list) where :pattern-list is optional and options consists of one or more of the following option flags:

+    Enable the following options (default)
-    Disable the following options
E    Remainder of the pattern uses ERE pattern syntax
F    Remainder of the pattern uses fgrep-like pattern syntax
G    Remainder of the pattern uses BRE pattern syntax
K    Remainder of the pattern uses ksh93 pattern syntax (default)
i    Case insensitive match
g    Greedy match (default)
l    Left anchor the pattern
r    Right anchor the pattern

If both options and :pattern-list are specified, then the specified options apply only to :pattern-list.  Otherwise, the specified options remain in effect until disabled by a subsequent ~(...) sub-pattern or at the end of the sub-pattern containing ~(...).
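
Two of the more immediately useful flags are i and E; the following lines are small examples of my own showing each.
# case-insensitive match
[[ MakeFile == ~(i)makefile ]] && print "matched regardless of case"

# switch to ERE syntax for the remainder of the pattern
[[ chapter12 == ~(E)[a-z]+[0-9]+ ]] && print "matched using ERE syntax"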

ksh93 provides a way to translate extended patterns into regular expressions and vice-versa by means of two printf options.
$ printf "%R\n" "*[!0-9]*"
[^0-9]
$ printf "%P\n" "([0-9]+\.){3}"
*{3}(+([0-9])\.)*
$
I hope that I have inspired you to go away and experiment on your own with some of the more advanced features of extended patterns in ksh93.  Once you master the syntax, you can significantly reduce the need for your scripts to invoke external utilities such as sed or awk simply to parse text strings.
 
Enjoy!
 
P.S.  All the examples included in this post were tested on ksh93t+ 12/10/2008.
 

JavaScript E4X

In previous posts, I discussed the SpiderMonkey command line shell js and how to add support to it to enable full access (read, write, create, copy, delete, etc.) to the local filesystem via the File object and the NSPR library.

While rummaging around in the source code and documentation for js, I found that js partially supports the E4X XML extension via a user-configurable option.

This post looks at what it takes to load an XML document into js from your local filesystem, process it and write out the resulting document to your local filesystem using File objects and the E4X extension.

The ECMAScript for XML (E4X) (ECMA-357) specification adds native support for XML objects and XMLList objects to the JavaScript programming language.  This standard was first published in 2004 and was based on XML extensions provided in the BEA (now Oracle) Weblogic Workshop product.  These extensions were designed by Terry Lucas and John Schneider who led the ECMAScript for XML (E4X) initiative.
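
To give a flavor of this native support, here is a minimal sketch of E4X syntax (my own fragment; it assumes a JavaScript engine with E4X enabled).
var album = <album><title>Greatest Hits</title><year>2008</year></album>;

print(album.title);           // element access uses the familiar dot operator
album.year = 2009;            // assignment works the same way
print(album.toXMLString());   // serialize the XML object back to markup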

The basic idea behind E4X was that declarative languages such as XSL and XPath are too complex for the average programmer to quickly learn, and that therefore a simpler way of accessing and manipulating XML documents was needed.  Personally, I do not agree with that assertion.

As an aside, currently Schneider is founder and CEO at AgileDelta which developed the Efficient XML binary format specification which I plan to write about in a future post.  A W3C working group is currently developing the EXI specification which is based on the AgileDelta specification.

TO BE CONTINUED

JavaScript File Object

As you are probably aware JavaScript engines such as SpiderMonkey typically do not allow access to the local filesystem for reasons of security.  To enable developers to test the scripts from a command line, js includes the load() function which enables you to load one or more JavaScript scripts into the SpiderMonkey engine.  However this is not sufficient for our purposes as no means is provided to write to the filesystem.  Looking more closely at the source code, I noticed support for File objects.  This support is not enabled by default however.  It is not sufficient to simply recompile SpiderMonkey with this option enabled; you must also download and build the Netscape Portable Runtime (NSPR) library.  This library provides a platform-neutral API for system level and libc-like functions, and is used by a number of Mozilla projects and other third party software developers.  The current release is 4.7.3 and you can download it here.

There are some gotchas to building SpiderMonkey with NSPR.  First of all, you need to successfully build NSPR.  The source code tarball for NSPR comes with the standard GNU autoconf tooling.  If you are on a 64-bit system, you need to execute configure with the --enable-64bit option; otherwise the build will quickly fail.  You should then test the build by going to the test subdirectory, building the testsuite and executing it.  You also need to modify SpiderMonkey's Makefile.ref (I am assuming you are building SpiderMonkey 1.7 and not an earlier release) to include libnspr and the NSPR headers.  Two compile time defines are needed.  You can either define JS_HAS_FILE_OBJECT and JS_THREADSAFE in Makefile.ref or pass them as command line arguments to make.  After that, you should be able to successfully build SpiderMonkey with native File object support.
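
For orientation, here is roughly the sequence of commands involved.  Treat it as a sketch only: the source paths are placeholders and the exact make variable handling may differ on your version, so check Makefile.ref before relying on it.
# build NSPR first; on a 64-bit system configure must be told so
cd /path/to/nspr-4.7.3          # directory containing NSPR's configure script
./configure --enable-64bit
make

# then build SpiderMonkey with the two defines passed to make; the NSPR
# header and library locations still need to be added to Makefile.ref
cd /path/to/js/src
make -f Makefile.ref JS_THREADSAFE=1 JS_HAS_FILE_OBJECT=1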

Now that we have js built with support for File objects, what can we do with it?  Well, I guess we should start with the expected Hello World script.
js> File.output.writeln("Hello World");     
Hello World
true
js> File.output.writeln("Hello, world"); "OK"
Hello, world
OK
js> File.output.writeln("Hello, world"); ""
Hello, world

js>
Notice that true is output unless you append something else, as shown above.  Here is another short example which demonstrates how to list the properties of a File object instance representing the current directory.
js> dir = new File('.');
/home/fpm/js/.
js> for ( i in dir ) print(i);
length
parent
path
name
isDirectory
isFile
exists
canRead
canWrite
canAppend
canReplace
isOpen
type
mode
creationTime
lastModified
size
hasRandomAccess
hasAutoFlush
position
isNative
js>print(dir.path);
/home/fpm/js/.
js>print(dir.size)
4096
js>
The next example shows how to list some information about files in the current directory.
js> dir = new File('.');
/home/fpm/js/.
js> files = dir.list(); 'OK'
OK
js> for (i in files ) print (files[i].name + ' ' + files[i].size + ' ' + files[i].creationTime);
music.xml 1081 Tue Jan 06 2009 17:37:14 GMT-0500 (EST)
xml.js 259 Tue Dec 30 2008 18:23:22 GMT-0500 (EST)
xml1.js 699 Tue Jan 06 2009 23:33:26 GMT-0500 (EST)
2.xml 96 Tue Jan 06 2009 22:41:37 GMT-0500 (EST)
3.xml 127 Wed Jan 07 2009 00:02:18 GMT-0500 (EST)
multiply.js 249 Tue Dec 30 2008 17:49:02 GMT-0500 (EST)
helloworld.js 88 Tue Dec 30 2008 17:12:50 GMT-0500 (EST)
hw.js 124 Thu Jan 01 2009 00:24:38 GMT-0500 (EST)
xml2.js 502 Wed Jan 07 2009 00:02:17 GMT-0500 (EST)
regex.js 143 Tue Dec 30 2008 18:10:55 GMT-0500 (EST)
1.xml 15 Tue Jan 06 2009 20:35:27 GMT-0500 (EST)
js>
In the above example, list is an instance method of the File object.  Using other File object instance methods, you can read data from a file and write data to a file.

In the following example, the script reads lines from file.in and writes them out to another file, file.out, prepending each line with the corresponding line number.
#!/bin/js

var filein = new File("file.in");
filein.open("read", "text");

var fileout = new File('file.out');
fileout.open("write,create", "text");

var n = 1;
while (data = filein.readln())
    fileout.writeln(n++ + ' ' + data);

filein.close();
fileout.close();
The File object provides two ways to access the data inside a file: text-oriented access, based on characters, and binary-oriented access, based on bytes.  In text mode, the maximum line length is 256 and the following encodings are supported: ASCII (text), UTF-8 and UCS-2.  The File object has a number of instance methods including read, readln, readAll, write, writeln and writeAll.  If you just want to copy a file in its entirety, you can use the copyTo method.
var file = new File("file.in");
file.open("read", "text");

file.copyTo("file.out");

file.close();
Similar instance methods which work on files include remove, to delete a file or a directory, and renameTo, to rename a file.
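
Here is a minimal sketch of those two methods (my own fragment; given the warning quoted below, try it on throwaway files first).
var tmp = new File("scratch.txt");
if (tmp.exists) {
    tmp.renameTo("scratch.bak");   // rename the file
}

var bak = new File("scratch.bak");
if (bak.exists) {
    bak.remove();                  // delete it again
}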

Now that I have shown you how to enable js to access the local filesystem, I would be remiss if I did not point out to you the following warning which is in the JavaScript File Object Proposal.

Flaming disclaimer:   Do not play with the File object if
you are not prepared to have your hard drive erased,
smashed, and broken into little bits! It mostly works
right now, but no guarantees.
So far I have had no problem using the File object on my Fedora 10 64-bit platform but one never knows!  Proceed with caution.

KSH93 Stat Builtin

One of the builtin commands that is missing in ksh93, in my humble opinion, is a builtin similar to stat(1) which would return information about a file.  Here is my initial implementation of a stat builtin.  The output is a compound variable whose subvariables contain the contents of the various fields of the stat(2) structure.  If you are unfamiliar with compound variables, see my previous post for an elucidation.
/*
** FPMurphy 2009-01-03
**
** License: Common Public License Version 1.0
**
*/

#pragma prototyped

#include "defs.h"
#include "builtins.h"
#include "path.h"
#include <tm.h>

/* macro to create subvariables */
#define CREATE_CVE(X,Y,Z) \
strcpy(b,(X)); \
np = nv_open(buf, shp->var_tree, NV_NOASSIGN|NV_VARNAME); \
nv_putval(np, (char*)(Y), (Z) ); \
nv_close(np)

static char strperms_buf[30];

static const
char sh_optstat[] =
"[-?\n@(#)$Id: stat 2009-01-03 $\n]"
"[-author?Finnbarr P. Murphy fpm at hotmail.com ]"
"[-license?http://www.opensource.org/licenses/cpl1.0.txt]"
"[+NAME? stat - get file status]"
"[+DESCRIPTION?\bstat\b creates the compound variable \avar\a corresponding"
" to the file given by the pathname \afile\a. The elements of \avar\a"
" are the names of fields in the \astat\a structure with the \bst_\b"
" prefix removed, together with the basename of \afile\a.]"
"\n"
"\nvar file\n"
"\n"
"[+EXIT STATUS?]{"
"[+0?Success.]"
"[+>0?An error occurred.]"
"}"
"[+SEE ALSO?\bstat\b(1),\bstat\b(2)]"
;

/* stringify the permission bits */
static char *
strperms(char * p, mode_t mode)
{
    char ftype = '?';

    if (S_ISBLK(mode))  ftype = 'b';
    if (S_ISCHR(mode))  ftype = 'c';
    if (S_ISDIR(mode))  ftype = 'd';
    if (S_ISFIFO(mode)) ftype = '|';
    if (S_ISLNK(mode))  ftype = 'l';
    if (S_ISREG(mode))  ftype = '-';

    sfsprintf(p, 30, "\\0%010lo %c%c%c%c%c%c%c%c%c%c %c%c%c\0",
        mode, ftype,
        mode & S_IRUSR ? 'r' : '-',
        mode & S_IWUSR ? 'w' : '-',
        mode & S_IXUSR ? 'x' : '-',
        mode & S_IRGRP ? 'r' : '-',
        mode & S_IWGRP ? 'w' : '-',
        mode & S_IXGRP ? 'x' : '-',
        mode & S_IROTH ? 'r' : '-',
        mode & S_IWOTH ? 'w' : '-',
        mode & S_IXOTH ? 'x' : '-',
        mode & S_ISUID ? 'U' : '-',
        mode & S_ISGID ? 'G' : '-',
        mode & S_ISVTX ? 'S' : '-');

    return(p);
}


int
b_stat(int argc, char *argv[], void *extra)
{
    register Shell_t *shp = ((Shbltin_t*)extra)->shp;
    register Namval_t *np;
    register int n;
    struct stat statb;
    char buf[100];
    char *b;

    while (n = optget(argv, sh_optstat)) switch (n) {
        case ':':
            errormsg(SH_DICT, 2, "%s", opt_info.arg);
            break;
        case '?':
            errormsg(SH_DICT, ERROR_usage(2), "%s", opt_info.arg);
            break;
    }

    argc -= opt_info.index;
    argv += opt_info.index;
    if (argc != 2)
        errormsg(SH_DICT, ERROR_usage(2), optusage((char*)0));

    /* stat the file */
    if (stat(argv[1], &statb) < 0)
        errormsg(SH_DICT, ERROR_system(1), "%s: stat failed", argv[1]);

    /* copy the variable name into buf and leave b pointing at its end
       so that CREATE_CVE can append each subvariable name */
    strcpy(buf, argv[0]);
    b = buf;
    while (*b) b++;

    /* create compound variable */
    np = nv_open(buf, shp->var_tree, NV_NOASSIGN|NV_VARNAME|NV_ARRAY);
    if (!nv_isnull(np))
        nv_unset(np);
    nv_setvtree(np);
    nv_close(np);

    /* create compound variable elements */
    CREATE_CVE(".name",  path_basename(argv[1]), NV_RDONLY);
    CREATE_CVE(".atime", &statb.st_atime, NV_RDONLY|NV_INTEGER);
    CREATE_CVE(".ctime", &statb.st_ctime, NV_RDONLY|NV_INTEGER);
    CREATE_CVE(".mtime", &statb.st_mtime, NV_RDONLY|NV_INTEGER);
    CREATE_CVE(".uid",   &statb.st_uid, NV_RDONLY|NV_INTEGER);
    CREATE_CVE(".gid",   &statb.st_gid, NV_RDONLY|NV_INTEGER);
    CREATE_CVE(".size",  &statb.st_size, NV_RDONLY|NV_INTEGER|NV_LONG);
    CREATE_CVE(".dev",   &statb.st_dev, NV_RDONLY|NV_INTEGER);
    CREATE_CVE(".ino",   &statb.st_ino, NV_RDONLY|NV_INTEGER|NV_LONG);
    CREATE_CVE(".nlink", &statb.st_nlink, NV_RDONLY|NV_INTEGER);
    CREATE_CVE(".mode",  strperms(strperms_buf, statb.st_mode), NV_RDONLY);

    return(0);
}
This code was tested using ksh93t+.

You can embed a man page into the builtin as done in the above source code.  Most, if not all, of the ksh93 commands have such embedded man pages.

Here is the output from stat --help.
$ stat --help
Usage: stat [ options ] var file
$
and here is the output from stat --man.
$ stat --man
NAME
stat - get file status

SYNOPSIS
stat [ options ] var file

DESCRIPTION
stat creates the compound variable var corresponding to the file given
by the pathname file. The elements of var are the names of fields in the
stat structure with the st_ prefix removed, together with the basename
of file.

EXIT STATUS
0 Success.
>0 An error occurred.

SEE ALSO
stat(1),stat(2)

IMPLEMENTATION
version stat 2009-01-03
author Finnbarr P. Murphy fpm at hotmail.com
license http://www.opensource.org/licenses/cpl1.0.txt
$
Here is an example of using the stat builtin to get information about a file called tksh.
$ stat fileinfo ./tksh
$ print $fileinfo
( atime=1231192796 ctime=1231192291 dev=2065 gid=500 ino=50897
mode='0100755 -rwxr-xr-x ---' mtime=1231192291 name=tksh nlink=1 size=2171604 uid=500 )
$ print ${fileinfo.atime}
1231192796
$ printf "%(%Y-%m-%d %H:%M:%S)T\n" "#${fileinfo.atime}"
2009-01-05 16:59:56
$
As you can see, the stat builtin gets the file statistics using stat(2), creates a compound variable with the specified name, i.e. fileinfo, and then creates a series of subvariables (atime, ctime, mtime, etc.) whose names correspond to the fields of the stat(2) structure.  This compound variable is then available for you to use as necessary within your shell script.  Rather than having to handle a series of variables, one per stat(2) structure field, you simply have to deal with a single compound variable, i.e. fileinfo.
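
As a further illustration, here is a small sketch of my own (it assumes the stat builtin above has already been loaded into your shell) which uses the mtime subvariables of two files.
stat hosts_info /etc/hosts
stat passwd_info /etc/passwd

if (( hosts_info.mtime > passwd_info.mtime ))
then
    print "/etc/hosts was modified more recently than /etc/passwd"
else
    print "/etc/passwd was modified at least as recently as /etc/hosts"
fi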

As always, email me if you have any questions.

KSH93 Compound Variables

Most shells, including bash, ksh88 and pdksh, implement a flat variable namespace.  According to Dave Korn, one of the lessons learned from UNIX is that a hierarchical namespace is better than a flat namespace.  For this reason, amongst others, a hierarchical variable namespace was implemented by him in ksh93, with . (dot) as the separator for each level of the hierarchy.  This expanded variable namespace enabled the implementation of an aggregate definition for a shell variable to include subvariables.  Such shell variables are called compound variables.  Not much appears to have been written about compound variables to date and, as usual, the ksh93 man page is terse on the subject.  This post will try to explain compound variables in detail and demonstrate how useful they can be when dealing with structured data.

As in bash, ksh88, and pdksh, a variable in ksh93 is defined by a name=value pair.
$ myvar=10
$ print $myvar
10
Now consider the following commands which show how to declare a compound variable, define subvariables and interact with those subvariables.
$ typeset -C myvar       # declare 'myvar' as compound variable 
$ myvar.x=10 # set subvariable 'x' to 10
$ myvar.y=20 # set subvariable 'y' to 20
$ print $myvar # print definition of compound variable
( x=10 y=20 )
$ print ${myvar.x} # print value of 'x' subvariable
10
$ print ${myvar.y} # print value of 'y' subvariable
20
$ yourvar=( x=10 y=5 ) # declare and define 'yourvar' compound variable
$ print $yourvar
( x=10 y=5 )
$ print $(( yourvar.x * yourvar.y )) # multiply 2 subvariables together.
50
$
A variable with a . (dot, period) in its name is called a subvariable.  However, to create a subvariable, a variable whose name consists of everything up to the last period must already exist.  Note that variable names that begin with .sh are reserved for use by ksh93.

Just as var='' initializes a simple variable, var=() does the same for a compound variable. This is not assigning a value to var; it is simply declaring that var is a compound variable and its value will be defined by all subvariables of the form var.*.

You can specify the type of specific subvariables.  For example, to create a compound variable with a subvariable named x of type integer.
$ var=( typeset -i x=2 )
You can have more than one level of subvariables in a compound variable.  However, you must be careful to first declare the compound variable otherwise you will get an error.
$ var=
$ var.x.y=2
/bin/ksh93t: var.x.y=2: no parent
$ var=()
$ var.x.y=2
$ print $var
( x=( y=2 ) )
$
You can use += word with compound variables provided the types are compatible.  When += word is applied to an arithmetic type, word is added to the current value.  When applied to a string variable, word is appended to the value.
$ var=( str="abc" typeset -i num=12 )
$ var.str+="def"
$ var.num+=12
$ print ${var.}
( typeset -i num=24 str=abcdef )
$
You may be wondering why the value of a compound variable is output in the form that it is.  The reason is that the value of a compound variable is intended to be in a form ready for re-input by ksh93, as shown by the following example.
$ var=( x=2 y=4 )
$ print $var
( x=2 y=4 )
$ print -r "newvar=$var" > file
$ unset var
$ print $var

$ . ./file
$ print $newvar
( x=2 y=4)
$
You can copy compound variables using the eval builtin.
$ var=( x=2 y=3 )
$ print $var
( x=2 y=3 )
$ eval "newvar=$var"
$ print $newvar
( x=2 y=3 )
$
You can create a new compound variable from part of a existing compound variable.
$ var=( x=1 y=2 )
$ var.y.a=1
$ var.y.b=2
$ var.y.c=3
$ print $var
( x=1 y=2 y=( a=1 b=2 c=3 .=2 ) )
$ eval "newvar=${var.y.}"
$ print $newvar
( a=1 b=2 c=3 .=2 )
$ print -r -- "newvar=${newvar.}"
newvar=(
a=1
b=2
c=3
.=2
)
$
Compound variables can be exported but subvariables cannot.
$ testvar=( x=12 )
$ echo ${testvar.x}
12
$ export testvar.x
/bin/ksh93: export: testvar.x: invalid export name
$ export testvar
$ env | grep testvar
testvar=(x=12;)
$
One thing you need to be aware of is that ksh93 also uses the . (dot) notation to denote what are called discipline functions.  Shell variables in ksh93 can behave as active objects rather than as simple storage units by having one or more functions, called discipline functions, associated with them.  A discipline function is defined like any other function, except that its name is formed by using the variable name, followed by a . (dot), followed by the discipline name.  Any variable can have discipline functions defined that are invoked when the variable is referenced or assigned a value.  The default set of discipline functions in ksh93 is get, set, and unset.  Other discipline functions can be defined via a custom shared library.
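
Discipline functions really deserve a post of their own, but here is a minimal sketch of a get discipline (my own example) so you can see the dotted naming in action.
typeset stamp
function stamp.get
{
    .sh.value=$(date)    # runs each time $stamp is referenced
}
print $stamp             # prints the current date and time
print $stamp             # re-evaluated on every reference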

Compound variables are currently a work in progress.  Some of the examples that I have shown above may not work in future versions of ksh93.  These examples were tested on ksh93t 2008-07-24.

Well, that is about all you need to get you started using compound variables.  I hope this post has given you some ideas about how useful such variables could be in your future ksh93 scripts.

JavaScript Shells

Recently I was working on a fairly complex JavaScript script relating to floating point conversions for a new Web page.  After a while I got tired of trying to debug the problem via a Web browser and decided to see if I could find a JavaScript shell, i.e. a standalone JavaScript interpreter, just like Ruby's irb, Python's interactive prompt or the Korn shell, which could load and run JavaScript scripts from the command line without my having to reload a Web page.

First, some background on the JavaScript language for those who are unfamiliar with the details.  JavaScript is a complex, full-featured, weakly typed, object-based functional programming language originally developed by Brendan Eich in 1995 while he was working on the Netscape Navigator browser.  It is most frequently used in client-side web applications but is also used to enable scripting access to embedded objects in other applications.

The language has been standardized in the ECMA-262 (ECMAScript) specification.  The first version of ECMAScript was published in June 1997 and was partially based on JavaScript 1.2.  The current version is Edition 3 (December 1999) and work is ongoing on the next edition.  Formally, JavaScript is a dialect of ECMAScript whose language specification is controlled by the Mozilla Foundation.  There are other dialects, including ActionScript, which is the scripting language used in Adobe Flash.  JavaScript is still evolving as a language and several versions are in daily use.  The current version is JavaScript 1.8.

The JavaScript engine in Firefox is written in C.  It was originally called the JavaScript Reference implementation (JSRef) but nowadays is known as SpiderMonkey.  Other Mozilla products also use this engine and it is available to the public under an MPL/GPL/LGPL tri-license.  The current version, SpiderMonkey 1.7, conforms to JavaScript 1.8 which is a superset of ECMA-262 Edition 3.  It consists of a library (or DLL) containing the JavaScript runtime engine (compiler, interpreter, decompiler, garbage collector, atom manager and standard classes).  This codebase has no dependencies on the rest of the Mozilla codebase.  The codebase also contains the routines for a simple user interface which can be linked with the runtime library in order to make a command line shell.

You can download the source code for SpiderMonkey 1.7 here.  Alternatively, you can use wget, curl or ftp to download the tarball.  No build script is provided with this version of SpiderMonkey.  Here is how I downloaded, built and smoke-tested the shell.
mkdir mozilla
cd mozilla
wget http://ftp.mozilla.org/pub/mozilla.org/js/js-1.7.0.tar.gz
tar xzf js-1.7.0.tar.gz
cd js/src
make -f Makefile.ref
If everything compiles correctly, you should then smoketest the JavaScript shell (js) by executing the following command:
./Linux_All_DBG.OBJ/js ./perfect.js 
If the 3 perfect numbers between 1 and 500 are printed and you are returned to your shell prompt without any error messages, all is well.  You should then copy ./Linux_All_DBG.OBJ/js to /bin or /usr/local/bin to make it easier to use.

Unlike many other programming languages, JavaScript does not have a concept of printing to STDOUT or reading from STDIN.  These functions, along with quit(), load() and a small number of other functions, are provided by js itself; they are not part of the JavaScript runtime library.  The first thing you will notice is that everything is function-based.  To exit js, you do not type quit; instead you have to type quit().

Here is the standard Hello World example.
$ cat helloworld.js
//
// Hello World!
//

function helloWorld(name)
{
print("Hello World, " + name);
}
$ /bin/js
js> load('helloworld.js')
js> helloWorld('Finnbarr')
Hello World, Finnbarr
js> helloWorld('Patricia')
Hello World, Patricia
js> quit()
$
Here is the same script called directly from ksh93.  This is possible because of the shebang (#!) syntax on the first line of the script.
$ cat hw.js
#!/bin/js

//
// Hello World!
//

function helloWorld(name)
{
print("Hello World, " + name);
}

helloWorld('Finnbarr');
$ ./hw.js
Hello World, Finnbarr
$
The shell comes with a readline() function which enables you to ask a user to enter a value such as a string or a number, as the following example shows.
$ cat multiply.js
//
// multiply.js
//

function multiply()
{
print("Enter a number:");
var n1 = readline();
print("Enter another one:");
var n2 = readline();

print("You entered " + n1 + " and " + n2 + ". The result is " + n1*n2);
}

$ /bin/js
js> load('multiply.js')
js> multiply()
Enter a number:
12
Enter another one:
10
You entered 12 and 10. The result is 120
You have full access to all functionality that is defined in the Javascript 1.8 specification.
$ cat regex.js
//
// test regular expression
//

myRe = /d(b+)d/g;
myArray = myRe.exec("cdbbdbsbz");

print("The value of lastIndex is " + myRe.lastIndex);

$ /bin/js -f regex.js
The value of lastIndex is 5
$
You can also disassemble your JavaScript script using the dissrc() function.
js> dissrc(multiply)     

;------------------------- 7: print("Enter a number:");
00000: 7 name "print"
00003: 7 pushobj
00004: 7 string "Enter a number:"
00007: 7 call 1
00010: 7 pop
;------------------------- 8: var n1 = readline();
00011: 8 name "readline"
00014: 8 pushobj
00015: 8 call 0
00018: 8 setvar 0
00021: 8 pop
;------------------------- 9: print("Enter another one:");
00022: 9 name "print"
00025: 9 pushobj
00026: 9 string "Enter another one:"
00029: 9 call 1
00032: 9 pop
;------------------------- 10: var n2 = readline();
00033: 10 name "readline"
00036: 10 pushobj
00037: 10 call 0
00040: 10 setvar 1
00043: 10 pop
;------------------------- 11:
;------------------------- 12: print("You entered " + n1 + " and " + n2 + ". The result is " + n1*n2);
00044: 12 name "print"
00047: 12 pushobj
00048: 12 string "You entered "
00051: 12 getvar 0
00054: 12 add
00055: 12 string " and "
00058: 12 add
00059: 12 getvar 1
00062: 12 add
00063: 12 string ". The result is "
00066: 12 add
00067: 12 getvar 0
00070: 12 getvar 1
00073: 12 mul:0
00074: 12 add
00075: 12 call 1
00078: 12 pop
00079: 12 stop
js> quit()
$
A major limitation of js is that it cannot access files such as XML documents, provides no DOM objects, and cannot upload or download files.  An alternative command line shell, called xpcshell, is available from Mozilla.  It is an XPConnect-enabled JavaScript command line shell in which scripts can access XPCOM functionality.  For those of you who are unfamiliar with XPCOM, it is a cross-platform component object model, similar to Microsoft's COM.  XPCOM is frequently used for unit testing.

Besides SpiderMonkey and TraceMonkey, there are a number of other standalone JavaScript shells available, including wxjs and JSDB.  There is also the Rhino engine, created primarily by Norris Boyd, which is a JavaScript implementation written in Java that is ECMA-262 Edition 3 compliant.

I would be remiss if I did not point out that there are alternatives to using a command line shell such as js.  My personal favorite is Firebug, a Firefox extension that provides a more advanced interactive shell, an advanced DOM inspector, a JavaScript debugger, a profiling tool and various other useful tools.

Plans for SpiderMonkey 1.8 appear to be shelved.  The Firefox development team are currently in the process of replacing SpiderMonkey with TraceMonkey which is based on a technique developed at UC Irvine called trace trees, builds on code and ideas shared with the Tamarin Tracing project, and adds just-in-time (JIT) native code compilation.  The net result is a seriously significant speed increase both in the browser chrome and Web page content.  As with SpiderMonkey, you can download the source code from the TraceMonkey mercurial repository and build your own command line shell.

After languishing for a number of years, JavaScript is becoming an increasingly important language for Web applications.  It is one of those languages that every Web programmer needs to understand in depth.  Being able to run and debug JavaScript scripts from the command line greatly assists in that understanding.

Enjoy!