KSH93 Message Localization

The current version of ksh93 (93t+ 2009-05-01) supports localization of internal error messages and getopts messages, but localization of user messages in shell scripts is flawed.

For a project I am working on, I needed to be able to supply localized messages for a small number of shell scripts and thus found myself in the bowels of ksh93 figuring out how to make it so.

First, a word of warning: modifying the source code for ksh93 and libast is not for the faint-hearted.  It is a fairly complex code base with many levels of abstraction in places.  This post assumes that you are reasonably familiar with the libast and ksh93 sources and know how to rebuild ksh93 from these sources.

Before getting down to the source code modifications, a discussion of the design decisions and trade-offs is in order. The major design decision was to use the POSIX message catalog localization model (catopen(), catgets(), catclose()) rather than the GNU gettext() localization model because (1) libast already provides the three cat* APIs as wrappers around the equivalent underlying libc APIs and, more importantly, (2) I did not want to contaminate the ksh93 sources, which are licensed under the Common Public License V1.0, with GNU gettext() sources, which are licensed under the GPL.
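
For readers who have not used the POSIX model, here is a minimal sketch of the catopen()/catgets()/catclose() cycle in plain C, outside of libast. The helper name cat_lookup() and its parameters are my own invention for illustration; only the three cat* calls and NL_CAT_LOCALE come from the standard API.

```c
#include <assert.h>
#include <nl_types.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy message `msgno` of set `setno` from catalog `name` into `out`,
 * or copy `dflt` when the catalog or the message cannot be found.
 * Returns 1 if a catalog entry was used and 0 if the default was used.
 * (cat_lookup is a hypothetical helper, not part of libast or ksh93.)
 */
int
cat_lookup(const char* name, int setno, int msgno, const char* dflt,
           char* out, size_t outsz)
{
    nl_catd     d;
    const char* s = dflt;
    int         found = 0;

    assert(name && dflt && out && outsz > 0);

    /* NL_CAT_LOCALE selects the catalog according to LC_MESSAGES */
    d = catopen(name, NL_CAT_LOCALE);
    if (d != (nl_catd)-1)
    {
        /* catgets() hands back its last argument when the lookup fails */
        s = catgets(d, setno, msgno, dflt);
        found = (s != dflt);
    }

    /* copy before catclose(), which may invalidate the returned pointer */
    snprintf(out, outsz, "%s", s);

    if (d != (nl_catd)-1)
        catclose(d);
    return found;
}
```

Calling cat_lookup("demo", 1, 1, "Hello", buf, sizeof buf) would fill buf with message 1 of set 1 if a catalog named demo can be found for the current LC_MESSAGES, and with "Hello" otherwise.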

The second major design decision I made was that the message catalogs would have the same name as the shell script and would reside in the system default message catalog location rather than locally (although this can be overridden by the NLSPATH environment variable).  My current development system is Fedora 11 (Leonidas) x86_64.  There the default location for localized messages is /usr/share/locale/%l/LC_MESSAGES/.  Thus, for example, the French message catalog for a shell script named fpm would be /usr/share/locale/fr/LC_MESSAGES/fpm, assuming LC_MESSAGES was set to fr_FR.UTF-8 or something equivalent.
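
To make the %l substitution concrete, here is a rough sketch of how an NLSPATH-style template could be expanded. The function name expand_template() is hypothetical, and appending %N (the catalog name) to the directory above is my assumption based on the usual NLSPATH convention; the actual search performed by libast or libc may well differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Expand one NLSPATH-style template into `out`.
 * Handles %N (catalog name), %L (full locale), %l (language part) and %%.
 * Returns 0 on success, -1 if `out` is too small or the template uses
 * a substitution this sketch does not handle (such as %t or %c).
 * (expand_template is a hypothetical helper for illustration only.)
 */
int
expand_template(const char* tmpl, const char* locale, const char* name,
                char* out, size_t outsz)
{
    size_t used = 0;

    assert(tmpl && locale && name && out);

    for (; *tmpl; tmpl++)
    {
        const char* piece = tmpl;   /* default: copy one literal character */
        size_t      len = 1;

        if (*tmpl == '%')
        {
            switch (*++tmpl)
            {
            case 'N': piece = name;   len = strlen(name);           break;
            case 'L': piece = locale; len = strlen(locale);         break;
            /* language = everything before '_', '.' or '@' */
            case 'l': piece = locale; len = strcspn(locale, "_.@"); break;
            case '%': piece = "%";    len = 1;                      break;
            default:  return -1;    /* also catches a trailing lone '%' */
            }
        }
        if (used + len >= outsz)    /* always leave room for the NUL */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    if (used >= outsz)
        return -1;
    out[used] = '\0';
    return 0;
}
```

With the template /usr/share/locale/%l/LC_MESSAGES/%N, the locale fr_FR.UTF-8 and the name fpm, this produces /usr/share/locale/fr/LC_MESSAGES/fpm, which matches the example above.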

The final major design decision I made was not to require the use of set and message numbers for identifying message strings, as is normal in the POSIX message catalog model, but rather to follow the gettext() model whereby a message string conveys sufficient information to find its equivalent localized string.

The way this works is that you must provide a C locale message catalog with a single set of messages (set 1) which ksh93 reads to determine the message number.  It then uses that message number to retrieve the same message from set 1 of the localized message catalog.  This means that the C locale message catalog and the various other localized message catalogs must be kept synchronized for correct localized message retrieval. Note that ksh93 defaults to the original message string if it cannot find that message string in the C locale message catalog or in the localized message catalog.
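
The two-step lookup can be sketched independently of any real catalog files. In the following illustration each message set is modelled as a NULL-terminated array of strings; the type Msgset and the three function names are hypothetical and exist only to show the logic, not how ksh93 or libast store catalogs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A message set modelled as a NULL-terminated array; message n is set[n - 1]. */
typedef const char* const* Msgset;

/* Return the 1-based number of `msg` in `set`, or 0 if it is not present. */
int
msg_number(Msgset set, const char* msg)
{
    int n;

    assert(set && msg);
    for (n = 0; set[n]; n++)
        if (strcmp(set[n], msg) == 0)
            return n + 1;
    return 0;
}

/* Return message `msgno` from `set`, or `dflt` if the set has no such message. */
const char*
msg_text(Msgset set, int msgno, const char* dflt)
{
    int n;

    assert(set && dflt);
    if (msgno < 1)
        return dflt;
    for (n = 0; set[n]; n++)
        if (n + 1 == msgno)
            return set[n];
    return dflt;
}

/*
 * Step 1: find the message number via the C locale set.
 * Step 2: fetch that number from the localized set.
 * The original string is returned whenever either step fails.
 */
const char*
translate_by_text(Msgset c_set, Msgset l10n_set, const char* msg)
{
    return msg_text(l10n_set, msg_number(c_set, msg), msg);
}
```

The sketch also shows why the catalogs must stay synchronized: if the French set is missing an entry or lists its entries in a different order, the number found in step 1 retrieves the wrong string or none at all.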

Only two source files need to be modified.  First, add the following function to .../lib/libast/translate.c.
/*
 * Support internationalized messages in $"..." format.
 * Assumes that the catalog name is exactly the same as the script name.
 */
char*
fpm_translate(const char* cmd, const char* msg)
{
    register char*  r;
    register char*  u;
    char*           t;
    char*           j;
    char*           s;
    nl_catd         d;
    int             n;
    int             msgno = 0;  /* 0 means "not found in the C catalog" */

    if (!msg)
        return NiL;
    r = (char*)msg;

    /* strip any leading directory components from the script name */
    if (cmd && (t = strrchr(cmd, '/')))
        cmd = (const char*)(t + 1);

    /* first locate the message in the C locale and determine message number */
    u = setlocale(LC_MESSAGES, NiL);
    j = u ? strdup(u) : NiL;
    setlocale(LC_MESSAGES, "C");
    if ((d = find("C", cmd)) != NOCAT)
    {
        for (n = 1; (s = catgets(d, 1, n, NULL)); n++)
        {
            if (!strcmp(msg, s))
            {
                msgno = n;
                break;
            }
        }
        catclose(d);
    }
    if (j)
    {
        /* restore the caller's LC_MESSAGES setting */
        setlocale(LC_MESSAGES, j);
        free(j);
    }

    /* then use that message number to retrieve equivalent L10N message */
    if (msgno && (d = find(setlocale(LC_MESSAGES, NiL), cmd)) != NOCAT)
    {
        s = catgets(d, 1, msgno, msg);
        /* copy a translated string before catclose() invalidates it */
        if (s != msg && (t = strdup(s)))
            r = t;
        catclose(d);
    }

    return r;
}
Then modify sh_endword() in .../cmd/ksh93/sh/lex.c as follows:
#if NO_LOCALIZED_MESSAGES
# if ERROR_VERSION >= 20000317L
    msg = ERROR_translate(0,error_info.id,0,ep);
# else
#  if ERROR_VERSION >= 20000101L
    msg = ERROR_translate(error_info.id,ep);
#  else
    msg = ERROR_translate(ep,2);
#  endif
# endif
#else
    msg = fpm_translate(shp->st.cmdname, ep);
#endif
Then rebuild ksh93 using .../bin/package make or with your own custom build script.

Here is a simple shell script (demo) which has a localized French message catalog.  Use the system gencat to compile the two message catalogs, not UWIN's gencat.
$ cat demo
#!/bin/ksh
name="Finnbarr"
echo "A simple ksh93 message translation example"
echo $"Hello"
print $"Goodbye"
printf $"Welcome %s\n" "$name"
print $"No translated message"

$ cat demo.msg.c
$quote "
$set 1 This is the C locale message set
1 "Hello"
2 "Goodbye"
3 "Welcome %s\\n"

$ cat demo.msg.fr
$quote "
$set 1 This is the French locale message set
1 "Bonjour"
2 "Au Revoir"
3 "Bienvenu %s\\n"
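Note that the backslash in \n must itself be escaped with a backslash (\\n) in the catalog source, so that the stored message contains the literal two-character sequence \n and matches the string in the script.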

Here is an example of the output from demo under various locale settings.
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

$ ./demo
A simple ksh93 message translation example
Hello
Goodbye
Welcome Finnbarr
No translated message

$ unset LANG
$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

$ ./demo
A simple ksh93 message translation example
Hello
Goodbye
Welcome Finnbarr
No translated message

$ export LC_MESSAGES=fr_FR.UTF-8
$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES=fr_FR.UTF-8
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

$ ./demo
A simple ksh93 message translation example
Bonjour
Au Revoir
Bienvenu Finnbarr
No translated message
$
Well, that's about all for now.  I have provided sufficient information for you to try out localized message strings with ksh93 for yourself if this is something you need.

Note that this is still a work in progress.  I have not yet tested support for multibyte message catalogs (I am not sure if ksh93 even works in multibyte locales!).  Since I have hijacked the existing hook for error message catalogs, I probably should figure out how to get localization of ksh93 error messages working again.  There may be memory leaks or boundary conditions that I am unaware of.  Therefore if you are going to use this modification in a production environment, test and test again!

A useful future enhancement would be to decouple the name of the message catalog from the name of the shell script and also to allow message set numbers other than 1.  This could be achieved by using a pair of environment variables or by means of a msgcat compound variable where msgcat.name is the name of the message catalog to be used and msgcat.set is the actual message set to be used.
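
As a rough idea of what the environment variable approach might look like on the C side, here is a small sketch. Nothing like this exists in ksh93 today: the variable names MSGCAT_NAME and MSGCAT_SET, the Msgcat struct and the function msgcat_select() are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef struct
{
    const char* name;   /* catalog name to open */
    int         set;    /* message set number to read */
} Msgcat;

/*
 * Choose the catalog name and set number for a script.
 * MSGCAT_NAME and MSGCAT_SET are hypothetical environment variables.
 * Falls back to the script name and to set 1, the current behaviour.
 */
Msgcat
msgcat_select(const char* script)
{
    Msgcat      mc;
    const char* v;
    char*       end;
    long        n;

    assert(script);
    mc.name = script;
    mc.set = 1;

    if ((v = getenv("MSGCAT_NAME")) && *v)
        mc.name = v;
    if ((v = getenv("MSGCAT_SET")) && *v)
    {
        errno = 0;
        n = strtol(v, &end, 10);
        /*
         * Accept only a clean positive integer; 255 is the smallest
         * value POSIX permits for NL_SETMAX, so it is a portable bound.
         */
        if (!errno && !*end && n >= 1 && n <= 255)
            mc.set = (int)n;
    }
    return mc;
}
```

The returned name and set would then replace the script name and the hardcoded set 1 in the two catalog lookups. Invalid values are silently ignored here so that a bad setting degrades to the existing behaviour rather than breaking the script.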

Have a great Memorial Day, and take a few moments to remember those who have sacrificed so that we can be free.

Comments and feedback are welcome as always.


[2009-09-05] Recently I spent some more time in the bowels of the AST library (libast) and made an interesting discovery regarding message catalog support for shell scripts. I discovered that ../lib/libast/misc/translate.c is where the real problem is located. The translate() function contains a call to catgets(), but only for message set number 3 (three). This is hardcoded into the call via the AST_MESSAGE_SET define in ../libast/include/ast_std.h. So unless you define $set 3 in your message catalog, internationalization or localization of a ksh93 shell script simply will not work.