Table 3.2 shows a possible model of a single one-line
database entry and also indicates the structure used within the individual data
categories. It is printed as is, including all characters that are present
in the ASCII source of the EAGLET database; line breaks were added for printing
purposes only. The real entry contains no line breaks or carriage
returns: every term entry consists of exactly one line. The
line breaks do not represent spaces here. The data categories are delimited by
the comma character `,'; the escape character is the backslash
`\'. Twenty separate columns are defined for the database table. Instances
of the data categories are represented by expressions referring to the name of
the relevant data category, for some categories with the markup that is generally
included and for others with additional numbers, at least in those cases where the rule of
elementarity of data categories is violated. In these latter cases one or two
more elements are included, as they occur in the database (except for
definitions, where three is currently the maximum number).
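The entry format described above (one line per term entry, comma-delimited, backslash as escape character, 20 columns) can be illustrated with a minimal sketch. This is not the conversion script from the appendix, which is built from UNIX tools; Python is used here only for illustration. The function names `split_entry` and `check_entry` are hypothetical, and the sketch assumes that a backslash makes the following character literal, which the text does not spell out.

```python
EXPECTED_COLUMNS = 20  # column count stated in the text


def split_entry(line, delimiter=",", escape="\\"):
    """Split one database line into fields on unescaped delimiters.

    Assumption: the escape character makes the next character literal,
    so "\\," yields a comma inside a field instead of ending the field.
    """
    fields, current = [], []
    chars = iter(line.rstrip("\r\n"))
    for ch in chars:
        if ch == escape:
            # Take the character after the escape literally.
            current.append(next(chars, ""))
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def check_entry(line):
    """Return the fields of one entry, raising if there are not 20 columns."""
    fields = split_entry(line)
    if len(fields) != EXPECTED_COLUMNS:
        raise ValueError(
            f"expected {EXPECTED_COLUMNS} columns, got {len(fields)}"
        )
    return fields
```

A check of this kind, run over every line before conversion, would flag entries in which an unescaped comma inside a field shifts the columns.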
Converting such a complex database entry is not easy, but it is manageable with the help of UNIX tools. The actual script is given in Appendix C. For a UNIX script it is fairly complex: most such scripts consist of only a few command lines, written quickly by experienced users to automate tedious, recurring tasks. This script consists of some hundred lines of source code, most of them containing parts of subroutines.
The most important criteria for this program were that it had to work on UNIX machines, be capable of handling large amounts of data, and be quick to change. Consequently, programming languages that need compilers, such as C and C++, did not suit the task. UNIX tools can be changed in a few seconds and are easy to debug; more importantly, if the database itself changes, for example through the introduction of new or different categories, the scripts can easily be adapted to the new needs.
Data loss is a serious problem in database conversion, as was pointed out
earlier. To evaluate the effectiveness of the
script developed for the EAGLET to MARTIF conversion, Tables
3.3 to 3.7 below show the degree of equivalence of each
EAGLET data category with the MARTIF standard, together with a description
of the tool used, remarks on possible data loss, and further comments.
Thorsten Trippel
Fri May 21 13:04:11 MET DST 1999