How to put IPA on the Web --
a simple hack

Dafydd Gibbon, U Bielefeld

1 October 1997

The problem

Web browsers that support the standard symbols of the International Phonetic Association (IPA), for example as Unicode characters, are not yet in general use, but web publications and teaching materials urgently need a way of rendering IPA symbols. A simple procedure, which is better than the scanning process that is sometimes used, was developed in order to render IPA symbols in the hypertext version of the recently published Handbook of Standards and Resources for Spoken Language Systems, edited by Dafydd Gibbon, Roger Moore and Richard Winski (Mouton de Gruyter, Berlin, 1997). The procedure was needed for inserting a large number of IPA macros into the HTML version of a LaTeX-formatted book, since the widely used latex2html hyperdocument generator, in the version and configuration available, was not able to handle them. A sample is shown in Table 1.

 


IPA  SAMPA  ASCII  Open-Close  Front-Back  Rounded-Unrounded  Comment
a    a      97     open        front       unrounded
ɑ    A      65     open        back        unrounded
æ    {      123    near-open   front       unrounded          English bad
ɐ    6      54     near-open   central     unrounded          German Butter
ɒ    Q      81     open        back        rounded
ɔ    O      79     open-mid    back        rounded
e    e      101    close-mid   front       unrounded
ɛ    E      69     open-mid    front       unrounded
ə    @      64     mid         central     unrounded          schwa
ɜ    3      51     mid         central     unrounded
i    i      105    close       front       unrounded
ɪ    I      73     near-close  front       unrounded          lax
o    o      111    close-mid   back        rounded
ø    2      50     close-mid   front       rounded
œ    9      57     open-mid    front       rounded
ɶ    &      38     open        front       rounded
u    u      117    close       back        rounded
ʊ    U      85     near-close  back        rounded            lax
ʉ    }      125    close       central     rounded
ʌ    V      86     open-mid    back        unrounded
y    y      121    close       front       rounded
ʏ    Y      89     near-close  front       rounded            lax
Table 1: Vowels 
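
The ASCII column of Table 1 gives the decimal code of each SAMPA character, and it can be checked from the command line. The following is a minimal sketch, assuming a POSIX shell; the function name ascii_code is a hypothetical helper introduced only for this illustration.

```shell
#!/bin/sh
# Sketch: print the decimal ASCII code of a single SAMPA character.
# ascii_code is a hypothetical helper name, not part of the original procedure.
ascii_code() {
    # POSIX printf: an argument starting with a quote character is converted
    # to the numeric value of the character that follows the quote.
    printf '%d\n' "'$1"
}

ascii_code 'A'   # SAMPA A, open back unrounded
ascii_code '{'   # SAMPA {, near-open front unrounded
ascii_code '@'   # SAMPA @, schwa
```

The three calls should print the values listed for A, { and @ in the ASCII column of the table.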

Solution Part I - an eggbox for carrying IPA symbols

An obvious solution for rendering IPA symbols in HTML documents is to use an in-line transparent GIF image for each symbol or combination of symbols; the present application targeted the conversion of single symbols. What is not so obvious is how to generate the GIFs automatically and insert them into complex texts formatted in LaTeX. The GIFs generated by the procedure described here can be imported into proprietary web assistants and HTML editors if required, and can of course be converted into other graphics formats.

The LaTeX IPA macros used in the EAGLES application are those contained in the wsuipa macro set, but the procedure operates with any special character macros (e.g. the \LaTeX macro, which is rendered as the LaTeX logo).

The method exploits a property of the latex2html script: objects which it cannot handle itself, such as figure and math environments, are passed on to other programmes, effectively using the following cascade:

  1. Call latex to generate DVI format.
  2. Call dvips to generate PS format.
  3. Call pstogif to generate GIF format.
  4. Call giftrans to generate transparent GIF format.
  5. Generate HTML inline image element.
The size of the image is controlled by the .latex2html-init configuration file; this file provides the following scaling parameters which can be modified to produce an acceptable text/figure size ratio:

# This number will determine the size of the equations, special characters,
# and anything which will be converted into an inlined image
# *except* "image generating environments" such as "figure", "table"
# or "minipage".
# Effective values are those greater than 0.
# Sensible values are between 0.1 - 4.
$MATH_SCALE_FACTOR = 1.4;

# This number will determine the size of
# image generating environments such as "figure", "table" or "minipage".
# Effective values are those greater than 0.
# Sensible values are between 0.1 - 4.
$FIGURE_SCALE_FACTOR = 1;

The IPA rendering procedure uses these features of latex2html by inserting the IPA macros into a LaTeX math environment with a simple macro which I call \eggbox. Packed into an \eggbox, the symbols are then safely transported through the latex2html conversion cascade described above, and emerge as inline GIFs of the correct size:

Macro definition: \newcommand{\eggbox}[1]{$\mbox{#1}$}
Macro example: \eggbox{\schwa}
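
To try the macro in isolation before converting a whole book, a small test file is enough. The following is a minimal sketch, not the original test setup: the file name eggbox-test.tex is hypothetical, and it is an assumption that the wsuipa macros can be loaded with \usepackage{wsuipa} in the local LaTeX installation.

```shell
#!/bin/sh
# Sketch: write a minimal LaTeX file that uses the \eggbox macro.
# Assumptions: the file name is hypothetical, and wsuipa is assumed to be
# loadable with \usepackage{wsuipa}; adjust the preamble to the local setup.
# The quoted 'EOF' keeps the shell from expanding $ and \ in the LaTeX source.
cat > eggbox-test.tex <<'EOF'
\documentclass{article}
\usepackage{wsuipa}
\newcommand{\eggbox}[1]{$\mbox{#1}$}
\begin{document}
The symbol \eggbox{\schwa} is called schwa.
\end{document}
EOF
```

The resulting file can then be passed to latex2html in the same way as the full document.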

The latex2html call for converting this document into a single-page hyperdocument (split depth 0, with minor manual post-editing) was:

latex2html -html_version 3.0 -t "IPA on the Web" -split 0 ipa2eggbox.tex

Solution Part II - automatic insertion into the eggbox

The \eggbox macros were automatically inserted into the original LaTeX file using the standard UNIX stream editor sed:

#!/bin/sh
# ipa2eggbox, Dafydd Gibbon, 5 May 1997
# IPA macro set is wsuipa
# Convention: macro may be followed by one or more spaces.
# Caution: check that macro name is not a prefix of another macro name.
# Macro \eggbox definition: \newcommand{\eggbox}[1]{$\mbox{#1}$}

cat "$1" |
sed "s/\\\schwa */\\\eggbox{\\\schwa}/g
     s/\\\scu */\\\eggbox{\\\scu}/g
     s/\\\sci */\\\eggbox{\\\sci}/g
     s/\\\inva */\\\eggbox{\\\inva}/g
     s/\\\invv */\\\eggbox{\\\invv}/g
     s/\\\niepsilon */\\\eggbox{\\\niepsilon}/g
     s/\\\scy */\\\eggbox{\\\scy}/g
     s/\\\stress */\\\eggbox{\\\stress}/g
     s/\\\eng */\\\eggbox{\\\eng}/g
     s/\\\esh */\\\eggbox{\\\esh}/g
     s/\\\eth */\\\eggbox{\\\eth}/g
     s/\\\openo */\\\eggbox{\\\openo}/g
     s/\\\nitheta */\\\eggbox{\\\nitheta}/g
     s/\\\dz */\\\eggbox{\\\dz}/g
     s/{\\\ipa{\\\symbol{[^}]*}}}/\\\eggbox{&}/g
     s/\\\diaunder\[\\\syllabic|m\]/\\\eggbox{\\\diaunder\[\\\syllabic|m\]}/g
     s/\\\revepsilon */\\\eggbox{\\\revepsilon}/g
     s/\\\downp */\\\eggbox{\\\downp}/g
     s/\\\curlyc */\\\eggbox{\\\curlyc}/g
     s/\\\invh */\\\eggbox{\\\invh}/g
     s/\\\invscr\([^a-zA-Z ]\)/\\\eggbox{\\\invscr}\1/g
     s/\\\invscr[ ][ ]*/\\\eggbox{\\\invscr}/g
     s/\\\invscr\$/\\\eggbox{\\\invscr}/
     s/\\\nj */\\\eggbox{\\\nj}/g
     s/\\\invscripta */\\\eggbox{\\\invscripta}/g
     s/\\\baro */\\\eggbox{\\\baro}/g
     s/\\\invy */\\\eggbox{\\\invy}/g
     s/\\\scr\([^a-zA-Z ]\)/\\\eggbox{\\\scr}\1/g
     s/\\\scr[ ][ ]*/\\\eggbox{\\\scr}/g
     s/\\\scr\$/\\\eggbox{\\\scr}/
     s/\\\curlyz */\\\eggbox{\\\curlyz}/g
     s/\\\baru */\\\eggbox{\\\baru}/g
     s/\\\scripta */\\\eggbox{\\\scripta}/g
     s/\\\yogh */\\\eggbox{\\\yogh}/g
     s/\\\nigamma */\\\eggbox{\\\nigamma}/g
     s/\\\glotstop */\\\eggbox{\\\glotstop}/g
     s/\\\secstress */\\\eggbox{\\\secstress}/g"
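
Before running the full script on a book-length file, it can be useful to pipe a sample line through a reduced rule set. The sketch below uses only two of the rules above, written in single quotes so that each literal backslash needs two backslashes rather than three; the function name wrap_ipa and the sample line are illustrative assumptions, not part of the original script.

```shell
#!/bin/sh
# Sketch: apply two of the ipa2eggbox rules to a sample line.
# wrap_ipa is a hypothetical helper name used only for this illustration.
wrap_ipa() {
    # Single quotes: the shell passes the backslashes to sed unchanged,
    # so \\ in the script stands for one literal backslash in the text.
    sed 's/\\schwa */\\eggbox{\\schwa}/g
         s/\\openo */\\eggbox{\\openo}/g'
}

sample='/\schwa/ and /\openo/'
result=$(printf '%s\n' "$sample" | wrap_ipa)
printf '%s\n' "$result"
```

Note that the rules are not idempotent: piping the result through wrap_ipa a second time wraps each macro in a second \eggbox, so the script should be applied only once to any given file.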

Conclusion

Until browsers appear which provide renderings for IPA Unicode, the automatic inline GIF procedure described here can fill the gap reasonably satisfactorily. Whether and when such browsers appear will depend, among other things, on the efforts and influence of the spoken language community.


Footnote:
...procedure
This document can be freely distributed in any format. Acknowledgment would be appreciated.
 

Dafydd Gibbon
Wed Oct 1 17:02:20 MET DST 1997