The Arabic Mac
Macros for converting old documents to Unicode:
Below are some comments on each of the fonts covered by the conversion macros. For an explanation of some of the terms used in the descriptions ("combining diacritics", "private area codes", etc.), see the glossary / general comments at the end. The comments here relate to the process of converting to Unicode, not to the relative merit of each font as it stands, and are based on the information I possess about them. Additions, corrections, and updates are welcome.
The fonts involved
This is a Macintosh font for Persian transcription, related to HaifaTimes; see that entry below for details. As noted there, the bitmap provided with the download is in fact not the screen font for Abbas, but for the related GalilTimes font (q.v.). Like the others of the family, it uses some private area codes* and displaces some European diacritics.**
This is, I believe, a Macintosh font made by the late B.W. Andrzejewski; I saw it in 1994. It is mainly a combining font, Mac style, with only the basic Arabic diacritics; the converter replaces the most common character combinations (a, i, u, e, o with line over and s, d, t, h, z, k with dot under) with complete characters and places any remaining diacritic over its intended base character.
I have very little information about this Windows font, not even its full name (it may be "AHT Times New Roman"), and the converter is incomplete: it is based on a single sample text file which did not contain all relevant characters. I welcome further information about this font.
AO Times New Roman
This Windows font was used in many publications of the German Archaeological Institute. It contains a few private area codes* and displaces some European symbols.** It does not have combining diacritics, but a couple of non-combining equivalents.
This Windows font is distributed by the University of Köln, signed Kirsch RRZK 1991. A report says that it does not work properly in Windows, in that word units are not recognized. It also seems to have been improperly converted to Unicode, in that fake (Tibetan) Unicode values have been assigned to all characters in the font, which may have caused the error. The macro attempts to correct this by reverting all character values, not just the transliteration characters, to their Unicode standard value. Real-life tests by users of this font are welcome to see if this is successful.
This Windows font, available on the Internet, was created by the Ahlul Bayt Islamic Library. In addition to the regular Arabic diacritics, it contains vowels with a half-circle above. These are not defined in Unicode, and are here converted to the relevant vowel with "modifier centred left half ring", the most similar Unicode diacritic available (U+02D3), for the most approximate rendering possible.
This is a Mac font for Akkadian transcription, related to the HaifaTimes group, see below for details. The provided screen and printer fonts are identical, unlike others of the family. Like the others of the family, it uses some private area codes* and displaces some European diacritics.**
Not to be confused with Times Beyrut Roman below, nor Bairut of the Jaghbub family, this Macintosh font for Aramaic transcription is related to the HaifaTimes group; see below for further detail. The download includes a screen font with the printer font. This is not the GalilTimes screen font (as with some of the other fonts in this family), but it does differ in some respects from the printer (postscript) font: some characters are located differently, others are missing. The macro is based on the printer font, which is the only one that will be visible under OS X (the Nisus set also has a separate macro for the bitmap layout, in case that is actually in use). Beyrut uses some control codes for diacritics (e.g. p, o, u & s superscript). I welcome comments if that causes problems in conversion.
This is apparently originally a Windows font, but it was also used on Macs; I got it from an Israeli colleague. It lacks A, U with line above, but has a number of more specialized characters. It contains a few "private area" characters* and displaces two or three European diacritics.**
(An earlier version of this conversion file contained an incomplete conversion of Bloomington based on observed use. For the current version I have been able to work from the font itself, so the conversion should be complete.)
This is a Windows font for Aramaic transcription, related to the HaifaTimes group, see below for details. Like the others of the family, it uses some private area codes* and displaces some European diacritics.**
This is a Windows font made, as the name shows, by the Deutsche Morgenländische Gesellschaft, primarily for its Zeitschrift, but it was also used by German Orientalists for other purposes. I have not had direct access to this font; the macro is based on a character map that may or may not be complete, and some identifications are uncertain. The DMG font was "cannibalistic" in that it used the positions for numerals as well as the regular characters a, s, g, j, etc. for diacritics (typing s gave "s dot under", etc.). Thus, the font could not be used for regular text, only for the diacritic characters, unlike almost all other fonts here, which could also be used for regular English text (the other exception is Times NewArabic Roman). Conversion should therefore only be used on specific diacritic characters (select them before converting); otherwise regular text will be converted to unwanted diacritics. There is some uncertainty concerning long vowels; I welcome real-life experience as to whether I have interpreted the conversion correctly. It contains one "private area" character.*
This is a Windows font used, among others, by Studia Iranica. It appears to have been made (originally?) by Ecological Linguistics in 1990; the version used here dates from 2006 and may be an amended version. It is mainly a "complete character" font, but also has half a dozen "combining diacritics". However, these last are of two kinds: one particular macron (line above) is thrown forward, "Mac style", while all the others are placed on the preceding character, "Unicode style". They are all converted to Unicode style diacritics. The font also contains a few "private area" codes*.
Galil / GalilTimes
These two fonts are related to the HaifaTimes group, see below for further detail. They come in (incompatible) Windows and Mac versions, thus there are two conversion macros, one for the Windows version (Galil) and one for the Mac version (GalilTimes). Like the others of the family, both use some private area codes* and displace some European diacritics.**
This Macintosh font is one of a family of Mac and Windows fonts signed Ulrich Seeger of Karlsruhe 1998, with its siblings Abbas, Assur, Bock, Beyrut, Galil/GalilTimes, Nebe, Pamuk, Sima, and UrmiTimes; all are available on the Internet. Most of the Mac fonts are available as screen ("bitmap") and related printer (postscript) fonts, Pamuk and the Windows fonts as single Truetype fonts. However, the screen fonts provided for HaifaTimes and Abbas ("HaifaTimes.fam" and "Abbas.fam") are in fact the bitmaps for the GalilTimes Mac font, no doubt uploaded by error. The HaifaTimes macro here thus converts according to the postscript font's layout. The font has displaced some European diacritics**, which are relocated here.
This Windows font appears to have been made on the basis of EuroIranica above, and shares some characters with it, but differs in some respects. It was used i.a. for the online version of the Encyclopedia Iranica (which today uses Unicode). It is mainly a "complete character" font, but also has half a dozen "combining diacritics". Unlike EuroIranica, all combining diacritics are placed on the preceding character, "Unicode style". It also contains a few "private area" codes*.
This is a Windows font; the version I have appears to be dated 1992 - 2001 and has some (false) Unicode values added. The conversion nevertheless seems straightforward.
This is actually a series of three Windows fonts, IslwTimes, IslwHelvetica, and IslwCourier, following the same layout, so the same macro IswTimes will convert all three fonts; the result is however displayed in Times (or Lucida Grande) in all cases. It contains a couple of "private area" codes*.
This is my own set of three fonts, Jaghbub, Koufra, Bairut, so these macros have actually been tested in real life. There are a few "Mac-style" combination characters, which are placed onto their correct base character. I have made Unicode versions of these fonts, JaghbUni, etc., so unlike the others, this macro will place the text in JaghbUni if that is installed (whether or not the original was written in Jaghbub, Koufra or Bairut), otherwise Times/Lucida Grande.
This is a Windows font created by my colleagues Joseph Bell and Petr Zemanek for their journal Journal of Arabic and Islamic Studies, hence the name. It is a Mac-style combination font, with several positions for each diacritic (to cater for wide and narrow base characters etc.), which are united to their Unicode equivalent and placed on the correct base character.
ME Times & ME Geneva
Not to be confused with MidEastTimes below. To avoid confusion I have named the macro "ME Geneva", but the perhaps more widely used ME Times font follows the same layout. These were very widely used Mac fonts in the 1980s and 1990s. They contain two "private area" characters.* The result is displayed in Times (Lucida Grande) whichever of the two ME fonts was used originally.
Not to be confused with ME Times above. This was also a very widely used Mac font in the 1980s and 1990s. It contains nine "private area" characters,* and has some "Unicode style" combining diacritics.
This is a Windows font for Aramaic transcription, related to the HaifaTimes group (and to Bock in particular); see the HaifaTimes entry for details. Like the others of the family, it uses some private area codes* and displaces some European diacritics.**
I have no particular knowledge of this font. The conversion is based on information gathered from Yusuke Kinoshita's conversion tools for rtf files. To be checked & confirmed by users of that font.
New World / New World Transliterator
These are the Mac and Windows versions of the same font, made by Christopher Buck (Ottawa) in the early 1990s. "New World Transliterator" is a Mac-only font. I only have some notes about this, and cannot confirm if the macro catches all diacritic characters in the font, but it should include the basic set. "New World" is a Truetype font, dated 1995, made for use both on Mac and Windows, and is available on the Internet. The two diverge considerably in their character setup, thus they are treated as two different fonts here with different macros. Both contain four or five private area codes;* the NWT font also relocates some European accents.**
This is a Mac truetype font for Turkish transcription, related to the HaifaTimes family; see the HaifaTimes entry for details. Like the others of the family, it uses some private area codes* and displaces some European diacritics.**
This is a Windows font for South Arabic transcription, related to the HaifaTimes family; see the HaifaTimes entry for details. Like the others of the family, it uses some private area codes.*
This is a commercial font, distributed by Linguists' Software. It exists in several variations, both for Mac and PC, but this macro is based on a (the latest?) Windows version, the only one I have information about. I do not have access to the font itself, so this is based on observations and notes. It contains one combining diacritic, which I have taken to be "Unicode style", and two private area codes.* It also relocates a few European accents.**
I have no information about this font whatsoever; the conversion macro is made on the basis of earlier notes. It covers all the basic Arabic transliteration characters. I welcome information about whether the font has other characters not covered.
Times Beyrut Roman & OI-Beirut
These are two Windows fonts, not to be confused with the Mac font Beyrut or the Bairut of the Jaghbub package. They are available on the Internet, from the Oriental Seminar of Freiburg and made for(?) the Oriental Institute in Beirut, respectively. They follow the same pattern, so the macro will convert either. There are also two sister fonts, Courier Beirut and Arial Beirut, which presumably use the same layout. They contain two private area codes,* and relocate some European accents.** (Notice that the fonts are sometimes referred to as ...Beyrut, sometimes as ...Beirut.)
This Mac font was, as the name indicates, made by the publisher EJ Brill for contributors to its Encyclopedia of Islam (EI(2)). It is clearly marked by this, containing i.a. the typical EI characters Dj, Kh, etc. as complete characters, but also many other specialized diacritics. The combining characters are converted to regular Dj, Kh etc. in the BrillEncyc macro, but are retained as separate combinations in the private area* extended macro. (Brill now requests Unicode fonts from its contributors.) There may have been a comparable Windows font from Brill, but this macro converts the Mac version.
Times New Arabic
Not to be confused with the similarly named Times NewArabic Roman below. TNA is a combining font, "Unicode style", available on the Internet and distributed i.a. by McGill University. It has several positions for some diacritics, suited for different character widths. These are merged into their Unicode variants, but left as combination diacritics. For improved quality (at least in MS Word), you may want to replace them with the respective complete characters, but I have left that to the user.
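As a rough illustration of that last step, Unicode's NFC normalization replaces a base letter plus combining diacritic with the complete character wherever Unicode defines one. The sketch below uses Python's standard unicodedata module. It is not part of the macros, and the function name and sample string are made up for illustration.

```python
import unicodedata


def to_complete_characters(text: str) -> str:
    """Compose base letter + combining diacritic into one character where Unicode has one."""
    return unicodedata.normalize("NFC", text)


# Illustrative sample: s + combining dot below, a + combining macron.
sample = "s\u0323a\u0304"
composed = to_complete_characters(sample)

# The composed string has fewer code points than the combining version.
print(len(sample), len(composed))
```

Combinations for which Unicode has no complete character are left as they are by this step.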
Times New Arabic Roman / Times New English Roman
These are two related fonts which share some of their layout, but vary in some respects, so they are here treated as separate fonts (not to be confused with the McGill font, Times New Arabic, above). They are signed "El Chipo", 2001. The major difference between "New English" and "New Arabic" is that TNAR cannibalizes a number of regular characters, in particular numerals. It cannot therefore be used for regular text, unlike TNER. The "New Arabic" font also duplicates most of the diacritic characters to two positions, one in "low ASCII", one in "high ASCII", unlike "New English", which works more like regular diacritic fonts with diacritics replacing lesser used symbols only. All variants are here converted to the single respective Unicode value.
In most files, the "cannibalistic" TNAR will probably be used for diacritics only, and another font, TNER or other, for regular text, or at least for numerals. The macro cannot distinguish what font was used originally, so the user must do that, selecting characters/sections that are intended for diacritics and applying the macro on those. Otherwise all numerals will be converted to diacritic characters as well, as will certain accents and punctuation (!, $, #, &, §, *, @, `, ø, å, etc.), but regular a-z are untouched. All macros work so that if some text is selected, only the selection is converted; if no text is selected, everything is converted. So, for TNAR, as for DMG Tms above, always select the relevant text sections.
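The selection rule can be sketched as follows. This is only an illustration in Python, not the macro code itself; the function name and the one-entry table (numeral 1 standing for "a with line above") are hypothetical and do not reproduce the real TNAR layout.

```python
from typing import Optional, Tuple

# Hypothetical one-entry table, for illustration only (not the real TNAR layout).
SAMPLE_TABLE = {"1": "\u0101"}  # numeral 1 -> a with macron


def convert(text: str, table: dict, selection: Optional[Tuple[int, int]] = None) -> str:
    """Convert only the selected span; with no selection, convert the whole text."""
    mapping = str.maketrans(table)
    if selection is None:
        return text.translate(mapping)
    start, end = selection
    return text[:start] + text[start:end].translate(mapping) + text[end:]


text = "vol. 1, 1mir"
# Only the second "1" (position 8) was typed in the cannibalistic font.
print(convert(text, SAMPLE_TABLE, selection=(8, 9)))
# Without a selection the volume number is converted too, which is unwanted.
print(convert(text, SAMPLE_TABLE))
```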
This is a combined legacy and Unicode font, according to the Readme on its website. It is made by Yusuke Kinoshita, and the Unicode version is very extensive, containing many of the "private area" codes commented on below. This macro converts from the legacy ("Classic") version's layout to Unicode values. It is based on the description of the layout provided on the download site.
Timur & Helvan
These two fonts used to be distributed by the University of Zürich, but can today be found on Yahoo only, signed Zabidi. They are a set of Macintosh and Windows fonts which share their layout, so the enclosed macro will convert both the Mac and Windows versions of both fonts. They contain two private area codes.*
This is another commercial font that is distributed by Linguists' Software. This macro is based on a Mac version, the only one I have information on. It contains two combining diacritics in the "Unicode style", and a number of private area codes.* It also relocates a few European accents.**
This is a Mac font for Aramaic transcription, related to the HaifaTimes family; see the HaifaTimes entry for details. As in some of the other fonts, the screen font provided with the download is clearly erroneous; it is in fact the GalilTimes screen font with just five of the many UrmiTimes characters added to the layout. However, I have added a conversion macro for this bitmap font as well, in case it has been in use separately from the postscript font. This macro, called "UrmiTbm", thus operates as for a separate font, and should not be confused with UrmiTimes proper. Like the others of the family, it uses some private area codes* and displaces some European diacritics.**
As always, I welcome additional information and corrections. Write to:
Glossary / general comments:
Diacritic or transliteration characters are generally normal letters - a, s, z, etc. - with something above or below them, a dot or a line ("macron", "breve"). A font can contain them in two ways, either as a complete character, where the character e.g. "s with dot under" is typed as a single stroke, so the "s" is part of the complete character, or as a combining diacritic, where you type a regular "s", and then type "dot under", so that the "s" is just the regular "s", and the "dot under" can be put under any character you like, r, p, w or whatever.
Combining and complete characters
Most of these fonts use the complete character approach, which allows for better typography. A few, however, contain combining diacritics either as an addition to the complete characters (to add flexibility, e.g. Jaghbub), or as their main method (this is the case for JAIS-font, Times New Arabic, and afroas). Combining characters pose a particular problem, however, when converting to Unicode: Do you type the diacritic before the regular "base" character, or after?
On the Mac, we are used to typing the diacritic first, and then the letter - so the diacritic is placed onto the following character. In Unicode, however, the diacritic is always placed over the preceding character, so there you must type the letter first, and then the diacritic (above, I somewhat anachronistically call these two ways the "Mac style" and "Unicode style" respectively).
This means that in the "Mac style" fonts, just converting the combining diacritics will place the diacritic incorrectly onto the letter before the one you want. The macros take care of this by moving such combining diacritics one step to the right, but the user should be aware of this possible cause of error.
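The reordering can be sketched like this. It is a minimal illustration in Python, not the macro code; the two legacy codes ("=" for the macron, "~" for the dot below) are placeholders chosen for readability and are not taken from any of the fonts above.

```python
# Placeholder legacy codes, for illustration only: "=" stands for a Mac-style
# macron and "~" for a Mac-style dot below, each typed BEFORE its base letter.
MAC_STYLE_MARKS = {"=": "\u0304", "~": "\u0323"}


def mac_to_unicode_order(text: str, marks: dict) -> str:
    """Move each Mac-style diacritic to after the base letter that follows it."""
    out = []
    pending = []  # Unicode combining marks waiting for their base letter
    for ch in text:
        if ch in marks:
            pending.append(marks[ch])
        else:
            out.append(ch)       # base letter first ...
            out.extend(pending)  # ... then the diacritics typed before it
            pending.clear()
    out.extend(pending)  # a trailing diacritic with no base stays at the end
    return "".join(out)


print(mac_to_unicode_order("~sal=am", MAC_STYLE_MARKS))
```

Converting the codes without reordering would leave the macron on the "l" instead of the second "a".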
* Private area codes
The Unicode system includes almost all of the roughly 360 different transliteration characters used in these fonts - but not quite all. A few dozen characters - typically such as "z with two dots below" - are not defined in Unicode (nor are more remote combinations such as "e with line above, grave accent above the line, and dot under"!). These are converted to a combination of two characters: regular "z" plus "combining diacritic two dots below", etc. The appearance is similar to the original, but they are technically two characters while the original was one. You see the difference if you backspace: in the converted document one backspace will remove just the diacritic, not the z.
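The two-character nature of such a combination can be seen in a short Python check. This is a sketch only; it assumes that "combining diacritic two dots below" corresponds to U+0324, which the text above does not state explicitly.

```python
import unicodedata

# Assumption: "two dots below" is U+0324 (COMBINING DIAERESIS BELOW).
converted = "z\u0324"

# Two code points, although it displays as a single letter.
print(len(converted))
print(unicodedata.name(converted[1]))

# Removing the last code point, as a backspace would, leaves the plain z.
after_backspace = converted[:-1]

# Unicode has no complete character for this pair, so NFC leaves it unchanged.
still_combined = unicodedata.normalize("NFC", converted) == converted
```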
While that is normally quite OK, there is however also a way to retain the "complete character" for those who want to go the extra mile. Unicode has set aside a space for undefined characters, called the "private area". A few specialist fonts for linguistics use this area to add such more remote diacritic combinations, and two such fonts, Titus Cyberbit Basic and TimesTL even agree on where to place them, thus creating a small "shared private code" (a few characters following this lead are also in Cardo, jUnicode and Leeds Unicode). Thus, you can keep "z with two dots" as a complete, and not a combining character.
Mostly, this is unnecessary. In most programs using standard Unicode characters only is preferable; adhering to a standard is after all the object of the exercise. But Microsoft Word in particular handles such "combining diacritic with diacritic" rather badly. In the case where an "e with line above" is combined with a separate character "combining acute" above the line, Mac word processors following Apple's lead will shift the acute a little upwards, so that both line and acute are visible. Word will not; it will superimpose the acute onto the line, so that the smaller diacritic disappears from view. Here, having the combination as one separate character will give a better result. Unfortunately, you are then locked to displaying the text in these two fonts, TimesTL and Titus, the only ones that have this "private" character. When the text is displayed in other Unicode fonts (if the user does not have the Titus font installed) these characters will disappear from view.
Thus, I have catered for both: The regular macros create "combinations", which is recommended for all regular users and particularly those who want other people to be able to read their documents. However, there is also a separate set of "extended characters" macros - marked "xt" and packaged apart - that use the "private area" characters of Titus and TimesTL. Thus, these xt macros can only be used when you have either (or preferably both) of these two fonts installed. (There are also a number of combinations that do not even exist in Titus or TimesTL, these are of course only converted to combinations in both macro sets.) - Which font is required is indicated in the macro code itself, I recommend primarily to install Titus Cyberbit as it has the largest number of them, but preferably both fonts. These macros set the text in either of these two fonts, rather than Times / Lucida Grande.
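Before sharing a document converted with the xt macros, it can be useful to check whether it contains any private area characters at all. The sketch below is a Python illustration, not part of the macro sets; the function name is made up. It relies on the Unicode general category "Co", which marks private use code points.

```python
import unicodedata


def private_area_characters(text: str) -> list:
    """Return (position, code point) for every private use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" = private use
    ]


# U+E000 is the first code point of the private area, used here only as a sample.
print(private_area_characters("a\ue000b"))
# A regular combination (z + combining two dots below) contains none.
print(private_area_characters("z\u0324"))
```

An empty result means the text uses standard Unicode characters only and should display in any Unicode font that covers them.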
** European accents
Some of these fonts also appear to have switched around the location of regular European diacritics (é, ñ, etc.). This may or may not be related to the regular Windows-Mac file conversion, which sometimes creates havoc with European accents, or to corruption in the fonts I have worked from. Thus, before "real-world" testing, I am not sure how these conversions work. But since each font seems to be different in this respect (among those that do such re-adjustment), and it occurs both in Mac and Windows fonts, I have assumed these are willed by the font makers, and have compensated for them, putting these European accents also in their correct Unicode position, according to how the font layout appears on my machine. I welcome real-life experience as to whether that works correctly.