| gxvalid: TrueType GX validator |
| ============================== |
| |
| |
| 1. What is this |
| --------------- |
| |
| `gxvalid' is a module to validate TrueType GX tables: a collection of |
| additional tables in TrueType fonts which are used by `QuickDraw GX |
| Text' and Apple Advanced Typography (AAT). In addition, gxvalid can |
| validate `kern' tables which have been extended for AAT. Like the |
| otvalid module, gxvalid uses FreeType 2's validator framework |
| (ftvalid). |
| |
| You can link gxvalid with your program; before running your own layout |
| engine, gxvalid validates a font file. As a result, you can remove |
| error-checking code from the layout engine. It is also possible to |
| use gxvalid as a stand-alone font validator; the `ftvalid' test |
| program included in the ft2demo bundle calls gxvalid internally. |
| A stand-alone font validator may be useful for font developers. |
| |
| This document covers the following issues. |
| |
| - supported TrueType GX tables |
| - fundamental validation limitations |
| - permissive error handling of broken GX tables |
| - `kern' table issue. |
| |
| |
| 2. Supported tables |
| ------------------- |
| |
| The following GX tables are currently supported. |
| |
| bsln |
| feat |
| just |
| kern(*) |
| lcar |
| mort |
| morx |
| opbd |
| prop |
| trak |
| |
| The following GX tables are currently unsupported. |
| |
| cvar |
| fdsc |
| fmtx |
| fvar |
| gvar |
| Zapf |
| |
| The following GX tables won't be supported. |
| |
| acnt(**) |
| hsty(***) |
| |
| The following undocumented tables in TrueType fonts designed for Apple |
| platform aren't handled either. |
| |
| addg |
| CVTM |
| TPNM |
| umif |
| |
| |
| *) The `kern' validator handles both the classic and the new kern |
| formats; the former is supported on both Microsoft and Apple |
| platforms, while the latter is supported on Apple platforms. |
| |
| **) `acnt' tables are not supported by currently available Apple font |
| tools. |
| |
| ***) There is one more Apple extension, `hsty', but it is for |
| Newton-OS, not GX (Newton-OS is a platform by Apple, but it can |
| use sfnt-housed bitmap fonts only). Therefore, it should be |
| excluded from `Apple platform' in the context of TrueType. |
| gxvalid ignores it, as Apple font tools do. |
| |
| |
| We have checked 183 fonts bundled with MacOS 9.1, MacOS 9.2, MacOS |
| 10.0, MacOS X 10.1, MSIE for MacOS, and AppleWorks 6.0. In addition, |
| we have checked 67 Dynalab fonts (designed for MacOS) and 189 Ricoh |
| fonts (designed for Windows and MacOS dual platforms). The numbers of |
| fonts that include each TrueType GX table are as follows. |
| |
| bsln: 76 |
| feat: 191 |
| just: 84 |
| kern: 59 |
| lcar: 4 |
| mort: 326 |
| morx: 19 |
| opbd: 4 |
| prop: 114 |
| trak: 16 |
| |
| Dynalab and Ricoh fonts don't have GX tables except for `feat' and |
| `mort'. |
| |
| |
| 3. Fundamental validation limitations |
| ------------------------------------- |
| |
| TrueType GX provides layout information to libraries for font |
| rasterizers and text layout. gxvalid can check whether the layout |
| data in a font is conformant to the TrueType GX format specified by |
| Apple. But gxvalid cannot check how a QuickDraw GX/AAT renderer uses |
| the stored information. |
| |
| 3-1. Validation of State Machine activity |
| ----------------------------------------- |
| |
| QuickDraw GX/AAT uses a `State Machine' to provide `stateful' layout |
| features, and TrueType GX stores the state transition diagram of |
| this `State Machine' in a `StateTable' data structure. As it |
| receives a series of glyph IDs, the State Machine starts in the |
| `start of text' state, walks through various states while passing |
| layout information to the renderer, and finally reaches the `end of |
| text' state. |
| |
| gxvalid can check essential errors like: |
| |
| - possibility of state transitions to undefined states |
| - existence of glyph IDs that the State Machine doesn't know how |
| to handle |
| - the State Machine cannot compute the layout information from |
| the given diagram |
| |
| These errors can be checked in a finite number of steps, and |
| without the State Machine itself, because they are `expression' |
| errors of the state transition diagram. |
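| |
| As a minimal sketch of the first check, the C code below scans a |
| simplified transition table for entries that point past the last |
| defined state. The `toy_state_table' layout and all names are |
| hypothetical; the real StateTable binary format is more involved, |
| and this is not gxvalid's own code. |
| |
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a state transition diagram:
   next_state[state * n_classes + glyph_class] holds the next state.
   The real StateTable binary layout is different. */
typedef struct {
    size_t          n_states;
    size_t          n_classes;
    const uint16_t *next_state;   /* n_states * n_classes entries */
} toy_state_table;

/* Return the number of transitions that point to an undefined state.
   Each table entry is visited exactly once, so the scan ends after
   n_states * n_classes steps without ever running the machine. */
static size_t
toy_count_bad_transitions(const toy_state_table *t)
{
    size_t bad = 0;

    for (size_t s = 0; s < t->n_states; s++)
        for (size_t c = 0; c < t->n_classes; c++)
            if (t->next_state[s * t->n_classes + c] >= t->n_states)
                bad++;
    return bad;
}
```
| |
| The scan inspects the diagram as data only; it says nothing about |
| which states are actually reachable for a given glyph sequence. |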
| |
| There is no limit on how long the State Machine walks around, so |
| validating the algorithm in the state transition diagram would |
| require an unbounded number of steps, even if we had a State |
| Machine in gxvalid. Therefore, the following errors and problems |
| cannot be checked. |
| |
| - existence of states which the State Machine never transits to |
| - the possibility that the State Machine never reaches `end of |
| text' |
| - the possibility of stack underflow/overflow in the State Machine |
| (in ligature and contextual glyph substitutions, the State |
| Machine can store 16 glyphs onto its stack) |
| |
| In addition, gxvalid doesn't check `temporary glyph IDs' used in the |
| chained State Machines (in `mort' and `morx' tables). If a layout |
| feature is implemented by a single State Machine, a glyph ID |
| converted by the State Machine is passed to the glyph renderer, thus |
| it should not point to an undefined glyph ID. But if a layout |
| feature is implemented by chained State Machines, a component State |
| Machine (if it is not the final one) is permitted to generate |
| undefined glyph IDs for temporary use, because they are handled by the |
| next component State Machine, not by the glyph renderer. To validate |
| such temporary glyph IDs, gxvalid must stack all undefined glyph IDs |
| which can occur in the output of the previous State Machine and |
| search them in the `ClassTable' structure of the current State |
| Machine. It is too complex to list all possible glyph IDs from the |
| StateTable, especially from a ligature substitution table. |
| |
| 3-2. Validation of relationship between multiple layout features |
| ---------------------------------------------------------------- |
| |
| gxvalid does not validate the relationship between multiple layout |
| features at all. |
| |
| If multiple layout features are defined in TrueType GX tables, |
| possible interactions, overrides, and conflicts between layout |
| features are implicitly given in the font too. For example, there |
| are several predefined spacing control features: |
| |
| - Text Spacing (Proportional/Monospace/Half-width/Normal) |
| - Number Spacing (Monospaced-numbers/Proportional-numbers) |
| - Kana Spacing (Full-width/Proportional) |
| - Ideographic Spacing (Full-width/Proportional) |
| - CJK Roman Spacing (Half-width/Proportional/Default-roman |
| /Full-width-roman/Proportional) |
| |
| If all layout features are independently managed, we can activate |
| inconsistent typographic rules like `Text Spacing=Monospace' and |
| `Ideographic Spacing=Proportional' at the same time. |
| |
| The combination of layout features is managed by a 32bit integer |
| (one bit for each selector setting), so we can define relationships |
| between up to 32 features, theoretically. But if one feature |
| setting affects another feature setting, we need typographic |
| priority rules to validate the relationship. Unfortunately, the |
| TrueType GX format specification does not give such information even |
| for predefined features. |
| |
| |
| 4. Permissive error handling of broken GX tables |
| ------------------------------------------------ |
| |
| When Apple's font rendering system finds an inconsistency, like a |
| specification violation or an unspecified value in a TrueType GX |
| table, it does not always return an error. In most cases, the |
| rendering engine silently ignores such wrong values or even whole |
| tables. In fact, MacOS is shipped with fonts including broken GX/AAT |
| tables, but no harmful effects due to `officially broken' fonts are |
| observed by end-users. |
| |
| gxvalid is designed to continue the validation process as long as |
| possible. When gxvalid finds wrong values, it at least warns about |
| them, and takes a fallback procedure if possible. The fallback |
| procedure depends on the validation level. |
| |
| We used the following three tools to investigate Apple's error handling. |
| |
| - FontValidator (for MacOS 8.5 - 9.2) resource fork font |
| - ftxvalidator (for MacOS X 10.1 -) dfont or naked-sfnt |
| - ftxdumperfuser (for MacOS X 10.1 -) dfont or naked-sfnt |
| |
| However, all tests were done on a PowerPC-based Macintosh; at present, |
| we have not checked those tools on an m68k-based Macintosh. |
| |
| In total, we checked 183 fonts bundled with MacOS 9.1, MacOS 9.2, |
| MacOS 10.0, MacOS X 10.1, MSIE for MacOS, and AppleWorks 6.0. These |
| fonts are distributed officially, but many broken GX/AAT tables were |
| found by Apple's font tools. In the following, we list typical |
| violations of the GX specification in fonts officially distributed |
| with those Apple systems. |
| |
| 4-1. broken BinSrchHeader (19/183) |
| ---------------------------------- |
| |
| `BinSrchHeader' is a header of a data array for m68k platforms to |
| access memory efficiently. Although there are only two truly |
| independent parameters (`unitSize' and `nUnits'), BinSrchHeader has |
| three additional parameters which can be calculated from `unitSize' |
| and `nUnits', for fast setup. Apple font tools ignore them |
| silently, so gxvalid warns if it finds an inconsistency, and always |
| continues validation. The additional parameters are ignored |
| regardless of their consistency. |
| |
| 19 fonts include such inconsistencies; all breaks are in the |
| BinSrchHeader structure of the `kern' table. |
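| |
| To illustrate how the three additional parameters follow from |
| `unitSize' and `nUnits', here is a small C sketch. The formulas and |
| the field names (searchRange, entrySelector, rangeShift) are taken |
| from Apple's description of BinSrchHeader as recalled here, not from |
| this document; the struct, the function name, and the all-zero |
| result for nUnits = 0 are assumptions made for illustration. |
| |
```c
#include <stdint.h>

/* Illustrative container for the three derived values. */
typedef struct {
    uint16_t search_range;    /* unitSize * largest power of 2 <= nUnits */
    uint16_t entry_selector;  /* log2 of that power of 2                 */
    uint16_t range_shift;     /* unitSize * (nUnits - that power of 2)   */
} bsh_derived;

/* Compute the derived values a consistent header would carry. */
static bsh_derived
bsh_compute(uint16_t unit_size, uint16_t n_units)
{
    bsh_derived d = { 0, 0, 0 };
    uint16_t    pow2 = 1;     /* largest power of 2 <= n_units */

    if (n_units == 0)
        return d;             /* assumption: all zero for an empty array */

    while ((uint32_t)pow2 * 2 <= n_units) {
        pow2 = (uint16_t)(pow2 * 2);
        d.entry_selector++;
    }
    d.search_range = (uint16_t)(unit_size * pow2);
    d.range_shift  = (uint16_t)(unit_size * (n_units - pow2));
    return d;
}
```
| |
| For unitSize = 6 and nUnits = 10, the sketch yields searchRange 48, |
| entrySelector 3, and rangeShift 12. A validator could compare these |
| with the stored values and warn on a mismatch. |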
| |
| 4-2. too-short LookupTable (5/183) |
| ---------------------------------- |
| |
| LookupTable format 0 is a simple array to get a value from a given |
| GID (glyph ID); the index of this array is a GID too. Therefore, |
| the length of the array is expected to be the same as the maximum |
| GID value defined in the `maxp' table, but there are some fonts |
| whose LookupTable format 0 is too short to cover all GIDs. |
| FontValidator ignores this error silently; ftxvalidator and |
| ftxdumperfuser both warn and continue. Similar problems are found |
| in format 3 subtables of `kern'. gxvalid always warns, and aborts |
| if the validation level is set to FT_VALIDATE_PARANOID. |
| |
| 5 fonts include too-short kern format 0 subtables. |
| 1 font includes a too-short kern format 3 subtable. |
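| |
| The length check described above can be sketched in C as follows. |
| The function name and its parameters are hypothetical, and a fixed |
| entry size is assumed; the sketch only counts how many glyph IDs a |
| simple array lookup leaves uncovered. |
| |
```c
#include <stddef.h>
#include <stdint.h>

/* Return how many glyph IDs are left uncovered by a simple array
   lookup; 0 means the array is long enough.  value_size is the size
   of one array entry in bytes (assumed to be fixed). */
static uint32_t
toy_lookup0_missing(size_t array_bytes, size_t value_size,
                    uint32_t num_glyphs)
{
    size_t covered;

    if (value_size == 0)
        return num_glyphs;

    covered = array_bytes / value_size;
    return covered >= num_glyphs ? 0 : num_glyphs - (uint32_t)covered;
}
```
| |
| A non-zero result corresponds to the `too-short' case, where |
| gxvalid would warn, and abort only at FT_VALIDATE_PARANOID. |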
| |
| 4-3. broken LookupTable format 2 (1/183) |
| ---------------------------------------- |
| |
| LookupTable format 2, subformat 4 covers the GID space by a |
| collection of segments which are specified by `firstGlyph' and |
| `lastGlyph'. Some fonts store `firstGlyph' and `lastGlyph' in |
| reverse order, so the segment specification is broken. Apple font |
| tools ignore this error silently; a broken segment is ignored as if |
| it did not exist. gxvalid warns and normalizes the segment at |
| FT_VALIDATE_DEFAULT, ignores the segment at FT_VALIDATE_TIGHT, or |
| aborts at FT_VALIDATE_PARANOID. |
| |
| 1 font includes a broken LookupTable format 2, in the `just' table. |
| |
| *) It seems that all fonts manufactured by ITC for AppleWorks have |
| this error. |
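| |
| The per-level handling of a reversed segment can be sketched in C |
| as below. The enum values are hypothetical stand-ins for |
| FT_VALIDATE_DEFAULT, FT_VALIDATE_TIGHT, and FT_VALIDATE_PARANOID, |
| and the function is an illustration, not gxvalid's implementation. |
| |
```c
#include <stdint.h>

/* Hypothetical stand-ins for the three validation levels. */
typedef enum { SEG_LEVEL_DEFAULT, SEG_LEVEL_TIGHT, SEG_LEVEL_PARANOID }
    toy_seg_level;

typedef enum { SEG_OK, SEG_NORMALIZED, SEG_IGNORED, SEG_ABORT }
    toy_seg_result;

/* Apply the per-level policy to one segment.  *first and *last are
   swapped in place only when the segment is normalized. */
static toy_seg_result
toy_check_segment(uint16_t *first, uint16_t *last, toy_seg_level level)
{
    if (*first <= *last)
        return SEG_OK;

    /* A warning would be emitted here at every level. */
    switch (level) {
    case SEG_LEVEL_DEFAULT: {
        uint16_t tmp = *first;

        *first = *last;
        *last  = tmp;
        return SEG_NORMALIZED;
    }
    case SEG_LEVEL_TIGHT:
        return SEG_IGNORED;
    default:
        return SEG_ABORT;
    }
}
```
| |
| A caller would skip the segment on SEG_IGNORED and stop validating |
| the table on SEG_ABORT. |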
| |
| 4-4. bad bracketing in glyph property (14/183) |
| ---------------------------------------------- |
| |
| GX/AAT defines a `bracketing' property of the glyphs in the `prop' |
| table, to control layout features of strings enclosed inside and |
| outside of brackets. Some fonts give inappropriate bracket |
| properties to glyphs. Apple font tools warn about this error; |
| gxvalid warns too and aborts at FT_VALIDATE_PARANOID. |
| |
| 14 fonts include wrong bracket properties. |
| |
| |
| 4-5. invalid feature number (117/183) |
| ------------------------------------- |
| |
| The GX/AAT extension can include 255 different layout features, but |
| popular layout features are predefined (see |
| http://developer.apple.com/fonts/Registry/index.html). Some fonts |
| include feature numbers which are incompatible with the predefined |
| feature registry. |
| |
| In our survey, there are 140 fonts including a `feat' table. |
| |
| a) 67 fonts use a feature number which should not be used. |
| b) 117 fonts set the wrong feature range (nSetting). This is mostly |
| found in the `mort' and `morx' tables. |
| |
| Apple font tools give no warning, although they cannot recognize |
| what the feature is. At FT_VALIDATE_DEFAULT, gxvalid warns but |
| continues in both cases (a, b). At FT_VALIDATE_TIGHT, gxvalid warns |
| and aborts for (a), but continues for (b). At FT_VALIDATE_PARANOID, |
| gxvalid warns and aborts in both cases (a, b). |
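| |
| The level-dependent behaviour for cases (a) and (b) amounts to a |
| small decision table, sketched here in C. All identifiers are |
| hypothetical; the enum values only mirror the three FT_VALIDATE_* |
| levels named above. |
| |
```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three validation levels. */
typedef enum { FEAT_LEVEL_DEFAULT, FEAT_LEVEL_TIGHT, FEAT_LEVEL_PARANOID }
    toy_feat_level;

typedef enum {
    FEAT_BAD_NUMBER,   /* case (a): feature number that should not be used */
    FEAT_BAD_RANGE     /* case (b): wrong feature range (nSetting)          */
} toy_feat_problem;

/* Return true when validation should abort.  A warning is issued in
   every case, whatever the return value. */
static bool
toy_feat_should_abort(toy_feat_problem problem, toy_feat_level level)
{
    switch (level) {
    case FEAT_LEVEL_DEFAULT:
        return false;
    case FEAT_LEVEL_TIGHT:
        return problem == FEAT_BAD_NUMBER;
    default:
        return true;
    }
}
```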
| |
| 4-6. invalid prop version (10/183) |
| ---------------------------------- |
| |
| Like most TrueType GX tables, the `prop' table must start with a 32bit |
| version identifier: 0x00010000, 0x00020000 or 0x00030000. But some |
| fonts store nonsense binary data instead. When Apple font tools |
| find them, they abort the processing immediately, and the data which |
| follows is unhandled. gxvalid does the same. |
| |
| 10 fonts include a broken `prop' version. |
| |
| All of these fonts are classic TrueType fonts for the Japanese |
| script, manufactured by Apple. |
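| |
| A minimal C sketch of the version check follows. It assumes the |
| table bytes are available in memory in the big-endian order used by |
| sfnt data; the function names are hypothetical. |
| |
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value. */
static uint32_t
read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/* Return true if the table starts with one of the three version
   identifiers listed above. */
static bool
toy_prop_version_ok(const uint8_t *table, size_t length)
{
    uint32_t v;

    if (length < 4)
        return false;

    v = read_be32(table);
    return v == 0x00010000UL || v == 0x00020000UL || v == 0x00030000UL;
}
```
| |
| When the check fails, the rest of the table is left unhandled, as |
| described above. |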
| |
| 4-7. unknown resource name (2/183) |
| ---------------------------------- |
| |
| NOTE: THIS IS NOT A TRUETYPE GX ERROR. |
| |
| If a TrueType font is stored in the resource fork or in dfont |
| format, the data must be tagged as `sfnt' in the resource fork index |
| to invoke the TrueType font handler for the data. But the TrueType |
| font data in `Keyboard.dfont' is tagged as `kbd', and that in |
| `LastResort.dfont' is tagged as `lst'. Apple font tools can detect |
| that the data is in TrueType format and successfully validate it. |
| Maybe this is possible because the files are known to be dfont. The |
| current implementation of the resource fork driver of FreeType |
| cannot do that, thus gxvalid cannot validate them. |
| |
| 2 fonts use an unknown tag for the TrueType font resource. |
| |
| 5. `kern' table issues |
| ---------------------- |
| |
| In common terminology of TrueType, `kern' is classified as a basic and |
| platform-independent table. But there are Apple extensions of `kern', |
| and there is an extension which requires a GX state machine for |
| contextual kerning. Therefore, gxvalid includes a special validator |
| for `kern' tables. Unfortunately, there is no exact algorithm to |
| check Apple's extension, so gxvalid includes a heuristic algorithm to |
| find the proper validation routines for all possible data formats, |
| including the data format for Microsoft. By calling |
| classic_kern_validate() instead of gxv_validate(), you can specify the |
| `kern' format explicitly. However, current FreeType2 uses the Microsoft |
| `kern' format only; others are ignored (and should be handled in a |
| library one level higher than FreeType). |
| |
| 5-1. History |
| ------------ |
| |
| The original 16bit version of `kern' was designed by Apple in the |
| pre-GX era, and it was also approved by Microsoft. Afterwards, |
| Apple designed a new 32bit version of the `kern' table. According |
| to the documentation, the difference between the 16bit and 32bit |
| version is only the size of variables in the `kern' header. In the |
| following, we call the original 16bit version `classic', and the |
| 32bit version `new'. |
| |
| 5-2. Versions and dialects which should be differentiated |
| --------------------------------------------------------- |
| |
| The `kern' table consists of a table header and several subtables. |
| The version number which identifies a `classic' or a `new' version |
| is explicitly written in the table header, but there are |
| undocumented differences between Microsoft's and Apple's formats. |
| Each variant is called a `dialect' in the following. There are |
| three cases which should be handled: the new Apple-dialect, the |
| classic Apple-dialect, and the classic Microsoft-dialect. An |
| analysis of the formats and the auto detection algorithm of gxvalid |
| are described in the following. |
| |
| 5-2-1. Version detection: classic and new kern |
| ---------------------------------------------- |
| |
| According to the Apple TrueType specification, there are only two |
| differences between the classic and the new versions: |
| |
| - The `kern' table header starts with the version number. |
| The classic version starts with 0x0000 (16bit), |
| the new version starts with 0x00010000 (32bit). |
| |
| - In the `kern' table header, the number of subtables follows |
| the version number. |
| In the classic version, it is stored as a 16bit value. |
| In the new version, it is stored as a 32bit value. |
| |
| From the output of Apple's font tools (DumpKERN was tested in |
| addition to the three Apple font tools above), there is another |
| undocumented difference. In the new version, the subtable header |
| includes a 16bit variable named `tupleIndex' which does not exist |
| in the classic version. |
| |
| The new version can store all subtable formats (0, 1, 2, and 3), |
| but the Apple TrueType specification does not mention the subtable |
| formats available in the classic version. |
| |
| 5-2-2. Available subtable formats in classic version |
| ---------------------------------------------------- |
| |
| Although the Apple TrueType specification recommends using the |
| classic version if the font is designed for both the Apple and |
| Microsoft platforms, it does not document the available subtable |
| formats in the classic version. |
| |
| According to the Microsoft TrueType specification, the subtable |
| format assured for Windows and OS/2 support is only subtable |
| format 0. The Microsoft TrueType specification also describes |
| subtable format 2, but does not mention which platforms support |
| it. Subtable formats 1, 3, and higher are documented as reserved |
| for future use. Therefore, the classic version can store subtable |
| formats 0 and 2, at least. `ttfdump.exe', a font tool provided by |
| Microsoft, ignores the subtable format written in the subtable |
| header, and parses the table as if all subtables are in format 0. |
| |
| `kern' subtable format 1 uses a StateTable, so it cannot be |
| utilized without a GX State Machine. Therefore, it is reasonable |
| to assume that format 1 (and 3) were introduced after Apple had |
| introduced GX and moved to the new 32bit version. |
| |
| 5-2-3. Apple and Microsoft dialects |
| ----------------------------------- |
| |
| The `kern' subtable has a 16bit `coverage' field to describe |
| kerning attributes, but bit interpretations by Apple and Microsoft |
| are different: For example, Apple uses bits 0-7 to identify the |
| subtable, while Microsoft uses bits 8-15. |
| |
| In addition, according to the output of DumpKERN and FontValidator, |
| Apple's bit interpretations of coverage in the classic and new |
| versions are also incompatible. In summary, there are three |
| dialects: classic Apple dialect, classic Microsoft dialect, and new |
| Apple dialect. The classic Microsoft dialect and the new Apple |
| dialect are documented by each vendor's TrueType font |
| specification, but the documentation for classic Apple dialect is |
| not available. |
| |
| For example, in the new Apple dialect, bit 15 is documented as |
| `set to 1 if the kerning is vertical'. On the other hand, in |
| classic Microsoft dialect, bit 0 is documented as `set to 1 if the |
| kerning is horizontal'. From the outputs of DumpKERN and |
| FontValidator, classic Apple dialect recognizes bit 15 as `set to 1 |
| when the kerning is horizontal'. From the results of similar |
| experiments, classic Apple dialect seems to be the Endian reverse |
| of the classic Microsoft dialect. |
| |
| In conclusion, it must be noted that no font tool can distinguish |
| classic Apple dialect from classic Microsoft dialect automatically. |
| |
| 5-2-4. gxvalid auto dialect detection algorithm |
| ----------------------------------------------- |
| |
| The first 16 bits of the `kern' table are enough to identify the |
| version: |
| |
| - if the first 16 bits are 0x0000, the `kern' table is in |
| classic Apple dialect or classic Microsoft dialect |
| - if the first 16 bits are 0x0001, and the next 16 bits are 0x0000, |
| the `kern' table is in new Apple dialect. |
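| |
| The version test above, together with the header difference from |
| section 5-2-1 (a 16bit versus a 32bit subtable count), can be |
| sketched in C as follows. The names are hypothetical and the sketch |
| reads only the table header; it is not the gxvalid implementation. |
| |
```c
#include <stddef.h>
#include <stdint.h>

typedef enum { KERN_UNKNOWN, KERN_CLASSIC, KERN_NEW } toy_kern_version;

/* Read a big-endian 16-bit value. */
static uint16_t
read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Classify a `kern' table by its first bytes and report the number
   of subtables, which is 16 bits wide in the classic header and
   32 bits wide in the new one.  *n_tables is left untouched when the
   version is not recognized. */
static toy_kern_version
toy_kern_detect(const uint8_t *t, size_t len, uint32_t *n_tables)
{
    if (len >= 4 && read_be16(t) == 0x0000) {
        *n_tables = read_be16(t + 2);
        return KERN_CLASSIC;
    }
    if (len >= 8 && read_be16(t) == 0x0001 && read_be16(t + 2) == 0x0000) {
        *n_tables = ((uint32_t)read_be16(t + 4) << 16) | read_be16(t + 6);
        return KERN_NEW;
    }
    return KERN_UNKNOWN;
}
```
| |
| KERN_CLASSIC still covers two dialects; telling them apart needs the |
| `coverage' field of the subtables. |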
| |
| If the `kern' table is a classic one, the 16bit `coverage' field |
| is checked next. Firstly, the coverage bits are decoded for the |
| classic Apple dialect using the following bit masks (this is based |
| on DumpKERN output): |
| |
| 0x8000: 1=horizontal, 0=vertical |
| 0x4000: not used |
| 0x2000: 1=cross-stream, 0=normal |
| 0x1FF0: reserved |
| 0x000F: subtable format |
| |
| If any of the reserved bits is set or the subtable format bits are |
| interpreted as format 1 or 3, we take it as `impossible in classic |
| Apple dialect' and retry, using the classic Microsoft dialect. |
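| |
| A minimal C sketch of this heuristic, using the bit masks listed |
| above, might look as follows. The names are hypothetical, and the |
| sketch only decides which dialect to try; it does not validate the |
| subtable in the Microsoft dialect after falling back. |
| |
```c
#include <stdint.h>

typedef enum { DIALECT_CLASSIC_APPLE, DIALECT_CLASSIC_MS } toy_dialect;

/* Try the classic Apple interpretation of a coverage field first;
   fall back to the classic Microsoft one if the result would be
   impossible in the classic Apple dialect. */
static toy_dialect
toy_guess_classic_dialect(uint16_t coverage)
{
    uint16_t reserved = coverage & 0x1FF0;   /* must be zero        */
    uint16_t format   = coverage & 0x000F;   /* subtable format     */

    if (reserved != 0 || format == 1 || format == 3)
        return DIALECT_CLASSIC_MS;
    return DIALECT_CLASSIC_APPLE;
}
```
| |
| With this sketch, a coverage of 0x0000 is taken as classic Apple |
| dialect, while 0x0001 decodes as format 1 and therefore falls back |
| to the classic Microsoft dialect. |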
| |
| The most popular coverage in new Apple-dialect: 0x8000, |
| The most popular coverage in classic Apple-dialect: 0x0000, |
| The most popular coverage in classic Microsoft dialect: 0x0001. |
| |
| 5-3. Tested fonts |
| ----------------- |
| |
| We checked 59 fonts bundled with MacOS and 38 fonts bundled with |
| Windows, all of which include a `kern' table. |
| |
| - fonts bundled with MacOS |
| * new Apple dialect |
| format 0: 18 |
| format 2: 1 |
| format 3: 1 |
| * classic Apple dialect |
| format 0: 14 |
| * classic Microsoft dialect |
| format 0: 15 |
| |
| - fonts bundled with Windows |
| * classic Microsoft dialect |
| format 0: 38 |
| |
| It looks strange that classic Microsoft-dialect fonts are bundled with |
| MacOS: they come from MSIE for MacOS, except for MarkerFelt.dfont. |
| |
| |
| ACKNOWLEDGEMENT |
| --------------- |
| |
| Some parts of gxvalid are derived from both the `gxlayout' module and |
| the `otvalid' module. Development of gxlayout was supported by the |
| Information-technology Promotion Agency (IPA), Japan. |
| |
| The detailed analysis of undefined glyph ID utilization in `mort' and |
| `morx' tables is provided by George Williams. |
| |
| ------------------------------------------------------------------------ |
| |
| Copyright 2004-2015 by |
| suzuki toshiya, Masatake YAMATO, Red hat K.K., |
| David Turner, Robert Wilhelm, and Werner Lemberg. |
| |
| This file is part of the FreeType project, and may only be used, |
| modified, and distributed under the terms of the FreeType project |
| license, LICENSE.TXT. By continuing to use, modify, or distribute this |
| file you indicate that you have read the license and understand and |
| accept it fully. |
| |
| |
| --- end of README --- |