| \input texinfo |
| @c %**start of header |
| @setfilename liblouis.info |
| @documentencoding UTF-8 |
| @include version.texi |
| @settitle Liblouis User's and Programmer's Manual |
| |
| @dircategory Misc |
| @direntry |
| * Liblouis: (liblouis). A braille translator and back-translator |
| @end direntry |
| |
| @finalout |
| |
| @c Macro definitions |
| |
| @defindex opcode |
| |
| @c Opcode. |
| @macro opcode{name, args} |
| @opcodeindex \name\ |
| @anchor{\name\ opcode} |
| @item \name\ \args\ |
| @end macro |
| |
| @macro opcoderef{name} |
| @code{\name\} opcode (@pxref{\name\ opcode,\name\,@code{\name\}}) |
| @end macro |
| |
| @c Opcode. |
| @macro deprecatedopcode{name, args, replacement} |
| @opcodeindex \name\ |
| @anchor{\name\ opcode} |
| @item \name\ \args\ |
| This opcode is deprecated. Use the @opcoderef{\replacement\} instead. |
| @end macro |
| |
| @copying |
| This manual is for liblouis (version @value{VERSION}, @value{UPDATED}), |
| a Braille Translation and Back-Translation Library derived from the |
| Linux screen reader @acronym{BRLTTY}. |
| |
| @vskip 10pt |
| |
| @noindent |
| Copyright @copyright{} 1999-2006 by the BRLTTY Team. |
| |
| @noindent |
| Copyright @copyright{} 2004-2007 ViewPlus Technologies, Inc. |
| @uref{www.viewplus.com}. |
| |
| @noindent |
| Copyright @copyright{} 2007, 2009 Abilitiessoft, Inc. |
| @uref{www.abilitiessoft.org}. |
| |
| @noindent |
| Copyright @copyright{} 2014, 2016 Swiss Library for the Blind, Visually |
| Impaired and Print Disabled. @uref{www.sbs.ch}. |
| |
| @vskip 10pt |
| |
| @quotation |
| This file is free software; you can redistribute it and/or modify it |
| under the terms of the GNU Lesser (or library) General Public License |
| (LGPL) as published by the Free Software Foundation; either version 3, |
| or (at your option) any later version. |
| |
| This file is distributed in the hope that it will be useful, but |
| WITHOUT ANY WARRANTY; without even the implied warranty of |
| MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU |
| Lesser (or Library) General Public License LGPL for more details. |
| |
| You should have received a copy of the GNU Lesser (or Library) General |
| Public License (LGPL) along with this program; see the file COPYING. |
| If not, write to the Free Software Foundation, 51 Franklin Street, |
| Fifth Floor, Boston, MA 02110-1301, USA. |
| @end quotation |
| @end copying |
| |
| @titlepage |
| @title Liblouis User's and Programmer's Manual |
| |
| @subtitle for version @value{VERSION}, @value{UPDATED} |
| @author by John J. Boyer |
| |
| @c The following two commands start the copyright page. |
| @page |
| @vskip 0pt plus 1filll |
| @insertcopying |
| @end titlepage |
| |
| @c Output the table of contents at the beginning. |
| @contents |
| |
| @ifnottex |
| @node Top |
| @top Liblouis User's and Programmer's Manual |
| |
| @insertcopying |
| @end ifnottex |
| |
| @menu |
| * Introduction:: |
| * How to Write Translation Tables:: |
| * Notes on Back-Translation:: |
| * Table Metadata:: |
| * Testing Translation Tables interactively:: |
| * Automated Testing of Translation Tables:: |
| * Programming with liblouis:: |
| * Concept Index:: |
| * Opcode Index:: |
| * Function Index:: |
| * Program Index:: |
| |
| @detailmenu |
| --- The Detailed Node Listing --- |
| |
| How to Write Translation Tables |
| |
| * Overview:: |
| * Hyphenation Tables:: |
| * Character-Definition Opcodes:: |
| * Braille Indicator Opcodes:: |
| * Emphasis Opcodes:: |
| * Special Symbol Opcodes:: |
| * Special Processing Opcodes:: |
| * Translation Opcodes:: |
| * Character-Class Opcodes:: |
| * Swap Opcodes:: |
| * The Context and Multipass Opcodes:: |
| * The correct Opcode:: |
| * The match Opcode:: |
| * Miscellaneous Opcodes:: |
| |
| Emphasis Opcodes |
| |
| * Emphasis class:: |
| * Contexts:: |
| * Fallback behavior:: |
| * Computer braille:: |
| |
| Contexts |
| |
| * None:: |
| * Letter:: |
| * Word:: |
| * Phrase:: |
| * Symbol:: |
| |
| Testing Translation Tables interactively |
| |
| * lou_debug:: |
| * lou_trace:: |
| * lou_checktable:: |
| * lou_allround:: |
| * lou_translate (program):: |
| * lou_checkhyphens:: |
| * lou_checkyaml:: |
| |
| Programming with liblouis |
| |
| * Overview (library):: |
| * Data structure of liblouis tables:: |
| * How tables are found:: |
| * Deprecation of the logging system:: |
| * lou_version:: |
| * lou_translateString:: |
| * lou_translate:: |
| * lou_backTranslateString:: |
| * lou_backTranslate:: |
| * lou_hyphenate:: |
| * lou_compileString:: |
| * lou_getTypeformForEmphClass:: |
| * lou_dotsToChar:: |
| * lou_charToDots:: |
| * lou_registerLogCallback:: |
| * lou_setLogLevel:: |
| * lou_logFile:: |
| * lou_logPrint:: |
| * lou_logEnd:: |
| * lou_setDataPath:: |
| * lou_getDataPath:: |
| * lou_getTable:: |
| * lou_findTable:: |
| * lou_indexTables:: |
| * lou_checkTable:: |
| * lou_readCharFromFile:: |
| * lou_free:: |
| * lou_charSize:: |
| * Python bindings:: |
| |
| @end detailmenu |
| @end menu |
| |
| @node Introduction |
| @chapter Introduction |
| |
| Liblouis is an open-source braille translator and back-translator |
| derived from the translation routines in the BRLTTY screen reader for |
| Linux. It has, however, gone far beyond these routines. It is named in |
| honor of Louis Braille. In Linux and Mac OSX it is a shared library, |
| and in Windows it is a DLL. For installation instructions see the |
| README file. Please report bugs and oddities to the mailing list, |
| @email{liblouis-liblouisxml@@freelists.org}. |
| |
| This documentation is derived from the BRLTTY manual, but |
| it has been extensively rewritten to cover new features. |
| |
| @section Who is this manual for |
| |
| This manual has two main audiences: People who want to write or |
| improve a braille translation table and people who want to use the |
| braille translator library in their own programs. This manual is |
| probably not for people who are looking for some turn-key braille |
| translation software. |
| |
| @section How to read this manual |
| |
| If you are mostly interested in writing braille translation tables |
| then you want to focus on @ref{How to Write Translation Tables}. You |
| might want to look at @ref{Notes on Back-Translation} if you are |
| interested in back-translation. Read @ref{Table Metadata} if you want |
| to find out how you can augment your tables with metadata in order to |
| make them discoverable by programs. Finally @ref{Testing Translation |
| Tables interactively} and @ref{Automated Testing of Translation |
| Tables} will show how your braille translation tables can be tested |
| interactively and also in an automated fashion. |
| |
| If you want to use the braille translation library in your own program |
| or you are interested in enhancing the braille translation library |
| itself then you will want to look at @ref{Programming with liblouis}. |
| |
| @node How to Write Translation Tables |
| @chapter How to Write Translation Tables |
| |
| For many languages there is already a translation table, so before |
| creating a new table, look at the existing tables and modify them as |
| needed. |
| |
| Typically, a braille translation table consists of several parts. |
| First come the header and the includes, in which you state what the |
| table is for, give license information and include the tables your |
| table needs. |
| |
| After that you write the various translation rules, and lastly you |
| write special rules to handle certain situations. |
| |
| @cindex Opcode |
| A translation rule is composed of at least three parts: the opcode |
| (translation command), character(s) and braille dots. An opcode is a |
| command you give to a machine or a program to perform something on |
| your behalf. In liblouis, an opcode tells it which rule to use when |
| translating characters into braille. The operands can be thought of as |
| the parameters of the translation rule. They consist of two parts: the |
| character or word to be translated and the braille dots. |
| |
| For example, suppose you always want the word @samp{world} to be |
| rendered as braille dots @samp{456} followed by the letter @samp{w}. |
| Then you'd write: |
| |
| @example |
| always world 456-2456 |
| @end example |
| |
| The word @code{always} is an opcode which tells liblouis to always |
| honor this translation, that is to say when the word @samp{world} (an |
| operand) is encountered, always show braille dots @samp{456} followed |
| by the letter @samp{w} (@samp{2456}). |
| |
| When you write any braille table for any language, we'd recommend |
| working from some sort of official standard, and having a device or a |
| program with which you can test your work. |
| |
| @menu |
| * Overview:: |
| * Hyphenation Tables:: |
| * Character-Definition Opcodes:: |
| * Braille Indicator Opcodes:: |
| * Emphasis Opcodes:: |
| * Special Symbol Opcodes:: |
| * Special Processing Opcodes:: |
| * Translation Opcodes:: |
| * Character-Class Opcodes:: |
| * Swap Opcodes:: |
| * The Context and Multipass Opcodes:: |
| * The correct Opcode:: |
| * The match Opcode:: |
| * Miscellaneous Opcodes:: |
| @end menu |
| |
| @node Overview |
| @section Overview |
| |
| Many translation (contraction) tables have already been made up. They |
| are included in the distribution in the tables directory and can be |
| studied as part of the documentation. Some of the more helpful (and |
| normative) are listed in the following table: |
| |
| @table @file |
| @item chardefs.cti |
| Character definitions for U.S. tables |
| @item compress.ctb |
| Remove excessive whitespace |
| @item en-us-g1.ctb |
| Uncontracted American English |
| @item en-us-g2.ctb |
| Contracted or Grade 2 American English |
| @item en-us-brf.dis |
| Make liblouis output conform to BRF standard |
| @item en-us-comp8.ctb |
| 8-dot computer braille for use in coding examples |
| @item en-us-comp6.ctb |
| 6-dot computer braille |
| @item nemeth.ctb |
| Nemeth Code translation for use with liblouisutdml |
| @item nemeth_edit.ctb |
| Fixes errors at the boundaries of math and text |
| |
| @end table |
| |
| The names used for files containing translation tables are completely |
| arbitrary. They are not interpreted in any way by the translator. |
| Contraction tables may be 8-bit ASCII files, UTF-8, 16-bit big-endian |
| Unicode files or 16-bit little-endian Unicode files. Blank lines are |
| ignored. Any leading and trailing whitespace (any number of blanks |
| and/or tabs) is ignored. Lines which begin with a number sign or hatch |
| mark (@samp{#}) are ignored, i.e.@: they are comments. If the number |
| sign is not the first non-blank character in the line, it is treated |
| as an ordinary character. If the first non-blank character is |
| less-than (@samp{<}) the line is also treated as a comment. This makes |
| it possible to mark up tables as xhtml documents. Lines which are not |
| blank or comments define table entries. The general format of a table |
| entry is: |
| |
| @example |
| opcode operands comments |
| @end example |
| |
| Table entries may not be split between lines. The opcode is a mnemonic |
| that specifies what the entry does. The operands may be character |
| sequences, braille dot patterns or occasionally something else. They |
| are described for each opcode (@pxref{Opcode Index}). With some |
| exceptions, opcodes expect a certain number of operands. Any text on |
| the line after the last operand is ignored, and may be a comment. A |
| few opcodes accept a variable number of operands. In this case a |
| number sign (@samp{#}) begins a comment unless it is preceded by a |
| backslash (@samp{\}). |
| |
| Here are some examples of table entries. |
| |
| @example |
| # This is a comment. |
| always world 456-2456 A word and the dot pattern of its contraction |
| @end example |
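| |
| The comment and whitespace rules described above can be sketched as |
| follows. This is a minimal illustration rather than an excerpt from a |
| distributed table; the markup line and the indentation are only there |
| to show what the translator ignores. |
| |
| @example |
| # A number sign as the first non-blank character starts a comment. |
| <p>A line starting with a less-than sign is also a comment.</p> |
|     always world 456-2456 leading whitespace is ignored |
| @end example |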
| |
| Most opcodes have both a "characters" operand and a "dots" operand, |
| though some have only one and a few have other types. |
| |
| @cindex Characters operand |
| The characters operand consists of any combination of characters and |
| escape sequences preceded and followed by whitespace. Escape |
| sequences are used to represent difficult characters. They begin with |
| a backslash (@samp{\}). They are: |
| |
| @table @kbd |
| @item \\ |
| backslash |
| @item \f |
| form feed |
| @item \n |
| new line |
| @item \r |
| carriage return |
| @item \s |
| blank (space) |
| @item \t |
| horizontal tab |
| @item \v |
| vertical tab |
| @item \e |
| "escape" character (hex 1b, dec 27) |
| @item \xhhhh |
| 4-digit hexadecimal value of a character |
| |
| @end table |
| |
| If liblouis has been compiled for 32-bit Unicode the following are |
| also recognized. |
| |
| @table @kbd |
| @item \yhhhhh |
| 5-digit (20 bit) character |
| @item \zhhhhhhhh |
| Full 32-bit value. |
| |
| Please take a look at the |
| @url{https://unicode.org/Public/UNIDATA/,public directory of the |
| Unicode Character Database} as well as at the |
| @url{https://unicode.org/Public/UNIDATA/NamesList.txt,Unicode names |
| list with their code points} to figure out the corresponding Unicode |
| code point for a given Unicode character. |
| |
| @end table |
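| |
| As a sketch of how these escape sequences might appear in character |
| definitions, consider the following entries. The choice of characters |
| and the dot patterns are assumptions made for illustration only; they |
| are not taken from any particular table. |
| |
| @example |
| # blank and horizontal tab, both shown as an empty cell |
| space \s 0 |
| space \t 0 |
| # no-break space, given as a 4-digit hexadecimal value |
| space \x00a0 0 |
| # Euro sign; the dot pattern is only illustrative |
| sign \x20ac 4-15 |
| @end example |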
| |
| @cindex Dots operand |
| The dots operand is a braille dot pattern. The real braille dots, 1 |
| through 8, must be specified with their standard numbers. |
| |
| @cindex Virtual dots |
| @anchor{virtual dots} |
| liblouis recognizes @emph{virtual dots}, which are used for special |
| purposes, such as distinguishing accent marks. There are seven virtual |
| dots. They are specified by the number 9 and the letters @samp{a} |
| through @samp{f}. |
| |
| @cindex Multi-cell dot pattern |
| For a multi-cell dot pattern, the cell specifications must be |
| separated from one another by a dash (@samp{-}). For example, the |
| contraction for the English word @samp{lord} (the letter @samp{l} |
| preceded by dot 5) would be specified as @samp{5-123}. A space may be |
| specified with the special dot number 0. |
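| |
| As a small sketch of the dash notation, the two contractions mentioned |
| in this chapter could be written as the following entries. The |
| fragment assumes that the letters involved have already been defined |
| with character-definition opcodes. |
| |
| @example |
| # dot 5, then the letter l (dots 123) |
| always lord 5-123 |
| # dots 456, then the letter w (dots 2456) |
| always world 456-2456 |
| @end example |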
| |
| An opcode which is helpful in writing translation tables is |
| @code{include}. Its format is: |
| |
| @example |
| include filename |
| @end example |
| |
| It reads the file indicated by @code{filename} and incorporates or |
| includes its entries into the table. Included files can include other |
| files, which can include other files, etc. For an example, see what |
| files are included by the entry @code{include en-us-g1.ctb} in the table |
| @file{en-us-g2.ctb}. If the included file is not in the same directory |
| as the main table, use a full path name for filename. Tables can also be |
| specified in a table list, in which the table names are separated by |
| commas and given as a single table name in calls to the translation |
| functions. |
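| |
| A table that builds on an existing one might look like the following |
| sketch. The file name @file{my-table.ctb} is hypothetical; the |
| included table and the rule are the ones used elsewhere in this |
| chapter. |
| |
| @example |
| # my-table.ctb (hypothetical file name) |
| # Reuse the uncontracted American English table. |
| include en-us-g1.ctb |
| # Add a rule on top of the included entries. |
| always world 456-2456 |
| @end example |
| |
| A table list naming this hypothetical table together with a |
| hyphenation table would then be written as |
| @samp{my-table.ctb,hyph_en_US.dic}. |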
| |
| The order of the various types of opcodes or table entries is |
| important. Character-definition opcodes should come first. However, if |
| the optional @opcoderef{display} is used it should precede |
| character-definition opcodes. Braille-indicator opcodes should come |
| next. Translation opcodes should follow. The @opcoderef{context} is a |
| translation opcode, even though it is considered along with the |
| multipass opcodes. These latter should follow the translation opcodes. |
| The @opcoderef{correct} can be used anywhere after the |
| character-definition opcodes, but it is probably a good idea to group |
| all @code{correct} opcodes together. The @opcoderef{include} can be |
| used anywhere, but the order of entries in the combined table must |
| conform to the order given above. Within each type of opcode, the |
| order of entries is generally unimportant. Thus the translation |
| entries can be grouped alphabetically or in any other order that is |
| convenient. Hyphenation tables may be specified either with an |
| @code{include} opcode or as part of a table list. They should come after |
| everything else. |
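| |
| The ordering described above can be sketched as a table skeleton. The |
| fragment is deliberately abridged and its dot patterns are only |
| illustrative; a real table would define every character it uses |
| before any rule refers to it. |
| |
| @example |
| # 1. Character definitions |
| space \s 0 |
| punctuation . 256 |
| uplow Aa 1 |
| litdigit 1 1 |
| # 2. Braille indicators |
| capsletter 6 |
| numsign 3456 |
| # 3. Translation rules |
| always world 456-2456 |
| # 4. Hyphenation table, after everything else |
| include hyph_en_US.dic |
| @end example |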
| |
| @node Hyphenation Tables |
| @section Hyphenation Tables |
| |
| Hyphenation tables are necessary to make opcodes such as the |
| @opcoderef{nocross} function properly. There are no opcodes for |
| hyphenation table entries because these tables have a special format. |
| Therefore, they cannot be specified as part of an ordinary table. |
| Rather, they must be included using the @opcoderef{include} or as part |
| of a table list. The liblouis hyphenation algorithm was adopted from the |
| one used by OpenOffice. Note that hyphenation tables must follow |
| character definitions and should preferably come last. For an example |
| of a hyphenation table, see @file{hyph_en_US.dic}. |
| |
| @node Character-Definition Opcodes |
| @section Character-Definition Opcodes |
| |
| These opcodes are needed to define attributes such as digit, |
| punctuation, letter, etc. for all characters and their dot patterns. |
| liblouis has no built-in character definitions, but such definitions |
| are essential to the operation of the @opcoderef{context}, the |
| @opcoderef{correct}, the multipass opcodes and the back-translator. If |
| the dot pattern is a single cell, it is used to define the mapping |
| between dot patterns and characters, unless a @opcoderef{display} for |
| that character-dot-pattern pair has been used previously. If only a |
| single-cell dot pattern has been given for a character, that dot |
| pattern is defined with the character's own attributes. |
| |
| You may have multiple definitions of a character using the same or |
| different dot patterns. If you use different dot patterns for the same |
| character, only the first dot pattern will be used during forward |
| translation. However, during back-translation, all the relevant dot |
| patterns will back-translate to the character you defined. |
| |
| You can also define a character multiple times using the same dot |
| pattern for the character, but using different character classes. The |
| following example defines the character @samp{*} (star) with both the |
| @opcoderef{math} and the @opcoderef{sign}. |
| |
| @example |
| math * 16 |
| sign * 16 |
| @end example |
| |
| Likewise, you can define multiple characters as the same dot pattern. |
| The characters you define this way will be forward translated to the |
| same dot pattern. However, when back-translating, the dot pattern will |
| always back-translate to the first character that was defined with |
| this pattern. |
| |
| This technique may be useful when defining characters that have one |
| representation in the Windows character set (CP1252) and another |
| representation in the Unicode character set, e.g. the Euro sign, |
| @samp{€}. It may also be of use when you have to define several |
| variants of the same letter with different accents, which may be |
| represented in your Braille code by the same dot pattern. This is a |
| very common practice for accented letters that are foreign to the |
| Braille code. In the following example, which uses the |
| @opcoderef{uplow}, both e acute (@samp{é}) and e grave (@samp{è}) are |
| defined as dot 4 followed by dots 1 and 5. |
| |
| @example |
| uplow \x00c9\x00e9 4-15 # E acute |
| uplow \x00c8\x00e8 4-15 # E grave |
| @end example |
| |
| In this example, the dot pattern would always back-translate to e |
| acute, since this is the first definition. You could use the |
| @opcoderef{correct} to fix at least the most common errors that |
| result. However, there is no fail-safe way to know what |
| accented letter to use when you back-translate from a dot pattern |
| representing more than one variant. |
| |
| @table @code |
| @opcode{space, character dots} |
| Defines a character as a space and also defines the dot pattern as |
| such. For example: |
| |
| @example |
| space \s 0 \s is the escape sequence for blank; 0 means no dots. |
| @end example |
| |
| @opcode{punctuation, character dots} |
| Associates a punctuation mark in the particular language with a |
| braille representation and defines the character and dot pattern as |
| punctuation. For example: |
| |
| @example |
| punctuation . 46 dot pattern for period in NAB computer braille |
| @end example |
| |
| @opcode{digit, character dots} |
| Associates a digit with a dot pattern and defines the character as a |
| digit. For example: |
| |
| @example |
| digit 0 356 NAB computer braille |
| @end example |
| |
| @opcode{uplow, characters dots [@comma{}dots]} |
| The characters operand must be a pair of letters, of which the first |
| is uppercase and the second lowercase. The first dots suboperand |
| indicates the dot pattern for the upper-case letter. It may have more |
| than one cell. The second dots suboperand must be separated from the |
| first by a comma and is optional, as indicated by the square brackets. |
| If present, it indicates the dot pattern for the lower-case letter. It |
| may also have more than one cell. If the second dots suboperand is not |
| present the first is used for the lower-case letter as well as the |
| upper-case letter. This opcode is needed because not all languages |
| follow a consistent pattern in assigning Unicode codes to upper and |
| lower case letters. It should be used even for languages that do. The |
| distinction is important in the forward translator. For example: |
| |
| @example |
| uplow Aa 17,1 |
| @end example |
| |
| @opcode{grouping, name characters dots @comma{}dots} |
| This opcode is used to indicate pairs of grouping symbols used in |
| processing mathematical expressions. These symbols are usually |
| generated by the MathML interpreter in liblouisutdml. They are used in |
| multipass opcodes. The name operand must contain only letters (a-z and |
| A-Z). The letters may be upper or lower-case but the case matters. The |
| characters operand must contain exactly two Unicode characters. The |
| dots operand must contain exactly two braille cells, separated by a |
| comma. Note that grouping dot patterns also need to be declared with |
| the @opcoderef{exactdots}. The characters may need to be declared with |
| the @opcoderef{math}. |
| |
| @example |
| grouping mrow \x0001\x0002 1e,2e |
| grouping mfrac \x0003\x0004 3e,4e |
| @end example |
| |
| @opcode{letter, character dots} |
| Associates a letter in the language with a braille representation and |
| defines the character as a letter. This is intended for letters which |
| are neither uppercase nor lowercase. |
| |
| @opcode{lowercase, character dots} |
| Associates a character with a dot pattern and defines the character as |
| a lowercase letter. Both the character and the dot pattern have the |
| attributes lowercase and letter. |
| |
| @opcode{uppercase, character dots} |
| Associates a character with a dot pattern and defines the character as |
| an uppercase letter. Both the character and the dot pattern have the |
| attributes uppercase and letter. @code{lowercase} and @code{uppercase} |
| should be used when a letter has only one case. Otherwise use the |
| @opcoderef{uplow}. |
| |
| @opcode{litdigit, digit dots} |
| Associates a digit with the dot pattern which should be used to |
| represent it in literary texts. For example: |
| |
| @example |
| litdigit 0 245 |
| litdigit 1 1 |
| @end example |
| |
| @opcode{sign, character dots} |
| Associates a character with a dot pattern and defines both as a sign. |
| This opcode should be used for things like at sign (@samp{@@}), |
| percent (@samp{%}), dollar sign (@samp{$}), etc. Do not use it to |
| define ordinary punctuation such as period and comma. For example: |
| |
| @example |
| sign % 4-25-1234 literary percent sign |
| @end example |
| |
| @opcode{math, character dots} |
| Associates a character and a dot pattern and defines them as a |
| mathematical symbol. It should be used for less than (@samp{<}), |
| greater than (@samp{>}), equals (@samp{=}), plus (@samp{+}), etc. For |
| example: |
| |
| @example |
| math + 346 plus |
| @end example |
| |
| @end table |
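| |
| The @code{letter}, @code{lowercase} and @code{uplow} opcodes can be |
| combined as in the following sketch. The characters were chosen for |
| illustration, and the dot patterns are assumptions that should be |
| checked against the braille standard of the language concerned. |
| |
| @example |
| # Hebrew alef, a letter that has no case |
| letter \x05d0 1 |
| # German sharp s, treated here as having a lowercase form only |
| lowercase \x00df 2346 |
| # A and a with diaeresis sharing a single dot pattern |
| uplow \x00c4\x00e4 345 |
| @end example |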
| |
| @node Braille Indicator Opcodes |
| @section Braille Indicator Opcodes |
| |
| Braille indicators are dot patterns which are inserted into the |
| braille text to indicate such things as capitalization, italic type, |
| computer braille, etc. The opcodes which define them are followed only |
| by a dot pattern, which may be one or more cells. |
| |
| @table @code |
| @opcode{capsletter, dots} |
| The dot pattern which indicates capitalization of a single letter. In |
| English, this is dot 6. For example: |
| |
| @example |
| capsletter 6 |
| @end example |
| |
| @opcode{begcapsword, dots} |
| The dot pattern which begins a block of capital letters at the |
| beginning or within a word. The block is automatically terminated |
| by any character that is not a capital letter, e.g. small letters, |
| punctuation, numbers etc. |
| |
| Apart from capital letters, you can define a list of characters that |
| can appear within a word in capitals without terminating the block. |
| Do this by using the @opcoderef{capsmodechars}. |
| |
| Example: |
| |
| @example |
| begcapsword 6-6 |
| @end example |
| |
| @opcode{endcapsword, dots} |
| The dot pattern which ends a block of capital letters within a word. |
| It is used in cases where the block is not terminated automatically |
| by a word boundary, a number or punctuation. A common case is when |
| an uppercase block is followed directly by a lowercase letter. |
| |
| For example: |
| |
| @example |
| endcapsword 6-3 |
| @end example |
| |
| @opcode{capsmodechars, characters} |
| |
| Normally, any character other than a capital letter will cancel the |
| @opcoderef{begcapsword} indicator. However, by using the |
| @code{capsmodechars} opcode, you can specify a list of characters |
| that are legal within a capitalized word. In some Braille codes, |
| this might be the case for the hyphen character, @samp{-}. |
| |
| Example: |
| |
| @example |
| capsmodechars - |
| @end example |
| |
| @opcode{begcaps, dots} |
| The dot pattern which begins a block of capital letters defined by the |
| provided @code{typeform} without regard for any other rules. |
| This construct is sometimes also called a capsphrase. It is used |
| in some Braille codes to mark a whole phrase or sentence as capital |
| letters. The block can contain capital letters as well as |
| non-alphabetic characters, punctuation, numbers etc. The block is |
| terminated when a small letter is encountered or at the end of the |
| input string. |
| |
| Example: |
| |
| @example |
| begcaps 6-6-6 |
| @end example |
| |
| @opcode{endcaps, dots} |
| The dot pattern which ends a block of capital letters defined by the |
| provided @code{typeform} without regard for any other rules. For |
| example: |
| |
| @example |
| endcaps 6-3 |
| @end example |
| |
| @opcode{letsign, dots} |
| This indicator is needed in Grade 2 to show that a single letter is |
| not a contraction. It is also used when an abbreviation happens to be |
| a sequence of letters that is the same as a contraction. For example: |
| |
| @example |
| letsign 56 |
| @end example |
| |
| @opcode{noletsign, letters} |
| |
| The letters in the operand will not be preceded by a letter sign. |
| More than one @code{noletsign} opcode can be used. This is equivalent |
| to a single entry containing all the letters. In addition, if a single |
| letter, such as @samp{a} in English, is defined as a @code{word} |
| (@pxref{word opcode,word,@code{word}}) or @code{largesign} |
| (@pxref{largesign opcode,largesign,@code{largesign}}), it will be |
| treated as though it had also been specified in a @code{noletsign} |
| entry. |
| |
| @opcode{noletsignbefore, characters} |
| If any of the characters precedes a single letter without a space, a |
| letter sign is not used. By default the characters apostrophe |
| (@samp{'}) and period (@samp{.}) have this property. Use of a |
| @code{noletsignbefore} entry cancels the defaults. If more than one |
| @code{noletsignbefore} entry is used, the characters in all entries |
| are combined. |
| |
| @opcode{noletsignafter, characters} |
| If any of the characters follows a single letter without a space, a |
| letter sign is not used. By default the characters apostrophe |
| (@samp{'}) and period (@samp{.}) have this property. Use of a |
| @code{noletsignafter} entry cancels the defaults. If more than one |
| @code{noletsignafter} entry is used the characters in all entries are |
| combined. |
| |
| @opcode{nocontractsign, dots} |
| |
| The dots in this opcode are used to indicate a letter or a sequence of |
| letters that are not a contraction, e.g. @samp{CD}. The opcode is |
| similar to the @opcoderef{letsign}. |
| |
| @c FIXME: In what way is the nocontractsign opcode different from the |
| @c letsign opcode, apart from apparently being a more focused version of |
| @c letsign? |
| |
| @opcode{numsign, dots} |
| The translator inserts this indicator before numbers made up of digits |
| defined with the @opcoderef{litdigit} to show that they are a number |
| and not letters or some other symbols. A number is terminated when a |
| space, a letter or any other character not defined with the |
| @opcoderef{litdigit} is encountered. |
| |
| You can define characters or strings to be part of a number by using |
| the @opcoderef{midnum}, the @opcoderef{numericmodechars} or the |
| @opcoderef{midendnumericmodechars}. |
| |
| Example: |
| |
| @example |
| numsign 3456 |
| @end example |
| |
| @opcode{numericnocontchars, characters} |
| |
| This opcode specifies the characters that require a |
| @opcoderef{nocontractsign} if they appear after a number with no |
| intervening space, e.g. @samp{1a} or @samp{2-B}. |
| |
| These characters will typically be the letters a-j, which usually |
| constitute the literary digits (@pxref{litdigit |
| opcode,litdigit,@code{litdigit}}). However, in some Braille codes, all |
| letters fall in this category. |
| |
| Please note that this opcode is case sensitive. So, if you need the |
| indicator defined with the @opcoderef{nocontractsign} to also appear |
| before the capital letters A-J, you should include these letters in |
| the definition. This is especially relevant if you are also using the |
| @opcoderef{begcaps} and the @opcoderef{endcaps}. In this case, you |
| might otherwise end up having numbers immediately followed by capital |
| letters with no indicator between. |
| |
| Example: |
| |
| @example |
| numericnocontchars abcdefghij |
| @end example |
| |
| @opcode{numericmodechars, characters} |
| |
| @opcode{midendnumericmodechars, characters} |
| |
| Any of these characters can appear within a number without terminating |
| the effect of the number sign (@pxref{numsign |
| opcode,numsign,@code{numsign}}). In other words, they don't cancel |
| numeric mode. |
| |
| The difference between the two opcodes is that |
| @code{numericmodechars} characters can appear anywhere in a |
| number whereas @code{midendnumericmodechars} characters can |
| appear only in the middle or at the end of a number. Like |
| @code{midendnumericmodechars}, @code{numericmodechars} characters keep |
| numeric mode active, but in addition they activate numeric mode |
| immediately when at least one digit follows, and the number sign will |
| precede the @code{numericmodechars} character in this case. |
| |
| Example: |
| |
| @example |
| numericmodechars ., |
| midendnumericmodechars -/ |
| @end example |
| |
| @end table |
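| |
| The number-related opcodes above are usually used together. The |
| following fragment is a sketch under stated assumptions: the dot |
| pattern given for @code{nocontractsign} is made up, and the fragment |
| presumes that the period, comma, hyphen and slash have been defined |
| with character-definition opcodes elsewhere in the table. |
| |
| @example |
| litdigit 1 1 |
| litdigit 2 12 |
| numsign 3456 |
| nocontractsign 56 |
| numericnocontchars abcdefghijABCDEFGHIJ |
| numericmodechars ., |
| midendnumericmodechars -/ |
| @end example |
| |
| With entries like these, a number such as @samp{1.2} would be expected |
| to stay in numeric mode across the period, while in @samp{1a} the |
| indicator defined with @code{nocontractsign} would be expected between |
| the digit and the letter. The exact output depends on the rest of the |
| table. |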
| |
| @node Emphasis Opcodes |
| @section Emphasis Opcodes |
| |
| In many braille systems emphasis such as bold, italics or underline is |
| indicated using special dot patterns that mark the start and often |
| also the end. For some languages these braille indicators differ |
| depending on the context, i.e.@: here is an separate indicator for an |
| emphasized word and another one for an emphasized phrase. To |
| accommodate for all these usage scenarios liblouis provides a number of |
| opcodes for various contexts. |
| |
| At the same time some braille systems use different indicators for |
| different kinds of emphasis while others know only one kind of |
| emphasis. For that reason liblouis doesn't hard-code any emphasis; |
| instead the table author defines which kinds of emphasis exist for a |
| specific language using the @opcoderef{emphclass}. |
| |
| @menu |
| * Emphasis class:: |
| * Contexts:: |
| * Fallback behavior:: |
| * Computer braille:: |
| @end menu |
| |
| @node Emphasis class |
| @subsection Emphasis class |
| |
| The @code{emphclass} opcode defines the classes of emphasis that are |
| relevant for a particular language. An emphasis class has to be |
| declared for every kind of emphasis that needs special indicators. |
| |
| @table @code |
| @opcode{emphclass, <emphasis class>} |
| Define an emphasis class to be used later in other emphasis related |
| opcodes in the table. |
| |
| @example |
| emphclass italic |
| emphclass underline |
| emphclass bold |
| emphclass transnote |
| @end example |
| |
| @end table |
| |
| @node Contexts |
| @subsection Contexts |
| |
| In order to understand the capabilities of Liblouis for emphasis |
| handling we have to look at the different contexts that are supported. |
| |
| @menu |
| * None:: |
| * Letter:: |
| * Word:: |
| * Phrase:: |
| * Symbol:: |
| @end menu |
| |
| @node None |
| @subsubsection None |
| |
| For some languages there is no such concept as contexts. Emphasis is |
| always handled the same regardless of context. There is simply an |
| indicator for the beginning of emphasis and another one for the end of |
| the emphasis. |
| |
| @table @code |
| @opcode{begemph, <emphasis class> <dot pattern>} |
| Braille dot pattern to indicate the beginning of emphasis. |
| |
| @example |
| begemph italic 46-3 |
| @end example |
| |
| @opcode{endemph, <emphasis class> <dot pattern>} |
| Braille dot pattern to indicate the end of emphasis. |
| |
| @example |
| endemph italic 46-36 |
| @end example |
| |
| @end table |
| |
| @node Letter |
| @subsubsection Letter |
| |
| Some languages have special indicators for single letter emphasis. |
| |
| @table @code |
| @opcode{emphletter, <emphasis class> <dot pattern>} |
| Braille dot pattern to indicate that the next character is emphasized. |
| |
| @example |
| emphletter italic 46-25 |
| @end example |
| |
| @end table |
| |
| @node Word |
| @subsubsection Word |
| |
| Many languages have special indicators for emphasized words. Usually |
| the emphasis starts at the beginning of the word and ends implicitly, |
| i.e.@: without a closing indicator, at the end of the word. There are |
| also use cases where the emphasis starts in the middle of the word and |
| an explicit closing indicator is required. |
| |
| @table @code |
| @opcode{begemphword, <emphasis class> <dot pattern>} |
| Braille dot pattern to indicate the beginning of an emphasized word |
| or the beginning of emphasized characters within a word. |
| |
| @example |
| begemphword underline 456-36 |
| @end example |
| |
| @opcode{endemphword, <emphasis class> <dot pattern>} |
| Generally emphasis with word context ends when the word ends. However |
| when an indication is required to close a word emphasis then this |
| opcode defines the Braille dot pattern that indicates the end of a word |
| emphasis. |
| |
| @example |
| endemphword transnote 6-3 |
| @end example |
| |
| If emphasis ends in the middle of a word the Braille dot pattern |
| defined in this opcode is also used. |
| |
| @opcode{emphmodechars, characters} |
| |
| Normally, only space characters will cancel the |
| @opcoderef{begemphword} indicator. However, by using the |
| @code{emphmodechars} opcode, you can specify the list of characters |
| that are legal within an emphasized word. If @code{emphmodechars} is |
| specified, any character that is not in this list and is not a |
| @code{letter} will cancel the @opcoderef{begemphword} indicator. |
| |
| Example: |
| |
| @example |
| emphmodechars - |
| @end example |
| |
| @end table |
| |
| @node Phrase |
| @subsubsection Phrase |
| |
| Many languages have a concept of a phrase where the emphasis is valid |
| for a number of words. The beginning of the phrase is indicated with a |
| braille dot pattern and a closing indicator is put before or after the |
| last word of the phrase. To define how many words are considered a |
| phrase in your language use the @opcoderef{lenemphphrase}. |
| |
| @table @code |
| @opcode{begemphphrase, <emphasis class> <dot pattern>} |
| Braille dot pattern to indicate the beginning of a phrase. |
| |
| @example |
| begemphphrase bold 456-46-46 |
| @end example |
| |
| @c define a special opcode macro that can handle the two-word nature |
| @c of the endemphphrase opcode |
| @macro endemphphraseopcode{where} |
| @opcodeindex endemphphrase \where\ |
| @anchor{endemphphrase \where\ opcode} |
| @item endemphphrase <emphasis class> \where\ <dot pattern> |
| @end macro |
| |
| @endemphphraseopcode{before} |
| Braille dot pattern to indicate the end of a phrase. The closing indicator |
| will be placed before the last word of the phrase. |
| |
| @example |
| endemphphrase bold before 456-46 |
| @end example |
| |
| @endemphphraseopcode{after} |
| Braille dot pattern to indicate the end of a phrase. The closing |
| indicator will be placed after the last word of the phrase. If both |
| @code{endemphphrase <emphasis class> before} and @code{endemphphrase |
| <emphasis class> after} are defined an error will be signaled. |
| |
| @example |
| endemphphrase underline after 6-3 |
| @end example |
| |
| @opcode{lenemphphrase, <emphasis class> <number>} |
| Define how many words are required before a sequence of words is |
| considered a phrase. |
| |
| @example |
| lenemphphrase underline 3 |
| @end example |
| |
| @end table |
| |
| @node Symbol |
| @subsubsection Symbol |
| UEB has a concept of symbols that need special indication. When the |
| translator detects an emphasis sequence that needs to be indicated |
| with the rules for a symbol then it will use the dots defined with the |
| @opcoderef{emphletter}. To indicate the end of the symbol it will use |
| the dots defined in the @opcoderef{endemphword}. |
| |
| @node Fallback behavior |
| @subsection Fallback behavior |
| |
| Many braille systems either handle emphasis using no contexts or |
| otherwise by employing a combination of the letter, word and phrase |
| contexts. So if a table defines any opcodes for the letter, word or |
| phrase contexts then liblouis will signal an error for opcodes that |
| define emphasis with no context. In other words contrary to previous |
| versions of liblouis there is no fallback behavior. |
| |
| As a consequence, there will only be emphasis for a context when the |
| table defines it. So for example when defining a braille dot pattern |
| for phrases and not for words liblouis will not indicate emphasis on |
| words that aren't part of a phrase. |
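| |
| As a minimal sketch of this rule, consider a hypothetical table that |
| declares only the phrase context for italics (the dot patterns and |
| the phrase length are illustrative assumptions, not taken from a real |
| table): |
| |
| @example |
| emphclass italic |
| begemphphrase italic 46-46 |
| endemphphrase italic after 46-3 |
| lenemphphrase italic 3 |
| @end example |
| |
| With these rules alone, an italic run of three or more words would be |
| bracketed by the phrase indicators, whereas a single italic word would |
| get no indicator because no @code{begemphword} rule is defined. |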
| |
| @node Computer braille |
| @subsection Computer braille |
| |
| For computer braille there are only two braille indicators, for the |
| beginning and end of a sequence of characters to be rendered in |
| computer braille. Such a sequence may also have other emphasis. The |
| computer braille indicators are applied not only when computer braille |
| is indicated in the @code{typeform} parameter, but also when a |
| sequence of characters is determined to be computer braille because it |
| contains a subsequence defined by the @opcoderef{compbrl}. |
| |
| @node Special Symbol Opcodes |
| @section Special Symbol Opcodes |
| |
| These opcodes define certain symbols, such as the decimal point, which |
| require special treatment. |
| |
| @table @code |
| @opcode{decpoint, character dots} |
| |
| This opcode defines the decimal point. It is useful if your Braille |
| code requires the decimal separator to show as a dot pattern different |
| from the normal representation of this character, i.e.@: period or |
| comma. In addition, it allows the notation @samp{.001} to be |
| translated correctly. In some languages this notation, with no |
| leading 0, is common instead of @samp{0.001}. When you use the |
| @code{decpoint} opcode, the decimal point will be taken to be part of |
| the number and correctly preceded by the number sign. |
| |
| The character operand must have only one character. For example, in |
| @file{en-us-g1.ctb} we have: |
| |
| @example |
| decpoint . 46 |
| @end example |
| |
| @opcode{hyphen, character dots} |
| This opcode defines the hyphen, that is, the character used in |
| compound words such as @samp{have-nots}. The back-translator uses it |
| to determine the end of individual words. |
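| |
| A minimal sketch, assuming the ASCII hyphen-minus and dots 36 (verify |
| the pattern against your braille code): |
| |
| @example |
| hyphen - 36 |
| @end example |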
| |
| @end table |
| |
| @node Special Processing Opcodes |
| @section Special Processing Opcodes |
| |
| These opcodes cause special processing to be carried out. |
| |
| @table @code |
| @opcode{capsnocont,} |
| This opcode has no operands. If it is specified, words or parts of |
| words in all caps are not contracted. This is needed for languages |
| such as Norwegian. |
| |
| Note: If you use the @code{capsnocont} opcode and do not define the |
| @opcoderef{begcapsword} indicator, every cap will be marked with the |
| @opcoderef{capsletter} indicator. This is useful if you need to process caps |
| separately in a later pass. |
| |
| @end table |
| |
| @node Translation Opcodes |
| @section Translation Opcodes |
| |
| These opcodes define the braille representations for character |
| sequences. Each of them defines an entry within the contraction table. |
| These entries may be defined in any order except, as noted below, when |
| they define alternate representations for the same character sequence. |
| |
| Each of these opcodes specifies a condition under which the |
| translation is legal, and each also has a characters operand and a |
| dots operand. The text being translated is processed strictly from |
| left to right, character by character, with the most eligible entry |
| for each position being used. If there is more than one eligible entry |
| for a given position in the text, then the one with the longest |
| character string is used. If there is more than one eligible entry for |
| the same character string, then the one defined first is tested for |
| legality first. (This is the only case in which the order of the |
| entries makes a difference.) |
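| |
| The following fragment sketches both tie-breaking rules. The dot |
| patterns follow English contracted braille conventions but are meant |
| as an illustration only: |
| |
| @example |
| always the 2346 |
| always there 5-2346 |
| begword con 25 |
| always con 14-135-1345 |
| @end example |
| |
| In @samp{there} both of the first two entries are eligible at the |
| letter @samp{t}; the entry for @samp{there} is used because its |
| character string is longer. The two entries for @samp{con} share the |
| same character string, so @code{begword} is tested first because it is |
| defined first, and the @code{always} entry is used only where |
| @samp{con} is not at the beginning of a word. |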
| |
| The characters operand is a sequence or string of characters preceded |
| and followed by whitespace. Each character can be entered in the |
| normal way, or it can be defined as a four-digit hexadecimal number |
| preceded by @samp{\x}. |
| |
| The dots operand defines the braille representation for the characters |
| operand. It may also be specified as an equals sign (@samp{=}). This |
| means that the default representation for each character |
| (@pxref{Character-Definition Opcodes}) within the sequence is to be |
| used. It is an error if not all the characters in the rule have been |
| previously defined in a character-definition rule. Note that the |
| @samp{=} shortcut for dot patterns has a known bug@footnote{See |
| @url{https://github.com/liblouis/liblouis/issues/500#issuecomment-365753137}.} |
| that might cause problems when back-translating. |
| |
| In what follows the word @samp{characters} means a sequence of one or |
| more consecutive letters between spaces and/or punctuation marks. |
| |
| @table @code |
| |
| @opcode{noback, opcode ...} |
| This is an opcode prefix, that is to say, it modifies the operation of |
| the opcode that follows it on the same line. @code{noback} specifies that |
| back-translation is not to use information on this line. |
| |
| @example |
| noback always ;\s; 0 |
| @end example |
| |
| @opcode{nofor, opcode ...} |
| This is an opcode prefix which modifies the operation of the opcode |
| following it on the same line. @code{nofor} specifies that forward translation |
| is not to use the information on this line. |
| |
| @opcode{compbrl, characters} |
| If the characters are found within a block of text surrounded by |
| whitespace, the entire block is translated according to the default |
| braille representations defined by the @ref{Character-Definition |
| Opcodes}, if 8-dot computer braille is enabled, or according to the dot |
| patterns given in the @opcoderef{comp6}, if 6-dot computer braille is |
| enabled. For example: |
| |
| @example |
| compbrl www translate URLs in computer braille |
| @end example |
| |
| @opcode{comp6, character dots} |
| This opcode specifies the translation of characters in 6-dot computer |
| braille. It is necessary because the translation of a single character |
| may require more than one cell. The first operand must be a character |
| with a decimal representation from 0 to 255 inclusive. The second |
| operand may specify as many cells as necessary. The opcode is somewhat |
| of a misnomer, since any dots, not just dots 1 through 6, can be |
| specified. This even includes virtual dots (@pxref{virtual dots}). |
| |
| @opcode{nocont, characters} |
| Like @code{compbrl}, except that the string is uncontracted. |
| @opcoderef{prepunc} and @opcoderef{postpunc} rules are applied, |
| however. This is useful for specifying that foreign words should not |
| be contracted in an entire document. |
| |
| @opcode{replace, characters @{characters@}} |
| Replace the first set of characters, no matter where they appear, with |
| the second. Note that the second operand is @emph{NOT} a dot pattern. |
| It is also optional. If it is omitted the character(s) in the first |
| operand will be discarded. This is useful for ignoring characters. It |
| is possible that the "ignored" characters may still affect the |
| translation indirectly. Therefore, it is preferable to use |
| @opcoderef{correct}. |
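| |
| A short sketch of both forms (the particular characters are chosen |
| for illustration): |
| |
| @example |
| replace \x00AD |
| replace \x2019 ' |
| @end example |
| |
| The first rule would discard soft hyphens; the second would replace |
| the right single quotation mark with an ASCII apostrophe. |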
| |
| @opcode{always, characters dots} |
| Replace the characters with the dot pattern no matter where they |
| appear. Do @emph{NOT} use an entry such as @code{always a 1}. Use the |
| @code{uplow}, @code{letter}, etc. character definition opcodes |
| instead. For example: |
| |
| @example |
| always world 456-2456 unconditional translation |
| @end example |
| |
| @opcode{repeated, characters dots} |
| Replace the characters with the dot pattern no matter where they |
| appear. Ignore any consecutive repetitions of the same character |
| sequence. This is useful for shortening long strings of spaces or |
| hyphens or periods. For example: |
| |
| @example |
| repeated --- 36-36-36 shorten separator lines made with hyphens |
| @end example |
| |
| @opcode{repword, characters dots} |
| When characters are encountered check to see if the word before this |
| string matches the word after it. If so, replace characters with dots |
| and eliminate the second word and any word following another |
| occurrence of characters that is the same. This opcode is used in |
| Malaysian braille. In this case the rule is: |
| |
| @example |
| repword - 123456 |
| @end example |
| |
| @opcode{largesign, characters dots} |
| Replace the characters with the dot pattern no matter where they |
| appear. In addition, if two words defined as large signs follow each |
| other, remove the space between them. For example, in |
| @file{en-us-g2.ctb} the words @samp{and} and @samp{the} are both |
| defined as large signs. Thus, in the phrase @samp{the cat and the dog} |
| the space would be deleted between @samp{and} and @samp{the}, with the |
| result @samp{the cat andthe dog}. Of course, @samp{and} and @samp{the} |
| would be properly contracted. The term @code{largesign} is a bit of |
| braille jargon that pleases braille experts. |
| |
| @opcode{word, characters dots} |
| Replace the characters with the dot pattern if they are a word, that |
| is, are surrounded by whitespace and/or punctuation. |
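| |
| For instance, assuming the English grade 2 contraction of |
| @samp{about} as the letters @samp{ab}: |
| |
| @example |
| word about 1-12 |
| @end example |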
| |
| @opcode{syllable, characters dots} |
| As its name indicates, this opcode defines a "syllable" which must be |
| represented by exactly the dot patterns given. Contractions may not |
| cross the boundaries of this "syllable" either from left or right. The |
| character string defined by this opcode need not be a lexical |
| syllable, though it usually will be. The equal sign in the following |
| example means that the default representation for each character |
| within the sequence is to be used (@pxref{Translation Opcodes}): |
| |
| @example |
| syllable horse = sawhorse, horseradish |
| @end example |
| |
| @opcode{nocross, characters dots} |
| Replace the characters with the dot pattern if the characters are all |
| in one syllable (do not cross a syllable boundary). For this opcode to |
| work, a hyphenation table must be included. If this is not done, |
| @code{nocross} behaves like the @opcoderef{always}. For example, if |
| the English Grade 2 table is being used and the appropriate |
| hyphenation table has been included @code{nocross sh 146} will cause |
| the @samp{sh} in @samp{monkshood} not to be contracted. |
| |
| @opcode{joinword, characters dots} |
| Replace the characters with the dot pattern if they are a word which |
| is followed by whitespace and a letter. In addition remove the |
| whitespace. For example, @file{en-us-g2.ctb} has @code{joinword to |
| 235}. This means that if the word @samp{to} is followed by another |
| word the contraction is to be used and the space is to be omitted. If |
| these conditions are not met, the word is translated according to any |
| other opcodes that may apply to it. |
| |
| @opcode{lowword, characters dots} |
| Replace the characters with the dot pattern if they are a word |
| preceded and followed by whitespace. No punctuation either before or |
| after the word is allowed. The term @code{lowword} derives from the |
| fact that in English these contractions are written in the lower part |
| of the cell. For example: |
| |
| @example |
| lowword were 2356 |
| @end example |
| |
| @opcode{contraction, characters} |
| If you look at @file{en-us-g2.ctb} you will see that some words are |
| actually contracted into some of their own letters. A famous example |
| among braille transcribers is @samp{also}, which is contracted as |
| @samp{al}. But this is also the name of a person. To take another |
| example, @samp{altogether} is contracted as @samp{alt}, but this is |
| the abbreviation for the alternate key on a computer keyboard. |
| Similarly @samp{could} is contracted into @samp{cd}, but this is the |
| abbreviation for compact disk. To prevent confusion in such cases, the |
| letter sign (see @opcoderef{letsign}) is placed before such letter |
| combinations when they actually are abbreviations, not contractions. |
| The @code{contraction} opcode tells the translator to do this. |
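| |
| For the cases mentioned above, the entries could look like this (a |
| sketch; note that the opcode takes no dots operand): |
| |
| @example |
| contraction al |
| contraction alt |
| contraction cd |
| @end example |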
| |
| @opcode{sufword, characters dots} |
| Replace the characters with the dot pattern if they are either a word |
| or at the beginning of a word. |
| |
| @opcode{prfword, characters dots} |
| Replace the characters with the dot pattern if they are either a word |
| or at the end of a word. |
| |
| @opcode{begword, characters dots} |
| Replace the characters with the dot pattern if they are at the |
| beginning of a word. |
| |
| @opcode{begmidword, characters dots} |
| Replace the characters with the dot pattern if they are either at the |
| beginning or in the middle of a word. |
| |
| @opcode{midword, characters dots} |
| Replace the characters with the dot pattern if they are in the middle |
| of a word. |
| |
| @opcode{midendword, characters dots} |
| Replace the characters with the dot pattern if they are either in the |
| middle or at the end of a word. |
| |
| @opcode{endword, characters dots} |
| Replace the characters with the dot pattern if they are at the end of |
| a word. |
| |
| @opcode{partword, characters dots} |
| Replace the characters with the dot pattern if the characters are |
| anywhere in a word, that is, if they are preceded or followed by a |
| letter. |
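| |
| The position-dependent opcodes from @code{sufword} to |
| @code{partword} differ only in where the characters may occur in a |
| word. A sketch with dot patterns in the style of English contracted |
| braille (illustrative, not copied from a particular table): |
| |
| @example |
| begword con 25 |
| midword ea 2 |
| midendword ing 346 |
| endword ness 56-234 |
| @end example |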
| |
| @opcode{exactdots, @@dots} |
| Note that the operand must begin with an at sign (@samp{@@}). The dot |
| pattern following it is evaluated for validity. If it is valid, |
| whenever an at sign followed by this dot pattern appears in the source |
| document it is replaced by the characters corresponding to the dot |
| pattern in the output. This opcode is intended for use in liblouisutdml |
| semantic-action files to specify exact dot patterns, as in |
| mathematical codes. For example: |
| |
| @example |
| exactdots @@4-46-12356 |
| @end example |
| will produce the characters with these dot patterns in the output. |
| |
| @opcode{prepunc, characters dots} |
| Replace the characters with the dot pattern if they are part of |
| punctuation at the beginning of a word. |
| |
| @opcode{postpunc, characters dots} |
| Replace the characters with the dot pattern if they are part of |
| punctuation at the end of a word. |
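| |
| A sketch for a double quotation mark that takes different dot |
| patterns at the beginning and at the end of a word (dot patterns |
| assumed from English literary braille): |
| |
| @example |
| prepunc " 236 |
| postpunc " 356 |
| @end example |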
| |
| @opcode{begnum, characters dots} |
| Replace the characters with the dot pattern if they are at the |
| beginning of a number, that is, before all its digits. For example, in |
| @file{en-us-g1.ctb} we have @code{begnum # 4}. |
| |
| @opcode{midnum, characters dots} |
| Replace the characters with the dot pattern if they are in the middle |
| of a number. For example, @file{en-us-g1.ctb} has @code{midnum . 46}. |
| This is because the decimal point has a different dot pattern than the |
| period. |
| |
| @opcode{endnum, characters dots} |
| Replace the characters with the dot pattern if they are at the end of |
| a number. For example @file{en-us-g1.ctb} has @code{endnum th 1456}. |
| This handles things like @samp{4th}. A letter sign is @emph{NOT} |
| inserted. |
| |
| @opcode{joinnum, characters dots} |
| Replace the characters with the dot pattern. In addition, if |
| whitespace and a number follow, omit the whitespace. This opcode can |
| be used to join currency symbols to numbers, for example: |
| |
| @example |
| joinnum \x20AC 15 (EURO SIGN) |
| joinnum \x0024 145 (DOLLAR SIGN) |
| joinnum \x00A3 1234 (POUND SIGN) |
| joinnum \x00A5 13456 (YEN SIGN) |
| @end example |
| |
| @end table |
| |
| @node Character-Class Opcodes |
| @section Character-Class Opcodes |
| |
| These opcodes define and use character classes. A character class |
| associates a set of characters with a name. The name then refers to |
| any character within the class. A character may belong to more than |
| one class. |
| |
| The basic character classes correspond to the character definition |
| opcodes, with the exception of the @opcoderef{uplow}, which defines |
| characters belonging to the two classes @code{uppercase} and |
| @code{lowercase}. These classes are: |
| |
| @table @code |
| @item space |
| Whitespace characters such as blank and tab |
| @item digit |
| Numeric characters |
| @item letter |
| Both uppercase and lowercase alphabetic characters |
| @item lowercase |
| Lowercase alphabetic characters |
| @item uppercase |
| Uppercase alphabetic characters |
| @item punctuation |
| Punctuation marks |
| @item sign |
| Signs such as percent (@samp{%}) |
| @item math |
| Mathematical symbols |
| @item litdigit |
| Literary digit |
| @item undefined |
| Not properly defined |
| |
| @end table |
| |
| The opcodes which define and use character classes are shown below. |
| For examples see @file{el.ctb}. |
| |
| @table @code |
| |
| @opcode{class, name characters} |
| Define a new character class. The name operand must contain only |
| letters (a-z and A-Z). The letters may be upper or lower-case but the |
| case matters. The characters operand must be specified as a string. A |
| character class may not be used until it has been defined. |
| |
| @opcode{after, class opcode ...} |
| The specified opcode is further constrained in that the matched |
| character sequence must be immediately preceded by a character |
| belonging to the specified class. If this opcode is used more than |
| once on the same line then the union of the characters in all the |
| classes is used. |
| |
| @opcode{before, class opcode ...} |
| The specified opcode is further constrained in that the matched |
| character sequence must be immediately followed by a character |
| belonging to the specified class. If this opcode is used more than |
| once on the same line then the union of the characters in all the |
| classes is used. |
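| |
| A sketch that combines the three opcodes. The class name |
| @code{vowel}, its members and the dot patterns are hypothetical: |
| |
| @example |
| class vowel aeiou |
| after vowel always st 34 |
| before vowel begword ch 16 |
| @end example |
| |
| Here the @code{always} rule would apply only where @samp{st} is |
| immediately preceded by one of the vowels, and the @code{begword} rule |
| only where @samp{ch} is immediately followed by one. |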
| |
| @end table |
| |
| @node Swap Opcodes |
| @section Swap Opcodes |
| |
| The swap opcodes are needed to tell the @opcoderef{context}, the |
| @opcoderef{correct} and multipass opcodes which dot patterns to swap |
| for which characters. There are three, @code{swapcd}, @code{swapdd} |
| and @code{swapcc}. The first swaps dot patterns for characters. The |
| second swaps dot patterns for dot patterns and the third swaps |
| characters for characters. The first is used in the @code{context} |
| opcode and the second is used in the multipass opcodes. |
| |
| All the swap opcodes have a name so they can be referred to from the |
| @code{context}, @code{correct} and multipass opcodes. The name operand |
| must contain only letters (a-z and A-Z). The letters may be upper or |
| lower-case but the case matters. |
| |
| Dot patterns are separated by commas and may contain more than one |
| cell. |
| |
| @table @code |
| |
| @opcode{swapcd, name characters dots@comma{} dots@comma{} dots@comma{} ...} |
| See above paragraph for explanation. For example: |
| |
| @example |
| swapcd dropped 0123456789 356,2,23,... |
| @end example |
| |
| @opcode{swapdd, name dots@comma{} dots@comma{} dots ... dotpattern1@comma{} dotpattern2@comma{} dotpattern3@comma{} ...} |
| The @code{swapdd} opcode defines substitutions for the multipass |
| opcodes. In the second operand the dot patterns must be single cells, |
| but in the third operand multi-cell dot patterns are allowed. This is |
| because multi-cell patterns in the second operand would lead to |
| ambiguities. |
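| |
| A sketch with a hypothetical swap name and dot patterns, mapping three |
| single cells to their lowered forms: |
| |
| @example |
| swapdd lowered 1,12,14 2,23,25 |
| @end example |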
| |
| @opcode{swapcc, name characters characters} |
| The @code{swapcc} opcode swaps characters in its second operand for |
| characters in the corresponding places in its third operand. It is |
| intended for use with @code{correct} opcodes and can solve problems |
| such as formatting phone numbers. |
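| |
| A sketch with a hypothetical swap name that pairs each uppercase |
| letter in the second operand with the lowercase letter at the same |
| position in the third: |
| |
| @example |
| swapcc tolower ABCDEF abcdef |
| @end example |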
| |
| @end table |
| |
| @node The Context and Multipass Opcodes |
| @section The Context and Multipass Opcodes |
| |
| The @code{context} and multipass opcodes (@code{pass2}, @code{pass3} |
| and @code{pass4}) provide translation capabilities beyond those of the |
| basic translation opcodes (@pxref{Translation Opcodes}) discussed |
| previously. The multipass opcodes cause additional passes to be made |
| over the string to be translated. The number after the word |
| @code{pass} indicates in which pass the entry is to be applied. If no |
| multipass opcodes are given, only the first translation pass is made. |
| The @code{context} opcode is basically a multipass opcode for the |
| first pass. It differs slightly from the multipass opcodes per se. |
| When back-translating, the passes are performed in the reverse order, |
| i.e.@: @code{pass4}, @code{pass3}, @code{pass2}, @code{context}. Each |
| of these opcodes must be prefixed by either the @opcoderef{noback} or |
| the @opcoderef{nofor}. The format of all these opcodes is @code{opcode |
| test action}. The specific opcodes are invoked as follows: |
| |
| @table @code |
| @anchor{context opcode} |
| @opcodeindex context |
| @opcodeindex pass2 |
| @opcodeindex pass3 |
| @opcodeindex pass4 |
| @item context test action |
| @itemx pass2 test action |
| @itemx pass3 test action |
| @itemx pass4 test action |
| @end table |
| |
| The @code{test} and @code{action} operands have suboperands. Each |
| suboperand begins with a non-alphanumeric character and ends when |
| another non-alphanumeric character is encountered. The suboperands and |
| their initial characters are as follows. |
| |
| @table @kbd |
| @item " (double quote) |
| a string of characters. This string must be terminated by another |
| double quote. It may contain any characters. If a double quote is |
| needed within the string, it must be preceded by a backslash |
| (@samp{\}). If a space is needed, it must be represented by the escape |
| sequence @samp{\s}. This suboperand is valid |
| in the test and action parts of the @code{correct} opcode, |
| in the test part of the @code{context} opcode when forward translating, |
| and in the action part of the @code{context} opcode when back translating. |
| |
| @item @@ (at sign) |
| a sequence of dot patterns. Cells are separated by hyphens as usual. |
| This suboperand is valid in the test and action parts of |
| the @code{pass2}, @code{pass3}, and @code{pass4} opcodes, |
| in the action part of the @code{context} opcode when forward translating, |
| and in the test part of the @code{context} opcode when back translating. |
| |
| @item ` (accent mark) |
| If this is the beginning of the string being translated this |
| suboperand is true. It is valid only in the test part and must be the |
| first thing in this operand. |
| |
| @item ~ (tilde) |
| If this is the end of the string being translated this suboperand is |
| true. It is valid only in the test part and must be the last thing in |
| this operand. |
| |
| @item $ (dollar sign) |
| a string of attributes, such as @samp{d} for digit, @samp{l} for |
| letter, etc. For a list of all valid attributes @pxref{valid attribute |
| characters}. More than one attribute can be given. If you wish to |
| check characters with any attribute, use the letter @samp{a}. Input |
| characters are checked to see if they have at least one of the |
| attributes. The attribute string can be followed by numbers specifying |
| how many characters are to be checked. If no numbers are given, 1 is |
| assumed. If two numbers separated by a hyphen are given, the input is |
| checked to make sure that at least the first number of characters with |
| the attributes are present, but no more than the second number. If |
| only one number is present, then exactly that many characters must |
| have the attributes. A period instead of the numbers indicates an |
| indefinite number of characters (for technical reasons the number of |
| characters that are actually matched is limited to 65535). |
| |
| This suboperand is valid in all test parts but not in action parts. |
| For the characters which can be used in attribute strings, see the |
| following table. |
| |
| @item ! (exclamation point) |
| reverses the logical meaning of the suboperand which follows. For |
| example, @code{!$d} is true only if the character is @emph{NOT} a digit. This |
| suboperand is valid in test parts only. |
| |
| @item % (percent sign) |
| the name of a class defined by the @opcoderef{class} or the name of a |
| swap set defined by the swap opcodes (@pxref{Swap Opcodes}). Names |
| must contain only letters (a-z and A-Z). The letters may be upper or |
| lower-case but the case matters. Class names may be used in test parts |
| only. Swap names are valid everywhere. |
| |
| @item @{ (left brace) |
| Name: the name of a grouping pair. The left brace indicates that the |
| first (or left) member of the pair is to be used in matching. If this |
| is between replacement brackets it must be the only item. This is also |
| valid in the action part. |
| |
| The brace actions, @code{@{name} and @code{@}name}, refer to named |
| groupings. A grouping is created with the @opcoderef{grouping} and |
| contains exactly two characters which represent the opening character |
| and the matching closing character for a character grouping. The first |
| operand is the grouping name, the second is the two (opening and |
| closing) characters, and the third is the two dot patterns separated |
| by a comma. |
| |
| Let's say that you'd like to define the opening and closing |
| parentheses via multipass rules, and that you'd like to use dots |
| 123478 for the opening parenthesis and dots 145678 for the closing |
| parenthesis. One way to do so is like this: |
| |
| @example |
| grouping parentheses () 123478,145678 |
| noback correct @{parentheses @{parentheses |
| noback correct @}parentheses @}parentheses |
| @end example |
| |
| The references within the test part of the multipass rule match |
| against the characters (the second operand) of the grouping rule, and |
| the references within the action part replace with the dot patterns |
| (the third operand) of the grouping. |
| |
| @item @} (right brace) |
| Name: the name of a grouping pair. The right brace indicates that the |
| second (or right) member is to be used in matching. See the remarks on |
| the left brace immediately above. |
| |
| @item / (slash) |
| Search the input for the expression following the slash and return |
| true if found. This can be used to set a variable. |
| |
| @item _ (underscore) |
| Move backward. If a number follows, move backward that number of |
| characters. The default is to move backward one character. This |
| suboperand is valid only in test parts. The test fails if moving |
| backward beyond the beginning of the input string. |
| |
| @item [ (left bracket) |
| start replacement here. This suboperand must always be paired with a |
| right bracket and is valid only in test parts. Multiple pairs of |
| square brackets in a single expression are not allowed. |
| |
| @item ] (right bracket) |
| end replacement here. This suboperand must always be paired with a |
| left bracket and is valid only in test parts. |
| |
| @item # (number sign or crosshatch) |
| test or set a variable. Variables are referred to by numbers |
| (0 through 49), e.g. @code{#1}, @code{#2}, @code{#25}. |
| Variables may be set by one @code{context} or multipass opcode and tested |
| by another. Thus, an operation that occurs at one place in a translation |
| can tell an operation that occurs later within the same pass about itself. |
| This feature is used in math translation, and may also help to alleviate |
| the need for new opcodes. This suboperand is valid everywhere. |
| |
| Variables are set in the action part. To set a variable, use an |
| expression like @code{#1=1}. All of the variables are initialized to 0 |
| at the start of each pass. |
| |
| Variables can also be incremented and decremented by one in the action |
| part with expressions like @code{#1+} and @code{#3-} respectively. |
| An attempt to decrement a variable below 0 is silently ignored. |
| |
| Variables are tested in the test part with conditional expressions like: |
| @code{#1=2}, @code{#3<4}, @code{#5>6}, @code{#7<=8}, @code{#9>=10}. |
| |
| @item * (asterisk) |
| Copy the input characters or dot patterns within the replacement brackets |
| into the output, and discard anything else that was matched. If there are |
| no replacement brackets then copy all of the matched input. This |
| suboperand is only valid within the action part. It may be specified any |
| number of times. This feature is used, for example, for handling numeric |
| subscripts in Nemeth. |
| |
| @item ? (question mark) |
| Valid only in the action part. The characters to be replaced are |
| simply ignored. That is, they are replaced with nothing. If either |
| member of a grouping pair is in the replace brackets the other member |
| at the same level is also removed. |
| |
| @end table |
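| |
| The rules given above for the variable suboperand (@samp{#}) can be |
| modelled outside of liblouis. The following Python sketch is only an |
| illustration, not liblouis code: the class @code{PassVariables} and |
| its method names are hypothetical, and it assumes one set of 50 |
| variables per pass, as described above. |
| |
| @example |
```python
import operator

# Comparison symbols that may appear in a test part.
COMPARISONS = dict([
    ("=", operator.eq),
    ("<", operator.lt),
    (">", operator.gt),
    ("<=", operator.le),
    (">=", operator.ge),
])


class PassVariables:
    """Hypothetical model of the numbered variables of one pass."""

    def __init__(self):
        # Variables 0 through 49 all start at 0 in each pass.
        self.values = [0] * 50

    def assign(self, number, value):
        # Models an action such as #1=1.
        self.values[number] = value

    def increment(self, number):
        # Models an action such as #1+.
        self.values[number] += 1

    def decrement(self, number):
        # Models an action such as #3-; going below 0 is ignored.
        if self.values[number] > 0:
            self.values[number] -= 1

    def test(self, number, comparison, value):
        # Models a test such as #3<4 or #9>=10.
        return COMPARISONS[comparison](self.values[number], value)
```
| @end example |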
| |
| @anchor{valid attribute characters} |
| The valid characters which can be used in attribute strings are as |
| follows: |
| |
| @table @kbd |
| @item a |
| any attribute |
| @item d |
| digit |
| @item D |
| literary digit |
| @item l |
| letter |
| @item m |
| math |
| @item p |
| punctuation |
| @item S |
| sign |
| @item s |
| space |
| @item U |
| uppercase |
| @item u |
| lowercase |
| @item w |
| first user-defined class |
| @item x |
| second user-defined class |
| @item y |
| third user-defined class |
| @item z |
| fourth user-defined class |
| @end table |
| |
| The following illustrates the algorithm by which text is evaluated |
| with multipass expressions: |
| |
| @noindent |
| Loop over context, pass2, pass3 and pass4 and do the following for each pass: |
| |
| @enumerate a |
| @item |
| Match the text following the cursor against all expressions in the |
| current pass. If an expression has square brackets to indicate the |
| part to be replaced, and the opening bracket would correspond with a |
| position before the cursor, it is not a match. |
| @item |
| If there is no match: shift the cursor one position to the right and |
| continue the loop |
| @item |
| If there are matches: choose the longest match |
| @item |
| Do the replacement. If the expression has square brackets, the part of |
| the input that matches the part in between the brackets is replaced |
| with the right-hand side of the rule. If the expression has no square |
| brackets, the whole match is replaced. |
| @item |
| Place the cursor after the replaced text |
| @item |
| continue loop |
| @end enumerate |
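| |
| The loop above can be sketched in Python. This is a simplified model |
| rather than the liblouis implementation. It assumes rules that are |
| plain strings instead of full multipass expressions, and represents |
| each rule as a hypothetical tuple of prefix, bracketed target, suffix |
| and replacement. It also requires the bracketed part to be non-empty, |
| and lets the first rule win when two matches have the same length. |
| |
| @example |
```python
def apply_pass(text, rules):
    """Apply one pass of literal rules to text and return the result.

    Each rule is a tuple (prefix, target, suffix, replacement). It
    matches when prefix + target + suffix occurs at the cursor; only
    target, the part that would sit between square brackets, is
    replaced.
    """
    cursor = 0
    while cursor < len(text):
        best = None
        for prefix, target, suffix, replacement in rules:
            if not target:
                raise ValueError("this sketch needs a non-empty target")
            if text.startswith(prefix + target + suffix, cursor):
                length = len(prefix) + len(target) + len(suffix)
                # Keep the longest match; the first rule wins a tie.
                if best is None or length > best[0]:
                    best = (length, prefix, target, replacement)
        if best is None:
            # No match: shift the cursor one position to the right.
            cursor += 1
            continue
        _, prefix, target, replacement = best
        start = cursor + len(prefix)
        end = start + len(target)
        # Replace only the bracketed part, then continue after it.
        text = text[:start] + replacement + text[end:]
        cursor = start + len(replacement)
    return text


def translate(text, passes):
    """Run the passes in order, feeding each result into the next."""
    for rules in passes:
        text = apply_pass(text, rules)
    return text
```
| @end example |
| |
| With the single rule @code{("", "cornf", "", "comf")}, the call |
| @code{apply_pass("cornfort", rules)} returns @code{"comfort"}. |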
| |
| Normally, when a rule is applied, the characters in the input that the |
| rule applies to are "consumed", i.e. the position of the input string |
| is stepped forward, and the characters are no longer available for |
| subsequent rules. However, with the multipass opcodes, the |
| @opcoderef{context} and the @opcoderef{correct}, it is |
| possible to make rules which don't consume any characters from the |
| input. This could happen, e.g.@: if you use the @opcoderef{context} |
| to insert a dot pattern before a special group of characters. |
| In these cases, Liblouis will always advance the position by one |
| character to make sure that the program doesn't apply a rule to the |
| same characters again and again. |
| |
| @node The correct Opcode |
| @section The correct Opcode |
| |
| @table @code |
| @opcode{correct, test action} |
| Because some input (such as that from an OCR program) may contain |
| systematic errors, it is sometimes advantageous to use a |
| pre-translation pass to remove them. The errors and their corrections |
| are specified by the @code{correct} opcode. If there are no |
| @code{correct} opcodes in a table, the pre-translation pass is not used. |
| If any back-translation corrections have been specified then they are |
| applied in a post-translation (i.e.@: the very last) pass. |
| |
| Note that like the @opcoderef{context} and multi-pass opcodes, the |
| @code{correct} opcode must be preceded by @opcoderef{noback} or |
| @opcoderef{nofor}. |
| |
| The format of the @code{correct} opcode is very similar to that |
| of the @opcoderef{context}. The only difference is that in the action |
| part strings may be used and dot patterns may not be used. Some |
| examples of @code{correct} opcode entries are: |
| |
| @example |
| noback correct "\\" ? Eliminate backslashes |
| noback correct "cornf" "comf" fix a common "scano" |
| noback correct "cornm" "comm" |
| noback correct "cornp" "comp" |
| noback correct "*" ? Get rid of stray asterisks |
| noback correct "|" ? ditto for vertical bars |
| noback correct "\s?" "?" drop space before question mark |
| @end example |
| |
| @end table |
| |
| @node The match Opcode |
| @section The match Opcode |
| |
| The match opcode is similar to the multipass opcodes and can be seen as |
| the more low-level and powerful cousin to the @opcoderef{context}. |
| |
| @strong{Note:} For historical reasons, both the @opcoderef{context} |
| and the @opcoderef{match} exist and are in use in modern braille |
| tables, despite being fairly similar in syntax and functionality. In |
| the future they might be merged under some common opcode. For that |
| reason, consider the match opcode @emph{somewhat experimental}. |
| |
| @table @code |
| @opcode{match, pre-pattern characters post-pattern dots} |
| |
| This opcode allows for matching a string of characters via @emph{pre} |
| and @emph{post patterns}. The patterns are specified using an |
| expression syntax somewhat like regular expressions (@pxref{pattern |
| expression syntax}). A single hyphen (@samp{-}) by itself means no |
| pattern is specified. |
| |
| The following will replace @samp{xyz} with the dots |
| @samp{1346-13456-1356} when it appears in the string @samp{abxyzcd}. |
| |
| @example |
| match ab xyz cd 1346-13456-1356 |
| @end example |
| |
| The following will replace @samp{ONE} with @samp{3456-1} when it |
| starts the input and is followed by @samp{:} |
| |
| @example |
| match ^ ONE : 3456-1 |
| @end example |
| @end table |
| |
| @anchor{pattern expression syntax} |
| The @code{pre-pattern} and the @code{post-pattern} can contain |
| any of the following expressions: |
| |
| @table @samp |
| @item [ ] |
| Expression can be any of the characters between the brackets. If only |
| one character is present then the brackets are not needed unless it is |
| a special character, in which case it should be escaped with a |
| backslash. |
| |
| @item . |
| Expression can be any character. |
| |
| @item %[ ] |
| Expression is a character with the attributes listed between the |
| brackets. If only one character is present then the brackets are not |
| needed. The set of attributes are specified as follows: |
| |
| @table @samp |
| @item _ |
| space |
| @item # |
| digit |
| @item a |
| letter |
| @item u |
| uppercase |
| @item l |
| lowercase |
| @item . |
| punctuation |
| @item $ |
| sign |
| @end table |
| |
| @item ^ |
| Match at the end of input processing (or at the beginning, depending |
| on the direction, pre or post). |
| |
| @item $ |
| Same as @samp{^}. |
| @end table |
| |
| For example the following will replace @samp{bb} with the dots @samp{23} when it |
| is between letters. |
| |
| @example |
| match %a bb %a 23 |
| @end example |
| |
| The following will replace @samp{con} with the dots @samp{25} when it |
| is preceded by a space or beginning of input, and followed by an |
| @samp{s} and then any letter. |
| |
| @example |
| match %[^_] con s%a 25 |
| @end example |
| |
| Similar to regular expressions the pattern expressions can contain |
| grouping, quantifiers and even negation: |
| |
| @table @samp |
| @item ( ) |
| Expressions between parentheses are grouped together as one |
| expression. |
| |
| @item ! |
| The following expression is negated. |
| |
| @item ? |
| The previous expression must match zero or one times. |
| |
| @item * |
| The previous expression must match zero or more times. |
| |
| @item + |
| The previous expression must match one or more times. |
| |
| @item | |
| Either the previous or the following expressions must match. |
| @end table |
| |
| For example the following will replace @samp{ing} with the dots |
| @samp{346} when it is @emph{not} preceded by a space or beginning of |
| input. What follows after the @samp{ing} does not matter, hence the |
| @samp{-}. |
| |
| @example |
| match !%[^_] ing - 346 |
| @end example |
| |
| The following will replace @samp{con} with the dots @samp{25} when it |
| is preceded by a space, or beginning of input; then followed by a |
| @samp{c} that is followed by any character but @samp{h}. |
| |
| @example |
| match %[^_] con c!h 25 |
| @end example |
| |
| @node Miscellaneous Opcodes |
| @section Miscellaneous Opcodes |
| |
| @table @code |
| @opcode{include, filename} |
| Read the file indicated by @code{filename} and incorporate or include |
| its entries into the table. Included files can include other files, |
| which can include other files, etc. For an example, see what files are |
| included by the entry @code{include en-us-g1.ctb} in the table |
| @file{en-us-g2.ctb}. If the included file is not in the same directory |
| as the main table, use a full path name for @code{filename}. |
| |
| @opcode{undefined, dots} |
| If this opcode is used in a table any characters which have not been |
| handled in the table but are encountered in the text will be replaced |
| by the dot pattern. If this opcode is not used, any undefined |
| characters are replaced by @code{'\xhhhh'}, where the h's are |
| hexadecimal digits. |
| |
| @opcode{display, character dots} |
| Associates dot patterns with the characters which will be sent to a |
| braille embosser, display or screen font. The character must be in the |
| range 0-255 and the dots must specify a single cell. Here are some |
| examples: |
| |
| @example |
| # When the character a is sent to the embosser or display, |
| # it will produce a dot 1. |
| display a 1 |
| @end example |
| |
| @example |
| # When the character L is sent to the display or embosser |
| # it will produce dots 1-2-3. |
| display L 123 |
| @end example |
| |
| The @code{display} opcode is optional. It is used when the embosser or |
| display has a different mapping of characters to dot patterns than |
| that given in @ref{Character-Definition Opcodes}. If used, display |
| entries must precede character-definition entries. |
| |
| A possible use case would be to define display opcodes so that the |
| result is Unicode braille for use on a display and a second set of |
| display opcodes (in a different file) to produce plain ASCII braille |
| for use with an embosser. |
| |
| @opcode{multind, dots opcode opcode ...} |
| The @code{multind} opcode tells the back-translator that a sequence of |
| braille cells represents more than one braille indicator. For example, |
| in @file{en-us-g2.ctb} we have @code{multind 56-6 letsign capsletter}. |
| The back-translator can generally handle single braille indicators, |
| but it cannot apply them when they immediately follow each other. It |
| recognizes the letter sign if it is followed by a letter and takes |
| appropriate action. It also recognizes the capital sign if it is |
| followed by a letter. But when there is a letter sign followed by a |
| capital sign it fails to recognize the letter sign unless the sequence |
| has been defined with @code{multind}. A @code{multind} entry may not |
| contain a comment because liblouis would attempt to interpret it as an |
| opcode. |
| |
| @end table |
| |
| @node Notes on Back-Translation |
| @chapter Notes on Back-Translation |
| |
| @anchor{General Notes} |
| @section General Notes |
| |
| Back-translation refers to the process of translating backwards, |
| i.e.@: from Braille to text. For many years, Liblouis was mainly |
| concerned with forward translation, and so were most of the authors of |
| the translation tables. Today however, Liblouis is being used |
| extensively in conjunction with screen reading programs like NVDA and |
| JAWS for Windows as well as Braille note-takers like BrailleSense from |
| HIMS and BrailleNote from HumanWare. So when writing a translation |
| table for Liblouis, it is indeed relevant to consider how the table |
| will work when used for back-translation, if anything special must be |
| done, or if you want to write separate tables for forward translation |
| and back-translation. |
| |
| Back-translation is generally harder to do in a computer program than |
| forward translation. Ideally, any text could be translated to Braille |
| and then translated back to text giving exactly the same result as the |
| original. However, many Braille codes omit a lot of information and |
| leave it to the reader to fill in the missing bits. An example of this |
| is letters with accents. In languages where accents are uncommon, |
| e.g.@: English, accented letters are usually just marked with a Braille |
| indicator stating that there is an accent, but not which accent, even |
| though this may be crucial to the meaning of the word or the sentence. |
| Another example of this is when not all capital letters are marked in |
| the Braille code, but only the "important" capital letters. A third |
| example is when a Braille character serves as both a punctuation sign, |
| a math sign, and perhaps even as a contraction, and the Braille code |
| then leaves it up to the reader to use his/her knowledge of the context |
| to decide the meaning of the Braille character. |
| |
| In some cases, you may need to bend the rules of the Braille code if it |
| is important to create Braille that can be properly back-translated. |
| This may include marking all capital letters instead of just the |
| "important" ones, or perhaps marking a Braille character with an |
| indicator stating that this character should in fact be interpreted as |
| a math sign and not a punctuation or Braille contraction. In some |
| cases, the best solution may be to create two separate sets of tables |
| for forward translation: One set for Braille that must be |
| back-translatable (for use with screen readers and note-takers), and |
| another for good and nice literary Braille (for embossing). |
| But no matter how you bend the Braille code, the back-translation |
| process may not be perfect. |
| |
| @anchor{Back-translation with Liblouis} |
| @section Back-translation with Liblouis |
| |
| Back-translation is carried out by the function |
| @code{lou_backTranslateString}. Its calling sequence is described in |
| @ref{Programming with liblouis}. @code{lou_backTranslateString} first |
| performs @code{pass4}, if |
| present, then @code{pass3}, then @code{pass2}, then the |
| backtranslation, then corrections. Note that this is exactly the |
| inverse of forward translation. |
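| |
| The order of the stages can be written down as data. The short Python |
| sketch below is only an illustration; the stage names are labels |
| chosen here, not liblouis identifiers. |
| |
| @example |
```python
# Order of the stages during back-translation, as listed above.
BACKWARD_STAGES = ["pass4", "pass3", "pass2", "translation", "correct"]

# Forward translation runs the same stages in the opposite order.
FORWARD_STAGES = list(reversed(BACKWARD_STAGES))
```
| @end example |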
| |
| Most opcodes can be preceded by @opcoderef{noback} or @opcoderef{nofor}, |
| and the @code{correct}, @code{context} and multi-pass opcodes must be |
| preceded with either @code{noback} or @code{nofor}. So in most cases, |
| it will be perfectly possible to make one table for translation in both |
| directions, although a separate table for forward and backward |
| translation might be more readable in some cases. |
| |
| Most of the opcodes associated with pass 1 have two operands, a |
| character operand to the left and a dots operand to the right. During |
| forward translation, these operands are used to replace the characters |
| with the dot pattern according to the conditions of the opcode. The |
| opcode works from left to right. When back-translating, these opcodes |
| work the opposite way. The dot patterns are replaced by the text. The |
| opcodes work from right to left. |
| |
| On the other hand, the @code{correct}, @code{context} and multi-pass |
| opcodes have a test part to the left and an action part to the right. |
| These opcodes work from left to right in both translation directions. |
| The test is performed, and if true, the action is executed, i.e.@: |
| replacing, inserting or deleting characters or dots. This is why a |
| translation direction always has to be specified with these opcodes |
| using @code{noback} or @code{nofor}. |
| |
| @node Table Metadata |
| @chapter Table Metadata |
| |
| Translation tables may contain metadata. This makes them |
| discoverable. Programs may for example use the Liblouis function |
| @ref{lou_findTable,@code{lou_findTable}} to find a table based on a |
| special query of which the @ref{Query Syntax,syntax} is described |
| below. |
| |
| @section Syntax |
| |
| Metadata must be defined in special comments within the table |
| header. The table header is the area at the top of the file, before |
| the first translation rule, consisting of only comments or empty |
| lines. Any metadata within included tables is ignored. |
| |
| A metadata field must be defined on its own line, starting with |
| @code{#+}. It has the following syntax: |
| |
| @example |
| #+<key>: <value> |
| @end example |
| |
| where @samp{<key>} and @samp{<value>} are sequences of one or more |
| characters @code{a} to @code{z}, @code{A} to @code{Z}, @code{0} to |
| @code{9}, @code{.}, @code{-}, and @code{_}. The colon that separates |
| the key and value may have zero or more spaces or tabs on either side. |
| |
| A value is optional. In case of no value the colon must be omitted as |
| well: |
| |
| @example |
| #+<key> |
| @end example |
| |
| There is no restriction on which keys and values are allowed, as long |
| as the syntax is correct. However in order to be really useful there |
| must be some standard keys and values. A possible grammar is proposed |
| on the wiki page |
| @url{https://github.com/liblouis/liblouis/wiki/Table-discovery-based-on-table-metadata#standard-metadata-tags, Standard metadata tags}. |
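| |
| A table header can be scanned for metadata fields with a few lines of |
| Python. This sketch is an illustration, not the parser used by |
| liblouis. The function name @code{read_metadata} is hypothetical. It |
| treats any non-empty line that does not start with @samp{#} as the |
| first translation rule, and it tolerates trailing spaces or tabs, |
| which the syntax above does not mention. |
| |
| @example |
```python
import re

# One metadata line: "#+key" or "#+key: value", with optional spaces
# or tabs around the colon.
FIELD = re.compile(
    r"^#\+([A-Za-z0-9._-]+)(?:[ \t]*:[ \t]*([A-Za-z0-9._-]+))?[ \t]*$"
)


def read_metadata(lines):
    """Return (key, value) pairs found in the table header.

    The value is None for a field without a value.
    """
    fields = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            # First translation rule: the header ends here.
            break
        found = FIELD.match(line)
        if found:
            fields.append((found.group(1), found.group(2)))
    return fields
```
| @end example |
| |
| Metadata in included tables is ignored, so the sketch never follows |
| @code{include} entries. |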
| |
| @anchor{Query Syntax} |
| @section Query Syntax |
| |
| A query that is passed to the @ref{lou_findTable,@code{lou_findTable}} |
| function must have the following syntax: |
| |
| @example |
| <feature1> <feature2> <feature3> ... |
| @end example |
| |
| where @samp{<feature>} is either: |
| |
| @example |
| <key>:<value> |
| @end example |
| |
| or: |
| |
| @example |
| <key> |
| @end example |
| |
| Features are separated by one or more spaces or tabs. No spaces are |
| allowed around colons. |
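| |
| Splitting a query into its features is straightforward. The Python |
| sketch below illustrates the syntax only; @code{parse_query} is a |
| hypothetical helper and does not show how @code{lou_findTable} uses |
| the features. |
| |
| @example |
```python
def parse_query(query):
    """Split a query into (key, value) features; value may be None."""
    features = []
    # Features are separated by one or more spaces or tabs.
    for feature in query.split():
        key, colon, value = feature.partition(":")
        # A feature without a colon consists of a key only.
        features.append((key, value if colon else None))
    return features
```
| @end example |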
| |
| @node Testing Translation Tables interactively |
| @chapter Testing Translation Tables interactively |
| |
| A number of test programs are provided as part of the liblouis |
| package. They are intended for testing liblouis and for debugging |
| tables. None of them is suitable for braille transcription. An |
| application that can be used for transcription is @command{file2brl}, |
| which is part of the liblouisutdml package (@pxref{Top, , Introduction, |
| liblouisutdml, Liblouisutdml User's and Programmer's Manual}). The source |
| code of the test programs can be studied to learn how to use the |
| liblouis library and they can be used to perform the following |
| functions. |
| |
| @anchor{common options} |
| All of these programs recognize the @option{--help} and |
| @option{--version} options. |
| |
| @table @option |
| |
| @item --help |
| @itemx -h |
| Print a usage message listing all available options, then exit |
| successfully. |
| |
| @item --version |
| @itemx -v |
| Print the version number, then exit successfully. |
| |
| @end table |
| |
| Most test programs let you specify one or multiple tables to use. |
| These tables are usually found in standard locations in the file |
| system or local to where the command is executed. @xref{How tables are |
| found}, for a description of how the tables are located. |
| |
| @menu |
| * lou_debug:: |
| * lou_trace:: |
| * lou_checktable:: |
| * lou_allround:: |
| * lou_translate (program):: |
| * lou_checkhyphens:: |
| * lou_checkyaml:: |
| @end menu |
| |
| @node lou_debug |
| @section lou_debug |
| @pindex lou_debug |
| |
| The @command{lou_debug} tool is intended for debugging liblouis |
| translation tables. The command line for @command{lou_debug} is: |
| |
| @example |
| lou_debug [OPTIONS] TABLE[,TABLE,...] |
| @end example |
| |
| The command line options that are accepted by @command{lou_debug} are |
| described in @ref{common options}. |
| |
| The table (or comma-separated list of tables) is compiled. If no |
| errors are found a brief command summary is printed, then the prompt |
| @samp{Command:}. You can then input one of the command letters and get |
| output, as described below. |
| |
| Most of the commands print information in the various arrays of |
| @code{TranslationTableHeader}. Since these arrays are pointers to |
| chains of hashed items, the commands first print the hash number, then |
| the first item, then the next item chained to it, and so on. After |
| each item there is a prompt indicated by @samp{=>}. You can then press |
| enter (@kbd{@key{RET}}) to see the next item in the chain or the first |
| item in the next chain. Or you can press @kbd{h} (for next-(h)ash) to |
| skip to the next hash chain. You can also press @kbd{e} to exit the |
| command and go back to the @samp{command:} prompt. |
| |
| @table @kbd |
| @item h |
| Brings up a screen of somewhat more extensive help. |
| |
| @item f |
| Display the first forward-translation rule in the first non-empty hash |
| bucket. The number of the bucket is displayed at the beginning of the |
| chain. Each rule is identified by the word @samp{Rule:}. The fields |
| are displayed by phrases consisting of the name of the field, an equal |
| sign, and its value. The before and after fields are displayed only if |
| they are nonzero. Special opcodes such as the @opcoderef{correct} and |
| the multipass opcodes are shown with the code that instructs the |
| virtual machine that interprets them. If you want to see only the |
| rules for a particular character string you can type @kbd{p} at the |
| @samp{command:} prompt. This will take you to the @samp{particular:} |
| prompt, where you can press @kbd{f} and then type in the string. The |
| whole hash chain containing the string will be displayed. |
| |
| @item b |
| Display back-translation rules. This display is very similar to that |
| of forward translation rules except that the dot pattern is displayed |
| before the character string. |
| |
| @item c |
| Display character definitions, again within their hash chains. |
| |
| @item d |
| Displays single-cell dot definitions. If a character-definition opcode |
| gives a multi-cell dot pattern, it is displayed among the |
| back-translation rules. |
| |
| @item C |
| Display the character-to-dots map. This is set up by the |
| character-definition opcodes and can also be influenced by the |
| @opcoderef{display}. |
| |
| @item D |
| Display the dot to character map, which shows which single-cell dot |
| patterns map to which characters. |
| |
| @item z |
| Show the multi-cell dot patterns which have been assigned to the |
| characters from 0 to 255 to comply with computer braille codes such as |
| a 6-dot code. Note that the character-definition opcodes should use |
| 8-dot computer braille. |
| |
| @item p |
| Bring up a secondary (@samp{particular:}) prompt from which you can |
| examine particular character strings, dot patterns, etc. The commands |
| (given in its own command summary) are very similar to those of the |
| main @samp{command:} prompt, but you can type a character string or |
| dot pattern. They include @kbd{h}, @kbd{f}, @kbd{b}, @kbd{c}, @kbd{d}, |
| @kbd{C}, @kbd{D}, @kbd{z} and @kbd{x} (to exit this prompt), but not |
| @kbd{p}, @kbd{i} and @kbd{m}. |
| |
| @item i |
| Show braille indicators. This shows the dot patterns for various |
| opcodes such as the @opcoderef{capsletter} and the @opcoderef{numsign}. |
| It also shows emphasis dot patterns, such as those for the |
| @opcoderef{begemphword}, the @opcoderef{begemphphrase}, etc. If a |
| given opcode has not been used nothing is printed for it. |
| |
| @item m |
| Display various miscellaneous information about the table, such as the |
| number of passes, whether certain opcodes have been used, and whether |
| there is a hyphenation table. |
| |
| @item q |
| Exit the program. |
| @end table |
| |
| @node lou_trace |
| @section lou_trace |
| @pindex lou_trace |
| |
| When working on translation tables it is sometimes useful to determine |
| what rules were applied when translating a string. @command{lou_trace} |
| helps with exactly that. It lists all the applied rules for a given |
| translation table and an input string. |
| |
| @example |
| lou_trace [OPTIONS] TABLE[,TABLE,...] |
| @end example |
| |
| Aside from the standard options (@pxref{common options}) |
| @command{lou_trace} also accepts the following options: |
| |
| @table @option |
| |
| @item --forward |
| @itemx -f |
| Trace a forward translation. |
| |
| @item --backward |
| @itemx -b |
| Trace a backward translation. |
| |
| @end table |
| |
| If no options are given forward translation is assumed. |
| |
| Once started you can type an input string followed by @kbd{@key{RET}}. |
| @command{lou_trace} will print the braille translation followed by a |
| list of rules that were applied to produce the translation. A possible |
| invocation is listed in the following example: |
| |
| @example |
| $ lou_trace tables/en-us-g2.ctb |
| the u.s. postal service |
| ! u4s4 po/al s@}vice |
| 1. largesign the 2346 |
| 2. repeated 0 |
| 3. lowercase u 136 |
| 4. punctuation . 46 |
| 5. context _$l["."]$l @@256 |
| 6. lowercase s 234 |
| 7. postpunc . 256 |
| 8. repeated 0 |
| 9. begword post 1234-135-34 |
| 10. largesign a 1 |
| 11. lowercase l 123 |
| 12. repeated 0 |
| 13. lowercase s 234 |
| 14. always er 12456 |
| 15. lowercase v 1236 |
| 16. lowercase i 24 |
| 17. lowercase c 14 |
| 18. lowercase e 15 |
| 19. pass2 $s1-10 @@0 |
| 20. pass2 $s1-10 @@0 |
| 21. pass2 $s1-10 @@0 |
| @end example |
| |
| @node lou_checktable |
| @section lou_checktable |
| @pindex lou_checktable |
| |
| To use this program type the following: |
| |
| @example |
| lou_checktable [OPTIONS] TABLE |
| @end example |
| |
| Aside from the standard options (@pxref{common options}) |
| @command{lou_checktable} also accepts the following options: |
| |
| @table @option |
| |
| @item --quiet |
| @itemx -q |
| Do not write to standard error if there are no errors. |
| |
| @end table |
| |
| If the table contains errors, appropriate messages will be displayed. |
| If there are no errors the message @samp{no errors found.} will be |
| shown. |
| |
| @node lou_allround |
| @section lou_allround |
| @pindex lou_allround |
| |
| This program tests every capability of the liblouis library. It is |
| completely interactive. Invoke it as follows: |
| |
| @example |
| lou_allround [OPTIONS] |
| @end example |
| |
| The command line options that are accepted by @command{lou_allround} |
| are described in @ref{common options}. |
| |
| You will see a few lines telling you how to use the program. Pressing |
| one of the letters in parentheses and then enter will take you to a |
| message asking for more information or for the answer to a yes/no |
| question. Typing the letter @samp{r} and then @key{RET} will take you |
| to a screen where you can enter a line to be processed by the library |
| and then view the results. |
| |
| @node lou_translate (program) |
| @section lou_translate |
| @pindex lou_translate |
| |
| This program translates whatever is on the standard input unit and |
| prints it on the standard output unit. It is intended for large-scale |
| testing of the accuracy of translation and back-translation. The |
| command line for @command{lou_translate} is: |
| |
| @example |
| lou_translate [OPTIONS] TABLE[,TABLE,...] |
| @end example |
| |
| Aside from the standard options (@pxref{common options}) this program |
| also accepts the following options: |
| |
| @table @option |
| |
| @item --forward |
| @itemx -f |
| Do a forward translation. |
| |
| @item --backward |
| @itemx -b |
| Do a backward translation. |
| |
| @end table |
| |
| If no options are given forward translation is assumed. |
| |
| Use the following command to do a forward translation with translation |
| table @file{en-us-g2.ctb}. The resulting braille is ASCII encoded (as |
| defined in @file{en-us-g2.ctb}). |
| |
| @example |
| lou_translate --forward en-us-g2.ctb < input.txt |
| @end example |
| |
| The next example illustrates a forward translation with translation |
| table @file{en-us-g2.ctb} and display table @file{unicode.dis}. The |
| resulting braille is encoded as Unicode dot patterns (as defined in |
| @file{unicode.dis}). |
| |
| @example |
| lou_translate --forward unicode.dis,en-us-g2.ctb < input.txt |
| @end example |
| |
| Use a pipe if you would rather just pass some given text to the |
| translator. |
| |
| @example |
| echo "The quick brown fox jumps over the lazy dog" | lou_translate -f unicode.dis,en-us-g2.ctb |
| @end example |
| |
| The result will be written to standard output: |
| |
| @example |
| ⠠⠮ ⠟⠅ ⠃⠗⠪⠝ ⠋⠕⠭ ⠚⠥⠍⠏⠎ ⠕⠧⠻ ⠮ ⠇⠁⠵⠽ ⠙⠕⠛ |
| @end example |
| |
| Backward translation can be done as follows: |
| |
| @example |
| echo ",! qk br@{n fox jumps ov@} ! lazy dog" | lou_translate --backward en-us-g2.ctb |
| @end example |
| |
| which results in |
| |
| @example |
| The quick brown fox jumps over the lazy dog |
| @end example |
| |
| You can also do a backward translation using Unicode dot patterns |
| |
| @example |
| echo "⠠⠮ ⠟⠅ ⠃⠗⠪⠝ ⠋⠕⠭" | lou_translate --backward unicode.dis,en-us-g2.ctb |
| @end example |
| |
| resulting in |
| |
| @example |
| The quick brown fox |
| @end example |
| |
| @node lou_checkhyphens |
| @section lou_checkhyphens |
| @pindex lou_checkhyphens |
| |
| This program checks the accuracy of hyphenation in Braille translation |
| for both translated and untranslated words. It is completely |
| interactive. Invoke it as follows: |
| |
| @example |
| lou_checkhyphens [OPTIONS] |
| @end example |
| |
| The command line options that are accepted by |
| @command{lou_checkhyphens} are described in @ref{common options}. |
| |
| You will see a few lines telling you how to use the program. |
| |
| @node lou_checkyaml |
| @section lou_checkyaml |
| @pindex lou_checkyaml |
| |
| This program tests a liblouis table against a corpus of known good |
| Braille translations defined in YAML format. For a description of the |
| format refer to @ref{YAML Tests}. The program returns 0 if all tests |
| pass or 1 if any of the tests fail. If @code{libyaml} is not installed |
| the program will simply skip all tests. Invoke it as follows: |
| |
| @example |
| lou_checkyaml YAML_TEST_FILE |
| @end example |
| |
| The command line options that are accepted by |
| @command{lou_checkyaml} are described in @ref{common options}. |
| |
| @cindex Running YAML tests manually |
| @cindex Running individual YAML tests |
| Due to some technical limitations the YAML tests work best if |
| @env{LOUIS_TABLEPATH} is set up correctly. Running @command{make} |
| takes care of this for you. You can also run individual YAML tests |
| as shown in the following example: |
| |
| @example |
| cd tests |
| make check TESTS=yaml/en-ueb-g2_backward.yaml |
| @end example |
| |
| @node Automated Testing of Translation Tables |
| @chapter Automated Testing of Translation Tables |
| |
| There are a number of automated tests for liblouis and they are |
| proving to be of tremendous value. When changing the code the |
| developers can run the tests to see if anything broke. |
| |
| The easiest way to test the translation tables is to write a YAML file |
| where you define the table that is to be tested and any number of |
| words or phrases to translate together with their respective expected |
| translation. |
| |
| The YAML tests are data driven, i.e.@: you provide the test data: a |
| string to translate and the expected output. The data is in a standard |
| format, namely YAML. If you have @file{libyaml} installed the tests |
| will automatically be invoked as part of the standard |
| @command{make check} command. |
| |
| @anchor{YAML Tests} |
| @section YAML Tests |
| @url{http://yaml.org/,YAML} is a human readable data serialization |
| format that allows for an easy and compact way to define tests. |
| |
| A YAML file first defines which tables are to be used for the tests. |
| Then it optionally defines flags such as the @samp{testmode}. Finally |
| all the tests are defined. |
| |
| You can repeat the cycle as many times as you like (tables, optional |
| flags, tests). You can also define several rounds of tests for any |
| table, with or without the optional flags. Just remember that the |
| flags are reset to their default values each time you start a new |
| round of tests or load a new set of tables. |
| |
| Let's look at a simple example of how tests can be defined: |
| |
| @iftex |
| @emph{(For technical reasons the Unicode braille in the expected |
| translation in the following YAML examples is not displayed correctly. |
| Please refer to the example YAML file @file{example_test.yaml} in the |
| @file{tests} directory of the source distribution or read these |
| examples in another version of the documentation such as HTML)} |
| @end iftex |
| |
| @example |
| # comments start with '#' anywhere on a line |
| # first define which tables will be used for your tests |
| table: [unicode.dis, en-ueb-g1.ctb] |
| |
| # then optionally define flags such as testmode. If no flags are |
| # defined forward translation is assumed |
| |
| # now define the tests |
| tests: |
|   - # each test is a list. |
|     # The first item is the string to translate. Quoting of strings is |
|     # optional |
|     - hello |
|     # The second item is the expected translation |
|     - ⠓⠑⠇⠇⠕ |
|   - # optionally you can define additional parameters in a third |
|     # item such as typeform or expected failure, etc |
|     - Hello |
|     - ⠨⠶⠠⠓⠑⠇⠇⠕⠨⠄ |
|     - @{typeform: @{italic: '++++ '@}, xfail: true@} |
|   - # a simple, no-frills test |
|     - Good bye |
|     - ⠠⠛⠕⠕⠙ ⠃⠽⠑ |
|   # same as above using "flow style" notation |
|   - [Good bye, ⠠⠛⠕⠕⠙ ⠃⠽⠑] |
| @end example |
| |
| The four basic components of a test file are as follows: |
| |
| @table @samp |
| @item table |
| A list containing table files, which the tests should be run against. |
| This is usually just one file, but for some situations more than one |
| file can be required. For example: |
| |
| @example |
| table: [hu-hu-g1.ctb, hyph_hu_HU.dic] |
| @end example |
| |
| It is also possible to specify a table inline. @ref{Inline definition |
| of tables} below explains how to do this. |
| |
| A third way to specify a table is by its metadata. A table query, |
| which is essentially a list of ``features'', is matched against the |
| @ref{Table Metadata,table metadata} defined inside the tables |
| contained in @env{LOUIS_TABLEPATH}. Only the best match is used for |
| the test. |
| |
| The syntax of the query is a variation of the @ref{Query |
| Syntax,syntax} used for the @ref{lou_findTable,@code{lou_findTable}} |
| function: |
| |
| @example |
| table: |
|   locale: fr |
|   grade: 1 |
| @end example |
| |
| @item display |
| A display table, which should be used to encode braille in the |
| test. This item is optional. If it is present it should be the first |
| item of the file. If it is not present, the braille encoding of each |
| test is determined by the table that is being tested. |
| |
| The next example shows how to test the @file{en-ueb-g1.ctb} table |
| using ASCII notation (as defined in @file{en-ueb-g1.ctb} itself): |
| |
| @example |
| table: [en-ueb-g1.ctb] |
| @end example |
| |
| If you wanted to test the @file{en-ueb-g1.ctb} table using Unicode dot |
| patterns then you would use the following definition: |
| |
| @example |
| display: unicode.dis |
| table: [en-ueb-g1.ctb] |
| @end example |
| |
| @item flags |
| The flags that apply to all tests in this file. At the moment only |
| the @samp{testmode} flag is supported. It can have five possible |
| values: |
| |
| @table @samp |
| @item forward |
| This indicates that the tests are for forward translation |
| @item backward |
| This indicates that the tests are for backward translation |
| @item bothDirections |
| This indicates that the tests are for both forward and backward translation. |
| @item hyphenate |
| This indicates that the tests are for hyphenation |
| @item hyphenateBraille |
| This indicates that the tests are for hyphenation and the input is braille |
| @end table |
| |
| If no flags are defined forward translation is assumed. |
| |
| @item tests |
| A list of tests. Each test consists of a list of two, three or in some |
| cases even four items. The first item is the Unicode text to be |
| tested. The second item is the expected braille output. This can be |
| either Unicode braille or an ASCII-braille-like encoding. Quoting |
| strings is optional. Comments can be inserted almost anywhere using |
| the @samp{#} sign. A simple test would look as follows: |
| |
| @example |
| - # a simple, no-frills test |
|   - Good bye |
|   - ⠠⠛⠕⠕⠙ ⠃⠽⠑ |
| @end example |
| |
| Using the more compact ``flow style'' notation it would look like the |
| following: |
| |
| @example |
| - [Good bye, ⠠⠛⠕⠕⠙ ⠃⠽⠑] |
| @end example |
| |
| An optional third item can contain additional options for a test such |
| as the typeform, or whether a test is expected to fail. The following |
| shows a typical example: |
| |
| @example |
| - |
|   - Hello |
|   - ⠨⠶⠠⠓⠑⠇⠇⠕⠨⠄ |
|   - @{typeform: @{italic: '++++ '@}, xfail: true@} |
| # same test more compact |
| - [Hello, ⠨⠶⠠⠓⠑⠇⠇⠕⠨⠄, @{typeform: @{italic: '++++ '@}, xfail: true@}] |
| @end example |
| |
| The valid additional options for a test are as follows: |
| |
| @table @samp |
| @item xfail |
| Whether a test is expected to fail. If you expect a test to fail, set |
| this to @samp{true}. If you prefer you can also specify a reason for |
| the failure: |
| |
| @example |
| - [Hello, ⠨, @{xfail: Test case is not complete@}] |
| @end example |
| |
| If you expect a test case to pass then just don't mark it with |
| @samp{xfail} or if you really have to, set @samp{xfail} to |
| @samp{false} or @samp{off}. |
| |
| @item typeform |
| The typeform used for a translation. It consists of one or more |
| emphasis specifications. For each character in the specifications that |
| is not a space the corresponding emphasis will be set. Valid options |
| for emphasis are @samp{italic}, @samp{underline}, @samp{bold}, |
| @samp{computer_braille}, @samp{passage_break}, @samp{word_reset}, |
| @samp{script}, @samp{trans_note}, @samp{trans_note_1}, |
| @samp{trans_note_2}, @samp{trans_note_3}, @samp{trans_note_4} or |
| @samp{trans_note_5}. The following shows an example where both |
| @samp{italic} and @samp{underline} are specified: |
| |
| @example |
| - |
|   - Hello |
|   - ⠨⠶⠠⠓⠑⠇⠇⠕⠨⠄ |
|   - typeform: |
|       italic:    '++++ ' |
|       underline: '    +' |
| @end example |
| |
| @item inputPos |
| A list of 0-based input positions, one for each output position. |
| Useful when simulating screen reader interaction, to debug contraction |
| and cursor behavior as in the following example. Note that all |
| positions in this and the following examples start at 0. Also note |
| that in these examples the additional options are not passed using the |
| ``flow style'' notation. |
| |
| @example |
| - |
|   - went |
|   - ⠺⠢⠞ |
|   - inputPos: [0,1,3] |
| @end example |
| |
| @item outputPos |
| A list of 0-based output positions, one for each input position. Useful when |
| simulating screen reader interaction, to debug contraction and cursor |
| behavior as in the following example. |
| |
| @example |
| - |
|   - went |
|   - ⠺⠢⠞ |
|   - outputPos: [0,1,1,2] |
| @end example |
| |
| @item cursorPos |
| The cursor position for the given translation and, optionally, the |
| position where the cursor is expected to be after the translation. |
| Useful when simulating screen reader interaction, to debug contraction |
| and cursor behavior. |
| |
| The cursor position can take two forms: You can either specify a |
| single number or alternatively you can give a tuple of two numbers. |
| |
| @table @asis |
| |
| @item single number (e.g. @samp{4}) |
| When you simply want to specify the cursor position for the given |
| translation you pass a number as in the following example: |
| |
| @example |
| - |
|   - you went to |
|   - ⠽ ⠺⠑⠝⠞ ⠞⠕ |
|   - mode: [compbrlAtCursor] |
|     cursorPos: 4 |
| @end example |
| |
| @item a tuple (e.g. @samp{[4,2]}) |
| When you expect the cursor to be in a particular position after the |
| translation and you want to check this then pass a tuple of cursor |
| positions as in the following example: |
| |
| @example |
| - |
|   - you went to |
|   - ⠽ ⠺⠑⠝⠞ ⠞⠕ |
|   - mode: [compbrlAtCursor] |
|     cursorPos: [4,2] |
| @end example |
| @end table |
| |
| @item mode |
| A list of translation modes that should be used for this test. If not |
| defined, the mode defaults to 0. Valid mode values are @samp{noContractions}, |
| @samp{compbrlAtCursor}, @samp{dotsIO}, @samp{compbrlLeftCursor}, |
| @samp{ucBrl}, @samp{noUndefined} or @samp{partialTrans}. |
| |
| For a description of the various translation mode flags, please see |
| the function @ref{lou_translateString}. |
| |
| @item maxOutputLength |
| Define a maximum length of the output. This can be used to test the |
| behavior of liblouis in the face of a limited output buffer, for |
| example the length of the refreshable braille display. |
| |
| @end table |
| |
| @end table |
| |
| @subsection Optional test description |
| When a test contains three or four items the first item is assumed to |
| be a test description, the second item is the Unicode text to be |
| tested and the third item is the expected braille output. Again an |
| optional fourth item can contain additional options for the test. The |
| following shows an example: |
| |
| @example |
| - |
|   - Number-text-transitions with italic |
|   - 123abc |
|   - ⠼⠁⠃⠉⠨⠶⠰⠁⠃⠉⠨⠄ |
|   - @{typeform: '000111'@} |
| @end example |
| |
| In case the test fails the description will be printed together with |
| the expected and the actual braille output. |
| |
| For more examples and inspiration please see the YAML tests |
| (@file{*.yaml}) in the @file{tests} directory of the source |
| distribution. |
| |
| @subsection Testing multiple tables within the same YAML test file |
| Sometimes you are more focused on testing a particular feature across |
| several tables rather than just testing one table. For that reason the |
| following is also allowed: |
| |
| @example |
| table: ... |
| tests: |
| - [..., ...] |
| - [..., ...] |
| table: ... |
| tests: |
| - [..., ...] |
| - [..., ...] |
| @end example |
| |
| If you specify flags for the tests, remember that the flags are reset |
| to their default values when you specify a new table. |
| |
| @subsection Multiple test sections for each table |
| You can specify several sections of tests for each table, with or |
| without the optional flags. This is useful e.g.@: if you want to have |
| various tests for both forward and backward translation for the same |
| set of tables, especially if you are defining the table as part of |
| the YAML file (see next section). This feature is also useful if you |
| simply want to divide your tests into multiple sections for a better |
| overview. All flags are reset to their default values when you start |
| a new test section. |
| |
| Thus, a YAML file might look as follows: |
| |
| @example |
| table: ... |
| tests: |
| - [..., ...] |
| - [..., ...] |
| |
| # Some more tests |
| tests: |
| - [..., ...] |
| - [..., ...] |
| |
| # Some tests for back-translation - same table |
| flags: @{testmode: backward@} |
| tests: |
| - [..., ...] |
| - [..., ...] |
| @end example |
| |
| @anchor{Inline definition of tables} |
| @subsection Inline definition of tables |
| When testing very specific opcode combinations it is sometimes tedious |
| to create specific test tables just for that. Hence the YAML tests |
| allow for specification of table definitions inline. Instead of |
| referring to a table by name you just define the table inline by using |
| what the YAML spec calls a |
| @url{http://www.yaml.org/spec/1.2/spec.html#id2795688,Literal Style |
| Block}. Start the definition with a @samp{|}, then list the opcodes |
| with an indentation. The inline table ends when the indentation ends. |
| |
| @example |
| table: | |
|   sign a 1 |
|   ... |
| tests: |
| - ... |
| - ... |
| @end example |
| |
| @subsection Running the same test data on multiple tables |
| Sometimes you maintain multiple tables which are very similar and |
| basically contain the same test data. Instead of copying the YAML test |
| and changing the table name you can also define multiple tables. This |
| will cause the YAML tests to be checked against both tables. |
| |
| @example |
| table: nl-NL |
| table: nl-BE |
| tests: |
| - [..., ...] |
| - [..., ...] |
| @end example |
| |
| @node Programming with liblouis |
| @chapter Programming with liblouis |
| |
| @menu |
| * Overview (library):: |
| * Data structure of liblouis tables:: |
| * How tables are found:: |
| * Deprecation of the logging system:: |
| * lou_version:: |
| * lou_translateString:: |
| * lou_translate:: |
| * lou_backTranslateString:: |
| * lou_backTranslate:: |
| * lou_hyphenate:: |
| * lou_compileString:: |
| * lou_getTypeformForEmphClass:: |
| * lou_dotsToChar:: |
| * lou_charToDots:: |
| * lou_registerLogCallback:: |
| * lou_setLogLevel:: |
| * lou_logFile:: |
| * lou_logPrint:: |
| * lou_logEnd:: |
| * lou_setDataPath:: |
| * lou_getDataPath:: |
| * lou_getTable:: |
| * lou_findTable:: |
| * lou_indexTables:: |
| * lou_checkTable:: |
| * lou_readCharFromFile:: |
| * lou_free:: |
| * lou_charSize:: |
| * Python bindings:: |
| @end menu |
| |
| @node Overview (library) |
| @section Overview |
| |
| You use the liblouis library by calling the following functions, |
| @code{lou_translateString}, @code{lou_backTranslateString}, |
| @code{lou_translate}, @code{lou_backTranslate}, |
| @code{lou_registerLogCallback}, @code{lou_setLogLevel}, |
| @code{lou_logFile}, @code{lou_logPrint}, @code{lou_logEnd}, |
| @code{lou_getTable}, @code{lou_findTable}, @code{lou_indexTables}, |
| @code{lou_checkTable}, @code{lou_hyphenate}, @code{lou_charToDots}, |
| @code{lou_dotsToChar}, @code{lou_compileString}, |
| @code{lou_getTypeformForEmphClass}, @code{lou_readCharFromFile}, |
| @code{lou_version}, @code{lou_free} and @code{lou_charSize}. These are |
| described below. The header file, @file{liblouis.h}, also contains |
| brief descriptions. Liblouis is written in straight C. It has four |
| code modules, @file{compileTranslationTable.c}, @file{logging.c}, |
| @file{lou_translateString.c} and @file{lou_backTranslateString.c}. In |
| addition, there are two header files, @file{liblouis.h}, which defines |
| the API, and @file{louis.h}, used only internally and by |
| liblouisutdml. The latter includes @file{liblouis.h}. |
| |
| Persons who wish to use liblouis from Python may want to skip ahead to |
| @ref{Python bindings}. |
| |
| @file{compileTranslationTable.c} keeps track of all translation tables |
| which an application has used. It is called by the translation, |
| hyphenation and checking functions when they start. If a table has not |
| yet been compiled @file{compileTranslationTable.c} checks it for |
| correctness and compiles it into an efficient internal representation. |
| The main entry point is @code{lou_getTable}. Since it is the module |
| that keeps track of memory usage, it also contains the @code{lou_free} |
| function. In addition, it contains the @code{lou_checkTable} function, |
| plus some utility functions which are used by the other modules. |
| |
| By default, liblouis handles all characters internally as 16-bit |
| unsigned integers. It can be compiled for 32-bit characters as |
| explained below. The meanings of these integers are not hard-coded. |
| Rather they are defined by the character-definition opcodes. However, |
| the standard printable characters, from decimal 32 to 126 are |
| recognized for the purpose of processing the opcodes. Hence, the |
| following definition is included in @file{liblouis.h}. It is correct |
| for computers with at least 32-bit processors. |
| |
| @example |
| typedef unsigned short int widechar; |
| @end example |
| |
| To make liblouis handle 32-bit Unicode simply remove the word |
| @code{short} in the above @code{typedef}. This will cause the translate and |
| back-translate functions to expect input in 32-bit form and to deliver |
| their output in this form. The input to the compiler (tables) is |
| unaffected except that two new escape sequences for 20-bit and 32-bit |
| characters are recognized. |
| |
| At runtime, the width of a character specified during compilation may |
| be obtained using @code{lou_charSize}. |
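| |
| As a minimal sketch, an application could compare the character width |
| it was compiled against with the width the library reports at runtime. |
| The example assumes that @code{lou_charSize} returns the size of |
| @code{widechar} in bytes and that @file{liblouis.h} is on the include |
| path; check both against your installed version. |
| |
| @example |
| #include <stdio.h> |
| #include <liblouis.h> |
| |
| int main(void) |
| @{ |
|   /* Width the application was compiled with vs. width of the library. */ |
|   int compiled = (int) sizeof(widechar); |
|   int runtime = lou_charSize(); |
| |
|   if (compiled != runtime) @{ |
|     fprintf(stderr, "widechar mismatch: header %d bytes, library %d bytes\n", |
|             compiled, runtime); |
|     return 1; |
|   @} |
|   printf("widechar is %d bytes wide\n", runtime); |
|   return 0; |
| @} |
| @end example |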
| |
| Here are the definitions of the liblouis functions and their |
| parameters. They are given in terms of 16-bit Unicode. If liblouis has |
| been compiled for 32-bit Unicode simply read 32 instead of 16. |
| |
| @node Data structure of liblouis tables |
| @section Data structure of liblouis tables |
| |
| The data structure @code{TranslationTableHeader} is defined by a |
| @code{typedef} statement in @file{louis.h}. To find the beginning, |
| search for the word @samp{header}. As its name implies, this is |
| actually the table header. Data are placed in the @code{ruleArea} |
| array, which is the last item defined in this structure. This array is |
| declared with a length of 1 and is expanded as needed. The table |
| header consists mostly of arrays of pointers of size @code{HASHNUM}. |
| These pointers are actually offsets into @code{ruleArea} and point to |
| chains of items which have been placed in the same hash bucket by a |
| simple hashing algorithm. @code{HASHNUM} should be a prime and is |
| currently 1123. The structure of the table was chosen to optimize |
| speed rather than memory usage. |
| |
| The first part of the table contains miscellaneous information, such |
| as the number of passes and whether various opcodes have been used. It |
| also contains the amount of memory allocated to the table and the |
| amount actually used. |
| |
| The next section contains pointers to various braille indicators and |
| begins with @code{capitalSign}. The rules pointed to contain the |
| dot pattern for the indicator and an opcode which is used by the |
| back-translator but does not appear in the list of opcodes. The |
| braille indicators also include various kinds of emphasis, such as |
| italic and bold and information about the length of emphasized |
| phrases. The latter is contained directly in the table item instead of |
| in a rule. |
| |
| After the braille indicators comes information about when a letter |
| sign should be used. |
| |
| Next is an array of size @code{HASHNUM} which points to character |
| definitions. These are created by the character-definition opcodes. |
| |
| Following this is a similar array pointing to definitions of |
| single-cell dot patterns. This is also created from the |
| character-definition opcodes. If a character definition contains a |
| multi-cell dot pattern this is compiled into ordinary forward and |
| backward rules. If such a multi-cell dot pattern contains a single |
| cell which has not previously been defined that cell is placed in this |
| array, but is given the attribute @code{space}. |
| |
| Next come arrays that map characters to single-cell dot patterns and |
| dots to characters. These are created from both character-definition |
| opcodes and display opcodes. |
| |
| Next is an array of size 256 which maps characters in this range to |
| dot patterns which may consist of multiple cells. It is used, for |
| example, to map @samp{@{} to dots 456-246. These mappings are created |
| @c FIXME: the compdots opcode should be documented |
| @c by the @opcoderef{compdots} |
| by the @code{compdots} |
| or the @opcoderef{comp6}. |
| |
| Next are two small arrays that hold pointers to chains of rules |
| produced by the @opcoderef{swapcd} and the @opcoderef{swapdd} and by |
| some multipass, @code{context} and @code{correct} opcodes. |
| |
| Now we get to an array of size @code{HASHNUM} which points to chains |
| of rules for forward translation. |
| |
| Following this is a similar array for back-translation. |
| |
| Finally comes the @code{ruleArea}, an array of variable size to which |
| various structures are mapped and to which almost everything else |
| points. |
| |
| @node How tables are found |
| @section How tables are found |
| @cindex Table search path |
| @cindex LOUIS_TABLEPATH |
| liblouis knows where to find all the tables that have been distributed |
| with it. So you can just give a table name such as @code{en-us-g2.ctb} |
| and liblouis will load it. You can also give a table name which |
| includes a path. If this is the first table in a list, all the tables |
| in the list must be on the same path. You can specify a path on which |
| liblouis will look for table names by setting the environment variable |
| @env{LOUIS_TABLEPATH}. This environment variable can contain one or |
| more paths separated by commas. On receiving a table name liblouis |
| first checks whether it can be found on any of these paths. If not, |
| it then checks whether the table can be found in the current |
| directory or, if the first (or only) name in the table list contains |
| a path name, on that path. If not, it checks whether the table can be |
| found on the path where the distributed tables have been installed. |
| If a table has already been loaded and compiled this path-checking is |
| skipped. |
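| |
| The following sketch sets @env{LOUIS_TABLEPATH} from within a program |
| before any table is loaded. It assumes a POSIX system (for |
| @code{setenv}); the two directories are hypothetical placeholders, and |
| @code{lou_checkTable} is assumed to return a non-zero value when the |
| table can be found and compiled. |
| |
| @example |
| #define _POSIX_C_SOURCE 200112L |
| #include <stdio.h> |
| #include <stdlib.h> |
| #include <liblouis.h> |
| |
| int main(void) |
| @{ |
|   /* Hypothetical directories, separated by a comma. */ |
|   if (setenv("LOUIS_TABLEPATH", |
|              "/opt/braille/tables,/home/user/tables", 1) != 0) @{ |
|     perror("setenv"); |
|     return 1; |
|   @} |
| |
|   /* A bare table name is now looked up on these paths first. */ |
|   if (!lou_checkTable("en-us-g2.ctb")) |
|     fprintf(stderr, "table not found or not valid\n"); |
| |
|   lou_free(); |
|   return 0; |
| @} |
| @end example |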
| |
| @node Deprecation of the logging system |
| @section Deprecation of the logging system |
| |
| As of version 2.6.0 @code{lou_logFile}, @code{lou_logPrint} and |
| @code{lou_logEnd} are deprecated. They are replaced by a more powerful, |
| abstract API consisting of @code{lou_registerLogCallback} and |
| @code{lou_setLogLevel}. |
| |
| Usage of @code{lou_logFile}, @code{lou_logPrint} and @code{lou_logEnd} is |
| discouraged as they may not be part of future releases. Applications using |
| Liblouis should implement their own logging system. |
| |
| During the transitional phase, @code{lou_logPrint} is registered as default |
| callback in @code{lou_registerLogCallback}. @code{lou_logPrint} is overwritten |
| by the first call to @code{lou_registerLogCallback} and reattached when |
| @code{NULL} is set as callback. Note that calling @code{lou_logPrint} directly |
| will not cause an invocation of the registered callback. |
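| |
| A minimal sketch of the callback-based API might look as follows. The |
| callback name @code{log_to_stderr} is made up for this example, and the |
| level constant @code{LOU_LOG_WARN} is assumed from recent versions of |
| @file{liblouis.h}; older releases may spell the log levels differently, |
| so consult the header you build against. |
| |
| @example |
| #include <stdio.h> |
| #include <liblouis.h> |
| |
| /* Application-defined callback that forwards messages to stderr. */ |
| static void log_to_stderr(logLevels level, const char *message) |
| @{ |
|   fprintf(stderr, "[liblouis %d] %s\n", (int) level, message); |
| @} |
| |
| int main(void) |
| @{ |
|   lou_registerLogCallback(log_to_stderr); |
|   lou_setLogLevel(LOU_LOG_WARN);  /* assumed name, see liblouis.h */ |
| |
|   /* Asking for a table that does not exist should produce log output. */ |
|   lou_checkTable("no-such-table.ctb"); |
| |
|   /* Passing NULL reattaches the default lou_logPrint callback. */ |
|   lou_registerLogCallback(NULL); |
|   lou_free(); |
|   return 0; |
| @} |
| @end example |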
| |
| @node lou_version |
| @section lou_version |
| @findex lou_version |
| |
| @example |
| char *lou_version () |
| @end example |
| |
| This function returns a pointer to a character string containing the |
| version of liblouis, plus other information, such as the release date |
| and perhaps notable changes. |
| |
| @node lou_translateString |
| @section lou_translateString |
| @findex lou_translateString |
| |
| @example |
| int lou_translateString( |
| const char *tableList, |
| const widechar *inbuf, |
| int *inlen, |
| widechar *outbuf, |
| int *outlen, |
| formtype *typeform, |
| char *spacing, |
| int mode); |
| @end example |
| |
| This function takes a string of 16-bit Unicode characters in |
| @code{inbuf} and translates it into a string of 16-bit characters in |
| @code{outbuf}. Each 16-bit character produces a particular dot pattern |
| in one braille cell when sent to an embosser or braille display or to |
| a screen type font. Which 16-bit character represents which dot pattern |
| is indicated by the character-definition and display opcodes in the |
| translation table. |
| |
| @anchor{translation-tables} |
| The @code{tableList} parameter points to a list of translation tables |
| separated by commas. @xref{How tables are found}, for a description of |
| how the tables are located in the file system. If only one table is |
| given, no comma should be used after it. It is these tables which |
| control just how the translation is made, whether in Grade 2, Grade 1, |
| or something else. |
| |
| The tables in a list are all compiled into the same internal table. |
| The list is then regarded as the name of this table. As explained in |
| @ref{How to Write Translation Tables}, each table is a file which may |
| be plain text, big-endian Unicode or little-endian Unicode. A table |
| (or list of tables) is compiled into an internal representation the |
| first time it is used. Liblouis keeps track of which tables have been |
| compiled. For this reason, it is essential to call the @code{lou_free} |
| function at the end of your application to avoid memory leaks. Do |
| @emph{NOT} call @code{lou_free} after each translation. This will |
| force liblouis to compile the translation tables each time they are |
| used, leading to great inefficiency. |
| |
| Note that both the @code{inlen} and @code{outlen} parameters are |
| pointers to integers. When the function is called, these integers |
| contain the maximum input and output lengths, respectively. When it |
| returns, they are set to the actual lengths used. |
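| |
| The in/out behavior of @code{inlen} and @code{outlen} can be sketched |
| as follows. This is an illustrative example rather than a reference |
| implementation: it assumes that @file{liblouis.h} is on the include |
| path, that @file{en-us-g2.ctb} can be found, and that the input is |
| plain ASCII which can be widened character by character. The buffer |
| size @code{BUF_MAX} is an arbitrary choice. |
| |
| @example |
| #include <stdio.h> |
| #include <liblouis.h> |
| |
| #define BUF_MAX 256 |
| |
| int main(void) |
| @{ |
|   const char *text = "The quick brown fox"; |
|   widechar inbuf[BUF_MAX]; |
|   widechar outbuf[BUF_MAX]; |
|   int inlen = 0;         /* in: characters available in inbuf */ |
|   int outlen = BUF_MAX;  /* in: capacity of outbuf */ |
| |
|   /* Widen the ASCII input one character at a time. */ |
|   while (text[inlen] != '\0' && inlen < BUF_MAX) @{ |
|     inbuf[inlen] = (widechar) (unsigned char) text[inlen]; |
|     inlen++; |
|   @} |
| |
|   if (!lou_translateString("en-us-g2.ctb", inbuf, &inlen, |
|                            outbuf, &outlen, NULL, NULL, 0)) @{ |
|     fprintf(stderr, "translation failed\n"); |
|     lou_free(); |
|     return 1; |
|   @} |
| |
|   /* On return inlen and outlen hold the lengths actually used. */ |
|   printf("consumed %d characters, produced %d cells\n", inlen, outlen); |
| |
|   /* Call lou_free once, at the end of the application. */ |
|   lou_free(); |
|   return 0; |
| @} |
| @end example |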
| |
| The @code{typeform} parameter is used to indicate italic type, |
| boldface type, computer braille, etc. It is an array of @code{formtype} |
| with the same length as the input buffer pointed to by @code{inbuf}. |
| However, it is used to pass back character-by-character results, so |
| enough space must be provided to match the @code{*outlen} parameter. |
| Each element indicates the typeform of the corresponding character |
| in the input buffer. The values and their meaning can be consulted in the |
| @code{typeforms} enum in @file{liblouis.h}. These values can be |
| added for multiple emphasis. If this parameter is @code{NULL}, no |
| checking for type forms is done. In addition, if this parameter is not |
| @code{NULL}, it is set on return to have an 8 at every position |
| corresponding to a character in @code{outbuf} which was defined to |
| have a dot representation containing dot 7, dot 8 or both, and to 0 |
| otherwise. |
| |
| The @code{spacing} parameter is used to indicate differences in |
| spacing between the input string and the translated output string. It |
| is also of the same length as the string pointed to by @code{inbuf}. |
| If this parameter is @code{NULL}, no spacing information is computed. |
| |
| The @code{mode} parameter specifies how the translation should be |
| done. The valid values of mode are defined in @file{liblouis.h}. They |
| are all powers of 2, so that a combined mode can be specified by |
| adding up different values. |
| |
| Note that the @code{mode} parameter is an integer, not a pointer to |
| an integer. |
| |
| A combination of the following mode flags can be used with the |
| @code{lou_translateString} function: |
| |
| @table @code |
| @item compbrlAtCursor |
| If this bit is set in the @code{mode} parameter the space-bounded |
| characters containing the cursor will be translated in computer |
| braille. |
| |
| @item compbrlLeftCursor |
| If this bit is set, only the characters to the left of the cursor will |
| be in computer braille. This bit overrides @code{compbrlAtCursor}. |
| |
| @item dotsIO |
| When this bit is set, during forward translation, Liblouis will produce |
| output as dot patterns. During back-translation Liblouis accepts input |
| as dot patterns. Note that the produced dot patterns are affected if |
| you have any @opcoderef{display} defined in any of your tables. |
| |
| @item ucBrl |
| The @code{ucBrl} (Unicode Braille) bit is used by the functions |
| @code{lou_charToDots} and @code{lou_translate}. It causes the dot |
| patterns to be Unicode Braille rather than the liblouis representation. |
| Note that you will not notice any change when setting @code{ucBrl} |
| unless @code{dotsIO} is also set. @code{lou_dotsToChar} and |
| @code{lou_backTranslate} recognize Unicode braille automatically. |
| |
| @item partialTrans |
| This flag specifies that back-translation input should be treated as an |
| incomplete word. Rules that apply only for complete words or at the end |
| of a word will not take effect. This is intended to be used when |
| translating input typed on a braille keyboard to provide a rough idea |
| to the user of the characters they are typing before the word is |
| complete. |
| |
| @item noUndefined |
| Setting this bit disables the output of hexadecimal values when |
| forward-translating undefined characters (characters that are not |
| matched by any rule), and dot numbers when back-translating undefined |
| braille patterns (braille patterns that are not matched by any |
| rule). The default is for liblouis to output the hexadecimal value (as |
| '\xhhhh') of an undefined character when forward-translating and the |
| dot numbers (as \ddd/) of an undefined braille pattern when |
| back-translating. |
| |
| When back-translating input from a braille keyboard cell by cell, it |
| is desirable to output characters as soon as they are |
| produced. Similarly, when back-translating contracted braille, it is |
| desirable to provide a ``guess'' to the user of the characters they |
| typed. To achieve this, liblouis needs to have the ability to produce |
| no text when indicators (which don't produce a character by |
| themselves) are not followed by another cell. This works automatically |
| for indicators liblouis knows about such as capital sign, number sign, |
| etc., but it does not work for indicators which are not (and cannot |
| be) specifically defined as indicators. For example, in UEB, dots 4 5 |
| 6 alone produce the text "\456/". Setting the @code{noUndefined} mode |
| suppresses this dot number output. |
| |
| @end table |
| |
| The function returns 1 if no errors were encountered@footnote{When the |
| output buffer is not big enough, @code{lou_translateString} returns a |
| partial translation that is more or less accurate up until the |
| returned @code{inlen}/@code{outlen}, and treats it as a successful |
| translation, i.e.@: also returns 1.} and 0 otherwise. |
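| |
| To make the dot number notation described for the @code{noUndefined} |
| mode more concrete, the following minimal sketch renders a Unicode |
| braille cell in the same @code{\ddd/} form. The helper name |
| @code{dots_to_text} is hypothetical and not part of liblouis, and the |
| exact text liblouis emits for a given table may differ. The sketch |
| only relies on the layout of the Unicode braille block, where bit 0 of |
| the low byte stands for dot 1 and bit 7 for dot 8. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Render a Unicode braille cell (U+2800..U+28FF) as dot numbers in the
 * "\ddd/" notation, e.g. U+2838 (dots 4, 5 and 6) becomes "\456/".
 * Hypothetical helper for illustration; it is not part of liblouis.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the cell is out of range or the buffer is too small. */
int dots_to_text(unsigned int cell, char *out, size_t outsize)
{
    size_t n = 0;

    assert(out != NULL);
    /* Worst case: backslash + 8 digits + slash + terminator = 11 bytes. */
    if (cell < 0x2800u || cell > 0x28FFu || outsize < 11)
        return -1;

    out[n++] = '\\';
    /* In the Unicode braille block, bit k of the low byte is dot k + 1. */
    for (int dot = 1; dot <= 8; dot++) {
        if (cell & (1u << (dot - 1)))
            out[n++] = (char)('0' + dot);
    }
    out[n++] = '/';
    out[n] = '\0';
    return (int)n;
}
```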
| |
| @node lou_translate |
| @section lou_translate |
| @findex lou_translate |
| |
| @example |
| int lou_translate( |
| const char *tableList, |
| const widechar *inbuf, |
| int *inlen, |
| widechar *outbuf, |
| int *outlen, |
| formtype *typeform, |
| char *spacing, |
| int *outputPos, |
| int *inputPos, |
| int *cursorPos, |
| int mode); |
| @end example |
| |
| Compared to @code{lou_translateString}, this function adds the |
| parameters @code{outputPos}, @code{inputPos} and @code{cursorPos} to |
| facilitate use in screen reader programs. The |
| @code{outputPos} parameter must point to an array of integers with at |
| least @code{inlen} elements. On return, this array will contain the |
| position in @code{outbuf} corresponding to each input position. |
| Similarly, @code{inputPos} must point to an array of integers of at |
| least @code{outlen} elements. On return, this array will contain the |
| position in @code{inbuf} corresponding to each position in |
| @code{outbuf}. @code{cursorPos} must point to an integer containing |
| the position of the cursor in the input. On return, it will contain |
| the cursor position in the output. Any parameter after @code{outlen} |
| may be @code{NULL}. In this case, the actions corresponding to it will |
| not be carried out. |
| |
| For a description of all other parameters, please see |
| @ref{lou_translateString}. |
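| |
| As an illustration of how the position arrays might be used, the |
| following sketch finds the span of output cells that map back to one |
| input character, for example to highlight the braille cells belonging |
| to the character under a screen reader's focus. It assumes that |
| @code{inputPos} has already been filled in by @code{lou_translate} as |
| described above. The helper name @code{cells_for_input} is |
| hypothetical and not part of liblouis. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Find the first and last output cell whose inputPos entry equals in_pos.
 * inputPos is assumed to hold, for each of the outlen output cells, the
 * index of the input character it was produced from.
 * Returns 1 and stores the span in *first and *last, or returns 0 if no
 * output cell maps back to in_pos. */
int cells_for_input(const int *inputPos, int outlen, int in_pos,
                    int *first, int *last)
{
    int found = 0;

    assert(inputPos != NULL && first != NULL && last != NULL);
    for (int i = 0; i < outlen; i++) {
        if (inputPos[i] == in_pos) {
            if (!found)
                *first = i; /* remember the first matching cell */
            *last = i;      /* keep extending the span */
            found = 1;
        }
    }
    return found;
}
```
| |
| Going in the other direction needs no search: the output position for |
| input character @code{i} can be read directly from @code{outputPos[i]}. |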
| |
| @node lou_backTranslateString |
| @section lou_backTranslateString |
| @findex lou_backTranslateString |
| |
| @example |
| int lou_backTranslateString( |
| const char *tableList, |
| const widechar *inbuf, |
| int *inlen, |
| widechar *outbuf, |
| int *outlen, |
| formtype *typeform, |
| char *spacing, |
| int mode); |
| @end example |
| |
| This is exactly the opposite of @code{lou_translateString}. |
| @code{inbuf} is a string of 16-bit Unicode characters representing |
| braille. @code{outbuf} will contain a string of 16-bit Unicode |
| characters. @code{typeform} will indicate any emphasis found in the |
| input string, while @code{spacing} will indicate any differences in |
| spacing between the input and output strings. The @code{typeform} and |
| @code{spacing} parameters may be @code{NULL} if this information is |
| not needed. @code{mode} specifies how the back-translation |
| should be done. |
| |
| By default, if a dot pattern in the input is undefined |
| then its dot numbers will be included in the output (as \ddd/). |
| This does not occur if the @code{noUndefined} mode is set; |
| an undefined dot pattern simply produces no output. |
| |
| The @code{partialTrans} mode specifies that the input should be |
| treated as an incomplete word. That is, rules that apply only for |
| complete words or at the end of a word will not take effect. This is |
| intended to be used when translating input typed on a braille keyboard |
| to provide a rough idea to the user of the characters they are typing |
| before the word is complete. |
| |
| @node lou_backTranslate |
| @section lou_backTranslate |
| @findex lou_backTranslate |
| |
| @example |
| int lou_backTranslate( |
| const char *tableList, |
| const widechar *inbuf, |
| int *inlen, |
| widechar *outbuf, |
| int *outlen, |
| formtype *typeform, |
| char *spacing, |
| int *outputPos, |
| int *inputPos, |
| int *cursorPos, |
| int mode); |
| @end example |
| |
| This function is exactly the inverse of @code{lou_translate}. |
| |
| @node lou_hyphenate |
| @section lou_hyphenate |
| @findex lou_hyphenate |
| |
| @example |
| int lou_hyphenate ( |
| const char *tableList, |
| const widechar *inbuf, |
| int inlen, |
| char *hyphens, |
| int mode); |
| @end example |
| |
| This function looks at the characters in @code{inbuf} and, if it finds |
| a sequence of letters, attempts to hyphenate it as a word. Note that |
| @code{lou_hyphenate} operates on single words only, and spaces or |
| punctuation marks between letters are not allowed. Leading and trailing |
| punctuation marks are ignored. The table named by the @code{tableList} |
| parameter must contain a hyphenation table. If it does not, the |
| function does nothing. @code{inlen} is the length of the character |
| string in @code{inbuf}. @code{hyphens} is an array of characters and |
| must be of size @code{inlen} + 1 (to account for the NUL terminator). |
| If hyphenation is successful, it will have a 1 at the beginning of each |
| syllable and a 0 elsewhere. If the @code{mode} parameter is 0, |
| @code{inbuf} is assumed to contain untranslated characters. Any |
| nonzero value means that @code{inbuf} contains a translation. In this |
| case, it is back-translated, hyphenation is performed, and it is |
| re-translated so that the hyphens can be placed correctly. The |
| @code{lou_translate} and @code{lou_backTranslate} functions are used |
| in this process. @code{lou_hyphenate} returns 1 if hyphenation was |
| successful and 0 otherwise. In the latter case, the contents of the |
| @code{hyphens} parameter are undefined. This function was provided for |
| use in liblouisutdml. |
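| |
| The following sketch shows one way the @code{hyphens} array might be |
| applied to a word. It makes several assumptions that are not stated |
| above: that the marks are the characters @code{'0'} and @code{'1'}, |
| that a mark at position 0 is ignored, and, to stay self-contained, |
| that the word is held in plain @code{char} rather than |
| @code{widechar}. The helper name @code{insert_hyphens} and the sample |
| pattern used with it are hypothetical. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Copy word into out, inserting '-' before every position i > 0 for which
 * hyphens[i] is '1' (assumed to mark the beginning of a syllable).
 * Returns the length of the result (excluding the terminator), or -1 if
 * out is too small to hold the result and its terminator. */
int insert_hyphens(const char *word, const char *hyphens, int len,
                   char *out, size_t outsize)
{
    size_t n = 0;

    assert(word != NULL && hyphens != NULL && out != NULL);
    if (outsize == 0)
        return -1;

    for (int i = 0; i < len; i++) {
        if (i > 0 && hyphens[i] == '1') {
            if (n + 1 >= outsize) /* keep room for the terminator */
                return -1;
            out[n++] = '-';
        }
        if (n + 1 >= outsize)
            return -1;
        out[n++] = word[i];
    }
    out[n] = '\0';
    return (int)n;
}
```
| |
| With the hypothetical pattern @code{"001000100"} for the word |
| @code{"hyphenate"}, this helper writes @code{"hy-phen-ate"}. |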
| |
| @node lou_compileString |
| @section lou_compileString |
| @findex lou_compileString |
| |
| @example |
| int lou_compileString (const char *tableList, const char *inString); |
| @end example |
| |
| This function enables you to compile a table entry on the fly at |
| run-time. The new entry is added to @code{tableList} and remains in force |
| until @code{lou_free} is called. If @code{tableList} has not previously |
| been loaded, it is loaded and compiled. @code{inString} contains the |
| table entry to be added. It may be any valid table entry. Error messages |
| are produced if it is invalid. The function returns 1 on success and |
| 0 on failure. |
| |
| @node lou_getTypeformForEmphClass |
| @section lou_getTypeformForEmphClass |
| @findex lou_getTypeformForEmphClass |
| |
| @example |
| int lou_getTypeformForEmphClass (const char *tableList, const char *emphClass); |
| @end example |
| |
| This function returns the typeform bit associated with the given |
| emphasis class. If the emphasis class is undefined, this function |
| returns @code{0}. If errors are found, error messages are logged to the |
| log callback (see @code{lou_registerLogCallback}) and the return value |
| is @code{0}. @code{tableList} is a list of names of table files |
| separated by commas, as explained previously |
| (@pxref{translation-tables,,@code{tableList} parameter in |
| @code{lou_translateString}}). @code{emphClass} is the name of an |
| emphasis class. |
| |
| @node lou_dotsToChar |
| @section lou_dotsToChar |
| @findex lou_dotsToChar |
| |
| @example |
| int lou_dotsToChar ( |
| const char *tableList, |
| const widechar *inbuf, |
| widechar *outbuf, |
| int length, |
| int mode); |
| @end example |
| |
| This function takes a widechar string in @code{inbuf} consisting of dot |
| patterns and converts it to a widechar string in @code{outbuf} |
| consisting of characters according to the specifications in |
| @code{tableList}. @code{length} is the length of both @code{inbuf} and |
| @code{outbuf}. The dot patterns in @code{inbuf} can be in either |
| liblouis format or Unicode braille. The function returns 1 on success |
| and 0 on failure. |
| |
| Note that the @code{mode} parameter has no effect and is deprecated. |
| |
| @node lou_charToDots |
| @section lou_charToDots |
| @findex lou_charToDots |
| |
| @example |
| int lou_charToDots ( |
| const char *tableList, |
| const widechar *inbuf, |
| widechar *outbuf, |
| int length, |
| int mode); |
| @end example |
| |
| This function is the inverse of @code{lou_dotsToChar}. It takes a |
| widechar string in @code{inbuf} consisting of characters and converts it |
| to a widechar string in @code{outbuf} consisting of dot patterns |
| according to the specifications in @code{tableList}. @code{length} is the |
| length of both @code{inbuf} and @code{outbuf}. The dot patterns in |
| @code{outbuf} are in liblouis format if the mode bit @code{ucBrl} is |
| not set and in Unicode format if it is set. The function returns 1 on |
| success and 0 on failure. |
| |
| @node lou_registerLogCallback |
| @section lou_registerLogCallback |
| @findex lou_registerLogCallback |
| |
| @example |
| typedef void (*logcallback) ( |
| int level, |
| const char *message); |
| |
| void lou_registerLogCallback ( |
| logcallback callback); |
| @end example |
| |
| This function can be used to register a custom logging callback. The |
| callback must take two arguments, the log level and the message |
| string. By default, log messages are printed to stderr or, if a file |
| name was specified with @code{lou_logFile}, to that |
| file. @code{lou_registerLogCallback} overrides the default |
| callback. Passing @code{NULL} resets to the default callback. |
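| |
| A callback matching the @code{logcallback} type could look like the |
| following sketch, which counts messages at warning level or above and |
| keeps a copy of the most recent one. The names @code{count_warnings}, |
| @code{warning_count}, @code{last_message} and @code{WARN_THRESHOLD} |
| are hypothetical. The threshold repeats the value shown for |
| @code{LOU_LOG_WARN} in the @code{logLevels} enum so that the sketch |
| compiles without the liblouis header. |
| |
```c
#include <assert.h>
#include <stdio.h>

/* Same value as LOU_LOG_WARN in the logLevels enum; repeated here only so
 * that this sketch is self-contained. */
#define WARN_THRESHOLD 30000

static int warning_count = 0;
static char last_message[256] = "";

/* Hypothetical callback with the logcallback signature: ignore messages
 * below warning level, count the rest and remember the latest one. */
void count_warnings(int level, const char *message)
{
    assert(message != NULL);
    if (level < WARN_THRESHOLD)
        return;
    warning_count++;
    /* snprintf truncates overly long messages and always terminates. */
    snprintf(last_message, sizeof last_message, "%s", message);
}
```
| |
| An application would install it once, before translating, with |
| @code{lou_registerLogCallback (count_warnings)}. |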
| |
| @node lou_setLogLevel |
| @section lou_setLogLevel |
| @findex lou_setLogLevel |
| |
| @example |
| typedef enum |
| @{ |
| LOU_LOG_ALL = 0, |
| LOU_LOG_DEBUG = 10000, |
| LOU_LOG_INFO = 20000, |
| LOU_LOG_WARN = 30000, |
| LOU_LOG_ERROR = 40000, |
| LOU_LOG_FATAL = 50000, |
| LOU_LOG_OFF = 60000 |
| @} logLevels; |
| void lou_setLogLevel ( |
| logLevels level); |
| @end example |
| |
| This function can be used to influence the amount of logging, from |
| fatal error messages only to detailed debugging messages. Supported |
| values are @code{LOU_LOG_ALL}, @code{LOU_LOG_DEBUG}, @code{LOU_LOG_INFO}, |
| @code{LOU_LOG_WARN}, @code{LOU_LOG_ERROR}, @code{LOU_LOG_FATAL} and |
| @code{LOU_LOG_OFF}. Enabling logging at a given level also enables |
| logging at all higher levels. Setting the level to @code{LOU_LOG_OFF} |
| disables logging. The default level is @code{LOU_LOG_INFO}. |
| |
| @node lou_logFile |
| @section lou_logFile (deprecated) |
| @findex lou_logFile |
| |
| @example |
| void lou_logFile ( |
| char *fileName); |
| @end example |
| |
| This function is used when it is not convenient either to let messages |
| be printed on stderr or to use redirection, as when liblouis is used |
| in a GUI application or in liblouisutdml. Any error messages generated |
| will be printed to the file given in this call. The entire path name of |
| the file must be given. |
| |
| This function is deprecated. See @ref{Deprecation of the logging system}. |
| |
| @node lou_logPrint |
| @section lou_logPrint (deprecated) |
| @findex lou_logPrint |
| |
| @example |
| void lou_logPrint ( |
| char *format, |
| ...); |
| @end example |
| |
| This function is called like @code{printf}. It can be used by other |
| libraries to print messages to the file specified by the call to |
| @code{lou_logFile}. In particular, it is used by the companion |
| library liblouisutdml. |
| |
| This function is deprecated. See @ref{Deprecation of the logging system}. |
| |
| @node lou_logEnd |
| @section lou_logEnd (deprecated) |
| @findex lou_logEnd |
| |
| @example |
| void lou_logEnd (); |
| @end example |
| |
| This function is used at the end of processing a document to close the |
| log file, so that it can be read by the rest of the program. |
| |
| This function is deprecated. See @ref{Deprecation of the logging system}. |
| |
| @node lou_setDataPath |
| @section lou_setDataPath |
| @findex lou_setDataPath |
| |
| @example |
| char *lou_setDataPath ( |
| char *path); |
| @end example |
| |
| This function is used to tell liblouis and liblouisutdml where tables |
| and files are located. It thus makes them completely relocatable, even |
| on Linux. The @code{path} is the directory where the subdirectories |
| @code{liblouis/tables} and @code{liblouisutdml/lbu_files} are rooted |
| or located. The function returns a pointer to the @code{path}. |
| |
| @node lou_getDataPath |
| @section lou_getDataPath |
| @findex lou_getDataPath |
| |
| @example |
| char *lou_getDataPath (); |
| @end example |
| |
| This function returns a pointer to the path set by |
| @code{lou_setDataPath}. If no path has been set it returns |
| @code{NULL}. |
| |
| @node lou_getTable |
| @section lou_getTable |
| @findex lou_getTable |
| |
| @example |
| void *lou_getTable ( |
| char *tableList); |
| @end example |
| |
| @code{tableList} is a list of names of table files separated by |
| commas, as explained previously |
| (@pxref{translation-tables,,@code{tableList} parameter in |
| @code{lou_translateString}}). If no errors are found, this function |
| returns a pointer to the compiled table. If errors are found, error |
| messages are logged to the log callback (see |
| @code{lou_registerLogCallback}). Errors result in a @code{NULL} |
| pointer being returned. |
| |
| @node lou_findTable |
| @section lou_findTable |
| @findex lou_findTable |
| |
| @example |
| char *lou_findTable (const char *query); |
| @end example |
| |
| This function can be used to find a table based on |
| metadata. @code{query} is a string in the special @ref{Query |
| Syntax,query syntax}. It is matched against @ref{Table Metadata,table |
| metadata} inside the tables that were previously indexed with |
| @ref{lou_indexTables,@code{lou_indexTables}}. Returns the file name of |
| the best match. Returns @code{NULL} if the query is invalid or if no |
| match can be found. |
| |
| The match algorithm works as follows: |
| |
| @itemize @bullet |
| @item |
| For every table a match quotient with the query is computed. The table |
| with the highest (positive) match quotient wins. If no table has a |
| positive quotient, there is no match. |
| @item |
| A query is a list of features. Features defined first have a higher |
| importance (have a higher impact on the final quotient) than features |
| defined later. |
| @item |
| A feature that matches a metadata field in the table (keys equal and |
| values equal, or both values absent) adds to the quotient. |
| @item |
| A feature that is undefined in the table (no field with that key) |
| creates a medium penalty. |
| @item |
| A feature that is defined in the table but does not match (keys equal |
| but values not equal) creates the highest penalty. |
| @item |
| Every field in the table that has no corresponding feature in the |
| query creates a very small penalty. |
| @end itemize |
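| |
| The rules above can be sketched in code as follows. This is only an |
| illustration of the ordering they describe, not the algorithm as |
| implemented in liblouis: the numeric weights, the @code{Feature} |
| type, the function name @code{match_quotient} and the metadata keys |
| used with it are all assumptions made for the example. |
| |
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One key with an optional value (NULL means the value is absent). */
typedef struct {
    const char *key;
    const char *value;
} Feature;

/* Two values match if they are equal strings or if both are absent. */
static int same_value(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Illustrative match quotient. All weights are assumptions chosen only to
 * respect the documented ordering of rewards and penalties. */
double match_quotient(const Feature *query, size_t nq,
                      const Feature *table, size_t nt)
{
    double q = 0.0;

    assert(query != NULL || nq == 0);
    assert(table != NULL || nt == 0);

    for (size_t i = 0; i < nq; i++) {
        /* Features listed earlier in the query carry more weight. */
        double importance = 1.0 / (double)(i + 1);
        const Feature *field = NULL;

        for (size_t j = 0; j < nt; j++) {
            if (strcmp(query[i].key, table[j].key) == 0) {
                field = &table[j];
                break;
            }
        }
        if (field == NULL)
            q -= 0.5 * importance; /* undefined in table: medium penalty */
        else if (same_value(query[i].value, field->value))
            q += importance;       /* match: adds to the quotient */
        else
            q -= 2.0 * importance; /* mismatch: highest penalty */
    }

    /* Table fields that the query does not mention: very small penalty. */
    for (size_t j = 0; j < nt; j++) {
        int used = 0;
        for (size_t i = 0; i < nq; i++) {
            if (strcmp(query[i].key, table[j].key) == 0) {
                used = 1;
                break;
            }
        }
        if (!used)
            q -= 0.01;
    }
    return q;
}
```
| |
| Under these assumed weights, a table that matches every feature of the |
| query scores higher than one that merely lacks a field, which in turn |
| scores higher than one whose first feature has a conflicting value. |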
| |
| @node lou_indexTables |
| @section lou_indexTables |
| @findex lou_indexTables |
| |
| @example |
| void lou_indexTables (const char **tables); |
| @end example |
| |
| This function must be called prior to |
| @ref{lou_findTable,@code{lou_findTable}}. It parses, analyzes and |
| indexes all specified tables. @code{tables} must be an array of file |
| names. Tables that contain invalid metadata are ignored. |
| |
| @node lou_checkTable |
| @section lou_checkTable |
| @findex lou_checkTable |
| |
| @example |
| int lou_checkTable (const char *tableList); |
| @end example |
| |
| This function does the same as @code{lou_getTable} but does not return |
| a pointer to the resulting table. It is to be preferred if only the |
| validity of a table needs to be checked. @code{tableList} is a list of |
| names of table files separated by commas, as explained previously |
| (@pxref{translation-tables,,@code{tableList} parameter in |
| @code{lou_translateString}}). If no errors are found, this function |
| returns a non-zero value. If errors are found, error messages are logged to |
| the log callback (see @code{lou_registerLogCallback}) and the return |
| value is @code{0}. |
| |
| @node lou_readCharFromFile |
| @section lou_readCharFromFile |
| @findex lou_readCharFromFile |
| |
| @example |
| int lou_readCharFromFile ( |
| const char *fileName, |
| int *mode); |
| @end example |
| |
| This function is provided for situations where it is necessary to read |
| a file which may contain little-endian or big-endian 16-bit Unicode |
| characters or ASCII8 characters. The return value is a little-endian |
| character, encoded as an integer. The @code{fileName} parameter is the |
| name of the file to be read. The @code{mode} parameter is a pointer to |
| an integer which must be set to 1 on the first call. After that, the |
| function takes care of it. On end-of-file the function returns |
| @code{EOF}. |
| |
| @node lou_free |
| @section lou_free |
| @findex lou_free |
| |
| @example |
| void lou_free (); |
| @end example |
| |
| This function should be called at the end of the application to free |
| all memory allocated by liblouis. Failure to do so will result in |
| memory leaks. Do @emph{NOT} call @code{lou_free} after each |
| translation. This will force liblouis to compile the translation |
| tables every time they are used, resulting in great inefficiency. |
| |
| @node lou_charSize |
| @section lou_charSize |
| @findex lou_charSize |
| |
| @example |
| int lou_charSize (); |
| @end example |
| |
| This function returns the size of @code{widechar} in bytes and can |
| therefore be used to differentiate between 16-bit and 32-bit Unicode |
| builds of liblouis. |
| |
| @node Python bindings |
| @section Python bindings |
| |
| There are Python bindings for @code{lou_translateString}, |
| @code{lou_translate}, @code{lou_backTranslateString}, |
| @code{lou_backTranslate}, @code{lou_hyphenate}, @code{lou_checkTable}, |
| @code{lou_compileString} and @code{lou_version}. For installation |
| instructions see the @file{README} file in the @file{python} |
| directory. Usage information is included in the Python module itself. |
| |
| |
| @node Concept Index |
| @unnumbered Concept Index |
| @printindex cp |
| |
| @node Opcode Index |
| @unnumbered Opcode Index |
| @printindex opcode |
| |
| @node Function Index |
| @unnumbered Function Index |
| @printindex fn |
| |
| @node Program Index |
| @unnumbered Program Index |
| @printindex pg |
| |
| @bye |
| |
| @c The following list is a list of exceptions for the ispell spell |
| @c checker |
| |
| @c LocalWords: liblouis opcode args BRLTTY ViewPlus Abilitiessoft LGPL lou |
| @c LocalWords: checktable allround checkhyphens Opcodes Multipass dotsToChar |
| @c LocalWords: translateString backTranslateString backTranslate charToDots |
| @c LocalWords: compileString logFile logPrint checkyaml findTable |
| @c LocalWords: getTable checkTable readCharFromFile itemx charSize |
| @c LocalWords: README liblouisxml pindex samp kbd opcodes opcoderef numsign |
| @c LocalWords: FIXME ctb nemeth filename multipass suboperand uplow litdigit |
| @c LocalWords: begcaps endcaps letsign noletsign largesign typeform |
| @c LocalWords: noletsignbefore noletsignafter compbrl firstwordital |
| @c LocalWords: lenitalphrase doubleOpcode lastworditalbefore firstletterital |
| @c LocalWords: lastworditalafter lastletterital firstwordbold UTF |
| @c LocalWords: singleletterital lastwordboldbefore lastwordboldafter |
| @c LocalWords: firstletterbold lastletterbold lenboldphrase filll |
| @c LocalWords: singleletterbold firstwordunder lastwordunderbefore |
| @c LocalWords: lastwordunderafter firstletterunder lastletterunder |
| @c LocalWords: singleletterunder lenunderphrase begcomp endcomp decpoint texi |
| @c LocalWords: capsnocont noback nofor texinfo setfilename settitle direntry |
| @c LocalWords: dircategory finalout defindex opcodeindex noindent uref vskip |
| @c LocalWords: titlepage insertcopying ifnottex dir detailmenu italword RET |
| @c LocalWords: TranslationTableHeader txt cti nocross exactdots nocont emph |
| @c LocalWords: prepunc postpunc repword joinword lowword sufword prfword API |
| @c LocalWords: begword begmidword midword midendword endword partword begnum |
| @c LocalWords: midnum endnum joinnum swapcd swapdd swapcc multind endLog |
| @c LocalWords: backtranslation compileTranslationTable typedef louis ruleArea |
| @c LocalWords: HASHNUM capitalSign compdots findex const inbuf outbuf outlen |
| @c LocalWords: tableList TABLEPATH widechar inputPos cursorPos outputPos |
| @c LocalWords: inlen compbrlAtCursor compbrlLeftCursor trantab stderr endian |
| @c LocalWords: tablelist fileName printindex deprecatedopcode setDataPath |
| @c LocalWords: getDataPath MathML suboperands logEnd liblouisutdml whitespace |
| @c LocalWords: xhhhh yhhhhh zhhhhhhhh OpenOffice documentencoding |
| @c LocalWords: YAML JSON logLevels nocontractsign OSX DLL env NVDA |
| @c LocalWords: MERCHANTABILITY registerLogCallback setLogLevel brf |
| @c LocalWords: cindex chardefs xhtml pxref dec multi hyph dic Aa al |
| @c LocalWords: mrow mfrac emphclass transnote subsubsection begemph |
| @c LocalWords: endemph emphletter begemphword endemphword www cd th |
| @c LocalWords: lenemphphrase begemphphrase endemphphrase andthe se |
| @c LocalWords: abrege decrement pre cornf comf scano cornm cornp po |
| @c LocalWords: h's brl testtrans UCS asis libyaml url yaml formtype |
| @c LocalWords: testmode iftex unicode ueb xfail eo noContractions |
| @c LocalWords: dotsIO ucBrl noUndefined partialTrans capsletter |
| @c LocalWords: abc doctest inString enum cp outbufbuf logcallback fprint |
| @c LocalWords: lbu EOF heckTable fn ispell getTypeformForEmphClass |
| @c LocalWords: indexTables begcapsword endcapsword typeforms |
| @c LocalWords: endemphphraseopcode emphClass BrailleSense HumanWare |
| @c LocalWords: BrailleNote refreshable |