\input texinfo
@c %**start of header
@setfilename liblouis.info
@documentencoding UTF-8
@include version.texi
@settitle Liblouis User's and Programmer's Manual
@dircategory Misc
@direntry
* Liblouis: (liblouis). A braille translator and back-translator
@end direntry
@finalout
@c Macro definitions
@defindex opcode
@c Opcode.
@macro opcode{name, args}
@opcodeindex \name\
@anchor{\name\ opcode}
@item \name\ \args\
@end macro
@macro opcoderef{name}
@code{\name\} opcode (@pxref{\name\ opcode,\name\,@code{\name\}})
@end macro
@c Opcode.
@macro deprecatedopcode{name, args, replacement}
@opcodeindex \name\
@anchor{\name\ opcode}
@item \name\ \args\
This opcode is deprecated. Use the @opcoderef{\replacement\} instead.
@end macro
@copying
This manual is for liblouis (version @value{VERSION}, @value{UPDATED}),
a Braille Translation and Back-Translation Library derived from the
Linux screen reader @acronym{BRLTTY}.
@vskip 10pt
@noindent
Copyright @copyright{} 1999-2006 by the BRLTTY Team.
@noindent
Copyright @copyright{} 2004-2007 ViewPlus Technologies, Inc.
@uref{www.viewplus.com}.
@noindent
Copyright @copyright{} 2007, 2009 Abilitiessoft, Inc.
@uref{www.abilitiessoft.org}.
@noindent
Copyright @copyright{} 2014, 2016 Swiss Library for the Blind, Visually
Impaired and Print Disabled. @uref{www.sbs.ch}.
@vskip 10pt
@quotation
This file is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser (or library) General Public License
(LGPL) as published by the Free Software Foundation; either version 3,
or (at your option) any later version.
This file is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser (or Library) General Public License LGPL for more details.
You should have received a copy of the GNU Lesser (or Library) General
Public License (LGPL) along with this program; see the file COPYING.
If not, write to the Free Software Foundation, 51 Franklin Street,
Fifth Floor, Boston, MA 02110-1301, USA.
@end quotation
@end copying
@titlepage
@title Liblouis User's and Programmer's Manual
@subtitle for version @value{VERSION}, @value{UPDATED}
@author by John J. Boyer
@c The following two commands start the copyright page.
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage
@c Output the table of contents at the beginning.
@contents
@ifnottex
@node Top
@top Liblouis User's and Programmer's Manual
@insertcopying
@end ifnottex
@menu
* Introduction::
* How to Write Translation Tables::
* Notes on Back-Translation::
* Table Metadata::
* Testing Translation Tables interactively::
* Automated Testing of Translation Tables::
* Programming with liblouis::
* Concept Index::
* Opcode Index::
* Function Index::
* Program Index::
@detailmenu
--- The Detailed Node Listing ---
How to Write Translation Tables
* Overview::
* Hyphenation Tables::
* Character-Definition Opcodes::
* Braille Indicator Opcodes::
* Emphasis Opcodes::
* Special Symbol Opcodes::
* Special Processing Opcodes::
* Translation Opcodes::
* Character-Class Opcodes::
* Swap Opcodes::
* The Context and Multipass Opcodes::
* The correct Opcode::
* The match Opcode::
* Miscellaneous Opcodes::
Emphasis Opcodes
* Emphasis class::
* Contexts::
* Fallback behavior::
* Computer braille::
Contexts
* None::
* Letter::
* Word::
* Phrase::
* Symbol::
Testing Translation Tables interactively
* lou_debug::
* lou_trace::
* lou_checktable::
* lou_allround::
* lou_translate (program)::
* lou_checkhyphens::
* lou_checkyaml::
Programming with liblouis
* Overview (library)::
* Data structure of liblouis tables::
* How tables are found::
* Deprecation of the logging system::
* lou_version::
* lou_translateString::
* lou_translate::
* lou_backTranslateString::
* lou_backTranslate::
* lou_hyphenate::
* lou_compileString::
* lou_getTypeformForEmphClass::
* lou_dotsToChar::
* lou_charToDots::
* lou_registerLogCallback::
* lou_setLogLevel::
* lou_logFile::
* lou_logPrint::
* lou_logEnd::
* lou_setDataPath::
* lou_getDataPath::
* lou_getTable::
* lou_findTable::
* lou_indexTables::
* lou_checkTable::
* lou_readCharFromFile::
* lou_free::
* lou_charSize::
* Python bindings::
@end detailmenu
@end menu
@node Introduction
@chapter Introduction
Liblouis is an open-source braille translator and back-translator
derived from the translation routines in the BRLTTY screen reader for
Linux. It has, however, gone far beyond these routines. It is named in
honor of Louis Braille. In Linux and Mac OSX it is a shared library,
and in Windows it is a DLL. For installation instructions see the
README file. Please report bugs and oddities to the mailing list,
@email{liblouis-liblouisxml@@freelists.org}
This documentation is derived from the BRLTTY manual, but
it has been extensively rewritten to cover new features.
@section Who is this manual for
This manual has two main audiences: People who want to write or
improve a braille translation table and people who want to use the
braille translator library in their own programs. This manual is
probably not for people who are looking for some turn-key braille
translation software.
@section How to read this manual
If you are mostly interested in writing braille translation tables
then you want to focus on @ref{How to Write Translation Tables}. You
might want to look at @ref{Notes on Back-Translation} if you are
interested in back-translation. Read @ref{Table Metadata} if you want
to find out how you can augment your tables with metadata in order to
make them discoverable by programs. Finally @ref{Testing Translation
Tables interactively} and @ref{Automated Testing of Translation
Tables} will show how your braille translation tables can be tested
interactively and also in an automated fashion.
If you want to use the braille translation library in your own program
or you are interested in enhancing the braille translation library
itself then you will want to look at @ref{Programming with liblouis}.
@node How to Write Translation Tables
@chapter How to Write Translation Tables
For many languages there is already a translation table, so before
creating a new table, look at the existing tables and modify them as
needed.
Typically, a braille translation table consists of several parts.
First come the header and includes, where you state what the table is
for, give license information and include the tables your table
depends on. Next come the various translation rules, and finally the
special rules that handle certain situations.
@cindex Opcode
A translation rule is composed of at least three parts: the opcode
(translation command), character(s) and braille dots. An opcode is a
command you give to a machine or a program to perform something on
your behalf. In liblouis, an opcode tells it which rule to use when
translating characters into braille. An operand can be thought of as
parameters for the translation rule and is composed of two parts: the
character or word to be translated and the braille dots.
For example, suppose you always want the word @samp{world} to be
rendered as braille dots @samp{456} followed by the letter @samp{w}.
Then you'd write:
@example
always world 456-2456
@end example
The word @code{always} is an opcode which tells liblouis to always
honor this translation, that is to say when the word @samp{world} (an
operand) is encountered, always show braille dots @samp{456} followed
by the letter @samp{w} (@samp{2456}).
When you write any braille table for any language, we'd recommend
working from some sort of official standard, and have a device or a
program in which you can test your work.
@menu
* Overview::
* Hyphenation Tables::
* Character-Definition Opcodes::
* Braille Indicator Opcodes::
* Emphasis Opcodes::
* Special Symbol Opcodes::
* Special Processing Opcodes::
* Translation Opcodes::
* Character-Class Opcodes::
* Swap Opcodes::
* The Context and Multipass Opcodes::
* The correct Opcode::
* The match Opcode::
* Miscellaneous Opcodes::
@end menu
@node Overview
@section Overview
Many translation (contraction) tables have already been made up. They
are included in the distribution in the tables directory and can be
studied as part of the documentation. Some of the more helpful (and
normative) are listed in the following table:
@table @file
@item chardefs.cti
Character definitions for U.S. tables
@item compress.ctb
Remove excessive whitespace
@item en-us-g1.ctb
Uncontracted American English
@item en-us-g2.ctb
Contracted or Grade 2 American English
@item en-us-brf.dis
Make liblouis output conform to BRF standard
@item en-us-comp8.ctb
8-dot computer braille for use in coding examples
@item en-us-comp6.ctb
6-dot computer braille
@item nemeth.ctb
Nemeth Code translation for use with liblouisutdml
@item nemeth_edit.ctb
Fixes errors at the boundaries of math and text
@end table
The names used for files containing translation tables are completely
arbitrary. They are not interpreted in any way by the translator.
Contraction tables may be 8-bit ASCII files, UTF-8, 16-bit big-endian
Unicode files or 16-bit little-endian Unicode files. Blank lines are
ignored. Any leading and trailing whitespace (any number of blanks
and/or tabs) is ignored. Lines which begin with a number sign or hatch
mark (@samp{#}) are ignored, i.e.@: they are comments. If the number
sign is not the first non-blank character in the line, it is treated
as an ordinary character. If the first non-blank character is
less-than (@samp{<}) the line is also treated as a comment. This makes
it possible to mark up tables as xhtml documents. Lines which are not
blank or comments define table entries. The general format of a table
entry is:
@example
opcode operands comments
@end example
Table entries may not be split between lines. The opcode is a mnemonic
that specifies what the entry does. The operands may be character
sequences, braille dot patterns or occasionally something else. They
are described for each opcode; see @ref{Opcode Index}. With some
exceptions, opcodes expect a certain number of operands. Any text on
the line after the last operand is ignored, and may be a comment. A
few opcodes accept a variable number of operands. In this case a
number sign (@samp{#}) begins a comment unless it is preceded by a
backslash (@samp{\}).
Here are some examples of table entries.
@example
# This is a comment.
always world 456-2456 A word and the dot pattern of its contraction
@end example
Most opcodes have both a "characters" operand and a "dots" operand,
though some have only one and a few have other types.
@cindex Characters operand
The characters operand consists of any combination of characters and
escape sequences preceded and followed by whitespace. Escape
sequences are used to represent difficult characters. They begin with
a backslash (@samp{\}). They are:
@table @kbd
@item \\
backslash
@item \f
form feed
@item \n
new line
@item \r
carriage return
@item \s
blank (space)
@item \t
horizontal tab
@item \v
vertical tab
@item \e
"escape" character (hex 1b, dec 27)
@item \xhhhh
4-digit hexadecimal value of a character
@end table
If liblouis has been compiled for 32-bit Unicode the following are
also recognized.
@table @kbd
@item \yhhhhh
5-digit (20 bit) character
@item \zhhhhhhhh
Full 32-bit value.
Please take a look at the
@url{https://unicode.org/Public/UNIDATA/,public directory of the
Unicode Character Database} as well as at the
@url{https://unicode.org/Public/UNIDATA/NamesList.txt,Unicode names
list with their code points} to figure out the corresponding Unicode
code point for a given Unicode character.
@end table
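
As a brief illustration of the escape sequences, the following entries
sketch how a character can be given by an escape sequence or by its
code point instead of being typed literally. The dot pattern for the
Euro sign is chosen only for the sake of the example and is not taken
from any official braille code.

@example
space \t 0        horizontal tab defined as a space
sign \x20ac 15    the Euro sign (U+20AC) as a 4-digit hexadecimal value
@end example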
@cindex Dots operand
The dots operand is a braille dot pattern. The real braille dots, 1
through 8, must be specified with their standard numbers.
@cindex Virtual dots
@anchor{virtual dots}
liblouis recognizes @emph{virtual dots}, which are used for special
purposes, such as distinguishing accent marks. There are seven virtual
dots. They are specified by the number 9 and the letters @samp{a}
through @samp{f}.
@cindex Multi-cell dot pattern
For a multi-cell dot pattern, the cell specifications must be
separated from one another by a dash (@samp{-}). For example, the
contraction for the English word @samp{lord} (the letter @samp{l}
preceded by dot 5) would be specified as @samp{5-123}. A space may be
specified with the special dot number 0.
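
For instance, the contraction described above could be entered with
the @code{always} opcode as sketched below. Whether @code{always} is
the appropriate opcode depends on the braille code being implemented,
so treat this as an illustration of the dot notation only.

@example
always lord 5-123    dot 5, then dots 123 (the letter l)
space \s 0           a blank cell, written with the dot number 0
@end example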
An opcode which is helpful in writing translation tables is
@code{include}. Its format is:
@example
include filename
@end example
It reads the file indicated by @code{filename} and incorporates or
includes its entries into the table. Included files can include other
files, which can include other files, etc. For an example, see what
files are included by the entry @code{include en-us-g1.ctb} in the table
@file{en-us-g2.ctb}. If the included file is not in the same directory
as the main table, use a full path name for filename. Tables can also be
specified in a table list, in which the table names are separated by
commas and given as a single table name in calls to the translation
functions.
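
As a sketch, a table that builds on another one might contain an
@code{include} entry such as the one mentioned above:

@example
include en-us-g1.ctb
@end example

A table list, by contrast, is a single string handed to the
translation functions. Assuming that liblouis can find both files, a
list that combines a translation table with a hyphenation table could
look like this:

@example
en-us-g2.ctb,hyph_en_US.dic
@end example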
The order of the various types of opcodes or table entries is
important. Character-definition opcodes should come first. However, if
the optional @opcoderef{display} is used it should precede
character-definition opcodes. Braille-indicator opcodes should come
next. Translation opcodes should follow. The @opcoderef{context} is a
translation opcode, even though it is considered along with the
multipass opcodes. These latter should follow the translation opcodes.
The @opcoderef{correct} can be used anywhere after the
character-definition opcodes, but it is probably a good idea to group
all @code{correct} opcodes together. The @opcoderef{include} can be
used anywhere, but the order of entries in the combined table must
conform to the order given above. Within each type of opcode, the
order of entries is generally unimportant. Thus the translation
entries can be grouped alphabetically or in any other order that is
convenient. Hyphenation tables may be specified either with an
@code{include} opcode or as part of a table list. They should come after
everything else.
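
The recommended order can be sketched with a minimal table skeleton.
The entries are borrowed from examples elsewhere in this manual and do
not form a complete or official table.

@example
# 1. Character definitions
space \s 0
uplow Aa 17,1
digit 0 356
# 2. Braille indicators
capsletter 6
numsign 3456
# 3. Translation opcodes
always world 456-2456
# 4. Hyphenation table, after everything else
include hyph_en_US.dic
@end example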
@node Hyphenation Tables
@section Hyphenation Tables
Hyphenation tables are necessary to make opcodes such as the
@opcoderef{nocross} function properly. There are no opcodes for
hyphenation table entries because these tables have a special format.
Therefore, they cannot be specified as part of an ordinary table.
Rather, they must be included using the @opcoderef{include} or as part
of a table list. The liblouis hyphenation algorithm was adopted from the
one used by OpenOffice. Note that hyphenation tables must follow
character definitions and should preferably be the last. For an example
of a hyphenation table, see @file{hyph_en_US.dic}.
@node Character-Definition Opcodes
@section Character-Definition Opcodes
These opcodes are needed to define attributes such as digit,
punctuation, letter, etc. for all characters and their dot patterns.
liblouis has no built-in character definitions, but such definitions
are essential to the operation of the @opcoderef{context}, the
@opcoderef{correct}, the multipass opcodes and the back-translator. If
the dot pattern is a single cell, it is used to define the mapping
between dot patterns and characters, unless a @opcoderef{display} for
that character-dot-pattern pair has been used previously. If only a
single-cell dot pattern has been given for a character, that dot
pattern is defined with the character's own attributes.
You may have multiple definitions of a character using the same or
different dot patterns. If you use different dot patterns for the same
character, only the first dot pattern will be used during forward
translation. However, during back-translation, all the relevant dot
patterns will back-translate to the character you defined.
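
A sketch of this behavior, using a hypothetical second dot pattern for
the star character:

@example
sign * 16    first definition: used for forward translation
sign * 35    hypothetical alternative: only relevant for back-translation
@end example

With these two entries @samp{*} would always be translated to dots 16,
while both dots 16 and dots 35 would back-translate to @samp{*}.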
You can also define a character multiple times using the same dot
pattern for the character, but using different character classes. The
following example would define the character @samp{*} (star) as both
@opcoderef{math} and @opcoderef{sign}.
@example
math * 16
sign * 16
@end example
Likewise, you can define multiple characters as the same dot pattern.
The characters you define this way will be forward translated to the
same dot pattern. However, when back-translating, the dot pattern will
always back-translate to the first character that was defined with
this pattern.
This technique may be useful when defining characters that have one
representation in the Windows character set (CP1252) and another
representation in the Unicode character set, e.g. the Euro sign,
@samp{€}. It may also be of use when you have to define several
variants of the same letter with different accents, which may be
represented in your Braille code by the same dot pattern. This is a
very common practice for accented letters that are foreign to the
Braille code. In the following example using the @opcoderef{uplow},
both e acute (@samp{é}) and e grave (@samp{è}) are defined as
dot 4 followed by dots 1 and 5.
@example
uplow \x00c9\x00e9 4-15 # E acute
uplow \x00c8\x00e8 4-15 # E grave
@end example
In this example, the dot pattern would always back-translate to e
acute, since this is the first definition. You could use the
@opcoderef{correct} to correct at least the most common errors
on that account. However, there is no fail-safe way to know what
accented letter to use when you back-translate from a dot pattern
representing more than one variant.
@table @code
@opcode{space, character dots}
Defines a character as a space and also defines the dot pattern as
such. For example:
@example
space \s 0 \s is the escape sequence for blank; 0 means no dots.
@end example
@opcode{punctuation, character dots}
Associates a punctuation mark in the particular language with a
braille representation and defines the character and dot pattern as
punctuation. For example:
@example
punctuation . 46 dot pattern for period in NAB computer braille
@end example
@opcode{digit, character dots}
Associates a digit with a dot pattern and defines the character as a
digit. For example:
@example
digit 0 356 NAB computer braille
@end example
@opcode{uplow, characters dots [@comma{}dots]}
The characters operand must be a pair of letters, of which the first
is uppercase and the second lowercase. The first dots suboperand
indicates the dot pattern for the upper-case letter. It may have more
than one cell. The second dots suboperand must be separated from the
first by a comma and is optional, as indicated by the square brackets.
If present, it indicates the dot pattern for the lower-case letter. It
may also have more than one cell. If the second dots suboperand is not
present the first is used for the lower-case letter as well as the
upper-case letter. This opcode is needed because not all languages
follow a consistent pattern in assigning Unicode codes to upper and
lower case letters. It should be used even for languages that do. The
distinction is important in the forward translator. For example:
@example
uplow Aa 17,1
@end example
@opcode{grouping, name characters dots @comma{}dots}
This opcode is used to indicate pairs of grouping symbols used in
processing mathematical expressions. These symbols are usually
generated by the MathML interpreter in liblouisutdml. They are used in
multipass opcodes. The name operand must contain only letters (a-z and
A-Z). The letters may be upper or lower-case but the case matters. The
characters operand must contain exactly two Unicode characters. The
dots operand must contain exactly two braille cells, separated by a
comma. Note that grouping dot patterns also need to be declared with
the @opcoderef{exactdots}. The characters may need to be declared with
the @opcoderef{math}.
@example
grouping mrow \x0001\x0002 1e,2e
grouping mfrac \x0003\x0004 3e,4e
@end example
@opcode{letter, character dots}
Associates a letter in the language with a braille representation and
defines the character as a letter. This is intended for letters which
are neither uppercase nor lowercase.
@opcode{lowercase, character dots}
Associates a character with a dot pattern and defines the character as
a lowercase letter. Both the character and the dot pattern have the
attributes lowercase and letter.
@opcode{uppercase, character dots}
Associates a character with a dot pattern and defines the character as
an uppercase letter. Both the character and the dot pattern have the
attributes uppercase and letter. @code{lowercase} and @code{uppercase}
should be used when a letter has only one case. Otherwise use the
@opcoderef{uplow}.
@opcode{litdigit, digit dots}
Associates a digit with the dot pattern which should be used to
represent it in literary texts. For example:
@example
litdigit 0 245
litdigit 1 1
@end example
@opcode{sign, character dots}
Associates a character with a dot pattern and defines both as a sign.
This opcode should be used for things like at sign (@samp{@@}),
percent (@samp{%}), dollar sign (@samp{$}), etc. Do not use it to
define ordinary punctuation such as period and comma. For example:
@example
sign % 4-25-1234 literary percent sign
@end example
@opcode{math, character dots}
Associates a character and a dot pattern and defines them as a
mathematical symbol. It should be used for less than (@samp{<}),
greater than (@samp{>}), equals (@samp{=}), plus (@samp{+}), etc. For
example:
@example
math + 346 plus
@end example
@end table
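
The @code{letter}, @code{lowercase} and @code{uppercase} opcodes are
described above without examples. The following sketch shows how such
entries might look. The characters are given as hexadecimal escape
sequences, and the dot patterns are assumptions made for illustration
rather than definitions taken from an existing table.

@example
lowercase \x00df 2346    sharp s (U+00DF), defined as lowercase only
letter \x05d0 1          Hebrew alef (U+05D0), neither upper nor lower case
@end example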
@node Braille Indicator Opcodes
@section Braille Indicator Opcodes
Braille indicators are dot patterns which are inserted into the
braille text to indicate such things as capitalization, italic type,
computer braille, etc. The opcodes which define them are followed only
by a dot pattern, which may be one or more cells.
@table @code
@opcode{capsletter, dots}
The dot pattern which indicates capitalization of a single letter. In
English, this is dot 6. For example:
@example
capsletter 6
@end example
@opcode{begcapsword, dots}
The dot pattern which begins a block of capital letters at the
beginning or within a word. The block is automatically terminated
by any character that is not a capital letter, e.g. small letters,
punctuation, numbers etc.
Apart from capital letters, you can define a list of characters that
can appear within a word in capitals without terminating the block.
Do this by using the @opcoderef{capsmodechars}.
Example:
@example
begcapsword 6-6
@end example
@opcode{endcapsword, dots}
The dot pattern which ends a block of capital letters within a word.
It is used in cases where the block is not terminated automatically
by a word boundary, a number or punctuation. A common case is when
an uppercase block is followed directly by a lowercase letter.
For example:
@example
endcapsword 6-3
@end example
@opcode{capsmodechars, characters}
Normally, any character other than a capital letter will cancel the
@opcoderef{begcapsword} indicator. However, by using the
@code{capsmodechars} opcode, you can specify a list of characters
that are legal within a capitalized word. In some Braille codes,
this might be the case for the hyphen character, @samp{-}.
Example:
@example
capsmodechars -
@end example
@opcode{begcaps, dots}
The dot pattern which begins a block of capital letters defined by the
provided @code{typeform} without regard for any other rules.
This construct is sometimes also called a capsphrase. It is used
in some Braille codes to mark a whole phrase or sentence as capital
letters. The block can contain capital letters as well as
non-alphabetic characters, punctuation, numbers etc. The
block is terminated when a small letter is encountered or at the end of the input string.
Example:
@example
begcaps 6-6-6
@end example
@opcode{endcaps, dots}
The dot pattern which ends a block of capital letters defined by the
provided @code{typeform} without regard for any other rules. For
example:
@example
endcaps 6-3
@end example
@opcode{letsign, dots}
This indicator is needed in Grade 2 to show that a single letter is
not a contraction. It is also used when an abbreviation happens to be
a sequence of letters that is the same as a contraction. For example:
@example
letsign 56
@end example
@opcode{noletsign, letters}
The letters in the operand will not be preceded by a letter sign.
More than one @code{noletsign} opcode can be used. This is equivalent
to a single entry containing all the letters. In addition, if a single
letter, such as @samp{a} in English, is defined as a @code{word}
(@pxref{word opcode,word,@code{word}}) or @code{largesign}
(@pxref{largesign opcode,largesign,@code{largesign}}), it will be
treated as though it had also been specified in a @code{noletsign}
entry.
@opcode{noletsignbefore, characters}
If any of the characters precedes a single letter without a space, a
letter sign is not used. By default the characters apostrophe
(@samp{'}) and period (@samp{.}) have this property. Use of a
@code{noletsignbefore} entry cancels the defaults. If more than one
@code{noletsignbefore} entry is used, the characters in all entries
are combined.
@opcode{noletsignafter, characters}
If any of the characters follows a single letter without a space a
letter sign is not used. By default the characters apostrophe
(@samp{'}) and period (@samp{.}) have this property. Use of a
@code{noletsignafter} entry cancels the defaults. If more than one
@code{noletsignafter} entry is used the characters in all entries are
combined.
@opcode{nocontractsign, dots}
The dots in this opcode are used to indicate a letter or a sequence of
letters that are not a contraction, e.g. @samp{CD}. The opcode is
similar to the @opcoderef{letsign}.
@c FIXME: In what way is the nocontractsign opcode different from the
@c letsign opcode, apart from apparently being a more focused version of
@c letsign?
@opcode{numsign, dots}
The translator inserts this indicator before numbers made up of digits
defined with the @opcoderef{litdigit} to show that they are a number
and not letters or some other symbols. A number is terminated when a
space, a letter or any other non-@code{litdigit} character is
encountered.
You can define characters or strings to be part of a number by using
the @opcoderef{midnum}, the @opcoderef{numericmodechars}
or the @opcoderef{midendnumericmodechars}.
Example:
@example
numsign 3456
@end example
@opcode{numericnocontchars, characters}
This opcode specifies the characters that require a
@opcoderef{nocontractsign} if they appear after a number with no
intervening space, e.g. @samp{1a} or @samp{2-B}.
These characters will typically be the letters a-j, which usually
constitute the literary digits (see @opcoderef{litdigit}). However,
in some Braille codes, all letters fall in this category.
Please note that this opcode is case sensitive. So, if you need a
@opcoderef{nocontractsign} to also appear before the capital letters
A-J, you should include these letters in the definition. This is
especially relevant if you are also using the @opcoderef{begcaps}
and @opcoderef{endcaps} opcodes. In this case, you might otherwise
end up having numbers immediately followed by capital letters with no
indicator between.
Example:
@example
numericnocontchars abcdefghij
@end example
@opcode{numericmodechars, characters}
@opcode{midendnumericmodechars, characters}
Any of these characters can appear within a number without terminating
the effect of the number sign (@pxref{numsign
opcode,numsign,@code{numsign}}). In other words, they don't cancel
numeric mode.
The difference between the two opcodes is that
@opcoderef{numericmodechars} characters can appear anywhere in a
number whereas @opcoderef{midendnumericmodechars} characters can
appear only in the middle or at the end of a number. Like
@code{midendnumericmodechars}, @code{numericmodechars} characters keep
numeric mode active, but in addition they activate numeric mode
immediately when at least one digit follows, and the number sign will
precede the @code{numericmodechars} character in this case.
Example:
@example
numericmodechars .,
midendnumericmodechars -/
@end example
@end table
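
To see how the number-related opcodes fit together, consider the
following sketch, which is assembled from the individual examples
above and is not a complete table.

@example
numsign 3456
litdigit 0 245
litdigit 1 1
numericmodechars .,
midendnumericmodechars -/
numericnocontchars abcdefghij
@end example

With entries like these, the period in a number such as @samp{1.0}
would not cancel numeric mode, and a letter from a to j that directly
follows a number, as in @samp{1a}, would call for the indicator
defined with @code{nocontractsign}.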
@node Emphasis Opcodes
@section Emphasis Opcodes
In many braille systems emphasis such as bold, italics or underline is
indicated using special dot patterns that mark the start and often
also the end. For some languages these braille indicators differ
depending on the context, i.e.@: there is a separate indicator for an
emphasized word and another one for an emphasized phrase. To
accommodate all these usage scenarios liblouis provides a number of
opcodes for various contexts.
At the same time some braille systems use different indicators for
different kinds of emphasis while others know only one kind of
emphasis. For that reason liblouis doesn't hard-code any emphasis;
instead the table author defines which kinds of emphasis exist for a
specific language using the @opcoderef{emphclass}.
@menu
* Emphasis class::
* Contexts::
* Fallback behavior::
* Computer braille::
@end menu
@node Emphasis class
@subsection Emphasis class
The @code{emphclass} opcode defines the classes of emphasis that are
relevant for a particular language. An emphasis class has to be
declared for every kind of emphasis that needs special indicators.
@table @code
@opcode{emphclass, <emphasis class>}
Define an emphasis class to be used later in other emphasis related
opcodes in the table.
@example
emphclass italic
emphclass underline
emphclass bold
emphclass transnote
@end example
@end table
@node Contexts
@subsection Contexts
In order to understand the capabilities of Liblouis for emphasis
handling we have to look at the different contexts that are supported.
@menu
* None::
* Letter::
* Word::
* Phrase::
* Symbol::
@end menu
@node None
@subsubsection None
For some languages there is no such concept as contexts. Emphasis is
always handled the same regardless of context. There is simply an
indicator for the beginning of emphasis and another one for the end of
the emphasis.
@table @code
@opcode{begemph, <emphasis class> <dot pattern>}
Braille dot pattern to indicate the beginning of emphasis.
@example
begemph italic 46-3
@end example
@opcode{endemph, <emphasis class> <dot pattern>}
Braille dot pattern to indicate the end of emphasis.
@example
endemph italic 46-36
@end example
@end table
@node Letter
@subsubsection Letter
Some languages have special indicators for single letter emphasis.
@table @code
@opcode{emphletter, <emphasis class> <dot pattern>}
Braille dot pattern to indicate that the next character is emphasized.
@example
emphletter italic 46-25
@end example
@end table
@node Word
@subsubsection Word
Many languages have special indicators for emphasized words. Usually
they start at the beginning of the word and end implicitly, i.e.@:
without a closing indicator at the end of the word. There are also use
cases where the emphasis starts in the middle of the word and an
explicit closing indicator is required.
@table @code
@opcode{begemphword, <emphasis class> <dot pattern>}
Braille dot pattern to indicate the beginning of an emphasized word
or the beginning of emphasized characters within a word.
@example
begemphword underline 456-36
@end example
@opcode{endemphword, <emphasis class> <dot pattern>}
Generally emphasis with word context ends when the word ends. However
when an indication is required to close a word emphasis then this
opcode defines the Braille dot pattern that indicates the end of a word
emphasis.
@example
endemphword transnote 6-3
@end example
If emphasis ends in the middle of a word the Braille dot pattern
defined in this opcode is also used.
@opcode{emphmodechars, characters}
Normally, only space characters will cancel the
@opcoderef{begemphword} indicator. However, by using the
@code{emphmodechars} opcode, you can specify the list of characters
that are legal within an emphasized word. If @code{emphmodechars} is
specified, any character that is not in this list and is not a
@code{letter} will cancel the @opcoderef{begemphword} indicator.
Example:
@example
emphmodechars -
@end example
@end table
@node Phrase
@subsubsection Phrase
Many languages have a concept of a phrase where the emphasis is valid
for a number of words. The beginning of the phrase is indicated with a
braille dot pattern and a closing indicator is put before or after the
last word of the phrase. To define how many words are considered a
phrase in your language use the @opcoderef{lenemphphrase}.
@table @code
@opcode{begemphphrase, <emphasis class> <dot pattern>}
Braille dot pattern to indicate the beginning of a phrase.
@example
begemphphrase bold 456-46-46
@end example
@c define a special opcode macro that can handle the two-word nature
@c of the endemphphrase opcode
@macro endemphphraseopcode{where}
@opcodeindex endemphphrase \where\
@anchor{endemphphrase \where\ opcode}
@item endemphphrase <emphasis class> \where\ <dot pattern>
@end macro
@endemphphraseopcode{before}
Braille dot pattern to indicate the end of a phrase. The closing indicator
will be placed before the last word of the phrase.
@example
endemphphrase bold before 456-46
@end example
@endemphphraseopcode{after}
Braille dot pattern to indicate the end of a phrase. The closing
indicator will be placed after the last word of the phrase. If both
@code{endemphphrase <emphasis class> before} and @code{endemphphrase
<emphasis class> after} are defined an error will be signaled.
@example
endemphphrase underline after 6-3
@end example
@opcode{lenemphphrase, <emphasis class> <number>}
Define how many words are required before a sequence of words is
considered a phrase.
@example
lenemphphrase underline 3
@end example
@end table
@node Symbol
@subsubsection Symbol
UEB has a concept of symbols that need special indication. When the
translator detects an emphasis sequence that needs to be indicated
with the rules for a symbol then it will use the dots defined with the
@opcoderef{emphletter}. To indicate the end of the symbol it will use
the dots defined in the @opcoderef{endemphword}.
@node Fallback behavior
@subsection Fallback behavior
Many braille systems either handle emphasis using no contexts or
otherwise by employing a combination of the letter, word and phrase
contexts. So if a table defines any opcodes for the letter, word or
phrase contexts then liblouis will signal an error for opcodes that
define emphasis with no context. In other words, contrary to previous
versions of liblouis, there is no fallback behavior.
As a consequence, there will only be emphasis for a context when the
table defines it. So for example when defining a braille dot pattern
for phrases and not for words liblouis will not indicate emphasis on
words that aren't part of a phrase.
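As a sketch of this, assume a table that defines only phrase
indicators for an emphasis class. The class name and the dot patterns
below are chosen purely for illustration:
@example
begemphphrase italic 46-46
endemphphrase italic after 46-3
lenemphphrase italic 3
@end example
With only these rules, a run of three or more italic words would be
marked as a phrase, while a shorter run or a single italic word would
receive no indicator at all, because no @code{begemphword} rule exists
for the class.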
@node Computer braille
@subsection Computer braille
For computer braille there are only two braille indicators, for the
beginning and end of a sequence of characters to be rendered in
computer braille. Such a sequence may also have other emphasis. The
computer braille indicators are applied not only when computer braille
is indicated in the @code{typeform} parameter, but also when a
sequence of characters is determined to be computer braille because it
contains a subsequence defined by the @opcoderef{compbrl}.
@node Special Symbol Opcodes
@section Special Symbol Opcodes
These opcodes define certain symbols, such as the decimal point, which
require special treatment.
@table @code
@opcode{decpoint, character dots}
This opcode defines the decimal point. It is useful if your Braille
code requires the decimal separator to show as a dot pattern different
from the normal representation of this character, i.e.@: period or
comma. In addition, it allows the notation @samp{.001} (no leading 0),
which is common in some languages instead of @samp{0.001}, to be
translated correctly. When you use the
@code{decpoint} opcode, the decimal point will be taken to be part of
the number and correctly preceded by the number sign.
The character operand must have only one character. For example, in
@file{en-us-g1.ctb} we have:
@example
decpoint . 46
@end example
@opcode{hyphen, character dots}
This opcode defines the hyphen, that is, the character used in
compound words such as @samp{have-nots}. The back-translator uses it
to determine the end of individual words.
@end table
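A table would typically contain a @code{hyphen} entry along the
following lines. The dot pattern 36 is an assumption made for
illustration; use whatever your braille code prescribes for the
hyphen.
@example
hyphen - 36
@end example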
@node Special Processing Opcodes
@section Special Processing Opcodes
These opcodes cause special processing to be carried out.
@table @code
@opcode{capsnocont,}
This opcode has no operands. If it is specified, words or parts of
words in all caps are not contracted. This is needed for languages
such as Norwegian.
Note: If you use the capsnocont opcode and do not define the
@opcoderef{begcapsword} indicator, every cap will be marked with the
@opcoderef{capsletter} indicator. This is useful if you need to process caps
separately in a later pass.
@end table
@node Translation Opcodes
@section Translation Opcodes
These opcodes define the braille representations for character
sequences. Each of them defines an entry within the contraction table.
These entries may be defined in any order except, as noted below, when
they define alternate representations for the same character sequence.
Each of these opcodes specifies a condition under which the
translation is legal, and each also has a characters operand and a
dots operand. The text being translated is processed strictly from
left to right, character by character, with the most eligible entry
for each position being used. If there is more than one eligible entry
for a given position in the text, then the one with the longest
character string is used. If there is more than one eligible entry for
the same character string, then the one defined first is tested for
legality first. (This is the only case in which the order of the
entries makes a difference.)
The characters operand is a sequence or string of characters preceded
and followed by whitespace. Each character can be entered in the
normal way, or it can be defined as a four-digit hexadecimal number
preceded by @samp{\x}.
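For instance, the two hypothetical rules below are meant as equivalent
ways of writing one entry, once with the character typed directly and
once with it written as a hexadecimal escape. The word and the dot
patterns are only illustrative, and a real table would contain just
one of the two lines:
@example
always café 14-1-124-123456
always caf\x00e9 14-1-124-123456
@end example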
The dots operand defines the braille representation for the characters
operand. It may also be specified as an equals sign (@samp{=}). This
means that the default representation for each character
(@pxref{Character-Definition Opcodes}) within the sequence is to be
used. It is an error if not all the characters in the rule have been
previously defined in a character-definition rule. Note that the
@samp{=} shortcut for dot patterns has a known bug@footnote{See
@url{https://github.com/liblouis/liblouis/issues/500#issuecomment-365753137}.}
that might cause problems when back-translating.
In what follows the word @samp{characters} means a sequence of one or
more consecutive letters between spaces and/or punctuation marks.
@table @code
@opcode{noback, opcode ...}
This is an opcode prefix, that is to say, it modifies the operation of
the opcode that follows it on the same line. noback specifies that
back-translation is not to use information on this line.
@example
noback always ;\s; 0
@end example
@opcode{nofor, opcode ...}
This is an opcode prefix which modifies the operation of the opcode
following it on the same line. nofor specifies that forward translation
is not to use the information on this line.
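For example, the following hypothetical line, which reuses the entry
from the @code{always} example in this section, would be consulted
only during back-translation:
@example
nofor always world 456-2456
@end example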
@opcode{compbrl, characters}
If the characters are found within a block of text surrounded by
whitespace, the entire block is translated according to the default
braille representations defined by the @ref{Character-Definition
Opcodes} if 8-dot computer braille is enabled, or according to the dot
patterns given in the @opcoderef{comp6} if 6-dot computer braille is
enabled. For example:
@example
compbrl www translate URLs in computer braille
@end example
@opcode{comp6, character dots}
This opcode specifies the translation of characters in 6-dot computer
braille. It is necessary because the translation of a single character
may require more than one cell. The first operand must be a character
with a decimal representation from 0 to 255 inclusive. The second
operand may specify as many cells as necessary. The opcode is somewhat
of a misnomer, since any dots, not just dots 1 through 6, can be
specified. This even includes virtual dots (@pxref{virtual dots}).
@opcode{nocont, characters}
Like @code{compbrl}, except that the string is uncontracted.
@opcoderef{prepunc} and @opcoderef{postpunc} rules are applied,
however. This is useful for specifying that foreign words should not
be contracted in an entire document.
@opcode{replace, characters @{characters@}}
Replace the first set of characters, no matter where they appear, with
the second. Note that the second operand is @emph{NOT} a dot pattern.
It is also optional. If it is omitted the character(s) in the first
operand will be discarded. This is useful for ignoring characters. It
is possible that the "ignored" characters may still affect the
translation indirectly. Therefore, it is preferable to use
@opcoderef{correct}.
@opcode{always, characters dots}
Replace the characters with the dot pattern no matter where they
appear. Do @emph{NOT} use an entry such as @code{always a 1}. Use the
@code{uplow}, @code{letter}, etc. character definition opcodes
instead. For example:
@example
always world 456-2456 unconditional translation
@end example
@opcode{repeated, characters dots}
Replace the characters with the dot pattern no matter where they
appear. Ignore any consecutive repetitions of the same character
sequence. This is useful for shortening long strings of spaces or
hyphens or periods. For example:
@example
repeated --- 36-36-36 shorten separator lines made with hyphens
@end example
@opcode{repword, characters dots}
When characters are encountered check to see if the word before this
string matches the word after it. If so, replace characters with dots
and eliminate the second word and any word following another
occurrence of characters that is the same. This opcode is used in
Malaysian braille. In this case the rule is:
@example
repword - 123456
@end example
@opcode{largesign, characters dots}
Replace the characters with the dot pattern no matter where they
appear. In addition, if two words defined as large signs follow each
other, remove the space between them. For example, in
@file{en-us-g2.ctb} the words @samp{and} and @samp{the} are both
defined as large signs. Thus, in the phrase @samp{the cat and the dog}
the space would be deleted between @samp{and} and @samp{the}, with the
result @samp{the cat andthe dog}. Of course, @samp{and} and @samp{the}
would be properly contracted. The term @code{largesign} is a bit of
braille jargon that pleases braille experts.
@opcode{word, characters dots}
Replace the characters with the dot pattern if they are a word, that
is, are surrounded by whitespace and/or punctuation.
@opcode{syllable, characters dots}
As its name indicates, this opcode defines a "syllable" which must be
represented by exactly the dot patterns given. Contractions may not
cross the boundaries of this "syllable" either from left or right. The
character string defined by this opcode need not be a lexical
syllable, though it usually will be. The equal sign in the following
example means that the default representation for each character
within the sequence is to be used (@pxref{Translation Opcodes}):
@example
syllable horse = sawhorse, horseradish
@end example
@opcode{nocross, characters dots}
Replace the characters with the dot pattern if the characters are all
in one syllable (do not cross a syllable boundary). For this opcode to
work, a hyphenation table must be included. If this is not done,
@code{nocross} behaves like the @opcoderef{always}. For example, if
the English Grade 2 table is being used and the appropriate
hyphenation table has been included, @code{nocross sh 146} will cause
the @samp{sh} in @samp{monkshood} not to be contracted.
@opcode{joinword, characters dots}
Replace the characters with the dot pattern if they are a word which
is followed by whitespace and a letter. In addition remove the
whitespace. For example, @file{en-us-g2.ctb} has @code{joinword to
235}. This means that if the word @samp{to} is followed by another
word the contraction is to be used and the space is to be omitted. If
these conditions are not met, the word is translated according to any
other opcodes that may apply to it.
@opcode{lowword, characters dots}
Replace the characters with the dot pattern if they are a word
preceded and followed by whitespace. No punctuation either before or
after the word is allowed. The term @code{lowword} derives from the
fact that in English these contractions are written in the lower part
of the cell. For example:
@example
lowword were 2356
@end example
@opcode{contraction, characters}
If you look at @file{en-us-g2.ctb} you will see that some words are
actually contracted into some of their own letters. A famous example
among braille transcribers is @samp{also}, which is contracted as
@samp{al}. But this is also the name of a person. To take another
example, @samp{altogether} is contracted as @samp{alt}, but this is
the abbreviation for the alternate key on a computer keyboard.
Similarly @samp{could} is contracted into @samp{cd}, but this is the
abbreviation for compact disk. To prevent confusion in such cases, the
letter sign (see @opcoderef{letsign}) is placed before such letter
combinations when they actually are abbreviations, not contractions.
The @code{contraction} opcode tells the translator to do this.
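Following the cases discussed above, the corresponding entries might
look like this. This is a sketch only; check the table itself for the
entries it actually contains:
@example
contraction al
contraction alt
contraction cd
@end example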
@opcode{sufword, characters dots}
Replace the characters with the dot pattern if they are either a word
or at the beginning of a word.
@opcode{prfword, characters dots}
Replace the characters with the dot pattern if they are either a word
or at the end of a word.
@opcode{begword, characters dots}
Replace the characters with the dot pattern if they are at the
beginning of a word.
@opcode{begmidword, characters dots}
Replace the characters with the dot pattern if they are either at the
beginning or in the middle of a word.
@opcode{midword, characters dots}
Replace the characters with the dot pattern if they are in the middle
of a word.
@opcode{midendword, characters dots}
Replace the characters with the dot pattern if they are either in the
middle or at the end of a word.
@opcode{endword, characters dots}
Replace the characters with the dot pattern if they are at the end of
a word.
@opcode{partword, characters dots}
Replace the characters with the dot pattern if the characters are
anywhere in a word, that is, if they are preceded or followed by a
letter.
@opcode{exactdots, @@dots}
Note that the operand must begin with an at sign (@samp{@@}). The dot
pattern following it is evaluated for validity. If it is valid,
whenever an at sign followed by this dot pattern appears in the source
document it is replaced by the characters corresponding to the dot
pattern in the output. This opcode is intended for use in liblouisutdml
semantic-action files to specify exact dot patterns, as in
mathematical codes. For example:
@example
exactdots @@4-46-12356
@end example
will produce the characters with these dot patterns in the output.
@opcode{prepunc, characters dots}
Replace the characters with the dot pattern if they are part of
punctuation at the beginning of a word.
@opcode{postpunc, characters dots}
Replace the characters with the dot pattern if they are part of
punctuation at the end of a word.
@opcode{begnum, characters dots}
Replace the characters with the dot pattern if they are at the
beginning of a number, that is, before all its digits. For example, in
@file{en-us-g1.ctb} we have @code{begnum # 4}.
@opcode{midnum, characters dots}
Replace the characters with the dot pattern if they are in the middle
of a number. For example, @file{en-us-g1.ctb} has @code{midnum . 46}.
This is because the decimal point has a different dot pattern than the
period.
@opcode{endnum, characters dots}
Replace the characters with the dot pattern if they are at the end of
a number. For example @file{en-us-g1.ctb} has @code{endnum th 1456}.
This handles things like @samp{4th}. A letter sign is @emph{NOT}
inserted.
@opcode{joinnum, characters dots}
Replace the characters with the dot pattern. In addition, if
whitespace and a number follow, omit the whitespace. This opcode can
be used to join currency symbols to numbers, for example:
@example
joinnum \x20AC 15 (EURO SIGN)
joinnum \x0024 145 (DOLLAR SIGN)
joinnum \x00A3 1234 (POUND SIGN)
joinnum \x00A5 13456 (YEN SIGN)
@end example
@end table
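The position-dependent opcodes above can be compared side by side in a
short sketch. The entries below are loosely modelled on English
contracted braille and are meant as illustrations, not as quotations
from any shipped table:
@example
word knowledge 13 only as a whole word
sufword blind 12-123 whole word or beginning of a word
begword con 25 only at the beginning of a word
midendword ing 346 middle or end of a word
endword ness 56-234 only at the end of a word
@end example
With these entries, @samp{con} would be contracted in @samp{contest}
but not in @samp{bacon}, and @samp{ing} would be contracted in
@samp{singer} and @samp{sing} but not in @samp{ingot}.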
@node Character-Class Opcodes
@section Character-Class Opcodes
These opcodes define and use character classes. A character class
associates a set of characters with a name. The name then refers to
any character within the class. A character may belong to more than
one class.
The basic character classes correspond to the character definition
opcodes, with the exception of the @opcoderef{uplow}, which defines
characters belonging to the two classes @code{uppercase} and
@code{lowercase}. These classes are:
@table @code
@item space
Whitespace characters such as blank and tab
@item digit
Numeric characters
@item letter
Both uppercase and lowercase alphabetic characters
@item lowercase
Lowercase alphabetic characters
@item uppercase
Uppercase alphabetic characters
@item punctuation
Punctuation marks
@item sign
Signs such as percent (@samp{%})
@item math
Mathematical symbols
@item litdigit
Literary digit
@item undefined
Not properly defined
@end table
The opcodes which define and use character classes are shown below.
For examples see @file{el.ctb}.
@table @code
@opcode{class, name characters}
Define a new character class. The name operand must contain only
letters (a-z and A-Z). The letters may be upper or lower-case but the
case matters. The characters operand must be specified as a string. A
character class may not be used until it has been defined.
@opcode{after, class opcode ...}
The specified opcode is further constrained in that the matched
character sequence must be immediately preceded by a character
belonging to the specified class. If this opcode is used more than
once on the same line then the union of the characters in all the
classes is used.
@opcode{before, class opcode ...}
The specified opcode is further constrained in that the matched
character sequence must be immediately followed by a character
belonging to the specified class. If this opcode is used more than
once on the same line then the union of the characters in all the
classes is used.
@end table
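As a minimal sketch of how these three opcodes fit together, assume a
hypothetical class named @code{vowel}. The contractions and dot
patterns are invented for illustration:
@example
class vowel aeiou
after vowel endword ng 346
before vowel begword st 34
@end example
The first line defines the class. The second restricts the
@code{endword} rule to cases where @samp{ng} is immediately preceded
by a vowel, and the third restricts the @code{begword} rule to cases
where @samp{st} is immediately followed by a vowel.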
@node Swap Opcodes
@section Swap Opcodes
The swap opcodes are needed to tell the @opcoderef{context}, the
@opcoderef{correct} and multipass opcodes which dot patterns to swap
for which characters. There are three, @code{swapcd}, @code{swapdd}
and @code{swapcc}. The first swaps dot patterns for characters. The
second swaps dot patterns for dot patterns and the third swaps
characters for characters. The first is used in the @code{context}
opcode and the second is used in the multipass opcodes.
All the swap opcodes have a name so they can be referred to from the
@code{context}, @code{correct} and multipass opcodes. The name operand
must contain only letters (a-z and A-Z). The letters may be upper or
lower-case but the case matters.
Dot patterns are separated by commas and may contain more than one
cell.
@table @code
@opcode{swapcd, name characters dots@comma{} dots@comma{} dots@comma{} ...}
See above paragraph for explanation. For example:
@example
swapcd dropped 0123456789 356,2,23,...
@end example
@opcode{swapdd, name dots@comma{} dots@comma{} dots ... dotpattern1@comma{} dotpattern2@comma{} dotpattern3@comma{} ...}
The @code{swapdd} opcode defines substitutions for the multipass
opcodes. In the second operand the dot patterns must be single cells,
but in the third operand multi-cell dot patterns are allowed. This is
because multi-cell patterns in the second operand would lead to
ambiguities.
@opcode{swapcc, name characters characters}
The @code{swapcc} opcode swaps characters in its second operand for
characters in the corresponding places in its third operand. It is
intended for use with @code{correct} opcodes and can solve problems
such as formatting phone numbers.
@end table
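The @code{swapdd} and @code{swapcc} opcodes might be written as in the
following sketch. The names @code{lowered} and @code{lettodigit} and
all operands are hypothetical:
@example
swapdd lowered 1,12,14,145 2,23,25,256
swapcc lettodigit abcdefghij 1234567890
@end example
Here each single-cell pattern in the second operand of @code{swapdd}
corresponds to the pattern at the same position in the third operand,
and @code{swapcc} maps @samp{a} to @samp{1}, @samp{b} to @samp{2}, and
so on.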
@node The Context and Multipass Opcodes
@section The Context and Multipass Opcodes
The @code{context} and multipass opcodes (@code{pass2}, @code{pass3}
and @code{pass4}) provide translation capabilities beyond those of the
basic translation opcodes (@pxref{Translation Opcodes}) discussed
previously. The multipass opcodes cause additional passes to be made
over the string to be translated. The number after the word
@code{pass} indicates in which pass the entry is to be applied. If no
multipass opcodes are given, only the first translation pass is made.
The @code{context} opcode is basically a multipass opcode for the
first pass. It differs slightly from the multipass opcodes per se.
When back-translating, the passes are performed in the reverse order,
i.e.@: @code{pass4}, @code{pass3}, @code{pass2}, @code{context}. Each
of these opcodes must be prefixed by either the @opcoderef{noback} or
the @opcoderef{nofor}. The format of all these opcodes is @code{opcode
test action}. The specific opcodes are invoked as follows:
@table @code
@anchor{context opcode}
@opcodeindex context
@opcodeindex pass2
@opcodeindex pass3
@opcodeindex pass4
@item context test action
@itemx pass2 test action
@itemx pass3 test action
@itemx pass4 test action
@end table
The @code{test} and @code{action} operands have suboperands. Each
suboperand begins with a non-alphanumeric character and ends when
another non-alphanumeric character is encountered. The suboperands and
their initial characters are as follows.
@table @kbd
@item " (double quote)
a string of characters. This string must be terminated by another
double quote. It may contain any characters. If a double quote is
needed within the string, it must be preceded by a backslash
(@samp{\}). If a space is needed, it must be represented by the escape
sequence \s. This suboperand is valid
in the test and action parts of the @code{correct} opcode,
in the test part of the @code{context} opcode when forward translating,
and in the action part of the @code{context} opcode when back translating.
@item @@ (at sign)
a sequence of dot patterns. Cells are separated by hyphens as usual.
This suboperand is valid in the test and action parts of
the @code{pass2}, @code{pass3}, and @code{pass4} opcodes,
in the action part of the @code{context} opcode when forward translating,
and in the test part of the @code{context} opcode when back translating.
@item ` (accent mark)
If this is the beginning of the string being translated this
suboperand is true. It is valid only in the test part and must be the
first thing in this operand.
@item ~ (tilde)
If this is the end of the string being translated this suboperand is
true. It is valid only in the test part and must be the last thing in
this operand.
@item $ (dollar sign)
a string of attributes, such as @samp{d} for digit, @samp{l} for
letter, etc. For a list of all valid attributes @pxref{valid attribute
characters}. More than one attribute can be given. If you wish to
check characters with any attribute, use the letter @samp{a}. Input
characters are checked to see if they have at least one of the
attributes. The attribute string can be followed by numbers specifying
how many characters are to be checked. If no numbers are given, 1 is
assumed. If two numbers separated by a hyphen are given, the input is
checked to make sure that at least the first number of characters with
the attributes are present, but no more than the second number. If
only one number is present, then exactly that many characters must
have the attributes. A period instead of the numbers indicates an
indefinite number of characters (for technical reasons the number of
characters that are actually matched is limited to 65535).
This suboperand is valid in all test parts but not in action parts.
For the characters which can be used in attribute strings, see the
following table.
@item ! (exclamation point)
reverses the logical meaning of the suboperand which follows. For
example, !$d is true only if the character is @emph{NOT} a digit. This
suboperand is valid in test parts only.
@item % (percent sign)
the name of a class defined by the @opcoderef{class} or the name of a
swap set defined by the swap opcodes (@pxref{Swap Opcodes}). Names
must contain only letters (a-z and A-Z). The letters may be upper or
lower-case but the case matters. Class names may be used in test parts
only. Swap names are valid everywhere.
@item @{ (left brace)
Name: the name of a grouping pair. The left brace indicates that the
first (or left) member of the pair is to be used in matching. If this
is between replacement brackets it must be the only item. This is also
valid in the action part.
The brace actions, @code{@{name} and @code{@}name}, refer to named
groupings. A grouping is created with the @opcoderef{grouping} and
contains exactly two characters which represent the opening character
and the matching closing character for a character grouping. The first
operand is the grouping name, the second is the two (opening and
closing) characters, and the third is the two dot patterns separated
by a comma.
Let's say that you'd like to define the opening and closing
parentheses via multipass rules, and that you'd like to use dots
123478 for the opening parenthesis and dots 145678 for the closing
parenthesis. One way to do so is like this:
@example
grouping parentheses () 123478,145678
noback correct @{parentheses @{parentheses
noback correct @}parentheses @}parentheses
@end example
The references within the test part of the multipass rule match
against the characters (the second operand) of the grouping rule, and
the references within the action part replace with the dot patterns
(the third operand) of the grouping.
@item @} (right brace)
Name: the name of a grouping pair. The right brace indicates that the
second (or right) member is to be used in matching. See the remarks on
the left brace immediately above.
@item / (slash)
Search the input for the expression following the slash and return
true if found. This can be used to set a variable.
@item _ (underscore)
Move backward. If a number follows, move backward that number of
characters. The default is to move backward one character. This
suboperand is valid only in test parts. The test fails if moving
backward beyond the beginning of the input string.
@item [ (left bracket)
start replacement here. This suboperand must always be paired with a
right bracket and is valid only in test parts. Multiple pairs of
square brackets in a single expression are not allowed.
@item ] (right bracket)
end replacement here. This suboperand must always be paired with a
left bracket and is valid only in test parts.
@item # (number sign or crosshatch)
test or set a variable. Variables are referred to by numbers
(0 through 49), e.g.@: @code{#1}, @code{#2}, @code{#25}.
Variables may be set by one @code{context} or multipass opcode and tested
by another. Thus, an operation that occurs at one place in a translation
can tell an operation that occurs later within the same pass about itself.
This feature is used in math translation, and may also help to alleviate
the need for new opcodes. This suboperand is valid everywhere.
Variables are set in the action part. To set a variable, use an
expression like @code{#1=1}. All of the variables are initialized to 0
at the start of each pass.
Variables can also be incremented and decremented by one in the action
part with expressions like @code{#1+} and @code{#3-} respectively.
An attempt to decrement a variable below 0 is silently ignored.
Variables are tested in the test part with conditional expressions like:
@code{#1=2}, @code{#3<4}, @code{#5>6}, @code{#7<=8}, @code{#9>=10}.
@item * (asterisk)
Copy the input characters or dot patterns within the replacement brackets
into the output, and discard anything else that was matched. If there are
no replacement brackets then copy all of the matched input. This
suboperand is only valid within the action part. It may be specified any
number of times. This feature is used, for example, for handling numeric
subscripts in Nemeth.
@item ? (question mark)
Valid only in the action part. The characters to be replaced are
simply ignored. That is, they are replaced with nothing. If either
member of a grouping pair is in the replace brackets the other member
at the same level is also removed.
@end table
@anchor{valid attribute characters}
The valid characters which can be used in attribute strings are as
follows:
@table @kbd
@item a
any attribute
@item d
digit
@item D
literary digit
@item l
letter
@item m
math
@item p
punctuation
@item S
sign
@item s
space
@item U
uppercase
@item u
lowercase
@item w
first user-defined class
@item x
second user-defined class
@item y
third user-defined class
@item z
fourth user-defined class
@end table
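To make the suboperand syntax more concrete, here are two small rules
written as a sketch. The dot patterns are chosen for illustration and
are not taken from any shipped table:
@example
noback context ["."]$d @@46
noback pass2 @@36-36 @@36
@end example
The first rule tests for a period (a string suboperand inside
replacement brackets) followed by a character with the digit
attribute, and replaces only the period with dots 46. The second rule
works on the output of the first pass and replaces two consecutive
cells of dots 36 with a single one.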
The following illustrates the algorithm by which text is evaluated
with multipass expressions:
@noindent
Loop over context, pass2, pass3 and pass4 and do the following for each pass:
@enumerate a
@item
Match the text following the cursor against all expressions in the
current pass. If an expression has square brackets to indicate the
part to be replaced, and the opening bracket would correspond with a
position before the cursor, it is not a match.
@item
If there is no match: shift the cursor one position to the right and
continue the loop
@item
If there are matches: choose the longest match
@item
Do the replacement. If the expression has square brackets, the part of
the input that matches the part in between the brackets is replaced
with the right-hand side of the rule. If the expression has no square
brackets, the whole match is replaced.
@item
Place the cursor after the replaced text
@item
continue loop
@end enumerate
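As a worked illustration of these steps, consider two hypothetical
@code{pass2} rules:
@example
noback pass2 @@1-1 @@2
noback pass2 @@1-1-1 @@3
@end example
Suppose the input to the second pass is the four cells 1-1-1-1. At the
first position both rules match, and the longer one (three cells) is
chosen, so those cells are replaced by dots 3 and the cursor is placed
after the replacement. Only one cell of dots 1 remains, neither rule
matches it, and the cursor moves on. Following the steps above, the
expected result is 3-1.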
Normally, when a rule is applied, the characters in the input that the
rule applies to are "consumed", i.e.@: the position of the input string
is stepped forward, and the characters are no longer available for
subsequent rules. However, with the multipass opcodes, the
@opcoderef{context} and the @opcoderef{correct}, it is
possible to make rules which don't consume any characters from the
input. This could happen, e.g.@: if you use the @opcoderef{context}
to insert a dot pattern before a special group of characters.
In these cases, Liblouis will always advance the position by one
character to make sure that the program doesn't apply a rule to the
same characters again and again.
@node The correct Opcode
@section The correct Opcode
@table @code
@opcode{correct, test action}
Because some input (such as that from an OCR program) may contain
systematic errors, it is sometimes advantageous to use a
pre-translation pass to remove them. The errors and their corrections
are specified by the @code{correct} opcode. If there are no
@code{correct} opcodes in a table, the pre-translation pass is not used.
If any back-translation corrections have been specified then they are
applied in a post-translation (i.e.@: the very last) pass.
Note that like the @opcoderef{context} and multi-pass opcodes, the
@code{correct} opcode must be preceded by @opcoderef{noback} or
@opcoderef{nofor}.
The format of the @code{correct} opcode is very similar to that
of the @opcoderef{context}. The only difference is that in the action
part strings may be used and dot patterns may not be used. Some
examples of @code{correct} opcode entries are:
@example
noback correct "\\" ? Eliminate backslashes
noback correct "cornf" "comf" fix a common "scano"
noback correct "cornm" "comm"
noback correct "cornp" "comp"
noback correct "*" ? Get rid of stray asterisks
noback correct "|" ? ditto for vertical bars
noback correct "\s?" "?" drop space before question mark
@end example
@end table
@node The match Opcode
@section The match Opcode
The match opcode is similar to the multipass opcodes and can be seen as
the more low-level and powerful cousin to the @opcoderef{context}.
@strong{Note:} For historical reasons despite being fairly similar in
syntax and functionality both the @opcoderef{context} and the
@opcoderef{match} exist and are in use in modern braille tables. But
in the future they might be merged under some common opcode. For that
reason consider the match opcode @emph{somewhat experimental}.
@table @code
@opcode{match, pre-pattern characters post-pattern dots}
This opcode allows for matching a string of characters via @emph{pre}
and @emph{post patterns}. The patterns are specified using an
expression syntax somewhat like regular expressions (@pxref{pattern
expression syntax}). A single hyphen (@samp{-}) by itself means no
pattern is specified.
The following will replace @samp{xyz} with the dots
@samp{1346-13456-1356} when it appears in the string @samp{abxyzcd}.
@example
match ab xyz cd 1346-13456-1356
@end example
The following will replace @samp{ONE} with @samp{3456-1} when it
starts the input and is followed by @samp{:}
@example
match ^ ONE : 3456-1
@end example
@end table
@anchor{pattern expression syntax}
The @code{pre-pattern} and the @code{post-pattern} can contain
any of the following expressions:
@table @samp
@item [ ]
Expression can be any of the characters between the brackets. If only
one character is present then the brackets are not needed unless it is
a special character, in which case it should be escaped with a backslash.
@item .
Expression can be any character.
@item %[ ]
Expression is a character with the attributes listed between the
brackets. If only one character is present then the brackets are not
needed. The attributes are specified as follows:
@table @samp
@item _
space
@item #
digit
@item a
letter
@item u
uppercase
@item l
lowercase
@item .
punctuation
@item $
sign
@end table
@item ^
Match at the end of input processing (or the beginning, depending on
the direction, pre or post).
@item $
Same as @samp{^}.
@end table
For example the following will replace @samp{bb} with the dots @samp{23} when it
is between letters.
@example
match %a bb %a 23
@end example
The following will replace @samp{con} with the dots @samp{25} when it
is preceded by a space or beginning of input, and followed by an
@samp{s} and then any letter.
@example
match %[^_] con s%a 25
@end example
Similar to regular expressions the pattern expressions can contain
grouping, quantifiers and even negation:
@table @samp
@item ( )
Expressions between parentheses are grouped together as one
expression.
@item !
The following expression is negated.
@item ?
The previous expression must match zero or one times.
@item *
The previous expression must match zero or more times.
@item +
The previous expression must match one or more times.
@item |
Either the previous or the following expressions must match.
@end table
For example the following will replace @samp{ing} with the dots
@samp{346} when it is @emph{not} preceded by a space or beginning of
input. What follows after the @samp{ing} does not matter, hence the
@samp{-}.
@example
match !%[^_] ing - 346
@end example
The following will replace @samp{con} with the dots @samp{25} when it
is preceded by a space, or beginning of input; then followed by a
@samp{c} that is followed by any character but @samp{h}.
@example
match %[^_] con c!h 25
@end example
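Quantifiers can be combined with character attributes in the same way.
As an illustrative sketch only (the rule is hypothetical and not taken
from a shipped table), the following would replace @samp{th} with the
dots @samp{1456} when it is preceded by one or more digits, as in
@samp{4th}. What follows the @samp{th} does not matter.
@example
match %#+ th - 1456
@end example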
@node Miscellaneous Opcodes
@section Miscellaneous Opcodes
@table @code
@opcode{include, filename}
Read the file indicated by @code{filename} and incorporate or include
its entries into the table. Included files can include other files,
which can include other files, etc. For an example, see what files are
included by the entry include @file{en-us-g1.ctb} in the table
@file{en-us-g2.ctb}. If the included file is not in the same directory
as the main table, use a full path name for filename.
@opcode{undefined, dots}
If this opcode is used in a table any characters which have not been
handled in the table but are encountered in the text will be replaced
by the dot pattern. If this opcode is not used, any undefined
characters are replaced by @code{'\xhhhh'}, where the h's are
hexadecimal digits.
@opcode{display, character dots}
Associates dot patterns with the characters which will be sent to a
braille embosser, display or screen font. The character must be in the
range 0-255 and the dots must specify a single cell. Here are some
examples:
@example
# When the character a is sent to the embosser or display,
# it will produce a dot 1.
display a 1
@end example
@example
# When the character L is sent to the display or embosser
# it will produce dots 1-2-3.
display L 123
@end example
The @code{display} opcode is optional. It is used when the embosser or
display has a different mapping of characters to dot patterns than
that given in @ref{Character-Definition Opcodes}. If used, display
entries must precede character-definition entries.
A possible use case would be to define display opcodes so that the
result is Unicode braille for use on a display and a second set of
display opcodes (in a different file) to produce plain ASCII braille
for use with an embosser.
@opcode{multind, dots opcode opcode ...}
The @code{multind} opcode tells the back-translator that a sequence of
braille cells represents more than one braille indicator. For example,
in @file{en-us-g2.ctb} we have @code{multind 56-6 letsign capsletter}.
The back-translator can generally handle single braille indicators,
but it cannot apply them when they immediately follow each other. It
recognizes the letter sign if it is followed by a letter and takes
appropriate action. It also recognizes the capital sign if it is
followed by a letter. But when there is a letter sign followed by a
capital sign it fails to recognize the letter sign unless the sequence
has been defined with @code{multind}. A @code{multind} entry may not
contain a comment because liblouis would attempt to interpret it as an
opcode.
@end table
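As a rough sketch of how these opcodes might sit together at the top
of a table, consider the following fragment. The table is
hypothetical, and the dot pattern given to @code{undefined} is an
arbitrary choice made for illustration. The @code{display} entries are
placed first so that they come before any character definitions.
@example
# Display entries come before any character definitions.
display a 1
display L 123
# Incorporate the entries of another table.
include en-us-g1.ctb
# Dot pattern for characters the table does not otherwise handle.
undefined 3456
@end example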
@node Notes on Back-Translation
@chapter Notes on Back-Translation
@anchor{General Notes}
@section General Notes
Back-translation refers to the process of translating backwards,
i.e.@: from Braille to text. For many years, Liblouis was mainly
concerned with forward translation, and so were most of the authors of
the translation tables. Today however, Liblouis is being used
extensively in conjunction with screen reading programs like NVDA and
JAWS for Windows as well as Braille note-takers like BrailleSense from
HIMS and BrailleNote from HumanWare. So when writing a translation
table for Liblouis, it is indeed relevant to consider how the table
will work when used for back-translation, if anything special must be
done, or if you want to write separate tables for forward translation
and back-translation.
Back-translation is generally harder to do in a computer program than
forward translation. Ideally, any text could be translated to Braille
and then translated back to text giving exactly the same result as the
original. However, many Braille codes omit a lot of information and
leave it to the reader to fill in the missing bits. An example of this
is letters with accents. In languages where accents are uncommon, e.g.@:
English, accented letters are usually just marked with a Braille
indicator stating that there is an accent, but not which accent, even
though this may be crucial to the meaning of the word or the sentence.
Another example of this is when not all capital letters are marked in
the Braille code, but only the "important" capital letters. A third
example is when a Braille character serves as both a punctuation sign,
a math sign, and perhaps even as a contraction, and the Braille code
then leaves it up to the reader to use his/her knowledge of the context
to decide the meaning of the Braille character.
In some cases, you may need to bend the rules of the Braille code if it
is important to create Braille that can be properly back-translated.
This may include marking all capital letters instead of just the
"important" ones, or perhaps marking a Braille character with an
indicator stating that this character should in fact be interpreted as
a math sign and not a punctuation or Braille contraction. In some
cases, the best solution may be to create two separate sets of tables
for forward translation: One set for Braille that must be
back-translatable (for use with screen readers and note-takers), and
another for good and nice literary Braille (for embossing).
But no matter how you bend the Braille code, the back-translation
process may not be perfect.
@anchor{Back-translation with Liblouis}
@section Back-translation with Liblouis
Back-translation is carried out by the function
@code{lou_backTranslateString}. Its calling sequence is described in
@ref{Programming with liblouis}. @code{lou_backTranslateString} first
performs @code{pass4}, if
present, then @code{pass3}, then @code{pass2}, then the
backtranslation, then corrections. Note that this is exactly the
inverse of forward translation.
Most opcodes can be preceded by @opcoderef{noback} or @opcoderef{nofor},
and the @code{correct}, @code{context} and multi-pass opcodes must be
preceded with either @code{noback} or @code{nofor}. So in most cases,
it will be perfectly possible to make one table for translation in both
directions, although a separate table for forward and backward
translation might be more readable in some cases.
Most of the opcodes associated with pass 1 have two operands, a
character operand to the left and a dots operand to the right. During
forward translation, these operands are used to replace the characters
with the dot pattern according to the conditions of the opcode. The
opcode works from left to right. When back-translating, these opcodes
work the opposite way. The dot patterns are replaced by the text. The
opcodes work from right to left.
On the other hand, the @code{correct}, @code{context} and multi-pass
opcodes have a test part to the left and an action part to the right.
These opcodes work from left to right in both translation directions.
The test is performed, and if true, the action is executed, i.e.@:
replacing, inserting or deleting characters or dots. This is why a
translation direction always has to be specified with these opcodes
using @code{noback} or @code{nofor}.
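The difference between the two kinds of opcodes can be sketched with a
few entries. The fragment is hypothetical and only meant to show where
a direction prefix is optional and where it is required:
@example
# Pass 1 rule without a prefix: used in both directions.
always er 12456
# The same kind of rule restricted to forward translation.
noback always ing 346
# A test/action rule must always name its direction.
noback correct "cornf" "comf"
@end example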
@node Table Metadata
@chapter Table Metadata
Translation tables may contain metadata. This makes them
discoverable. Programs may for example use the Liblouis function
@ref{lou_findTable,@code{lou_findTable}} to find a table based on a
special query of which the @ref{Query Syntax,syntax} is described
below.
@section Syntax
Metadata must be defined in special comments within the table
header. The table header is the area at the top of the file, before
the first translation rule, consisting of only comments or empty
lines. Any metadata within included tables is ignored.
A metadata field must be defined on its own line, starting with
@code{#+}. It has the following syntax:
@example
#+<key>: <value>
@end example
where @samp{<key>} and @samp{<value>} are sequences of one or more
characters @code{a} to @code{z}, @code{A} to @code{Z}, @code{0} to
@code{9}, @code{.}, @code{-}, and @code{_}. The colon that separates
the key and value may have zero or more spaces or tabs on either side.
A value is optional. In case of no value the colon must be omitted as
well:
@example
#+<key>
@end example
There is no restriction on which keys and values are allowed, as long
as the syntax is correct. However in order to be really useful there
must be some standard keys and values. A possible grammar is proposed
on the wiki page
@url{https://github.com/liblouis/liblouis/wiki/Table-discovery-based-on-table-metadata#standard-metadata-tags, Standard metadata tags}.
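As a minimal sketch, a table header following these rules could look
as shown below. The keys and values are hypothetical and chosen only
to show the three accepted forms: a colon followed by a space, a colon
without surrounding space, and a key without a value. The header ends
at the first rule, here an @code{include} entry.
@example
# An ordinary comment in the table header
#+locale: en
#+grade:2
#+experimental
include en-us-g1.ctb
@end example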
@anchor{Query Syntax}
@section Query Syntax
A query that is passed to the @ref{lou_findTable,@code{lou_findTable}}
function must have the following syntax:
@example
<feature1> <feature2> <feature3> ...
@end example
where @samp{<feature>} is either:
@example
<key>:<value>
@end example
or:
@example
<key>
@end example
Features are separated by one or more spaces or tabs. No spaces are
allowed around colons.
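For instance, assuming tables that define the hypothetical metadata
keys @samp{locale} and @samp{grade}, a query with two features could
be written as:
@example
locale:fr grade:1
@end example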
@node Testing Translation Tables interactively
@chapter Testing Translation Tables interactively
A number of test programs are provided as part of the liblouis
package. They are intended for testing liblouis and for debugging
tables. None of them is suitable for braille transcription. An
application that can be used for transcription is @command{file2brl},
which is part of the liblouisutdml package (@pxref{Top, , Introduction,
liblouisutdml, Liblouisutdml User's and Programmer's Manual}). The source
code of the test programs can be studied to learn how to use the
liblouis library and they can be used to perform the following
functions.
@anchor{common options}
All of these programs recognize the @option{--help} and
@option{--version} options.
@table @option
@item --help
@itemx -h
Print a usage message listing all available options, then exit
successfully.
@item --version
@itemx -v
Print the version number, then exit successfully.
@end table
Most test programs let you specify one or multiple tables to use.
These tables are usually found in standard locations in the file
system or local to where the command is executed. @xref{How tables are
found}, for a description on how the tables are located.
@menu
* lou_debug::
* lou_trace::
* lou_checktable::
* lou_allround::
* lou_translate (program)::
* lou_checkhyphens::
* lou_checkyaml::
@end menu
@node lou_debug
@section lou_debug
@pindex lou_debug
The @command{lou_debug} tool is intended for debugging liblouis
translation tables. The command line for @command{lou_debug} is:
@example
lou_debug [OPTIONS] TABLE[,TABLE,...]
@end example
The command line options that are accepted by @command{lou_debug} are
described in @ref{common options}.
The table (or comma-separated list of tables) is compiled. If no
errors are found a brief command summary is printed, then the prompt
@samp{Command:}. You can then input one of the command letters and get
output, as described below.
Most of the commands print information in the various arrays of
@code{TranslationTableHeader}. Since these arrays are pointers to
chains of hashed items, the commands first print the hash number, then
the first item, then the next item chained to it, and so on. After
each item there is a prompt indicated by @samp{=>}. You can then press
enter (@kbd{@key{RET}}) to see the next item in the chain or the first
item in the next chain. Or you can press @kbd{h} (for next-(h)ash) to
skip to the next hash chain. You can also press @kbd{e} to exit the
command and go back to the @samp{command:} prompt.
@table @kbd
@item h
Brings up a screen of somewhat more extensive help.
@item f
Display the first forward-translation rule in the first non-empty hash
bucket. The number of the bucket is displayed at the beginning of the
chain. Each rule is identified by the word @samp{Rule:}. The fields
are displayed by phrases consisting of the name of the field, an equal
sign, and its value. The before and after fields are displayed only if
they are nonzero. Special opcodes such as the @opcoderef{correct} and
the multipass opcodes are shown with the code that instructs the
virtual machine that interprets them. If you want to see only the
rules for a particular character string you can type @kbd{p} at the
@samp{command:} prompt. This will take you to the @samp{particular:}
prompt, where you can press @kbd{f} and then type in the string. The
whole hash chain containing the string will be displayed.
@item b
Display back-translation rules. This display is very similar to that
of forward translation rules except that the dot pattern is displayed
before the character string.
@item c
Display character definitions, again within their hash chains.
@item d
Displays single-cell dot definitions. If a character-definition opcode
gives a multi-cell dot pattern, it is displayed among the
back-translation rules.
@item C
Display the character-to-dots map. This is set up by the
character-definition opcodes and can also be influenced by the
@opcoderef{display}.
@item D
Display the dot to character map, which shows which single-cell dot
patterns map to which characters.
@item z
Show the multi-cell dot patterns which have been assigned to the
characters from 0 to 255 to comply with computer braille codes such as
a 6-dot code. Note that the character-definition opcodes should use
8-dot computer braille.
@item p
Bring up a secondary (@samp{particular:}) prompt from which you can
examine particular character strings, dot patterns, etc. The commands
(given in its own command summary) are very similar to those of the
main @samp{command:} prompt, but you can type a character string or
dot pattern. They include @kbd{h}, @kbd{f}, @kbd{b}, @kbd{c}, @kbd{d},
@kbd{C}, @kbd{D}, @kbd{z} and @kbd{x} (to exit this prompt), but not
@kbd{p}, @kbd{i} and @kbd{m}.
@item i
Show braille indicators. This shows the dot patterns for various
opcodes such as the @opcoderef{capsletter} and the @opcoderef{numsign}.
It also shows emphasis dot patterns, such as those for the
@opcoderef{begemphword}, the @opcoderef{begemphphrase}, etc. If a
given opcode has not been used nothing is printed for it.
@item m
Display various miscellaneous information about the table, such as the
number of passes, whether certain opcodes have been used, and whether
there is a hyphenation table.
@item q
Exit the program.
@end table
@node lou_trace
@section lou_trace
@pindex lou_trace
When working on translation tables it is sometimes useful to determine
what rules were applied when translating a string. @command{lou_trace}
helps with exactly that. It lists all the rules applied for a given
translation table and input string.
@example
lou_trace [OPTIONS] TABLE[,TABLE,...]
@end example
Aside from the standard options (@pxref{common options})
@command{lou_trace} also accepts the following options:
@table @option
@item --forward
@itemx -f
Trace a forward translation.
@item --backward
@itemx -b
Trace a backward translation.
@end table
If no options are given forward translation is assumed.
Once started you can type an input string followed by @kbd{@key{RET}}.
@command{lou_trace} will print the braille translation followed by a
list of the rules that were applied to produce the translation. A possible
invocation is listed in the following example:
@example
$ lou_trace tables/en-us-g2.ctb
the u.s. postal service
! u4s4 po/al s@}vice
1. largesign the 2346
2. repeated 0
3. lowercase u 136
4. punctuation . 46
5. context _$l["."]$l @@256
6. lowercase s 234
7. postpunc . 256
8. repeated 0
9. begword post 1234-135-34
10. largesign a 1
11. lowercase l 123
12. repeated 0
13. lowercase s 234
14. always er 12456
15. lowercase v 1236
16. lowercase i 24
17. lowercase c 14
18. lowercase e 15
19. pass2 $s1-10 @@0
20. pass2 $s1-10 @@0
21. pass2 $s1-10 @@0
@end example
@node lou_checktable
@section lou_checktable
@pindex lou_checktable
To use this program type the following:
@example
lou_checktable [OPTIONS] TABLE
@end example
Aside from the standard options (@pxref{common options})
@command{lou_checktable} also accepts the following options:
@table @option
@item --quiet
@itemx -q
Do not write to standard error if there are no errors.
@end table
If the table contains errors, appropriate messages will be displayed.
If there are no errors the message @samp{no errors found.} will be
shown.
@node lou_allround
@section lou_allround
@pindex lou_allround
This program tests every capability of the liblouis library. It is
completely interactive. Invoke it as follows:
@example
lou_allround [OPTIONS]
@end example
The command line options that are accepted by @command{lou_allround}
are described in @ref{common options}.
You will see a few lines telling you how to use the program. Pressing
one of the letters in parentheses and then enter will take you to a
message asking for more information or for the answer to a yes/no
question. Typing the letter @samp{r} and then @key{RET} will take you
to a screen where you can enter a line to be processed by the library
and then view the results.
@node lou_translate (program)
@section lou_translate
@pindex lou_translate
This program translates whatever is on the standard input unit and
prints it on the standard output unit. It is intended for large-scale
testing of the accuracy of translation and back-translation. The
command line for @command{lou_translate} is:
@example
lou_translate [OPTIONS] TABLE[,TABLE,...]
@end example
Aside from the standard options (@pxref{common options}) this program
also accepts the following options:
@table @option
@item --forward
@itemx -f
Do a forward translation.
@item --backward
@itemx -b
Do a backward translation.
@end table
If no options are given forward translation is assumed.
Use the following command to do a forward translation with translation
table @file{en-us-g2.ctb}. The resulting braille is ASCII encoded (as
defined in @file{en-us-g2.ctb}).
@example
lou_translate --forward en-us-g2.ctb < input.txt
@end example
The next example illustrates a forward translation with translation
table @file{en-us-g2.ctb} and display table @file{unicode.dis}. The
resulting braille is encoded as Unicode dot patterns (as defined in
@file{unicode.dis}).
@example
lou_translate --forward unicode.dis,en-us-g2.ctb < input.txt
@end example
Use a pipe if you would rather just pass some given text to the
translator.
@example
echo "The quick brown fox jumps over the lazy dog" | lou_translate -f unicode.dis,en-us-g2.ctb
@end example
The result will be written to standard output:
@example
⠠⠮ ⠟⠅ ⠃⠗⠪⠝ ⠋⠕⠭ ⠚⠥⠍⠏⠎ ⠕⠧⠻ ⠮ ⠇⠁⠵⠽ ⠙⠕⠛
@end example
Backward translation can be done as follows:
@example
echo ",! qk br@{n fox jumps ov@} ! lazy dog" | lou_translate --backward en-us-g2.ctb
@end example
which results in
@example
The quick brown fox jumps over the lazy dog
@end example
You can also do a backward translation using Unicode dot patterns
@example
echo "⠠⠮ ⠟⠅ ⠃⠗⠪⠝ ⠋⠕⠭" | lou_translate --backward unicode.dis,en-us-g2.ctb
@end example
resulting in
@example
The quick brown fox
@end example
@node lou_checkhyphens
@section lou_checkhyphens
@pindex lou_checkhyphens
This program checks the accuracy of hyphenation in Braille translation
for both translated and untranslated words. It is completely
interactive. Invoke it as follows:
@example
lou_checkhyphens [OPTIONS]
@end example
The command line options that are accepted by
@command{lou_checkhyphens} are described in @ref{common options}.
You will see a few lines telling you how to use the program.
@node lou_checkyaml
@section lou_checkyaml
@pindex lou_checkyaml
This program tests a liblouis table against a corpus of known good
Braille translations defined in YAML format. For a description of the
format refer to @ref{YAML Tests}. The program returns 0 if all tests
pass or 1 if any of the tests fail. If @code{libyaml} is not installed
the program will simply skip all tests. Invoke it as follows:
@example
lou_checkyaml YAML_TEST_FILE
@end example
The command line options that are accepted by
@command{lou_checkyaml} are described in @ref{common options}.
@cindex Running YAML tests manually
@cindex Running individual YAML tests
Due to some technical limitations the YAML tests work best if the
@env{LOUIS_TABLEPATH} is set up correctly. Running @command{make}
takes care of all this for you. You can also run individual YAML tests
as shown in the following example:
@example
cd tests
make check TESTS=yaml/en-ueb-g2_backward.yaml
@end example
@node Automated Testing of Translation Tables
@chapter Automated Testing of Translation Tables
There are a number of automated tests for liblouis and they are
proving to be of tremendous value. When changing the code the
developers can run the tests to see if anything broke.
The easiest way to test the translation tables is to write a YAML file
where you define the table that is to be tested and any number of
words or phrases to translate together with their respective expected
translation.
The YAML tests are data driven, i.e.@: you give the test data, a string
to translate and the expected output. The data is in a standard format
namely YAML. If you have @file{libyaml} installed they will
automatically be invoked as part of the standard @command{make check}
command.
@anchor{YAML Tests}
@section YAML Tests
@url{http://yaml.org/,YAML} is a human readable data serialization
format that allows for an easy and compact way to define tests.
A YAML file first defines which tables are to be used for the tests.
Then it optionally defines flags such as the @samp{testmode}. Finally
all the tests are defined.
You can repeat the cycle as many times as you like (tables, optional
flags, tests). You can also define several rounds of tests for any
table, with or without the optional flags. Just remember that the
flags are reset to their default values each time you start a new
round of tests or load a new set of tables.
Let's just look at a simple example how tests could be defined:
@iftex
@emph{(For technical reasons the Unicode braille in the expected
translation in the following YAML examples is not displayed correctly.
Please refer to the example YAML file @file{example_test.yaml} in the
@file{tests} directory of the source distribution or read these
examples in another version of the documentation such as HTML)}
@end iftex
@example
# comments start with '#' anywhere on a line
# first define which tables will be used for your tests
table: [unicode.dis, en-ueb-g1.ctb]
# then optionally define flags such as testmode. If no flags are
# defined forward translation is assumed
# now define the tests
tests:
  - # each test is a list.
    # The first item is the string to translate. Quoting of strings is
    # optional
    - hello
    # The second item is the expected translation
    - ⠓⠑⠇⠇⠕
  - # optionally you can define additional parameters in a third
    # item such as typeform or expected failure, etc
    - Hello
    - ⠨⠶⠠⠓⠑⠇⠇⠕⠨⠄
    - @{typeform: @{italic: '++++ '@}, xfail: true@}
  - # a simple, no-frills test
    - Good bye
    - ⠠⠛⠕⠕⠙ ⠃⠽⠑
  # same as above using "flow style" notation
  - [Good bye, ⠠⠛⠕⠕⠙ ⠃⠽⠑]
@end example
The four basic components of a test file are as follows:
@table @samp
@item table
A list containing table files, which the tests should be run against.
This is usually just one file, but for some situations more than one
file can be required. For example:
@example
table: [hu-hu-g1.ctb, hyph_hu_HU.dic]
@end example
It is also possible to specify a table inline. @ref{Inline definition
of tables} below explains how to do this.
A third way to specify a table is by its metadata. A table query,
which is essentially a list of ``features'', is matched against the
@ref{Table Metadata,table metadata} defined inside the tables
contained in @env{LOUIS_TABLEPATH}. Only the best match is used for
the test.
The syntax of the query is a variation of the @ref{Query
Syntax,syntax} used for the @ref{lou_findTable,@code{lou_findTable}}
function:
@example
table:
  locale: fr
  grade: 1
@end example
@item display
A display table, which should be used to encode braille in the
test. This item is optional. If it is present it should be the first
item of the file. If it is not present, the braille encoding of each
test is determined by the table that is being tested.
The next example shows how to test the @file{en-ueb-g1.ctb} table
using ASCII notation (as defined in @file{en-ueb-g1.ctb} itself):
@example
table: [en-ueb-g1.ctb]
@end example
If you wanted to test the @file{en-ueb-g1.ctb} table using Unicode dot
patterns then you would use the following definition:
@example
display: unicode.dis
table: [en-ueb-g1.ctb]
@end example
@item flags
The flags that apply for all tests in this file. At the moment only
the @samp{testmode} flag is supported. It can have five possible
values:
@table @samp
@item forward
This indicates that the tests are for forward translation
@item backward
This indicates that the tests are for backward translation
@item bothDirections
This indicates that the tests are for both forward and backward translation.
@item hyphenate
This indicates that the tests are for hyphenation
@item hyphenateBraille
This indicates that the tests are for hyphenation and the input is braille
@end table
If no flags are defined forward translation is assumed.
@item tests
A list of tests. Each test consists of a list of two, three or in some
cases even four items. The first item is the Unicode text to be
tested. The second item is the expected braille output. This can be
either Unicode braille or an ASCII-braille like encoding. Quoting
strings is optional. Comments can be inserted almost anywhere using
the @samp{#} sign. A simple test would look as follows:
@example
  - # a simple, no-frills test
    - Good bye
    - ⠠⠛⠕⠕⠙ ⠃⠽⠑
@end example
Using the more compact ``flow style'' notation it would look like the
following:
@example
- [Good bye, ⠠⠛⠕⠕⠙ ⠃⠽⠑]
@end example
An optional third item can contain additional options for a test such
as the typeform, or whether a test is expected to fail. The following
shows a typical example:
@example
  -
    - Hello
    - ⠨⠶⠠⠓⠑⠇⠇⠕⠨⠄
    - @{typeform: @{italic: '++++ '@}, xfail: true@}
  # same test more compact
  - [Hello, ⠨⠶⠠⠓⠑⠇⠇⠕⠨⠄, @{typeform: @{italic: '++++ '@}, xfail: true@}]
@end example
The valid additional options for a test are as follows:
@table @samp
@item xfail
Whether a test is expected to fail. If you expect a test to fail, set
this to @samp{true}. If you prefer you can also specify a reason for
the failure:
@example
- [Hello, ⠨, @{xfail: Test case is not complete@}]
@end example
If you expect a test case to pass then just don't mark it with
@samp{xfail} or if you really have to, set @samp{xfail} to
@samp{false} or @samp{off}.
@item typeform
The typeform used for a translation. It consists of one or more
emphasis specifications. For each character in the specifications that
is not a space the corresponding emphasis will be set. Valid options
for emphasis are @samp{italic}, @samp{underline}, @samp{bold},
@samp{computer_braille}, @samp{passage_break}, @samp{word_reset},
@samp{script}, @samp{trans_note}, @samp{trans_note_1},
@samp{trans_note_2}, @samp{trans_note_3}, @samp{trans_note_4} or
@samp{trans_note_5}. The following shows an example where both
@samp{italic} and @samp{underline} are specified:
@example
  -
    - Hello
    - ⠨⠶⠠⠓⠑⠇⠇⠕⠨⠄
    - typeform:
        italic: '++++ '
        underline: '    +'
@end example
@item inputPos
A list of 0-based input positions, one for each output position.
Useful when simulating screen reader interaction, to debug contraction
and cursor behavior as in the following example. Note that all
positions in this and the following examples start at 0. Also note
that in these examples the additional options are not passed using the
``flow style'' notation.
@example
  -
    - went
    - ⠺⠢⠞
    - inputPos: [0,1,3]
@end example
@item outputPos
A list of 0-based output positions, one for each input position. Useful when
simulating screen reader interaction, to debug contraction and cursor
behavior as in the following example.
@example
  -
    - went
    - ⠺⠢⠞
    - outputPos: [0,1,1,2]
@end example
@item cursorPos
The cursor position for the given translation and optionally an
expected cursor position where the cursor is supposed to be after the
translation. Useful when simulating screen reader interaction, to
debug contraction and cursor behavior.
The cursor position can take two forms: You can either specify a
single number or alternatively you can give a tuple of two numbers.
@table @asis
@item single number (e.g. @samp{4})
When you simply want to specify the cursor position for the given
translation you pass a number as in the following example:
@example
  -
    - you went to
    - ⠽ ⠺⠑⠝⠞ ⠞⠕
    - mode: [compbrlAtCursor]
      cursorPos: 4
@end example
@item a tuple (e.g. @samp{[4,2]})
When you expect the cursor to be in a particular position after the
translation and you want to check this then pass a tuple of cursor
positions as in the following example:
@example
  -
    - you went to
    - ⠽ ⠺⠑⠝⠞ ⠞⠕
    - mode: [compbrlAtCursor]
      cursorPos: [4,2]
@end example
@end table
@item mode
A list of translation modes that should be used for this test. If not
defined, the mode defaults to 0. Valid mode values are @samp{noContractions},
@samp{compbrlAtCursor}, @samp{dotsIO}, @samp{compbrlLeftCursor},
@samp{ucBrl}, @samp{noUndefined} or @samp{partialTrans}.
For a description of the various translation mode flags, please see
the function @ref{lou_translateString}.
@item maxOutputLength
Define a maximum length of the output. This can be used to test the
behavior of liblouis in the face of a limited output buffer, for
example the length of the refreshable braille display.
@end table
@end table
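As a rough illustration of how the @samp{inputPos} and @samp{outputPos}
lists relate, the following C sketch derives an @samp{outputPos}-style
list from an @samp{inputPos}-style list. The helper
@code{demo_output_from_input} is hypothetical and not part of liblouis.
It assumes that the input positions are non-decreasing and start at 0,
and it simply assigns each input position the last output cell that
starts at or before it. For the @samp{went} example this turns
@samp{[0,1,3]} into @samp{[0,1,1,2]}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of liblouis: derive an outputPos-style
 * list (one entry per input position) from an inputPos-style list (one
 * entry per output position). Assumes inputPos is non-decreasing and
 * inputPos[0] == 0. */
static void demo_output_from_input(const int *inputPos, size_t outLen,
                                   int *outputPos, size_t inLen)
{
    assert(inputPos != NULL && outputPos != NULL && outLen > 0);
    size_t cell = 0;
    for (size_t i = 0; i < inLen; i++) {
        /* Advance while the next output cell starts at or before input i. */
        while (cell + 1 < outLen && inputPos[cell + 1] <= (int)i)
            cell++;
        outputPos[i] = (int)cell;
    }
}
```

The sketch only mirrors the relationship visible in the examples; the
lists that liblouis itself reports remain the authoritative values.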
@subsection Optional test description
When a test contains three or four items the first item is assumed to
be a test description, the second item is the Unicode text to be
tested and the third item is the expected braille output. Again an
optional fourth item can contain additional options for the test. The
following shows an example:
@example
-
- Number-text-transitions with italic
- 123abc
- ⠼⠁⠃⠉⠨⠶⠰⠁⠃⠉⠨⠄
- @{typeform: '000111'@}
@end example
In case the test fails the description will be printed together with
the expected and the actual braille output.
For more examples and inspiration please see the YAML tests
(@file{*.yaml}) in the @file{tests} directory of the source
distribution.
@subsection Testing multiple tables within the same YAML test file
Sometimes you are more focused on testing a particular feature across
several tables rather than just testing one table. For that reason the
following is also allowed:
@example
table: ...
tests:
- [..., ...]
- [..., ...]
table: ...
tests:
- [..., ...]
- [..., ...]
@end example
If you specify flags for the tests, remember that the flags are reset
to their default values when you specify a new table.
@subsection Multiple test sections for each table
You can specify several sections of tests for each table, with or
without the optional flags. This is useful e.g. if you want to have
various tests for both forward and backward translation for the same
set of tables, especially if you are defining the table as part of
the YAML file (see next section). This feature is also useful if you
simply want to divide your tests into multiple sections for better
overview. All flags are reset to their default values when you start
a new test section.
Thus, a YAML file might look as follows:
@example
table: ...
tests:
- [..., ...]
- [..., ...]
# Some more tests
tests:
- [..., ...]
- [..., ...]
# Some tests for back-translation - same table
flags: @{testmode: backward@}
tests:
- [..., ...]
- [..., ...]
@end example
@anchor{Inline definition of tables}
@subsection Inline definition of tables
When testing very specific opcode combinations it is sometimes tedious
to create specific test tables just for that. Hence the YAML tests
allow for specification of table definitions inline. Instead of
referring to a table by name you just define the table inline by using
what the YAML spec calls a
@url{http://www.yaml.org/spec/1.2/spec.html#id2795688,Literal Style
Block}. Start the definition with a @samp{|}, then list the opcodes
with an indentation. The inline table ends when the indentation ends.
@example
table: |
  sign a 1
  ...
tests:
  - ...
  - ...
@end example
@subsection Running the same test data on multiple tables
Sometimes you maintain multiple tables which are very similar and
basically contain the same test data. Instead of copying the YAML test
and changing the table name you can also define multiple tables. This
will cause the YAML tests to be checked against each of the listed tables.
@example
table: nl-NL
table: nl-BE
tests:
- [..., ...]
- [..., ...]
@end example
@node Programming with liblouis
@chapter Programming with liblouis
@menu
* Overview (library)::
* Data structure of liblouis tables::
* How tables are found::
* Deprecation of the logging system::
* lou_version::
* lou_translateString::
* lou_translate::
* lou_backTranslateString::
* lou_backTranslate::
* lou_hyphenate::
* lou_compileString::
* lou_getTypeformForEmphClass::
* lou_dotsToChar::
* lou_charToDots::
* lou_registerLogCallback::
* lou_setLogLevel::
* lou_logFile::
* lou_logPrint::
* lou_logEnd::
* lou_setDataPath::
* lou_getDataPath::
* lou_getTable::
* lou_findTable::
* lou_indexTables::
* lou_checkTable::
* lou_readCharFromFile::
* lou_free::
* lou_charSize::
* Python bindings::
@end menu
@node Overview (library)
@section Overview
You use the liblouis library by calling the following functions,
@code{lou_translateString}, @code{lou_backTranslateString},
@code{lou_translate}, @code{lou_backTranslate},
@code{lou_registerLogCallback}, @code{lou_setLogLevel},
@code{lou_logFile}, @code{lou_logPrint}, @code{lou_logEnd},
@code{lou_getTable}, @code{lou_findTable}, @code{lou_indexTables},
@code{lou_checkTable}, @code{lou_hyphenate}, @code{lou_charToDots},
@code{lou_dotsToChar}, @code{lou_compileString},
@code{lou_getTypeformForEmphClass}, @code{lou_readCharFromFile},
@code{lou_version}, @code{lou_free} and @code{lou_charSize}. These are
described below. The header file, @file{liblouis.h}, also contains
brief descriptions. Liblouis is written in straight C. It has four
code modules, @file{compileTranslationTable.c}, @file{logging.c},
@file{lou_translateString.c} and @file{lou_backTranslateString.c}. In
addition, there are two header files, @file{liblouis.h}, which defines
the API, and @file{louis.h}, used only internally and by
liblouisutdml. The latter includes @file{liblouis.h}.
Persons who wish to use liblouis from Python may want to skip ahead to
@ref{Python bindings}.
@file{compileTranslationTable.c} keeps track of all translation tables
which an application has used. It is called by the translation,
hyphenation and checking functions when they start. If a table has not
yet been compiled @file{compileTranslationTable.c} checks it for
correctness and compiles it into an efficient internal representation.
The main entry point is @code{lou_getTable}. Since it is the module
that keeps track of memory usage, it also contains the @code{lou_free}
function. In addition, it contains the @code{lou_checkTable} function,
plus some utility functions which are used by the other modules.
By default, liblouis handles all characters internally as 16-bit
unsigned integers. It can be compiled for 32-bit characters as
explained below. The meanings of these integers are not hard-coded.
Rather they are defined by the character-definition opcodes. However,
the standard printable characters, from decimal 32 to 126 are
recognized for the purpose of processing the opcodes. Hence, the
following definition is included in @file{liblouis.h}. It is correct
for computers with at least 32-bit processors.
@example
typedef unsigned short int widechar;
@end example
To make liblouis handle 32-bit Unicode simply remove the word
@code{short} in the above @code{typedef}. This will cause the translate and
back-translate functions to expect input in 32-bit form and to deliver
their output in this form. The input to the compiler (tables) is
unaffected except that two new escape sequences for 20-bit and 32-bit
characters are recognized.
At runtime, the width of a character specified during compilation may
be obtained using @code{lou_charSize}.
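The effect of the @code{typedef} on buffer sizes can be sketched as
follows. The two helpers are hypothetical and only look at a local copy
of the @code{typedef}; a real program would include @file{liblouis.h}
and could consult @code{lou_charSize} to learn which width the library
was actually built with.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Local copy of the typedef for the default 16-bit build. */
typedef unsigned short int widechar;

/* Hypothetical helper: number of bits in one widechar on this platform. */
static size_t demo_widechar_bits(void)
{
    return sizeof(widechar) * CHAR_BIT;
}

/* Hypothetical helper: bytes needed for a buffer of n widechars. */
static size_t demo_buffer_bytes(size_t n)
{
    assert(n <= (size_t)-1 / sizeof(widechar)); /* guard against overflow */
    return n * sizeof(widechar);
}
```

Sizing buffers with @code{sizeof(widechar)} rather than a fixed 2 keeps
calling code correct for both the 16-bit and the 32-bit build.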
Here are the definitions of the liblouis functions and their
parameters. They are given in terms of 16-bit Unicode. If liblouis has
been compiled for 32-bit Unicode simply read 32 instead of 16.
@node Data structure of liblouis tables
@section Data structure of liblouis tables
The data structure @code{TranslationTableHeader} is defined by a
@code{typedef} statement in @file{louis.h}. To find the beginning,
search for the word @samp{header}. As its name implies, this is
actually the table header. Data are placed in the @code{ruleArea}
array, which is the last item defined in this structure. This array is
declared with a length of 1 and is expanded as needed. The table
header consists mostly of arrays of pointers of size @code{HASHNUM}.
These pointers are actually offsets into @code{ruleArea} and point to
chains of items which have been placed in the same hash bucket by a
simple hashing algorithm. @code{HASHNUM} should be a prime and is
currently 1123. The structure of the table was chosen to optimize
speed rather than memory usage.
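The idea of hash buckets that hold offsets into one growing data area
can be sketched as follows. This is a heavily simplified illustration
and not the actual @code{TranslationTableHeader}: all @code{Demo} names
are hypothetical, the data area holds only one kind of record, and a C99
flexible array member is used where the real structure declares an array
of length 1. Offset 0 is reserved to mark the end of a chain.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_HASHNUM 1123 /* prime bucket count, matching the value in the text */

typedef unsigned int DemoOffset; /* index into ruleArea; 0 ends a chain */

typedef struct {
    DemoOffset next;  /* next rule in the same hash bucket */
    unsigned int key; /* stand-in for the characters a rule matches */
} DemoRule;

typedef struct {
    size_t capacity;                  /* slots allocated in ruleArea */
    size_t used;                      /* slots in use; slot 0 is reserved */
    DemoOffset buckets[DEMO_HASHNUM]; /* offsets, not real pointers */
    DemoRule ruleArea[];              /* data area at the end of the struct */
} DemoTable;

static DemoTable *demo_table_new(size_t capacity)
{
    assert(capacity >= 1);
    DemoTable *t = calloc(1, sizeof *t + capacity * sizeof(DemoRule));
    if (t != NULL) {
        t->capacity = capacity;
        t->used = 1; /* slot 0 is never a valid rule */
    }
    return t;
}

/* Prepend a rule to its bucket chain. Returns 1 on success, 0 if full. */
static int demo_table_add(DemoTable *t, unsigned int key)
{
    if (t->used >= t->capacity) return 0;
    DemoOffset slot = (DemoOffset)t->used++;
    DemoOffset *bucket = &t->buckets[key % DEMO_HASHNUM];
    t->ruleArea[slot].key = key;
    t->ruleArea[slot].next = *bucket;
    *bucket = slot;
    return 1;
}

/* Walk one bucket chain. Returns 1 if key is present, 0 otherwise. */
static int demo_table_has(const DemoTable *t, unsigned int key)
{
    DemoOffset o = t->buckets[key % DEMO_HASHNUM];
    while (o != 0) {
        if (t->ruleArea[o].key == key) return 1;
        o = t->ruleArea[o].next;
    }
    return 0;
}
```

One plausible reason for storing offsets instead of pointers is that
they stay valid when the data area is reallocated to a new address.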
The first part of the table contains miscellaneous information, such
as the number of passes and whether various opcodes have been used. It
also contains the amount of memory allocated to the table and the
amount actually used.
The next section contains pointers to various braille indicators and
begins with @code{capitalSign}. The rules pointed to contain the
dot pattern for the indicator and an opcode which is used by the
back-translator but does not appear in the list of opcodes. The
braille indicators also include various kinds of emphasis, such as
italic and bold and information about the length of emphasized
phrases. The latter is contained directly in the table item instead of
in a rule.
After the braille indicators comes information about when a letter
sign should be used.
Next is an array of size @code{HASHNUM} which points to character
definitions. These are created by the character-definition opcodes.
Following this is a similar array pointing to definitions of
single-cell dot patterns. This is also created from the
character-definition opcodes. If a character definition contains a
multi-cell dot pattern this is compiled into ordinary forward and
backward rules. If such a multi-cell dot pattern contains a single
cell which has not previously been defined that cell is placed in this
array, but is given the attribute @code{space}.
Next come arrays that map characters to single-cell dot patterns and
dots to characters. These are created from both character-definition
opcodes and display opcodes.
Next is an array of size 256 which maps characters in this range to
dot patterns which may consist of multiple cells. It is used, for
example, to map @samp{@{} to dots 456-246. These mappings are created
@c FIXME: the compdots opcode should be documented
@c by the @opcoderef{compdots}
by the @code{compdots}
or the @opcoderef{comp6}.
Next are two small arrays that hold pointers to chains of rules
produced by the @opcoderef{swapcd} and the @opcoderef{swapdd} and by
some multipass, @code{context} and @code{correct} opcodes.
Now we get to an array of size @code{HASHNUM} which points to chains
of rules for forward translation.
Following this is a similar array for back-translation.
Finally comes the @code{ruleArea}, an array of variable size to which
various structures are mapped and to which almost everything else
points.
@node How tables are found
@section How tables are found
@cindex Table search path
@cindex LOUIS_TABLEPATH
liblouis knows where to find all the tables that have been distributed
with it. So you can just give a table name such as @code{en-us-g2.ctb}
and liblouis will load it. You can also give a table name which
includes a path. If this is the first table in a list, all the tables
in the list must be on the same path. You can specify a path on which
liblouis will look for table names by setting the environment variable
@env{LOUIS_TABLEPATH}. This environment variable can contain one or
more paths separated by commas. On receiving a table name liblouis
first checks to see if it can be found on any of these paths. If not,
it checks whether the table can be found in the current directory or,
if the first (or only) name in the table list contains a path name,
on that path. If not, it checks to see if it can be found on the path
where the distributed tables have been installed. If a table has
already been loaded and compiled this path-checking is skipped.
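How a comma-separated search path yields candidate file names can be
sketched as follows. The helper @code{demo_candidate} is hypothetical
and not the code liblouis uses; it assumes @samp{/} as the directory
separator and only builds the name for one entry of the list, leaving
the check for the file's existence to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<dir>/<tableName>" for the index-th
 * comma-separated directory in searchPath into out. Returns 1 on
 * success, 0 if there is no such entry or out is too small. */
static int demo_candidate(const char *searchPath, int index,
                          const char *tableName, char *out, size_t outSize)
{
    assert(searchPath != NULL && tableName != NULL && out != NULL);
    const char *start = searchPath;
    for (;;) {
        const char *end = strchr(start, ',');
        size_t len = end ? (size_t)(end - start) : strlen(start);
        if (index == 0) {
            int n = snprintf(out, outSize, "%.*s/%s", (int)len, start,
                             tableName);
            return n > 0 && (size_t)n < outSize;
        }
        if (end == NULL) return 0; /* ran out of entries */
        start = end + 1;
        index--;
    }
}
```

A caller would try index 0, 1, 2 and so on until the function returns 0,
testing each candidate in turn before falling back to the other
locations described above.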
@node Deprecation of the logging system
@section Deprecation of the logging system
As of version 2.6.0 @code{lou_logFile}, @code{lou_logPrint} and
@code{lou_logEnd} are deprecated. They are replaced by a more powerful,
abstract API consisting of @code{lou_registerLogCallback} and
@code{lou_setLogLevel}.
Usage of @code{lou_logFile}, @code{lou_logPrint} and @code{lou_logEnd} is
discouraged as they may not be part of future releases. Applications using
Liblouis should implement their own logging system.
During the transitional phase, @code{lou_logPrint} is registered as default
callback in @code{lou_registerLogCallback}. @code{lou_logPrint} is overwritten
by the first call to @code{lou_registerLogCallback} and reattached when
@code{NULL} is set as callback. Note that calling @code{lou_logPrint} directly
will not cause an invocation of the registered callback.
@node lou_version
@section lou_version
@findex lou_version
@example
char *lou_version ()
@end example
This function returns a pointer to a character string containing the
version of liblouis, plus other information, such as the release date
and perhaps notable changes.
@node lou_translateString
@section lou_translateString
@findex lou_translateString
@example
int lou_translateString(
const char *tableList,
const widechar *inbuf,
int *inlen,
widechar *outbuf,
int *outlen,
formtype *typeform,
char *spacing,
int mode);
@end example
This function takes a string of 16-bit Unicode characters in
@code{inbuf} and translates it into a string of 16-bit characters in
@code{outbuf}. Each 16-bit character produces a particular dot pattern
in one braille cell when sent to an embosser or braille display or to
a screen type font. Which 16-bit character represents which dot pattern
is indicated by the character-definition and display opcodes in the
translation table.
@anchor{translation-tables}
The @code{tableList} parameter points to a list of translation tables
separated by commas. @xref{How tables are found}, for a description on
how the tables are located in the file system. If only one table is
given, no comma should be used after it. It is these tables which
control just how the translation is made, whether in Grade 2, Grade 1,
or something else.
The tables in a list are all compiled into the same internal table.
The list is then regarded as the name of this table. As explained in
@ref{How to Write Translation Tables}, each table is a file which may
be plain text, big-endian Unicode or little-endian Unicode. A table
(or list of tables) is compiled into an internal representation the
first time it is used. Liblouis keeps track of which tables have been
compiled. For this reason, it is essential to call the @code{lou_free}
function at the end of your application to avoid memory leaks. Do
@emph{NOT} call @code{lou_free} after each translation. This will
force liblouis to compile the translation tables each time they are
used, leading to great inefficiency.
Note that both the @code{inlen} and @code{outlen} parameters are
pointers to integers. When the function is called, these integers
contain the maximum input and output lengths, respectively. When it
returns, they are set to the actual lengths used.
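The length convention can be illustrated with a stand-in function that
merely copies characters. @code{demo_copy} is hypothetical and performs
no translation; it only follows the same pattern of reading the
available lengths on entry and writing back the lengths actually used.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short int widechar;

/* Hypothetical stand-in, not a liblouis function. On entry *inlen and
 * *outlen hold the available lengths; on return they hold the lengths
 * actually consumed and produced. Returns 1 on success, 0 on error. */
static int demo_copy(const widechar *inbuf, int *inlen,
                     widechar *outbuf, int *outlen)
{
    assert(inbuf != NULL && inlen != NULL);
    assert(outbuf != NULL && outlen != NULL);
    if (*inlen < 0 || *outlen < 0) return 0;
    int n = *inlen < *outlen ? *inlen : *outlen;
    for (int i = 0; i < n; i++)
        outbuf[i] = inbuf[i];
    *inlen = n;
    *outlen = n;
    return 1;
}
```

Because both integers are overwritten, a caller that reuses the same
variables has to reset them to the buffer sizes before every call.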
The @code{typeform} parameter is used to indicate italic type,
boldface type, computer braille, etc. It is an array of @code{formtype}
with the same length as the input buffer pointed to by @code{inbuf}.
However, it is used to pass back character-by-character results, so
enough space must be provided to match the @code{*outlen} parameter.
Each element indicates the typeform of the corresponding character
in the input buffer. The values and their meaning can be consulted in the
@code{typeforms} enum in @file{liblouis.h}. These values can be
added for multiple emphasis. If this parameter is @code{NULL}, no
checking for type forms is done. In addition, if this parameter is not
@code{NULL}, it is set on return to have an 8 at every position
corresponding to a character in @code{outbuf} which was defined to
have a dot representation containing dot 7, dot 8 or both, and to 0
otherwise.
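Filling a typeform array for multiple emphasis can be sketched as
follows. The type @code{demo_formtype} and the two bit values are
illustrative stand-ins; real code should use @code{formtype} and the
@code{typeforms} enum from @file{liblouis.h}. The helper marks every
position where a mask string has a non-space character, in the spirit of
masks such as @samp{++++ } used in the YAML tests.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short demo_formtype; /* stand-in for formtype */

/* Illustrative bit values only; use the typeforms enum in real code. */
enum { DEMO_ITALIC = 1, DEMO_UNDERLINE = 2 };

/* Hypothetical helper: set bit in typeform[i] for every position where
 * mask[i] is not a space. The typeform array must be at least as long
 * as the mask. Returns the number of positions marked. */
static size_t demo_apply_mask(demo_formtype *typeform, const char *mask,
                              demo_formtype bit)
{
    assert(typeform != NULL && mask != NULL);
    size_t marked = 0;
    for (size_t i = 0; mask[i] != '\0'; i++) {
        if (mask[i] != ' ') {
            typeform[i] |= bit;
            marked++;
        }
    }
    return marked;
}
```

Using a bitwise OR rather than plain addition means that applying the
same emphasis twice to one position does not corrupt the value.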
The @code{spacing} parameter is used to indicate differences in
spacing between the input string and the translated output string. It
is also of the same length as the string pointed to by @code{inbuf}.
If this parameter is @code{NULL}, no spacing information is computed.
The @code{mode} parameter specifies how the translation should be
done. The valid values of mode are defined in @file{liblouis.h}. They
are all powers of 2, so that a combined mode can be specified by
adding up different values.
Note that the @code{mode} parameter is an integer, not a pointer to
an integer.
A combination of the following mode flags can be used with the
@code{lou_translateString} function:
@table @code
@item compbrlAtCursor
If this bit is set in the @code{mode} parameter the space-bounded
characters containing the cursor will be translated in computer
braille.
@item compbrlLeftCursor
If this bit is set, only the characters to the left of the cursor will
be in computer braille. This bit overrides @code{compbrlAtCursor}.
@item dotsIO
When this bit is set, during forward translation, Liblouis will produce
output as dot patterns. During back-translation Liblouis accepts input
as dot patterns. Note that the produced dot patterns are affected if
you have any @opcoderef{display} defined in any of your tables.
@item ucBrl
The @code{ucBrl} (Unicode Braille) bit is used by the functions
@code{lou_charToDots} and @code{lou_translate}. It causes the dot
patterns to be Unicode Braille rather than the liblouis representation.
Note that you will not notice any change when setting @code{ucBrl}
unless @code{dotsIO} is also set. @code{lou_dotsToChar} and
@code{lou_backTranslate} recognize Unicode braille automatically.
@item partialTrans
This flag specifies that back-translation input should be treated as an
incomplete word. Rules that apply only for complete words or at the end
of a word will not take effect. This is intended to be used when
translating input typed on a braille keyboard to provide a rough idea
to the user of the characters they are typing before the word is
complete.
@item noUndefined
Setting this bit disables the output of hexadecimal values when
forward-translating undefined characters (characters that are not
matched by any rule), and dot numbers when back-translating undefined
braille patterns (braille patterns that are not matched by any
rule). The default is for liblouis to output the hexadecimal value (as
'\xhhhh') of an undefined character when forward-translating and the
dot numbers (as \ddd/) of an undefined braille pattern when
back-translating.
When back translating input from a braille keyboard cell by cell, it
is desirable to output characters as soon as they are
produced. Similarly, when back translating contracted braille, it is
desirable to provide a "guess" to the user of the characters they
typed. To achieve this, liblouis needs to have the ability to produce
no text when indicators (which don't produce a character by
themselves) are not followed by another cell. This works automatically
for indicators liblouis knows about such as capital sign, number sign,
etc., but it does not work for indicators which are not (and cannot
be) specifically defined as indicators. For example, in UEB, dots 4 5
6 alone produces the text "\456/". Setting the noUndefined mode
suppresses this dot number output.
@end table
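Combining mode flags can be sketched as follows. The @code{DEMO_}
constants are illustrative placeholders chosen only to be distinct
powers of 2; the real values are those defined in @file{liblouis.h} and
should be used by name in actual code.

```c
#include <assert.h>

/* Illustrative values only: the real constants live in liblouis.h.
 * All that matters here is that each is a distinct power of 2. */
enum {
    DEMO_COMPBRL_AT_CURSOR = 1 << 1,
    DEMO_DOTS_IO = 1 << 2,
    DEMO_UC_BRL = 1 << 6
};

/* Returns nonzero if every bit of flag is set in mode. */
static int demo_mode_has(int mode, int flag)
{
    assert(flag != 0);
    return (mode & flag) == flag;
}

/* Builds a mode that asks for dot-pattern output as Unicode braille. */
static int demo_unicode_dots_mode(void)
{
    return DEMO_DOTS_IO | DEMO_UC_BRL;
}
```

For distinct flags a bitwise OR gives the same result as adding the
values, and it remains correct if a flag is accidentally listed twice.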
The function returns 1 if no errors were encountered@footnote{When the
output buffer is not big enough, @code{lou_translateString} returns a
partial translation that is more or less accurate up until the
returned @code{inlen}/@code{outlen}, and treats it as a successful
translation, i.e.@: also returns 1.} and 0 otherwise.
@node lou_translate
@section lou_translate
@findex lou_translate
@example
int lou_translate(
const char *tableList,
const widechar *inbuf,
int *inlen,
widechar *outbuf,
int *outlen,
formtype *typeform,
char *spacing,
int *outputPos,
int *inputPos,
int *cursorPos,
int mode);
@end example
This function adds the parameters @code{outputPos}, @code{inputPos}
and @code{cursorPos}, to facilitate use in screen reader programs. The
@code{outputPos} parameter must point to an array of integers with at
least @code{inlen} elements. On return, this array will contain the
position in @code{outbuf} corresponding to each input position.
Similarly, @code{inputPos} must point to an array of integers of at
least @code{outlen} elements. On return, this array will contain the
position in @code{inbuf} corresponding to each position in
@code{outbuf}. @code{cursorPos} must point to an integer containing
the position of the cursor in the input. On return, it will contain
the cursor position in the output. Any parameter after @code{outlen}
may be @code{NULL}. In this case, the actions corresponding to it will
not be carried out.
For a description of all other parameters, please see
@ref{lou_translateString}.
@node lou_backTranslateString
@section lou_backTranslateString
@findex lou_backTranslateString
@example
int lou_backTranslateString(
const char *tableList,
const widechar *inbuf,
int *inlen,
widechar *outbuf,
int *outlen,
formtype *typeform,
char *spacing,
int mode);
@end example
This is exactly the opposite of @code{lou_translateString}.
@code{inbuf} is a string of 16-bit Unicode characters representing
braille. @code{outbuf} will contain a string of 16-bit Unicode
characters. @code{typeform} will indicate any emphasis found in the
input string, while @code{spacing} will indicate any differences in
spacing between the input and output strings. The @code{typeform} and
@code{spacing} parameters may be @code{NULL} if this information is
not needed. @code{mode} specifies how the back-translation
should be done.
By default, if a dot pattern in the input is undefined
then its dot numbers will be included in the output (as \ddd/).
This does not occur if the @code{noUndefined} mode is set;
an undefined dot pattern simply produces no output.
The @code{partialTrans} mode specifies that the input should be
treated as an incomplete word. That is, rules that apply only for
complete words or at the end of a word will not take effect. This is
intended to be used when translating input typed on a braille keyboard
to provide a rough idea to the user of the characters they are typing
before the word is complete.
@node lou_backTranslate
@section lou_backTranslate
@findex lou_backTranslate
@example
int lou_backTranslate(
const char *tableList,
const widechar *inbuf,
int *inlen,
widechar *outbuf,
int *outlen,
formtype *typeform,
char *spacing,
int *outputPos,
int *inputPos,
int *cursorPos,
int mode);
@end example
This function is exactly the inverse of @code{lou_translate}.
@node lou_hyphenate
@section lou_hyphenate
@findex lou_hyphenate
@example
int lou_hyphenate (
const char *tableList,
const widechar *inbuf,
int inlen,
char *hyphens,
int mode);
@end example
This function looks at the characters in @code{inbuf} and if it finds
a sequence of letters attempts to hyphenate it as a word. Note that
@code{lou_hyphenate} operates on single words only, and spaces or punctuation
marks between letters are not allowed. Leading and trailing
punctuation marks are ignored. The table named by the @code{tableList}
parameter must contain a hyphenation table. If it does not, the
function does nothing. @code{inlen} is the length of the character
string in @code{inbuf}. @code{hyphens} is an array of characters and
must be of size @code{inlen} + 1 (to account for the NUL terminator).
If hyphenation is successful it will have a 1 at the beginning of each
syllable and a 0 elsewhere. If the @code{mode} parameter is 0
@code{inbuf} is assumed to contain untranslated characters. Any
nonzero value means that @code{inbuf} contains a translation. In this
case, it is back-translated, hyphenation is performed, and it is
re-translated so that the hyphens can be placed correctly. The
@code{lou_translate} and @code{lou_backTranslate} functions are used
in this process. @code{lou_hyphenate} returns 1 if hyphenation was
successful and 0 otherwise. In the latter case, the contents of the
@code{hyphens} parameter are undefined. This function was provided for
use in liblouisutdml.
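Turning the @code{hyphens} array into a visibly hyphenated word can be
sketched as follows. The helper @code{demo_insert_hyphens} is
hypothetical. It treats a position as marked when the low bit of the
entry is set, an assumption made so that the sketch works whether the
marks are stored as the numbers 0 and 1 or as the characters @samp{0}
and @samp{1}. It never places a hyphen before the first character.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short int widechar;

/* Hypothetical helper: copy word to out, inserting '-' before every
 * marked position except position 0. Returns the output length, or -1
 * if out is too small. */
static int demo_insert_hyphens(const widechar *word, int len,
                               const char *hyphens,
                               widechar *out, int outSize)
{
    assert(word != NULL && hyphens != NULL && out != NULL);
    int n = 0;
    for (int i = 0; i < len; i++) {
        if (i > 0 && (hyphens[i] & 1)) {
            if (n >= outSize) return -1;
            out[n++] = '-';
        }
        if (n >= outSize) return -1;
        out[n++] = word[i];
    }
    return n;
}
```

In the worst case the output needs room for twice the word length minus
one characters.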
@node lou_compileString
@section lou_compileString
@findex lou_compileString
@example
int lou_compileString (const char *tableList, const char *inString)
@end example
This function enables you to compile a table entry on the fly at
run-time. The new entry is added to @code{tableList} and remains in force
until @code{lou_free} is called. If @code{tableList} has not previously
been loaded it is loaded and compiled. @code{inString} contains the
table entry to be added. It may be anything valid. Error messages
will be produced if it is invalid. The function returns 1 on success and
0 on failure.
@node lou_getTypeformForEmphClass
@section lou_getTypeformForEmphClass
@findex lou_getTypeformForEmphClass
@example
int lou_getTypeformForEmphClass (const char *tableList, const char *emphClass);
@end example
This function returns the typeform bit associated with the given
emphasis class. If the emphasis class is undefined this function
returns @code{0}. If errors are found error messages are logged to the
log callback (see @code{lou_registerLogCallback}) and the return value
is @code{0}. @code{tableList} is a list of names of table files
separated by commas, as explained previously
(@pxref{translation-tables,,@code{tableList} parameter in
@code{lou_translateString}}). @code{emphClass} is the name of an
emphasis class.
@node lou_dotsToChar
@section lou_dotsToChar
@findex lou_dotsToChar
@example
int lou_dotsToChar (
const char *tableList,
const widechar *inbuf,
widechar *outbuf,
int length,
int mode)
@end example
This function takes a widechar string in @code{inbuf} consisting of dot
patterns and converts it to a widechar string in @code{outbuf}
consisting of characters according to the specifications in
@code{tableList}. @code{length} is the length of both @code{inbuf} and
@code{outbuf}. The dot patterns in @code{inbuf} can be in either
liblouis format or Unicode braille. The function returns 1 on success
and 0 on failure.
Note that the @code{mode} parameter has no effect and is deprecated.
@node lou_charToDots
@section lou_charToDots
@findex lou_charToDots
@example
int lou_charToDots (
const char *tableList,
const widechar *inbuf,
widechar *outbuf,
int length,
int mode)
@end example
This function is the inverse of @code{lou_dotsToChar}. It takes a
widechar string in @code{inbuf} consisting of characters and converts it
to a widechar string in @code{outbuf} consisting of dot patterns
according to the specifications in @code{tableList}. @code{length} is the
length of both @code{inbuf} and @code{outbuf}. The dot patterns in
@code{outbuf} are in liblouis format if the mode bit @code{ucBrl} is
not set and in Unicode format if it is set. The function returns 1 on
success and 0 on failure.
@node lou_registerLogCallback
@section lou_registerLogCallback
@findex lou_registerLogCallback
@example
typedef void (*logcallback) (
int level,
const char *message);
void lou_registerLogCallback (
logcallback callback);
@end example
This function can be used to register a custom logging callback. The
callback must take two arguments, the log level and the message string. By default
log messages are printed to stderr, or if a filename was specified
with @code{lou_logFile} then messages are logged to that
file. @code{lou_registerLogCallback} overrides the default
callback. Passing @code{NULL} resets to the default callback.
@node lou_setLogLevel
@section lou_setLogLevel
@findex lou_setLogLevel
@example
typedef enum
@{
LOU_LOG_ALL = 0,
LOU_LOG_DEBUG = 10000,
LOU_LOG_INFO = 20000,
LOU_LOG_WARN = 30000,
LOU_LOG_ERROR = 40000,
LOU_LOG_FATAL = 50000,
LOU_LOG_OFF = 60000
@} logLevels;
void lou_setLogLevel (
logLevels level);
@end example
This function can be used to influence the amount of logging, from
fatal error messages only to detailed debugging messages. Supported
values are @code{LOU_LOG_DEBUG}, @code{LOU_LOG_INFO},
@code{LOU_LOG_WARN}, @code{LOU_LOG_ERROR}, @code{LOU_LOG_FATAL} and
@code{LOU_LOG_OFF}. Enabling logging at a given level also enables
logging at all higher levels. Setting the level to @code{LOU_LOG_OFF}
disables logging. The default level is @code{LOU_LOG_INFO}.
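A callback with the required two-argument shape can be sketched as
follows. The names @code{demo_log_callback} and
@code{demo_problem_count} are hypothetical, and the two level constants
are local copies of the corresponding @code{logLevels} values so that
the sketch compiles without @file{liblouis.h}. The callback merely
counts messages at warning level or above.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of two values from the logLevels enum. */
enum { DEMO_LOG_WARN = 30000, DEMO_LOG_ERROR = 40000 };

static int demo_problem_count = 0; /* messages at WARN level or above */

/* Matches the logcallback shape: a level and a message string. */
static void demo_log_callback(int level, const char *message)
{
    assert(level >= 0);
    if (message == NULL) return; /* nothing to record */
    if (level >= DEMO_LOG_WARN)
        demo_problem_count++;
}
```

In a program linked against liblouis such a function would be passed to
@code{lou_registerLogCallback}, with @code{lou_setLogLevel} deciding
which messages reach it in the first place.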
@node lou_logFile
@section lou_logFile (deprecated)
@findex lou_logFile
@example
void lou_logFile (
char *fileName);
@end example
This function is used when it is not convenient either to let messages
be printed on stderr or to use redirection, as when liblouis is used
in a GUI application or in liblouisutdml. Any error messages generated
will be printed to the file given in this call. The entire path name of
the file must be given.
This function is deprecated. See @ref{Deprecation of the logging system}.
@node lou_logPrint
@section lou_logPrint (deprecated)
@findex lou_logPrint
@example
void lou_logPrint (
char *format,
...);
@end example
This function is called like @code{printf}. It can be used by other
libraries to print messages to the file specified by the call to
@code{lou_logFile}. In particular, it is used by the companion
library liblouisutdml.
This function is deprecated. See @ref{Deprecation of the logging system}.
@node lou_logEnd
@section lou_logEnd (deprecated)
@findex lou_logEnd
@example
void lou_logEnd ();
@end example
This function is used at the end of processing a document to close the
log file, so that it can be read by the rest of the program.
This function is deprecated. See @ref{Deprecation of the logging system}.
@node lou_setDataPath
@section lou_setDataPath
@findex lou_setDataPath
@example
char *lou_setDataPath (
char *path);
@end example
This function is used to tell liblouis and liblouisutdml where tables
and files are located. It thus makes them completely relocatable, even
on Linux. The @code{path} is the directory where the subdirectories
@code{liblouis/tables} and @code{liblouisutdml/lbu_files} are rooted
or located. The function returns a pointer to the @code{path}.
@node lou_getDataPath
@section lou_getDataPath
@findex lou_getDataPath
@example
char *lou_getDataPath ();
@end example
This function returns a pointer to the path set by
@code{lou_setDataPath}. If no path has been set it returns
@code{NULL}.
@node lou_getTable
@section lou_getTable
@findex lou_getTable
@example
void *lou_getTable (
char *tableList);
@end example
@code{tableList} is a list of names of table files separated by
commas, as explained previously
(@pxref{translation-tables,,@code{tableList} parameter in
@code{lou_translateString}}). If no errors are found this function
returns a pointer to the compiled table. If errors are found error
messages are logged to the log callback (see
@code{lou_registerLogCallback}). Errors result in a @code{NULL}
pointer being returned.
@node lou_findTable
@section lou_findTable
@findex lou_findTable
@example
char *lou_findTable (const char *query);
@end example
This function can be used to find a table based on
metadata. @code{query} is a string in the special @ref{Query
Syntax,query syntax}. It is matched against @ref{Table Metadata,table
metadata} inside the tables that were previously indexed with
@ref{lou_indexTables,@code{lou_indexTables}}. Returns the file name of
the best match. Returns @code{NULL} if the query is invalid or if no
match can be found.
The match algorithm works as follows:
@itemize @bullet
@item
For every table a match quotient with the query is computed. The table
with the highest (positive) match quotient wins. If no table has a
positive quotient, there is no match.
@item
A query is a list of features. Features defined first have a higher
importance (have a higher impact on the final quotient) than features
defined later.
@item
A feature that matches a metadata field in the table (keys equal and
values equal, or both values absent) adds to the quotient.
@item
A feature that is undefined in the table (no field with that key)
creates a medium penalty.
@item
A feature that is defined in the table but does not match (keys equal
but values not equal) creates the highest penalty.
@item
Every field in the table that has no corresponding feature in the
query creates a very small penalty.
@end itemize
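The scoring rules above can be sketched in C as follows. This is only
an illustration under stated assumptions, not the code liblouis uses:
the types @code{Feature} and @code{TableMeta}, the functions
@code{match_quotient} and @code{best_match}, and all numeric weights
are hypothetical. The rules above only fix the relative order of the
penalties, so the sketch picks arbitrary constants that respect that
order.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a key with an optional value (NULL = no value).
   Used both for query features and for table metadata fields. */
typedef struct {
    const char *key;
    const char *value;
} Feature;

/* Hypothetical type: the indexed metadata of one table. */
typedef struct {
    const char *name;
    const Feature *fields;
    size_t nfields;
} TableMeta;

/* Illustrative weights only; the real constants are not documented here.
   They merely respect: mismatch > undefined > unused table field. */
enum {
    MATCH_GAIN = 10,
    UNDEFINED_PENALTY = 4,
    MISMATCH_PENALTY = 20,
    EXTRA_FIELD_PENALTY = 1
};

/* Values match if both are absent or both are present and equal. */
static int values_equal(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Return the entry with the given key, or NULL if there is none. */
static const Feature *find_key(const Feature *list, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i].key, key) == 0)
            return &list[i];
    return NULL;
}

/* Compute the match quotient of one table against a query. */
long match_quotient(const Feature *query, size_t nquery,
                    const Feature *fields, size_t nfields)
{
    long quotient = 0;
    assert(query != NULL || nquery == 0);
    assert(fields != NULL || nfields == 0);
    for (size_t i = 0; i < nquery; i++) {
        /* Features defined first get a higher importance. */
        long importance = (long)(nquery - i);
        const Feature *field = find_key(fields, nfields, query[i].key);
        if (field == NULL)
            quotient -= importance * UNDEFINED_PENALTY;
        else if (values_equal(field->value, query[i].value))
            quotient += importance * MATCH_GAIN;
        else
            quotient -= importance * MISMATCH_PENALTY;
    }
    /* Table fields that the query does not mention cost a little. */
    for (size_t j = 0; j < nfields; j++)
        if (find_key(query, nquery, fields[j].key) == NULL)
            quotient -= EXTRA_FIELD_PENALTY;
    return quotient;
}

/* Return the index of the table with the highest positive quotient,
   or -1 if no table has a positive quotient. */
int best_match(const Feature *query, size_t nquery,
               const TableMeta *tables, size_t ntables)
{
    int best = -1;
    long best_quotient = 0;
    for (size_t t = 0; t < ntables; t++) {
        long q = match_quotient(query, nquery,
                                tables[t].fields, tables[t].nfields);
        if (q > best_quotient) {
            best_quotient = q;
            best = (int)t;
        }
    }
    return best;
}
```
@end verbatim

With these illustrative weights, a query for a language followed by a
contraction level gives a quotient of 30 to a table that defines
exactly those two fields with equal values, and 29 to a table that
additionally defines a field the query does not mention. A table whose
quotient is zero or negative is never returned.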
@node lou_indexTables
@section lou_indexTables
@findex lou_indexTables
@example
void lou_indexTables (const char **tables);
@end example
This function must be called prior to
@ref{lou_findTable,@code{lou_findTable}}. It parses, analyzes and
indexes all specified tables. @code{tables} must be an array of file
names. Tables that contain invalid metadata are ignored.
@node lou_checkTable
@section lou_checkTable
@findex lou_checkTable
@example
int lou_checkTable (const char *tableList);
@end example
This function does the same as @code{lou_getTable} but does not return
a pointer to the resulting table. It is to be preferred if only the
validity of a table needs to be checked. @code{tableList} is a list of
names of table files separated by commas, as explained previously
(@pxref{translation-tables,,@code{tableList} parameter in
@code{lou_translateString}}). If no errors are found, this function
returns a non-zero value. If errors are found, error messages are
logged to the log callback (see @code{lou_registerLogCallback}) and
the return value is @code{0}.
@node lou_readCharFromFile
@section lou_readCharFromFile
@findex lou_readCharFromFile
@example
int lou_readCharFromFile (
const char *fileName,
int *mode);
@end example
This function is provided for situations where it is necessary to read
a file that may contain little-endian or big-endian 16-bit Unicode
characters or ASCII8 characters. The return value is a little-endian
character, encoded as an integer. The @code{fileName} parameter is the
name of the file to be read. The @code{mode} parameter is a pointer to
an integer that must be set to 1 on the first call. On subsequent
calls the function manages this value itself. On end-of-file the
function returns @code{EOF}.
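As a rough illustration of what normalizing the byte order means here,
the sketch below combines the two bytes of a 16-bit code unit into one
integer for either byte order. The helpers @code{decode_16bit_unit}
and @code{detect_byte_order} are hypothetical and not part of the
liblouis API. Detecting the byte order from a leading byte-order mark
(U+FEFF) is an assumption of this sketch, since this section does not
say how @code{lou_readCharFromFile} decides.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of liblouis: combine the two bytes of
   a 16-bit code unit into one integer, given the file's byte order. */
int decode_16bit_unit(const unsigned char *bytes, int big_endian)
{
    assert(bytes != NULL);
    if (big_endian)
        return (bytes[0] << 8) | bytes[1];  /* high byte comes first */
    return bytes[0] | (bytes[1] << 8);      /* low byte comes first */
}

/* Hypothetical helper: inspect a leading byte-order mark (U+FEFF).
   Returns 1 for big-endian, 0 for little-endian, -1 if no mark is
   present (for example in a file of 8-bit characters). */
int detect_byte_order(const unsigned char *bytes, size_t len)
{
    if (bytes != NULL && len >= 2) {
        if (bytes[0] == 0xFE && bytes[1] == 0xFF)
            return 1;
        if (bytes[0] == 0xFF && bytes[1] == 0xFE)
            return 0;
    }
    return -1;
}
```
@end verbatim

Either byte order yields the same integer for the same character, so
code that consumes the result does not need to know how the file was
encoded.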
@node lou_free
@section lou_free
@findex lou_free
@example
void lou_free ();
@end example
This function should be called at the end of the application to free
all memory allocated by liblouis. Failure to do so will result in
memory leaks. Do @emph{NOT} call @code{lou_free} after each
translation. Doing so forces liblouis to recompile the translation
tables every time they are used, which is very inefficient.
@node lou_charSize
@section lou_charSize
@findex lou_charSize
@example
int lou_charSize ();
@end example
This function returns the size of @code{widechar} in bytes and can
therefore be used to differentiate between 16-bit and 32-bit Unicode
builds of liblouis.
@node Python bindings
@section Python bindings
There are Python bindings for @code{lou_translateString},
@code{lou_translate}, @code{lou_backTranslateString},
@code{lou_backTranslate}, @code{lou_hyphenate}, @code{lou_checkTable},
@code{lou_compileString} and @code{lou_version}. For installation
instructions, see the @file{README} file in the @file{python}
directory. Usage information is included in the Python module itself.
@node Concept Index
@unnumbered Concept Index
@printindex cp
@node Opcode Index
@unnumbered Opcode Index
@printindex opcode
@node Function Index
@unnumbered Function Index
@printindex fn
@node Program Index
@unnumbered Program Index
@printindex pg
@bye
@c The following list is a list of exceptions for the ispell spell
@c checker
@c LocalWords: liblouis opcode args BRLTTY ViewPlus Abilitiessoft LGPL lou
@c LocalWords: checktable allround checkhyphens Opcodes Multipass dotsToChar
@c LocalWords: translateString backTranslateString backTranslate charToDots
@c LocalWords: compileString logFile logPrint checkyaml findTable
@c LocalWords: getTable checkTable readCharFromFile itemx charSize
@c LocalWords: README liblouisxml pindex samp kbd opcodes opcoderef numsign
@c LocalWords: FIXME ctb nemeth filename multipass suboperand uplow litdigit
@c LocalWords: begcaps endcaps letsign noletsign largesign typeform
@c LocalWords: noletsignbefore noletsignafter compbrl firstwordital
@c LocalWords: lenitalphrase doubleOpcode lastworditalbefore firstletterital
@c LocalWords: lastworditalafter lastletterital firstwordbold UTF
@c LocalWords: singleletterital lastwordboldbefore lastwordboldafter
@c LocalWords: firstletterbold lastletterbold lenboldphrase filll
@c LocalWords: singleletterbold firstwordunder lastwordunderbefore
@c LocalWords: lastwordunderafter firstletterunder lastletterunder
@c LocalWords: singleletterunder lenunderphrase begcomp endcomp decpoint texi
@c LocalWords: capsnocont noback nofor texinfo setfilename settitle direntry
@c LocalWords: dircategory finalout defindex opcodeindex noindent uref vskip
@c LocalWords: titlepage insertcopying ifnottex dir detailmenu italword RET
@c LocalWords: TranslationTableHeader txt cti nocross exactdots nocont emph
@c LocalWords: prepunc postpunc repword joinword lowword sufword prfword API
@c LocalWords: begword begmidword midword midendword endword partword begnum
@c LocalWords: midnum endnum joinnum swapcd swapdd swapcc multind endLog
@c LocalWords: backtranslation compileTranslationTable typedef louis ruleArea
@c LocalWords: HASHNUM capitalSign compdots findex const inbuf outbuf outlen
@c LocalWords: tableList TABLEPATH widechar inputPos cursorPos outputPos
@c LocalWords: inlen compbrlAtCursor compbrlLeftCursor trantab stderr endian
@c LocalWords: tablelist fileName printindex deprecatedopcode setDataPath
@c LocalWords: getDataPath MathML suboperands logEnd liblouisutdml whitespace
@c LocalWords: xhhhh yhhhhh zhhhhhhhh OpenOffice documentencoding
@c LocalWords: YAML JSON logLevels nocontractsign OSX DLL env NVDA
@c LocalWords: MERCHANTABILITY registerLogCallback setLogLevel brf
@c LocalWords: cindex chardefs xhtml pxref dec multi hyph dic Aa al
@c LocalWords: mrow mfrac emphclass transnote subsubsection begemph
@c LocalWords: endemph emphletter begemphword endemphword www cd th
@c LocalWords: lenemphphrase begemphphrase endemphphrase andthe se
@c LocalWords: abrege decrement pre cornf comf scano cornm cornp po
@c LocalWords: h's brl testtrans UCS asis libyaml url yaml formtype
@c LocalWords: testmode iftex unicode ueb xfail eo noContractions
@c LocalWords: dotsIO ucBrl noUndefined partialTrans capsletter
@c LocalWords: abc doctest inString enum cp outbufbuf logcallback fprint
@c LocalWords: lbu EOF heckTable fn ispell getTypeformForEmphClass
@c LocalWords: indexTables begcapsword endcapsword typeforms
@c LocalWords: endemphphraseopcode emphClass BrailleSense HumanWare
@c LocalWords: BrailleNote refreshable