Organize input to
GNU
troff into lines separated by the Unix newline character
(U+000A),
using the character encoding it recognizes:
ISO Latin-1 (ISO 8859-1).
A document encoded in
ISO 646:1991 IRV
(US-ASCII),
or,
equivalently,
one that uses only code points from the
“C0 Controls” and “Basic Latin” parts of the Unicode character set
is also a valid ISO Latin-1 document;
the standards are interchangeable
in their first 128 code points.34
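
The overlap can be checked with a short Python sketch.
It is an illustration only;
the helper name is_ascii_document is hypothetical
and is not part of groff.
It assumes the document is available as a byte string.

```python
def is_ascii_document(data: bytes) -> bool:
    """Return True if every byte lies in the first 128 code points."""
    return all(b < 0x80 for b in data)


sample = b".TH example 1\nHello, troff!\n"

# A pure US-ASCII document decodes to the same text under both encodings.
if is_ascii_document(sample):
    assert sample.decode("ascii") == sample.decode("latin-1")

# A byte at or above 0x80 is valid Latin-1 but not US-ASCII.
print(is_ascii_document(b"caf\xe9\n"))
```

A document that passes this check needs no conversion
before being given to GNU
troff.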
Some control characters
(from the sets “C0 Controls” and “C1 Controls”
as Unicode describes them)
are invalid as input characters.
GNU
troff discards them upon reading.35
It processes
a character sequence “foo”,
followed by an invalid
character and then “bar”,
as “foobar”.
Invalid input characters comprise
0x00,
0x0B,
0x0D–0x1F,
and
0x80–0x9F.36
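
As a rough model of the behavior described above,
the following Python sketch drops the listed code points from a byte string.
It is not GNU
troff's implementation;
the names INVALID_INPUT_BYTES and discard_invalid are hypothetical,
and the set is built only from the ranges given in this section.

```python
# Code points this section lists as invalid input characters:
# 0x00, 0x0B, 0x0D-0x1F, and 0x80-0x9F (ranges inclusive).
INVALID_INPUT_BYTES = frozenset(
    [0x00, 0x0B] + list(range(0x0D, 0x20)) + list(range(0x80, 0xA0))
)


def discard_invalid(data: bytes) -> bytes:
    """Return data with every invalid input byte removed."""
    return bytes(b for b in data if b not in INVALID_INPUT_BYTES)


# "foo", an invalid character (0x0B), then "bar" is read as "foobar".
print(discard_invalid(b"foo\x0bbar"))
```

Tab (0x09) and newline (0x0A) fall outside the listed ranges
and pass through this filter unchanged.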
GNU
troff uses some of these code points for internal purposes,
which makes it non-trivial to extend the program
to accept UTF-8
or other encodings that use characters from these ranges.
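
To see why UTF-8 collides with these ranges,
consider that UTF-8 encodes every character beyond US-ASCII
with continuation bytes in the range 0x80–0xBF,
which overlaps 0x80–0x9F.
The Python sketch below lists the bytes of a UTF-8-encoded string
that fall in the invalid set given in this section.
The names INVALID_INPUT_BYTES and invalid_bytes_in_utf8 are hypothetical,
chosen for illustration.

```python
# Invalid input code points as listed in this section (ranges inclusive).
INVALID_INPUT_BYTES = frozenset(
    [0x00, 0x0B] + list(range(0x0D, 0x20)) + list(range(0x80, 0xA0))
)


def invalid_bytes_in_utf8(text: str) -> list:
    """Return the bytes of text's UTF-8 encoding that are invalid input."""
    return [b for b in text.encode("utf-8") if b in INVALID_INPUT_BYTES]


# U+00C0 encodes as C3 80, and U+20AC as E2 82 AC;
# each contains one byte from 0x80-0x9F.
for ch in ("\u00c0", "\u20ac", "\u00e9"):
    print(ch, [hex(b) for b in invalid_bytes_in_utf8(ch)])
```

Some characters,
such as U+00E9 (encoded as C3 A9),
happen to avoid the range,
so a UTF-8 document is not affected uniformly.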