FILE FORMATS
Grammar, lexicon, and test sentence files are all in plain ASCII text
format. Some of the files include comment lines beginning with the
semicolon character (";") to include required notices.
The grammar files are in the form of blocks, separated by blank lines,
defining the productions expanding each nonterminal. The first line in
each block contains only the nonterminal symbol whose productions are
defined by that block. Each remaining line of the block consists of a
space-separated sequence of nonterminals and preterminals defining a
possible expansion of the nonterminal in question. Tokens beginning
with upper-case characters are nonterminals; all other tokens are
preterminals. For example, the block
NP
det NBAR
NP POSTNOMMOD
would define productions more conventionally written as
NP -> det NBAR
NP -> NP POSTNOMMOD
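The block format described above can be read with a few lines of code. The following Python sketch is illustrative only and is not part of the distributed files; the function names parse_grammar and is_nonterminal are hypothetical. It assumes that comment lines can be skipped wherever they occur and that they do not act as block separators.

```python
def parse_grammar(text):
    """Map each nonterminal to the list of its expansions.

    Each expansion is a list of tokens (nonterminals and preterminals).
    """
    productions = {}
    current = None  # nonterminal of the block currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(";"):
            continue  # comment line; assumed not to end a block
        if not line:
            current = None  # a blank line ends the current block
            continue
        if current is None:
            # First line of a block: the nonterminal being defined.
            current = line
            productions.setdefault(current, [])
        else:
            # Remaining lines: one space-separated expansion each.
            productions[current].append(line.split())
    return productions


def is_nonterminal(token):
    """Tokens beginning with an upper-case character are nonterminals."""
    return token[:1].isupper()
```

Applied to the example block, parse_grammar would return a dictionary with the single key "NP" whose value holds the two expansions, det NBAR and NP POSTNOMMOD.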
In the lexicon files, each line contains a lexical item followed by a
space followed by its preterminal category. Note that lexical items and
preterminal categories are not necessarily distinct symbols. In these
lexicons, whenever there is only one lexical item in a given
preterminal category, the lexical item itself is used as the symbol for
the preterminal category.
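A lexicon file in this format could be loaded along the following lines. This is a minimal Python sketch, not code shipped with the files; the name parse_lexicon is hypothetical. It assumes that a lexical item may appear on more than one line with different categories, so categories are collected in a set, and that any line beginning with ";" is a comment, which would misread a lexicon entry for the semicolon itself if one exists.

```python
from collections import defaultdict


def parse_lexicon(text):
    """Map each lexical item to the set of its preterminal categories."""
    lexicon = defaultdict(set)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and (assumed) comment lines
        # Lexical item, one space, then the preterminal category.
        # A line without a space raises ValueError here.
        word, category = line.split(" ", 1)
        lexicon[word].add(category.strip())
    return dict(lexicon)
```

An entry such as "that that" is handled like any other line: the lexical item and its category simply happen to be the same symbol.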
In the sentence files, each line consists of the lexical tokens of a
single sentence, separated by spaces. Punctuation marks are treated as
lexical tokens, and are present only where required by the
corresponding grammar.
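Reading a sentence file then amounts to splitting each line on spaces. The Python sketch below is illustrative; the name parse_sentences is hypothetical, and it assumes that lines beginning with ";" are comments and never sentences.

```python
def parse_sentences(text):
    """Return one list of lexical tokens per sentence line."""
    sentences = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and (assumed) comment lines
        # Punctuation marks are already separate tokens in the file,
        # so plain whitespace splitting is enough.
        sentences.append(line.split())
    return sentences
```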