Version: 1.00 Date: 29/1/05
 

FILE FORMATS

(taken from the supplementary materials of "Improved Left-Corner Chart Parsing for Large Context-Free Grammars" by Robert C. Moore)

Grammar, lexicon, and test sentence files are all in plain ASCII text format. Some of the files contain comment lines, beginning with the semicolon character (";"), that carry required notices.

The grammar files are in the form of blocks, separated by blank lines, defining the productions expanding each nonterminal. The first line in each block contains only the nonterminal symbol whose productions are defined by that block. Each remaining line of the block consists of a space-separated sequence of nonterminals and preterminals defining a possible expansion of the nonterminal in question. Tokens beginning with upper-case characters are nonterminals; all other tokens are preterminals. For example, the block

NP
det NBAR
NP POSTNOMMOD

would define productions more conventionally written as

NP -> det NBAR
NP -> NP POSTNOMMOD
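
The block format above can be read with a few lines of code. The following is a minimal sketch in Python, not part of the original materials; the function names parse_grammar and is_nonterminal are hypothetical, and it assumes that any line whose first character is ";" is a comment and can be skipped.

```python
def parse_grammar(text):
    """Return a list of (lhs, rhs) productions read from grammar-file text."""
    productions = []
    lhs = None  # nonterminal of the block being read; None between blocks
    for line in text.splitlines():
        if line.startswith(";"):  # assumption: comment lines are skipped
            continue
        tokens = line.split()
        if not tokens:            # a blank line ends the current block
            lhs = None
        elif lhs is None:         # first line of a block names the nonterminal
            lhs = tokens[0]
        else:                     # every other line is one expansion
            productions.append((lhs, tuple(tokens)))
    return productions


def is_nonterminal(token):
    """Tokens beginning with an upper-case character are nonterminals."""
    return token[:1].isupper()


sample = "NP\ndet NBAR\nNP POSTNOMMOD\n"
for lhs, rhs in parse_grammar(sample):
    print(lhs, "->", " ".join(rhs))
```

Run on the NP block, this prints the two productions in the conventional arrow notation.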

In the lexicon files, each line contains a lexical item followed by a space followed by its preterminal category. Note that lexical items and preterminal categories are not necessarily distinct symbols. In these lexicons, whenever there is only one lexical item in a given preterminal category, the lexical item itself is used as the symbol for the preterminal category.
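
A lexicon file in this format could be loaded roughly as follows. This is an illustrative Python sketch only: the name parse_lexicon and the sample entries are hypothetical, lines starting with ";" are assumed to be comments, and a lexical item is assumed to be allowed to appear on several lines with different categories.

```python
from collections import defaultdict


def parse_lexicon(text):
    """Map each lexical item to the set of its preterminal categories."""
    lexicon = defaultdict(set)
    for line in text.splitlines():
        if not line.strip() or line.startswith(";"):
            continue  # skip blank lines and (assumed) comment lines
        word, category = line.split()  # lexical item, then its category
        lexicon[word].add(category)
    return lexicon


# Hypothetical entries; "that that" shows a lexical item serving as
# the symbol for its own single-member preterminal category.
sample = "; notice\nthe det\ndog noun\nthat that\n"
lexicon = parse_lexicon(sample)
print(sorted(lexicon["that"]))
```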

In the sentence files, each line consists of the lexical tokens of a single sentence, separated by spaces. Punctuation marks are treated as lexical tokens, and are present only where required by the corresponding grammar.
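
Reading a sentence file then amounts to splitting each line on spaces. The Python sketch below is illustrative only: parse_sentences is a hypothetical name, the sample sentences are invented, and lines starting with ";" are assumed to be comments (an assumption that would need revisiting for a grammar in which a sentence can begin with a semicolon token).

```python
def parse_sentences(text):
    """Return one list of lexical tokens per sentence line."""
    sentences = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(";"):
            continue  # skip blank lines and (assumed) comment lines
        sentences.append(line.split())  # punctuation stays a separate token
    return sentences


sample = "; notice\nthe dog barks .\nshow me flights\n"
for tokens in parse_sentences(sample):
    print(tokens)
```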

 

 


 
