public class PennTreeReader extends java.lang.Object implements TreeReader
This class implements the TreeReader interface to read Penn Treebank-style
files. The reader is implemented as a push-down automaton (PDA) that parses the Lisp-style
format in which the trees are stored. This reader is compatible with both PTB
and PATB trees.
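As a rough illustration of the push-down approach described above, the following sketch parses a single bracketed tree with an explicit stack of open nodes. It is not the PennTreeReader implementation: the class BracketTreeSketch, its Node type, and the parse method are hypothetical names invented for this example, and tokenization is simplified to splitting on whitespace.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BracketTreeSketch {

    /** Hypothetical minimal tree node: a label plus ordered children. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        @Override
        public String toString() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node child : children) {
                sb.append(' ').append(child);
            }
            return sb.append(')').toString();
        }
    }

    private static boolean isParen(String token) {
        return token.equals("(") || token.equals(")");
    }

    /** Parses one bracketed tree, keeping the currently open nodes on a stack. */
    static Node parse(String text) {
        // Simplified tokenization: pad parentheses, then split on whitespace.
        String[] tokens = text.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        Deque<Node> open = new ArrayDeque<>();
        Node root = null;
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            if (token.equals("(")) {
                // Push a new node; its label is the next token unless that is a parenthesis.
                String label = "";
                if (i + 1 < tokens.length && !isParen(tokens[i + 1])) {
                    label = tokens[++i];
                }
                Node node = new Node(label);
                if (!open.isEmpty()) {
                    open.peek().children.add(node);
                }
                open.push(node);
            } else if (token.equals(")")) {
                // Pop: the node on top of the stack is now complete.
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("Unbalanced ')'");
                }
                root = open.pop();
            } else {
                // Any other token is a leaf of the innermost open node.
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("Token outside a tree: " + token);
                }
                open.peek().children.add(new Node(token));
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("Unbalanced '('");
        }
        return root;
    }

    public static void main(String[] args) {
        System.out.println(parse("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))"));
    }
}
```

The stack holds exactly the constituents that have been opened but not yet closed, which is why unbalanced parentheses can be detected either on a pop from an empty stack or by a non-empty stack at the end of input.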
PennTreeReader
silently replaces \* with * and \/ with /. Two designs were possible:
have the PennTreeReader always do this, or have
the TreeNormalizers do it. We
put it in the PennTreeReader class itself
so that people writing new
TreeNormalizers cannot forget to include the
unescaping.

| Constructor and Description |
|---|
| PennTreeReader(java.io.Reader in): Read parse trees from a Reader. |
| PennTreeReader(java.io.Reader in, TreeFactory tf): Read parse trees from a Reader. |
| PennTreeReader(java.io.Reader in, TreeFactory tf, TreeNormalizer tn): Read parse trees from a Reader. |
| PennTreeReader(java.io.Reader in, TreeFactory tf, TreeNormalizer tn, Tokenizer<java.lang.String> st): Read parse trees from a Reader. |
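
The unescaping mentioned in the class description can be pictured with the small sketch below. It only shows the two replacements named there (\* to * and \/ to /); the class UnescapeSketch and its unescape method are hypothetical names for this example and are not part of PennTreeReader's API.

```java
public class UnescapeSketch {

    /** Replaces the escaped forms \* and \/ with * and / (literal, not regex, replacement). */
    static String unescape(String token) {
        return token.replace("\\*", "*").replace("\\/", "/");
    }

    public static void main(String[] args) {
        // In Java source, "\\*" is the two-character string backslash + asterisk.
        System.out.println(unescape("\\*T\\*-1"));
        System.out.println(unescape("1\\/2"));
    }
}
```

Doing this once in the reader, as the class description explains, means that a custom TreeNormalizer never sees the escaped forms.
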
| Modifier and Type | Method and Description |
|---|---|
| void | close(): Closes the underlying Reader used to create this class. |
| static void | main(java.lang.String[] args): Loads treebank data from first argument and prints it. |
| Tree | readTree(): Reads a single tree in standard Penn Treebank format from the input stream. |

public PennTreeReader(java.io.Reader in)
Read parse trees from a Reader.
For the defaulted arguments, you get a
SimpleTreeFactory, no TreeNormalizer, and
a PennTreebankTokenizer.
Parameters:
in - the Reader

public PennTreeReader(java.io.Reader in,
TreeFactory tf)
Read parse trees from a Reader.
Parameters:
in - the Reader
tf - TreeFactory, the factory used to create some kind of Tree

public PennTreeReader(java.io.Reader in,
TreeFactory tf,
TreeNormalizer tn)
Read parse trees from a Reader.
Parameters:
in - the Reader
tf - TreeFactory, the factory used to create some kind of Tree
tn - the method of normalizing trees

public PennTreeReader(java.io.Reader in,
TreeFactory tf,
TreeNormalizer tn,
Tokenizer<java.lang.String> st)
Read parse trees from a Reader.
Parameters:
in - the Reader
tf - TreeFactory, the factory used to create some kind of Tree
tn - the method of normalizing trees
st - Tokenizer that divides up the Reader

public Tree readTree() throws java.io.IOException
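Reads a single tree in standard Penn Treebank format from the input stream.
If the token stream ends before the current tree is complete, the method
throws an IOException.
Note that the method skips malformed trees and attempts to
read additional trees from the input stream. A malformed tree may, however,
corrupt the token stream, in which case
an IOException will eventually be thrown.
Specified by:
readTree in interface TreeReader
Returns:
A single tree, or null at end of token stream.
Throws:
java.io.IOException - if an I/O problem occurs

public void close()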
throws java.io.IOException
Closes the underlying Reader used to create this class.
Specified by:
close in interface TreeReader
close in interface java.io.Closeable
close in interface java.lang.AutoCloseable
Throws:
java.io.IOException

public static void main(java.lang.String[] args)
Loads treebank data from first argument and prints it.
Parameters:
args - Array of command-line arguments: specifies a filename