public abstract class Treebank extends java.util.AbstractCollection<Tree>

A Treebank object provides access to a corpus of examples with
given tree structures.

This class now implements the Collection interface. However, it may offer
less than the full power of the Collection interface: some Treebanks are
read-only, and so may throw UnsupportedOperationException.

| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String | DEFAULT_TREE_FILE_SUFFIX |
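
The class description warns that a read-only Treebank may throw UnsupportedOperationException from Collection methods. A minimal sketch of what that can look like for any AbstractCollection subclass follows. `ReadOnlyBank` is a hypothetical stand-in that stores plain strings instead of Tree objects, so it is not the library's implementation.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a read-only Treebank; it stores plain strings
// rather than Tree objects so the sketch needs no external library.
final class ReadOnlyBank extends AbstractCollection<String> {
    private final List<String> items;

    ReadOnlyBank(String... items) {
        // Wrap the backing list so the iterator cannot remove elements.
        this.items = Collections.unmodifiableList(Arrays.asList(items));
    }

    @Override
    public Iterator<String> iterator() {
        return items.iterator();
    }

    @Override
    public int size() {
        return items.size();
    }

    // Assumption: mirror the documented "not supported" behaviour of remove
    // by always throwing, whether or not the element is present.
    @Override
    public boolean remove(Object o) {
        throw new UnsupportedOperationException("read-only collection");
    }

    public static void main(String[] args) {
        ReadOnlyBank bank = new ReadOnlyBank("(S (NP I) (VP run))", "(S (NP You) (VP walk))");
        System.out.println(bank.size());                          // read access works
        System.out.println(bank.contains("(S (NP I) (VP run))")); // so does contains
        try {
            bank.add("(S (NP We) (VP sit))"); // AbstractCollection.add is unsupported by default
        } catch (UnsupportedOperationException e) {
            System.out.println("add is not supported");
        }
    }
}
```

Callers that cannot know whether a given collection is read-only would typically guard mutating calls with a try/catch like the one in `main`.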

| Constructor and Description |
|---|
| Treebank(): Create a new Treebank (using a LabeledScoredTreeReaderFactory). |
| Treebank(int initialCapacity): Create a new Treebank. |
| Treebank(int initialCapacity, TreeReaderFactory trf): Create a new Treebank. |
| Treebank(TreeReaderFactory trf): Create a new Treebank. |
| Treebank(TreeReaderFactory trf, java.lang.String encoding): Create a new Treebank. |

| Modifier and Type | Method and Description |
|---|---|
| abstract void | apply(TreeVisitor tp): Apply a TreeVisitor to each tree in the Treebank. |
| abstract void | clear(): Empty a Treebank. |
| void | decimate(java.io.Writer trainW, java.io.Writer devW, java.io.Writer testW): Divide a Treebank into 3, by taking every 9th sentence for the dev set and every 10th for the test set. |
| java.lang.String | encoding(): Returns the encoding in use for treebank file bytestream access. |
| void | loadPath(java.io.File path): Load a sequence of trees from given file or directory and its subdirectories. |
| abstract void | loadPath(java.io.File path, java.io.FileFilter filt): Load trees from given path specification. |
| void | loadPath(java.io.File path, java.lang.String suffix, boolean recursively): Load trees from given directory. |
| void | loadPath(java.lang.String pathName): Load a sequence of trees from given directory and its subdirectories. |
| void | loadPath(java.lang.String pathName, java.io.FileFilter filt): Load a sequence of trees from given directory and its subdirectories which match the file filter. |
| void | loadPath(java.lang.String pathName, java.lang.String suffix, boolean recursively): Load trees from given directory. |
| boolean | remove(java.lang.Object o): This operation isn't supported for a Treebank. |
| int | size(): Returns the size of the Treebank. |
| java.lang.String | textualSummary(): Return various statistics about the treebank (number of sentences, words, tag set, etc.). |
| java.lang.String | textualSummary(TreebankLanguagePack tlp): Return various statistics about the treebank (number of sentences, words, tag set, etc.). |
| java.lang.String | toString(): Return the whole treebank as a series of big bracketed lists. |
| Treebank | transform(TreeTransformer treeTrans): Return a Treebank (actually a TransformingTreebank) where each Tree in the current treebank has been transformed using the TreeTransformer. |
| protected TreeReaderFactory | treeReaderFactory(): Get the TreeReaderFactory for a Treebank -- this method is provided in order to make the TreeReaderFactory available to subclasses. |
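
As a rough illustration of the apply and transform entries in the table above, the following sketch shows the general pattern of visiting every element and of producing a transformed copy. Everything in it is hypothetical: `MiniBank` holds plain strings, `Consumer` stands in for TreeVisitor, and `UnaryOperator` stands in for TreeTransformer. It also copies eagerly, which is an assumption of the sketch; the real TransformingTreebank may well behave differently (for example, as a lazy view).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical miniature of the apply/transform pattern; not the library's code.
final class MiniBank {
    private final List<String> trees = new ArrayList<>();

    void add(String tree) {
        trees.add(tree);
    }

    int size() {
        return trees.size();
    }

    // Visit each stored tree in insertion order.
    void apply(Consumer<String> visitor) {
        for (String tree : trees) {
            visitor.accept(tree);
        }
    }

    // Return a new bank holding the transformed trees; this bank is left unchanged.
    MiniBank transform(UnaryOperator<String> transformer) {
        MiniBank result = new MiniBank();
        for (String tree : trees) {
            result.add(transformer.apply(tree));
        }
        return result;
    }

    public static void main(String[] args) {
        MiniBank bank = new MiniBank();
        bank.add("(S (NP I) (VP run))");
        bank.add("(S (NP You) (VP walk))");

        // Transformer: upper-case every tree string.
        MiniBank shouted = bank.transform(String::toUpperCase);

        // Visitor: print each tree of the transformed bank.
        shouted.apply(System.out::println);
    }
}
```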

Methods inherited from class java.util.AbstractCollection:
add, addAll, contains, containsAll, isEmpty, iterator, removeAll, retainAll, toArray, toArray

Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

public static final java.lang.String DEFAULT_TREE_FILE_SUFFIX

public Treebank()

public Treebank(TreeReaderFactory trf)
trf - the factory class to be called to create a new TreeReader

public Treebank(TreeReaderFactory trf, java.lang.String encoding)
trf - the factory class to be called to create a new TreeReader
encoding - the charset encoding to use for treebank file decoding

public Treebank(int initialCapacity)
initialCapacity - the initial size of the underlying Collection (if a Collection-based storage mechanism is being provided)

public Treebank(int initialCapacity, TreeReaderFactory trf)
initialCapacity - the initial size of the underlying Collection (if a Collection-based storage mechanism is being provided)
trf - the factory class to be called to create a new TreeReader

protected TreeReaderFactory treeReaderFactory()
Get the TreeReaderFactory for a Treebank -- this method is provided in order to make the TreeReaderFactory available to subclasses.

public java.lang.String encoding()

public abstract void clear()
Empty a Treebank.

public void loadPath(java.lang.String pathName)
pathName - file or directory name

public void loadPath(java.io.File path)
path - File specification

public void loadPath(java.lang.String pathName, java.lang.String suffix, boolean recursively)
pathName - file or directory name
suffix - extension of files to load: if pathName is a directory, then, if this is non-null, all and only files ending in "." followed by this extension will be loaded; if it is null, all files in directories will be loaded. If pathName is not a directory, this parameter is ignored.
recursively - descend into subdirectories as well

public void loadPath(java.io.File path, java.lang.String suffix, boolean recursively)
path - file or directory to load from
suffix - suffix of files to load
recursively - descend into subdirectories as well

public void loadPath(java.lang.String pathName, java.io.FileFilter filt)
pathName - file or directory name
filt - a filter used to determine which files match

public abstract void loadPath(java.io.File path, java.io.FileFilter filt)
path - file or directory to load from
filt - a FileFilter of files to load

public abstract void apply(TreeVisitor tp)
tp - the TreeVisitor to be applied

public Treebank transform(TreeTransformer treeTrans)
treeTrans - the TreeTransformer to use

public java.lang.String toString()
Overrides: toString in class java.util.AbstractCollection<Tree>

public int size()
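
The suffix rule given for loadPath above (a non-null suffix selects all and only files ending in "." followed by the extension, a null suffix selects every file, and a plain file path ignores the suffix) can be sketched with a standard java.io.FileFilter. `SuffixFilterSketch`, `forSuffix`, and `countMatches` are hypothetical names for this illustration, and the extension "mrg" is an arbitrary example value; how the library itself builds its filter is not shown in this documentation.

```java
import java.io.File;
import java.io.FileFilter;

// Hypothetical helper illustrating the documented suffix rule; not the library's code.
final class SuffixFilterSketch {

    // Directories pass only when recursing; files pass when the suffix is null
    // or the file name ends in "." followed by the suffix.
    static FileFilter forSuffix(String suffix, boolean recursively) {
        return file -> {
            if (file.isDirectory()) {
                return recursively;
            }
            return suffix == null || file.getName().endsWith("." + suffix);
        };
    }

    // Count the files that would be loaded from path under the given filter.
    static int countMatches(File path, FileFilter filter) {
        if (path.isFile()) {
            return 1; // a plain file is taken as-is; the filter is not consulted
        }
        File[] children = path.listFiles(filter);
        if (children == null) {
            return 0; // path does not exist or cannot be listed
        }
        int count = 0;
        for (File child : children) {
            count += countMatches(child, filter);
        }
        return count;
    }

    public static void main(String[] args) {
        FileFilter filter = forSuffix("mrg", true);
        System.out.println(filter.accept(new File("wsj_0001.mrg")));
        System.out.println(filter.accept(new File("README.txt")));
        System.out.println(countMatches(new File("."), filter));
    }
}
```

A filter built this way could be handed to any API that accepts a java.io.FileFilter, which is the parameter type of the two-argument loadPath overloads.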

public void decimate(java.io.Writer trainW, java.io.Writer devW, java.io.Writer testW) throws java.io.IOException
Throws: java.io.IOException

public java.lang.String textualSummary()

public java.lang.String textualSummary(TreebankLanguagePack tlp)
tlp - the TreebankLanguagePack used to determine punctuation and an appropriate character encoding
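
The split performed by decimate (every 9th sentence to the dev set, every 10th to the test set) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes that "9th" and "10th" refer to positions within each consecutive block of ten, that all remaining sentences go to the training writer, and that each item is written on its own line. `DecimateSketch` and its plain-string sentences are hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a train/dev/test split in blocks of ten.
final class DecimateSketch {

    // Assumption: within each block of ten, item 9 goes to dev, item 10 to test,
    // and items 1 through 8 go to train.
    static void decimate(List<String> sentences, Writer trainW, Writer devW, Writer testW)
            throws IOException {
        int position = 0; // 0-based position within the current block of ten
        for (String sentence : sentences) {
            Writer target;
            if (position == 8) {
                target = devW;
            } else if (position == 9) {
                target = testW;
            } else {
                target = trainW;
            }
            target.write(sentence);
            target.write('\n');
            position = (position + 1) % 10;
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> sentences = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            sentences.add("sentence-" + i);
        }
        StringWriter train = new StringWriter();
        StringWriter dev = new StringWriter();
        StringWriter test = new StringWriter();
        decimate(sentences, train, dev, test);
        System.out.print("dev:\n" + dev);
        System.out.print("test:\n" + test);
    }
}
```

Under these assumptions the split is roughly 80% train, 10% dev, and 10% test, and it is deterministic for a fixed sentence order.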