public abstract class AbstractTreebankParserParams extends java.lang.Object implements TreebankLangParserParams
Abstract base class providing common code for classes that implement TreebankLangParserParams.
With some extending classes you'll want to have access to special
attributes of the corresponding TreebankLanguagePack while taking
advantage of this class's code for making the TreebankLanguagePack
accessible. A good way to do this is to pass a new instance of the
appropriate TreebankLanguagePack into this class's constructor,
then get it back later on by casting a call to
treebankLanguagePack(). See ChineseTreebankParserParams for an
example.

| Modifier and Type | Class and Description |
|---|---|
| protected static class | AbstractTreebankParserParams.AnnotatePunctuationFunction: Annotation function for mapping punctuation to PTB-style equivalence classes. |
| protected class | AbstractTreebankParserParams.RemoveGFSubcategoryStripper: The job of this class is to remove subcategorizations from tag and category nodes, so as to put a tree in a suitable state for evaluation. |
| protected class | AbstractTreebankParserParams.SubcategoryStripper: The job of this class is to remove subcategorizations from tag and category nodes, so as to put a tree in a suitable state for evaluation. |
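
AnnotatePunctuationFunction is summarized as mapping punctuation to PTB-style equivalence classes. As a rough illustration of what such classes can look like, the sketch groups punctuation tokens under Penn Treebank punctuation tags ("." for sentence-final marks, ":" for colons, semicolons, dashes and ellipses, -LRB-/-RRB- for brackets). The class PunctuationClasses and its table are hypothetical; the library's actual mapping and annotation format may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: groups punctuation tokens into classes named after
// Penn Treebank punctuation tags. Not the mapping used by AnnotatePunctuationFunction.
final class PunctuationClasses {
  private static final Map<String, String> CLASSES = new HashMap<>();

  static {
    // Sentence-final marks share the PTB tag "."
    for (String s : new String[] {".", "?", "!"}) {
      CLASSES.put(s, ".");
    }
    CLASSES.put(",", ",");
    // Colons, semicolons, dashes and ellipses share the PTB tag ":"
    for (String s : new String[] {":", ";", "--", "..."}) {
      CLASSES.put(s, ":");
    }
    // Opening and closing brackets map to -LRB- and -RRB-
    for (String s : new String[] {"(", "[", "{"}) {
      CLASSES.put(s, "-LRB-");
    }
    for (String s : new String[] {")", "]", "}"}) {
      CLASSES.put(s, "-RRB-");
    }
    // PTB-style opening and closing quotes keep their own classes
    CLASSES.put("``", "``");
    CLASSES.put("''", "''");
  }

  private PunctuationClasses() {}

  /** Returns the class of a punctuation token, or null for anything else. */
  static String classOf(String token) {
    return CLASSES.get(token);
  }

  public static void main(String[] args) {
    for (String tok : new String[] {"?", ";", "[", "dog"}) {
      System.out.println(tok + " -> " + classOf(tok));
    }
  }
}
```

Collapsing tokens this way lets a parser or evaluator treat, say, "?" and "!" alike without caring about the exact surface form.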

| Modifier and Type | Field and Description |
|---|---|
| protected boolean | evalGF: If true, evaluation is over grammatical functions as well as the labels. If false, grammatical functions are stripped for evaluation. |
| protected boolean | generateOriginalDependencies |
| protected java.lang.String | inputEncoding |
| protected java.lang.String | outputEncoding |
| protected TreebankLanguagePack | tlp |
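
The evalGF field decides whether grammatical functions survive into evaluation. A small sketch of that decision follows, assuming labels of the form CATEGORY-FUNCTION such as NP-SBJ. The label format and the EvalLabelNormalizer class are assumptions for illustration, not the logic of the library's subcategory strippers.

```java
// Hypothetical sketch: evaluation-time label normalization controlled by an
// evalGF-style flag. Assumes labels look like CATEGORY-FUNCTION, e.g. "NP-SBJ".
final class EvalLabelNormalizer {
  private final boolean evalGF;

  EvalLabelNormalizer(boolean evalGF) {
    this.evalGF = evalGF;
  }

  /** Keeps the grammatical function when evalGF is true; otherwise keeps only the category. */
  String normalize(String label) {
    // Labels that start with '-' (such as "-NONE-") are left untouched.
    if (evalGF || label.startsWith("-")) {
      return label;
    }
    int dash = label.indexOf('-');
    return dash < 0 ? label : label.substring(0, dash);
  }

  public static void main(String[] args) {
    System.out.println(new EvalLabelNormalizer(true).normalize("NP-SBJ"));  // NP-SBJ
    System.out.println(new EvalLabelNormalizer(false).normalize("NP-SBJ")); // NP
  }
}
```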

| Modifier | Constructor and Description |
|---|---|
| protected | AbstractTreebankParserParams(TreebankLanguagePack tlp): Stores the passed-in TreebankLanguagePack and sets up charset encodings. |
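
The class description recommends passing a new, language-specific TreebankLanguagePack into the constructor and casting the result of treebankLanguagePack() back to that type when its special attributes are needed. This sketch shows the pattern with hypothetical stand-in types (LanguagePack, ChineseLikePack, BaseParams, ChineseLikeParams) instead of the real CoreNLP classes, so it compiles without the library; the extra attribute sentenceFinalPunctuation() is invented for the example.

```java
// Hypothetical stand-ins for TreebankLanguagePack, a language-specific pack,
// AbstractTreebankParserParams, and a subclass such as ChineseTreebankParserParams.
interface LanguagePack {
  String name();
}

final class ChineseLikePack implements LanguagePack {
  @Override
  public String name() {
    return "chinese-like";
  }

  /** An extra attribute only this pack offers (invented for the example). */
  String[] sentenceFinalPunctuation() {
    return new String[] {"\u3002", "\uff01", "\uff1f"};
  }
}

abstract class BaseParams {
  protected final LanguagePack tlp;

  // Mirrors the documented constructor: store the pack that was passed in.
  protected BaseParams(LanguagePack tlp) {
    this.tlp = tlp;
  }

  LanguagePack treebankLanguagePack() {
    return tlp;
  }
}

final class ChineseLikeParams extends BaseParams {
  ChineseLikeParams() {
    super(new ChineseLikePack()); // pass a new instance of the specific pack
  }

  int finalPunctuationCount() {
    // Safe cast: the constructor always supplies a ChineseLikePack.
    ChineseLikePack pack = (ChineseLikePack) treebankLanguagePack();
    return pack.sentenceFinalPunctuation().length;
  }

  public static void main(String[] args) {
    System.out.println(new ChineseLikeParams().finalPunctuationCount()); // 3
  }
}
```

The cast is safe only because the subclass itself chooses the pack type; a subclass that accepted an arbitrary pack from callers would need an instanceof check instead.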

| Modifier and Type | Method and Description |
|---|---|
| abstract TreeTransformer | collinizer(): The tree transformer used to produce trees for evaluation. |
| abstract TreeTransformer | collinizerEvalb(): The tree transformer used to produce trees for evaluation. |
| java.lang.String[] | defaultCoreNLPFlags(): Which flags should be used by default when run inside StanfordCoreNLP. |
| Extractor<DependencyGrammar> | dependencyGrammarExtractor(Options op, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex) |
| static <E> java.util.Collection<E> | dependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer, DependencyTyper<E> typer): Returns the set of dependencies in a tree, according to some DependencyTyper. |
| abstract DiskTreebank | diskTreebank(): Returns a DiskTreebank appropriate to the treebank source. |
| abstract void | display(): Displays language-specific settings. |
| boolean | generateOriginalDependencies() |
| GrammaticalStructure | getGrammaticalStructure(Tree t, java.util.function.Predicate<java.lang.String> filter, HeadFinder hf): Builds a GrammaticalStructure from a Tree. |
| java.lang.String | getInputEncoding(): Returns the input encoding being used. |
| java.lang.String | getOutputEncoding(): Returns the output encoding being used. |
| abstract HeadFinder | headFinder(): The HeadFinder to use for your treebank. |
| boolean | isEvalGF() |
| Lexicon | lex(Options op, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex): Vends a Lexicon object suitable to the particular language/treebank combination of interest. |
| abstract MemoryTreebank | memoryTreebank(): Returns a MemoryTreebank appropriate to the treebank source. |
| double[] | MLEDependencyGrammarSmoothingParams(): Gives the parameters for smoothing in the MLEDependencyGrammar. |
| static java.util.Collection<Constituent> | parsevalObjectify(Tree t, TreeTransformer collinizer): Takes a Tree and a collinizer and returns a Collection of labeled Constituents for PARSEVAL. |
| static java.util.Collection<Constituent> | parsevalObjectify(Tree t, TreeTransformer collinizer, boolean labelConstituents): Takes a Tree and a collinizer and returns a Collection of Constituents for PARSEVAL evaluation. |
| AbstractEval | ppAttachmentEval(): Returns a language-specific object for evaluating PP attachment. |
| Label | processHeadWord(Label headWord): Allows language-specific processing (e.g., stemming) of head words. |
| java.io.PrintWriter | pw(): The PrintWriter used to print output. |
| java.io.PrintWriter | pw(java.io.OutputStream o): The PrintWriter used to print output. |
| java.util.List<GrammaticalStructure> | readGrammaticalStructureFromFile(java.lang.String filename): Reads the given file and returns its content as a list of GrammaticalStructures. |
| void | setEvalGF(boolean evalGF) |
| void | setEvaluateGrammaticalFunctions(boolean evalGFs): Sets whether to consider grammatical functions in evaluation. |
| void | setGenerateOriginalDependencies(boolean originalDependencies): For languages that have implementations of both the original Stanford dependencies and Universal Dependencies, this parameter decides which implementation is used. |
| void | setInputEncoding(java.lang.String encoding): Sets the input encoding. |
| int | setOptionFlag(java.lang.String[] args, int i): Sets language-specific options according to flags. |
| void | setOutputEncoding(java.lang.String encoding): Sets the output encoding. |
| abstract java.lang.String[] | sisterSplitters(): Returns the splitting strings used for selective splits. |
| TreeTransformer | subcategoryStripper(): Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories. |
| boolean | supportsBasicDependencies(): By default, parsers are assumed not to support dependencies. |
| MemoryTreebank | testMemoryTreebank(): You can often return the same thing for testMemoryTreebank as for memoryTreebank. |
| abstract Tree | transformTree(Tree t, Tree root): Does language-specific tree transformations such as annotating particular nodes with language-relevant features. |
| Treebank | treebank(): Implemented as required by TreebankFactory. |
| TreebankLanguagePack | treebankLanguagePack(): Returns an appropriate TreebankLanguagePack. |
| TokenizerFactory<Tree> | treeTokenizerFactory() |
| static EquivalenceClasser<java.util.List<java.lang.String>,java.lang.String> | typedDependencyClasser(): Returns an EquivalenceClasser that classes typed dependencies by the syntactic categories of mother, head and daughter, plus direction. |
| abstract HeadFinder | typedDependencyHeadFinder(): The HeadFinder to use when extracting typed dependencies. |
| static java.util.Collection<java.util.List<java.lang.String>> | typedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer): Returns a collection of word-word dependencies typed by the syntactic categories of the mother, head, and daughter nodes. |
| static java.util.Collection<java.util.List<java.lang.String>> | unorderedTypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer): Returns a collection of unordered (but directed!) typed word-word dependencies for the tree. |
| static java.util.Collection<java.util.List<java.lang.String>> | unorderedUntypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer): Returns a collection of unordered (but directed!) untyped word-word dependencies for the tree. |
| static java.util.Collection<java.util.List<java.lang.String>> | untypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer): Returns a collection of untyped word-word dependencies for the tree. |
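
The encoding accessors and pw(java.io.OutputStream o) suggest that output is written through a PrintWriter that applies the configured output encoding. A minimal sketch of that idea using only java.io and java.nio.charset follows. The class EncodingAwareOutput and its UTF-8 default are assumptions for illustration (the real constructor sets up its own encodings); this is not the library's implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical sketch of encoding-aware PrintWriter creation; not the library's code.
final class EncodingAwareOutput {
  private String outputEncoding = "UTF-8"; // assumed default for this sketch

  void setOutputEncoding(String encoding) {
    this.outputEncoding = encoding;
  }

  String getOutputEncoding() {
    return outputEncoding;
  }

  /** Wraps the stream so that characters are encoded with the configured charset. */
  PrintWriter pw(OutputStream o) {
    return new PrintWriter(new OutputStreamWriter(o, Charset.forName(outputEncoding)), true);
  }

  public static void main(String[] args) {
    EncodingAwareOutput params = new EncodingAwareOutput();
    params.setOutputEncoding("UTF-16BE");
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    PrintWriter out = params.pw(bytes);
    out.print("ab");
    out.flush(); // autoflush only fires on println, so flush explicitly
    System.out.println(bytes.size()); // 4: two UTF-16BE code units of 2 bytes each
  }
}
```

Creating the writer from the current setting on each call means a later setOutputEncoding affects only writers created afterwards.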

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Other inherited methods: defaultTestSentence, treeReaderFactory
protected boolean evalGF
protected java.lang.String inputEncoding
protected java.lang.String outputEncoding
protected TreebankLanguagePack tlp
protected boolean generateOriginalDependencies
protected AbstractTreebankParserParams(TreebankLanguagePack tlp)
Parameters: tlp - The treebank language pack to use
public Label processHeadWord(Label headWord)
Description copied from interface: TreebankLangParserParams
Allows language-specific processing (e.g., stemming) of head words.
Specified by: processHeadWord in interface TreebankLangParserParams
Parameters: headWord - A Label that minimally implements the HasWord and HasTag interfaces.
Returns: the processed head word, as a Label
public void setEvaluateGrammaticalFunctions(boolean evalGFs)
Specified by: setEvaluateGrammaticalFunctions in interface TreebankLangParserParams
public void setInputEncoding(java.lang.String encoding)
Specified by: setInputEncoding in interface TreebankLangParserParams
public void setOutputEncoding(java.lang.String encoding)
Specified by: setOutputEncoding in interface TreebankLangParserParams
public java.lang.String getOutputEncoding()
Specified by: getOutputEncoding in interface TreebankLangParserParams
public java.lang.String getInputEncoding()
Specified by: getInputEncoding in interface TreebankLangParserParams
public AbstractEval ppAttachmentEval()
Specified by: ppAttachmentEval in interface TreebankLangParserParams
Returns: an AbstractEval for evaluating PP attachment
public abstract MemoryTreebank memoryTreebank()
Specified by: memoryTreebank in interface TreebankLangParserParams
public abstract DiskTreebank diskTreebank()
Specified by: diskTreebank in interface TreebankLangParserParams
public MemoryTreebank testMemoryTreebank()
Specified by: testMemoryTreebank in interface TreebankLangParserParams
public Treebank treebank()
Specified by: treebank in interface TreebankLangParserParams, treebank in interface TreebankFactory
public java.io.PrintWriter pw()
Specified by: pw in interface TreebankLangParserParams
public java.io.PrintWriter pw(java.io.OutputStream o)
Specified by: pw in interface TreebankLangParserParams
public TreebankLanguagePack treebankLanguagePack()
Specified by: treebankLanguagePack in interface TreebankLangParserParams
public abstract HeadFinder headFinder()
Specified by: headFinder in interface TreebankLangParserParams
public abstract HeadFinder typedDependencyHeadFinder()
Specified by: typedDependencyHeadFinder in interface TreebankLangParserParams
public Lexicon lex(Options op, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex)
Description copied from interface: TreebankLangParserParams
Vends a Lexicon object suitable to the particular language/treebank combination of interest.
Specified by: lex in interface TreebankLangParserParams
Parameters: op - Options as to how the Lexicon behaves
public double[] MLEDependencyGrammarSmoothingParams()
Specified by: MLEDependencyGrammarSmoothingParams in interface TreebankLangParserParams
public static java.util.Collection<Constituent> parsevalObjectify(Tree t, TreeTransformer collinizer)
Takes a Tree and a collinizer and returns a Collection of labeled Constituents for PARSEVAL.
Parameters:
t - The tree to extract constituents from
collinizer - The TreeTransformer used to normalize the tree for evaluation
public static java.util.Collection<Constituent> parsevalObjectify(Tree t, TreeTransformer collinizer, boolean labelConstituents)
Takes a Tree and a collinizer and returns a Collection of Constituents for PARSEVAL evaluation. Some notes on this particular parseval: whether the returned constituents carry labels is controlled by the labelConstituents parameter.
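
PARSEVAL compares the constituents of a guessed tree against those of a gold tree. This self-contained sketch mirrors the idea behind parsevalObjectify: each phrasal node becomes a span, labeled or not depending on a labelConstituents-style flag, and the two sets are scored with F1. The Node class, the string encoding of constituents, the exclusion of preterminals, and the use of sets rather than multisets are simplifying assumptions; no collinizer step is applied.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical PARSEVAL-style sketch; not the library's Constituent classes.
final class ParsevalSketch {
  /** A tiny tree node: leaves are words, other nodes carry a category label. */
  static final class Node {
    final String label;
    final List<Node> kids;

    Node(String label, Node... kids) {
      this.label = label;
      this.kids = Arrays.asList(kids);
    }

    boolean isLeaf() {
      return kids.isEmpty();
    }

    boolean isPreterminal() {
      return kids.size() == 1 && kids.get(0).isLeaf();
    }
  }

  /** Encodes each phrasal node as "LABEL[start,end)", or "[start,end)" when unlabeled. */
  static Set<String> objectify(Node root, boolean labelConstituents) {
    Set<String> out = new HashSet<>();
    collect(root, 0, labelConstituents, out);
    return out;
  }

  // Adds constituents under n (which starts at word index start); returns its word count.
  private static int collect(Node n, int start, boolean labeled, Set<String> out) {
    if (n.isLeaf()) {
      return 1;
    }
    int end = start;
    for (Node kid : n.kids) {
      end += collect(kid, end, labeled, out);
    }
    if (!n.isPreterminal()) {
      out.add((labeled ? n.label : "") + "[" + start + "," + end + ")");
    }
    return end - start;
  }

  /** F1 of the guessed constituent set against the gold set. */
  static double f1(Set<String> gold, Set<String> guess) {
    Set<String> matched = new HashSet<>(guess);
    matched.retainAll(gold);
    if (matched.isEmpty()) {
      return 0.0;
    }
    double precision = (double) matched.size() / guess.size();
    double recall = (double) matched.size() / gold.size();
    return 2 * precision * recall / (precision + recall);
  }

  public static void main(String[] args) {
    // Gold: (S (NP (DT the) (NN dog)) (VP (VBD saw) (NP (PRP him))))
    Node gold = new Node("S",
        new Node("NP", new Node("DT", new Node("the")), new Node("NN", new Node("dog"))),
        new Node("VP", new Node("VBD", new Node("saw")),
            new Node("NP", new Node("PRP", new Node("him")))));
    // Guess: the same words with no VP
    Node guess = new Node("S",
        new Node("NP", new Node("DT", new Node("the")), new Node("NN", new Node("dog"))),
        new Node("VBD", new Node("saw")),
        new Node("NP", new Node("PRP", new Node("him"))));
    // Precision 3/3, recall 3/4, so F1 is about 0.857
    System.out.println(f1(objectify(gold, true), objectify(guess, true)));
  }
}
```

Real PARSEVAL tooling such as evalb also applies normalization (the collinizer's job here) and counts duplicate brackets, which a plain set ignores.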
public static java.util.Collection<java.util.List<java.lang.String>> untypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
public static java.util.Collection<java.util.List<java.lang.String>> unorderedUntypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
public static java.util.Collection<java.util.List<java.lang.String>> typedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
public static java.util.Collection<java.util.List<java.lang.String>> unorderedTypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
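
The method summaries describe four word-word dependency views: untyped, typed by mother, head, and daughter categories, and "unordered (but directed!)" variants of each. The sketch is one reading of those descriptions, assuming that an ordered dependency records whether the head precedes or follows the dependent, and that an unordered one drops that marker while still listing the head before the dependent. The Arc class and the list layouts are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of four dependency views; not the library's DependencyTyper classes.
final class DependencyViews {
  /** One head-to-dependent arc with word positions and node categories. */
  static final class Arc {
    final String head;
    final int headPos;
    final String dep;
    final int depPos;
    final String mother;  // category of the node where head and dependent meet
    final String headCat; // category of the head daughter
    final String depCat;  // category of the dependent daughter

    Arc(String head, int headPos, String dep, int depPos,
        String mother, String headCat, String depCat) {
      this.head = head;
      this.headPos = headPos;
      this.dep = dep;
      this.depPos = depPos;
      this.mother = mother;
      this.headCat = headCat;
      this.depCat = depCat;
    }

    /** Assumed order marker: whether the head is left or right of the dependent. */
    String direction() {
      return headPos < depPos ? "left-headed" : "right-headed";
    }
  }

  static List<String> untyped(Arc a) {
    return Arrays.asList(a.head, a.dep, a.direction());
  }

  static List<String> unorderedUntyped(Arc a) {
    return Arrays.asList(a.head, a.dep); // still directed: head listed first
  }

  static List<String> typed(Arc a) {
    return Arrays.asList(a.mother, a.headCat, a.depCat, a.head, a.dep, a.direction());
  }

  static List<String> unorderedTyped(Arc a) {
    return Arrays.asList(a.mother, a.headCat, a.depCat, a.head, a.dep);
  }

  public static void main(String[] args) {
    // "the dog saw him": saw (position 2) heads dog (position 1)
    Arc arc = new Arc("saw", 2, "dog", 1, "S", "VP", "NP");
    System.out.println(untyped(arc));          // [saw, dog, right-headed]
    System.out.println(unorderedUntyped(arc)); // [saw, dog]
    System.out.println(typed(arc));            // [S, VP, NP, saw, dog, right-headed]
  }
}
```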
public static <E> java.util.Collection<E> dependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer, DependencyTyper<E> typer)
Returns the set of dependencies in a tree, according to some DependencyTyper.
public static EquivalenceClasser<java.util.List<java.lang.String>,java.lang.String> typedDependencyClasser()
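
typedDependencyClasser() is described as grouping typed dependencies by the categories of mother, head and daughter, plus direction. This sketch shows that grouping with java.util.function.Function standing in for EquivalenceClasser and an assumed list layout of [mother, headCategory, dependentCategory, headWord, dependentWord, direction]; the real classer's layout and key format may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch; java.util.function.Function stands in for EquivalenceClasser.
// Assumed layout: [mother, headCategory, dependentCategory, headWord, dependentWord, direction].
final class TypedDependencyClasses {
  /** Class key built from the three categories plus direction; the words are ignored. */
  static final Function<List<String>, String> CLASSER =
      d -> d.get(0) + "(" + d.get(1) + "," + d.get(2) + "," + d.get(5) + ")";

  /** Counts dependencies per equivalence class, with keys in sorted order. */
  static Map<String, Integer> countByClass(List<List<String>> deps) {
    Map<String, Integer> counts = new TreeMap<>();
    for (List<String> d : deps) {
      counts.merge(CLASSER.apply(d), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<List<String>> deps = Arrays.asList(
        Arrays.asList("VP", "VBD", "NP", "saw", "him", "left-headed"),
        Arrays.asList("VP", "VBD", "NP", "ate", "soup", "left-headed"),
        Arrays.asList("S", "VP", "NP", "saw", "dog", "right-headed"));
    // Prints {S(VP,NP,right-headed)=1, VP(VBD,NP,left-headed)=2}
    System.out.println(countByClass(deps));
  }
}
```

Because the words are left out of the key, "saw him" and "ate soup" fall into the same class, which is what makes per-class evaluation breakdowns possible.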
public abstract TreeTransformer collinizer()
Specified by: collinizer in interface TreebankLangParserParams
public abstract TreeTransformer collinizerEvalb()
Specified by: collinizerEvalb in interface TreebankLangParserParams
public abstract java.lang.String[] sisterSplitters()
Specified by: sisterSplitters in interface TreebankLangParserParams
public TreeTransformer subcategoryStripper()
Specified by: subcategoryStripper in interface TreebankLangParserParams
public abstract Tree transformTree(Tree t, Tree root)
This method does language-specific tree transformations such as annotating particular nodes with language-relevant features. It can modify the input tree t, changing both labels and the tree shape.
Specified by: transformTree in interface TreebankLangParserParams
Parameters:
t - The input tree (with non-language-specific annotation already done, so you need to strip back to basic categories)
root - The root of the current tree (can be null for words)
public abstract void display()
Specified by: display in interface TreebankLangParserParams
public int setOptionFlag(java.lang.String[] args, int i)
Sets language-specific options according to flags. Generic options are processed separately by Options.setOption(String[],int), and implementations of this method do not have to worry about them. The Options class handles routing options. TreebankParserParams classes that extend this class should call super when overriding this method.
Specified by: setOptionFlag in interface TreebankLangParserParams
Parameters:
args - Array of command line arguments
i - Index in command line arguments to try to process as an option
public TokenizerFactory<Tree> treeTokenizerFactory()
Specified by: treeTokenizerFactory in interface TreebankLangParserParams
public Extractor<DependencyGrammar> dependencyGrammarExtractor(Options op, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex)
Specified by: dependencyGrammarExtractor in interface TreebankLangParserParams
public boolean isEvalGF()
public void setEvalGF(boolean evalGF)
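
The setOptionFlag documentation asks subclasses to call super when overriding. This sketch shows that chaining under an assumption this page does not state: the method returns the index just past the arguments it consumed, and returns i unchanged for a flag it does not recognize. BaseOptions, LanguageOptions, and both flag names are hypothetical.

```java
// Hypothetical base class standing in for AbstractTreebankParserParams.
class BaseOptions {
  boolean evalGF;

  // Assumed contract: return the index after the consumed arguments, or i if args[i] is unknown.
  int setOptionFlag(String[] args, int i) {
    if (args[i].equals("-exampleEvalGF") && i + 1 < args.length) {
      evalGF = Boolean.parseBoolean(args[i + 1]);
      return i + 2; // consumed the flag and its value
    }
    return i;
  }
}

// Hypothetical language-specific subclass that handles its own flag and defers to super.
class LanguageOptions extends BaseOptions {
  boolean splitPunctuation;

  @Override
  int setOptionFlag(String[] args, int i) {
    if (args[i].equals("-exampleSplitPunct")) {
      splitPunctuation = true;
      return i + 1; // consumed only the flag
    }
    return super.setOptionFlag(args, i);
  }

  // Applies setOptionFlag repeatedly; stops at the first flag nobody recognizes.
  int parseAll(String[] args) {
    int i = 0;
    while (i < args.length) {
      int next = setOptionFlag(args, i);
      if (next == i) {
        break;
      }
      i = next;
    }
    return i;
  }

  public static void main(String[] args) {
    LanguageOptions opts = new LanguageOptions();
    int stop = opts.parseAll(new String[] {"-exampleSplitPunct", "-exampleEvalGF", "true"});
    System.out.println(stop + " " + opts.splitPunctuation + " " + opts.evalGF); // 3 true true
  }
}
```

Deferring to super last lets the subclass override the meaning of a shared flag while still inheriting every flag it does not handle itself.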
public java.util.List<GrammaticalStructure> readGrammaticalStructureFromFile(java.lang.String filename)
Description copied from interface: TreebankLangParserParams
Reads the given file and returns its content as a list of GrammaticalStructures.
Specified by: readGrammaticalStructureFromFile in interface TreebankLangParserParams
public GrammaticalStructure getGrammaticalStructure(Tree t, java.util.function.Predicate<java.lang.String> filter, HeadFinder hf)
Description copied from interface: TreebankLangParserParams
Builds a GrammaticalStructure from a Tree.
Specified by: getGrammaticalStructure in interface TreebankLangParserParams
public boolean supportsBasicDependencies()
Specified by: supportsBasicDependencies in interface TreebankLangParserParams
public void setGenerateOriginalDependencies(boolean originalDependencies)
Specified by: setGenerateOriginalDependencies in interface TreebankLangParserParams
public boolean generateOriginalDependencies()
Specified by: generateOriginalDependencies in interface TreebankLangParserParams
public java.lang.String[] defaultCoreNLPFlags()
Description copied from interface: TreebankLangParserParams
Which flags should be used by default when run inside StanfordCoreNLP.
Specified by: defaultCoreNLPFlags in interface TreebankLangParserParams