public class ChineseTreebankParserParams extends AbstractTreebankParserParams
Nested classes inherited from class AbstractTreebankParserParams: AbstractTreebankParserParams.AnnotatePunctuationFunction, AbstractTreebankParserParams.RemoveGFSubcategoryStripper, AbstractTreebankParserParams.SubcategoryStripper

| Modifier and Type | Field and Description |
|---|---|
| boolean | bikelHeadFinder |
| boolean | charTags |
| boolean | chineseSelectiveTagPA |
| boolean | chineseSplitDouHao - Chinese: split the dou hao (a punctuation mark separating members of a list) from other punctuation. |
| boolean | chineseSplitPunct - Chinese: split Chinese punctuation several ways, along the lines of English punctuation plus another category for the dou hao. |
| boolean | chineseSplitPunctLR - Chinese: split left and right parens/quotes (if chineseSplitPunct is also true). |
| int | chineseSplitVP - Chinese VP splitting. |
| boolean | chineseVerySelectiveTagPA |
| static boolean | DEFAULT_USE_GOOD_TURNING_UNKNOWN_WORD_MODEL - Parameters specific for creating a ChineseLexicon |
| boolean | discardFrags |
| boolean | dominatesV - Verbal distance: mark whether the symbol dominates a verb (V*). |
| boolean | gpaAD - Grandparent annotate all AD. |
| double | lengthPenalty - Parameters for a ChineseCharacterBasedLexicon |
| boolean | markADgrandchildOfIP - Chinese: mark ADs that are grandchildren of IP. |
| boolean | markCC - Mark phrases which are conjunctions. |
| boolean | markIPadjsubj |
| boolean | markIPconj - Chinese: mark IPs that are conjuncts. |
| boolean | markIPsisDEC - Chinese: mark IPs that are part of prenominal modifiers. |
| boolean | markIPsisterBA - Chinese: mark IPs that are sisters of BA. |
| boolean | markIPsisterVVorP - Chinese: mark IPs that are sisters of VV or P. |
| boolean | markModifiedNP - Chinese: mark left-modified NPs (rightmost NPs with a left-side mod). |
| boolean | markMultiNtag - Chinese: mark nominal tags that are part of multi-nominal rewrites. |
| boolean | markNPconj - Chinese: mark NPs that are conjuncts. |
| boolean | markNPmodNP - Chinese: mark NP modifiers of NPs. |
| boolean | markPostverbalP - Chinese: mark P with a left aunt VV, and PP with a left sister VV. |
| boolean | markPostverbalPP |
| boolean | markPsisterIP - Chinese: mark Ps that are sisters of IP. |
| boolean | markVPadjunct - Chinese: mark phrases that are adjuncts of VP (these tend to be locatives/temporals, and have a specific distribution). |
| boolean | markVVsisterIP - Chinese: mark VVs that are sisters of IP (communication and small-clause-taking verbs). |
| boolean | mergeNNVV - Chinese: merge NN and VV. |
| boolean | paRootDtr - Chinese: parent annotate daughter of root. |
| int | penaltyType - penaltyType should be set as follows: 0 = no length penalty; 1 = quadratic length penalty; 2 = penalty for continuation chars only. TODO: make this an enum |
| boolean | segment |
| java.lang.String | segmenterClass |
| boolean | segmentMarkov |
| boolean | splitBaseNP - Mark base NPs. |
| boolean | splitNPTMP - Whether to retain the -TMP functional tag on various phrasal categories. |
| boolean | splitPPTMP |
| boolean | splitXPTMP |
| boolean | sunJurafskyHeadFinder |
| boolean | tagWordSize - Annotate tags for number of characters contained. |
| boolean | unaryCP |
| boolean | unaryIP - Chinese: unary category marking |
| boolean | useCharacterBasedLexicon |
| boolean | useCharBasedUnknownWordModel |
| boolean | useGoodTuringUnknownWordModel |
| boolean | useMaxentDepGrammar |
| boolean | useMaxentLexicon |
| boolean | useSimilarWordMap |
| boolean | useUnknownCharacterModel |

Fields inherited from class AbstractTreebankParserParams: evalGF, generateOriginalDependencies, inputEncoding, outputEncoding, tlp

| Constructor and Description |
|---|
| ChineseTreebankParserParams() |

| Modifier and Type | Method and Description |
|---|---|
| TreeTransformer | collinizer() - Returns a ChineseCollinizer |
| TreeTransformer | collinizerEvalb() - Returns a ChineseCollinizer that doesn't delete punctuation |
| java.util.ArrayList<Word> | defaultTestSentence() - Return a default sentence for the language (for testing) |
| Extractor<DependencyGrammar> | dependencyGrammarExtractor(Options op, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex) |
| DiskTreebank | diskTreebank() - Uses a DiskTreebank with a CHTBTokenizer and a BobChrisTreeNormalizer. |
| void | display() - Display language-specific settings |
| GrammaticalStructure | getGrammaticalStructure(Tree t, java.util.function.Predicate<java.lang.String> filter, HeadFinder hf) - Build a GrammaticalStructure from a Tree. |
| HeadFinder | headFinder() - Returns a ChineseHeadFinder |
| Lexicon | lex(Options op, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex) - Returns a ChineseLexicon |
| static void | main(java.lang.String[] args) - For testing: loads a treebank and prints the trees. |
| MemoryTreebank | memoryTreebank() - Uses a MemoryTreebank with a CHTBTokenizer and a BobChrisTreeNormalizer |
| double[] | MLEDependencyGrammarSmoothingParams() - Give the parameters for smoothing in the MLEDependencyGrammar. |
| java.util.List<GrammaticalStructure> | readGrammaticalStructureFromFile(java.lang.String filename) - Reads the given file and turns its content into a list of GrammaticalStructures. |
| int | setOptionFlag(java.lang.String[] args, int i) - Set language-specific options according to flags. |
| java.lang.String[] | sisterSplitters() - Returns the splitting strings used for selective splits. |
| boolean | supportsBasicDependencies() - By default, parsers are assumed to not support dependencies. |
| Tree | transformTree(Tree t, Tree root) - transformTree does all language-specific tree transformations. |
| TreeReaderFactory | treeReaderFactory() - Returns a factory for reading in trees from the source you want. |
| HeadFinder | typedDependencyHeadFinder() - The HeadFinder to use when extracting typed dependencies. |

Methods inherited from class AbstractTreebankParserParams: defaultCoreNLPFlags, dependencyObjectify, generateOriginalDependencies, getInputEncoding, getOutputEncoding, isEvalGF, parsevalObjectify, parsevalObjectify, ppAttachmentEval, processHeadWord, pw, pw, setEvalGF, setEvaluateGrammaticalFunctions, setGenerateOriginalDependencies, setInputEncoding, setOutputEncoding, subcategoryStripper, testMemoryTreebank, treebank, treebankLanguagePack, treeTokenizerFactory, typedDependencyClasser, typedDependencyObjectify, unorderedTypedDependencyObjectify, unorderedUntypedDependencyObjectify, untypedDependencyObjectify

public boolean charTags
public boolean useCharacterBasedLexicon
public boolean useMaxentLexicon
public boolean useMaxentDepGrammar
public boolean segment
public boolean segmentMarkov
public boolean sunJurafskyHeadFinder
public boolean bikelHeadFinder
public boolean discardFrags
public boolean useSimilarWordMap
public java.lang.String segmenterClass
public boolean chineseSplitDouHao
public boolean chineseSplitPunct
public boolean chineseSplitPunctLR
public boolean markVVsisterIP
public boolean markPsisterIP
public boolean markIPsisterVVorP
public boolean markADgrandchildOfIP
public boolean gpaAD
public boolean chineseVerySelectiveTagPA
public boolean chineseSelectiveTagPA
public boolean markIPsisterBA
public boolean markVPadjunct
public boolean markNPmodNP
public boolean markModifiedNP
public boolean markNPconj
public boolean markMultiNtag
public boolean markIPsisDEC
public boolean markIPconj
public boolean markIPadjsubj
public int chineseSplitVP
public boolean mergeNNVV
public boolean unaryIP
public boolean unaryCP
public boolean paRootDtr
public boolean markPostverbalP
public boolean markPostverbalPP
public boolean splitBaseNP
public boolean tagWordSize
public boolean markCC
public boolean splitNPTMP
public boolean splitPPTMP
public boolean splitXPTMP
public boolean dominatesV
public static final boolean DEFAULT_USE_GOOD_TURNING_UNKNOWN_WORD_MODEL
public boolean useGoodTuringUnknownWordModel
public boolean useCharBasedUnknownWordModel
public double lengthPenalty
public boolean useUnknownCharacterModel
public int penaltyType
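
The field summary documents penaltyType as an integer code (0, 1, or 2) and carries a TODO to make it an enum. A minimal sketch of what such an enum could look like is given here. PenaltyType, its constant names, and fromCode are hypothetical and not part of the library; only the three codes and their meanings come from the field description.

```java
/** Hypothetical enum mirroring the documented penaltyType codes (not part of the library). */
enum PenaltyType {
    NONE(0),               // 0: no length penalty
    QUADRATIC(1),          // 1: quadratic length penalty
    CONTINUATION_ONLY(2);  // 2: penalty for continuation chars only

    private final int code;

    PenaltyType(int code) {
        this.code = code;
    }

    /** The integer value that the public int penaltyType field would hold. */
    int code() {
        return code;
    }

    /** Maps a raw penaltyType value to a constant, rejecting undocumented codes. */
    static PenaltyType fromCode(int code) {
        for (PenaltyType t : values()) {
            if (t.code == code) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown penaltyType: " + code);
    }
}
```

With such a type, calling code could write PenaltyType.fromCode(params.penaltyType) once and switch on the constant instead of comparing against bare integers.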
public HeadFinder headFinder()
headFinder in interface TreebankLangParserParams
headFinder in class AbstractTreebankParserParams

public HeadFinder typedDependencyHeadFinder()
Description copied from class: AbstractTreebankParserParams
typedDependencyHeadFinder in interface TreebankLangParserParams
typedDependencyHeadFinder in class AbstractTreebankParserParams

public Lexicon lex(Options op, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex)
lex in interface TreebankLangParserParams
lex in class AbstractTreebankParserParams
op - Options as to how the Lexicon behaves

public double[] MLEDependencyGrammarSmoothingParams()
Description copied from class: AbstractTreebankParserParams
MLEDependencyGrammarSmoothingParams in interface TreebankLangParserParams
MLEDependencyGrammarSmoothingParams in class AbstractTreebankParserParams

public TreeReaderFactory treeReaderFactory()
Description copied from interface: TreebankLangParserParams

public DiskTreebank diskTreebank()
diskTreebank in interface TreebankLangParserParams
diskTreebank in class AbstractTreebankParserParams

public MemoryTreebank memoryTreebank()
memoryTreebank in interface TreebankLangParserParams
memoryTreebank in class AbstractTreebankParserParams

public TreeTransformer collinizer()
collinizer in interface TreebankLangParserParams
collinizer in class AbstractTreebankParserParams

public TreeTransformer collinizerEvalb()
collinizerEvalb in interface TreebankLangParserParams
collinizerEvalb in class AbstractTreebankParserParams

public java.lang.String[] sisterSplitters()
Description copied from class: AbstractTreebankParserParams
sisterSplitters in interface TreebankLangParserParams
sisterSplitters in class AbstractTreebankParserParams

public Tree transformTree(Tree t, Tree root)
transformTree in interface TreebankLangParserParams
transformTree in class AbstractTreebankParserParams
t - The input tree (with non-language specific annotation already done, so you need to strip back to basic categories)
root - The root of the current tree (can be null for words)

public void display()
Description copied from class: AbstractTreebankParserParams
display in interface TreebankLangParserParams
display in class AbstractTreebankParserParams

public int setOptionFlag(java.lang.String[] args, int i)
setOptionFlag in interface TreebankLangParserParams
setOptionFlag in class AbstractTreebankParserParams
args - Array of command line arguments
i - Index in command line arguments to try to process as an option

public Extractor<DependencyGrammar> dependencyGrammarExtractor(Options op, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex)
dependencyGrammarExtractor in interface TreebankLangParserParams
dependencyGrammarExtractor in class AbstractTreebankParserParams

public java.util.ArrayList<Word> defaultTestSentence()

public java.util.List<GrammaticalStructure> readGrammaticalStructureFromFile(java.lang.String filename)
Description copied from interface: TreebankLangParserParams
readGrammaticalStructureFromFile in interface TreebankLangParserParams
readGrammaticalStructureFromFile in class AbstractTreebankParserParams

public GrammaticalStructure getGrammaticalStructure(Tree t, java.util.function.Predicate<java.lang.String> filter, HeadFinder hf)
Description copied from interface: TreebankLangParserParams
getGrammaticalStructure in interface TreebankLangParserParams
getGrammaticalStructure in class AbstractTreebankParserParams

public boolean supportsBasicDependencies()
Description copied from class: AbstractTreebankParserParams
supportsBasicDependencies in interface TreebankLangParserParams
supportsBasicDependencies in class AbstractTreebankParserParams

public static void main(java.lang.String[] args)
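
The setOptionFlag(args, i) method documented above takes the full argument array plus an index and returns an int. One common way to implement a contract of this shape is to return the index of the first argument that was not consumed, so that an unchanged index signals an unrecognized flag. The sketch below illustrates that pattern with a stand-in class. FlagDemo, the flag spellings, and the return-value convention are assumptions for illustration and should be checked against the library before relying on them; only the two field names are borrowed from the field list.

```java
/** Stand-in for a params class; this is not the real ChineseTreebankParserParams. */
class FlagDemo {
    boolean chineseSplitPunct = false; // field name borrowed from the field list
    int chineseSplitVP = 0;            // field name borrowed from the field list

    /**
     * Tries to consume the option at args[i].
     * Returns the index of the next unprocessed argument, or i unchanged
     * if args[i] is not recognized (assumed convention, see lead-in).
     */
    int setOptionFlag(String[] args, int i) {
        if (args[i].equalsIgnoreCase("-chineseSplitPunct")) {
            chineseSplitPunct = true;
            return i + 1; // consumed the flag only
        } else if (args[i].equalsIgnoreCase("-chineseSplitVP") && i + 1 < args.length) {
            chineseSplitVP = Integer.parseInt(args[i + 1]);
            return i + 2; // consumed the flag plus one value
        }
        return i; // nothing consumed
    }

    /** Walks the whole argument array, skipping anything unrecognized. */
    static FlagDemo parse(String[] args) {
        FlagDemo p = new FlagDemo();
        int i = 0;
        while (i < args.length) {
            int next = p.setOptionFlag(args, i);
            i = (next == i) ? i + 1 : next; // step over unknown arguments
        }
        return p;
    }
}
```

Passing the index in and out lets a generic driver offer each argument to the language-specific parameters first and fall back to its own options when the index comes back unchanged.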