public interface Lexicon
extends java.io.Serializable
| Modifier and Type | Field and Description | 
|---|---|
| static java.lang.String | BOUNDARY | 
| static java.lang.String | BOUNDARY_TAG | 
| static java.lang.String | UNKNOWN_WORD | 
| Modifier and Type | Method | Description |
|---|---|---|
| void | finishTraining() | Done collecting statistics for the lexicon. |
| UnknownWordModel | getUnknownWordModel() | |
| void | incrementTreesRead(double weight) | If training on a per-word basis instead of on a per-tree basis, we will want to increment the tree count as this happens. |
| void | initializeTraining(double numTrees) | Start training this lexicon on the expected number of trees. |
| boolean | isKnown(int word) | Checks whether a word is in the lexicon. |
| boolean | isKnown(java.lang.String word) | Checks whether a word is in the lexicon. |
| int | numRules() | Returns the number of rules (tag rewrites as word) in the Lexicon. |
| void | readData(java.io.BufferedReader in) | Read the lexicon from the BufferedReader in the format written by writeData. |
| java.util.Iterator<IntTaggedWord> | ruleIteratorByWord(int word, int loc, java.lang.String featureSpec) | Get an iterator over all rules (pairs of (word, POS)) for this word. |
| java.util.Iterator<IntTaggedWord> | ruleIteratorByWord(java.lang.String word, int loc, java.lang.String featureSpec) | Same thing, but with a string that needs to be translated by the lexicon's word index. |
| float | score(IntTaggedWord iTW, int loc, java.lang.String word, java.lang.String featureSpec) | Get the score of this word with this tag (as an IntTaggedWord) at this loc. |
| void | setUnknownWordModel(UnknownWordModel uwm) | |
| java.util.Set<java.lang.String> | tagSet(java.util.function.Function<java.lang.String,java.lang.String> basicCategoryFunction) | Return the Set of tags used by this tagger (available after training the tagger). |
| void | train(java.util.Collection<Tree> trees) | Trains this lexicon on the Collection of trees. |
| void | train(java.util.Collection<Tree> trees, java.util.Collection<Tree> rawTrees) | |
| void | train(java.util.Collection<Tree> trees, double weight) | |
| void | train(java.util.List<TaggedWord> sentence, double weight) | Not all subclasses support this particular method. |
| void | train(TaggedWord tw, int loc, double weight) | Not all subclasses support this particular method. |
| void | train(Tree tree, double weight) | |
| void | trainUnannotated(java.util.List<TaggedWord> sentence, double weight) | Sometimes we might have a sentence of tagged words which we would like to add to the lexicon, but they weren't part of a binarized, markovized, or otherwise annotated tree. |
| void | writeData(java.io.Writer w) | Write the lexicon in human-readable format to the Writer. |
static final java.lang.String UNKNOWN_WORD
static final java.lang.String BOUNDARY
static final java.lang.String BOUNDARY_TAG
boolean isKnown(int word)
word - The word as an int
boolean isKnown(java.lang.String word)
word - The word as a String
java.util.Set<java.lang.String> tagSet(java.util.function.Function<java.lang.String,java.lang.String> basicCategoryFunction)
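The tagSet signature takes a String-to-String function named basicCategoryFunction, but this page does not describe what the function does. The sketch below assumes it maps an annotated tag to its basic category before the tags are collected into a set. The class TagSetSketch, the helper stripAnnotation, and the '^' annotation scheme are all hypothetical and are not part of the Lexicon API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical helper class; not part of the Lexicon API.
class TagSetSketch {

    // Assumed annotation scheme for this sketch only: everything from the
    // first '^' onward is extra annotation, so "NN^NP" reduces to "NN".
    static String stripAnnotation(String tag) {
        int cut = tag.indexOf('^');
        return cut < 0 ? tag : tag.substring(0, cut);
    }

    // Applies basicCategoryFunction to every raw tag and collects the
    // distinct results in sorted order.
    static Set<String> tagSet(List<String> rawTags,
                              Function<String, String> basicCategoryFunction) {
        Set<String> tags = new TreeSet<>();
        for (String tag : rawTags) {
            tags.add(basicCategoryFunction.apply(tag));
        }
        return tags;
    }

    public static void main(String[] args) {
        List<String> rawTags = Arrays.asList("NN^NP", "NN^VP", "DT", "VB^S");
        // Collapsed to basic categories.
        System.out.println(tagSet(rawTags, TagSetSketch::stripAnnotation));
        // Identity function: the raw tags are kept as they are.
        System.out.println(tagSet(rawTags, Function.identity()));
    }
}
```

Passing Function.identity() leaves the raw tags untouched, which is one way a caller could ask for the unreduced set under this assumption.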
java.util.Iterator<IntTaggedWord> ruleIteratorByWord(int word, int loc, java.lang.String featureSpec)
word - The word, represented as an integer in Index
loc - The position of the word in the sentence (counting from 0). Implementation note: The BaseLexicon class doesn't actually make use of this position information.
featureSpec - Additional word features like morphosyntactic information.
tag -> word rule.)
java.util.Iterator<IntTaggedWord> ruleIteratorByWord(java.lang.String word, int loc, java.lang.String featureSpec)
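Both isKnown and ruleIteratorByWord come in an int form and a String form, where the String form is described as being translated by the lexicon's word index. The following is a minimal sketch of that delegation pattern, not the library's implementation. ToyIndex, ToyIntTaggedWord, OverloadSketch, and addRule are hypothetical stand-ins, and the sketch returns an empty iterator for unseen words, whereas a real lexicon may consult its unknown word model instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a word or tag index: gives each string a stable int id.
class ToyIndex {
    private final Map<String, Integer> ids = new HashMap<>();

    // Returns the id of s, assigning the next free id on first sight.
    int addToIndex(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            id = ids.size();
            ids.put(s, id);
        }
        return id;
    }

    // Returns the id of s, or -1 if s was never indexed.
    int indexOf(String s) {
        Integer id = ids.get(s);
        return id == null ? -1 : id;
    }
}

// Hypothetical stand-in for IntTaggedWord: a (word id, tag id) pair.
class ToyIntTaggedWord {
    final int word;
    final int tag;

    ToyIntTaggedWord(int word, int tag) {
        this.word = word;
        this.tag = tag;
    }
}

class OverloadSketch {
    private final ToyIndex wordIndex = new ToyIndex();
    private final ToyIndex tagIndex = new ToyIndex();
    private final Map<Integer, List<ToyIntTaggedWord>> rulesByWord = new HashMap<>();

    // Hypothetical helper that records one tag -> word rule.
    void addRule(String word, String tag) {
        int w = wordIndex.addToIndex(word);
        int t = tagIndex.addToIndex(tag);
        rulesByWord.computeIfAbsent(w, k -> new ArrayList<>()).add(new ToyIntTaggedWord(w, t));
    }

    boolean isKnown(int word) {
        return rulesByWord.containsKey(word);
    }

    // String form: translate through the word index, then reuse the int form.
    boolean isKnown(String word) {
        return isKnown(wordIndex.indexOf(word));
    }

    // loc and featureSpec are accepted but ignored in this sketch.
    Iterator<ToyIntTaggedWord> ruleIteratorByWord(int word, int loc, String featureSpec) {
        return rulesByWord.getOrDefault(word, Collections.emptyList()).iterator();
    }

    // String form: translate through the word index, then reuse the int form.
    Iterator<ToyIntTaggedWord> ruleIteratorByWord(String word, int loc, String featureSpec) {
        return ruleIteratorByWord(wordIndex.indexOf(word), loc, featureSpec);
    }

    public static void main(String[] args) {
        OverloadSketch lex = new OverloadSketch();
        lex.addRule("dog", "NN");
        lex.addRule("dog", "VB");
        System.out.println("dog known: " + lex.isKnown("dog"));
        System.out.println("cat known: " + lex.isKnown("cat"));
        int count = 0;
        for (Iterator<ToyIntTaggedWord> it = lex.ruleIteratorByWord("dog", 0, null); it.hasNext(); it.next()) {
            count++;
        }
        System.out.println("rules for dog: " + count);
    }
}
```

Keeping the lookup logic in the int overloads means the String overloads stay one-line wrappers, so the two forms cannot drift apart.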
int numRules()
void initializeTraining(double numTrees)
void train(java.util.Collection<Tree> trees)
trees - Trees to train on
void train(java.util.Collection<Tree> trees, double weight)
void train(Tree tree, double weight)
void train(java.util.List<TaggedWord> sentence, double weight)
void train(TaggedWord tw, int loc, double weight)
void incrementTreesRead(double weight)
void trainUnannotated(java.util.List<TaggedWord> sentence, double weight)
void finishTraining()
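The training methods above suggest a lifecycle: initializeTraining first, then one or more train calls (with incrementTreesRead when training word by word), then finishTraining. The sketch below illustrates that call order with a toy class that only keeps weighted counts. ToyTrainableLexicon, TrainingLifecycleSketch, relativeFrequency, and the String-array sentence encoding are hypothetical. Calling incrementTreesRead once per sentence is an assumption, and relativeFrequency is a stand-in statistic, not the library's score.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy lexicon that only accumulates weighted (word, tag) counts.
class ToyTrainableLexicon {
    private final Map<String, Double> wordTagCounts = new HashMap<>();
    private final Map<String, Double> tagCounts = new HashMap<>();
    private double treesRead;
    private boolean finished;

    // Resets all statistics; numTrees is only an expected size and is unused here.
    void initializeTraining(double numTrees) {
        wordTagCounts.clear();
        tagCounts.clear();
        treesRead = 0.0;
        finished = false;
    }

    // Per-word training step; loc is accepted but ignored in this sketch.
    void train(String word, String tag, int loc, double weight) {
        wordTagCounts.merge(word + "\t" + tag, weight, Double::sum);
        tagCounts.merge(tag, weight, Double::sum);
    }

    // Called by the trainer itself when training per word rather than per tree.
    void incrementTreesRead(double weight) {
        treesRead += weight;
    }

    void finishTraining() {
        finished = true;
    }

    double treesRead() {
        return treesRead;
    }

    // Stand-in statistic: weighted count(word, tag) divided by weighted count(tag).
    double relativeFrequency(String word, String tag) {
        if (!finished) {
            throw new IllegalStateException("call finishTraining() first");
        }
        double tagTotal = tagCounts.getOrDefault(tag, 0.0);
        if (tagTotal == 0.0) {
            return 0.0;
        }
        return wordTagCounts.getOrDefault(word + "\t" + tag, 0.0) / tagTotal;
    }
}

class TrainingLifecycleSketch {
    // Each sentence is an array of {word, tag} pairs.
    static ToyTrainableLexicon trainOnSentences(String[][][] sentences, double weight) {
        ToyTrainableLexicon lex = new ToyTrainableLexicon();
        lex.initializeTraining(sentences.length);
        for (String[][] sentence : sentences) {
            for (int loc = 0; loc < sentence.length; loc++) {
                lex.train(sentence[loc][0], sentence[loc][1], loc, weight);
            }
            // Assumption: one sentence corresponds to one tree.
            lex.incrementTreesRead(weight);
        }
        lex.finishTraining();
        return lex;
    }

    public static void main(String[] args) {
        String[][][] sentences = {
            {{"the", "DT"}, {"dog", "NN"}},
            {{"the", "DT"}, {"cat", "NN"}}
        };
        ToyTrainableLexicon lex = trainOnSentences(sentences, 1.0);
        System.out.println("trees read: " + lex.treesRead());
        System.out.println("dog given NN: " + lex.relativeFrequency("dog", "NN"));
    }
}
```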
float score(IntTaggedWord iTW, int loc, java.lang.String word, java.lang.String featureSpec)
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial.
word - The word itself; useful so we don't have to look it up in an index
featureSpec - TODO
void writeData(java.io.Writer w)
        throws java.io.IOException
w - The writer to output to
java.io.IOException - If any I/O problem
void readData(java.io.BufferedReader in)
       throws java.io.IOException
in - The BufferedReader to read from
java.io.IOException - If any I/O problem
UnknownWordModel getUnknownWordModel()
void setUnknownWordModel(UnknownWordModel uwm)
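The writeData and readData methods above are documented as a matched pair: readData expects the format that writeData produces. This page does not specify that format, so the sketch below uses an invented tab-separated layout (tag, word, count per line) purely to illustrate a round trip through a Writer and a BufferedReader. ToyPersistentLexicon, RoundTripSketch, add, and count are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy lexicon with an invented "tag<TAB>word<TAB>count" text format.
class ToyPersistentLexicon {
    // Keyed by "tag<TAB>word"; TreeMap keeps the written output in a stable order.
    private final Map<String, Double> counts = new TreeMap<>();

    void add(String tag, String word, double count) {
        counts.merge(tag + "\t" + word, count, Double::sum);
    }

    double count(String tag, String word) {
        return counts.getOrDefault(tag + "\t" + word, 0.0);
    }

    int numRules() {
        return counts.size();
    }

    // Writes one rule per line in the invented format.
    void writeData(Writer w) throws IOException {
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            w.write(e.getKey() + "\t" + e.getValue() + "\n");
        }
        w.flush();
    }

    // Reads lines in the format produced by writeData above.
    void readData(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t");
            add(fields[0], fields[1], Double.parseDouble(fields[2]));
        }
    }
}

class RoundTripSketch {
    // Serializes to an in-memory buffer and loads the text into a fresh lexicon.
    static ToyPersistentLexicon roundTrip(ToyPersistentLexicon original) {
        try {
            StringWriter buffer = new StringWriter();
            original.writeData(buffer);
            ToyPersistentLexicon copy = new ToyPersistentLexicon();
            copy.readData(new BufferedReader(new StringReader(buffer.toString())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ToyPersistentLexicon lex = new ToyPersistentLexicon();
        lex.add("NN", "dog", 2.0);
        lex.add("VB", "dog", 1.0);
        ToyPersistentLexicon copy = roundTrip(lex);
        System.out.println("rules after round trip: " + copy.numRules());
        System.out.println("count(NN, dog): " + copy.count("NN", "dog"));
    }
}
```

In practice the same pattern would apply with a FileWriter and a file-backed BufferedReader in place of the in-memory buffers used here.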