| Interface | Description |
|---|---|
| Annotator | This is an interface for adding annotations to a partially annotated Annotation. |
| Class | Description |
|---|---|
| Annotation | An annotation representing a span of text in a document. |
| Annotator.Requirement | The Requirement is a general way of describing the pre- and post-conditions of an Annotator running. |
| ChunkAnnotationUtils | Utility functions for annotating chunks. |
| CoreMapAggregator | Function that aggregates several core maps into one. |
| CoreMapAttributeAggregator | Functions for aggregating token attributes. |
| CoreMapAttributeAggregator.ConcatAggregator | |
| CoreMapAttributeAggregator.ConcatCoreMapListAggregator<T extends CoreMap> | |
| CoreMapAttributeAggregator.ConcatListAggregator<T> | |
| CoreMapAttributeAggregator.MostFreqAggregator | |
| DefaultPaths | Default model paths for StanfordCoreNLP. All these paths point to files distributed with the model jar file (stanford-corenlp-models-*.jar). |
| LabeledChunkIdentifier | Identifies chunks based on labels that use an IOB-like encoding. Assumes labels have the form |
| LabeledChunkIdentifier.LabelTagType | Class representing a label, tag and type. |
TextAnnotation.class). They should also specify what they add
to the annotation, and where.
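As a rough illustration of that contract (not code from this distribution), the sketch below implements a trivial custom Annotator. It assumes this release's Annotator interface, with annotate(), requires(), and requirementsSatisfied(), and that stock requirement constants such as Annotator.TOKENIZE_REQUIREMENT and Annotator.LEMMA_REQUIREMENT are available; the lower-cased "lemma" it writes is purely illustrative.

```java
import java.util.Collections;
import java.util.Set;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.Annotator;

/**
 * Illustrative annotator: requires tokens, and fills in a naive "lemma"
 * (the lower-cased token text) wherever no lemma is present yet.
 */
public class NaiveLemmaAnnotator implements Annotator {

  @Override
  public void annotate(Annotation annotation) {
    // reads TokensAnnotation (its stated requirement) and writes LemmaAnnotation
    for (CoreLabel token : annotation.get(CoreAnnotations.TokensAnnotation.class)) {
      if (token.get(CoreAnnotations.LemmaAnnotation.class) == null) {
        token.set(CoreAnnotations.LemmaAnnotation.class, token.word().toLowerCase());
      }
    }
  }

  @Override
  public Set<Requirement> requires() {
    // what must already be in the Annotation before this annotator runs
    // (assumes the stock TOKENIZE_REQUIREMENT constant of this release)
    return Collections.singleton(Annotator.TOKENIZE_REQUIREMENT);
  }

  @Override
  public Set<Requirement> requirementsSatisfied() {
    // what this annotator adds to the Annotation
    // (assumes the stock LEMMA_REQUIREMENT constant of this release)
    return Collections.singleton(Annotator.LEMMA_REQUIREMENT);
  }
}
```

The sample below assembles a full pipeline from the stock annotators and shows typical code for reading the results back out of the Annotation.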
```java
// imports needed by this snippet (the method itself belongs inside a test class)
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.AnnotationPipeline;
import edu.stanford.nlp.pipeline.MorphaAnnotator;
import edu.stanford.nlp.pipeline.NERCombinerAnnotator;
import edu.stanford.nlp.pipeline.POSTaggerAnnotator;
import edu.stanford.nlp.pipeline.ParserAnnotator;
import edu.stanford.nlp.pipeline.TokenizerAnnotator;
import edu.stanford.nlp.pipeline.WordsToSentencesAnnotator;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;

public void testPipeline(String text) throws Exception {
  // create pipeline
  AnnotationPipeline pipeline = new AnnotationPipeline();
  pipeline.addAnnotator(new TokenizerAnnotator(false, "en"));
  pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
  pipeline.addAnnotator(new POSTaggerAnnotator(false));
  pipeline.addAnnotator(new MorphaAnnotator(false));
  pipeline.addAnnotator(new NERCombinerAnnotator(false));
  pipeline.addAnnotator(new ParserAnnotator(false, -1));
  // create annotation with text
  Annotation document = new Annotation(text);
  // annotate text with pipeline
  pipeline.annotate(document);
  // demonstrate typical usage
  for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
    // get the tree for the sentence
    Tree tree = sentence.get(TreeAnnotation.class);
    // get the tokens for the sentence and iterate over them
    for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
      // get token attributes
      String tokenText = token.get(TextAnnotation.class);
      String tokenPOS = token.get(PartOfSpeechAnnotation.class);
      String tokenLemma = token.get(LemmaAnnotation.class);
      String tokenNE = token.get(NamedEntityTagAnnotation.class);
    }
  }
}
```
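Instead of assembling an AnnotationPipeline by hand, the prepackaged StanfordCoreNLP class can build an equivalent pipeline from a Properties object. A minimal sketch, where the annotator list and sample sentence are illustrative choices rather than defaults:

```java
import java.util.Properties;

import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class PipelineFromProperties {
  public static void main(String[] args) {
    // the "annotators" property plays the same role as the -props file described below
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");

    // constructs and wires up the requested annotators
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // annotate a document; it can then be traversed exactly as in the loop above
    Annotation document = new Annotation("Stanford University is located in California.");
    pipeline.annotate(document);
  }
}
```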
The pipeline can also be run from the command line, either via the provided shell script:

```
./bin/stanfordcorenlp.sh
```

or by invoking the main class directly:

```
java -cp stanford-corenlp-YYYY-MM-DD.jar:stanford-corenlp-YYYY-MM-DD-models.jar:xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP [ -props YOUR_CONFIGURATION_FILE ] -file YOUR_INPUT_FILE
```

where the following properties are defined (if -props or annotators is not defined, default properties will be loaded via the classpath):
"annotators" - comma separated list of annotators
The following annotators are supported: tokenize, ssplit, pos, lemma, ner, truecase, parse, dcoref, nfl
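For reference, the YOUR_CONFIGURATION_FILE passed to -props is an ordinary Java properties file. A hypothetical minimal configuration selecting a few of the annotators above might look like this (the selection is just an example):

```
# hypothetical configuration file passed via -props
annotators = tokenize, ssplit, pos, lemma, ner, parse
```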
More information is available on the Stanford CoreNLP website.