public class TokenSequencePattern extends SequencePattern<CoreMap>
Token sequence pattern for regular expressions over sequences of tokens (each represented as a CoreMap).
Sequences over tokens can be matched like strings.
To use:
TokenSequencePattern p = TokenSequencePattern.compile("....");
TokenSequenceMatcher m = p.getMatcher(tokens);
while (m.find()) ....
Supports the following:
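Concatenation: X Y
Or: X | Y
And: X & Y
Groups:
capturing: (X) (with numeric group id)
capturing: (?$var X) (with group name "$var")
noncapturing: (?:X)
Capturing groups can be retrieved by group id or group variable, either as the matched string (m.group()) or as the list of tokens (m.groupNodes()).
To retrieve a group by id: m.group(id) or m.groupNodes(id)
To retrieve a group by bound variable name: m.group("$var") or m.groupNodes("$var")
See SequenceMatchResult for more accessor functions to retrieve matches.
Greedy quantifiers: X+, X?, X*, X{n,m}, X{n}, X{n,}
Reluctant quantifiers: X+?, X??, X*?, X{n,m}?, X{n}?, X{n,}?
Back references: \captureid
Value binding for groups: [pattern] => [value].
Value for matched expression can be accessed using m.groupValue()
Example: ( one => 1 | two => 2 | three => 3 | ...)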
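The quantifier, group, and back-reference notation above follows the conventions of string regular expressions. As an analogy only, the sketch below uses the JDK's java.util.regex on characters rather than TokensRegex on tokens, to show the difference between a greedy and a reluctant quantifier and the effect of a back reference. The class name QuantifierAnalogy and the helper firstMatch are invented for illustration, and it is an assumption that the token-level operators behave the same way. Note also that java.util.regex writes named groups as (?<name>X), whereas the syntax documented here is (?$var X).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-JDK analogy; this is NOT the TokensRegex API.
class QuantifierAnalogy {
    // Returns the first substring of input matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "aaa";
        // Greedy: X+ consumes as many repetitions as it can.
        System.out.println(firstMatch("a+", input));   // aaa
        // Reluctant: X+? stops at the fewest repetitions that still match.
        System.out.println(firstMatch("a+?", input));  // a

        // Named group plus back reference: the same word twice in a row.
        Matcher m = Pattern.compile("(?<w>\\w+) \\1").matcher("it was was late");
        if (m.find()) {
            System.out.println(m.group("w"));  // was
        }
    }
}
```

In the token-level setting each X would be a token expression rather than a character, so a greedy X+ would extend over as many consecutive matching tokens as possible.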
Individual tokens are marked by "[" TOKEN_EXPR "]"
Possible TOKEN_EXPR:
All specified token attributes match:
For strings: { lemma:/.../; tag:"NNP" } = attributes that need to all match.
If there is only one attribute, the {} can be dropped.
See AnnotationLookup for a list of predefined token attribute names.
NOTE: /.../ is used for regular expressions, "..." for exact string matches.
For numbers: { word>=2 }
NOTE: The relation can be ">=", "<=", ">", "<", or "==".
Others: { word::IS_NUM }, { word::IS_NIL }, { word::NOT_EXISTS }, { word::NOT_NIL }, or { word::EXISTS }
Shorthand for a plain word/text match: /.../ or "..."
Negation: !{...}
Conjunction or disjunction: {...} & {...} or {...} | {...}
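To make the semantics of the token expressions above concrete, here is a toy model in plain Java. It is not the TokensRegex implementation: a token is modelled as a Map from attribute name to string value (a stand-in for a CoreMap), and each expression becomes a java.util.function.Predicate. The class TokenExprSketch and the helpers exact, regex, and atLeast are hypothetical names, and the sketch assumes that a /.../ regular expression must match the whole attribute value.

```java
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Toy model of token expressions; NOT the TokensRegex implementation.
class TokenExprSketch {
    // Exact string match, like tag:"NNP".
    static Predicate<Map<String, String>> exact(String attr, String value) {
        return tok -> value.equals(tok.get(attr));
    }

    // Regex match, like lemma:/.../ (assumed to cover the whole value).
    static Predicate<Map<String, String>> regex(String attr, String re) {
        Pattern p = Pattern.compile(re);
        return tok -> tok.get(attr) != null && p.matcher(tok.get(attr)).matches();
    }

    // Numeric comparison, like { word>=2 }; non-numeric values never match.
    static Predicate<Map<String, String>> atLeast(String attr, double bound) {
        return tok -> {
            try {
                return tok.get(attr) != null
                    && Double.parseDouble(tok.get(attr)) >= bound;
            } catch (NumberFormatException e) {
                return false;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> obama = Map.of("word", "Obama", "tag", "NNP");
        Map<String, String> three = Map.of("word", "3", "tag", "CD");

        // { word:/[A-Z].*/; tag:"NNP" } : every listed attribute must match.
        Predicate<Map<String, String>> properNoun =
            regex("word", "[A-Z].*").and(exact("tag", "NNP"));
        // !{ tag:"NNP" } : negation.
        Predicate<Map<String, String>> notProper = exact("tag", "NNP").negate();
        // { tag:"CD" } | { word>=2 } : disjunction.
        Predicate<Map<String, String>> numberish =
            exact("tag", "CD").or(atLeast("word", 2));

        System.out.println(properNoun.test(obama));  // true
        System.out.println(notProper.test(three));   // true
        System.out.println(numberish.test(obama));   // false
    }
}
```

Composing predicates with and, or, and negate mirrors the ";" (all attributes), "&", "|", and "!" operators of the pattern language.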
Special tokens:
Any token: []
String pattern match across multiple tokens:
(?m){min,max} /pattern/
Special expressions: indicated by double braces: {{ expr }}
See Expressions for syntax.
Binding of variables for use in compiling patterns:
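Use Env env = TokenSequencePattern.getNewEnv() to create a new environment for binding.
Bind a string to an attribute key (Class) lookup:
env.bind("numtype", CoreAnnotations.NumericTypeAnnotation.class);
Bind patterns or strings for use when compiling patterns: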
// Bind string for later compilation using: compile("/it/ /was/ $RELDAY");
env.bind("$RELDAY", "/today|yesterday|tomorrow|tonight|tonite/");
// Bind pre-compiled pattern for later compilation using: compile("/it/ /was/ $RELDAY");
env.bind("$RELDAY", TokenSequencePattern.compile(env, "/today|yesterday|tomorrow|tonight|tonite/"));
// Bind node pattern so we can do patterns like: compile("... temporal::IS_TIMEX_DATE ...");
// (TimexTypeMatchNodePattern is a NodePattern that implements some custom logic)
env.bind("::IS_TIMEX_DATE", new TimexTypeMatchNodePattern(SUTime.TimexType.DATE));
Actions (partially implemented)
pattern ==> action
Supported action: &annotate( { ner="DATE" } )
To apply, call pattern.getAction().apply(match, groupid)
See also: TokenSequenceMatcher, Serialized Form
Nested classes/interfaces inherited from class SequencePattern: SequencePattern.AndPatternExpr, SequencePattern.BackRefPatternExpr, SequencePattern.GroupPatternExpr, SequencePattern.MultiNodePatternExpr, SequencePattern.NodePatternExpr, SequencePattern.NodesMatchChecker<T>, SequencePattern.OrPatternExpr, SequencePattern.Parser<T>, SequencePattern.PatternExpr, SequencePattern.RepeatPatternExpr, SequencePattern.SequenceEndPatternExpr, SequencePattern.SequencePatternExpr, SequencePattern.SequenceStartPatternExpr, SequencePattern.SpecialNodePatternExpr, SequencePattern.ValuePatternExpr

| Modifier and Type | Field and Description |
|---|---|
| static TokenSequencePattern | ANY_NODE_PATTERN |
Fields inherited from class SequencePattern: ANY_NODE_PATTERN_EXPR, MATCH_STATE, NODES_EQUAL_CHECKER, SEQ_BEGIN_PATTERN_EXPR, SEQ_END_PATTERN_EXPR

| Constructor and Description |
|---|
| TokenSequencePattern(java.lang.String patternStr, SequencePattern.PatternExpr nodeSequencePattern) |
| TokenSequencePattern(java.lang.String patternStr, SequencePattern.PatternExpr nodeSequencePattern, SequenceMatchAction<CoreMap> action) |


| Modifier and Type | Method and Description |
|---|---|
| static TokenSequencePattern | compile(Env env, java.lang.String... strings): Compiles a sequence of regular expressions into a TokenSequencePattern using the specified environment. |
| static TokenSequencePattern | compile(Env env, java.lang.String string): Compiles a regular expression over tokens into a TokenSequencePattern using the specified environment. |
| static TokenSequencePattern | compile(SequencePattern.PatternExpr nodeSequencePattern): Compiles a PatternExpr into a TokenSequencePattern. |
| static TokenSequencePattern | compile(java.lang.String... strings): Compiles a sequence of regular expressions into a TokenSequencePattern using the default environment. |
| static TokenSequencePattern | compile(java.lang.String string): Compiles a regular expression over tokens into a TokenSequencePattern using the default environment. |
| TokenSequenceMatcher | getMatcher(java.util.List<? extends CoreMap> tokens): Returns a TokenSequenceMatcher that can be used to match this pattern against the specified list of tokens. |
| static MultiPatternMatcher<CoreMap> | getMultiPatternMatcher(java.util.Collection<TokenSequencePattern> patterns): Creates a multi-pattern matcher for matching across multiple TokensRegex patterns. |
| static MultiPatternMatcher<CoreMap> | getMultiPatternMatcher(TokenSequencePattern... patterns): Creates a multi-pattern matcher for matching across multiple TokensRegex patterns. |
| static Env | getNewEnv() |
| TokenSequenceMatcher | matcher(java.util.List<? extends CoreMap> tokens): Returns a TokenSequenceMatcher that can be used to match this pattern against the specified list of tokens. |
| java.lang.String | toString(): Returns a String representation of the TokenSequencePattern. |


Methods inherited from class SequencePattern: findNodePattern, getAction, getPatternExpr, getPriority, getTotalGroups, getWeight, pattern, setAction, setPriority, setWeight, transform
public static final TokenSequencePattern ANY_NODE_PATTERN
public TokenSequencePattern(java.lang.String patternStr,
SequencePattern.PatternExpr nodeSequencePattern)
public TokenSequencePattern(java.lang.String patternStr,
SequencePattern.PatternExpr nodeSequencePattern,
SequenceMatchAction<CoreMap> action)
public static Env getNewEnv()
public static TokenSequencePattern compile(java.lang.String string)
Parameters: string - Regular expression to be compiled
public static TokenSequencePattern compile(Env env, java.lang.String string)
Parameters: env - Environment to use; string - Regular expression to be compiled
public static TokenSequencePattern compile(java.lang.String... strings)
Parameters: strings - List of regular expressions to be compiled
public static TokenSequencePattern compile(Env env, java.lang.String... strings)
Parameters: env - Environment to use; strings - List of regular expressions to be compiled
public static TokenSequencePattern compile(SequencePattern.PatternExpr nodeSequencePattern)
Parameters: nodeSequencePattern - A sequence pattern expression (before translation into an NFA)
public TokenSequenceMatcher getMatcher(java.util.List<? extends CoreMap> tokens)
Overrides: getMatcher in class SequencePattern<CoreMap>
Parameters: tokens - List of tokens to match against
public TokenSequenceMatcher matcher(java.util.List<? extends CoreMap> tokens)
Parameters: tokens - List of tokens to match against
public java.lang.String toString()
Overrides: toString in class SequencePattern<CoreMap>
public static MultiPatternMatcher<CoreMap> getMultiPatternMatcher(java.util.Collection<TokenSequencePattern> patterns)
Parameters: patterns - Collection of input patterns
public static MultiPatternMatcher<CoreMap> getMultiPatternMatcher(TokenSequencePattern... patterns)
Parameters: patterns - Input patterns