public class DefaultLexicalMapper extends java.lang.Object implements Mapper, java.io.Serializable
| Modifier and Type | Field and Description |
|---|---|
| java.util.regex.Pattern | arabicDigit |
| java.util.regex.Pattern | arabicPunc |
| java.util.regex.Pattern | latinPunc |
| java.util.regex.Pattern | segmentationMarker |
| Constructor and Description |
|---|
| DefaultLexicalMapper() |
| Modifier and Type | Method and Description |
|---|---|
| boolean | canChangeEncoding(java.lang.String parent, java.lang.String element) Indicates whether child can be converted to another encoding. |
| static void | main(java.lang.String[] args) |
| java.lang.String | map(java.lang.String parent, java.lang.String element) Maps from one string representation to another. |
| void | setup(java.io.File path, java.lang.String... options) Perform initialization prior to the first call to map. |
public final java.util.regex.Pattern latinPunc
public final java.util.regex.Pattern arabicPunc
public final java.util.regex.Pattern arabicDigit
public final java.util.regex.Pattern segmentationMarker
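The field declarations above give only names and types; the regular expressions behind them are not shown in this documentation. As a rough sketch of what patterns of this kind might look like, the example below assumes that an Arabic-digit pattern matches runs of Arabic-Indic digits and that a segmentation marker is a run of hyphens at the start or end of a token. The class name PatternSketch, the helper methods, and both expressions are illustrative assumptions, not the values used by DefaultLexicalMapper.

```java
import java.util.regex.Pattern;

class PatternSketch {
    // Assumption: Arabic-Indic digits (U+0660-U+0669) plus Extended Arabic-Indic digits (U+06F0-U+06F9).
    static final Pattern ARABIC_DIGIT = Pattern.compile("[\\u0660-\\u0669\\u06F0-\\u06F9]+");

    // Assumption: a segmentation marker is one or more hyphens at the start or end of a token.
    static final Pattern SEGMENTATION_MARKER = Pattern.compile("^-+|-+$");

    // True when the whole token consists of digits matched by ARABIC_DIGIT.
    static boolean isArabicNumber(String token) {
        return ARABIC_DIGIT.matcher(token).matches();
    }

    // Removes leading and trailing marker hyphens; hyphens inside the token are kept.
    static String stripMarkers(String token) {
        return SEGMENTATION_MARKER.matcher(token).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(isArabicNumber("\u0661\u0662\u0663"));
        System.out.println(stripMarkers("-hA"));
    }
}
```

Because the fields are public and final, code holding a DefaultLexicalMapper instance could apply them the same way, through matcher(...), without recompiling the expressions.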
public java.lang.String map(java.lang.String parent,
java.lang.String element)
Description copied from interface: Mapper
Maps from one string representation to another.
public void setup(java.io.File path,
java.lang.String... options)
Description copied from interface: Mapper
Perform initialization prior to the first call to map.
public boolean canChangeEncoding(java.lang.String parent,
java.lang.String element)
Description copied from interface: Mapper
Indicates whether child can be converted to another encoding. In the ATB, for example, if a punctuation character is labeled with the "PUNC" POS tag, then that character should not be converted from Buckwalter to UTF-8.
Specified by: canChangeEncoding in interface Mapper
Parameters:
parent - element's context (e.g., the parent node in a parse tree)
element - The string to be transformed.
public static void main(java.lang.String[] args)
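The call order implied by the method descriptions (setup first, then canChangeEncoding and map per token) can be sketched as follows. This is a minimal illustration under stated assumptions: the nested Mapper interface is a hypothetical stand-in that only mirrors the three signatures listed on this page, ToyMapper is an invented implementation (it is not DefaultLexicalMapper), and the convert helper reflects one plausible way a caller might combine the two methods. The upper-casing step is a placeholder for a real encoding change such as Buckwalter to UTF-8.

```java
import java.io.File;
import java.util.Locale;

class MapperUsageSketch {
    // Hypothetical stand-in mirroring the three documented signatures.
    interface Mapper {
        void setup(File path, String... options);
        boolean canChangeEncoding(String parent, String element);
        String map(String parent, String element);
    }

    // Toy implementation for illustration only.
    static class ToyMapper implements Mapper {
        private boolean ready = false;

        @Override
        public void setup(File path, String... options) {
            // A real mapper might load resources from path; this one only records the call.
            ready = true;
        }

        @Override
        public boolean canChangeEncoding(String parent, String element) {
            // Mirrors the ATB example: tokens tagged "PUNC" keep their encoding.
            return !"PUNC".equals(parent);
        }

        @Override
        public String map(String parent, String element) {
            if (!ready) {
                throw new IllegalStateException("setup must be called before map");
            }
            // Placeholder transformation standing in for a real mapping.
            return element.toUpperCase(Locale.ROOT);
        }
    }

    // Assumed caller-side pattern: map only when the encoding may change.
    static String convert(Mapper mapper, String parent, String element) {
        return mapper.canChangeEncoding(parent, element) ? mapper.map(parent, element) : element;
    }

    public static void main(String[] args) {
        Mapper mapper = new ToyMapper();
        mapper.setup(null); // no resource file or options in this sketch
        System.out.println(convert(mapper, "NN", "ktAb"));
        System.out.println(convert(mapper, "PUNC", ","));
    }
}
```

Whether the check belongs in the caller or inside map is a design choice this page does not settle; the sketch keeps it in the caller so that map stays a pure transformation.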