java.lang.Object
  morfologik.tools.MorphEncoder
public final class MorphEncoder
Converts tabular data to the FSA morphological format. Three encodings are supported:
standardEncode(byte[], byte[], byte[])
prefixEncode(byte[], byte[], byte[])
infixEncode(byte[], byte[], byte[])
Constructor Summary |
---|
MorphEncoder() |
MorphEncoder(byte annotationSeparator) |
Method Summary | |
---|---|
protected static java.lang.String | asString(byte[] str, java.lang.String encoding) Converts a byte array to a string using the given encoding. |
static int | commonPrefix(byte[] s1, byte[] s2) |
byte[] | infixEncode(byte[] wordForm, byte[] wordLemma, byte[] wordTag) Converts wordForm, wordLemma and the tag to the form: inflected_form + MLKending + tags, where '+' is a separator, M is the position in the inflected form at which characters are deleted ("A" means from the beginning, "B" from the second character, "C" from the third, and so on), L is the number of characters to delete starting at the position given by M ("A" means none, "B" means one, "C" means two, etc.), and K is a character that specifies how many characters should be deleted from the end of the inflected form to produce the lexeme by concatenating the stripped string with the ending ("A" means none, "B" means one, "C" means two, and so on). |
java.lang.String | infixEncodeUTF8(java.lang.String wordForm, java.lang.String wordLemma, java.lang.String wordTag) A UTF-8 variant of infixEncode(byte[], byte[], byte[]). |
byte[] | prefixEncode(byte[] wordForm, byte[] wordLemma, byte[] wordTag) Converts wordForm, wordLemma and the tag to the form: inflected_form + LKending + tags, where '+' is a separator, L is the number of characters to be deleted from the beginning of the word ("A" means none, "B" means one, "C" means two, etc.), and K is a character that specifies how many characters should be deleted from the end of the inflected form to produce the lexeme by concatenating the stripped string with the ending ("A" means none, "B" means one, "C" means two, and so on). |
java.lang.String | prefixEncodeUTF8(java.lang.String wordForm, java.lang.String wordLemma, java.lang.String wordTag) A UTF-8 variant of prefixEncode(byte[], byte[], byte[]). Converts wordForm, wordLemma and the tag to the form: inflected_form + LKending + tags, where '+' is a separator, L is the number of characters to be deleted from the beginning of the word ("A" means none, "B" means one, "C" means two, etc.), and K is a character that specifies how many characters should be deleted from the end of the inflected form to produce the lexeme by concatenating the stripped string with the ending ("A" means none, "B" means one, "C" means two, and so on). |
byte[] | standardEncode(byte[] wordForm, byte[] wordLemma, byte[] wordTag) Converts wordForm, wordLemma and the tag to the form: wordForm + Kending + tags, where '+' is a separator and K is a character that specifies how many characters should be deleted from the end of the inflected form to produce the lexeme by concatenating the stripped string with the ending. |
java.lang.String | standardEncodeUTF8(java.lang.String wordForm, java.lang.String wordLemma, java.lang.String wordTag) A UTF-8 variant of standardEncode(byte[], byte[], byte[]). Converts wordForm, wordLemma and the tag to the form: wordForm + Kending + tags, where '+' is a separator and K is a character that specifies how many characters should be deleted from the end of the inflected form to produce the lexeme by concatenating the stripped string with the ending. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public MorphEncoder()
public MorphEncoder(byte annotationSeparator)
Method Detail |
---|
public static int commonPrefix(byte[] s1, byte[] s2)
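This page gives no description for commonPrefix. Judging only by its name and signature, it presumably returns the length of the prefix shared by the two byte arrays. A minimal sketch under that assumption follows; CommonPrefixSketch is a hypothetical class written for illustration, not the library's code.

```java
// Hypothetical illustration; not the MorphEncoder implementation.
final class CommonPrefixSketch {

    /** Returns the number of leading bytes that s1 and s2 have in common. */
    static int commonPrefix(byte[] s1, byte[] s2) {
        int max = Math.min(s1.length, s2.length);
        int i = 0;
        // Advance while both arrays still have bytes and those bytes match.
        while (i < max && s1[i] == s2[i]) {
            i++;
        }
        return i;
    }
}
```

Under this reading, the byte arrays for "cats" and "cat" share a prefix of length 3, which is the quantity the encodings below need in order to work out how much of the inflected form to strip.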
public byte[] standardEncode(byte[] wordForm, byte[] wordLemma, byte[] wordTag)
Converts wordForm, wordLemma and the tag to the form:
wordForm + Kending + tags
where '+' is a separator and K is a character that specifies how many characters should be deleted from the end of the inflected form to produce the lexeme by concatenating the stripped string with the ending.
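To make the format concrete, here is a minimal sketch of the standard encoding as described above. It is not the library's implementation: StandardEncodeSketch is a hypothetical class, it works on Java String characters rather than byte arrays, it hard-codes '+' as the separator, and it assumes K uses the same letter scale given for the other formats ("A" means none, "B" means one, and so on).

```java
// Hypothetical illustration of the "wordForm + Kending + tags" layout.
final class StandardEncodeSketch {

    static String encode(String wordForm, String wordLemma, String wordTag) {
        // Length of the prefix shared by the inflected form and the lemma.
        int shared = 0;
        int max = Math.min(wordForm.length(), wordLemma.length());
        while (shared < max && wordForm.charAt(shared) == wordLemma.charAt(shared)) {
            shared++;
        }
        // K: how many characters to drop from the end of the inflected form.
        // Assumption: 'A' = 0, 'B' = 1, ... (no overflow handling for long suffixes).
        char k = (char) ('A' + (wordForm.length() - shared));
        // Ending: what must be appended to the stripped form to get the lemma.
        String ending = wordLemma.substring(shared);
        return wordForm + '+' + k + ending + '+' + wordTag;
    }
}
```

With these assumptions, encode("cats", "cat", "NNS") yields "cats+B+NNS" (drop one character, append nothing), and encode("mice", "mouse", "NNS") yields "mice+Douse+NNS" (drop three characters, append "ouse").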
public byte[] prefixEncode(byte[] wordForm, byte[] wordLemma, byte[] wordTag)
Converts wordForm, wordLemma and the tag to the form:
inflected_form + LKending + tags
where '+' is a separator, L is the number of characters to be deleted from the beginning of the word ("A" means none, "B" means one, "C" means two, etc.), and K is a character that specifies how many characters should be deleted from the end of the inflected form to produce the lexeme by concatenating the stripped string with the ending ("A" means none, "B" means one, "C" means two, and so on).
Parameters:
wordForm - inflected word form
wordLemma - canonical form
wordTag - tag
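The prefix format can be sketched in the same way. This page does not say how the library chooses L, so the sketch below assumes a simple heuristic: try every possible number of leading characters to skip and keep the one that leaves the longest prefix in common with the lemma. PrefixEncodeSketch is a hypothetical class operating on String characters with a hard-coded '+' separator; treat it as an illustration of the output layout, not as the library's algorithm.

```java
// Hypothetical illustration of the "inflected_form + LKending + tags" layout.
final class PrefixEncodeSketch {

    private static int commonPrefix(String a, String b) {
        int max = Math.min(a.length(), b.length());
        int i = 0;
        while (i < max && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    static String encode(String wordForm, String wordLemma, String wordTag) {
        // Assumed heuristic: pick the skip count that maximises the shared prefix;
        // ties go to the smallest skip.
        int bestSkip = 0;
        int bestShared = commonPrefix(wordForm, wordLemma);
        for (int skip = 1; skip < wordForm.length(); skip++) {
            int shared = commonPrefix(wordForm.substring(skip), wordLemma);
            if (shared > bestShared) {
                bestShared = shared;
                bestSkip = skip;
            }
        }
        // Characters left over after the skipped prefix and the shared part.
        int fromEnd = wordForm.length() - bestSkip - bestShared;
        char l = (char) ('A' + bestSkip); // "A" = none, "B" = one, ...
        char k = (char) ('A' + fromEnd);  // same letter scale for the suffix
        return wordForm + '+' + l + k + wordLemma.substring(bestShared) + '+' + wordTag;
    }
}
```

For the German participle "gemacht" with lemma "machen", this produces "gemacht+CBen+VVPP": skip two characters ("ge"), drop one from the end ("t"), and append "en".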
public byte[] infixEncode(byte[] wordForm, byte[] wordLemma, byte[] wordTag)
Converts wordForm, wordLemma and the tag to the form:
inflected_form + MLKending + tags
where '+' is a separator, M is the position in the inflected form at which characters are deleted ("A" means from the beginning, "B" from the second character, "C" from the third, and so on), L is the number of characters to delete starting at the position given by M ("A" means none, "B" means one, "C" means two, etc.), and K is a character that specifies how many characters should be deleted from the end of the inflected form to produce the lexeme by concatenating the stripped string with the ending ("A" means none, "B" means one, "C" means two, and so on).
Parameters:
wordForm - inflected word form
wordLemma - canonical form
wordTag - tag
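Because the infix format carries three code letters, it is easiest to see how they interact by reading an encoded entry back. The sketch below decodes a string in the layout described above and reconstructs the lemma. InfixDecodeSketch and lemmaOf are hypothetical names; the sketch assumes a '+' separator, String characters rather than bytes, and a word form and ending that contain no '+' themselves.

```java
// Hypothetical illustration: reads "inflected_form + MLKending + tags" back into a lemma.
final class InfixDecodeSketch {

    static String lemmaOf(String encoded) {
        int first = encoded.indexOf('+');
        int second = encoded.indexOf('+', first + 1);
        String form = encoded.substring(0, first);
        String code = encoded.substring(first + 1, second);

        int m = code.charAt(0) - 'A'; // position where the infix starts ("A" = beginning)
        int l = code.charAt(1) - 'A'; // number of characters to delete at that position
        int k = code.charAt(2) - 'A'; // number of characters to delete from the end
        String ending = code.substring(3);

        // Cut the infix out, strip the suffix, then append the ending.
        String stripped = form.substring(0, m) + form.substring(m + l);
        return stripped.substring(0, stripped.length() - k) + ending;
    }
}
```

For example, "aufgemacht+DCBen+VVPP" decodes to "aufmachen": delete two characters starting at the fourth position ("ge"), drop one character from the end ("t"), and append "en". When M is "A" the format degenerates to the prefix case, so "gemacht+ACBen+VVPP" decodes to "machen".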
protected static java.lang.String asString(byte[] str, java.lang.String encoding)
Converts a byte array to a string using the given encoding.
Parameters:
str - byte array to be converted
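A conversion of this kind can be written with the standard String(byte[], String) constructor. The sketch below is an assumption about the behaviour, not the library's code: AsStringSketch is a hypothetical class, and because the signature above declares no checked exception, the sketch rethrows an unsupported encoding name as an unchecked IllegalArgumentException.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical illustration; not the MorphEncoder implementation.
final class AsStringSketch {

    /** Decodes the bytes in str using the named encoding, for example "UTF-8". */
    static String asString(byte[] str, String encoding) {
        try {
            return new String(str, encoding);
        } catch (UnsupportedEncodingException e) {
            // Assumption: surface an unknown encoding name as an unchecked exception.
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
    }
}
```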
public java.lang.String standardEncodeUTF8(java.lang.String wordForm, java.lang.String wordLemma, java.lang.String wordTag) throws java.io.UnsupportedEncodingException
A UTF-8 variant of standardEncode(byte[], byte[], byte[]). Converts wordForm, wordLemma and the tag to the form:
wordForm + Kending + tags
where '+' is a separator and K is a character that specifies how many characters should be deleted from the end of the inflected form to produce the lexeme by concatenating the stripped string with the ending.
Throws:
java.io.UnsupportedEncodingException
public java.lang.String prefixEncodeUTF8(java.lang.String wordForm, java.lang.String wordLemma, java.lang.String wordTag) throws java.io.UnsupportedEncodingException
A UTF-8 variant of prefixEncode(byte[], byte[], byte[]). Converts wordForm, wordLemma and the tag to the form:
inflected_form + LKending + tags
where '+' is a separator, L is the number of characters to be deleted from the beginning of the word ("A" means none, "B" means one, "C" means two, etc.), and K is a character that specifies how many characters should be deleted from the end of the inflected form to produce the lexeme by concatenating the stripped string with the ending ("A" means none, "B" means one, "C" means two, and so on).
Parameters:
wordForm - inflected word form
wordLemma - canonical form
wordTag - tag
Throws:
java.io.UnsupportedEncodingException
public java.lang.String infixEncodeUTF8(java.lang.String wordForm, java.lang.String wordLemma, java.lang.String wordTag) throws java.io.UnsupportedEncodingException
A UTF-8 variant of infixEncode(byte[], byte[], byte[]). Converts wordForm, wordLemma and the tag to the form:
inflected_form + MLKending + tags
where '+' is a separator, M is the position in the inflected form at which characters are deleted ("A" means from the beginning, "B" from the second character, "C" from the third, and so on), L is the number of characters to delete starting at the position given by M ("A" means none, "B" means one, "C" means two, etc.), and K is a character that specifies how many characters should be deleted from the end of the inflected form to produce the lexeme by concatenating the stripped string with the ending ("A" means none, "B" means one, "C" means two, and so on).
Parameters:
wordForm - inflected word form
wordLemma - canonical form
wordTag - tag
Throws:
java.io.UnsupportedEncodingException