AbbrevDescription extends Description.
AbbrevDescription for the abbreviation description contained in the DOM Document abbrDescr.
AnnotatedString is an interface for annotating strings and working on them.
CliticsDescription extends Description.
CliticsDescription for the clitics description contained in the DOM Document clitDescr.
List of Paragraphs with TextUnits and Tokens from an annotated input.
input.
input.
Description is an abstract class that provides common methods to manage the content of description files.
FastAnnotatedString is a fast implementation of the AnnotatedString interface.
FastAnnotatedString for a text in inputString.
FileTools provides static methods to work on files and streams.
Matches for the regular expression in input.
List with all Matches for the regular expression in input.
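As an illustration of the entry above, here is a minimal sketch of collecting every match of a regular expression in an input string using java.util.regex. The class AllMatchesSketch, the holder SimpleMatch and the method getAllMatches are hypothetical names chosen for this example; they are not JTok's own Match or RegExp types, whose signatures are not shown in this index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not JTok's actual Match or RegExp classes.
final class AllMatchesSketch {

  // Illustrative stand-in for a match result.
  static final class SimpleMatch {
    final int start;     // index of the first matched character
    final int end;       // index just past the last matched character
    final String image;  // the matched substring

    SimpleMatch(int start, int end, String image) {
      this.start = start;
      this.end = end;
      this.image = image;
    }
  }

  // Returns all non-overlapping matches of regex in input, in order.
  static List<SimpleMatch> getAllMatches(String regex, String input) {
    List<SimpleMatch> matches = new ArrayList<>();
    Matcher matcher = Pattern.compile(regex).matcher(input);
    while (matcher.find()) {
      matches.add(new SimpleMatch(matcher.start(), matcher.end(), matcher.group()));
    }
    return matches;
  }
}
```

The end index is exclusive here because that is what Matcher.end() returns; whether JTok follows the same convention is an assumption.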
Description.definitionsMap.
aDirectory with suffix aSuffix and returns them in a List.
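The beginning of the entry above is missing, but it appears to describe gathering the files in aDirectory whose names end with aSuffix. Under that assumption, a minimal sketch with java.io.File could look as follows. The class SuffixFilesSketch and the method filesWithSuffix are hypothetical names for this example, not the actual FileTools signature.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the actual FileTools method.
final class SuffixFilesSketch {

  // Returns the entries of aDirectory whose names end with aSuffix.
  static List<File> filesWithSuffix(File aDirectory, String aSuffix) {
    File[] found = aDirectory.listFiles((dir, name) -> name.endsWith(aSuffix));
    // listFiles returns null if aDirectory is not a directory
    // or an I/O error occurs; treat that as "no files".
    if (found == null) {
      return new ArrayList<>();
    }
    // listFiles guarantees no order, so sort for a stable result.
    Arrays.sort(found);
    return new ArrayList<>(Arrays.asList(found));
  }
}
```

Returning an empty list instead of null for a missing directory is a design choice of this sketch.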
LanguageResource for the given language if available.
Description.listsMap.
Description.regExpMap.
Description.rulesMap.
HashMap that maps class names to their tags as defined in the class definition file.
XML_TEXT_UNIT that contains the text unit id.
XML_TOKEN that contains the token image.
InitializationException is thrown when the tokenizer can't be initialized.
InitializationException.
InitializationException with an error message aMessage.
tag1 is an ancestor in the class hierarchy of the class of a token with tag tag2 or if the token classes are equal in the token class hierarchy for aLanguage.
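The ancestor-or-equal check described in the entry above can be sketched by walking parent links upward from tag2 until tag1 is found or the root is passed. This is only an illustration: the class TagHierarchySketch, its methods, and the child-to-parent map are assumptions, and the way JTok actually stores its token class hierarchy is not shown in this index.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a tag hierarchy; not JTok's implementation.
final class TagHierarchySketch {

  // Maps each tag to the tag of its direct parent class.
  private final Map<String, String> parentOf = new HashMap<>();

  void addClass(String tag, String parentTag) {
    parentOf.put(tag, parentTag);
  }

  // True if tag1 equals tag2 or tag1 is reachable from tag2 via parent links.
  boolean isAncestorOrEqual(String tag1, String tag2) {
    String current = tag2;
    int steps = 0;
    // The step limit guards against accidental cycles in the map.
    while (current != null && steps <= parentOf.size()) {
      if (current.equals(tag1)) {
        return true;
      }
      current = parentOf.get(current);
      steps++;
    }
    return false;
  }
}
```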
JTok is a low-level tokenizer tool that recognizes paragraphs, sentences, tokens, punctuation, numbers, abbreviations, etc.
JTok using the properties in configProps.
JavaRegExp implements the RegExp interface for regular expressions of the java.util.regex package.
JavaRegExp for a String containing a regular expression.
JavaRegExpFactory extends RegExpFactory for regular expressions of the java.util.regex package.
JavaRegExpFactory.
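The relationship between an abstract regular expression factory and a java.util.regex-backed implementation, as outlined in the entries above, can be sketched as follows. All type and method names here (SketchRegExp, SketchRegExpFactory, matchesFully, containsMatch, createRegExp) are hypothetical; the real RegExp and RegExpFactory methods are not listed in this index and may differ.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the interface/factory pairing; not JTok's API.
final class RegExpFactorySketch {

  // Engine-independent view of a compiled regular expression.
  interface SketchRegExp {
    boolean matchesFully(String input);
    boolean containsMatch(String input);
  }

  // Callers create expressions through the factory and never
  // depend on a concrete regex engine.
  abstract static class SketchRegExpFactory {
    abstract SketchRegExp createRegExp(String regExpString);
  }

  // Implementation backed by java.util.regex.
  static final class JavaSketchRegExp implements SketchRegExp {
    private final Pattern pattern;

    JavaSketchRegExp(String regExpString) {
      this.pattern = Pattern.compile(regExpString);
    }

    @Override
    public boolean matchesFully(String input) {
      return pattern.matcher(input).matches();
    }

    @Override
    public boolean containsMatch(String input) {
      return pattern.matcher(input).find();
    }
  }

  static final class JavaSketchRegExpFactory extends SketchRegExpFactory {
    @Override
    SketchRegExp createRegExp(String regExpString) {
      return new JavaSketchRegExp(regExpString);
    }
  }
}
```

With this split, swapping in a different regex engine would only require another factory subclass.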
XML_TOKEN that contains the token length.
LanguageNotSupportedException is thrown when the necessary language resources are not available.
LanguageNotSupportedException.
LanguageNotSupportedException with an error message aMessage.
LanguageResource class manages the language-specific information needed by the tokenizer to process a document of that language.
LanguageResource for aLanguage by using the resource description files in aResourceDir.
Match holds the result of matching an input string with a regular expression.
Match using the given parameters.
TestJTok with it.
NumbersDescription extends Description.
NumbersDescription for the numbers description contained in the DOM Document numbDescr.
XML_TOKEN that contains the token offset.
Paragraph.
Paragraph that contains the given text units.
ParagraphOutputter provides static methods that convert an AnnotatedString into a list of nested representations of Paragraphs with TextUnits and Tokens.
ProcessingException is thrown when the processing of input data causes an error.
ProcessingException.
ProcessingException with an error message aMessage.
PunctDescription extends Description.
PunctDescription for the punctuation description contained in the DOM Document punctDescr.
RegExp defines an interface for regular expression patterns.
RegExpFactory is an abstract class for creating objects that fit the RegExp interface.
Description.definitionsMap to aDefinitionsMap.
anEndIndex.
anEndIndex.
anEndIndex.
anImage.
Description.listsMap to aListsMap.
Description.regExpMap to aRegExpMap.
Description.rulesMap to aRulesMap.
aStartIndex.
aStartIndex.
aStartIndex.
someTextUnits.
someTokens.
aType.
XML_TOKEN that contains the token type.
TestFastAnnotatedString is a test class for FastAnnotatedString.
JTok.
TestJTok using the test description in the file configFile.
TextUnit.
TextUnit that contains the given tokens.
Token.
Token with the given start index, end index, type and surface image of the token.
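The nesting described by the Paragraph, TextUnit and Token entries (paragraphs contain text units, text units contain tokens, and each token carries a start index, end index, type and surface image) can be pictured with the following minimal sketch. The classes Para, Unit and Tok are hypothetical stand-ins, not JTok's own classes, and treating the end index as exclusive is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for the nested output structure; not JTok's classes.
final class NestedOutputSketch {

  static final class Tok {
    final int startIndex;  // index of the first character of the token
    final int endIndex;    // assumed exclusive: index after the last character
    final String type;
    final String image;    // surface string of the token

    Tok(int startIndex, int endIndex, String type, String image) {
      this.startIndex = startIndex;
      this.endIndex = endIndex;
      this.type = type;
      this.image = image;
    }
  }

  static final class Unit {
    final List<Tok> tokens;

    Unit(List<Tok> someTokens) {
      this.tokens = Collections.unmodifiableList(new ArrayList<>(someTokens));
    }
  }

  static final class Para {
    final List<Unit> textUnits;

    Para(List<Unit> someTextUnits) {
      this.textUnits = Collections.unmodifiableList(new ArrayList<>(someTextUnits));
    }
  }

  // Builds a token whose image is cut from the text by its indices.
  static Tok tokenFrom(String text, int startIndex, int endIndex, String type) {
    return new Tok(startIndex, endIndex, type, text.substring(startIndex, endIndex));
  }

  // Walks the nesting and counts all tokens.
  static int countTokens(List<Para> paragraphs) {
    int count = 0;
    for (Para paragraph : paragraphs) {
      for (Unit unit : paragraph.textUnits) {
        count += unit.tokens.size();
      }
    }
    return count;
  }
}
```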
String matching the regular expression pattern.
String that contains the text to tokenize and parses it for aLanguage.
XMLOutputter provides static methods that return an XML representation of an AnnotatedString.