de.dfki.lt.tools.tokenizer
Class Description

java.lang.Object
  extended by de.dfki.lt.tools.tokenizer.Description
Direct Known Subclasses:
AbbrevDescription, CliticsDescription, NumbersDescription, PunctDescription

public abstract class Description
extends java.lang.Object

Description is an abstract class that provides common methods to manage the content of description files.

Version:
$Id: Description.java,v 1.6 2005/04/12 08:47:37 steffen Exp $
Author:
Joerg Steffen, DFKI

Field Summary
protected static java.lang.String DEF_CLASS
          This is the attribute of a definition or list element that contains the class name.
protected static java.lang.String DEF_REGEXP
          This is the attribute of a definition element that contains the regular expression.
protected  java.util.HashMap definitionsMap
          This maps a class to a regular expression that matches all tokens of this class.
protected static java.lang.String DEFS
          This is the name of the element with the definitions in the description files.
protected static RegExpFactory FACTORY
          This is the factory for creating regular expressions.
protected static java.lang.String LIST_ENCODING
          This is the attribute of a list element that contains the encoding of the list file.
protected static java.lang.String LIST_FILE
          This is the attribute of a list element that points to the list file.
protected static java.lang.String LISTS
          This is the name of the element with the lists in the description files.
protected  java.util.HashMap listsMap
          This maps a class to a hash map that contains members of this class.
protected  java.util.HashMap regExpMap
          This maps the regular expressions of rules to the class names of the expressions they match.
protected static java.lang.String RULES
          This is the name of the element with the rules in the description files.
protected  java.util.HashMap rulesMap
          This maps the rule names to regular expressions that match the tokens as described by the rule.
 
Constructor Summary
Description()
           
 
Method Summary
protected  java.util.HashMap getDefinitionsMap()
          This returns the field definitionsMap.
protected  java.util.HashMap getListsMap()
          This returns the field listsMap.
protected  java.util.HashMap getRegExpMap()
          This returns the field regExpMap.
protected  java.util.HashMap getRulesMap()
          This returns the field rulesMap.
protected  void loadDefinitions(org.w3c.dom.Document aDescr, java.util.Set classes)
          This uses the definitions section in a description file to map each token class from the definitions to a regular expression that matches all tokens of that class.
protected  void loadLists(org.w3c.dom.Document aDescr, java.util.Set classes, java.lang.String aResourceDir)
          This uses the lists section in a description file to map each token class from the lists to a hash map that contains all members of that class.
protected  void loadRules(org.w3c.dom.Document aDescr)
          This maps each rule from the description to a regular expression that matches all tokens described by that rule.
protected  void setDefinitionsMap(java.util.HashMap aDefinitionsMap)
          This sets the field definitionsMap to aDefinitionsMap.
protected  void setListsMap(java.util.HashMap aListsMap)
          This sets the field listsMap to aListsMap.
protected  void setRegExpMap(java.util.HashMap aRegExpMap)
          This sets the field regExpMap to aRegExpMap.
protected  void setRulesMap(java.util.HashMap aRulesMap)
          This sets the field rulesMap to aRulesMap.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFS

protected static final java.lang.String DEFS
This is the name of the element with the definitions in the description files.

See Also:
Constant Field Values

DEF_REGEXP

protected static final java.lang.String DEF_REGEXP
This is the attribute of a definition element that contains the regular expression.

See Also:
Constant Field Values

DEF_CLASS

protected static final java.lang.String DEF_CLASS
This is the attribute of a definition or list element that contains the class name.

See Also:
Constant Field Values

LISTS

protected static final java.lang.String LISTS
This is the name of the element with the lists in the description files.

See Also:
Constant Field Values

LIST_FILE

protected static final java.lang.String LIST_FILE
This is the attribute of a list element that points to the list file.

See Also:
Constant Field Values

LIST_ENCODING

protected static final java.lang.String LIST_ENCODING
This is the attribute of a list element that contains the encoding of the list file.

See Also:
Constant Field Values

RULES

protected static final java.lang.String RULES
This is the name of the element with the rules in the description files.

See Also:
Constant Field Values

FACTORY

protected static RegExpFactory FACTORY
This is the factory for creating regular expressions.


definitionsMap

protected java.util.HashMap definitionsMap
This maps a class to a regular expression that matches all tokens of this class. The regular expression is built as a disjunction of the regular expressions used in the definitions. If a rule matches expressions from more than one class, this map is used to identify the class of the matched token.
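The disjunction described above can be illustrated with a minimal sketch. It is not the tokenizer's implementation: it uses java.util.regex.Pattern directly instead of RegExpFactory, and the class DefinitionsSketch, the helper buildDefinitionsMap and its input (a map from each class name to the regular expressions of its definitions) are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class DefinitionsSketch {

    // Hypothetical helper: builds one pattern per class as the disjunction
    // of all regular expressions defined for that class.
    public static Map<String, Pattern> buildDefinitionsMap(Map<String, List<String>> defs) {
        Map<String, Pattern> result = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : defs.entrySet()) {
            StringBuilder disjunction = new StringBuilder();
            for (String regExp : entry.getValue()) {
                if (disjunction.length() > 0) {
                    disjunction.append('|');
                }
                // wrap each alternative in a non-capturing group so that an
                // alternation inside one definition stays local to it
                disjunction.append("(?:").append(regExp).append(')');
            }
            result.put(entry.getKey(), Pattern.compile(disjunction.toString()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> defs = new HashMap<>();
        defs.put("DIGITS", List.of("[0-9]+"));
        defs.put("ROMAN", List.of("[IVX]+", "[ivx]+"));
        Map<String, Pattern> definitionsMap = buildDefinitionsMap(defs);
        System.out.println(definitionsMap.get("ROMAN").pattern());              // prints (?:[IVX]+)|(?:[ivx]+)
        System.out.println(definitionsMap.get("ROMAN").matcher("xiv").matches()); // prints true
    }
}
```

Wrapping each alternative in a non-capturing group keeps an alternation inside one definition from merging with its neighbours when the definitions are joined with "|".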


rulesMap

protected java.util.HashMap rulesMap
This maps the rule names to regular expressions that match the tokens as described by the rule.


regExpMap

protected java.util.HashMap regExpMap
This maps the regular expressions of rules to the class name of the expressions they match. It is used for rules whose matched expressions all belong to the same class.
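A sketch of how regExpMap and definitionsMap could work together once a rule has matched a token: if the rule's regular expression is a key in regExpMap, the class is read off directly; otherwise the token is tested against each class pattern from definitionsMap. This assumes string keys for the rule expressions and java.util.regex patterns as values; the helper identifyClass and the example classes DIGITS, LETTERS and ORDINAL are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class ClassIdentificationSketch {

    // Hypothetical helper: returns the class of a token matched by the rule with the
    // given regular expression, or null if no class can be identified.
    public static String identifyClass(String token, String ruleRegExp,
                                       Map<String, String> regExpMap,
                                       Map<String, Pattern> definitionsMap) {
        // rules whose matches all share one class are resolved directly
        String cls = regExpMap.get(ruleRegExp);
        if (cls != null) {
            return cls;
        }
        // otherwise test the token against the disjunction of each class
        for (Map.Entry<String, Pattern> entry : definitionsMap.entrySet()) {
            if (entry.getValue().matcher(token).matches()) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Pattern> definitionsMap = new LinkedHashMap<>();
        definitionsMap.put("DIGITS", Pattern.compile("[0-9]+"));
        definitionsMap.put("LETTERS", Pattern.compile("[a-z]+"));
        Map<String, String> regExpMap = Map.of("[0-9]+\\.", "ORDINAL");

        System.out.println(identifyClass("3.", "[0-9]+\\.", regExpMap, definitionsMap));      // prints ORDINAL
        System.out.println(identifyClass("abc", "[0-9]+|[a-z]+", regExpMap, definitionsMap)); // prints LETTERS
    }
}
```

When several class patterns match the same token, the first one in iteration order wins; the example uses a LinkedHashMap so that this order is predictable.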


listsMap

protected java.util.HashMap listsMap
This maps a class to a hash map that contains members of this class.
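A member lookup in listsMap might look like the following sketch. It assumes that the members of a class are stored as keys of the inner map and that the values are not needed for the lookup; the class ListsSketch, the helper isMember and the class name ABBREV are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class ListsSketch {

    // Hypothetical helper: true if the token is listed as a member of the given class.
    // Members are assumed to be the keys of the inner map; the values are ignored.
    public static boolean isMember(Map<String, ? extends Map<String, ?>> listsMap,
                                   String cls, String token) {
        Map<String, ?> members = listsMap.get(cls);
        return members != null && members.containsKey(token);
    }

    public static void main(String[] args) {
        Map<String, Object> abbreviations = new HashMap<>();
        abbreviations.put("etc.", Boolean.TRUE);
        Map<String, Map<String, ?>> listsMap = new HashMap<>();
        listsMap.put("ABBREV", abbreviations);

        System.out.println(isMember(listsMap, "ABBREV", "etc.")); // prints true
        System.out.println(isMember(listsMap, "ABBREV", "etc"));  // prints false
    }
}
```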

Constructor Detail

Description

public Description()
Method Detail

getDefinitionsMap

protected java.util.HashMap getDefinitionsMap()
This returns the field definitionsMap.

Returns:
a HashMap

setDefinitionsMap

protected void setDefinitionsMap(java.util.HashMap aDefinitionsMap)
This sets the field definitionsMap to aDefinitionsMap.

Parameters:
aDefinitionsMap - a HashMap

getRulesMap

protected java.util.HashMap getRulesMap()
This returns the field rulesMap.

Returns:
a HashMap

setRulesMap

protected void setRulesMap(java.util.HashMap aRulesMap)
This sets the field rulesMap to aRulesMap.

Parameters:
aRulesMap - a HashMap

getRegExpMap

protected java.util.HashMap getRegExpMap()
This returns the field regExpMap.

Returns:
a HashMap

setRegExpMap

protected void setRegExpMap(java.util.HashMap aRegExpMap)
This sets the field regExpMap to aRegExpMap.

Parameters:
aRegExpMap - a HashMap

getListsMap

protected java.util.HashMap getListsMap()
This returns the field listsMap.

Returns:
a HashMap

setListsMap

protected void setListsMap(java.util.HashMap aListsMap)
This sets the field listsMap to aListsMap.

Parameters:
aListsMap - a HashMap

loadDefinitions

protected void loadDefinitions(org.w3c.dom.Document aDescr,
                               java.util.Set classes)
This uses the definitions section in a description file to map each token class from the definitions to a regular expression that matches all tokens of that class.

Parameters:
aDescr - a DOM Document with a description
classes - a Set with the defined classes, used for validation
Throws:
InitializationException - if the definitions section contains an illegal regular expression or an undefined class
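The steps of loadDefinitions can be sketched with the standard DOM API: collect the regular expressions per class from the definitions section, reject undefined classes and illegal expressions, and compile one disjunction per class. The constant values (DEFINITIONS, regexp, class), the element name DEF in the example, the stand-in InitializationException and the helper parse are assumptions for illustration; the real constant values are listed under Constant Field Values, and the sketch returns the map instead of storing it in a field.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LoadDefinitionsSketch {

    // Assumed constant values; the real ones are listed under "Constant Field Values".
    static final String DEFS = "DEFINITIONS";
    static final String DEF_REGEXP = "regexp";
    static final String DEF_CLASS = "class";

    // Stand-in for the tokenizer's InitializationException.
    public static class InitializationException extends RuntimeException {
        public InitializationException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // Parses an XML string into a DOM Document, wrapping checked exceptions.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse description", e);
        }
    }

    // Compiles a regular expression, reporting syntax errors as InitializationException.
    static Pattern compile(String regExp) {
        try {
            return Pattern.compile(regExp);
        } catch (PatternSyntaxException e) {
            throw new InitializationException("illegal regular expression " + regExp, e);
        }
    }

    public static Map<String, Pattern> loadDefinitions(Document aDescr, Set<String> classes) {
        Map<String, Pattern> definitionsMap = new HashMap<>();
        NodeList sections = aDescr.getElementsByTagName(DEFS);
        if (sections.getLength() == 0) {
            return definitionsMap;
        }
        // collect the regular expressions of all definitions, grouped by class
        Map<String, List<String>> regExps = new HashMap<>();
        NodeList children = sections.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments
            }
            Element def = (Element) node;
            String cls = def.getAttribute(DEF_CLASS);
            if (!classes.contains(cls)) {
                throw new InitializationException("undefined class " + cls, null);
            }
            String regExp = def.getAttribute(DEF_REGEXP);
            compile(regExp); // validate each definition on its own
            regExps.computeIfAbsent(cls, k -> new ArrayList<>()).add(regExp);
        }
        // build one disjunction per class
        for (Map.Entry<String, List<String>> e : regExps.entrySet()) {
            definitionsMap.put(e.getKey(), compile("(?:" + String.join(")|(?:", e.getValue()) + ")"));
        }
        return definitionsMap;
    }

    public static void main(String[] args) {
        Document doc = parse("<D><DEFINITIONS>"
            + "<DEF regexp=\"[0-9]+\" class=\"NUMBER\"/>"
            + "<DEF regexp=\"[IVX]+\" class=\"NUMBER\"/>"
            + "</DEFINITIONS></D>");
        // prints (?:[0-9]+)|(?:[IVX]+)
        System.out.println(loadDefinitions(doc, Set.of("NUMBER")).get("NUMBER").pattern());
    }
}
```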

loadRules

protected void loadRules(org.w3c.dom.Document aDescr)
This maps each rule from the description to a regular expression that matches all tokens described by that rule.

Parameters:
aDescr - a DOM Document with the description
Throws:
InitializationException - if the rules section contains an illegal regular expression
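A comparable sketch for loadRules rests on an entirely hypothetical file layout: every child element of the rules section is one rule, its tag name is the rule name and its text content is the regular expression. The real layout and the value of RULES may differ; InitializationException and parse are stand-ins, and the map is returned instead of being stored in rulesMap.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LoadRulesSketch {

    // Assumed constant value; the real one is listed under "Constant Field Values".
    static final String RULES = "RULES";

    // Stand-in for the tokenizer's InitializationException.
    public static class InitializationException extends RuntimeException {
        public InitializationException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // Parses an XML string into a DOM Document, wrapping checked exceptions.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse description", e);
        }
    }

    // Maps each rule name (tag name, by assumption) to its compiled regular expression.
    public static Map<String, Pattern> loadRules(Document aDescr) {
        Map<String, Pattern> rulesMap = new HashMap<>();
        NodeList sections = aDescr.getElementsByTagName(RULES);
        if (sections.getLength() == 0) {
            return rulesMap;
        }
        NodeList children = sections.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments
            }
            Element rule = (Element) node;
            String regExp = rule.getTextContent().trim();
            try {
                rulesMap.put(rule.getTagName(), Pattern.compile(regExp));
            } catch (PatternSyntaxException e) {
                throw new InitializationException(
                    "illegal regular expression in rule " + rule.getTagName(), e);
            }
        }
        return rulesMap;
    }

    public static void main(String[] args) {
        Map<String, Pattern> rulesMap =
            loadRules(parse("<D><RULES><ORDINAL>[0-9]+\\.</ORDINAL></RULES></D>"));
        System.out.println(rulesMap.get("ORDINAL").matcher("12.").matches()); // prints true
    }
}
```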

loadLists

protected void loadLists(org.w3c.dom.Document aDescr,
                         java.util.Set classes,
                         java.lang.String aResourceDir)
This uses the lists section in a description file to map each token class from the lists to a hash map that contains all members of that class.

Parameters:
aDescr - a DOM Document with a description
classes - a Set with the defined classes, used for validation
aResourceDir - a String with the name of the resource directory
Throws:
InitializationException - if the lists section contains an undefined class or refers to a list file that cannot be loaded
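A sketch of loadLists, assuming that each list element carries the class, file and encoding attributes named by DEF_CLASS, LIST_FILE and LIST_ENCODING, that the list file lives in the resource directory and holds one member per line, and that members are stored as keys mapped to Boolean.TRUE. The attribute values, the element name LIST, the stand-in exception and the helper parse are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LoadListsSketch {

    // Assumed constant values; the real ones are listed under "Constant Field Values".
    static final String LISTS = "LISTS";
    static final String DEF_CLASS = "class";
    static final String LIST_FILE = "file";
    static final String LIST_ENCODING = "encoding";

    // Stand-in for the tokenizer's InitializationException.
    public static class InitializationException extends RuntimeException {
        public InitializationException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // Parses an XML string into a DOM Document, wrapping checked exceptions.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse description", e);
        }
    }

    public static Map<String, Map<String, Boolean>> loadLists(
            Document aDescr, Set<String> classes, String aResourceDir) {
        Map<String, Map<String, Boolean>> listsMap = new HashMap<>();
        NodeList sections = aDescr.getElementsByTagName(LISTS);
        if (sections.getLength() == 0) {
            return listsMap;
        }
        NodeList children = sections.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments
            }
            Element list = (Element) node;
            String cls = list.getAttribute(DEF_CLASS);
            if (!classes.contains(cls)) {
                throw new InitializationException("undefined class " + cls, null);
            }
            Map<String, Boolean> members = listsMap.computeIfAbsent(cls, k -> new HashMap<>());
            String fileName = list.getAttribute(LIST_FILE);
            try {
                // the list file is resolved against the resource directory
                Path file = Paths.get(aResourceDir, fileName);
                Charset encoding = Charset.forName(list.getAttribute(LIST_ENCODING));
                // one member per line; blank lines are skipped
                for (String line : Files.readAllLines(file, encoding)) {
                    String member = line.trim();
                    if (!member.isEmpty()) {
                        members.put(member, Boolean.TRUE);
                    }
                }
            } catch (IOException | IllegalArgumentException e) {
                // missing or unreadable file, unknown encoding or illegal path
                throw new InitializationException("cannot load list file " + fileName, e);
            }
        }
        return listsMap;
    }
}
```

With this layout, a list element such as <LIST class="ABBREV" file="abbrev.txt" encoding="UTF-8"/> would load every non-blank line of abbrev.txt in the resource directory as a member of ABBREV.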