de.dfki.lt.hog.util
Class CorrectCharPosHandler

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by de.dfki.lt.hog.util.CorrectCharPosHandler
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

public class CorrectCharPosHandler
extends org.xml.sax.helpers.DefaultHandler

Converts byte-offset character positions into Unicode char-offset character positions. It was written as a fix for the EUC-JP-only output of ChaSen for use in Heart of Gold, but the code itself is completely ChaSen-independent and works with any Java-supported byte-oriented encoding. Created 2006-03-01.
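
The conversion described above can be illustrated with a lookup table from byte offsets to char offsets. The following is only a minimal sketch, not the code of this class: the names ByteToCharOffsets and buildMapping are hypothetical, and it assumes a stateless encoding that writes no byte-order mark (for example EUC-JP or UTF-8), so that the per-character byte lengths add up to the length of the whole encoded text.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of de.dfki.lt.hog.util.
final class ByteToCharOffsets {

    /**
     * Returns a table in which mapping[b] is the Java char offset of the
     * character that occupies byte offset b in the encoded text. The last
     * entry maps the total byte length to rawtext.length().
     * Assumption: the encoding is stateless and emits no byte-order mark.
     */
    static int[] buildMapping(String rawtext, String encoding) {
        Charset charset = Charset.forName(encoding);
        int[] mapping = new int[rawtext.getBytes(charset).length + 1];
        int bytePos = 0;
        int charPos = 0;
        while (charPos < rawtext.length()) {
            // Step by code point so that surrogate pairs are encoded together.
            int charCount = Character.charCount(rawtext.codePointAt(charPos));
            String ch = rawtext.substring(charPos, charPos + charCount);
            int byteLen = ch.getBytes(charset).length;
            // Every byte of this character points back at its char offset.
            for (int i = 0; i < byteLen; i++) {
                mapping[bytePos + i] = charPos;
            }
            bytePos += byteLen;
            charPos += charCount;
        }
        mapping[bytePos] = charPos; // offset just past the end of the text
        return mapping;
    }
}
```

For the text "a\u65e5b" in UTF-8 (one byte, three bytes, one byte), buildMapping yields {0, 1, 1, 1, 2, 3}, so a byte offset of 4 corresponds to char offset 2.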


Constructor Summary
CorrectCharPosHandler()
           
 
Method Summary
 void characters(char[] buf, int offset, int len)
           
static java.lang.String correctCharPos(java.lang.String markup, java.lang.String rawtext, java.lang.String encoding, java.lang.String cstartattr, java.lang.String cendattr)
          Corrects character positions, replacing the byte positions output by ChaSen with character positions.
 void endElement(java.lang.String namespaceURI, java.lang.String sName, java.lang.String qName)
           
 java.lang.String parseAndCorrect(java.lang.String markup, CorrectCharPosHandler handler, int[] mapping, java.lang.String cstartattr, java.lang.String cendattr)
           
 void startElement(java.lang.String namespaceURI, java.lang.String lName, java.lang.String qName, org.xml.sax.Attributes attrs)
           
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CorrectCharPosHandler

public CorrectCharPosHandler()

Method Detail

correctCharPos

public static java.lang.String correctCharPos(java.lang.String markup,
                                              java.lang.String rawtext,
                                              java.lang.String encoding,
                                              java.lang.String cstartattr,
                                              java.lang.String cendattr)
                                       throws java.io.UnsupportedEncodingException,
                                              org.xml.sax.SAXException,
                                              javax.xml.parsers.ParserConfigurationException,
                                              java.io.IOException
Corrects character positions, replacing the byte positions output by ChaSen with character positions.

Returns:
modified XML string
Throws:
java.io.UnsupportedEncodingException
org.xml.sax.SAXException
javax.xml.parsers.ParserConfigurationException
java.io.IOException

parseAndCorrect

public java.lang.String parseAndCorrect(java.lang.String markup,
                                        CorrectCharPosHandler handler,
                                        int[] mapping,
                                        java.lang.String cstartattr,
                                        java.lang.String cendattr)
                                 throws org.xml.sax.SAXException,
                                        javax.xml.parsers.ParserConfigurationException,
                                        java.io.IOException
Throws:
org.xml.sax.SAXException
javax.xml.parsers.ParserConfigurationException
java.io.IOException
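
The signature suggests that a precomputed byte-to-char mapping is applied to the two offset attributes while the markup is parsed. How this class actually does that is not documented here, so the following is only a sketch of one possible approach. The class OffsetRewriteSketch and its rewrite method are hypothetical, the mapping is assumed to have one entry per byte offset, and the markup is assumed to be parsed without namespace processing.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration, not the implementation of CorrectCharPosHandler.
final class OffsetRewriteSketch extends DefaultHandler {
    private final int[] mapping;    // assumed: byte offset -> char offset
    private final String startAttr; // name of the start-offset attribute
    private final String endAttr;   // name of the end-offset attribute
    private final StringBuilder out = new StringBuilder();

    OffsetRewriteSketch(int[] mapping, String startAttr, String endAttr) {
        this.mapping = mapping;
        this.startAttr = startAttr;
        this.endAttr = endAttr;
    }

    @Override
    public void startElement(String uri, String lName, String qName, Attributes attrs) {
        out.append('<').append(qName);
        for (int i = 0; i < attrs.getLength(); i++) {
            String name = attrs.getQName(i);
            String value = attrs.getValue(i);
            if (name.equals(startAttr) || name.equals(endAttr)) {
                // Replace the byte offset with the mapped char offset.
                value = Integer.toString(mapping[Integer.parseInt(value)]);
            }
            out.append(' ').append(name).append("=\"").append(escape(value)).append('"');
        }
        out.append('>');
    }

    @Override
    public void endElement(String uri, String lName, String qName) {
        out.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        out.append(escape(new String(buf, offset, len)));
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Parses the markup and returns it with both offset attributes remapped. */
    static String rewrite(String markup, int[] mapping, String startAttr, String endAttr) {
        OffsetRewriteSketch handler = new OffsetRewriteSketch(mapping, startAttr, endAttr);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(markup)), handler);
        } catch (Exception e) {
            // Wrapped for brevity; real code would declare the checked exceptions.
            throw new IllegalStateException(e);
        }
        return handler.out.toString();
    }
}
```

The sketch re-serializes elements, attributes and text only. It drops the XML declaration, comments and processing instructions, and an offset outside the mapping raises an ArrayIndexOutOfBoundsException.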

startElement

public void startElement(java.lang.String namespaceURI,
                         java.lang.String lName,
                         java.lang.String qName,
                         org.xml.sax.Attributes attrs)
                  throws org.xml.sax.SAXException
Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException

endElement

public void endElement(java.lang.String namespaceURI,
                       java.lang.String sName,
                       java.lang.String qName)
                throws org.xml.sax.SAXException
Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException

characters

public void characters(char[] buf,
                       int offset,
                       int len)
                throws org.xml.sax.SAXException
Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException