de.dfki.lt.hog.util
Class CorrectCharPosHandler
java.lang.Object
org.xml.sax.helpers.DefaultHandler
de.dfki.lt.hog.util.CorrectCharPosHandler
- All Implemented Interfaces:
- org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler
public class CorrectCharPosHandler
- extends org.xml.sax.helpers.DefaultHandler
Converts byte-offset character positions into Unicode character-offset positions
(a fix for the EUC-JP-only output of ChaSen when used in Heart of Gold).
The code itself is ChaSen-independent and works with any byte-oriented encoding
supported by Java.
Created 2006-03-01
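The byte-to-character conversion described above can be illustrated with a small lookup table. The following is a minimal sketch and not the implementation of this class: the class name `ByteOffsetMapSketch` and the method `byteToCharOffsets` are hypothetical, and the sketch assumes a stateless encoding without a byte-order mark (such as EUC-JP, Shift_JIS, or UTF-8). UTF-8 is used in the demo only because every Java runtime is required to support it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of de.dfki.lt.hog.util. */
final class ByteOffsetMapSketch {

    /**
     * Builds a table m where m[b] is the char offset (UTF-16 index) of the
     * character that covers byte offset b in the encoded text. The final
     * entry maps the total byte length to text.length(), so exclusive end
     * offsets can be looked up as well.
     * Assumption: each character encodes to the same bytes in isolation
     * as it does inside the whole string (true for stateless encodings).
     */
    static int[] byteToCharOffsets(String text, Charset charset) {
        List<Integer> map = new ArrayList<>();
        int charPos = 0;
        while (charPos < text.length()) {
            // Step by code point so surrogate pairs are encoded as one unit.
            int charLen = Character.charCount(text.codePointAt(charPos));
            int byteLen = text.substring(charPos, charPos + charLen)
                              .getBytes(charset).length;
            for (int i = 0; i < byteLen; i++) {
                map.add(charPos);
            }
            charPos += charLen;
        }
        map.add(charPos); // one-past-the-end entry
        int[] result = new int[map.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = map.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // "a", HIRAGANA LETTER A (3 bytes in UTF-8), "b"
        int[] map = byteToCharOffsets("a\u3042b", StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(map)); // prints [0, 1, 1, 1, 2, 3]
    }
}
```

With such a table, a byte offset reported by a tagger is turned into a character offset by a single array lookup, which is presumably the role of the `int[] mapping` argument of `parseAndCorrect`.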
Method Summary
void characters(char[] buf, int offset, int len)
static java.lang.String correctCharPos(java.lang.String markup, java.lang.String rawtext, java.lang.String encoding, java.lang.String cstartattr, java.lang.String cendattr)
- Corrects character positions (replacing the byte positions output by ChaSen).
void endElement(java.lang.String namespaceURI, java.lang.String sName, java.lang.String qName)
java.lang.String parseAndCorrect(java.lang.String markup, CorrectCharPosHandler handler, int[] mapping, java.lang.String cstartattr, java.lang.String cendattr)
void startElement(java.lang.String namespaceURI, java.lang.String lName, java.lang.String qName, org.xml.sax.Attributes attrs)
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CorrectCharPosHandler
public CorrectCharPosHandler()
correctCharPos
public static java.lang.String correctCharPos(java.lang.String markup,
java.lang.String rawtext,
java.lang.String encoding,
java.lang.String cstartattr,
java.lang.String cendattr)
throws java.io.UnsupportedEncodingException,
org.xml.sax.SAXException,
javax.xml.parsers.ParserConfigurationException,
java.io.IOException
- Corrects character positions (replacing the byte positions output by ChaSen).
- Returns:
- modified XML string
- Throws:
java.io.UnsupportedEncodingException
org.xml.sax.SAXException
javax.xml.parsers.ParserConfigurationException
java.io.IOException
parseAndCorrect
public java.lang.String parseAndCorrect(java.lang.String markup,
CorrectCharPosHandler handler,
int[] mapping,
java.lang.String cstartattr,
java.lang.String cendattr)
throws org.xml.sax.SAXException,
javax.xml.parsers.ParserConfigurationException,
java.io.IOException
- Throws:
org.xml.sax.SAXException
javax.xml.parsers.ParserConfigurationException
java.io.IOException
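The signature above suggests that the markup is parsed with a SAX handler and that the attributes named by `cstartattr` and `cendattr` are rewritten using `mapping`. The following is a rough sketch of that idea under stated assumptions, not the actual implementation: the class `OffsetRewriteSketch`, its `rewrite` method, and the attribute names `cstart` and `cend` in the demo are hypothetical, and the sketch assumes the attribute values are plain decimal byte offsets that index directly into the mapping array.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration; not the documented CorrectCharPosHandler. */
final class OffsetRewriteSketch extends DefaultHandler {
    private final int[] mapping;          // byte offset -> char offset
    private final String startAttr;
    private final String endAttr;
    private final StringBuilder out = new StringBuilder();

    OffsetRewriteSketch(int[] mapping, String startAttr, String endAttr) {
        this.mapping = mapping;
        this.startAttr = startAttr;
        this.endAttr = endAttr;
    }

    @Override
    public void startElement(String uri, String lName, String qName, Attributes attrs) {
        out.append('<').append(qName);
        for (int i = 0; i < attrs.getLength(); i++) {
            String name = attrs.getQName(i);
            String value = attrs.getValue(i);
            if (name.equals(startAttr) || name.equals(endAttr)) {
                value = mapOffset(value);
            }
            out.append(' ').append(name).append("=\"").append(escape(value)).append('"');
        }
        out.append('>');
    }

    @Override
    public void endElement(String uri, String lName, String qName) {
        out.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        out.append(escape(new String(buf, offset, len)));
    }

    /** Looks up a byte offset; values that are not in-range integers stay unchanged. */
    private String mapOffset(String value) {
        try {
            int b = Integer.parseInt(value.trim());
            return (b >= 0 && b < mapping.length) ? String.valueOf(mapping[b]) : value;
        } catch (NumberFormatException e) {
            return value;
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Parses the markup and returns it re-serialized with mapped offsets. */
    static String rewrite(String markup, int[] mapping, String startAttr, String endAttr) {
        OffsetRewriteSketch handler = new OffsetRewriteSketch(mapping, startAttr, endAttr);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(markup)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse markup", e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        int[] mapping = {0, 1, 1, 1, 2, 3}; // UTF-8 table for "a\u3042b"
        String markup = "<s><w cstart=\"1\" cend=\"4\">\u3042</w></s>";
        System.out.println(rewrite(markup, mapping, "cstart", "cend"));
    }
}
```

The sketch re-serializes only elements, attributes, and text, so comments, processing instructions, and the XML declaration are dropped. Whether the real class preserves them is not stated in this documentation.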
startElement
public void startElement(java.lang.String namespaceURI,
java.lang.String lName,
java.lang.String qName,
org.xml.sax.Attributes attrs)
throws org.xml.sax.SAXException
- Specified by:
startElement
in interface org.xml.sax.ContentHandler
- Overrides:
startElement
in class org.xml.sax.helpers.DefaultHandler
- Throws:
org.xml.sax.SAXException
endElement
public void endElement(java.lang.String namespaceURI,
java.lang.String sName,
java.lang.String qName)
throws org.xml.sax.SAXException
- Specified by:
endElement
in interface org.xml.sax.ContentHandler
- Overrides:
endElement
in class org.xml.sax.helpers.DefaultHandler
- Throws:
org.xml.sax.SAXException
characters
public void characters(char[] buf,
int offset,
int len)
throws org.xml.sax.SAXException
- Specified by:
characters
in interface org.xml.sax.ContentHandler
- Overrides:
characters
in class org.xml.sax.helpers.DefaultHandler
- Throws:
org.xml.sax.SAXException