cz.kebrt.html2latex
Class Parser

java.lang.Object
  extended by cz.kebrt.html2latex.Parser

public class Parser
extends java.lang.Object

HTML parser.


Field Summary
private  java.io.File _file
          Input file.
private  java.io.FileReader _fr
          Reader for the input file.
private  ParserHandler _handler
          Handler which receives events from the parser.
private  java.util.Stack<ElementStart> _openElements
          Stack containing all opened and still non-closed elements.
private  java.io.BufferedReader _reader
          Buffered reader for the input file.
 
Constructor Summary
Parser()
           
 
Method Summary
private  void destroy()
          Closes the input file specified in the parse() method.
private  void doParsing()
          Reads the input file char by char.
private  void checkValidity(ElementEnd element)
          Checks whether the document is well-formed.
private  void init()
          Opens the input file specified in the parse() method.
 void parse(java.io.File inputFile, ParserHandler handler)
          Parses the HTML file and converts it using the given handler.
private  MyElement parseElement(java.lang.String elementString)
          Parses an element.
private  void readContent(char firstChar)
          Reads text content of an element.
private  void readElement()
          Reads elements (tags).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_file

private java.io.File _file
Input file.


_fr

private java.io.FileReader _fr
Reader for the input file.


_reader

private java.io.BufferedReader _reader
Buffered reader for the input file.


_handler

private ParserHandler _handler
Handler which receives events from the parser.


_openElements

private java.util.Stack<ElementStart> _openElements
Stack containing all opened and still non-closed elements.

Constructor Detail

Parser

public Parser()
Method Detail

parse

public void parse(java.io.File inputFile,
                  ParserHandler handler)
           throws FatalErrorException
Parses the HTML file and converts it using the given handler. The file is processed character by character, and events are sent to the handler as elements, comments, and text are encountered. The process closely resembles the SAX model used for XML. The handler can receive the following events: startElement, endElement, comment, and character (text content).

Parameters:
inputFile - input HTML file
handler - receives events such as startElement (e.g. <html>), endElement, ...
Throws:
FatalErrorException - if a fatal error occurs (e.g. the input file can't be opened)
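
The callback model described for parse() can be illustrated with a small sketch. The interface SketchHandler, its method names, and their signatures are assumptions made for illustration only; the actual ParserHandler API is not documented on this page and may differ. The events are fired by hand here, in the order a parser would plausibly send them for the input <p>Hi</p><!--x-->.

```java
import java.util.ArrayList;
import java.util.List;

public class HandlerSketch {

    /** Hypothetical stand-in for ParserHandler; the real method names may differ. */
    interface SketchHandler {
        void startElement(String name);
        void endElement(String name);
        void characters(String text);
        void comment(String text);
    }

    /** Records every event it receives, in arrival order. */
    static class RecordingHandler implements SketchHandler {
        final List<String> events = new ArrayList<>();

        @Override public void startElement(String name) { events.add("start:" + name); }
        @Override public void endElement(String name)   { events.add("end:" + name); }
        @Override public void characters(String text)   { events.add("chars:" + text); }
        @Override public void comment(String text)      { events.add("comment:" + text); }
    }

    /** Fires by hand the events assumed for the input <p>Hi</p><!--x-->. */
    static List<String> demo() {
        RecordingHandler handler = new RecordingHandler();
        handler.startElement("p");
        handler.characters("Hi");
        handler.endElement("p");
        handler.comment("x");
        return handler.events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

As in SAX, the handler never sees the document as a tree; it only reacts to events in document order, which is why a converter built on it has to keep its own state between callbacks.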

init

private void init()
           throws FatalErrorException
Opens the input file specified in the parse() method.

Throws:
FatalErrorException - when input file can't be opened

destroy

private void destroy()
              throws FatalErrorException
Closes the input file specified in the parse() method.

Throws:
FatalErrorException - when input file can't be closed

doParsing

private void doParsing()
                throws java.io.IOException
Reads the input file char by char. When the "<" char is reached, readElement() is called; otherwise, readContent() is called.

Throws:
java.io.IOException - when input error occurs
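
The dispatch described for doParsing() can be sketched as follows. This is a minimal illustration, not the class's actual code: the class CharLoopSketch, the string-based tokens, and the use of mark()/reset() to avoid consuming the "<" that ends a text run are all assumptions. The sketch also does not treat comments specially, so a ">" inside a comment would end the element early.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class CharLoopSketch {

    /** Reads the input one char at a time and returns the tokens in order. */
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(html))) {
            int c;
            while ((c = reader.read()) != -1) {
                if (c == '<') {
                    // Analogous to readElement(): the "<" has already been consumed.
                    tokens.add("element:" + readElement(reader));
                } else {
                    // Analogous to readContent(firstChar): pass on the char already read.
                    tokens.add("content:" + readContent(reader, (char) c));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    /** Reads up to the next ">" (or end of input) and returns the text before it. */
    private static String readElement(BufferedReader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1 && c != '>') {
            sb.append((char) c);
        }
        return sb.toString();
    }

    /** Reads text up to, but not including, the next "<" (or end of input). */
    private static String readContent(BufferedReader reader, char firstChar) throws IOException {
        StringBuilder sb = new StringBuilder().append(firstChar);
        while (true) {
            reader.mark(1);           // remember the position before peeking one char
            int c = reader.read();
            if (c == -1) {
                break;
            }
            if (c == '<') {
                reader.reset();       // leave the "<" for the main loop
                break;
            }
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(tokenize("<p>Hi</p>"));
    }
}
```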

readElement

private void readElement()
                  throws java.io.IOException
Reads elements (tags). Sends comment, startElement and endElement events to the handler.

Throws:
java.io.IOException - when input error occurs

parseElement

private MyElement parseElement(java.lang.String elementString)
Parses an element. If it is a start element, its attributes are stored in the ElementStart object.

Parameters:
elementString - string containing the element with its attributes (without the leading "<" and trailing ">")
Returns:
ElementStart or ElementEnd object.
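
One way to split such an element string into a name and attributes is sketched below. The Element class is a hypothetical stand-in for the ElementStart / ElementEnd pair, and the regular expression, the lower-casing of names, and the lack of handling for self-closing syntax ("br/") are assumptions of this sketch rather than documented behavior of parseElement().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseElementSketch {

    /** Hypothetical result type standing in for ElementStart / ElementEnd. */
    static class Element {
        final String name;
        final boolean isEnd;
        final Map<String, String> attributes = new LinkedHashMap<>();

        Element(String name, boolean isEnd) {
            this.name = name;
            this.isEnd = isEnd;
        }
    }

    // Matches name="value", name='value', or name=value.
    private static final Pattern ATTR =
        Pattern.compile("([\\w:-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|(\\S+))");

    /** Parses the text between "<" and ">" into an element name and attributes. */
    static Element parseElement(String elementString) {
        String s = elementString.trim();
        boolean isEnd = s.startsWith("/");
        if (isEnd) {
            s = s.substring(1).trim();
        }

        // The element name runs up to the first whitespace character.
        int i = 0;
        while (i < s.length() && !Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        Element element = new Element(s.substring(0, i).toLowerCase(), isEnd);

        // Only start elements carry attributes.
        if (!isEnd) {
            Matcher m = ATTR.matcher(s.substring(i));
            while (m.find()) {
                String value = m.group(2) != null ? m.group(2)
                             : m.group(3) != null ? m.group(3)
                             : m.group(4);
                element.attributes.put(m.group(1).toLowerCase(), value);
            }
        }
        return element;
    }

    public static void main(String[] args) {
        Element e = parseElement("a href=\"x.html\" class=link");
        System.out.println(e.name + " " + e.attributes);
    }
}
```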

readContent

private void readContent(char firstChar)
                  throws java.io.IOException
Reads text content of an element. Sends character event to the handler.

Parameters:
firstChar - first char read in doParsing() method
Throws:
java.io.IOException - when input error occurs

checkValidity

private void checkValidity(ElementEnd element)
Checks whether the document is well-formed. If it is not, endElement events are sent for the elements which were opened but not correctly closed.

Parameters:
element - the latest ending element which was reached
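
The recovery described above can be sketched with a stack of open element names. This is an illustration under assumptions, not the class's actual logic: the class CheckValiditySketch, the string events, and the choice to ignore an end tag that matches no open element are all hypothetical. An ArrayDeque is used here in place of the java.util.Stack held in _openElements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class CheckValiditySketch {

    // Names of elements that were opened and not yet closed; most recent first.
    private final Deque<String> openElements = new ArrayDeque<>();
    final List<String> events = new ArrayList<>();

    void startElement(String name) {
        openElements.push(name);
        events.add("start:" + name);
    }

    void endElement(String name) {
        if (!openElements.contains(name)) {
            return; // assumption of this sketch: a stray end tag is ignored
        }
        // Close every element that was opened after the matching start tag.
        while (!openElements.peek().equals(name)) {
            events.add("end:" + openElements.pop());
        }
        openElements.pop();
        events.add("end:" + name);
    }

    /** Simulates the malformed input <p><b><i></p>. */
    static List<String> demo() {
        CheckValiditySketch parser = new CheckValiditySketch();
        parser.startElement("p");
        parser.startElement("b");
        parser.startElement("i");
        parser.endElement("p");
        return parser.events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Closing the inner elements first keeps the events the handler sees properly nested even when the source HTML is not.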