cz.kebrt.html2latex
Class Convertor

java.lang.Object
  extended by cz.kebrt.html2latex.Convertor

 class Convertor
extends java.lang.Object

Converts HTML into LaTeX. Plain HTML elements are converted by the commonElementStart() and commonElementEnd() methods. Elements that need special handling during conversion are converted by dedicated methods such as tableRowStart().
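
The split between common and special handlers can be pictured with a small dispatcher. This is only a sketch under assumptions: the class DispatchSketch, the element-to-LaTeX mapping, and the row handling are hypothetical stand-ins, and unlike the real class it silently skips unknown tags instead of throwing NoItemException.

```java
import java.util.Map;

// Hypothetical sketch of the dispatch idea; not the actual Convertor code.
class DispatchSketch {
    // Assumed configuration: plain element name -> {LaTeX start, LaTeX end}.
    private static final Map<String, String[]> COMMON = Map.of(
            "b", new String[] {"\\textbf{", "}"},
            "i", new String[] {"\\textit{", "}"});

    private final StringBuilder out = new StringBuilder();

    // Routes a start tag to a special method if one exists, else to the common one.
    void startElement(String name) {
        if (name.equals("tr")) {
            tableRowStart();
        } else {
            commonElementStart(name);
        }
    }

    // Routes an end tag the same way.
    void endElement(String name) {
        if (name.equals("tr")) {
            tableRowEnd();
        } else {
            commonElementEnd(name);
        }
    }

    void characters(String text) {
        out.append(text);
    }

    private void commonElementStart(String name) {
        String[] mapping = COMMON.get(name);
        if (mapping != null) {
            out.append(mapping[0]);
        }
    }

    private void commonElementEnd(String name) {
        String[] mapping = COMMON.get(name);
        if (mapping != null) {
            out.append(mapping[1]);
        }
    }

    // Placeholder special handlers; the real row logic is not documented here.
    private void tableRowStart() {
    }

    private void tableRowEnd() {
        out.append(" \\\\\n");
    }

    String result() {
        return out.toString();
    }
}
```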


Field Summary
private  java.util.HashMap<java.lang.String,java.lang.String> _biblio
          Document's bibliography.
private  Configuration _config
          Program configuration.
private  int _countIgnoreContentElements
          Number of open elements with the "ignoreContent" attribute that the parser is currently inside.
private  int _countLeaveTextElements
          Number of open elements with the "leaveText" attribute that the parser is currently inside.
private  boolean _firstCell
          Whether the next table cell reached is the first cell.
private  boolean _firstRow
          Whether the next table row reached is the first row.
private  java.io.FileWriter _fw
          Writer for the output file.
private  java.io.File _outputFile
          Output file.
private  boolean _printBorder
          Whether a border is printed in the current table.
private  java.io.BufferedWriter _writer
          Buffered writer for the output file.
 
Constructor Summary
Convertor(java.io.File outputFile)
          Opens the output file.
 
Method Summary
 void anchorEnd(ElementEnd element, ElementStart es)
          Called when A end element is reached.
 void anchorStart(ElementStart e)
          Called when A start element is reached.
 void bodyEnd(ElementEnd element, ElementStart es)
          Called when BODY end element is reached.
 void bodyStart(ElementStart es)
          Called when BODY start element is reached.
 void comment(java.lang.String comment)
          Called when comment is reached in the input HTML document.
 void commonElementEnd(ElementEnd element, ElementStart es)
          Called when HTML end element is reached and special method for the element doesn't exist.
 void commonElementStart(ElementStart element)
          Called when HTML start element is reached and special method for the element doesn't exist.
private  java.lang.String convertCharEntitites(java.lang.String str)
          Converts HTML character entities to LaTeX commands.
private  java.lang.String convertLaTeXSpecialChars(java.lang.String str)
          Converts LaTeX special characters (e.g. '{') to LaTeX commands.
 void cssStyleEnd(ElementStart e)
          Prints CSS style converted to LaTeX command.
 void cssStyleStart(ElementStart e)
          Prints CSS style converted to LaTeX command.
 void destroy()
          Closes the output file.
private  CSSStyle[] findStyles(ElementStart e)
          Finds styles for the specified element.
 void fontEnd(ElementEnd e, ElementStart es)
          Called when FONT end element is reached.
 void fontStart(ElementStart es)
          Called when FONT start element is reached.
 void characters(java.lang.String str)
          Called when text content is reached in the input HTML document.
 void imgStart(ElementStart es)
          Called when IMG start element is reached.
 void metaStart(ElementStart es)
          Called when META start element is reached.
private  void processAttributes(ElementStart element)
          Processes HTML elements' attributes.
 void tableCellEnd(ElementEnd element, ElementStart e)
          Called when TD end element is reached.
 void tableCellStart(ElementStart e)
          Called when TD start element is reached.
 void tableEnd(ElementEnd e, ElementStart es)
          Called when TABLE end element is reached.
 void tableRowEnd(ElementEnd e, ElementStart es)
          Called when TR end element is reached.
 void tableRowStart(ElementStart e)
          Called when TR start element is reached.
 void tableStart(ElementStart e)
          Called when TABLE start element is reached.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_config

private Configuration _config
Program configuration.


_outputFile

private java.io.File _outputFile
Output file.


_fw

private java.io.FileWriter _fw
Writer for the output file.


_writer

private java.io.BufferedWriter _writer
Buffered writer for the output file.


_countLeaveTextElements

private int _countLeaveTextElements
Number of open elements with the "leaveText" attribute that the parser is currently inside.


_countIgnoreContentElements

private int _countIgnoreContentElements
Number of open elements with the "ignoreContent" attribute that the parser is currently inside.
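
A counter rather than a boolean flag keeps the state correct when flagged elements are nested. The sketch below illustrates this for the "ignoreContent" case, assuming (this is not confirmed by the documentation) that text inside such elements is dropped. The class IgnoreCounterSketch and its method signatures are hypothetical; the "leaveText" counter could be handled the same way.

```java
// Hypothetical sketch: nesting-safe tracking of flagged elements.
class IgnoreCounterSketch {
    private int countIgnoreContentElements = 0;
    private final StringBuilder out = new StringBuilder();

    // Entering an element; ignoreContent says whether it carries the flag.
    void elementStart(boolean ignoreContent) {
        if (ignoreContent) {
            countIgnoreContentElements++;
        }
    }

    // Leaving the element; the same flag value is passed again.
    void elementEnd(boolean ignoreContent) {
        if (ignoreContent) {
            countIgnoreContentElements--;
        }
    }

    // Assumption: text is written only when no flagged element is open.
    void characters(String text) {
        if (countIgnoreContentElements == 0) {
            out.append(text);
        }
    }

    String result() {
        return out.toString();
    }
}
```

With a plain boolean, closing the inner flagged element would wrongly re-enable output while the outer one is still open.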


_firstCell

private boolean _firstCell
Whether the next table cell reached is the first cell.


_firstRow

private boolean _firstRow
Whether the next table row reached is the first row.


_printBorder

private boolean _printBorder
Whether a border is printed in the current table.
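
The three table flags suggest a common pattern for emitting LaTeX tables: separators are written before every cell or row except the first. The following sketch shows that pattern. The class TableStateSketch, its method signatures, and the exact LaTeX emitted are assumptions for illustration, not the actual Convertor output.

```java
// Hypothetical sketch of first-cell / first-row / border state handling.
class TableStateSketch {
    private final StringBuilder out = new StringBuilder();
    private boolean firstRow;
    private boolean firstCell;
    private boolean printBorder;

    void tableStart(int columns, boolean border) {
        firstRow = true;
        printBorder = border;
        String sep = border ? "|" : "";
        StringBuilder spec = new StringBuilder(sep);
        for (int i = 0; i < columns; i++) {
            spec.append("l").append(sep);
        }
        out.append("\\begin{tabular}{").append(spec).append("}\n");
        if (border) {
            out.append("\\hline\n");
        }
    }

    void rowStart() {
        if (!firstRow) {
            closeRow(); // terminate the previous row before starting a new one
        }
        firstRow = false;
        firstCell = true;
    }

    void cell(String text) {
        if (!firstCell) {
            out.append(" & "); // column separator goes between cells only
        }
        firstCell = false;
        out.append(text);
    }

    void tableEnd() {
        if (!firstRow) {
            closeRow(); // terminate the last row, if any row was written
        }
        out.append("\\end{tabular}\n");
    }

    private void closeRow() {
        out.append(" \\\\\n");
        if (printBorder) {
            out.append("\\hline\n");
        }
    }

    String result() {
        return out.toString();
    }
}
```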


_biblio

private java.util.HashMap<java.lang.String,java.lang.String> _biblio
Document's bibliography.
key : bibitem name
value : bibitem description
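
A name-to-description map like this translates naturally into a LaTeX thebibliography environment. The sketch below is one possible rendering, not the documented behaviour: the class BiblioSketch, the method render, and the label width "99" are assumptions. It uses a LinkedHashMap in the example so the output order is predictable; a plain HashMap gives no ordering guarantee.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: rendering a bibitem map as a thebibliography environment.
class BiblioSketch {
    static String render(Map<String, String> biblio) {
        if (biblio.isEmpty()) {
            return ""; // no entries, no environment
        }
        StringBuilder sb = new StringBuilder("\\begin{thebibliography}{99}\n");
        for (Map.Entry<String, String> e : biblio.entrySet()) {
            // key: bibitem name, value: bibitem description
            sb.append("\\bibitem{").append(e.getKey()).append("} ")
              .append(e.getValue()).append('\n');
        }
        sb.append("\\end{thebibliography}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> biblio = new LinkedHashMap<>();
        biblio.put("ref1", "First reference");
        biblio.put("ref2", "Second reference");
        System.out.print(render(biblio));
    }
}
```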

Constructor Detail

Convertor

Convertor(java.io.File outputFile)
    throws FatalErrorException
Opens the output file.

Parameters:
outputFile - output LaTeX file
Throws:
FatalErrorException - when output file can't be opened
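
The fields _outputFile, _fw and _writer suggest the usual Java layering of a File, a FileWriter on top of it, and a BufferedWriter on top of that. A minimal sketch of that layering follows. The class OutputSketch and the exception OpenFailedException are hypothetical stand-ins (the real FatalErrorException is not reproduced here), and the error handling in destroy() is an assumption.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical stand-in for the project's FatalErrorException.
class OpenFailedException extends Exception {
    OpenFailedException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical sketch: open in the constructor, close in destroy().
class OutputSketch {
    private final File outputFile;
    private final FileWriter fw;
    private final BufferedWriter writer;

    OutputSketch(File outputFile) throws OpenFailedException {
        this.outputFile = outputFile;
        try {
            fw = new FileWriter(outputFile);  // opens (and truncates) the file
            writer = new BufferedWriter(fw);  // buffers writes to the file
        } catch (IOException e) {
            throw new OpenFailedException("Can't open " + outputFile.getName(), e);
        }
    }

    void write(String text) throws IOException {
        writer.write(text);
    }

    // Closing the BufferedWriter flushes it and closes the underlying FileWriter.
    void destroy() {
        try {
            writer.close();
        } catch (IOException e) {
            System.err.println("Can't close " + outputFile.getName());
        }
    }
}
```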
Method Detail

destroy

public void destroy()
Closes the output file.


commonElementStart

public void commonElementStart(ElementStart element)
                        throws java.io.IOException,
                               NoItemException
Called when HTML start element is reached and special method for the element doesn't exist.

Parameters:
element - HTML start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration

commonElementEnd

public void commonElementEnd(ElementEnd element,
                             ElementStart es)
                      throws java.io.IOException,
                             NoItemException
Called when HTML end element is reached and special method for the element doesn't exist.

Parameters:
element - corresponding end tag
es - start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration

characters

public void characters(java.lang.String str)
                throws java.io.IOException
Called when text content is reached in the input HTML document.

Parameters:
str - text content reached
Throws:
java.io.IOException - when output error occurs

comment

public void comment(java.lang.String comment)
             throws java.io.IOException
Called when comment is reached in the input HTML document.

Parameters:
comment - comment (without <!-- and -->)
Throws:
java.io.IOException - when output error occurs

convertLaTeXSpecialChars

private java.lang.String convertLaTeXSpecialChars(java.lang.String str)
Converts LaTeX special characters (e.g. '{') to LaTeX commands.

Parameters:
str - input string
Returns:
converted string
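
For illustration, escaping of this kind is often done in a single pass over the input so that already-escaped output is never escaped again. The sketch below is an assumption about one reasonable approach, not the method's actual code. The class LatexEscapeSketch is hypothetical, and the character set covers the ten standard LaTeX special characters.

```java
// Hypothetical sketch: single-pass escaping of LaTeX special characters.
class LatexEscapeSketch {
    static String escape(String str) {
        StringBuilder sb = new StringBuilder(str.length() + 16);
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            switch (c) {
                case '#': case '$': case '%': case '&':
                case '_': case '{': case '}':
                    sb.append('\\').append(c); // these only need a backslash prefix
                    break;
                case '\\':
                    sb.append("\\textbackslash{}");
                    break;
                case '^':
                    sb.append("\\textasciicircum{}");
                    break;
                case '~':
                    sb.append("\\textasciitilde{}");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("50% of {x}_1")); // prints 50\% of \{x\}\_1
    }
}
```

A chain of String.replace() calls would be shorter, but it risks re-escaping the braces introduced by \textbackslash{}, which is why the sketch walks the string once.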

convertCharEntitites

private java.lang.String convertCharEntitites(java.lang.String str)
Converts HTML character entities to LaTeX commands.

Parameters:
str - input string
Returns:
converted string
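
One way to picture this conversion is a lookup table from entity names to LaTeX commands, applied with a regular expression. The sketch below rests on assumptions: the class EntitySketch, the small entity table, and the choice to leave unknown entities unchanged are illustrative and may differ from the real method.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: replacing named HTML entities with LaTeX commands.
class EntitySketch {
    // Assumed (and deliberately tiny) entity table.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", "\\&");
        ENTITIES.put("lt", "\\textless{}");
        ENTITIES.put("gt", "\\textgreater{}");
        ENTITIES.put("nbsp", "~");
        ENTITIES.put("copy", "\\copyright{}");
        ENTITIES.put("eacute", "\\'{e}");
    }

    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z]+);");

    static String convert(String str) {
        Matcher m = ENTITY.matcher(str);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String latex = ENTITIES.get(m.group(1));
            // Unknown entities are left as they are in this sketch.
            String replacement = (latex != null) ? latex : m.group();
            // quoteReplacement keeps backslashes and '$' literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```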

processAttributes

private void processAttributes(ElementStart element)
                        throws java.io.IOException
Processes HTML elements' attributes. "Title" and "cite" attributes are converted to footnotes.

Parameters:
element - HTML start tag
Throws:
java.io.IOException - when output error occurs
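
As a rough illustration of the footnote conversion, the sketch below turns "title" and "cite" attribute values into \footnote commands. The class AttributeFootnoteSketch, the use of a plain Map for attributes, and the title-before-cite order are assumptions. A real implementation would presumably also escape LaTeX special characters in the values, which this sketch omits.

```java
import java.util.Map;

// Hypothetical sketch: "title" and "cite" attributes become LaTeX footnotes.
class AttributeFootnoteSketch {
    static String footnotes(Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder();
        // Assumed order: title first, then cite.
        for (String name : new String[] {"title", "cite"}) {
            String value = attributes.get(name);
            if (value != null && !value.isEmpty()) {
                sb.append("\\footnote{").append(value).append("}");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints \footnote{World Wide Web}
        System.out.println(footnotes(Map.of("title", "World Wide Web")));
    }
}
```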

cssStyleStart

public void cssStyleStart(ElementStart e)
                   throws java.io.IOException
Prints CSS style converted to LaTeX command. Called when HTML start element is reached.

Parameters:
e - HTML start element
Throws:
java.io.IOException - when output error occurs

cssStyleEnd

public void cssStyleEnd(ElementStart e)
                 throws java.io.IOException
Prints CSS style converted to LaTeX command. Called when HTML end element is reached.

Parameters:
e - corresponding HTML start element
Throws:
java.io.IOException - when output error occurs

findStyles

private CSSStyle[] findStyles(ElementStart e)
Finds styles for the specified element.

Parameters:
e - HTML element
Returns:
array of styles in this order: element name style, 'class' style, 'id' style (null is stored in the array for any style that is not found)
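
The fixed three-slot layout of the returned array can be sketched as follows. Everything here is a stand-in: the class FindStylesSketch is hypothetical, styles are represented as plain strings instead of CSSStyle objects, and the selector keys ("p", ".note", "#intro") are an assumed way of storing the stylesheet.

```java
import java.util.Map;

// Hypothetical sketch: three-slot style lookup (element name, class, id).
class FindStylesSketch {
    static String[] findStyles(Map<String, String> styles,
                               String elementName, String cssClass, String id) {
        String[] found = new String[3];
        found[0] = styles.get(elementName);                            // element name style
        found[1] = (cssClass == null) ? null : styles.get("." + cssClass); // 'class' style
        found[2] = (id == null) ? null : styles.get("#" + id);         // 'id' style
        return found; // slots stay null when no matching style exists
    }
}
```

Callers can then rely on the index alone to tell which kind of selector matched, at the cost of checking each slot for null.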

anchorStart

public void anchorStart(ElementStart e)
                 throws java.io.IOException,
                        NoItemException
Called when A start element is reached.

Parameters:
e - start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration

anchorEnd

public void anchorEnd(ElementEnd element,
                      ElementStart es)
               throws java.io.IOException,
                      NoItemException
Called when A end element is reached.

Parameters:
element - corresponding end tag
es - start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration

tableRowStart

public void tableRowStart(ElementStart e)
                   throws java.io.IOException,
                          NoItemException
Called when TR start element is reached.

Parameters:
e - start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration

tableRowEnd

public void tableRowEnd(ElementEnd e,
                        ElementStart es)
                 throws java.io.IOException
Called when TR end element is reached.

Parameters:
e - corresponding end tag
es - start tag
Throws:
java.io.IOException - output error occurs

tableCellStart

public void tableCellStart(ElementStart e)
                    throws java.io.IOException,
                           NoItemException
Called when TD start element is reached.

Parameters:
e - start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration

tableCellEnd

public void tableCellEnd(ElementEnd element,
                         ElementStart e)
                  throws java.io.IOException,
                         NoItemException
Called when TD end element is reached.

Parameters:
element - corresponding end tag
e - start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration

tableStart

public void tableStart(ElementStart e)
                throws java.io.IOException,
                       NoItemException
Called when TABLE start element is reached.

Parameters:
e - start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration

tableEnd

public void tableEnd(ElementEnd e,
                     ElementStart es)
              throws java.io.IOException,
                     NoItemException
Called when TABLE end element is reached.

Parameters:
e - corresponding end tag
es - start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration

bodyStart

public void bodyStart(ElementStart es)
               throws java.io.IOException,
                      NoItemException
Called when BODY start element is reached.

Parameters:
es - start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration

imgStart

public void imgStart(ElementStart es)
              throws java.io.IOException,
                     NoItemException
Called when IMG start element is reached.

Parameters:
es - start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration

metaStart

public void metaStart(ElementStart es)
               throws java.io.IOException,
                      NoItemException
Called when META start element is reached. Recognizes basic charsets (cp1250, utf8, latin2).

Parameters:
es - start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration
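
The names cp1250, utf8 and latin2 match option names of the LaTeX inputenc package, so one plausible use of the recognized charset is to select that option. The sketch below assumes this, and also assumes which HTML charset labels map to which option. The class CharsetSketch and the method inputencFor are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: mapping a META content value to an inputenc option.
class CharsetSketch {
    // Assumed mapping from HTML charset labels to inputenc option names.
    private static final Map<String, String> INPUTENC = Map.of(
            "windows-1250", "cp1250",
            "utf-8", "utf8",
            "iso-8859-2", "latin2");

    // Takes e.g. "text/html; charset=utf-8"; returns null if nothing is recognized.
    static String inputencFor(String content) {
        String lower = content.toLowerCase(Locale.ROOT);
        int idx = lower.indexOf("charset=");
        if (idx < 0) {
            return null;
        }
        String charset = lower.substring(idx + "charset=".length()).trim();
        int end = charset.indexOf(';');
        if (end >= 0) {
            charset = charset.substring(0, end).trim();
        }
        String option = INPUTENC.get(charset);
        return (option == null) ? null : "\\usepackage[" + option + "]{inputenc}";
    }
}
```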

fontStart

public void fontStart(ElementStart es)
               throws java.io.IOException,
                      NoItemException
Called when FONT start element is reached.

Parameters:
es - start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration

fontEnd

public void fontEnd(ElementEnd e,
                    ElementStart es)
             throws java.io.IOException,
                    NoItemException
Called when FONT end element is reached.

Parameters:
e - corresponding end tag
es - start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration

bodyEnd

public void bodyEnd(ElementEnd element,
                    ElementStart es)
             throws java.io.IOException,
                    NoItemException
Called when BODY end element is reached.

Parameters:
element - corresponding end tag
es - start tag
Throws:
java.io.IOException - output error occurs
NoItemException - tag not found in the configuration