Table Classification: An Application of Machine Learning to Web-hosted Financial Documents
This paper presents learning-based techniques that support the processing of tables in HTML publications. We are especially concerned with classifying tables by format and content, focusing on the domain of corporate financials. We present performance results for multiple classification methods and make several novel methodological contributions: a new evaluation corpus, a clever technique for creating the corpus, and an exhaustive approach to sensitivity analysis of classification features.
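
As an illustration only, the sketch below shows one way an exhaustive sensitivity analysis over classification features could be set up: every non-empty subset of features is scored by leave-one-out accuracy. The abstract does not specify the paper's procedure, so everything here is an assumption made for the example: the feature names, the toy data, the nearest-centroid classifier, and the function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical feature names; the paper's actual features are not given here.
FEATURES = ("numeric_ratio", "has_currency", "has_year_header")

# Made-up toy data: (feature vector, label). Not drawn from the paper's corpus.
DATA = [
    ((0.9, 1, 1), "financial"),
    ((0.8, 1, 0), "financial"),
    ((0.7, 0, 1), "financial"),
    ((0.2, 0, 0), "layout"),
    ((0.1, 0, 1), "layout"),
    ((0.3, 1, 0), "layout"),
]


def centroid(rows, idx):
    """Mean of the selected feature columns over a list of vectors."""
    return [sum(r[i] for r in rows) / len(rows) for i in idx]


def predict(train, x, idx):
    """Nearest-centroid prediction using only the features in idx."""
    by_label = {}
    for vec, label in train:
        by_label.setdefault(label, []).append(vec)
    best_label, best_dist = None, float("inf")
    for label, rows in sorted(by_label.items()):
        c = centroid(rows, idx)
        dist = sum((x[i] - cj) ** 2 for i, cj in zip(idx, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(data, idx):
    """Leave-one-out accuracy for one feature subset."""
    hits = 0
    for k, (x, y) in enumerate(data):
        train = data[:k] + data[k + 1:]
        hits += predict(train, x, idx) == y
    return hits / len(data)


def sensitivity(data, features):
    """Score every non-empty feature subset (2^n - 1 subsets in total)."""
    results = {}
    for size in range(1, len(features) + 1):
        for idx in combinations(range(len(features)), size):
            names = tuple(features[i] for i in idx)
            results[names] = loo_accuracy(data, idx)
    return results


if __name__ == "__main__":
    for names, acc in sensitivity(DATA, FEATURES).items():
        print(names, round(acc, 2))
```

Enumerating all subsets is only practical for a small feature set, since the number of subsets doubles with each added feature.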