Novelty Discovery with Heterogeneous Features

Jul 1, 2010

By Ken Samuel, Peter Mork, Dr. Adriane Chapman, David Moore, Irina Vayndiner, Erik Sax

This paper presents experiments with a unique machine learning method called Cross-Feature Analysis, a novelty discovery method that easily accommodates heterogeneous features. Our domain is database security, and the goal is to detect both attacks that resemble those seen in the past and completely novel attacks that have not yet been seen. The training data consists of database logs that contain no attacks, so supervised machine learning methods cannot be applied. Unsupervised machine learning methods are also unsatisfactory, because our data has a variety of feature types, including numerical, categorical, and set-valued features. Cross-Feature Analysis, however, transforms our novelty discovery problem into multiple supervised machine learning problems, building one submodel for each feature by treating that feature as the class to be predicted from the remaining features. New instances are then analyzed by the submodels to determine whether they are consistent (legitimate) or anomalous (suspicious). In our experiments, we found that by setting a limit on the number of submodels that reject an instance, our system distinguishes legitimate instances from attacks with perfect (100%) recall of real attacks and 99.9% specificity on legitimate instances for one data set, and with 97.2% recall and 99.9% specificity on a second data set.
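The following Python sketch illustrates the general Cross-Feature Analysis idea summarized above: one submodel per feature, each predicting its feature from the others, plus a limit on how many submodels may reject an instance before it is flagged. It is not the authors' implementation. It rests on several assumptions not taken from the paper: all features are categorical (numerical features would need discretization, and set-valued features are not handled), each submodel is a small naive Bayes classifier, and a submodel "rejects" an instance when its most probable value for its target feature differs from the observed value. The class names, the `max_rejections` parameter, and the toy log records are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSubmodel:
    """Hypothetical submodel: predicts feature `target` from all other features."""

    def __init__(self, target, n_features):
        self.target = target
        self.others = [k for k in range(n_features) if k != target]
        self.class_counts = Counter()           # value of target feature -> count
        self.cond_counts = defaultdict(Counter)  # (other feature, class) -> value counts
        self.vocab = defaultdict(set)            # other feature -> values seen in training

    def fit(self, rows):
        for row in rows:
            c = row[self.target]
            self.class_counts[c] += 1
            for k in self.others:
                self.cond_counts[(k, c)][row[k]] += 1
                self.vocab[k].add(row[k])
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / total)
            for k in self.others:
                # Laplace smoothing; the extra +1 reserves mass for unseen values.
                seen = self.cond_counts[(k, c)][row[k]]
                score += math.log((seen + 1) / (n_c + len(self.vocab[k]) + 1))
            if score > best_score:
                best, best_score = c, score
        return best

    def rejects(self, row):
        # Assumed rejection rule: the observed value is not the top prediction.
        return self.predict(row) != row[self.target]


class CrossFeatureAnalyzer:
    """Builds one submodel per feature and flags instances rejected too often."""

    def __init__(self, max_rejections):
        self.max_rejections = max_rejections  # hypothetical limit on rejections

    def fit(self, rows):
        n_features = len(rows[0])
        self.submodels = [NaiveBayesSubmodel(j, n_features).fit(rows)
                          for j in range(n_features)]
        return self

    def rejection_count(self, row):
        return sum(m.rejects(row) for m in self.submodels)

    def is_anomalous(self, row):
        return self.rejection_count(row) > self.max_rejections


def recall_and_specificity(is_attack, flagged):
    """Recall on attacks and specificity on legitimate instances.
    Assumes at least one attack and one legitimate instance (no zero-division guard)."""
    pairs = list(zip(is_attack, flagged))
    tp = sum(a and f for a, f in pairs)
    fn = sum(a and not f for a, f in pairs)
    tn = sum(not a and not f for a, f in pairs)
    fp = sum(not a and f for a, f in pairs)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical attack-free training log: (user, table, operation).
    train = [("alice", "payroll", "SELECT"), ("alice", "payroll", "SELECT"),
             ("bob", "inventory", "INSERT"), ("bob", "inventory", "INSERT")]
    cfa = CrossFeatureAnalyzer(max_rejections=1).fit(train)
    test = [("alice", "payroll", "SELECT"), ("bob", "payroll", "DELETE")]
    flagged = [cfa.is_anomalous(r) for r in test]
    print(flagged)                                          # [False, True]
    print(recall_and_specificity([False, True], flagged))  # (1.0, 1.0)
```

In this toy run, the familiar record is accepted by all three submodels, while the unfamiliar one is rejected by all three and therefore exceeds the limit of one rejection. Raising or lowering `max_rejections` trades recall against specificity, which is the kind of tuning the abstract describes.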