Automated Text Summarization: A Review and Recommendations

By Steven Shearing, Abigail Gertner, Benjamin Wellner, Ph.D., Liz Merkhofer

This report examines and evaluates modern approaches to natural language text summarization. Recommendations are given for applying automated summarization in various problem domains.


This report presents an examination of a wide variety of automatic summarization models. We broadly divide summarization models into two overarching categories: extractive and abstractive summarization. Extractive summarization reduces the summarization problem to a subset selection problem by returning portions of the input as the summary. This allows extractive models to be simpler, as they do not require the ability to generate new language.
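As a minimal sketch of this extractive framing (not one of the models evaluated in this report), the following Python snippet scores each sentence with a simple, illustrative word-frequency heuristic and returns the top-scoring sentences as the summary:

```python
# Minimal sketch of extractive summarization as subset selection:
# score each sentence with a simple heuristic (average word frequency),
# then return the top-k sentences in their original order.
# The scoring heuristic is illustrative only.
from collections import Counter
import re

def extractive_summary(text: str, k: int = 3) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freqs = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freqs[t] for t in tokens) / (len(tokens) or 1)

    # Select the k best sentences, then restore document order.
    top = set(sorted(sentences, key=score, reverse=True)[:k])
    return " ".join(s for s in sentences if s in top)

print(extractive_summary("Some long document text goes here. ...", k=2))
```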

Within the extractive models, we focus primarily on graph-based solutions and recurrent neural networks; however, this is not an exhaustive survey of all extractive approaches. Optimization-based systems, such as submodular optimization and semantic volume maximization, along with other pre-graph approaches, can be viable solutions but were left outside the scope of this report.
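A TextRank-style sketch of the graph-based approach is shown below: it builds a sentence-similarity graph and ranks sentences with PageRank. The use of scikit-learn and networkx here is an illustrative assumption, not a description of the specific systems evaluated in the report:

```python
# TextRank-style graph-based extractive summarization (sketch only):
# build a sentence-similarity graph and rank sentences with PageRank.
# No labeled training data is required.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def textrank_summary(sentences: list[str], k: int = 3) -> str:
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(tfidf)        # pairwise sentence similarity
    graph = nx.from_numpy_array(sim)      # nodes = sentences, edge weights = similarity
    scores = nx.pagerank(graph)           # unsupervised importance scores
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])             # restore original document order
    return " ".join(sentences[i] for i in keep)
```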

For abstractive systems, we examine recurrent neural networks and their successor, the Transformer. Unlike extractive systems, abstractive methods generate entirely new text given the input to be summarized, much as a human might. As a result, these models often generate summaries that appear more natural.
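As an illustration of how such a system might be used in practice, the following sketch runs a publicly available pretrained summarizer through the Hugging Face transformers pipeline; the specific model name is an example and not necessarily one evaluated in this report:

```python
# Abstractive summarization with a pretrained Transformer via the
# Hugging Face `transformers` pipeline. The model checkpoint below is
# one publicly available summarizer, used here only for illustration.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

document = "..."  # text to summarize
result = summarizer(document, max_length=130, min_length=30, do_sample=False)
print(result[0]["summary_text"])  # newly generated text, not extracted spans
```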

This comes at the cost of increased model capacity requirements, as the model must learn not only to identify important text but also to generate new text. It can also lead to larger failures, since errors can now occur in text generation as well as in text identification. As with extractive systems, the models we examine do not represent the entirety of all abstractive solutions; however, our choice of models for both extractive and abstractive summarization covers a majority of the current solution space.

Overall, we find a trade-off between computational resources and summarization quality as we progress from the early graph models to the most recent Transformer models. Graph algorithms are unsupervised, requiring little to no training data, and are often usable out of the box with no specialized hardware requirements. However, they have a low performance ceiling and will generally not match the quality of summaries generated by later models.

Despite this low ceiling, their unsupervised nature makes graph algorithms a desirable solution for a number of problem domains, particularly when the user does not have in-domain labeled training data.

On the other hand, Transformer models produce extremely high-quality output but have a number of requirements that make them difficult to use in general. Transformer models must be pre-trained on large amounts of unlabeled data to learn strong language generation, and they must be fine-tuned on labeled summarization data, which may not exist for many domains. Because of their immense size, specialized GPU hardware is required to train and run these models. If a particular use case meets all the requirements of a Transformer, it is highly recommended to use one.
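For readers whose use case does meet these requirements, the following skeleton sketches how fine-tuning a pretrained seq2seq Transformer on labeled summarization data might look with the Hugging Face transformers and datasets libraries. The checkpoint, dataset, and hyperparameters are illustrative assumptions, not the report's experimental setup:

```python
# Skeleton for fine-tuning a pretrained seq2seq Transformer on labeled
# (document, summary) pairs. Checkpoint, dataset, and hyperparameters
# are illustrative; real use requires in-domain data and GPU hardware.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

checkpoint = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

raw = load_dataset("cnn_dailymail", "3.0.0")  # example labeled corpus

def preprocess(batch):
    inputs = tokenizer(batch["article"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["highlights"], max_length=128, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = raw.map(preprocess, batched=True)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="summarizer",
                                  per_device_train_batch_size=4,
                                  num_train_epochs=1,
                                  predict_with_generate=True),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```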

Recurrent neural networks provide a middle ground between the graph algorithms and Transformer models. While not fully unsupervised, recurrent neural networks require much less training data than Transformers and can often be trained on datasets that are simply too small to be usable for a Transformer. They also require specialized GPU hardware to train, but they are less computationally expensive, can be trained faster on cheaper and smaller GPUs, and may not require a GPU at all to run inference sufficiently fast.
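The following PyTorch sketch shows the basic recurrent encoder-decoder architecture underlying such models, without the attention or pointer mechanisms that practical summarizers typically add; all dimensions and the vocabulary size are illustrative:

```python
# Minimal PyTorch sketch of a recurrent encoder-decoder summarizer.
# Practical systems add attention/pointer mechanisms; dimensions are
# illustrative only.
import torch
import torch.nn as nn

class Seq2SeqSummarizer(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, source_ids, target_ids):
        # Encode the document into a fixed-size hidden state.
        _, hidden = self.encoder(self.embed(source_ids))
        # Decode the summary conditioned on that state (teacher forcing).
        dec_out, _ = self.decoder(self.embed(target_ids), hidden)
        return self.out(dec_out)  # per-token vocabulary logits

model = Seq2SeqSummarizer()
src = torch.randint(0, 10000, (2, 50))   # batch of 2 "documents"
tgt = torch.randint(0, 10000, (2, 12))   # batch of 2 "summaries"
logits = model(src, tgt)                 # shape: (2, 12, vocab_size)
```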