Is it the Language Model in Language Modeling?

By Dr. Warren Greiff

This position paper explores the question of whether or not it is the use of language models, per se, that accounts for the recent surge of interest in what has come to be called Language Modeling in Information Retrieval (LMIR). We conjecture that, for the most part, the answer is no. We suggest instead that the principal contribution of Language Modeling is that it makes patent the following: the use of term frequencies in document evaluation can advantageously be viewed as statistical parameter estimation; and, in so doing, Language Modeling (LM) approaches, explicitly or implicitly, address the role of variance reduction in producing models that result in improved retrieval performance. We further suggest that recognition of the importance of estimation variance will have a beneficial effect on the continued development of theoretical foundations for Information Retrieval. To support a more precise formulation of this question and discussion of the relevant issues, we begin with a formal definition of "language model". We then propose a description of what "the Language Modeling approach" can be thought to consist of in the context of IR research, first with a strict interpretation in mind and then with a more informal view. We conclude with a brief exposition of research into the role of variance reduction in IR recently begun at The MITRE Corporation.