Big Data Quality Case Study Preliminary Findings

By David Becker , Patricia King , William McMullen , Lisa Lalis , David Bloom , Dr. Ali Obaidi , Donna Fickett

A set of four case studies related to data quality in the context of the management and use of Big Data are being performed and reported separately; these will also be compiled into a summary overview report.

Download Resources


PDF Accessibility

One or more of the PDF files on this page fall under E202.2 Legacy Exceptions and may not be completely accessible. You may request an accessible version of a PDF using the form on the Contact Us page.

A set of four case studies related to data quality in the context of the management and use of Big Data are being performed and reported separately; these will also be compiled into a summary overview report. The report herein documents one of those four cases studies.

The purpose of this document is to present information about the various data quality issues related to the design, implementation and operation of a specific data initiative, the U.S. Army's Medical Command (MEDCOM) Medical Operational Data System (MODS) project. While MODS is not currently a Big Data initiative, potential future Big Data requirements under consideration (in the areas of geospatial data, document and records data, and textual data) could easily move MODS into the realm of Big Data. Each of these areas has its own data quality issues that must be considered. By better understanding the data quality issues in these Big Data areas of growth, we hope to explore specific differences in the nature and type of Big Data quality problems from what is typically experienced in traditionally sized data sets. This understanding should facilitate the acquisition of the MODS data warehouse though improvements in the requirements and downstream design efforts. It should also enable the crafting of better strategies and tools for profiling, measurement, assessment, and action processing of Big Data Quality problems.