Analyzing and Revising Mediated Schemas to Improve Their Matchability
Data integration systems often provide a uniform interface, called a mediated schema, to a multitude of disparate data sources. To answer user queries posed over the mediated schema, such systems employ a set of semantic matches between this schema and the local schemas of the data sources. Finding such matches is well known to be difficult. Hence, much work has focused on developing semi-automatic techniques to find the matches efficiently. In this paper, however, we consider the complementary problem of improving the mediated schema itself so that such matches become easier to find. Specifically, a mediated schema S will typically be matched with many source schemas. Can the developer of S therefore analyze and revise S in a way that preserves S's semantics and yet makes S easier to match against in the future? We describe mSeer, a solution to this problem. Given a mediated schema S, mSeer first computes a matchability score that quantifies how well S can be matched against. Next, mSeer generates a matchability report that shows where the problems in matching S come from. Finally, mSeer automatically suggests changes to S (e.g., renaming an attribute or reformatting data values) that it believes will preserve the semantics of S and yet make it more amenable to matching. The creator of S is free to accept or revise the changes suggested by mSeer. We present extensive experiments over several real-world domains that demonstrate the effectiveness of our approach.
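To make the idea of a matchability score and report concrete, the following is a minimal sketch in Python. It is not mSeer's method, which the abstract does not specify. It assumes a toy definition in which an attribute is "at risk" when its name is highly similar to another attribute name in the same schema, so that a name-based matcher could confuse the two. The function names, the similarity measure, and the 0.7 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] between two attribute names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matchability_report(attributes, threshold=0.7):
    """Return (score, confusable_pairs) for a list of attribute names.

    Toy definition (an assumption, not mSeer's): an attribute is "at risk"
    if its name is at least `threshold`-similar to another attribute name
    in the same schema. The score is the fraction of attributes not at risk.
    """
    # The "report": every pair of names that a name-based matcher might confuse.
    confusable = [
        (a, b, round(name_similarity(a, b), 2))
        for a, b in combinations(attributes, 2)
        if name_similarity(a, b) >= threshold
    ]
    at_risk = {name for a, b, _ in confusable for name in (a, b)}
    score = 1.0 - len(at_risk) / len(attributes) if attributes else 1.0
    return score, confusable


# Hypothetical mediated schema with two easily confused attribute names.
original = ["name", "name1", "price", "address"]
score_before, pairs_before = matchability_report(original)

# A revision of the kind the abstract mentions: renaming an attribute.
revised = ["name", "contact_person", "price", "address"]
score_after, pairs_after = matchability_report(revised)

print(score_before, pairs_before)
print(score_after, pairs_after)
```

In this toy setting, renaming the ambiguous attribute raises the score because no pair of names remains above the similarity threshold. A real system would also have to check that a suggested change preserves the schema's semantics, which this sketch does not attempt.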