Transformation-based Framework for Record Matching
- Arvind Arasu,
- Surajit Chaudhuri,
- Raghav Kaushik
Proceedings of the 24th International Conference on Data Engineering, ICDE 2008
Published by IEEE Computer Society
Today’s record matching infrastructure offers no flexible way to account for synonyms such as “Robert” and “Bob”, which refer to the same name, or for more general string transformations such as abbreviations. We propose a programmatic framework for record matching that takes such user-defined string transformations as input. To the best of our knowledge, this is the first proposal for such a framework. The transformational framework, while expressive, poses significant computational challenges, which we address. We empirically evaluate our techniques over real data.
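To make the idea of matching under user-defined transformations concrete, here is a minimal Python sketch. It is an illustration only, not the algorithm from the paper: the rule table `RULES`, the helper names `variants`, `jaccard`, and `transformed_similarity`, the whitespace tokenization, and the choice of Jaccard similarity are all assumptions introduced for this example.

```python
from itertools import product

# Hypothetical user-defined transformations: token -> alternative tokens.
RULES = {
    "bob": ["robert"],
    "st": ["street"],
    "corp": ["corporation"],
}

def variants(record, rules):
    """Return every token set obtained by optionally rewriting each token."""
    tokens = record.lower().split()
    # Each token may stay as it is or be replaced by one of its alternatives.
    choices = [[t] + rules.get(t, []) for t in tokens]
    return {frozenset(combo) for combo in product(*choices)}

def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def transformed_similarity(r, s, rules):
    """Best Jaccard similarity over all rewritten variants of r and s."""
    return max(
        jaccard(a, b)
        for a in variants(r, rules)
        for b in variants(s, rules)
    )

# Without transformations the two names share only one token out of three.
plain = jaccard(frozenset({"bob", "smith"}), frozenset({"robert", "smith"}))
# With the rule bob -> robert, one variant of the first record matches exactly.
best = transformed_similarity("Bob Smith", "Robert Smith", RULES)
```

In this sketch the number of variants grows exponentially with the number of tokens that have applicable rules, which hints at why a transformation-based formulation raises the computational challenges the abstract mentions.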
Copyright © 2007 IEEE. Reprinted from IEEE Computer Society. This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.