PC-filter: a Robust filtering technique for duplicate record detection in large databases
Paper/Presentation Title | PC-filter: a Robust filtering technique for duplicate record detection in large databases |
---|---|
Presentation Type | Paper |
Authors | Zhang, Ji (Author), Ling, Tok Wang (Author), Bruckner, Robert (Author) and Liu, Han (Author) |
Editors | Galindo, Fernando, Takizawa, Makoto and Traunmuller, Roland |
Journal or Proceedings Title | Lecture Notes in Computer Science (Book series) |
Journal Citation | 3180, pp. 486-496 |
Number of Pages | 11 |
Year | 2004 |
Publisher | Springer |
Place of Publication | Germany |
ISSN | 1611-3349, 0302-9743 |
ISBN | 9783540229360 |
Digital Object Identifier (DOI) | https://doi.org/10.1007/978-3-540-30075-5_47 |
Web Address (URL) of Paper | https://link.springer.com/chapter/10.1007/978-3-540-30075-5_47 |
Conference/Event | 15th International Conference on Database and Expert Systems Applications (DEXA'04) |
Event Details | 15th International Conference on Database and Expert Systems Applications (DEXA'04). Event date: 30 Aug 2004 to 03 Sep 2004. Event location: Zaragoza, Spain |
Abstract | In this paper, we propose PC-Filter (PC stands for Partition Comparison), a robust data filter for approximate duplicate record detection in large databases. PC-Filter distinguishes itself from all existing methods by using the notion of partition in duplicate detection. It first sorts the whole database and splits the sorted database into a number of record partitions. The Partition Comparison Graph (PCG) is then constructed by performing fast partition |
Keywords | PC-filter; partition comparison filter; duplicate record detection |
ANZSRC Field of Research 2020 | 469999. Other information and computing sciences not elsewhere classified |
Public Notes | File reproduced in accordance with the copyright policy of the publisher/author. |
Byline Affiliations | University of Toronto, Canada; National University of Singapore; Microsoft, United States |
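
The first two steps named in the abstract (sorting the database, then splitting the sorted records into partitions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sort_and_partition`, the choice of sort key, and the fixed `partition_size` are all assumptions made for illustration, and the Partition Comparison Graph step is left out because the abstract text above is cut off before describing it.

```python
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")


def sort_and_partition(
    records: Sequence[R],
    key: Callable[[R], str],
    partition_size: int,
) -> List[List[R]]:
    """Sort records by a key, then cut the sorted list into consecutive chunks.

    Hypothetical helper: fixed-size chunking is an assumption, not
    something the abstract specifies.
    """
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    # Sorting brings records with similar keys next to each other.
    ordered = sorted(records, key=key)
    # Consecutive slices of the sorted list form the record partitions.
    return [
        ordered[i:i + partition_size]
        for i in range(0, len(ordered), partition_size)
    ]


if __name__ == "__main__":
    names = ["smith john", "smyth john", "adams ann", "adam ann", "brown bob"]
    for partition in sort_and_partition(names, key=str.lower, partition_size=2):
        print(partition)
```

With a well-chosen key, near-duplicate records tend to land in the same or neighbouring partitions, which is what makes partition-level comparison a plausible filter before any expensive record-level matching.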
https://research.usq.edu.au/item/9z2q7/pc-filter-a-robust-filtering-technique-for-duplicate-record-detection-in-large-databases