Efficient and effective filtering of duplication detection in large database applications
Article
Article Title | Efficient and effective filtering of duplication detection in large database applications |
---|---|
ERA Journal ID | 32139 |
Article Category | Article |
Authors | |
Author | Zhang, Ji |
Journal Title | Journal of Software |
Journal Citation | 7 (11), pp. 2424-2436 |
Number of Pages | 13 |
Year | 2012 |
Place of Publication | Oulu, Finland |
ISSN | 1796-217X |
Digital Object Identifier (DOI) | https://doi.org/10.4304/jsw.7.11.2424-2436 |
Web Address (URL) | http://ojs.academypublisher.com/index.php/jsw/article/view/jsw071124242436/5879 |
Abstract | In this paper, a robust filtering technique called PC-Filter (PC stands for partition comparison) is proposed for effective and efficient duplicate record detection in large databases. PC-Filter distinguishes itself from all existing methods by using record partitions in duplicate detection. PC-Filter operates in three steps. It first sorts the whole database and splits the sorted database into a number of record partitions. The Partition Comparison Graph (PCG) is then generated by performing fast partition pruning. Finally, duplicate records are effectively detected through internal |
Keywords | filtering; duplicate record detection; database management; pattern recognition |
ANZSRC Field of Research 2020 | 469999. Other information and computing sciences not elsewhere classified |
Public Notes | Files associated with this item cannot be displayed due to copyright restrictions. |
Byline Affiliations | Centre for Systems Biology |
Institution of Origin | University of Southern Queensland |
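
The first step named in the abstract (sort the database, then split the sorted records into partitions) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `sort_and_partition`, the fixed `partition_size` parameter, and the use of plain string ordering as the sort key are all assumptions made for illustration. The later steps (building the Partition Comparison Graph and detecting duplicates) are not sketched, since this record does not describe them in enough detail.

```python
from typing import List, Sequence


def sort_and_partition(records: Sequence[str], partition_size: int) -> List[List[str]]:
    """Sort records, then split the sorted list into consecutive partitions.

    Hypothetical helper: the fixed partition size and the lexicographic
    sort key are illustrative assumptions, not details taken from the paper.
    """
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    ordered = sorted(records)  # lexicographic order on the raw record strings
    # Consecutive slices of the sorted list; the last partition may be shorter.
    return [
        ordered[start:start + partition_size]
        for start in range(0, len(ordered), partition_size)
    ]


if __name__ == "__main__":
    sample = ["smith, john", "adams, amy", "smith, jon", "brown, bob", "adams, amie"]
    for index, partition in enumerate(sort_and_partition(sample, 2)):
        print(index, partition)
```

Sorting first places records with similar leading characters next to each other, so near-duplicates tend to fall into the same or neighbouring partitions.
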
https://research.usq.edu.au/item/q1v09/efficient-and-effective-filtering-of-duplication-detection-in-large-database-applications