High average-utility sequential pattern mining based on uncertain databases
Article Title | High average-utility sequential pattern mining based on uncertain databases |
---|---|
ERA Journal ID | 18060 |
Article Category | Article |
Authors | Lin, Jerry Chun-Wei (Author), Li, Ting (Author), Pirouz, Matin (Author), Zhang, Ji (Author) and Fournier-Viger, Philippe (Author) |
Journal Title | Knowledge and Information Systems |
Journal Citation | 62 (3), pp. 1199-1228 |
Number of Pages | 30 |
Year | 2020 |
Place of Publication | London, United Kingdom |
ISSN | 0219-1377, 0219-3116 |
Digital Object Identifier (DOI) | https://doi.org/10.1007/s10115-019-01385-8 |
Abstract | The emergence and proliferation of internet of things (IoT) devices have resulted in the generation of big and uncertain data due to the varied accuracy and decay of sensors and their different sensitivity ranges. Since data uncertainty plays an important role in IoT data, mining useful information from uncertain datasets has become an important issue in recent decades. Past works focus on mining high-utility sequential patterns from uncertain databases. However, the utility of a derived sequence increases along with the size of the sequence, which makes it an unfair measure for evaluating a sequence, since any combination of a high-utility sequence will also be a high-utility sequence, even when the utility of the sequence itself is low. In this paper, we address this limitation of previous potential high-utility sequential pattern mining and present a potentially high average-utility sequential pattern mining framework for discovering the set of potentially high average-utility sequential patterns (PHAUSPs) from uncertain datasets by considering the size of a sequence, which provides a fairer measure of the patterns than previous works. First, a baseline potentially high average-utility sequential pattern algorithm and three pruning strategies are introduced to completely mine the set of desired PHAUSPs. To reduce the computational cost and accelerate the mining process, a projection algorithm called PHAUP is then designed, which reduces the number of candidates for the desired patterns. Several experiments in terms of runtime, number of candidates, memory overhead, number of discovered patterns, and scalability are then conducted on both real-life and artificial datasets, and the results show that the proposed algorithms achieve promising performance, especially the PHAUP approach. |
Keywords | data mining, high average-utility sequential pattern mining, sequential patterns, uncertain database |
ANZSRC Field of Research 2020 | 469999. Other information and computing sciences not elsewhere classified |
Public Notes | Files associated with this item cannot be displayed due to copyright restrictions. |
Institution of Origin | University of Southern Queensland |
Byline Affiliations | Western Norway University of Applied Sciences, Norway; Harbin Institute of Technology, China; California State University Fullerton, United States; School of Agricultural, Computational and Environmental Sciences |
https://research.usq.edu.au/item/q59y8/high-average-utility-sequential-pattern-mining-based-on-uncertain-databases