Method and dataset entity mining in scientific literature: A CNN + BiLSTM model with self-attention
Article Title | Method and dataset entity mining in scientific literature: A CNN + BiLSTM model with self-attention |
---|---|
ERA Journal ID | 18062 |
Article Category | Article |
Authors | Hou, Linlin; Zhang, Ji; Wu, Ou; Yu, Ting; Wang, Zhen; Li, Zhao; Gao, Jianliang; Ye, Yingchun and Yao, Rujing |
Journal Title | Knowledge-Based Systems |
Journal Citation (Volume) | 235 |
Article Number | 107621 |
Number of Pages | 14 |
Year | 2022 |
Publisher | Elsevier |
Place of Publication | Netherlands |
ISSN | 0950-7051; 1872-7409 |
Digital Object Identifier (DOI) | https://doi.org/10.1016/j.knosys.2021.107621 |
Web Address (URL) | https://www.sciencedirect.com/science/article/pii/S0950705121008832 |
Abstract | Traditional literature analysis mainly focuses on literature metadata such as topics, authors, keywords, and references, and rarely pays attention to the main content of papers. However, in many scientific domains, such as computing and engineering, the methods and datasets involved in published papers carry important information and are quite useful for domain analysis and recommendation. Method and dataset entities take various forms and are more difficult to recognize than common entities. In this paper, we propose a novel Method and Dataset Entity Recognition model (MDER), which effectively extracts method and dataset entities from the main textual content of scientific papers. The model is the first to combine rule embedding, a parallel Convolutional Neural Network (CNN) structure, and a two-layer Bi-directional Long Short-Term Memory (BiLSTM) with a self-attention mechanism. We evaluate the proposed model on datasets constructed from published papers in different research areas of computer science. Our model performs well across multiple areas and features a good capacity for cross-area learning and recognition. An ablation study indicates that the different modules collectively contribute to the overall entity recognition performance. Data augmentation positively contributes to model training, making our model much more robust. We finally apply the proposed model to PAKDD papers published from 2009 to 2019 to mine insightful results over a long time span. |
Keywords | Literature analysis; Named entity recognition; Methods and datasets mining; CNN+BiLSTM-Attention-CRF structure |
ANZSRC Field of Research 2020 | 460599. Data management and data science not elsewhere classified |
Public Notes | File reproduced in accordance with the copyright policy of the publisher/author. |
Byline Affiliations | Nankai University, China; University of Southern Queensland; Tianjin University, China; Zhejiang Lab, China; Alibaba Group, China; Central South University, China |
https://research.usq.edu.au/item/z0213/method-and-dataset-entity-mining-in-scientific-literature-a-cnn-bilstm-model-with-self-attention
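The abstract describes an architecture that combines rule embedding, a parallel CNN, a two-layer BiLSTM, self-attention, and a CRF decoding layer. The sketch below is a minimal, hypothetical PyTorch illustration of how a CNN + BiLSTM + self-attention tagger of this kind can be wired together. It is not the authors' implementation: all layer names and sizes are assumptions, and the rule-embedding and CRF components of MDER are omitted for brevity.

```python
# Minimal illustrative sketch (not the authors' code) of a CNN + BiLSTM
# sequence tagger with self-attention, loosely following the architecture
# described in the abstract. Layer sizes are arbitrary assumptions.
import torch
import torch.nn as nn

class CnnBiLstmAttnTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Parallel convolutions capture local n-gram features.
        self.convs = nn.ModuleList([
            nn.Conv1d(emb_dim, hidden, kernel_size=k, padding=k // 2)
            for k in (3, 5)
        ])
        # Two-layer BiLSTM models longer-range sequential context.
        self.bilstm = nn.LSTM(
            emb_dim + 2 * hidden, hidden, num_layers=2,
            bidirectional=True, batch_first=True
        )
        # Self-attention lets each token attend to the whole sentence.
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
        # Emission scores per tag; a CRF layer would normally decode these.
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, token_ids):
        x = self.embed(token_ids)                        # (B, T, E)
        c = [conv(x.transpose(1, 2)) for conv in self.convs]
        c = torch.cat(c, dim=1).transpose(1, 2)          # (B, T, 2*H)
        h, _ = self.bilstm(torch.cat([x, c], dim=-1))    # (B, T, 2*H)
        a, _ = self.attn(h, h, h)                        # self-attention over tokens
        return self.out(a)                               # (B, T, num_tags)

# Example usage with dummy data:
model = CnnBiLstmAttnTagger(vocab_size=5000, num_tags=5)
tokens = torch.randint(1, 5000, (2, 20))                 # batch of 2 sentences, 20 tokens
emissions = model(tokens)                                 # (2, 20, 5) tag scores
```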