Identifying informative tweets during a pandemic via a topic‑aware neural language model
Article
Article Title | Identifying informative tweets during a pandemic via a topic‑aware neural language model |
---|---|
ERA Journal ID | 32110 |
Article Category | Article |
Authors | Gao, Wang (Author), Li, Lin (Author), Tao, Xiaohui (Author), Zhou, Jing (Author) and Tao, Jun (Author) |
Journal Title | World Wide Web |
Journal Citation | 26 (1), pp. 55-70 |
Number of Pages | 16 |
Year | 2023 |
Publisher | Springer |
Place of Publication | United States |
ISSN | 1386-145X; 1573-1413 |
Digital Object Identifier (DOI) | https://doi.org/10.1007/s11280-022-01034-1 |
Web Address (URL) | https://link.springer.com/article/10.1007/s11280-022-01034-1 |
Abstract | Every epidemic affects the real lives of many people around the world and leads to terrible consequences. Recently, many tweets about the COVID-19 pandemic have been shared publicly on social media platforms. The analysis of these tweets is helpful for emergency response organizations to prioritize their tasks and make better decisions. However, most of these tweets are non-informative, which is a challenge for establishing an automated system to detect useful information in social media. Furthermore, existing methods ignore unlabeled data and topic background knowledge, which can provide additional semantic information. In this paper, we propose a novel Topic-Aware BERT (TABERT) model to solve the above challenges. TABERT first leverages a topic model to extract the latent topics of tweets. Secondly, a flexible framework is used to combine topic information with the output of BERT. Finally, we adopt adversarial training to achieve semi-supervised learning, and a large amount of unlabeled data can be used to improve inner representations of the model. Experimental results on the dataset of COVID-19 English tweets show that our model outperforms classic and state-of-the-art baselines. |
Keywords | Adversarial training; Informative tweet identification; Social media; Topic model |
ANZSRC Field of Research 2020 | 461101. Adversarial machine learning; 460208. Natural language processing; 461104. Neural networks |
Public Notes | Files associated with this item cannot be displayed due to copyright restrictions. |
Byline Affiliations | Jianghan University, China; Wuhan University of Technology, China; School of Sciences |
Institution of Origin | University of Southern Queensland |
https://research.usq.edu.au/item/q753q/identifying-informative-tweets-during-a-pandemic-via-a-topic-aware-neural-language-model