Debiased Learning of Self-Labeled Twitter Data for User Demographic Prediction
Paper/Presentation Title | Debiased Learning of Self-Labeled Twitter Data for User Demographic Prediction |
---|---|
Presentation Type | Paper |
Authors | Wang, Zhen; Cooley, Madison; Zhang, Yang; Lan, Chao; Zhang, Ji |
Journal or Proceedings Title | Proceedings of the 10th IEEE International Conference on Big Data (2022) |
Journal Citation | pp. 6827-6829 |
Number of Pages | 3 |
Year | 2023 |
Publisher | IEEE (Institute of Electrical and Electronics Engineers) |
Place of Publication | United States |
Digital Object Identifier (DOI) | https://doi.org/10.1109/BigData55660.2022.10020166 |
Web Address (URL) of Paper | https://ieeexplore.ieee.org/document/10020166 |
Web Address (URL) of Conference Proceedings | https://ieeexplore.ieee.org/xpl/conhome/10020192/proceeding |
Conference/Event | Proceedings of the 10th IEEE International Conference on Big Data (2022) |
Event Details | Proceedings of the 10th IEEE International Conference on Big Data (2022); Parent: IEEE International Conference on Big Data; Delivery: In person; Event Date: 17 to 20 Dec 2022; Event Location: Osaka, Japan |
Abstract | Labeling sufficient data for supervised learning remains an open challenge in social network analysis. An alternative is to collect self-labeled data, i.e., data labeled by their owners. Emmery et al. show that standard models can be trained and perform well on self-labeled data, suggesting the effectiveness of this approach. In this paper, we argue that self-labeled data may not be representative of the population. Taking Twitter demographic prediction as an example, we show that the popular FastText model, trained in the standard way on self-labeled data, does not generalize well to random testing samples. We then present a new learner, DeFastText, that aims to correct data bias using the kernel mean matching technique. In experiments, we show that it achieves lower generalization errors than FastText. This research draws attention to the data bias problem when learning from self-labeled data in social network analysis. |
ANZSRC Field of Research 2020 | 460599. Data management and data science not elsewhere classified |
Public Notes | Files associated with this item cannot be displayed due to copyright restrictions. |
Byline Affiliations | Zhejiang Lab, China; University of Utah, United States; University of Oklahoma, United States; School of Mathematics, Physics and Computing |
https://research.usq.edu.au/item/z5900/debiased-learning-of-self-labeled-twitter-data-for-user-demographic-prediction