Generous teacher: Good at distilling knowledge for student learning
Article
Ding, Yifeng, Yang, Gaoming, Yin, Shuting, Zhang, Ji, Fang, Xianjin and Yang, Wencheng. 2024. "Generous teacher: Good at distilling knowledge for student learning." Image and Vision Computing. 150. https://doi.org/10.1016/j.imavis.2024.105199
Article Title | Generous teacher: Good at distilling knowledge for student learning |
---|---|
ERA Journal ID | 1247 |
Article Category | Article |
Authors | Ding, Yifeng, Yang, Gaoming, Yin, Shuting, Zhang, Ji, Fang, Xianjin and Yang, Wencheng |
Journal Title | Image and Vision Computing |
Journal Citation | 150 |
Article Number | 105199 |
Number of Pages | 16 |
Year | 2024 |
Publisher | Elsevier |
Place of Publication | Netherlands |
ISSN | 0262-8856 (print); 1872-8138 (online) |
Digital Object Identifier (DOI) | https://doi.org/10.1016/j.imavis.2024.105199 |
Web Address (URL) | https://www.sciencedirect.com/science/article/abs/pii/S0262885624003044 |
Abstract | Knowledge distillation is a technique that aims to transfer valuable knowledge from a large, well-trained model (the teacher) to a lightweight model (the student), with the primary goal of improving the student's performance on a given task. In recent years, mainstream distillation methods have focused on modifying student learning styles, resulting in less attention being paid to the knowledge provided by the teacher. However, upon re-examining the knowledge transferred by the teacher, we find that it still has untapped potential, which is crucial to bridging the performance gap between teachers and students. Therefore, we study knowledge distillation from the teacher's perspective and introduce a novel teacher knowledge enhancement method termed “Generous Teacher.” The Generous Teacher is a specially trained teacher model that can provide more valuable knowledge for the student model. This is achieved by integrating a standardly trained teacher (Standard Teacher) to assist in the training process of the Generous Teacher. As a result, the Generous Teacher accomplishes the task at hand and assimilates distilled knowledge from the Standard Teacher, effectively adapting to distillation teaching in advance. Specifically, we recognize that non-target class knowledge plays a crucial role in improving the distillation effect for students. To leverage this, we decouple logit outputs and selectively use the Standard Teacher's non-target class knowledge to enhance the Generous Teacher. By setting the temperature as a multiple of the logit standard deviation, we ensure that the additional knowledge absorbed by the Generous Teacher is more suitable for student distillation. Experimental results on standard benchmarks demonstrate that the Generous Teacher surpasses the Standard Teacher in terms of accuracy when applied to standard knowledge distillation. Furthermore, the Generous Teacher can be seamlessly integrated into existing distillation methods, bringing general improvements at a low additional computational cost. The code will be publicly available at |
Keywords | Absorbing distilled knowledge; Knowledge distillation; Generous teacher; Decouple logit |
Contains Sensitive Content | Does not contain sensitive content |
ANZSRC Field of Research 2020 | 460402. Data and information privacy |
Public Notes | Files associated with this item cannot be displayed due to copyright restrictions. |
Byline Affiliations | Anhui University of Science and Technology, China; School of Mathematics, Physics and Computing |
Permalink | https://research.usq.edu.au/item/z99qz/generous-teacher-good-at-distilling-knowledge-for-student-learning |
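
The abstract describes two concrete mechanisms: transferring only the Standard Teacher's non-target-class knowledge to the Generous Teacher during its training, and setting the distillation temperature as a multiple of the logit standard deviation. The snippet below is a minimal sketch of how such a loss term could look, not the authors' released implementation; the function name, the `temp_scale` multiplier, and the use of a KL divergence over the non-target distribution are assumptions for illustration.

```python
# Minimal sketch (assumptions, not the paper's released code) of a
# non-target-class distillation loss with temperature set as a multiple
# of the per-sample logit standard deviation.
import torch
import torch.nn.functional as F


def nontarget_kd_loss(generous_logits, standard_logits, targets, temp_scale=2.0):
    """KL divergence over non-target classes only.

    generous_logits: logits of the Generous Teacher being trained, shape (B, C).
    standard_logits: logits of the frozen Standard Teacher, shape (B, C).
    targets: ground-truth class indices, shape (B,).
    temp_scale: assumed multiplier; temperature = temp_scale * logit std per sample.
    """
    # Temperature as a multiple of each sample's logit standard deviation.
    t_g = temp_scale * generous_logits.std(dim=1, keepdim=True)
    t_s = temp_scale * standard_logits.std(dim=1, keepdim=True)

    # Mask out the target class so only non-target knowledge is transferred.
    batch = torch.arange(generous_logits.size(0), device=generous_logits.device)
    mask = torch.ones_like(generous_logits, dtype=torch.bool)
    mask[batch, targets] = False
    n_nontarget = generous_logits.size(1) - 1

    g_nt = (generous_logits / t_g)[mask].view(-1, n_nontarget)
    s_nt = (standard_logits / t_s)[mask].view(-1, n_nontarget)

    # KL(Standard Teacher || Generous Teacher) over the non-target classes.
    return F.kl_div(F.log_softmax(g_nt, dim=1),
                    F.softmax(s_nt, dim=1),
                    reduction="batchmean")
```

Under this reading, the term would be added to the Generous Teacher's ordinary cross-entropy objective during its training (e.g. `ce + alpha * nontarget_kd_loss(...)`, with `alpha` a hypothetical weight); the resulting Generous Teacher then stands in for the Standard Teacher in existing distillation methods.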