Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline
Paper
Paper/Presentation Title | Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline |
---|---|
Presentation Type | Paper |
Authors | Wei, Tianqi; Chen, Zhi; Huang, Zi; Yu, Xin |
Journal or Proceedings Title | Proceedings of the 32nd ACM International Conference on Multimedia (MM '24) |
Journal Citation | pp. 1593-1601 |
Number of Pages | 9 |
Year | 2024 |
Publisher | Association for Computing Machinery (ACM) |
Place of Publication | United States |
ISBN | 9798400706868 |
Digital Object Identifier (DOI) | https://doi.org/10.1145/3664647.3680599 |
Web Address (URL) of Paper | https://dl.acm.org/doi/10.1145/3664647.3680599 |
Web Address (URL) of Conference Proceedings | https://dl.acm.org/doi/proceedings/10.1145/3664647 |
Conference/Event | 32nd ACM International Conference on Multimedia (MM '24) |
Event Details | 32nd ACM International Conference on Multimedia (MM '24); Parent: ACM International Conference on Multimedia; Delivery: In person; Event Date: 28 Oct 2024 to 01 Nov 2024; Event Location: Melbourne, Australia |
Abstract | Existing plant disease classification models have achieved remarkable performance in recognizing in-laboratory diseased images. However, their performance often significantly degrades in classifying in-the-wild images. Furthermore, we observed that in-the-wild plant images may exhibit similar appearances across various diseases (i.e., small inter-class discrepancy) while the same diseases may look quite different (i.e., large intra-class variance). Motivated by this observation, we propose an in-the-wild multimodal plant disease recognition dataset that contains not only the largest number of disease classes but also text-based descriptions for each disease. Particularly, the newly provided text descriptions are introduced to provide rich information in textual modality and facilitate in-the-wild disease classification with small inter-class discrepancy and large intra-class variance issues. Therefore, our proposed dataset can be regarded as an ideal testbed for evaluating disease recognition methods in the real world. In addition, we further present a strong yet versatile baseline that models text descriptions and visual data through multiple prototypes for a given class. By fusing the contributions of multimodal prototypes in classification, our baseline can effectively address the small inter-class discrepancy and large intra-class variance issues. Remarkably, our baseline model can not only classify diseases but also recognize diseases in few-shot or training-free scenarios. Extensive benchmarking results demonstrate that our proposed in-the-wild multimodal dataset sets many new challenges to the plant disease recognition task and there is a large space to improve for future works. |
Keywords | Plant disease; Vision language models |
Contains Sensitive Content | Does not contain sensitive content |
ANZSRC Field of Research 2020 | 4602. Artificial intelligence |
Public Notes | Files associated with this item cannot be displayed due to copyright restrictions. |
Byline Affiliations | University of Queensland |
https://research.usq.edu.au/item/zyx47/benchmarking-in-the-wild-multimodal-disease-recognition-and-a-versatile-baseline