Continual Text-to-Video Retrieval with Frame Fusion and Task-Aware Routing

Paper


Zhao, Zecheng, Chen, Zhi, Huang, Zi, Sadiq, Shazia and Chen, Tong. 2025. "Continual Text-to-Video Retrieval with Frame Fusion and Task-Aware Routing." 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25). Padua, Italy 13 - 18 Jul 2025 United States. Association for Computing Machinery (ACM). https://doi.org/10.1145/3726302.3729936
Paper/Presentation Title

Continual Text-to-Video Retrieval with Frame Fusion and Task-Aware Routing

Presentation Type: Paper
Authors: Zhao, Zecheng, Chen, Zhi, Huang, Zi, Sadiq, Shazia and Chen, Tong
Journal or Proceedings Title: Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25)
Journal Citation: pp. 1011-1021
Number of Pages: 11
Year: 2025
Publisher: Association for Computing Machinery (ACM)
Place of Publication: United States
ISBN: 9798400715921
Digital Object Identifier (DOI): https://doi.org/10.1145/3726302.3729936
Web Address (URL) of Paper: https://dl.acm.org/doi/10.1145/3726302.3729936
Web Address (URL) of Conference Proceedings: https://dl.acm.org/doi/proceedings/10.1145/3726302
Conference/Event: 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25)
Event Details
48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25)
Parent
ACM International Conference on Research and Development in Information Retrieval
Delivery
In person
Event Date
13 to end of 18 Jul 2025
Event Location
Padua, Italy
Abstract

Text-to-Video Retrieval (TVR) aims to retrieve relevant videos based on textual queries. However, as video content evolves continuously, adapting TVR systems to new data remains a critical yet under-explored challenge. In this paper, we introduce the first benchmark for Continual Text-to-Video Retrieval (CTVR) to address the limitations of existing approaches. Current Pre-Trained Model (PTM)-based TVR methods struggle with maintaining model plasticity when adapting to new tasks, while existing Continual Learning (CL) methods suffer from catastrophic forgetting, leading to semantic misalignment between historical queries and stored video features. To address these two challenges, we propose FrameFusionMoE, a novel CTVR framework that comprises two key components: (1) the Frame Fusion Adapter (FFA), which captures temporal video dynamics while preserving model plasticity, and (2) the Task-Aware Mixture-of-Experts (TAME), which ensures consistent semantic alignment between queries across tasks and the stored video features. Thus, FrameFusionMoE enables effective adaptation to new video content while preserving historical text-video relevance to mitigate catastrophic forgetting. We comprehensively evaluate FrameFusionMoE on two benchmark datasets under various task settings. Results demonstrate that FrameFusionMoE outperforms existing CL and TVR methods, achieving superior retrieval performance with minimal degradation on earlier tasks when handling continuous video streams. Our code is available at: https://github.com/JasonCodeMaker/CTVR
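
The abstract describes retrieval against stored video features and degradation on earlier tasks. The following is a minimal sketch of how such an evaluation could be set up, not the authors' implementation: it computes Recall@K by cosine similarity between query features and stored video features, and an average-forgetting score as commonly defined in the continual-learning literature (best earlier recall minus final recall, averaged over earlier tasks). The function names, the toy 2-D features, and the assumption that query i matches video i are all illustrative; the paper's exact metrics and protocol may differ.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_k(query_feats, video_feats, k):
    """Fraction of queries whose ground-truth video ranks in the top k.

    Assumption for this sketch: query i is paired with video i.
    """
    hits = 0
    for i, query in enumerate(query_feats):
        scores = [cosine(query, video) for video in video_feats]
        ranked = sorted(range(len(video_feats)),
                        key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(query_feats)


def average_forgetting(history):
    """history[t][j] is the recall on task j measured after training on task t.

    For each earlier task j, forgetting is the best recall seen before the
    final stage minus the recall at the final stage.
    """
    final = history[-1]
    drops = []
    for j in range(len(final) - 1):
        best_earlier = max(row[j] for row in history[:-1] if j < len(row))
        drops.append(best_earlier - final[j])
    return sum(drops) / len(drops) if drops else 0.0


# Toy data (hypothetical): video features are stored once and never re-encoded.
videos = [[1.0, 0.0], [0.0, 1.0]]
# Query features from the text encoder before and after adapting to a new task.
queries_before = [[0.9, 0.1], [0.1, 0.9]]
queries_after = [[0.9, 0.1], [0.8, 0.2]]  # second query has drifted

r_before = recall_at_k(queries_before, videos, k=1)
r_after = recall_at_k(queries_after, videos, k=1)
print(r_before, r_after)
print(average_forgetting([[r_before], [r_after, 1.0]]))
```

In this toy setup the drifted query no longer aligns with its stored video feature, which is the kind of query-to-stored-feature misalignment the abstract attributes to catastrophic forgetting.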

Keywords: Continual Text-to-Video Retrieval; Continual Learning; Video Representation Learning
Contains Sensitive Content: Does not contain sensitive content
ANZSRC Field of Research 2020: 4602. Artificial intelligence
Byline Affiliations: University of Queensland
Permalink: https://research.usq.edu.au/item/zyx4z/continual-text-to-video-retrieval-with-frame-fusion-and-task-aware-routing

Download files


Published Version
3726302.3729936.pdf
License: CC BY 4.0
File access level: Anyone

