Institute of Scientific and Technical Information of China -- National Engineering Technology Digital Library

MTCAM: A Novel Weakly-Supervised Audio-Visual Saliency Prediction Model With Multi-Modal Transformer

[Journal]

《》 2024, Volume 8, Issue 2

Abstract: Although various video saliency models have achieved considerable performance gains, existing deep learning-based audio-visual saliency prediction models are still in the early exploration stage. The major challenge is that there ...

Authors	Dandan Zhu; Kun Zhu; Weiping Ding; Nana Zhang; Xiongkuo Min; Guangtao Zhai; Xiaokang Yang
Affiliations	Institute of AI Education Shanghai, East China Normal University, Shanghai, China; Key Laboratory of Embedded System and Service Computing, Ministry of Education, Shanghai, China | National (Province-Ministry Joint) Collaborative Innovation Center for Financial Network Security, Tongji University, Shanghai, China | Department of Computer Science and Technology, Tongji University, Shanghai, China; School of Computer and Science and Technology, Nantong University, Jiangsu, China; School of Computer Science and Technology, Donghua University, Shanghai, China; Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
Pages / Total pages	1756-1771 / 16
Language / Chinese Library Classification number	English / TM0
Keywords	Training; Predictive models; Feature extraction; Visualization; Task analysis; Object detection; Transformers
DOI	10.1109/TETCI.2024.3358184
Holding number	IELEP0446