Institute of Scientific and Technical Information of China (ISTIC), National Engineering Technology Digital Library

RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering

[Journal article]

IEEE Transactions on Multimedia, 2023, Vol. 25, No. 1

Abstract: Text-based visual question answering (VQA) requires to read and understand text in an image to correctly answer a given question. However, most current methods simply add optical character recognition (OCR) tokens extracted from t...

Authors	Zan-Xia Jin; Heran Wu; Chun Yang; Fang Zhou; Jingyan Qin; Lei Xiao; Xu-Cheng Yin
Affiliations	Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China; Department of Industrial Design, School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, China | Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China; Tencent Technology (Shenzhen) Company Limited, Shenzhen, China; Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China | Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing, China | USTB-EEasyTech Joint Laboratory of Artificial Intelligence, University of Science and Technology Beijing, Beijing, China
Journal	IEEE Transactions on Multimedia
Pages / Total pages	1-12 / 12
Language / CLC number	English / TP37
Keywords	Optical character recognition software; Semantics; Visualization; Cognition; Knowledge discovery; Task analysis; Electronic mail
DOI	10.1109/TMM.2021.3120194
Holding number	IELEP0172