Abstract: Text-based visual question answering (VQA) requires reading and understanding the text in an image to correctly answer a given question. However, most current methods simply add optical character recognition (OCR) tokens extracted from t...
Authors | Zan-Xia Jin, Heran Wu, Chun Yang, Fang Zhou, Jingyan Qin, Lei Xiao, Xu-Cheng Yin |
---|---|
Author affiliation | |
Journal | IEEE Transactions on Multimedia |
Pages / total pages | 1-12 / 12 |
Language / CLC classification number | English / TP37 |
Keywords | Optical character recognition software; Semantics; Visualization; Cognition; Knowledge discovery; Task analysis; Electronic mail |
DOI | 10.1109/TMM.2021.3120194 |
Holding number | IELEP0172 |