[Journal]
  • IEEE Transactions on Multimedia, 2023, Vol. 25, No. 1

Abstract: Text-based visual question answering (VQA) requires reading and understanding text in an image to correctly answer a given question. However, most current methods simply add optical character recognition (OCR) tokens extracted from t...
