[Journal]
  • IEEE Transactions on Multimedia, 2023, Vol. 25, No. 1

Abstract: Fusion and interaction of multimodal features are essential for video question answering. Structural information composed of the relationships between different objects in videos is very complex, which restricts understanding and ...
