Overcoming Language Priors via Shuffling Language Bias for Robust Visual Question Answering
Recent research has revealed the notorious language prior problem in visual question answering (VQA) tasks based on visual-textual interaction, which indicates that well-developed VQA models rely excessively on learned shortcuts from questions without fully considering the visual evidence. To tackle this problem, most existing methods focus