Knowing When to Answer: Adaptive Confidence Refinement for Reliable Audio-Visual Question Answering
arXiv:2602.04924v1 Announce Type: new Abstract: We present a formal problem formulation for \textit{Reliable} Audio-Visual Question Answering ($\mathcal{R}$-AVQA), where we prefer abstention over answering incorrectly. While recent AVQA models achieve high accuracy, their ability to identify when they are likely wrong, and to abstain from answering in those cases, remains underexplored. To fill this gap, we explore several approaches and then propose Adaptive Confidence Refinement (ACR), a lightweight method to further enhance the performance of $\mathcal{R}$-AVQA. Our key […]