
Xiangxi Shi, Stefan Lee
WACV 2024
When faced with an out-of-distribution (OOD) question or image, VQA systems may provide unreliable answers. This work benchmarks a suite of OOD detection approaches for the multimodal VQA task, using popular VQA datasets and composite settings to isolate different OOD factors (e.g., visual novelty, linguistic novelty, and image–question agreement). The results suggest that answer confidence alone is often a poor signal, while question-generation-based and attention-based methods can significantly improve detection, though ungrounded pairs and small image distribution shifts remain challenging.
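For readers unfamiliar with the answer-confidence signal mentioned above, the following is a minimal sketch of one common way to compute it: one minus the maximum softmax probability over candidate answers. It is not the paper's implementation. The function names, the example logits, and the 0.5 threshold are all hypothetical, chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of answer logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_ood_score(logits):
    """Return 1 - max softmax probability; higher means more likely OOD."""
    return 1.0 - max(softmax(logits))

def is_ood(logits, threshold=0.5):
    """Flag an input as OOD when the score exceeds the threshold.

    The default threshold is a hypothetical value; in practice it would
    be tuned on held-out in-distribution and OOD data.
    """
    return confidence_ood_score(logits) > threshold

# Hypothetical answer logits from a VQA model.
peaked = [8.0, 0.5, 0.1, -1.0]   # one answer dominates: low OOD score
flat = [0.3, 0.2, 0.25, 0.2]     # no clear answer: high OOD score

print(is_ood(peaked), is_ood(flat))
```

A score of this kind depends only on the answer distribution, so it cannot tell whether a confident answer is actually grounded in the image, which is consistent with the summary's observation that confidence alone is often a poor signal.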
BibTeX
@InProceedings{Shi_2024_WACV,
author = {Shi, Xiangxi and Lee, Stefan},
title = {Benchmarking Out-of-Distribution Detection in Visual Question Answering},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {January},
year = {2024},
pages = {5485-5495}
}