RA-QA: Towards Respiratory Audio-based Health Question Answering

arXiv:2602.18452v1 Announce Type: new
Abstract: Respiratory diseases are a leading cause of death globally, highlighting the urgent need for early and accessible screening methods. While some lung auscultation analysis has been automated and audio-based machine learning models can predict respiratory pathologies, a critical gap remains: the lack of intelligent systems that can interact in real-time consultations using natural language. Unlike other clinical domains, such as electronic health records, radiological images, and biosignals, where numerous question-answering (QA) datasets and models have been established, audio-based modalities remain notably underdeveloped.
We curated and harmonized data from 11 diverse respiratory audio datasets to construct the first Respiratory Audio Question Answering (RA-QA) dataset. As the first multimodal QA resource focused specifically on respiratory health, RA-QA bridges clinical audio and natural language in a structured, scalable format. The dataset contains about 7.5 million QA pairs spanning more than 60 attributes and three question types: single verification, multiple choice, and open-ended. Building on this dataset, we introduce a novel benchmark that compares the performance of audio-text generation models with that of traditional audio classifiers. Our experiments reveal performance variations across attributes and question types, establishing a baseline and paving the way for more advanced architectures that could further improve performance. By bridging machine learning with real-world clinical dialogue, our work opens the door to more interactive, intelligent, and accessible diagnostic tools in respiratory healthcare.
