Automated Classification of Research Papers Toward Sustainable Development Goals: A Boolean Query-Based Computational Framework
arXiv:2601.16988v1 Announce Type: new
Abstract: The rapid expansion of scholarly publications across diverse disciplines has made it increasingly difficult to systematically evaluate how research contributes to the United Nations Sustainable Development Goals (SDGs). Manual domain classification of research articles by experts is impractical at this volume: it is costly in time and can be inconsistent across human reviewers. To overcome these challenges, this paper proposes an automated, rule-based computational framework that classifies research papers by SDG using expert-curated Boolean query mappings. The proposed system comprises a web-based interface for data input and result display, a backend application programming interface for high-throughput processing, and a Python-based classification engine that applies structured Boolean expressions to bibliographic metadata (titles, abstracts, and keywords). The framework supports both single-paper and batch classification and produces transparent outputs that show which query components triggered each SDG assignment. Experimental testing on large bibliographic datasets shows that the system can process thousands of research records per hour with reproducible and consistent results. The proposed approach offers a viable solution for institutions, researchers, and policymakers who want to analyze the alignment of research with sustainability goals systematically, without relying on machine learning models whose inputs and outputs are not easily interpretable.
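To illustrate the kind of rule-based matching the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes queries are stored as nested tuples of AND/OR/NOT operators over phrase terms; the `SDG_QUERIES` mapping, the example terms, and the function names `evaluate`, `classify`, and `classify_batch` are all hypothetical and chosen only for illustration. Each assignment is returned together with the terms that matched, which mirrors the transparent outputs mentioned above.

```python
import re

# Hypothetical query mappings for illustration only; real expert-curated
# SDG queries would be much larger and are not given in the abstract.
SDG_QUERIES = {
    "SDG 6": ("AND", ("OR", "drinking water", "sanitation"), ("NOT", "mars")),
    "SDG 7": ("OR", "solar energy", "renewable energy"),
}


def evaluate(node, text, matched):
    """Evaluate a Boolean query node against text, recording matched terms."""
    if isinstance(node, str):
        # Leaf: whole-phrase match on word boundaries.
        hit = re.search(r"\b" + re.escape(node) + r"\b", text) is not None
        if hit:
            matched.append(node)
        return hit
    op, *args = node
    if op == "AND":
        # Evaluate every branch (no short-circuit) so all matches are recorded.
        return all([evaluate(a, text, matched) for a in args])
    if op == "OR":
        return any([evaluate(a, text, matched) for a in args])
    if op == "NOT":
        # Terms under NOT are exclusions, so they are not reported as evidence.
        return not evaluate(args[0], text, [])
    raise ValueError(f"unknown operator: {op}")


def classify(record):
    """Map one bibliographic record to {SDG label: matched query terms}."""
    text = " ".join(
        [record.get("title", ""), record.get("abstract", "")]
        + list(record.get("keywords", []))
    ).lower()
    result = {}
    for sdg, query in SDG_QUERIES.items():
        matched = []
        if evaluate(query, text, matched):
            result[sdg] = matched
    return result


def classify_batch(records):
    """Classify a list of records; the same input always gives the same output."""
    return [classify(r) for r in records]


paper = {
    "title": "Low-cost sanitation for rural communities",
    "abstract": "We study access to drinking water and solar energy pumps.",
    "keywords": ["water quality"],
}
print(classify(paper))
```

Because the rules are deterministic string matches, rerunning the classifier on the same metadata yields identical assignments, and the returned term lists make it possible to audit why a given SDG was assigned.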