Enhancing Tool Calling in LLMs with the International Tool Calling Dataset

arXiv:2603.05515v1 Announce Type: new
Abstract: Tool calling allows large language models (LLMs) to interact with external systems like APIs, enabling applications in customer support, data analysis, and dynamic content generation. While recent benchmarks have advanced tool-use research, they suffer from key limitations, including reliance on simulated or restricted APIs, limited reproducibility, and a lack of cultural and geographic diversity. To address these gaps, we introduce International Tool Calling (ITC), a large-scale, multilingual benchmark designed for realistic, globally distributed tool-calling scenarios. ITC includes 3,571 real APIs and 17,540 tool calling tasks across 20 categories and 40 countries. Experiments reveal substantial performance gaps between open- and closed-source LLMs, while fine-tuning on ITC yields significant improvements, particularly for non-English queries, enhancing cross-lingual generalization, reasoning consistency, and robustness to out-of-domain tools. ITC provides a valuable benchmark for advancing LLM robustness and performance in complex, multi-tool, and international scenarios. Dataset: https://anonymous.4open.science/r/International-Tool-Calling-ITC-dataset-FAF4/.

Liked Liked