AgriM-LLM: An Agriculture-Specific Multimodal Large Language Model for Intelligent Crop Disease and Pest Management

Crop diseases and pests pose significant threats to global food security, demanding precise and efficient management solutions. While Multimodal Large Language Models (M-LLMs) offer promising avenues for intelligent agricultural diagnosis, general-purpose models often falter due to a lack of specialized visual feature extraction, inadequate understanding of agricultural terminology, and insufficient precision in prevention advice. To address these challenges, this paper introduces AgriM-LLM, a novel agriculture-specific multimodal large language model designed for enhanced crop disease and pest identification and prevention. AgriM-LLM integrates several key innovations: an Enhanced Vision Encoder featuring a Multi-Scale Feature Fusion module for capturing subtle visual symptoms; an Agriculture-Knowledge-Enhanced Q-Former that injects structured agricultural knowledge to guide cross-modal alignment; and a Domain-Adaptive Language Model employing a multi-stage progressive fine-tuning strategy for expert-level advice generation. Furthermore, an efficient LoRA-based fine-tuning strategy keeps computational requirements practical. Evaluated on a comprehensive Chinese agricultural multimodal dataset, AgriM-LLM consistently outperforms existing general-purpose and domain-specific baselines. Our ablation studies confirm the critical contribution of each proposed component, and detailed analyses demonstrate superior visual encoding, knowledge integration, and linguistic specialization. AgriM-LLM represents a significant step towards providing timely, accurate, and actionable intelligent decision support for farmers, thereby fostering sustainable agricultural development.
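For readers unfamiliar with the LoRA technique the abstract mentions, the core idea is to freeze a pretrained weight matrix W and learn only a low-rank correction: h = Wx + (alpha/r) * B(Ax), where A (r x d_in) and B (d_out x r) are small trainable matrices. The sketch below is a minimal, generic illustration of that computation; it is not AgriM-LLM's implementation, and all names and values are hypothetical.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha, r):
    """LoRA-style forward pass: h = W x + (alpha / r) * B (A x).

    W is frozen (d_out x d_in); only A (r x d_in) and B (d_out x r)
    would be trained, so trainable parameters scale with r, not d_in * d_out.
    """
    base = matvec(W, x)                 # frozen pretrained path
    delta = matvec(B, matvec(A, x))     # low-rank trainable correction
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy example: identity base weight, rank-1 adapter.
h = lora_forward(
    x=[1, 2],
    W=[[1, 0], [0, 1]],   # frozen 2x2 weight
    A=[[1, 1]],           # 1x2 down-projection
    B=[[1], [0]],         # 2x1 up-projection
    alpha=1, r=1,
)
```

Because W stays frozen, fine-tuning cost and checkpoint size depend only on the small A and B matrices, which is what makes the approach attractive for domain adaptation on limited hardware.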
