March 2026

AgriChat: A Multimodal Large Language Model for Agriculture Image Understanding

digitado ⋅ March 19, 2026

arXiv:2603.16934v1 Announce Type: new Abstract: The deployment of Multimodal Large Language Models (MLLMs) in agriculture is currently stalled by a critical trade-off: the existing literature lacks the large-scale agricultural datasets required for robust model development and evaluation, while current state-of-the-art models lack the verified domain expertise necessary to reason across diverse taxonomies. To address these challenges, we propose the Vision-to-Verified-Knowledge (V2VK) pipeline, a novel generative AI-driven annotation framework that integrates visual captioning with web-augmented scientific retrieval to autonomously […]

Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs

digitado ⋅ March 19, 2026

arXiv:2603.16932v1 Announce Type: new Abstract: Vision-language models (VLMs) typically process images at native high resolution, forcing a trade-off between accuracy and computational efficiency: high-resolution inputs capture fine details but incur significant computational costs, while low-resolution inputs favor efficiency but can miss critical visual information, such as small text. We present AwaRes, a spatial-on-demand framework that resolves this accuracy-efficiency trade-off by operating on a low-resolution global view and using tool-calling to retrieve only the high-resolution segments needed for a given […]
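
The idea of working on a low-resolution global view and fetching high-resolution segments only where needed can be illustrated with a toy sketch. This is not the AwaRes implementation, whose details are cut off in the truncated abstract. The functions `downsample` and `crop_high_res`, the box convention, and the integer downsampling factor are all assumptions made for illustration, and the tool-calling step is reduced to a plain function call.

```python
# Toy sketch (hypothetical helpers, not the AwaRes code): images are lists of rows.

def downsample(image, factor):
    """Build a low-resolution global view by keeping every `factor`-th pixel."""
    return [row[::factor] for row in image[::factor]]


def crop_high_res(image, low_res_box, factor):
    """Return the full-resolution pixels behind a box given in low-res coordinates.

    `low_res_box` is (top, left, bottom, right) with bottom/right exclusive;
    this convention is an assumption of the sketch.
    """
    top, left, bottom, right = (v * factor for v in low_res_box)
    return [row[left:right] for row in image[top:bottom]]


# An 8x8 "image" whose pixel value encodes its position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]

global_view = downsample(image, 4)                    # 2x2 overview
detail = crop_high_res(image, (0, 1, 1, 2), 4)        # 4x4 patch behind one low-res cell
```

Only the requested patch is read at full resolution, so the cost grows with the area of the regions requested rather than with the whole image.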

Script-to-Slide Grounding: Grounding Script Sentences to Slide Objects for Automatic Instructional Video Generation

digitado ⋅ March 19, 2026

arXiv:2603.16931v1 Announce Type: new Abstract: While slide-based videos augmented with visual effects are widely utilized in education and research presentations, the video editing process — particularly applying visual effects to ground spoken content to slide objects — remains highly labor-intensive. This study aims to develop a system that automatically generates such instructional videos from slides and corresponding scripts. As a foundational step, this paper proposes and formulates Script-to-Slide Grounding (S2SG), defined as the task of grounding script sentences […]

Facial beauty prediction fusing transfer learning and broad learning system

digitado ⋅ March 19, 2026

arXiv:2603.16930v1 Announce Type: new Abstract: Facial beauty prediction (FBP) is an important and challenging problem in the fields of computer vision and machine learning. Not only is it prone to overfitting due to the lack of large-scale, effective data, but it is also difficult to quickly build robust and effective facial beauty evaluation models because of the variability of facial appearance and the complexity of human perception. Transfer learning can reduce the dependence on large […]

MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning

digitado ⋅ March 19, 2026

arXiv:2603.16929v1 Announce Type: new Abstract: Regulating the importance ratio is critical for the training stability of Group Relative Policy Optimization (GRPO) based frameworks. However, prevailing ratio control methods, such as hard clipping, suffer from non-differentiable boundaries and vanishing gradient regions, failing to maintain gradient fidelity. Furthermore, these methods lack a hazard-aware mechanism to adaptively suppress extreme deviations, leaving the optimization process vulnerable to abrupt policy shifts. To address these challenges, we propose Modulated Hazard-aware Policy Optimization (MHPO), a […]
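
The "vanishing gradient regions" of hard clipping can be seen on the standard PPO-style clipped surrogate, which GRPO-based methods also use. The sketch below illustrates that baseline problem only. It is not MHPO, whose formulation is cut off in the truncated abstract. The function names, the clip range `eps=0.2`, and the finite-difference gradient are assumptions chosen for illustration.

```python
# Illustrative sketch of the hard-clipping baseline (not MHPO itself).

def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO/GRPO-style hard-clipped surrogate for a single sample."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)


def grad_wrt_ratio(ratio, advantage, eps=0.2, h=1e-6):
    """Central finite-difference estimate of d(surrogate)/d(ratio)."""
    up = clipped_surrogate(ratio + h, advantage, eps)
    down = clipped_surrogate(ratio - h, advantage, eps)
    return (up - down) / (2 * h)


# Inside the clip range the gradient equals the advantage.
inside = grad_wrt_ratio(1.0, advantage=1.0)

# Positive advantage, ratio above 1 + eps: the objective is flat, so no gradient flows.
flat_high = grad_wrt_ratio(1.5, advantage=1.0)

# Negative advantage, ratio below 1 - eps: flat again.
flat_low = grad_wrt_ratio(0.5, advantage=-1.0)
```

At the clip boundaries `1 - eps` and `1 + eps` the surrogate has a kink, which is the non-differentiable boundary the abstract refers to. Beyond them, in the two cases above, the sample contributes no gradient at all.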

Noticing the Watcher: LLM Agents Can Infer CoT Monitoring from Blocking Feedback

digitado ⋅ March 19, 2026

arXiv:2603.16928v1 Announce Type: new Abstract: Chain-of-thought (CoT) monitoring is proposed as a method for overseeing the internal reasoning of language-model agents. Prior work has shown that when models are explicitly informed that their reasoning is being monitored, or are fine-tuned to internalize this fact, they may learn to obfuscate their CoTs in ways that allow them to evade CoT-based monitoring systems. We ask whether reasoning agents can autonomously infer that their supposedly private CoT is under surveillance, and […]

Leveraging Large Vision Model for Multi-UAV Co-perception in Low-Altitude Wireless Networks

digitado ⋅ March 19, 2026

arXiv:2603.16927v1 Announce Type: new Abstract: Multi-uncrewed aerial vehicle (UAV) cooperative perception has emerged as a promising paradigm for diverse low-altitude economy applications, where complementary multi-view observations are leveraged to enhance perception performance via wireless communications. However, the massive visual data generated by multiple UAVs poses significant challenges in terms of communication latency and resource efficiency. To address these challenges, this paper proposes a communication-efficient cooperative perception framework, termed Base-Station-Helped UAV (BHU), which reduces communication overhead while enhancing perception […]

Music Source Restoration with Ensemble Separation and Targeted Reconstruction

digitado ⋅ March 19, 2026

arXiv:2603.16926v1 Announce Type: new Abstract: The Inaugural Music Source Restoration (MSR) Challenge targets the recovery of original, unprocessed stems from fully mixed and mastered music. Unlike conventional music source separation, MSR requires reversing complex production processes such as equalization, compression, reverberation, and other real-world degradations. To address MSR, we propose a two-stage system. First, an ensemble of pre-trained separation models produces preliminary source estimates. Then a set of pre-trained BSRNN-based restoration models performs targeted reconstruction to refine these […]

VisceroHaptics: Investigating the Effects of Gut-based Audio-Haptic Feedback on Gastric Feelings and Gastric Interoceptive Behavior

digitado ⋅ March 19, 2026

arXiv:2603.16919v1 Announce Type: new Abstract: Gastric interoception influences eating behavior and emotions, making its modulation valuable for healthcare and human-computer-interaction applications. However, whether gastric interoception can be modulated noninvasively in humans remains unclear. While previous research indicates that abdominal-sound-driven haptic feedback resembles gut sensations, its impact on feelings and gastric interoceptive behavior is unknown. We conducted three experiments with a total of 55 participants to investigate how gut-sound-driven audio-haptic feedback applied to the stomach (1) affects users' feelings, (2) influences perception […]

Privacy and Safety Experiences and Concerns of U.S. Women Using Generative AI for Seeking Sexual and Reproductive Health Information

digitado ⋅ March 19, 2026

arXiv:2603.16918v1 Announce Type: new Abstract: The rapid adoption of generative AI (GenAI) chatbots has reshaped access to sexual and reproductive health (SRH) information, particularly following the overturning of Roe v. Wade, as individuals assigned female at birth increasingly turn to online sources. However, existing research remains largely model-centered, paying limited attention to user privacy and safety. We conducted semi-structured interviews with 18 U.S.-based participants from both restrictive and non-restrictive states who had used GenAI chatbots to seek SRH […]
