March 2026

Solution for 10th Competition on Ambivalence/Hesitancy (AH) Video Recognition Challenge using Divergence-Based Multimodal Fusion

digitado ⋅ 19 March 2026

arXiv:2603.16939v1 Announce Type: new Abstract: We address the Ambivalence/Hesitancy (A/H) Video Recognition Challenge at the 10th ABAW Competition (CVPR 2026). We propose a divergence-based multimodal fusion that explicitly measures cross-modal conflict between visual, audio, and textual channels. Visual features are encoded as Action Units (AUs) extracted via Py-Feat, audio via Wav2Vec 2.0, and text via BERT. Each modality is processed by a BiLSTM with attention pooling and projected into a shared embedding space. The fusion module computes pairwise […]
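
To make the pooling and conflict-measurement steps concrete, here is a minimal Python sketch. It is not the authors' implementation: the BiLSTM encoders and the shared-space projection are omitted, the `query` vector stands in for learned attention parameters, and because the abstract is cut off before it names the pairwise quantity, the element-wise absolute difference used in `pairwise_divergence` is purely an assumption for illustration.

```python
import math

def attention_pool(sequence, query):
    """Collapse a sequence of vectors into one vector with softmax attention.

    sequence: list of T vectors, each a list of D floats
    query:    hypothetical learned scoring vector of length D
    """
    scores = [sum(h * q for h, q in zip(step, query)) for step in sequence]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(sequence[0])
    return [sum(w * step[d] for w, step in zip(weights, sequence))
            for d in range(dim)]

def pairwise_divergence(embeddings):
    """Assumed conflict measure: element-wise |a - b| for each modality pair."""
    names = sorted(embeddings)
    out = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            out[(a, b)] = [abs(x - y)
                           for x, y in zip(embeddings[a], embeddings[b])]
    return out

# Toy inputs: a zero query gives uniform attention weights.
visual = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[0.0, 0.0])
embeddings = {"visual": visual, "audio": [0.5, 0.5], "text": [1.0, 0.0]}
conflict = pairwise_divergence(embeddings)
```

In this toy case the visual and audio embeddings agree while the text embedding departs from both, which is the kind of cross-modal disagreement a divergence-based fusion is meant to expose.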

Cryptographic Runtime Governance for Autonomous AI Systems: The Aegis Architecture for Verifiable Policy Enforcement

digitado ⋅ 19 March 2026

arXiv:2603.16938v1 Announce Type: new Abstract: Contemporary AI governance frameworks rely heavily on post hoc oversight, policy guidance, and behavioral alignment techniques, yet these mechanisms become fragile as systems gain autonomy, speed, and operational opacity. This paper presents Aegis, a runtime governance architecture for autonomous AI systems that treats policy and legal constraints as execution conditions rather than advisory principles. Aegis binds each governed agent to a cryptographically sealed Immutable Ethics Policy Layer (IEPL) at system genesis and enforces […]
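
The idea of treating a sealed policy as an execution condition can be sketched in a few lines of Python. Everything here is hypothetical: the `GovernedAgent` class, the `forbidden_actions` field, and the use of a bare SHA-256 digest as the "seal" are assumptions chosen for illustration, since the abstract does not describe how the IEPL is actually constructed or verified.

```python
import hashlib
import json

def seal_policy(policy):
    """Return a SHA-256 digest of a canonical JSON encoding of the policy."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class GovernedAgent:
    """Hypothetical agent that checks its policy seal before every action."""

    def __init__(self, policy):
        self._policy = policy
        self._seal = seal_policy(policy)  # fixed once, at "genesis"

    def execute(self, action):
        # Refuse to run at all if the policy no longer matches its seal.
        if seal_policy(self._policy) != self._seal:
            raise RuntimeError("policy seal mismatch: refusing to act")
        # The policy is an execution condition, not advice.
        if action in self._policy["forbidden_actions"]:
            return "blocked"
        return "executed"

agent = GovernedAgent({"forbidden_actions": ["delete_logs"]})
allowed = agent.execute("read_file")
denied = agent.execute("delete_logs")
```

A digest held in the same process offers no real protection against a compromised agent; a deployed design would need the seal anchored somewhere the agent cannot rewrite, which this sketch does not attempt.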

Integrating Explainable Machine Learning and Mixed-Integer Optimization for Personalized Sleep Quality Intervention

digitado ⋅ 19 March 2026

arXiv:2603.16937v1 Announce Type: new Abstract: Sleep quality is influenced by a complex interplay of behavioral, environmental, and psychosocial factors, yet most computational studies focus mainly on predictive risk identification rather than actionable intervention design. Although machine learning models can accurately predict subjective sleep outcomes, they rarely translate predictive insights into practical intervention strategies. To address this gap, we propose a personalized predictive-prescriptive framework that integrates interpretable machine learning with mixed-integer optimization. A supervised classifier trained on survey data […]
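
A toy version of the predict-then-prescribe loop is sketched below in Python. The classifier weights, feature names, and costs are invented for illustration, and exhaustive enumeration over binary choices stands in for a real mixed-integer solver; none of these details come from the paper.

```python
import itertools
import math

# Hypothetical linear classifier: positive weights favor good sleep.
WEIGHTS = {"caffeine_late": -1.2, "screen_before_bed": -0.8, "exercise": 0.9}
BIAS = 0.2
# Hypothetical effort cost of changing each behavior.
COSTS = {"caffeine_late": 1, "screen_before_bed": 2, "exercise": 3}

def predict_good_sleep(features):
    """Predicted probability of good sleep for binary features."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def best_intervention(features, budget):
    """Enumerate every set of behavior flips within budget; keep the best."""
    names = list(features)
    best = (predict_good_sleep(features), ())
    for flips in itertools.product([0, 1], repeat=len(names)):
        chosen = tuple(n for n, f in zip(names, flips) if f)
        if sum(COSTS[n] for n in chosen) > budget:
            continue  # infeasible under the budget constraint
        changed = {n: (1 - v if n in chosen else v)
                   for n, v in features.items()}
        score = predict_good_sleep(changed)
        if score > best[0]:
            best = (score, chosen)
    return best

person = {"caffeine_late": 1, "screen_before_bed": 1, "exercise": 0}
baseline = predict_good_sleep(person)
score, plan = best_intervention(person, budget=3)
```

With a budget of 3, the search prefers dropping late caffeine and pre-bed screen time over starting exercise, because the two cheaper changes together raise the predicted probability more than the single expensive one.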

TDMM-LM: Bridging Facial Understanding and Animation via Language Models

digitado ⋅ 19 March 2026

arXiv:2603.16936v1 Announce Type: new Abstract: Text-guided human body animation has advanced rapidly, yet facial animation lags due to the scarcity of well-annotated, text-paired facial corpora. To close this gap, we leverage foundation generative models to synthesize a large, balanced corpus of facial behavior. We design a prompt suite covering emotions and head motions, generate about 80 hours of facial videos with multiple generators, and fit per-frame 3D facial parameters, yielding large-scale (prompt, parameter) pairs for training. Building on […]

GenLie: A Global-Enhanced Lie Detection Network under Sparsity and Semantic Interference

digitado ⋅ 19 March 2026

arXiv:2603.16935v1 Announce Type: new Abstract: Video-based lie detection aims to identify deceptive behaviors from visual cues. Despite recent progress, its core challenge lies in learning sparse yet discriminative representations. Deceptive signals are typically subtle and short-lived, easily overwhelmed by redundant information, while individual and contextual variations introduce strong identity-related noise. To address this issue, we propose GenLie, a Global-Enhanced Lie Detection Network that performs local feature modeling under global supervision. Specifically, sparse and subtle deceptive cues are captured […]

AgriChat: A Multimodal Large Language Model for Agriculture Image Understanding

digitado ⋅ 19 March 2026

arXiv:2603.16934v1 Announce Type: new Abstract: The deployment of Multimodal Large Language Models (MLLMs) in agriculture is currently stalled by a critical trade-off: the existing literature lacks the large-scale agricultural datasets required for robust model development and evaluation, while current state-of-the-art models lack the verified domain expertise necessary to reason across diverse taxonomies. To address these challenges, we propose the Vision-to-Verified-Knowledge (V2VK) pipeline, a novel generative AI-driven annotation framework that integrates visual captioning with web-augmented scientific retrieval to autonomously […]

Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs

digitado ⋅ 19 March 2026

arXiv:2603.16932v1 Announce Type: new Abstract: Vision-language models (VLMs) typically process images at native high resolution, forcing a trade-off between accuracy and computational efficiency: high-resolution inputs capture fine details but incur significant computational costs, while low-resolution inputs favor efficiency but can miss critical visual information, such as small text. We present AwaRes, a spatial-on-demand framework that resolves this accuracy-efficiency trade-off by operating on a low-resolution global view and using tool-calling to retrieve only the high-resolution segments needed for a given […]
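
The global-view-plus-crop idea can be illustrated with a small Python sketch. The function names, the normalized `(x0, y0, x1, y1)` box format, and the nearest-neighbour downsampling are all assumptions made for this example; the abstract does not specify the tool interface, and in a real system the box would be emitted by the model rather than hard-coded.

```python
def downsample(image, factor):
    """Keep every `factor`-th pixel in both directions (nearest neighbour)."""
    return [row[::factor] for row in image[::factor]]

def retrieve_crop(image, box):
    """Return full-resolution pixels inside a normalized (x0, y0, x1, y1) box."""
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = box
    left, right = int(x0 * width), int(x1 * width)
    top, bottom = int(y0 * height), int(y1 * height)
    return [row[left:right] for row in image[top:bottom]]

# 8x8 toy image in which each pixel stores its own (row, col) position.
image = [[(r, c) for c in range(8)] for r in range(8)]

global_view = downsample(image, 4)                  # 2x2 overview
crop = retrieve_crop(image, (0.5, 0.0, 1.0, 0.5))   # top-right quadrant

pixels_processed = (len(global_view) * len(global_view[0])
                    + len(crop) * len(crop[0]))
pixels_full = len(image) * len(image[0])
```

Here the overview plus one requested crop covers 20 pixels instead of the full 64, which is the kind of saving the on-demand approach targets when only one region needs fine detail.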

Script-to-Slide Grounding: Grounding Script Sentences to Slide Objects for Automatic Instructional Video Generation

digitado ⋅ 19 March 2026

arXiv:2603.16931v1 Announce Type: new Abstract: While slide-based videos augmented with visual effects are widely utilized in education and research presentations, the video editing process — particularly applying visual effects to ground spoken content to slide objects — remains highly labor-intensive. This study aims to develop a system that automatically generates such instructional videos from slides and corresponding scripts. As a foundational step, this paper proposes and formulates Script-to-Slide Grounding (S2SG), defined as the task of grounding script sentences […]
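
To show what the grounding task looks like as input and output, here is a naive Python baseline that matches each script sentence to the slide object with the highest word overlap. This is not the method proposed in the paper, which the truncated abstract does not describe; the object ids, the Jaccard scoring, and the rule of returning `None` when nothing overlaps are all assumptions.

```python
import re

def tokens(text):
    """Lowercased set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def ground_sentences(sentences, slide_objects):
    """Map each sentence index to the best-overlapping object id, or None."""
    result = {}
    for i, sentence in enumerate(sentences):
        s_tok = tokens(sentence)
        best_id, best_score = None, 0.0
        for obj_id, obj_text in slide_objects.items():
            o_tok = tokens(obj_text)
            union = s_tok | o_tok
            score = len(s_tok & o_tok) / len(union) if union else 0.0
            if score > best_score:
                best_id, best_score = obj_id, score
        result[i] = best_id
    return result

slide_objects = {
    "title": "Training pipeline",
    "chart": "Accuracy over epochs",
    "bullet": "Data augmentation",
}
script = [
    "First we look at the training pipeline.",
    "Accuracy rises steadily over the epochs.",
    "Thank you for listening.",
]
grounding = ground_sentences(script, slide_objects)
```

The third sentence maps to no object, which reflects a practical point about the task: some script sentences have nothing on the slide to ground to.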

Facial beauty prediction fusing transfer learning and broad learning system

digitado ⋅ 19 March 2026

arXiv:2603.16930v1 Announce Type: new Abstract: Facial beauty prediction (FBP) is an important and challenging problem in computer vision and machine learning. Not only is it prone to overfitting due to the lack of large-scale, effective data, but it is also difficult to quickly build robust and effective facial beauty evaluation models because of the variability of facial appearance and the complexity of human perception. Transfer learning can reduce the dependence on large […]

MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning

digitado ⋅ 19 March 2026

arXiv:2603.16929v1 Announce Type: new Abstract: Regulating the importance ratio is critical for the training stability of Group Relative Policy Optimization (GRPO) based frameworks. However, prevailing ratio control methods, such as hard clipping, suffer from non-differentiable boundaries and vanishing gradient regions, failing to maintain gradient fidelity. Furthermore, these methods lack a hazard-aware mechanism to adaptively suppress extreme deviations, leaving the optimization process vulnerable to abrupt policy shifts. To address these challenges, we propose Modulated Hazard-aware Policy Optimization (MHPO), a […]
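
The vanishing-gradient problem attributed to hard clipping can be checked numerically. The Python sketch below implements the standard PPO-style clipped surrogate for a single sample, which is the baseline being criticized and not MHPO itself; the clip range `eps=0.2` and the finite-difference step are illustrative choices.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO-style hard-clipped objective for one sample."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)

def grad_wrt_ratio(ratio, advantage, eps=0.2, h=1e-6):
    """Central finite-difference estimate of d(objective)/d(ratio)."""
    up = clipped_surrogate(ratio + h, advantage, eps)
    down = clipped_surrogate(ratio - h, advantage, eps)
    return (up - down) / (2 * h)

inside = grad_wrt_ratio(1.0, advantage=1.0)        # within the clip range
above = grad_wrt_ratio(1.5, advantage=1.0)         # ratio too high, A > 0
below = grad_wrt_ratio(0.5, advantage=-1.0)        # ratio too low, A < 0
```

Inside the clip range the gradient tracks the advantage, while in the two clipped cases it is exactly zero, so those samples contribute nothing to the update. The kink at the clip boundary is also where the objective stops being differentiable.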
