March 2026

TimeSpot: Benchmarking Geo-Temporal Understanding in Vision-Language Models in Real-World Settings

digitado ⋅ 10 March 2026

arXiv:2603.06687v1 Announce Type: new Abstract: Geo-temporal understanding, the ability to infer location, time, and contextual properties from visual input alone, underpins applications such as disaster management, traffic planning, embodied navigation, world modeling, and geography education. Although recent vision-language models (VLMs) have advanced image geo-localization using cues like landmarks and road signs, their ability to reason about temporal signals and physically grounded spatial cues remains limited. To address this gap, we introduce TimeSpot, a benchmark for evaluating real-world geo-temporal […]

technocracy

One step further with Monte-Carlo sampler to guide diffusion better

digitado ⋅ 10 March 2026

arXiv:2603.06685v1 Announce Type: new Abstract: Stochastic differential equation (SDE)-based generative models have achieved substantial progress in conditional generation via training-free, differentiable loss-guided approaches. However, existing methodologies that use posterior sampling typically incur substantial estimation error, which yields inaccurate guidance gradients and leads to inconsistent generation results. To mitigate this issue, we propose that performing an additional backward denoising step and Monte-Carlo sampling (ABMS) can achieve better guided diffusion, which is a plug-and-play […]
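
The abstract is cut off before it describes ABMS itself, so the following is only a generic illustration of why Monte-Carlo sampling can reduce estimation error in loss-guided diffusion, not the authors' method. It assumes a Gaussian approximation N(x0_hat, sigma^2 I) to the posterior over clean samples and a hypothetical robust guidance loss, and compares the gradient evaluated only at the posterior mean with a Monte-Carlo average over posterior samples. For brevity it returns gradients with respect to x0 rather than backpropagating through the denoiser to x_t; all function names are hypothetical.

```python
import numpy as np


def loss_grad(x0: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Gradient of a hypothetical guidance loss sum(log(1 + (x0 - target)**2))."""
    d = x0 - target
    return 2.0 * d / (1.0 + d**2)


def point_estimate_grad(x0_hat: np.ndarray, target: np.ndarray) -> np.ndarray:
    # Common shortcut: evaluate the loss gradient only at the posterior mean x0_hat.
    return loss_grad(x0_hat, target)


def monte_carlo_grad(
    x0_hat: np.ndarray,
    sigma: float,
    target: np.ndarray,
    n_samples: int = 256,
    seed: int = 0,
) -> np.ndarray:
    # Sample x0 ~ N(x0_hat, sigma^2 I), an assumed posterior approximation,
    # and average the loss gradient over the samples.
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, *x0_hat.shape))
    samples = x0_hat + sigma * eps
    return loss_grad(samples, target).mean(axis=0)


x0_hat = np.array([0.5, -1.0, 2.0])
target = np.zeros(3)
print(point_estimate_grad(x0_hat, target))
print(monte_carlo_grad(x0_hat, sigma=0.5, target=target))
```

With sigma = 0 the two estimates coincide. Because the loss gradient is nonlinear, the average of gradients over samples generally differs from the gradient at the mean, which is the kind of estimation gap that sampling-based guidance tries to account for.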

technocracy

Three-dimensional reconstruction and segmentation of an aggregate stockpile for size and shape analyses

digitado ⋅ 10 March 2026

arXiv:2603.06684v1 Announce Type: new Abstract: Aggregate size and shape are key properties for determining the quality of aggregate materials used in road construction and transportation geotechnics applications. The composition and packing, layer stiffness, and load response are all influenced by these morphological characteristics of aggregates. Many aggregate imaging systems developed to date focus only on analyses of individual or manually separated aggregate particles. There is a need to develop a convenient and affordable system for acquiring 3D aggregate information […]

technocracy

ECHO: Event-Centric Hypergraph Operations via Multi-Agent Collaboration for Multimedia Event Extraction

digitado ⋅ 10 March 2026

arXiv:2603.06683v1 Announce Type: new Abstract: Multimedia Event Extraction (M2E2) involves extracting structured event records from both textual and visual content. Existing approaches, ranging from specialized architectures to direct Large Language Model (LLM) prompting, typically rely on a linear, end-to-end generation and thus suffer from cascading errors: early cross-modal misalignments often corrupt downstream role assignment under strict grounding constraints. We propose ECHO (Event-Centric Hypergraph Operations), a multi-agent framework that iteratively refines a shared Multimedia Event Hypergraph (MEHG), which serves […]

technocracy

RADAR: A Multimodal Benchmark for 3D Image-Based Radiology Report Review

digitado ⋅ 10 March 2026

arXiv:2603.06681v1 Announce Type: new Abstract: Radiology reports for the same patient examination may contain clinically meaningful discrepancies arising from interpretation differences, reporting variability, or evolving assessments. Systematic analysis of such discrepancies is important for quality assurance, clinical decision support, and multimodal model development, yet remains limited by the lack of standardized benchmarks. We present RADAR, a multimodal benchmark for radiology report discrepancy analysis that pairs 3D medical images with a preliminary report and corresponding candidate edits for the […]

technocracy

VB: Visibility Benchmark for Visibility and Perspective Reasoning in Images

digitado ⋅ 10 March 2026

arXiv:2603.06680v1 Announce Type: new Abstract: We present VB, a benchmark that tests whether vision-language models can determine what is and is not visible in a photograph, and abstain when a human viewer cannot reliably answer. Each item pairs a single photo with a short yes/no visibility claim; the model must output VISIBLY_TRUE, VISIBLY_FALSE, or ABSTAIN, together with a confidence score. Items are organized into 100 families using a 2×2 design that crosses a minimal image edit with a […]
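
The abstract fixes the response format (one of three labels plus a confidence score) but is cut off before the scoring protocol. The sketch below shows one way to represent such responses and to compute two generic selective-prediction quantities: coverage (the fraction of items the model does not abstain on) and accuracy on the answered items. The class and function names, the [0, 1] confidence range, and the choice of metrics are assumptions for illustration, not VB's official evaluation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence, Tuple


class Label(Enum):
    VISIBLY_TRUE = "VISIBLY_TRUE"
    VISIBLY_FALSE = "VISIBLY_FALSE"
    ABSTAIN = "ABSTAIN"


@dataclass(frozen=True)
class Prediction:
    label: Label
    confidence: float  # assumed to lie in [0, 1]

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


def coverage_and_selective_accuracy(
    preds: Sequence[Prediction], golds: Sequence[Label]
) -> Tuple[float, float]:
    """Return (coverage, accuracy on non-abstained items).

    Answering an item whose gold label is ABSTAIN counts as incorrect.
    """
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    answered = [(p, g) for p, g in zip(preds, golds) if p.label is not Label.ABSTAIN]
    coverage = len(answered) / len(preds) if preds else 0.0
    if not answered:
        return coverage, 0.0
    correct = sum(p.label is g for p, g in answered)
    return coverage, correct / len(answered)


preds = [
    Prediction(Label.VISIBLY_TRUE, 0.9),
    Prediction(Label.ABSTAIN, 0.5),
    Prediction(Label.VISIBLY_FALSE, 0.7),
    Prediction(Label.VISIBLY_TRUE, 0.6),
]
golds = [Label.VISIBLY_TRUE, Label.VISIBLY_FALSE, Label.VISIBLY_FALSE, Label.VISIBLY_FALSE]
print(coverage_and_selective_accuracy(preds, golds))  # (0.75, 0.6666666666666666)
```

Reporting coverage alongside accuracy keeps a model from looking strong simply by abstaining on every hard item; the confidence field is carried along so that calibration metrics could be added later.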

technocracy

MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines

digitado ⋅ 10 March 2026

arXiv:2603.06679v1 Announce Type: new Abstract: Video world models have shown immense promise for interactive simulation and entertainment, but current systems still struggle with two important aspects of interactivity: user control over the environment for reproducible, editable experiences, and shared inference where players hold influence over a common world. To address these limitations, we introduce an explicit external memory into the system, a persistent state operating independently of the model’s context window, that is continually updated by user actions […]

technocracy

The mathematical landscape of partial information decomposition: A comprehensive review of properties and measures

digitado ⋅ 10 March 2026

arXiv:2603.06678v1 Announce Type: new Abstract: Partial Information Decomposition (PID) has become one of the most prominent information-theoretic frameworks for describing the structure and quality of information in complex systems. Despite its widespread utility, there exists no unique solution constraining precisely how a PID should be constructed, leading to a multiverse of different formalisms with different mathematical commitments. In this work, we provide a comprehensive overview of the mathematical landscape of PID. By integrating existing PID measures into a […]
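
For the common case of two sources X_1, X_2 and a target Y, a PID splits the joint mutual information into a redundant part R, unique parts U_1 and U_2, and a synergistic part S, tied to the classical mutual-information terms as below. This is the standard bivariate bookkeeping rather than any particular measure from the review. It makes the non-uniqueness concrete: there are three equations in four unknowns, and the last line shows that classical quantities alone only fix the difference R - S, so some additional choice (typically a redundancy measure) is needed to pin down every atom.

```latex
\begin{aligned}
I(Y; X_1, X_2) &= R + U_1 + U_2 + S \\
I(Y; X_1) &= R + U_1 \\
I(Y; X_2) &= R + U_2 \\
\Rightarrow\; I(Y; X_1) + I(Y; X_2) - I(Y; X_1, X_2) &= R - S
\end{aligned}
```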

technocracy

Chart Deep Research in LVLMs via Parallel Relative Policy Optimization

digitado ⋅ 10 March 2026

arXiv:2603.06677v1 Announce Type: new Abstract: With the rapid advancement of data science, charts have evolved from simple numerical presentation tools to essential instruments for insight discovery and decision-making support. However, current chart data intelligence exhibits significant limitations in deep research capabilities, with existing methods predominantly addressing shallow tasks such as visual recognition or factual question-answering, rather than the complex reasoning and high-level data analysis that deep research requires. This limitation stems from two primary technical bottlenecks: at the […]

technocracy

XAI and Few-shot-based Hybrid Classification Model for Plant Leaf Disease Prognosis

digitado ⋅ 10 March 2026

arXiv:2603.06676v1 Announce Type: new Abstract: Timely and accurate identification of crop diseases is vital for maintaining agricultural productivity and food security. The current work presents a hybrid few-shot learning model that integrates Explainable Artificial Intelligence (XAI) and Few-Shot Learning (FSL) to address the challenge of identifying diseases of maize, rice, and wheat leaves and classifying their stages under limited annotated data conditions. The proposed model integrates Siamese and Prototypical Networks within an […]
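
The abstract names Prototypical Networks as one component of the hybrid model. As a generic illustration of that component only (not the authors' full Siamese plus prototypical pipeline or its XAI module), the sketch below performs the standard prototypical-network classification step: each class prototype is the mean embedding of its support examples, and a query is assigned to the class whose prototype is nearest in squared Euclidean distance. The embeddings are assumed to come from an already trained encoder, which is omitted; the toy 2-D vectors and the class names are hypothetical.

```python
import numpy as np


def class_prototypes(support_emb: np.ndarray, support_labels: np.ndarray):
    """Mean embedding per class; returns (sorted classes, prototypes)."""
    classes = np.unique(support_labels)
    protos = np.stack([support_emb[support_labels == c].mean(axis=0) for c in classes])
    return classes, protos


def classify(query_emb: np.ndarray, classes: np.ndarray, protos: np.ndarray) -> np.ndarray:
    # Squared Euclidean distance from every query to every prototype: (n_query, n_classes).
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    # Assign each query to the class of its nearest prototype.
    return classes[np.argmin(d2, axis=1)]


# Hypothetical 2-way, 2-shot episode with toy 2-D embeddings.
support = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array(["healthy", "healthy", "blight", "blight"])
classes, protos = class_prototypes(support, labels)
queries = np.array([[0.2, 0.1], [4.8, 5.9]])
print(classify(queries, classes, protos))  # ['healthy' 'blight']
```

Because each prototype is just an average, adding a new disease class needs only a handful of labelled examples and no retraining of the classifier head, which is what makes this step attractive when annotated data is scarce.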
