March 2026

CineSRD: Leveraging Visual, Acoustic, and Linguistic Cues for Open-World Visual Media Speaker Diarization

digitado ⋅ March 19, 2026

arXiv:2603.16966v1 Announce Type: new Abstract: Traditional speaker diarization systems have primarily focused on constrained scenarios such as meetings and interviews, where the number of speakers is limited and acoustic conditions are relatively clean. To explore open-world speaker diarization, we extend this task to the visual media domain, encompassing complex audiovisual programs such as films and TV series. This new setting introduces several challenges, including long-form video understanding, a large number of speakers, cross-modal asynchrony between audio and visual […]

technocracy

Behavior-Centric Extraction of Scenarios from Highway Traffic Data and their Domain-Knowledge-Guided Clustering using CVQ-VAE

digitado ⋅ March 19, 2026

arXiv:2603.16964v1 Announce Type: new Abstract: Approval of an automated driving system (ADS) depends on evaluating its behavior within representative real-world traffic scenarios. A common way to obtain such scenarios is to extract them from real-world data recordings. These can then be grouped and serve as the basis on which the ADS is subsequently tested. This poses two central challenges: how scenarios are extracted and how they are grouped. Existing extraction methods rely on heterogeneous definitions, hindering scenario comparability. For the grouping of scenarios, […]

Impacts of Electric Vehicle Charging Regimes and Infrastructure Deployments on System Performance: An Agent-Based Study

digitado ⋅ March 19, 2026

arXiv:2603.16961v1 Announce Type: new Abstract: The rapid growth of electric vehicles (EVs) requires more effective charging infrastructure planning. Infrastructure layout not only determines deployment cost, but also reshapes charging behavior and influences overall system performance. In addition, destination charging and en-route charging represent distinct charging regimes associated with different power requirements, which may lead to substantially different infrastructure deployment outcomes. This study applies an agent-based modeling framework to generate trajectory-level latent public charging demand under three charging regimes […]

Adversarial attacks against Modern Vision-Language Models

digitado ⋅ March 19, 2026

arXiv:2603.16960v1 Announce Type: new Abstract: We study adversarial robustness of open-source vision-language model (VLM) agents deployed in a self-contained e-commerce environment built to simulate realistic pre-deployment conditions. We evaluate two agents, LLaVA-v1.5-7B and Qwen2.5-VL-7B, under three gradient-based attacks: the Basic Iterative Method (BIM), Projected Gradient Descent (PGD), and a CLIP-based spectral attack. Against LLaVA, all three attacks achieve substantial attack success rates (52.6%, 53.8%, and 66.9% respectively), demonstrating that simple gradient-based methods pose a practical threat to open-source […]
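
For readers unfamiliar with the gradient-based attacks named above, the following is a minimal sketch of the generic L-infinity iterative attack in its textbook form. It is not the paper's setup: the toy linear loss, the function names (`pgd_linf`, `grad_fn`), and the values of `eps`, `alpha`, and `steps` are all assumptions chosen for illustration. Started from the clean input with no random initialization, as here, the loop coincides with BIM; PGD is commonly run with a random start inside the epsilon-ball.

```python
from typing import Callable, List


def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_linf(
    x0: List[float],
    grad_fn: Callable[[List[float]], List[float]],
    eps: float,
    alpha: float,
    steps: int,
) -> List[float]:
    """Iterative signed-gradient ascent with projection onto an L-inf ball.

    x0      : clean input with values in [0, 1] (hypothetical pixel vector)
    grad_fn : returns the gradient of the attacker's loss w.r.t. the input
    eps     : maximum per-coordinate perturbation
    alpha   : step size per iteration
    """
    x = list(x0)
    for _ in range(steps):
        g = grad_fn(x)
        # Ascent step in the direction of the gradient sign.
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]
        # Project back into the eps-ball around the clean input.
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
        # Keep values in the valid input range.
        x = [min(max(xi, 0.0), 1.0) for xi in x]
    return x


# Toy example (assumption): the loss is a linear score w . x, so its gradient is w.
w = [1.0, -2.0, 0.5]
loss = lambda x: sum(wi * xi for wi, xi in zip(w, x))
clean = [0.5, 0.5, 0.5]
adv = pgd_linf(clean, lambda x: w, eps=0.1, alpha=0.03, steps=10)
```

In this toy case every coordinate moves to the boundary of the epsilon-ball in the direction that raises the loss. Against a real VLM, `grad_fn` would be obtained by backpropagating through the model.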

PhysQuantAgent: An Inference Pipeline of Mass Estimation for Vision-Language Models

digitado ⋅ March 19, 2026

arXiv:2603.16958v1 Announce Type: new Abstract: Vision-Language Models (VLMs) are increasingly applied to robotic perception and manipulation, yet their ability to infer physical properties required for manipulation remains limited. In particular, estimating the mass of real-world objects is essential for determining appropriate grasp force and ensuring safe interaction. However, current VLMs lack reliable mass reasoning capabilities, and most existing benchmarks do not explicitly evaluate physical quantity estimation under realistic sensing conditions. In this work, we propose PhysQuantAgent, a framework […]

On the Extension Theorem for Packing Steiner Forests

digitado ⋅ March 19, 2026

arXiv:2603.16956v1 Announce Type: new Abstract: We consider the problem of packing edge-disjoint Steiner forests in a graph. The input consists of a multi-graph $G=(V,E)$ and a collection of $h$ vertex subsets $S = \{S_1,S_2,\ldots,S_h\}$. A Steiner forest for $S$, also called an $S$-forest, is a forest of $G$ in which each $S_i$ is connected. In the case where $h=1$, this is the Steiner Tree packing problem. Kriesell’s conjecture postulates that $2k$-edge-connectivity of $S_1$ is sufficient to find $k$ […]
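
The definition of an $S$-forest above can be checked mechanically. The sketch below is illustrative only and is not taken from the paper: the function name `is_steiner_forest` and the edge-list representation are assumptions, and the code takes for granted that the given edges already belong to $G$. It uses a union-find structure to confirm that the edges are acyclic and that every subset $S_i$ lies inside a single tree.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def is_steiner_forest(
    edges: Sequence[Edge],
    terminal_sets: Iterable[Iterable[Hashable]],
) -> bool:
    """Return True if `edges` form a forest in which each terminal set is connected."""
    parent: Dict[Hashable, Hashable] = {}

    def find(v: Hashable) -> Hashable:
        # Union-find lookup with path halving; unseen vertices start as their own root.
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            # The edge would close a cycle (this also rejects self-loops
            # and a second parallel edge), so the edge set is not a forest.
            return False
        parent[ru] = rv

    # Every terminal set must fall inside one connected component.
    for terminals in terminal_sets:
        if len({find(t) for t in terminals}) > 1:
            return False
    return True


forest = [(1, 2), (2, 3), (4, 5)]
ok = is_steiner_forest(forest, [{1, 3}, {4, 5}])
split = is_steiner_forest(forest, [{1, 4}])
cyclic = is_steiner_forest([(1, 2), (2, 3), (3, 1)], [{1, 3}])
```

Here `ok` is true, `split` is false because vertices 1 and 4 lie in different trees, and `cyclic` is false because the edges contain a cycle. Checking a packing would additionally require that the forests use pairwise disjoint edges of $G$, which this sketch does not cover.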

Minimum-Action Learning: Energy-Constrained Symbolic Model Selection for Physical Law Identification from Noisy Data

digitado ⋅ March 19, 2026

arXiv:2603.16951v1 Announce Type: new Abstract: Identifying physical laws from noisy observational data is a central challenge in scientific machine learning. We present Minimum-Action Learning (MAL), a framework that selects symbolic force laws from a pre-specified basis library by minimizing a Triple-Action functional combining trajectory reconstruction, architectural sparsity, and energy-conservation enforcement. A wide-stencil acceleration-matching technique reduces noise variance by 10,000x, transforming an intractable problem (SNR ~0.02) into a learnable one (SNR ~1.6); this preprocessing is the critical enabler shared […]

Entropy-Aware Task Offloading in Mobile Edge Computing

digitado ⋅ March 19, 2026

arXiv:2603.16949v1 Announce Type: new Abstract: Mobile Edge Computing (MEC) technology has been introduced to enable cloud computing at the edge of the network in order to help resource-limited mobile devices with time-sensitive data processing tasks. In this paradigm, mobile devices can offload their computationally heavy tasks to more efficient nearby MEC servers via wireless communication. Consequently, the main focus of research on the subject has been the development of efficient offloading schemes, leaving the privacy of […]

EmergeNav: Structured Embodied Inference for Zero-Shot Vision-and-Language Navigation in Continuous Environments

digitado ⋅ March 19, 2026

arXiv:2603.16947v1 Announce Type: new Abstract: Zero-shot vision-and-language navigation in continuous environments (VLN-CE) remains challenging for modern vision-language models (VLMs). Although these models encode useful semantic priors, their open-ended reasoning does not directly translate into stable long-horizon embodied execution. We argue that the key bottleneck is not missing knowledge alone, but missing an execution structure for organizing instruction following, perceptual grounding, temporal progress, and stage verification. We propose EmergeNav, a zero-shot framework that formulates continuous VLN as structured embodied […]

Joint Optimization of Storage and Loading for High-Performance 3D Point Cloud Data Processing

digitado ⋅ March 19, 2026

arXiv:2603.16945v1 Announce Type: new Abstract: With the rapid development of computer vision and deep learning, significant advancements have been made in 3D vision, particularly in autonomous driving, robotic perception, and augmented reality. 3D point cloud data, as a crucial representation of 3D information, has gained widespread attention. However, the vast scale and complexity of point cloud data present significant challenges for loading and processing, and traditional algorithms struggle to handle large-scale datasets. The diversity of storage formats […]
