February 2026

Is Hierarchical Quantization Essential for Optimal Reconstruction?

digitado ⋅ February 2, 2026

arXiv:2601.22244v1 Announce Type: new Abstract: Vector-quantized variational autoencoders (VQ-VAEs) are central to models that rely on high reconstruction fidelity, from neural compression to generative pipelines. Hierarchical extensions, such as VQ-VAE2, are often credited with superior reconstruction performance because they split global and local features across multiple levels. However, since higher levels derive all their information from lower levels, they should not carry additional reconstructive content beyond what the lower level already encodes. Combined with recent advances in training objectives […]

Aligning Microscopic Vehicle and Macroscopic Traffic Statistics: Reconstructing Driving Behavior from Partial Data

digitado ⋅ February 2, 2026

arXiv:2601.22242v1 Announce Type: new Abstract: A driving algorithm that aligns with good human driving practices, or at the very least collaborates effectively with human drivers, is crucial for developing safe and efficient autonomous vehicles. In practice, two main approaches are commonly adopted: (i) supervised or imitation learning, which requires comprehensive naturalistic driving data capturing all states that influence a vehicle’s decisions and corresponding actions, and (ii) reinforcement learning (RL), where the simulated driving environment either matches or is […]

Investigating the Interplay of Parameterization and Optimizer in Gradient-Free Topology Optimization: A Cantilever Beam Case Study

digitado ⋅ February 2, 2026

arXiv:2601.22241v1 Announce Type: new Abstract: Gradient-free black-box optimization (BBO) is widely used in engineering design and provides a flexible framework for topology optimization (TO), enabling the discovery of high-performing structural designs without requiring gradient information from simulations. Yet, its success depends on two key choices: the geometric parameterization defining the search space and the optimizer exploring it. This study investigates this interplay through a compliance minimization problem for a cantilever beam subject to a connectivity constraint. We benchmark […]

A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy

digitado ⋅ February 2, 2026

arXiv:2601.22240v1 Announce Type: new Abstract: The rapid advancement and widespread adoption of generative artificial intelligence (GenAI) and large language models (LLMs) have been accompanied by the emergence of new security vulnerabilities and challenges, such as jailbreaking and other prompt injection attacks. These maliciously crafted inputs can exploit LLMs, causing, for instance, data leaks, unauthorized actions, or compromised outputs. As both offensive and defensive prompt injection techniques evolve quickly, a structured understanding of mitigation strategies becomes increasingly important. To […]

Geometry without Position? When Positional Embeddings Help and Hurt Spatial Reasoning

digitado ⋅ February 2, 2026

arXiv:2601.22231v1 Announce Type: new Abstract: This paper revisits the role of positional embeddings (PEs) within vision transformers (ViTs) from a geometric perspective. We show that PEs are not mere token indices but effectively function as geometric priors that shape the spatial structure of the representation. We introduce token-level diagnostics that measure how multi-view geometric consistency in ViT representation depends on consistent PEs. Through extensive experiments on 14 foundation ViT models, we reveal how PEs influence multi-view geometry and […]

DAJ: Data-Reweighted LLM Judge for Test-Time Scaling in Code Generation

digitado ⋅ February 2, 2026

arXiv:2601.22230v1 Announce Type: new Abstract: Test-time scaling for code generation commonly relies on Best-of-N selection, in which multiple candidate solutions are sampled from a base model, and the best one is selected by an LLM judge. However, training reliable LLM judges is challenging due to severe distribution shifts, including imbalances between easy and hard problems, mismatches between training tasks and evaluation benchmarks, and trajectory mismatch arising from training data generated by cheaper models whose behavior differs from that […]

Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation

digitado ⋅ February 2, 2026

arXiv:2601.22228v1 Announce Type: new Abstract: Vision-Language Models (VLMs) perform well in 2D perception and semantic reasoning compared to their limited understanding of 3D spatial structure. We investigate this gap using relative camera pose estimation (RCPE), a fundamental vision task that requires inferring relative camera translation and rotation from a pair of images. We introduce VRRPI-Bench, a benchmark derived from unlabeled egocentric videos with verbalized annotations of relative camera motion, reflecting realistic scenarios with simultaneous translation and rotation around […]

What Lies Beneath: A Call for Distribution-based Visual Question & Answer Datasets

digitado ⋅ February 2, 2026

arXiv:2601.22218v1 Announce Type: new Abstract: Visual Question Answering (VQA) has become an important benchmark for assessing how large multimodal models (LMMs) interpret images. However, most VQA datasets focus on real-world images or simple diagrammatic analysis, with few focused on interpreting complex scientific charts. Indeed, many VQA datasets that analyze charts do not contain the underlying data behind those charts or assume a 1-to-1 correspondence between chart marks and underlying data. In reality, charts are transformations (i.e. analysis, simplification, […]

Latent Spherical Flow Policy for Reinforcement Learning with Combinatorial Actions

digitado ⋅ February 2, 2026

arXiv:2601.22211v1 Announce Type: new Abstract: Reinforcement learning (RL) with combinatorial action spaces remains challenging because feasible action sets are exponentially large and governed by complex feasibility constraints, making direct policy parameterization impractical. Existing approaches embed task-specific value functions into constrained optimization programs or learn deterministic structured policies, sacrificing generality and policy expressiveness. We propose a solver-induced latent spherical flow policy that brings the expressiveness of modern generative policies to combinatorial RL while guaranteeing feasibility by design. Our method, […]

Learning to Recommend Multi-Agent Subgraphs from Calling Trees

digitado ⋅ February 2, 2026

arXiv:2601.22209v1 Announce Type: new Abstract: Multi-agent systems (MAS) increasingly solve complex tasks by orchestrating agents and tools selected from rapidly growing marketplaces. As these marketplaces expand, many candidates become functionally overlapping, making selection not just a retrieval problem: beyond filtering relevant agents, an orchestrator must choose options that are reliable, compatible with the current execution context, and able to cooperate with other selected agents. Existing recommender systems — largely built for item-level ranking from flat user-item logs — […]
