digitado – Page 329

The Causal Impact of Tool Affordance on Safety Alignment in LLM Agents

digitado ⋅ 24 de March de 2026

arXiv:2603.20320v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed as agents with access to executable tools, enabling direct interaction with external systems. However, most safety evaluations remain text-centric and assume that compliant language implies safe behavior, an assumption that becomes unreliable once models are allowed to act. In this work, we empirically examine how executable tool affordance alters safety alignment in LLM agents using a paired evaluation framework that compares text-only chatbot behavior with tool-enabled […]

Ver mais

Like 0

Liked Liked

technocracy

Language Family Matters: Evaluating LLM-Based ASR Across Linguistic Boundaries

digitado ⋅ 28 de January de 2026

arXiv:2601.18899v1 Announce Type: new Abstract: Large Language Model (LLM)-powered Automatic Speech Recognition (ASR) systems achieve strong performance with limited resources by linking a frozen speech encoder to a pretrained LLM via a lightweight connector. Prior work trains a separate connector per language, overlooking linguistic relatedness. We propose an efficient and novel connector-sharing strategy based on linguistic family membership, enabling one connector per family, and empirically validate its effectiveness across two multilingual LLMs and two real-world corpora spanning curated […]

Ver mais

Like 0

Liked Liked

technocracy

Dark Patterns and Consumer Protection Law for App Makers

digitado ⋅ 12 de March de 2026

arXiv:2603.10020v1 Announce Type: new Abstract: Dark patterns in online commerce, especially deceptive user interface designs for apps and websites, undermine consumer autonomy and distort online markets. Although sometimes deception is intentional, the complex app development process can also unintentionally produce manipulative user interfaces. This paper discusses common design pitfalls and proposes strategies for app makers to avoid infringing user autonomy or incurring legal liability under emerging principles of consumer protection law. By focusing on choice architecture and transparent […]

Ver mais

Like 0

Liked Liked

technocracy

Thermo-LIO: A Novel Multi-Sensor Integrated System for Structural Health Monitoring

digitado ⋅ 15 de January de 2026

arXiv:2601.08977v1 Announce Type: new Abstract: Traditional two-dimensional thermography, despite being non-invasive and useful for defect detection in the construction field, is limited in effectively assessing complex geometries, inaccessible areas, and subsurface defects. This paper introduces Thermo-LIO, a novel multi-sensor system that can enhance Structural Health Monitoring (SHM) by fusing thermal imaging with high-resolution LiDAR. To achieve this, the study first develops a multimodal fusion method combining thermal imaging and LiDAR, enabling precise calibration and synchronization of multimodal data […]

Ver mais

Like 0

Liked Liked

technocracy

New Anthropic Research Suggests AI Can Conceal Risk Internally

digitado ⋅ 16 de April de 2026

For years, the AI industry has relied on a simple assumption: if a model sounds safe, it is safe. But new interpretability research is pulling back a layer most people never think about. Large language models can develop internal activation patterns that resemble emotional states, and those hidden signals can quietly steer behavior in ways the polished output never betrays. The standard approach to evaluating AI safety relies overwhelmingly on one signal: what the model produces. If the […]

Ver mais

Like 0

Liked Liked

technocracy

[Project Review] Attempting Multi-Warehouse VRP with Heterogeneous Fleet (REINFORCE). Stuck on the “Efficiency vs. Effectiveness” trade-off

digitado ⋅ 18 de January de 2026

Hi everyone, I am an RL novice working on my first “real” project: a solver for the Multi-Warehouse Vehicle Routing Problem (MWVRP). My background is limited (I’ve essentially only read the DeepMDV paper and some standard VRP literature), so I am looking for a sanity check on my approach, as well as recommendations for papers or codebases that tackle similar constraints. The Problem Setting: I am modeling a supply chain with: Multiple Depots & Heterogeneous Fleet (Vans, Medium […]

Ver mais

Like 0

Liked Liked

technocracy

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

digitado ⋅ 15 de April de 2026

While reinforcement learning with verifiable rewards (RLVR) significantly enhances LLM reasoning by optimizing the conditional distribution P(y|x), its potential is fundamentally bounded by the base model’s existing output distribution. Optimizing the marginal distribution P(y) in the Pre-train Space addresses this bottleneck by encoding reasoning ability and preserving broad exploration capacity. Yet, conventional pre-training relies on static corpora for passive learning, leading to a distribution shift that hinders targeted reasoning enhancement. In this paper, we introduce PreRL (Pre-train Space […]

Ver mais

Like 0

Liked Liked

technocracy

Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards

digitado ⋅ 20 de February de 2026

Reinforcement Learning from Human Feedback (RLHF) or Verifiable Rewards (RLVR) are two key steps in the post-training of modern Language Models (LMs). A common problem is reward hacking, where the policy may exploit inaccuracies of the reward and learn an unintended behavior. Most previous works address this by limiting the policy update with a Kullback-Leibler (KL) penalty towards a reference model. We propose a different framing: Train the LM in a way that biases policy updates towards regions […]

Ver mais

Like 0

Liked Liked

technocracy

Social Engineering Attacks: A Systemisation of Knowledge on People Against Humans

digitado ⋅ 9 de January de 2026

arXiv:2601.04215v1 Announce Type: new Abstract: Our systematisation of knowledge on Social Engineering Attacks (SEAs), identifies the human, organisational, and adversarial dimensions of cyber threats. It addresses the growing risks posed by SEAs, highly relevant in the context physical cyber places, such as travellers at airports and residents in smart cities, and synthesizes findings from peer reviewed studies, industry and government reports to inform effective countermeasures that can be embedded into future smart city strategies. SEAs increasingly sidestep technical […]

Ver mais

Like 0

Liked Liked

technocracy

Sound of Touch: Active Acoustic Tactile Sensing via String Vibrations

digitado ⋅ 20 de February de 2026

arXiv:2602.16846v1 Announce Type: new Abstract: Distributed tactile sensing remains difficult to scale over large areas: dense sensor arrays increase wiring, cost, and fragility, while many alternatives provide limited coverage or miss fast interaction dynamics. We present Sound of Touch, an active acoustic tactile-sensing methodology that uses vibrating tensioned strings as sensing elements. The string is continuously excited electromagnetically, and a small number of pickups (contact microphones) observe spectral changes induced by contact. From short-duration audio signals, our system […]

Ver mais

Like 0

Liked Liked