digitado

About digitado

https://www.digitado.com.br

ClawSafety: “Safe” LLMs, Unsafe Agents

digitado ⋅ April 3, 2026

arXiv:2604.01438v1 Announce Type: new Abstract: Personal AI agents like OpenClaw run with elevated privileges on users’ local machines, where a single successful prompt injection can leak credentials, redirect financial transactions, or destroy files. This threat goes well beyond conventional text-level jailbreaks, yet existing safety evaluations fall short: most test models in isolated chat settings, rely on synthetic environments, and do not account for how the agent framework itself shapes safety outcomes. We introduce CLAWSAFETY, a benchmark of 120 […]

technocracy

Reproducible, Explainable, and Effective Evaluations of Agentic AI for Software Engineering

digitado ⋅ April 3, 2026

arXiv:2604.01437v1 Announce Type: new Abstract: With the advancement of Agentic AI, researchers are increasingly leveraging autonomous agents to address challenges in software engineering (SE). However, the large language models (LLMs) that underpin these agents often function as black boxes, making it difficult to justify the superiority of Agentic AI approaches over baselines. Furthermore, missing information in the evaluation design description frequently renders the reproduction of results infeasible. To synthesize current evaluation practices for Agentic AI in SE, this […]

technocracy

Leveraging the Value of Information in POMDP Planning

digitado ⋅ April 3, 2026

arXiv:2604.01434v1 Announce Type: new Abstract: Partially observable Markov decision processes (POMDPs) offer a principled formalism for planning under state and transition uncertainty. Despite advances made towards solving large POMDPs, obtaining performant policies under limited planning time remains a major challenge due to the curse of dimensionality and the curse of history. For many POMDP problems, the value of information (VOI) – the expected performance gain from reasoning about observations – varies over the belief space. We introduce a […]

technocracy

Semantically Annotated Multimodal Dataset for RF Interpretation and Prediction

digitado ⋅ April 3, 2026

arXiv:2604.01433v1 Announce Type: new Abstract: Current limitations in wireless modeling and radio frequency (RF)-based AI are primarily driven by a lack of high-quality, measurement-based datasets that connect RF signals to their physical environments. RF heatmaps, the typical form of such data, are high-dimensional and complex but lack the geometric and semantic context needed for interpretation, constraining the development of supervised machine learning models. To address this bottleneck, we propose a new class of multimodal datasets that combines RF […]

technocracy

Are Finer Citations Always Better? Rethinking Granularity for Attributed Generation

digitado ⋅ April 3, 2026

arXiv:2604.01432v1 Announce Type: new Abstract: Citation granularity – whether to cite individual sentences, paragraphs, or documents – is a critical design choice in attributed generation. While fine-grained citations are often preferred for precise human verification, their impact on model performance remains under-explored. We analyze four model scales (8B-120B) and demonstrate that enforcing fine-grained citations degrades attribution quality by 16-276% compared to the best-performing granularity. We observe a consistent performance pattern where attribution quality peaks at intermediate granularities (paragraph-level). […]

technocracy

Improving Latent Generalization Using Test-time Compute

digitado ⋅ April 3, 2026

arXiv:2604.01430v1 Announce Type: new Abstract: Language Models (LMs) exhibit two distinct mechanisms for knowledge acquisition: in-weights learning (i.e., encoding information within the model weights) and in-context learning (ICL). Although these two modes offer complementary strengths, in-weights learning frequently struggles to facilitate deductive reasoning over the internalized knowledge. We characterize this limitation as a deficit in latent generalization, of which the reversal curse is one example. Conversely, in-context learning demonstrates highly robust latent generalization capabilities. To improve latent generalization […]

technocracy

The power of context: Random Forest classification of near synonyms. A case study in Modern Hindi

digitado ⋅ April 3, 2026

arXiv:2604.01425v1 Announce Type: new Abstract: Synonymy is a widespread yet puzzling linguistic phenomenon. Absolute synonyms theoretically should not exist, as they do not expand language’s expressive potential. However, it was suggested that even if synonyms denote the same concept, they may reflect different perspectives or carry distinct cultural associations, claims that have rarely been tested quantitatively. In Hindi, prolonged contact with Persian produced many Perso-Arabic loanwords coexisting with their Sanskrit counterpart, forming numerous synonym pairs. This study investigates […]

technocracy

EgoFlow: Gradient-Guided Flow Matching for Egocentric 6DoF Object Motion Generation

digitado ⋅ April 3, 2026

arXiv:2604.01421v1 Announce Type: new Abstract: Understanding and predicting object motion from egocentric video is fundamental to embodied perception and interaction. However, generating physically consistent 6DoF trajectories remains challenging due to occlusions, fast motion, and the lack of explicit physical reasoning in existing generative models. We present EgoFlow, a flow-matching framework that synthesizes realistic and physically plausible trajectories conditioned on multimodal egocentric observations. EgoFlow employs a hybrid Mamba-Transformer-Perceiver architecture to jointly model temporal dynamics, scene geometry, and semantic intent, […]

technocracy

Cost-Efficient Estimation of General Abilities Across Benchmarks

digitado ⋅ April 3, 2026

arXiv:2604.01418v1 Announce Type: new Abstract: Thousands of diverse benchmarks have been developed to measure the quality of large language models (LLMs). Yet prior work has demonstrated that LLM performance is often sufficiently explained by a small set of latent factors, or abilities. This suggests the potential for more efficient and principled benchmarking, but it remains difficult to compare the quality of different methods. Motivated by predictive validity, we argue that the quality of a benchmarking framework should be […]

technocracy

ReFormeR: Learning and Applying Explicit Query Reformulation Patterns

digitado ⋅ April 3, 2026

arXiv:2604.01417v1 Announce Type: new Abstract: We present ReFormeR, a pattern-guided approach for query reformulation. Instead of prompting a language model to generate reformulations of a query directly, ReFormeR first elicits short reformulation patterns from pairs of initial queries and empirically stronger reformulations, consolidates them into a compact library of transferable reformulation patterns, and then selects an appropriate reformulation pattern for a new query given its retrieval context. The selected pattern constrains query reformulation to controlled operations such as […]
