digitado – Page 426

OpaqueToolsBench: Learning Nuances of Tool Behavior Through Interaction

digitado ⋅ 18 February 2026

arXiv:2602.15197v1. Abstract: Tool-calling is essential for Large Language Model (LLM) agents to complete real-world tasks. While most existing benchmarks assume simple, perfectly documented tools, real-world tools (e.g., general “search” APIs) are often opaque, lacking clear best practices or failure modes. Can LLM agents improve their performance in environments with opaque tools by interacting and subsequently improving documentation? To study this, we create OpaqueToolsBench, a benchmark consisting of three distinct task-oriented environments: general function calling, interactive […]

technocracy

A Future Capabilities Agent for Tactical Air Traffic Control

digitado ⋅ 9 January 2026

arXiv:2601.04285v1. Abstract: Escalating air traffic demand is driving the adoption of automation to support air traffic controllers, but existing approaches face a trade-off between safety assurance and interpretability. Optimisation-based methods such as reinforcement learning offer strong performance but are difficult to verify and explain, while rules-based systems are transparent yet rarely check safety under uncertainty. This paper outlines Agent Mallard, a forward-planning, rules-based agent for tactical control in systemised airspace that embeds a stochastic digital […]

technocracy

Riesz Representer Fitting under Bregman Divergence: A Unified Framework for Debiased Machine Learning

digitado ⋅ 12 January 2026

Estimating the Riesz representer is a central problem in debiased machine learning for causal and structural parameter estimation. Various methods for Riesz representer estimation have been proposed, including Riesz regression and covariate balancing. This study unifies these methods within a single framework. Our framework fits a Riesz representer model to the true Riesz representer under a Bregman divergence, which includes the squared loss and the Kullback–Leibler (KL) divergence as special cases. We show that the squared loss corresponds […]
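For orientation, the standard pointwise Bregman divergence and the two special cases named in the excerpt can be written out as below. This is the textbook definition for a differentiable, strictly convex generator φ (the symbols a, b, φ are notation assumed here); how the paper applies it to Riesz representer models is not visible in the truncated abstract.

```latex
% Bregman divergence generated by a strictly convex, differentiable \phi
D_\phi(a \,\|\, b) = \phi(a) - \phi(b) - \phi'(b)\,(a - b)

% Squared loss: \phi(t) = t^2
D_\phi(a \,\|\, b) = a^2 - b^2 - 2b\,(a - b) = (a - b)^2

% Generalized KL divergence: \phi(t) = t \log t - t, for a, b > 0
D_\phi(a \,\|\, b) = a \log\frac{a}{b} - a + b
```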

technocracy

Deconstructing Taste: Toward a Human-Centered AI Framework for Modeling Consumer Aesthetic Perceptions

digitado ⋅ 27 January 2026

arXiv:2601.17134v1. Abstract: Understanding and modeling consumers’ stylistic taste such as “sporty” is crucial for creating designs that truly connect with target audiences. However, capturing taste during the design process remains challenging because taste is abstract and subjective, and preference data alone provides limited guidance for concrete design decisions. This paper proposes an integrated human-centered computational framework that links subjective evaluations (e.g., perceived luxury of car wheels) with domain-specific features (e.g., spoke configuration) and computer vision-based […]

technocracy

Fine Grained Evaluation of LLMs-as-Judges

digitado ⋅ 15 January 2026

arXiv:2601.08919v1. Abstract: A good deal of recent research has focused on how Large Language Models (LLMs) may be used as ‘judges’ in place of humans to evaluate the quality of the output produced by various text / image processing systems. Within this broader context, a number of studies have investigated the specific question of how effectively LLMs can be used as relevance assessors for the standard ad hoc task in Information Retrieval (IR). We extend […]

technocracy

Assessing the Reliability of Persona-Conditioned LLMs as Synthetic Survey Respondents

digitado ⋅ 24 February 2026

arXiv:2602.18462v1. Abstract: Using persona-conditioned LLMs as synthetic survey respondents has become a common practice in computational social science and agent-based simulations. Yet, it remains unclear whether multi-attribute persona prompting improves LLM reliability or instead introduces distortions. Here we contribute to this assessment by leveraging a large dataset of U.S. microdata from the World Values Survey. Concretely, we evaluate two open-weight chat models and a random-guesser baseline across more than 70K respondent-item instances. We find that […]

technocracy

[P] I made Screen Vision, turn any confusing UI into a step-by-step guide via screen sharing (open source)

digitado ⋅ 10 January 2026

I built Screen Vision, an open source website that guides you through any task by screen sharing with AI. Privacy Focused: Your screen data is never stored or used to train models. Local LLM Support: If you don’t trust cloud APIs, the app has a “Local Mode” that connects to local AI models running on your own machine. Your data never leaves your computer. Web-Native: No desktop app or extension required. Works directly on your browser. How it […]

technocracy

Joint Routing and Model Pruning for Decentralized Federated Learning in Bandwidth-Constrained Multi-Hop Wireless Networks

digitado ⋅ 16 March 2026

Decentralized federated learning (D-FL) enables privacy-preserving training without a central server, but multi-hop model exchanges and aggregation are often bottlenecked by communication resource constraints. To address this issue, we propose a joint routing-and-pruning framework that optimizes routing paths and pruning rates to maintain communication latency within prescribed limits. We analyze how the sum of model biases across all clients affects the convergence bound of D-FL and formulate an optimization problem that maximizes the model retention rate to minimize […]

technocracy

Two-year-old Surface PCs get $300 price hikes as sub-$1,000 models go away

digitado ⋅ 14 April 2026

If you’ve been waiting for Microsoft to update its Surface PC lineup—perhaps with Qualcomm’s new Snapdragon X2 Elite processors—I’ve got bad news for you. Microsoft is shaking up its PC lineup, but it’s doing so by instituting big price hikes. This means you’ll be paying at least $1,500 for Surface devices that launched at $1,000 just two years ago and that Microsoft no longer offers new Surface devices under $1,000 at all. The 12-inch Surface Pro tablet that originally […]

technocracy

Hierarchical Sparse Plus Low Rank Compression of LLM

digitado ⋅ 14 January 2026

arXiv:2601.07839v1. Abstract: Modern large language models (LLMs) place extraordinary pressure on memory and compute budgets, making principled compression indispensable for both deployment and continued training. We present Hierarchical Sparse Plus Low-Rank (HSS) compression, a two-stage scheme that (i) removes the largest-magnitude weights into a sparse matrix S and (ii) applies a recursive Hierarchically Sparse Separable (HSS) low-rank factorisation to the dense residual matrix. A recursive rank-reducing strategy and a reverse Cuthill-McKee (RCM) permutation are introduced […]
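The two-stage idea in this excerpt can be illustrated with a much simpler stand-in. The sketch below is not the paper's method: it replaces the recursive HSS factorisation and the RCM permutation with a single truncated SVD, and the function name `sparse_plus_low_rank` and the parameters `keep_fraction` and `rank` are hypothetical choices made for illustration only.

```python
import numpy as np

def sparse_plus_low_rank(W, keep_fraction=0.01, rank=8):
    """Split W into S + A @ B (simplified sketch, NOT the paper's HSS scheme).

    S holds the largest-magnitude entries of W; A @ B is a rank-`rank`
    truncated-SVD approximation of the dense residual W - S.
    """
    # Number of entries moved into the sparse part (assumed parameterisation).
    k = max(1, int(keep_fraction * W.size))

    # Stage (i): threshold at the k-th largest absolute value.
    magnitudes = np.abs(W).ravel()
    threshold = np.partition(magnitudes, -k)[-k]
    mask = np.abs(W) >= threshold
    S = np.where(mask, W, 0.0)

    # Stage (ii): low-rank approximation of the residual via truncated SVD,
    # standing in for the recursive factorisation described in the abstract.
    residual = W - S
    U, s, Vt = np.linalg.svd(residual, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # scale the kept left singular vectors
    B = Vt[:rank]
    return S, A, B
```

Calling `S, A, B = sparse_plus_low_rank(W)` and comparing `S + A @ B` against `W` gives a rough feel for how much of a weight matrix the outlier entries and a small-rank term can capture; storing `S` in a sparse format would be needed for any actual memory saving.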
