digitado – Page 326

CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

digitado ⋅ 4 March 2026

arXiv:2603.02236v1. Abstract: Recent studies have demonstrated the potential of Large Language Models (LLMs) in generating GPU kernels. Current benchmarks focus on the translation of high-level languages into CUDA, overlooking the more general and challenging task of text-to-CUDA generation. Furthermore, given the hardware-specific and performance-critical features of GPU programming, accurately assessing the performance of LLM-generated GPU programs is nontrivial. In this work, we introduce CUDABench, a comprehensive benchmark designed to evaluate the text-to-CUDA capabilities of LLMs. […]

technocracy

Beyond Ground: Map-Free LiDAR Relocalization for UAVs

digitado ⋅ 17 February 2026

arXiv:2602.13267v1. Abstract: Localization is a fundamental capability in unmanned aerial vehicle (UAV) systems. Map-free LiDAR relocalization offers an effective solution for achieving high-precision positioning in environments with weak or unavailable GNSS signals. However, existing LiDAR relocalization methods are primarily tailored to autonomous driving, exhibiting significantly degraded accuracy in UAV scenarios. In this paper, we propose MAILS, a novel map-free LiDAR relocalization framework for UAVs. A Locality-Preserving Sliding Window Attention module is first introduced to extract […]

technocracy

PRISMA: Reinforcement Learning Guided Two-Stage Policy Optimization in Multi-Agent Architecture for Open-Domain Multi-Hop Question Answering

digitado ⋅ 12 January 2026

arXiv:2601.05465v1. Abstract: Answering real-world open-domain multi-hop questions over massive corpora is a critical challenge in Retrieval-Augmented Generation (RAG) systems. Recent research employs reinforcement learning (RL) to end-to-end optimize the retrieval-augmented reasoning process, directly enhancing its capacity to resolve complex queries. However, reliable deployment is hindered by two obstacles. 1) Retrieval Collapse: iterative retrieval over large corpora fails to locate intermediate evidence containing bridge answers without reasoning-guided planning, causing downstream reasoning to collapse. 2) Learning Instability: […]

technocracy

Critically Engaged Pragmatism: A Scientific Norm and Social, Pragmatist Epistemology for AI Science Evaluation Tools

digitado ⋅ 16 January 2026

arXiv:2601.09753v1. Abstract: Crises in peer review capacity, study replication, and AI-fabricated science have intensified interest in automated tools for assessing scientific research. However, the scientific community has a history of decontextualizing and repurposing credibility markers in inapt ways. I caution that AI science evaluation tools are particularly prone to these kinds of inference by false ascent due to contestation about the purposes to which they should be put, their portability across purposes, and technical demands […]

technocracy

Reinforcement Learning via Self-Distillation

digitado ⋅ 28 January 2026

Large language models are increasingly post-trained with reinforcement learning in verifiable domains such as code and math. Yet, current methods for reinforcement learning with verifiable rewards (RLVR) learn only from a scalar outcome reward per attempt, creating a severe credit-assignment bottleneck. Many verifiable environments actually provide rich textual feedback, such as runtime errors or judge evaluations, that explain why an attempt failed. We formalize this setting as reinforcement learning with rich feedback and introduce Self-Distillation Policy Optimization (SDPO), […]

technocracy

HEARTS: Benchmarking LLM Reasoning on Health Time Series

digitado ⋅ 10 March 2026

arXiv:2603.06638v1. Abstract: The rise of large language models (LLMs) has shifted time series analysis from narrow analytics to general-purpose reasoning. Yet, existing benchmarks cover only a small set of health time series modalities and tasks, failing to reflect the diverse domains and extensive temporal dependencies inherent in real-world physiological modeling. To bridge these gaps, we introduce HEARTS (Health Reasoning over Time Series), a unified benchmark for evaluating hierarchical reasoning capabilities of LLMs over general health […]

technocracy

DeZent: Decentralized z-Anonymity with Privacy-Preserving Coordination

digitado ⋅ 11 March 2026

arXiv:2603.08854v1. Abstract: Analyzing large volumes of sensor network data, such as electricity consumption measurements from smart meters, is essential for modern applications but raises significant privacy concerns. Privacy-enhancing technologies like z-anonymity offer efficient anonymization for continuous data streams by suppressing rare values that could lead to re-identification, making it particularly suited for resource-constrained environments. Originally designed for centralized architectures, z-anonymity assumes a trusted central entity. In this paper, we introduce deZent, a decentralized implementation of […]
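As a rough illustration of the centralized setting the abstract describes, the Python sketch below suppresses a value unless at least z distinct users have reported it within a recent time window. The class name `ZAnonymizer`, the `window` parameter, and the exact release rule are assumptions made for illustration; the truncated abstract does not specify them, and this is not deZent's decentralized protocol.

```python
from collections import defaultdict


class ZAnonymizer:
    """Toy centralized z-anonymity filter (illustrative assumption, not deZent)."""

    def __init__(self, z, window):
        self.z = z            # minimum number of distinct users required
        self.window = window  # look-back window, in the same units as timestamps
        # value -> {user: last timestamp at which that user reported the value}
        self.last_seen = defaultdict(dict)

    def observe(self, t, user, value):
        """Return the value if it may be released at time t, else None."""
        seen = self.last_seen[value]
        seen[user] = t
        # Forget users whose last report of this value fell out of the window.
        for u in [u for u, ts in seen.items() if t - ts > self.window]:
            del seen[u]
        # Release only if enough distinct users share the value; otherwise suppress.
        return value if len(seen) >= self.z else None


if __name__ == "__main__":
    anon = ZAnonymizer(z=2, window=10)
    stream = [(0, "a", "5kWh"), (1, "b", "5kWh"), (2, "c", "9kWh"), (20, "a", "5kWh")]
    for t, user, value in stream:
        print(t, user, anon.observe(t, user, value))
```

In this sketch the single `ZAnonymizer` object plays the role of the trusted central entity, since it sees every user's raw reports before deciding what to release.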

technocracy

Character Beyond Speech: Leveraging Role-Playing Evaluation in Audio Large Language Models via Reinforcement Learning

digitado ⋅ 15 April 2026

The rapid evolution of multimodal large models has revolutionized the simulation of diverse characters in speech dialogue systems, enabling a novel interactive paradigm. Character attributes are manifested not only in textual responses but also through vocal features, as speech conveys rich paralinguistic information that is challenging to quantify. This poses significant difficulties in evaluating the character alignment of role-playing agents. To address these challenges, we present RoleJudge, an evaluation framework that leverages audio large language models to systematically […]

technocracy

STDec: Spatio-Temporal Stability Guided Decoding for dLLMs

digitado ⋅ 9 April 2026

arXiv:2604.06330v1. Abstract: Diffusion Large Language Models (dLLMs) have achieved rapid progress, viewed as a promising alternative to the autoregressive paradigm. However, most dLLM decoders still adopt a global confidence threshold, and do not explicitly model local context from neighboring decoded states or temporal consistency of predicted token IDs across steps. To address this issue, we propose a simple spatio-temporal stability guided decoding approach, named STDec. We observe strong spatio-temporal stability in dLLM decoding: newly decoded […]
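The global confidence threshold baseline mentioned in the abstract can be sketched roughly as below: in each decoding step, every still-masked position whose confidence clears one shared threshold is decoded. The function name `threshold_unmask`, the default `tau`, and the fallback to the single most confident position are assumptions for illustration; this shows the baseline the abstract contrasts against, not STDec itself.

```python
def threshold_unmask(confidences, masked, tau=0.9):
    """Pick masked positions to decode in one step using one global threshold.

    confidences: per-position top-token probabilities (hypothetical model output)
    masked: set of positions that are still masked
    """
    if not masked:
        return set()
    # The same threshold applies to every position, regardless of neighbors
    # or of how stable the prediction was in earlier steps.
    chosen = {i for i in masked if confidences[i] >= tau}
    if not chosen:
        # Fallback (an assumption): decode the most confident masked position
        # so that every step makes progress.
        chosen = {max(masked, key=lambda i: confidences[i])}
    return chosen


if __name__ == "__main__":
    print(threshold_unmask([0.95, 0.5, 0.92, 0.1], {0, 1, 2}, tau=0.9))
```

Because the rule looks at each position in isolation, it has no notion of local context or of consistency across steps, which is the gap the abstract points to.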

technocracy

Google Pixel 10a review: The sidegrade

digitado ⋅ 4 March 2026

Google’s budget Pixels have long been a top recommendation for anyone who needs a phone with a good camera and doesn’t want to pay flagship prices. This year, Google’s A-series Pixel doesn’t see many changes, and the formula certainly isn’t different. The Pixel 10a isn’t so much a downgraded version of the Pixel 10 as it is a refresh of the Pixel 9a. In fact, it’s hardly deserving of a new name. The new Pixel gets a couple […]
