When and What to Ask: AskBench and Rubric-Guided RLVR for LLM Clarification
arXiv:2602.11199v1 Announce Type: new Abstract: Large language models (LLMs) often respond even when prompts omit critical details or include misleading information, leading to hallucinations or reinforced misconceptions. We study how to evaluate and improve LLMs’ ability to decide when and what to ask for clarification without sacrificing task performance. We introduce AskBench, an interactive benchmark that converts standard QA pairs into multi-turn interactions with explicit checkpoints. A unified judge loop evaluates final answers and simulates user responses as […]
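
The abstract sketches the core mechanism: each standard QA pair becomes an underspecified multi-turn episode with checkpoints, and a single judge model both simulates the user and grades the final answer. A minimal Python sketch of such a loop, under assumptions, might look like the following; every name (`run_episode`, `assistant_turn`, `judge`), the message format, the turn budget, and the question-detection heuristic are hypothetical stand-ins, not the paper's actual harness.

```python
MAX_TURNS = 4  # assumed per-episode turn budget, not from the paper


def assistant_turn(messages):
    """Placeholder for the model under evaluation.
    Returns either a clarifying question or a final answer."""
    raise NotImplementedError


def judge(mode, messages, hidden_detail=None, gold_answer=None):
    """Placeholder for the unified judge LLM. In 'user' mode it
    simulates a user reply revealing the hidden detail; in 'grade'
    mode it scores the final answer against the gold answer."""
    raise NotImplementedError


def is_clarifying_question(reply: str) -> bool:
    # Naive stand-in for the checkpoint that records whether the
    # model chose to ask rather than answer outright.
    return reply.strip().endswith("?")


def run_episode(underspecified_prompt, hidden_detail, gold_answer):
    """One multi-turn episode built from a standard QA pair: the
    prompt omits a detail the judge holds, and checkpoints track
    whether the model asked before committing to an answer."""
    messages = [{"role": "user", "content": underspecified_prompt}]
    asked = False
    for _ in range(MAX_TURNS):
        reply = assistant_turn(messages)
        messages.append({"role": "assistant", "content": reply})
        if is_clarifying_question(reply):
            asked = True  # checkpoint: the model decided to ask
            # The judge plays the user and answers the question.
            user_reply = judge("user", messages, hidden_detail=hidden_detail)
            messages.append({"role": "user", "content": user_reply})
        else:
            # The model committed to a final answer; the judge grades it.
            correct = judge("grade", messages, gold_answer=gold_answer)
            return {"asked": asked, "correct": correct, "turns": len(messages)}
    # Turn budget exhausted without a final answer.
    return {"asked": asked, "correct": False, "turns": len(messages)}
```

The single `judge` entry point in this sketch reflects the abstract's "unified judge loop": one model serves as both user simulator and grader, keeping the evaluation protocol consistent across turns while the `asked`/`correct` record supports measuring clarification behavior and task performance jointly.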