February 2026

Self-Hinting Language Models Enhance Reinforcement Learning

digitado ⋅ February 3, 2026

Group Relative Policy Optimization (GRPO) has recently emerged as a practical recipe for aligning large language models with verifiable objectives. However, under sparse terminal rewards, GRPO often stalls because rollouts within a group frequently receive identical rewards, causing relative advantages to collapse and updates to vanish. We propose self-hint aligned GRPO with privileged supervision (SAGE), an on-policy reinforcement learning framework that injects privileged hints during training to reshape the rollout distribution under the same terminal verifier reward. For […]
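The collapse described above can be seen in a minimal sketch of the group-relative advantage as it is commonly written for GRPO (each reward minus the group mean, divided by the group standard deviation). This is not the SAGE method, whose details are truncated in the excerpt; the function name `group_relative_advantages` and the stabilizer `eps` are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one rollout group (illustrative sketch).

    `eps` is a hypothetical stabilizer that avoids division by zero when
    every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Mixed outcomes: successes get positive advantage, failures negative.
mixed = group_relative_advantages([1, 0, 0, 1])

# Sparse terminal reward where every rollout fails: all advantages are zero,
# so this group contributes no policy-gradient signal.
all_fail = group_relative_advantages([0, 0, 0, 0])

# The same happens when every rollout succeeds.
all_pass = group_relative_advantages([1, 1, 1, 1])
```

With a binary verifier, any group whose rollouts all fail or all succeed yields zero advantages, which is the stall the abstract attributes to sparse terminal rewards.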

technocracy

Enhanced Parcel Arrival Forecasting for Logistic Hubs: An Ensemble Deep Learning Approach

digitado ⋅ February 3, 2026

The rapid expansion of online shopping has increased the demand for timely parcel delivery, compelling logistics service providers to enhance the efficiency, agility, and predictability of their hub networks. To address this, we propose a novel deep learning-based ensemble framework that leverages historical arrival patterns and real-time parcel status updates to forecast upcoming workloads at logistic hubs. This approach not only facilitates the generation of short-term forecasts, but also improves the accuracy of future hub […]

technocracy

Feature, Alignment, and Supervision in Category Learning: A Comparative Approach with Children and Neural Networks

digitado ⋅ February 3, 2026

Understanding how humans and machines learn from sparse data is central to cognitive science and machine learning. Using a species-fair design, we compare children and convolutional neural networks (CNNs) in a few-shot semi-supervised category learning task. Both learners are exposed to novel object categories under identical conditions. Learners receive mixtures of labeled and unlabeled exemplars while we vary supervision (1/3/6 labels), target feature (size, shape, pattern), and perceptual alignment (high/low). We find that children generalize rapidly from minimal […]

technocracy

Dimension-Free Multimodal Sampling via Preconditioned Annealed Langevin Dynamics

digitado ⋅ February 3, 2026

arXiv:2602.01449v1 ⋅ Announce Type: cross ⋅ Abstract: Designing algorithms that can explore multimodal target distributions accurately across successive refinements of an underlying high-dimensional problem is a central challenge in sampling. Annealed Langevin dynamics (ALD) is a widely used alternative to classical Langevin dynamics because it often yields much faster mixing on multimodal targets, but there is still a gap between this empirical success and existing theory: when, and under which design choices, can ALD be guaranteed to remain stable as dimension […]

technocracy

Offline Discovery of Interpretable Skills from Multi-Task Trajectories

digitado ⋅ February 3, 2026

arXiv:2602.01018v1 ⋅ Announce Type: new ⋅ Abstract: Hierarchical Imitation Learning is a powerful paradigm for acquiring complex robot behaviors from demonstrations. A central challenge, however, lies in discovering reusable skills from long-horizon, multi-task offline data, especially when the data lacks explicit rewards or subtask annotations. In this work, we introduce LOKI, a three-stage end-to-end learning framework designed for offline skill discovery and hierarchical imitation. The framework commences with a two-stage, weakly supervised skill discovery process: Stage one performs coarse, task-aware […]

technocracy

How Does Unfaithful Reasoning Emerge from Autoregressive Training? A Study of Synthetic Experiments

digitado ⋅ February 3, 2026

arXiv:2602.01017v1 ⋅ Announce Type: new ⋅ Abstract: Chain-of-thought (CoT) reasoning generated by large language models (LLMs) is often unfaithful: intermediate steps can be logically inconsistent or fail to reflect the causal relationship leading to the final answer. Despite extensive empirical observations, a fundamental understanding of CoT is lacking: what constitutes faithful CoT reasoning, and how unfaithfulness emerges from autoregressive training. We study these questions using well-controlled synthetic experiments, training small transformers on noisy data to solve modular arithmetic expressions step by […]

technocracy

Large Language Models as Students Who Think Aloud: Overly Coherent, Verbose, and Confident

digitado ⋅ February 3, 2026

arXiv:2602.01015v1 ⋅ Announce Type: new ⋅ Abstract: Large language models (LLMs) are increasingly embedded in AI-based tutoring systems. Can they faithfully model novice reasoning and metacognitive judgments? Existing evaluations emphasize problem-solving accuracy, overlooking the fragmented and imperfect reasoning that characterizes human learning. We evaluate LLMs as novices using 630 think-aloud utterances from multi-step chemistry tutoring problems with problem-solving logs of student hint use, attempts, and problem context. We compare LLM-generated reasoning to human learner utterances under minimal and extended contextual […]

technocracy

Mitigating Data Centers Load Risks and Enabling Grid Support Functions through Grid-Forming Control

digitado ⋅ February 3, 2026

arXiv:2602.01013v1 ⋅ Announce Type: new ⋅ Abstract: The rapid growth of hyperscale data centers driven by Large Language Models and Artificial Intelligence workloads has introduced new challenges for power systems. These facilities experience abrupt power variations during model training and checkpoint-saving events, causing voltage deviations and frequency disturbances. Moreover, they operate as passive loads that draw power without offering any grid support. This paper presents an integrated architecture that combines Battery Energy Storage Systems (BESSs) within data centers using Grid-Forming […]

technocracy

LocalScore: Local Density-Aware Similarity Scoring for Biometrics

digitado ⋅ February 3, 2026

arXiv:2602.01012v1 ⋅ Announce Type: new ⋅ Abstract: Open-set biometrics faces challenges with probe subjects who may not be enrolled in the gallery, as traditional biometric systems struggle to detect these non-mated probes. Despite the growing prevalence of multi-sample galleries in real-world deployments, most existing methods collapse intra-subject variability into a single global representation, leading to suboptimal decision boundaries and poor open-set robustness. To address this issue, we propose LocalScore, a simple yet effective scoring algorithm that explicitly incorporates the local […]

technocracy

Multi-Agent Teams Hold Experts Back

digitado ⋅ February 3, 2026

arXiv:2602.01011v1 ⋅ Announce Type: new ⋅ Abstract: Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than execute fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge through interaction. However, most prior work enforces coordination through fixed roles, workflows, or aggregation rules, leaving open the question of how well self-organizing teams perform when coordination is unconstrained. Drawing on organizational psychology, we study whether self-organizing LLM teams […]
