February 2026

On the Learning Dynamics of RLVR at the Edge of Competence

digitado ⋅ February 16, 2026

Reinforcement learning with verifiable rewards (RLVR) has been a main driver of recent breakthroughs in large reasoning models. Yet it remains a mystery how rewards based solely on final outcomes can help overcome the long-horizon barrier to extended reasoning. To understand this, we develop a theory of the training dynamics of RL for transformers on compositional reasoning tasks. Our theory characterizes how the effectiveness of RLVR is governed by the smoothness of the difficulty spectrum. When data contains […]


Interactionless Inverse Reinforcement Learning: A Data-Centric Framework for Durable Alignment

digitado ⋅ February 16, 2026

AI alignment is growing in importance, yet current approaches suffer from a critical structural flaw that entangles the safety objectives with the agent’s policy. Methods such as Reinforcement Learning from Human Feedback and Direct Preference Optimization create opaque, single-use alignment artifacts, which we term Alignment Waste. We propose Interactionless Inverse Reinforcement Learning to decouple alignment artifact learning from policy optimization, producing an inspectable, editable, and model-agnostic reward model. Additionally, we introduce the Alignment Flywheel, a human-in-the-loop lifecycle that […]


Exploring the limits of pre-trained embeddings in machine-guided protein design: a case study on predicting AAV vector viability

digitado ⋅ February 16, 2026

Effective representations of protein sequences are widely recognized as a cornerstone of machine learning-based protein design. Yet, protein bioengineering poses unique challenges for sequence representation, as experimental datasets typically feature few mutations, which are either sparsely distributed across the entire sequence or densely concentrated within localized regions. This limits the ability of sequence-level representations to extract functionally meaningful signals. In addition, comprehensive comparative studies remain scarce, despite their crucial role in clarifying which representations best encode relevant information […]


Learning State-Tracking from Code Using Linear RNNs

digitado ⋅ February 16, 2026

In recent years, state-tracking tasks, particularly permutation composition, have become a testbed for understanding the limits of sequence model architectures such as Transformers and RNNs (linear and non-linear). However, these are often sequence-to-sequence tasks that learn to map actions (permutations) to states, a setup incompatible with the next-token prediction setting commonly used to train language models. We address this gap by converting permutation composition into code via REPL traces that interleave state reveals (through prints) with variable transformations. We show […]
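To make the idea of a REPL trace concrete, here is a minimal Python sketch. The abstract is truncated and does not show the trace format the authors use, so the function name `make_trace`, the `>>>` prompt layout, the restriction to pairwise swaps, and the 50% print rate are all illustrative assumptions rather than the paper's method.

```python
import random

def make_trace(n_vars=3, n_steps=4, seed=0):
    """Build a toy REPL-style trace (hypothetical format, not the paper's).

    Variable swaps act as permutations of the hidden state, and occasional
    print statements reveal part of that state as plain next-token targets.
    """
    rng = random.Random(seed)
    names = [chr(ord("a") + i) for i in range(n_vars)]
    state = {name: i for i, name in enumerate(names)}  # hidden state to track
    init_values = ", ".join(str(state[n]) for n in names)
    lines = [">>> " + ", ".join(names) + " = " + init_values]
    for _ in range(n_steps):
        x, y = rng.sample(names, 2)
        state[x], state[y] = state[y], state[x]  # apply a transposition
        lines.append(f">>> {x}, {y} = {y}, {x}")
        if rng.random() < 0.5:  # reveal one variable some of the time
            z = rng.choice(names)
            lines.append(f">>> print({z})")
            lines.append(str(state[z]))
    return "\n".join(lines), state

trace, final_state = make_trace()
print(trace)
```

Every line that follows a `print` call is a token a language model can only predict correctly if it has tracked the composed permutation up to that point.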


Hybrid Feature Learning with Time Series Embeddings for Equipment Anomaly Prediction

digitado ⋅ February 16, 2026

In predictive maintenance of equipment, deep learning-based time series anomaly detection has garnered significant attention; however, pure deep learning approaches often fail to achieve sufficient accuracy on real-world data. This study proposes a hybrid approach that integrates 64-dimensional time series embeddings from Granite TinyTimeMixer with 28-dimensional statistical features based on domain knowledge for HVAC equipment anomaly prediction tasks. Specifically, we combine time series embeddings extracted from a Granite TinyTimeMixer encoder fine-tuned with LoRA (Low-Rank Adaptation) and 28 types […]
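The feature combination described above can be sketched as a simple concatenation. Only the two dimensionalities (64 and 28) come from the abstract. The function name `hybrid_features` and the placeholder values are assumptions, and no Granite TinyTimeMixer API is called here.

```python
EMBED_DIM = 64  # embedding size stated in the abstract
STATS_DIM = 28  # number of statistical features stated in the abstract

def hybrid_features(embedding, stats):
    """Concatenate a learned embedding with hand-crafted statistics.

    Hypothetical helper: the abstract does not say how the two parts are
    joined, so plain concatenation is assumed for illustration.
    """
    if len(embedding) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM} embedding values")
    if len(stats) != STATS_DIM:
        raise ValueError(f"expected {STATS_DIM} statistical features")
    return list(embedding) + list(stats)

# Placeholder inputs standing in for a real encoder output and real statistics.
embedding = [0.0] * EMBED_DIM
stats = [1.0] * STATS_DIM
features = hybrid_features(embedding, stats)
print(len(features))
```

Under this assumption the downstream anomaly predictor would receive a 92-dimensional vector per window.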


Michigan antitrust lawsuit says oil companies hobbled EVs and renewables

digitado ⋅ February 16, 2026

Michigan is taking on major oil and gas companies in court, joining nearly a dozen other states that have brought climate-related lawsuits against ExxonMobil and its industry peers. But Michigan’s approach is different: accusing Big Oil not of deceiving consumers or misrepresenting climate change risks, but of driving up energy costs by colluding to suppress competition from cleaner and cheaper technologies like solar power and electric vehicles. The strategy is risky and might run into challenges, but it […]


Return of the Schema: Building Complete Datasets for Machine Learning and Reasoning on Knowledge Graphs

digitado ⋅ February 16, 2026

Datasets for the experimental evaluation of knowledge graph refinement algorithms typically contain only ground facts, retaining very limited schema-level knowledge even when such information is available in the source knowledge graphs. This limits the evaluation of methods that rely on rich ontological constraints, reasoning, or neurosymbolic techniques, and ultimately prevents assessing their performance on large-scale, real-world knowledge graphs. In this paper, we present resource{} the first resource that provides a workflow for extracting datasets including both schema […]


SA-SSL-MOS: Self-supervised Learning MOS Prediction with Spectral Augmentation for Generalized Multi-Rate Speech Assessment

digitado ⋅ February 16, 2026

Designing a speech quality assessment (SQA) system that estimates the mean opinion score (MOS) of multi-rate speech with varying sampling frequencies (16-48 kHz) is a challenging task. The challenge arises from the limited availability of MOS-labeled training datasets comprising multi-rate speech samples. While self-supervised learning (SSL) models have been widely adopted in SQA to boost performance, a key limitation is that they are pretrained on 16 kHz speech and therefore discard the high-frequency information present at higher sampling rates. To […]


Learning Structural Hardness for Combinatorial Auctions: Instance-Dependent Algorithm Selection via Graph Neural Networks

digitado ⋅ February 16, 2026

The Winner Determination Problem (WDP) in combinatorial auctions is NP-hard, and no existing method reliably predicts which instances will defeat fast greedy heuristics. The ML-for-combinatorial-optimization community has focused on learning to replace solvers, yet recent evidence shows that graph neural networks (GNNs) rarely outperform well-tuned classical methods on standard benchmarks. We pursue a different objective: learning to predict when a given instance is hard for greedy allocation, enabling instance-dependent algorithm selection. We design a 20-dimensional structural feature vector […]
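As a rough illustration of what "hard for greedy allocation" can mean, the sketch below compares a greedy heuristic with exhaustive search on a tiny instance. The abstract does not specify which greedy rule the authors use. Ranking bids by price per item is one common choice and is assumed here, and the names `greedy_wdp` and `optimal_wdp` are hypothetical.

```python
from itertools import combinations

def greedy_wdp(bids):
    """Accept bids in descending price-per-item order, skipping conflicts.

    bids: list of (frozenset_of_items, price) pairs.
    """
    taken, chosen, revenue = set(), [], 0.0
    ranked = sorted(bids, key=lambda b: b[1] / len(b[0]), reverse=True)
    for items, price in ranked:
        if taken.isdisjoint(items):  # no item may be sold twice
            taken |= items
            chosen.append((items, price))
            revenue += price
    return revenue, chosen

def optimal_wdp(bids):
    """Exhaustive search over bid subsets; only viable for tiny instances."""
    best = 0.0
    for r in range(1, len(bids) + 1):
        for subset in combinations(bids, r):
            items = [i for bundle, _ in subset for i in bundle]
            if len(items) == len(set(items)):  # bundles are pairwise disjoint
                best = max(best, sum(price for _, price in subset))
    return best

# A two-bid instance where the single-item bid blocks the larger bundle.
bids = [(frozenset("A"), 6.0), (frozenset("ABC"), 15.0)]
greedy_revenue, _ = greedy_wdp(bids)
gap = optimal_wdp(bids) - greedy_revenue
print(greedy_revenue, gap)
```

Here greedy collects 6.0 while the optimum is 15.0. A gap of this kind is the sort of signal an instance-hardness predictor would try to anticipate from structural features, without running the exact solver.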


Universal Algorithm-Implicit Learning

digitado ⋅ February 16, 2026

Current meta-learning methods are constrained to narrow task distributions with fixed feature and label spaces, limiting applicability. Moreover, the current meta-learning literature uses key terms like "universal" and "general-purpose" inconsistently and lacks precise definitions, hindering comparability. We introduce a theoretical framework for meta-learning which formally defines practical universality and introduces a distinction between algorithm-explicit and algorithm-implicit learning, providing a principled vocabulary for reasoning about universal meta-learning methods. Guided by this framework, we present TAIL, a transformer-based algorithm-implicit meta-learner […]
