February 2026

Interaction-Grounded Learning for Contextual Markov Decision Processes with Personalized Feedback

digitado ⋅ February 9, 2026

In this paper, we study Interaction-Grounded Learning (IGL) [Xie et al., 2021], a paradigm designed for realistic scenarios where the learner receives indirect feedback generated by an unknown mechanism, rather than explicit numerical rewards. While prior work on IGL provides efficient algorithms with provable guarantees, those results are confined to single-step settings, restricting their applicability to modern sequential decision-making systems such as multi-turn Large Language Model (LLM) deployments. To bridge this gap, we propose a computationally efficient algorithm […]

Trust-Based Incentive Mechanisms in Semi-Decentralized Federated Learning Systems

digitado ⋅ February 9, 2026

In federated learning (FL), decentralized model training allows multiple participants to collaboratively improve a shared machine learning model without exchanging raw data. However, ensuring the integrity and reliability of the system is challenging due to the presence of potentially malicious or faulty nodes that can degrade the model’s performance. This paper proposes a novel trust-based incentive mechanism designed to evaluate and reward the quality of contributions in FL systems. By dynamically assessing trust scores based on factors such […]

Building an RL Agent for Prince of Persia (1989)

digitado ⋅ February 9, 2026

I’ve been working on a reinforcement learning project around the original Prince of Persia (1989) using SDLPoP. Instead of using raw pixels, I built a grid-based observation directly from the game state. Each room becomes a small multi-channel grid showing platforms, hazards, gates, exits, items, and character positions. The idea is to reduce the CNN’s burden of trying to understand interactable platforms and hazards from just a few pixels and instead give structured spatial information. On the action […]
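
The grid-based observation described above can be sketched roughly as follows. This is only an illustration, not the author's code: the channel names follow the list in the excerpt, while the 3×10 room size, the `encode_room` helper, and the `(row, col, kind)` tile format are assumptions made for the example.

```python
# Hypothetical sketch of a multi-channel grid observation for one room.
# Channel names mirror the excerpt; everything else is assumed.
CHANNELS = ["platform", "hazard", "gate", "exit", "item", "character"]


def encode_room(tiles, rows=3, cols=10):
    """Build a channels x rows x cols binary grid from (row, col, kind) tuples.

    rows=3 and cols=10 are an assumed room size, not taken from the post.
    """
    grid = [[[0] * cols for _ in range(rows)] for _ in CHANNELS]
    for row, col, kind in tiles:
        # Mark the cell in the channel that corresponds to this tile kind.
        grid[CHANNELS.index(kind)][row][col] = 1
    return grid


# Example room: a floor tile, a hazard next to it, and the player one row up.
obs = encode_room([(2, 0, "platform"), (2, 1, "hazard"), (1, 4, "character")])
```

Each channel is a small binary map, so a convolutional policy sees hazards and platforms as separate planes instead of having to infer them from a few pixels.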

When Do Multi-Agent Systems Outperform? Analysing the Learning Efficiency of Agentic Systems

digitado ⋅ February 9, 2026

Reinforcement Learning (RL) has emerged as a crucial method for training or fine-tuning large language models (LLMs), enabling adaptive, task-specific optimizations through interactive feedback. Multi-Agent Reinforcement Learning (MARL), in particular, offers a promising avenue by decomposing complex tasks into specialized subtasks learned by distinct interacting agents, potentially enhancing the ability and efficiency of LLM systems. However, theoretical insights regarding when and why MARL outperforms Single-Agent RL (SARL) remain limited, creating uncertainty in selecting the appropriate RL framework. In […]

To Grok Grokking: Provable Grokking in Ridge Regression

digitado ⋅ February 9, 2026

arXiv:2601.19791v2. We study grokking, the onset of generalization long after overfitting, in a classical ridge regression setting. We prove end-to-end grokking results for learning over-parameterized linear regression models using gradient descent with weight decay. Specifically, we prove that the following stages occur: (i) the model overfits the training data early during training; (ii) poor generalization persists long after overfitting has manifested; and (iii) the generalization error eventually becomes arbitrarily small. Moreover, we show, both […]
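
As a toy illustration of the setting named in the abstract (gradient descent with weight decay on an over-parameterized linear model), the sketch below uses a single training sample and two weights. It is not the paper's construction: the data, learning rate, step count, and function names are assumptions chosen for the example. The data-aligned part of the weights fits almost immediately, while weight decay shrinks the orthogonal part far more slowly, until the iterate reaches the closed-form ridge solution.

```python
def ridge_gd(x, y, lam, lr, steps, w0):
    """Gradient descent on 0.5*(w.x - y)^2 + 0.5*lam*||w||^2 for one sample."""
    w = list(w0)
    for _ in range(steps):
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        # Gradient = residual * x (data-fit term) + lam * w (weight decay term).
        w = [wi - lr * (residual * xi + lam * wi) for wi, xi in zip(w, x)]
    return w


def ridge_closed_form(x, y, lam):
    """Minimizer of the same objective: w* = x * y / (||x||^2 + lam)."""
    norm_sq = sum(xi * xi for xi in x)
    return [xi * y / (norm_sq + lam) for xi in x]


# Assumed toy problem: one sample, two weights, so the model is over-parameterized.
x, y, lam = [1.0, 2.0], 3.0, 0.1
# The start point is orthogonal to x, so only weight decay can remove it.
w_gd = ridge_gd(x, y, lam, lr=0.1, steps=3000, w0=[2.0, -1.0])
w_star = ridge_closed_form(x, y, lam)
```

With these values the error along `x` contracts by a factor of 0.49 per step and the orthogonal component by only 0.99 per step, which is the kind of timescale gap that separates early fitting from the later effect of weight decay.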

Multi-fidelity graph-based neural networks architectures to learn Navier-Stokes solutions on non-parametrized 2D domains

digitado ⋅ February 9, 2026

arXiv:2601.02157v2. We propose a graph-based, multi-fidelity learning framework for the prediction of stationary Navier–Stokes solutions in non-parametrized two-dimensional geometries. The method is designed to guide the learning process through successive approximations, starting from reduced-order and full Stokes models, and progressively approaching the Navier–Stokes solution. To effectively capture both local and long-range dependencies in the velocity and pressure fields, we combine graph neural networks with Transformer and Mamba architectures. While Transformers achieve the highest accuracy, […]

Dynamic Vocabulary Pruning: Stable LLM-RL by Taming the Tail

digitado ⋅ February 9, 2026

arXiv:2512.23087v2. Reinforcement Learning (RL) for Large Language Models (LLMs) faces a fundamental tension: the numerical divergence between high-throughput inference engines and numerically precise training engines. Although these systems share the same parameters, they produce slightly different probability distributions, creating a training-inference mismatch. We prove that the bound on the log-probability divergence arising from this mismatch scales as $(1-p)$, where $p$ is the token probability. This scaling induces a highly asymmetric effect: the bound vanishes […]

Path Signatures Enable Model-Free Mapping of RNA Modifications

digitado ⋅ February 9, 2026

arXiv:2511.08855v2. Detecting chemical modifications on RNA molecules remains a key challenge in epitranscriptomics. Traditional reverse transcription-based sequencing methods introduce enzyme- and sequence-dependent biases and fragment RNA molecules, confounding the accurate mapping of modifications across the transcriptome. Nanopore direct RNA sequencing offers a powerful alternative by preserving native RNA molecules, enabling the detection of modifications at single-molecule resolution. However, current computational tools can identify only a limited subset of modification types within well-characterized sequence contexts […]

On topological descriptors for graph products

digitado ⋅ February 9, 2026

arXiv:2511.08846v2. Topological descriptors have been increasingly utilized for capturing multiscale structural information in relational data. In this work, we consider various filtrations on the (box) product of graphs and their effect on the outputs of the topological descriptors – the Euler characteristic (EC) and persistent homology (PH). In particular, we establish a complete characterization of the expressive power of EC on general color-based filtrations. We also show that the PH descriptors of (virtual) graph […]

A Unified Framework for Lifted Training and Inversion Approaches

digitado ⋅ February 9, 2026

arXiv:2510.09796v2. The training of deep neural networks predominantly relies on a combination of gradient-based optimisation and back-propagation for the computation of the gradient. While incredibly successful, this approach faces challenges such as vanishing or exploding gradients, difficulties with non-smooth activations, and an inherently sequential structure that limits parallelisation. Lifted training methods offer an alternative by reformulating the nested optimisation problem into a higher-dimensional, constrained optimisation problem where the constraints are no longer enforced directly […]
