digitado

Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options

digitado ⋅ February 6, 2026

arXiv:2510.18713v3. Announce type: replace-cross. Abstract: We study online preference-based reinforcement learning (PbRL) with the goal of improving sample efficiency. While a growing body of theoretical work has emerged, motivated by PbRL’s recent empirical success, particularly in aligning large language models (LLMs), most existing studies focus only on pairwise comparisons. A few recent works (Zhu et al., 2023; Mukherjee et al., 2024; Thekumparampil et al., 2024) have explored using multiple comparisons and ranking feedback, but their performance guarantees fail to improve, and […]

Beyond Retention: Orchestrating Structural Safety and Plasticity in Continual Learning for LLMs

digitado ⋅ January 26, 2026

Continual learning in Large Language Models (LLMs) faces the critical challenge of balancing stability (retaining old knowledge) and plasticity (learning new tasks). While Experience Replay (ER) is a standard countermeasure against catastrophic forgetting, its impact across diverse capabilities remains underexplored. In this work, we uncover a critical dichotomy in ER’s behavior: while it induces positive backward transfer on robust, unstructured tasks (e.g., boosting performance on previous NLP classification tasks through repeated rehearsal), it causes severe negative transfer on […]

HyFAD: Hybrid Time-Frequency Diffusion with Frequency-Aware Embedding for Time Series Imputation

digitado ⋅ June 5, 2026

arXiv:2606.05239v1. Announce type: new. Abstract: Diffusion models have demonstrated strong performance in time series modeling due to their ability to progressively capture complex data distributions through iterative denoising. However, existing approaches struggle with frequency-sensitive denoising, high-frequency reconstruction, and balancing global trends with local dynamics. To address these limitations, we propose HyFAD, a Hybrid time-frequency Diffusion model with Frequency-Aware embedding for time series imputation. Built upon the DDPM paradigm, HyFAD adopts a coupled time-frequency diffusion framework, in which the […]

Leveraging Language Models and RAG for Efficient Knowledge Discovery in Clinical Environments

digitado ⋅ January 9, 2026

arXiv:2601.04209v1. Announce type: new. Abstract: Large language models (LLMs) are increasingly recognized as valuable tools across the medical environment, supporting clinical, research, and administrative workflows. However, strict privacy and network security regulations in hospital settings require that sensitive data be processed within fully local infrastructures. Within this context, we developed and evaluated a retrieval-augmented generation (RAG) system designed to recommend research collaborators based on PubMed publications authored by members of a medical institution. The system utilizes PubMedBERT for […]

DD-MDN: Human Trajectory Forecasting with Diffusion-Based Dual Mixture Density Networks and Uncertainty Self-Calibration

digitado ⋅ February 13, 2026

arXiv:2602.11214v1. Announce type: new. Abstract: Human Trajectory Forecasting (HTF) predicts future human movements from past trajectories and environmental context, with applications in Autonomous Driving, Smart Surveillance, and Human-Robot Interaction. While prior work has focused on accuracy, social interaction modeling, and diversity, little attention has been paid to uncertainty modeling, calibration, and forecasts from short observation periods, which are crucial for downstream tasks such as path planning and collision avoidance. We propose DD-MDN, an end-to-end probabilistic HTF model that […]

The HackerNoon Newsletter: It’s Not Kubernetes. It Never Was. (3/15/2026)

digitado ⋅ March 15, 2026

How are you, hacker? 🪐 What’s happening in tech today, March 15, 2026? The HackerNoon Newsletter brings the HackerNoon homepage straight to your inbox. On this day in 1975, the Homebrew Computer Club put out its first newsletter, and we present you with these top-quality stories. From “The OpenClaw Saga: How the Last Two Weeks Changed the Agentic AI World Forever” to “Create a Website Without Code: How Fabricate Turns Conversations Into Full-Stack Apps”, let’s dive right in. The […]

EnsAug: Augmentation-Driven Ensembles for Human Motion Sequence Analysis

digitado ⋅ March 10, 2026

arXiv:2603.06661v1. Announce type: new. Abstract: Data augmentation is a crucial technique for training robust deep learning models for human motion, where annotated datasets are often scarce. However, generic augmentation methods often ignore the underlying geometric and kinematic constraints of the human body, risking the generation of unrealistic motion patterns that can degrade model performance. Furthermore, the conventional approach of training a single generalist model on a dataset expanded with a mixture of all available transformations does not fully […]

Hackable PyTorch RL library with distributional algorithms (D4PG, DSAC, DPPO)

digitado ⋅ May 6, 2026

I published a paper on distributional RL for legged locomotion a while back, and I recently dug up the code and cleaned it up into a standalone repo: https://github.com/e3ntity/e3rl

Here’s a DPPO policy trained with this library running on a real robot: https://sites.google.com/leggedrobotics.com/risk-aware-locomotion

The library is based on rsl_rl but contains readable PyTorch implementations of the most popular continuous control algorithms (PPO, SAC, TD3, DDPG), plus their distributional counterparts DPPO, DSAC, and D4PG. Runs on CUDA, Apple Silicon, or CPU. pip install […]

Seeing to Generalize: How Visual Data Corrects Binding Shortcuts

digitado ⋅ February 18, 2026

arXiv:2602.15183v1. Announce type: new. Abstract: Vision Language Models (VLMs) are designed to extend Large Language Models (LLMs) with visual capabilities, yet in this work we observe a surprising phenomenon: VLMs can outperform their underlying LLMs on purely text-only tasks, particularly in long-context information retrieval. To investigate this effect, we build a controlled synthetic retrieval task and find that a transformer trained only on text achieves perfect in-distribution accuracy but fails to generalize out of distribution, while subsequent training […]

Meta Flow Maps enable scalable reward alignment

digitado ⋅ January 22, 2026

arXiv:2601.14430v1. Announce type: new. Abstract: Controlling generative models is computationally expensive. This is because optimal alignment with a reward function, whether via inference-time steering or fine-tuning, requires estimating the value function. This task demands access to the conditional posterior $p_{1|t}(x_1|x_t)$, the distribution of clean data $x_1$ consistent with an intermediate state $x_t$, a requirement that typically compels methods to resort to costly trajectory simulations. To address this bottleneck, we introduce Meta Flow Maps (MFMs), a framework extending consistency models and […]
