digitado – Page 264

Learning to Explore with Parameter-Space Noise: A Deep Dive into Parameter-Space Noise for Reinforcement Learning with Verifiable Rewards

digitado ⋅ 4 February 2026

arXiv:2602.02555v1. Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) improves LLM reasoning, yet growing evidence indicates an exploration ceiling: it often reweights existing solution traces rather than discovering new strategies, limiting gains under large sampling budgets (e.g., pass-at-256). We address this limitation with PSN-RLVR, which perturbs policy parameters before rollout generation to induce temporally consistent, trajectory-level exploration that better preserves long-horizon chain-of-thought coherence than action-space noise. To mitigate the resulting sampling-update mismatch, we incorporate truncated importance […]

Conditional Normalizing Flows for Forward and Backward Joint State and Parameter Estimation

digitado ⋅ 13 January 2026

arXiv:2601.07013v1. Abstract: Traditional filtering algorithms for state estimation, such as classical Kalman filtering, unscented Kalman filtering, and particle filters, show performance degradation when applied to nonlinear systems whose uncertainty follows arbitrary non-Gaussian and potentially multi-modal distributions. This study reviews recent approaches to state estimation via nonlinear filtering based on conditional normalizing flows, where the conditional embedding is generated by standard MLP architectures, transformers, or selective state-space models (like Mamba-SSM). In addition, we test […]

New fractional generalized Gronwall inequalities and Lyapunov theorems with applications

digitado ⋅ 10 April 2026

This paper deals with some expressions of the fractional generalized Gronwall inequality when associated with both non-negative and non-positive singular kernels and establishes sharp Mittag-Leffler bounds containing different ingredients. The long-term behavior of non-autonomous fractional order systems by means of modified fractional Lyapunov theorems is analyzed. As an application, we give a few examples that use quadratic Lyapunov functions for typical fractional order systems to predict trajectories that ultimately aim to reach vector 0 as t → […]

Choosing the Right Regularizer for Applied ML: Simulation Benchmarks of Popular Scikit-learn Regularization Frameworks

digitado ⋅ 4 April 2026

This study surveys the historical development of regularization, tracing its evolution from stepwise regression in the 1960s to recent advancements in formal error control, structured penalties for non-independent features, Bayesian methods, and l0-based regularization (among other techniques). We empirically evaluate the performance of four canonical frameworks — Ridge, Lasso, ElasticNet, and Post-Lasso OLS — across 134,400 simulations spanning a 7-dimensional manifold grounded in eight production-grade machine learning models. Our findings demonstrate that for prediction accuracy when the sample-to-feature […]

From Absolute to Relative: Rethinking Reward Shaping in Group-Based Reinforcement Learning

digitado ⋅ 30 January 2026

Reinforcement learning has become a cornerstone for enhancing the reasoning capabilities of Large Language Models, where group-based approaches such as GRPO have emerged as efficient paradigms that optimize policies by leveraging intra-group performance differences. However, these methods typically rely on absolute numerical rewards, introducing intrinsic limitations. In verifiable tasks, identical group evaluations often result in sparse supervision, while in open-ended scenarios, the score range instability of reward models undermines advantage estimation based on group means. To address these […]

ConceptRM: The Quest to Mitigate Alert Fatigue through Consensus-Based Purity-Driven Data Cleaning for Reflection Modelling

digitado ⋅ 25 February 2026

arXiv:2602.20166v1. Abstract: In many applications involving intelligent agents, the overwhelming volume of alerts (mostly false) generated by the agents may desensitize users and cause them to overlook critical issues, leading to the so-called “alert fatigue”. A common strategy is to train a reflection model as a filter to intercept false alerts with labelled data collected from user verification feedback. However, a key challenge is the noisy nature of such data as it is often collected […]

Amodei torches OpenAI in leaked memo

digitado ⋅ 5 March 2026

“Straight up lies.” That’s how Dario Amodei described OpenAI’s Pentagon messaging in a newly leaked internal memo sent to Anthropic employees on Friday. The 1,600-word document rips the controversial deal as “80% safety theater” with personal shots at Sam Altman woven throughout, escalating a rivalry that was already one of the most heated in tech, far past an awkward hand-hold refusal. P.S. […]

The latest AI news we announced in April 2026

digitado ⋅ 5 May 2026

Here are Google’s latest AI updates from April 2026

Private and interpretable clinical prediction with quantum-inspired tensor train models

digitado ⋅ 9 February 2026

arXiv:2602.06110v1. Abstract: Machine learning in clinical settings must balance predictive accuracy, interpretability, and privacy. Models such as logistic regression (LR) offer transparency, while neural networks (NNs) provide greater predictive power; yet both remain vulnerable to privacy attacks. We empirically assess these risks by designing attacks that identify which public datasets were used to train a model under varying levels of adversarial access, applying them to LORIS, a publicly available LR model for immunotherapy response prediction, […]

[Hiring] Reinforcement Learning Engineer @ Verita AI

digitado ⋅ 3 March 2026

Verita AI is building the “Gym” for LLM reasoning. We are moving beyond simple chat-based RLHF into complex, grounded RL environments where models must solve multi-step engineering and research problems to receive a reward. The Mission: Design robust, un-hackable RL environments (Prompt + Judge + Tools) that challenge top-tier models (GPT-5.2, Claude Opus 4.6). Think SWE-Bench, but for AI/ML research. What We’re Looking For: Technical Fluency: Deep PyTorch/JAX knowledge and the ability to debug distributed training. Adversarial Thinking: […]
