March 2026

Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data

digitado ⋅ March 10, 2026

arXiv:2505.09496v3 Announce Type: replace Abstract: Offline reinforcement learning (RL) aims to find optimal policies in dynamic environments in order to maximize the expected total rewards by leveraging pre-collected data. Learning from heterogeneous data is one of the fundamental challenges in offline RL. Traditional methods focus on learning an optimal policy for all individuals with pre-collected data from a single episode or homogeneous batch episodes, and thus, may result in a suboptimal policy for a heterogeneous population. In this […]

technocracy

Impact of Connectivity on Laplacian Representations in Reinforcement Learning

digitado ⋅ March 10, 2026

arXiv:2603.08558v1 Announce Type: cross Abstract: Learning compact state representations in Markov Decision Processes (MDPs) has proven crucial for addressing the curse of dimensionality in large-scale reinforcement learning (RL) problems. Existing principled approaches leverage structural priors on the MDP by constructing state representations as linear combinations of the state-graph Laplacian eigenvectors. When the transition graph is unknown or the state space is prohibitively large, the graph spectral features can be estimated directly via sample trajectories. In this work, we […]

technocracy

Breaking the Bias Barrier in Concave Multi-Objective Reinforcement Learning

digitado ⋅ March 10, 2026

arXiv:2603.08518v1 Announce Type: cross Abstract: While standard reinforcement learning optimizes a single reward signal, many applications require optimizing a nonlinear utility $f(J_1^\pi,\dots,J_M^\pi)$ over multiple objectives, where each $J_m^\pi$ denotes the expected discounted return of a distinct reward function. A common approach is concave scalarization, which captures important trade-offs such as fairness and risk sensitivity. However, nonlinear scalarization introduces a fundamental challenge for policy gradient methods: the gradient depends on $\partial f(J^\pi)$, while in practice only empirical return estimates […]
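
The bias mentioned in this abstract can be illustrated with a small exact calculation. The sketch below is not the paper's method: it assumes a single objective, a hypothetical concave utility `f(x) = sqrt(x)`, and a hypothetical return estimate that equals 1.0 or 9.0 with equal probability. It shows that plugging a noisy estimate into `f` or into its derivative gives a biased result even though the estimate itself is unbiased.

```python
import math

def f(x):
    # Concave utility chosen only for illustration (an assumption).
    return math.sqrt(x)

def df(x):
    # Derivative of f: the weight that multiplies the policy gradient.
    return 0.5 / math.sqrt(x)

# Hypothetical noisy return estimate: 1.0 or 9.0, each with probability 1/2.
estimates = [1.0, 9.0]
true_return = sum(estimates) / len(estimates)  # the estimate is unbiased: 5.0

# Expectations of the plug-in quantities, computed exactly over both outcomes.
expected_plugin_value = sum(f(j) for j in estimates) / len(estimates)
expected_plugin_weight = sum(df(j) for j in estimates) / len(estimates)

# Jensen's inequality: the plug-in value underestimates f at the true return.
value_bias = expected_plugin_value - f(true_return)
# df is convex here, so the plug-in gradient weight is overestimated.
weight_bias = expected_plugin_weight - df(true_return)

print(f"value bias:  {value_bias:+.4f}")
print(f"weight bias: {weight_bias:+.4f}")
```

Averaging more trajectories narrows the spread of the estimate and shrinks both gaps, but for a nonlinear `f` the gaps generally do not vanish at any finite sample size.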

technocracy

Efficient Credal Prediction through Decalibration

digitado ⋅ March 10, 2026

arXiv:2603.08495v1 Announce Type: cross Abstract: A reliable representation of uncertainty is essential for the application of modern machine learning methods in safety-critical settings. In this regard, the use of credal sets (i.e., convex sets of probability distributions) has recently been proposed as a suitable approach to representing epistemic uncertainty. However, as with other approaches to epistemic uncertainty, training credal predictors is computationally complex and usually involves (re-)training an ensemble of models. The resulting computational complexity prevents their adoption […]
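
For orientation, the ensemble-based construction that the abstract describes as costly can be sketched in a few lines. This is not the decalibration approach named in the title. It assumes the credal set is the convex hull of the ensemble members' predicted distributions, in which case the lower and upper probability of each class are attained at the members themselves. The function name `credal_bounds` and the example numbers are hypothetical.

```python
def credal_bounds(ensemble_probs):
    """Per-class lower and upper probabilities over the convex hull of the members."""
    n_classes = len(ensemble_probs[0])
    # A class probability is linear in the distribution, so its extremes over
    # a convex hull are reached at the hull's generating points.
    lower = [min(p[c] for p in ensemble_probs) for c in range(n_classes)]
    upper = [max(p[c] for p in ensemble_probs) for c in range(n_classes)]
    return lower, upper

# Hypothetical predictions of three ensemble members for one input.
members = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]
lower, upper = credal_bounds(members)
# Interval width per class: wider intervals mean more disagreement between members.
widths = [u - l for l, u in zip(lower, upper)]
print(lower, upper, widths)
```

The expensive part in practice is obtaining `members`, since each row comes from a separately trained model.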

technocracy

Beyond the Markovian Assumption: Robust Optimization via Fractional Weyl Integrals in Imbalanced Data

digitado ⋅ March 10, 2026

arXiv:2603.08377v1 Announce Type: cross Abstract: Standard Gradient Descent and its modern variants assume local, Markovian weight updates, making them highly susceptible to noise and overfitting. This limitation becomes critically severe in extremely imbalanced datasets such as financial fraud detection where dominant class gradients systematically overwrite the subtle signals of the minority class. In this paper, we introduce a novel optimization algorithm grounded in Fractional Calculus. By isolating the core memory engine of the generalized fractional derivative, the Weighted […]

technocracy

Towards plausibility in time series counterfactual explanations

digitado ⋅ March 10, 2026

arXiv:2603.08349v1 Announce Type: cross Abstract: We present a new method for generating plausible counterfactual explanations for time series classification problems. The approach performs gradient-based optimization directly in the input space. To enforce plausibility, we integrate soft-DTW (dynamic time warping) alignment with $k$-nearest neighbors from the target class, which effectively encourages the generated counterfactuals to adopt a realistic temporal structure. The overall optimization objective is a multi-faceted loss function that balances key counterfactual properties. It incorporates losses for validity, […]
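
The soft-DTW term named in this abstract can be sketched as follows. This is a minimal pure-Python version of the standard soft-DTW recursion for univariate series with a squared-error cost, plus a hypothetical helper `nearest_neighbors` that ranks target-class series by that discrepancy. The function names, the cost, and the default `gamma` are assumptions for illustration, not the paper's implementation.

```python
import math

def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma)))."""
    m = min(values)  # shift by the minimum for numerical stability
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW discrepancy between two univariate series."""
    n, m = len(x), len(y)
    inf = float("inf")
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Hard DTW would take min(...) here; soft-DTW uses a smooth minimum,
            # which makes the result differentiable in the inputs.
            r[i][j] = cost + soft_min(
                [r[i - 1][j], r[i][j - 1], r[i - 1][j - 1]], gamma
            )
    return r[n][m]

def nearest_neighbors(query, candidates, k, gamma=1.0):
    """Indices of the k candidates closest to `query` under soft-DTW."""
    order = sorted(range(len(candidates)),
                   key=lambda i: soft_dtw(query, candidates[i], gamma))
    return order[:k]
```

As `gamma` approaches zero the smooth minimum approaches the hard minimum, so the value approaches classical DTW. Larger `gamma` gives a smoother objective for gradient-based optimization.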

technocracy

Are We Winning the Wrong Game? Revisiting Evaluation Practices for Long-Term Time Series Forecasting

digitado ⋅ March 10, 2026

arXiv:2603.08156v1 Announce Type: cross Abstract: Long-term time series forecasting (LTSF) is widely recognized as a central challenge in data mining and machine learning. LTSF has increasingly evolved into a benchmark-driven "GAME," where models are ranked, compared, and declared state-of-the-art based primarily on marginal reductions in aggregated pointwise error metrics such as MSE and MAE. Across a small set of canonical datasets and fixed forecasting horizons, progress is communicated through leaderboard-style tables in which lower numerical scores define success. […]

technocracy

Explainable Condition Monitoring via Probabilistic Anomaly Detection Applied to Helicopter Transmissions

digitado ⋅ March 10, 2026

arXiv:2603.08130v1 Announce Type: cross Abstract: We present a novel Explainable methodology for Condition Monitoring, relying on healthy data only. Since faults are rare events, we propose to focus on learning the probability distribution of healthy observations only, and to detect Anomalies at runtime. This objective is achieved via the definition of probabilistic measures of deviation from nominality, which make it possible to detect and anticipate faults. The Bayesian perspective underpinning our approach allows us to perform Uncertainty Quantification to inform decisions. […]

technocracy

Amortizing Maximum Inner Product Search with Learned Support Functions

digitado ⋅ March 10, 2026

arXiv:2603.08001v1 Announce Type: cross Abstract: Maximum inner product search (MIPS) is a crucial subroutine in machine learning, requiring the identification of key vectors that align best with a given query. We propose amortized MIPS: a learning-based approach that trains neural networks to directly predict MIPS solutions, amortizing the computational cost of matching queries (drawn from a fixed distribution) to a fixed set of keys. Our key insight is that the MIPS value function, the maximal inner product between […]
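
As a reference point for the problem this abstract addresses, exact MIPS can be written as a linear scan over the keys. The sketch below is that baseline, not the amortized method. The function names and example vectors are hypothetical. The returned maximum, viewed as a function of the query, is the support function of the key set, which is the object referred to in the title.

```python
def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def mips(query, keys):
    """Exact MIPS by linear scan: returns (best_index, best_value)."""
    best_index = max(range(len(keys)), key=lambda i: inner(query, keys[i]))
    return best_index, inner(query, keys[best_index])

# Hypothetical fixed key set and one query.
keys = [[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
index, value = mips([0.5, 1.0], keys)
print(index, value)
```

Each query costs one inner product per key, so the scan grows linearly with the size of the key set. That per-query cost is what a learned predictor would aim to avoid.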

technocracy

Bayesian Transformer for Probabilistic Load Forecasting in Smart Grids

digitado ⋅ March 10, 2026

arXiv:2603.07899v1 Announce Type: cross Abstract: The reliable operation of modern power grids requires probabilistic load forecasts with well-calibrated uncertainty estimates. However, existing deep learning models produce overconfident point predictions that fail catastrophically under extreme weather distributional shifts. This study proposes a Bayesian Transformer (BT) framework that integrates three complementary uncertainty mechanisms into a PatchTST backbone: Monte Carlo Dropout for epistemic parameter uncertainty, variational feed-forward layers with log-uniform weight priors, and stochastic attention with learnable Gaussian noise perturbations on […]
