February 2026

Learning path from Q-learning to TD3 (course suggestions?)

digitado ⋅ February 6, 2026

I’m a graduate research assistant working on autonomous vehicle–related research. I was given an existing codebase with folders like Q-learning / DQN / DDPG / TD3, and I’m expected to replicate and work with TD3. The problem is that I currently have basic Python skills, a very limited, intro-level understanding of RL (Q-learning, DQN), and almost no exposure to actor–critic methods. I’m looking for a clear learning roadmap that builds knowledge from tabular Q-learning → DQN → policy gradients […]

technocracy

Robust Online Learning

digitado ⋅ February 6, 2026

We study the problem of learning robust classifiers where the classifier will receive a perturbed input. Unlike robust PAC learning studied in prior work, here the clean data and its label are also adversarially chosen. We formulate this setting as an online learning problem and consider both the realizable and agnostic learnability of hypothesis classes. We define a new dimension of classes and show it controls the mistake bounds in the realizable setting and the regret bounds in […]

technocracy

Soft Forward-Backward Representations for Zero-shot Reinforcement Learning with General Utilities

digitado ⋅ February 6, 2026

Recent advancements in zero-shot reinforcement learning (RL) have facilitated the extraction of diverse behaviors from unlabeled, offline data sources. In particular, forward-backward algorithms (FB) can retrieve a family of policies that can approximately solve any standard RL problem (with additive rewards, linear in the occupancy measure), given sufficient capacity. While retaining zero-shot properties, we tackle the greater problem class of RL with general utilities, in which the objective is an arbitrary differentiable function of the occupancy measure. This […]

technocracy

NASA stage show explores “outer” outer space with Henson’s Fraggles

digitado ⋅ February 6, 2026

Move over Snoopy, because NASA has a new character helping to promote its deep space exploration plans. His name is Uncle Traveling Matt. No really, move over. Fraggle Rock: A Space-y Adventure has taken over the same theater the Kennedy Space Center Visitor Complex in Florida previously used for All Systems Are Go, featuring the comic strip beagle. The new stage show stars the Jim Henson Company’s subterranean Muppets as they discover outer (outer) space for the first […]

technocracy

Semantically Labelled Automata for Multi-Task Reinforcement Learning with LTL Instructions

digitado ⋅ February 6, 2026

We study multi-task reinforcement learning (RL), a setting in which an agent learns a single, universal policy capable of generalising to arbitrary, possibly unseen tasks. We consider tasks specified as linear temporal logic (LTL) formulae, which are commonly used in formal methods to specify properties of systems, and have recently been successfully adopted in RL. In this setting, we present a novel task embedding technique leveraging a new generation of semantic LTL-to-automata translations, originally developed for temporal synthesis. […]

technocracy

EU says TikTok needs to drop “addictive design”

digitado ⋅ February 6, 2026

Brussels has warned TikTok that its endlessly scrolling feeds may breach Europe’s new content rules, as regulators press ahead with efforts to rein in the social effects of big online platforms. In preliminary findings issued on Friday, the European Commission said it believed the group had failed to adequately assess and mitigate the risks posed by addictive design features that could harm users’ physical and mental wellbeing, particularly children and other vulnerable groups. The warning marks one of […]

technocracy

F-GRPO: Don’t Let Your Policy Learn the Obvious and Forget the Rare

digitado ⋅ February 6, 2026

Reinforcement Learning with Verifiable Rewards (RLVR) is commonly based on group sampling to estimate advantages and stabilize policy updates. In practice, large group sizes are not feasible due to computational limits, which biases learning toward trajectories that are already likely. Smaller groups often miss rare-correct trajectories while still containing mixed rewards, concentrating probability on common solutions. We derive the probability that updates miss rare-correct modes as a function of group size, showing non-monotonic behavior, and characterize how updates […]

technocracy

Makespan Minimization in Split Learning: From Theory to Practice

digitado ⋅ February 6, 2026

Split learning recently emerged as a solution for distributed machine learning with heterogeneous IoT devices, where clients can offload part of their training to computationally-powerful helpers. The core challenge in split learning is to minimize the training time by jointly devising the client-helper assignment and the schedule of tasks at the helpers. We first study the model where each helper has a memory cardinality constraint on how many clients it may be assigned, which represents the case of […]

technocracy

Temperature Scaling Attack Disrupting Model Confidence in Federated Learning

digitado ⋅ February 6, 2026

Predictive confidence serves as a foundational control signal in mission-critical systems, directly governing risk-aware logic such as escalation, abstention, and conservative fallback. While prior federated learning attacks predominantly target accuracy or implant backdoors, we identify confidence calibration as a distinct attack objective. We present the Temperature Scaling Attack (TSA), a training-time attack that degrades calibration while preserving accuracy. By injecting temperature scaling with learning rate-temperature coupling during local training, malicious updates maintain benign-like optimization behavior, evading accuracy-based monitoring […]
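
As a rough illustration of why temperature scaling can shift confidence without changing accuracy, the sketch below divides a set of logits by a temperature before the softmax. It is not the TSA method from the excerpt: the logit values, the temperatures, and the helper name `softmax` are assumptions chosen only for the example. Dividing by a positive temperature keeps the ordering of the logits, so the predicted class stays the same while the top probability moves.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a positive temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 3-class prediction (illustrative values only).
logits = [2.0, 1.0, 0.1]
temperatures = (0.5, 1.0, 4.0)

# Map each temperature to its probability vector.
results = {t: softmax(logits, t) for t in temperatures}

for t, probs in results.items():
    predicted = probs.index(max(probs))
    print(f"T={t}: predicted class={predicted}, confidence={max(probs):.3f}")
```

In this toy setup, a temperature below 1 sharpens the distribution and a temperature above 1 flattens it, while the predicted class is unchanged, which is why a monitor that only tracks accuracy would not notice the difference.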

technocracy
