digitado – Page 132

Validating “Streaming Deep RL Finally Works” on 433k Observations of Real Attack Traffic

digitado ⋅ 13 February 2026

I’m learning the foundations of RL in line with the Alberta Plan for AI research, and I have been running sets of experiments both to learn and to test ideas. To that end, I spent the last month validating different methods for streaming deep RL on a non-stationary, adversarial dataset of real SSH honeypot observations. This work focuses on prediction and corresponds to steps 1 and 2 of the Alberta Plan (Sutton, Bowling, & Pilarski 2022). After implementing […]
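The excerpt is truncated, so here is a minimal sketch of what "streaming" prediction means, assuming a linear value function over caller-supplied feature vectors. Each observation triggers exactly one TD(λ) update with accumulating eligibility traces and is then discarded: there is no replay buffer and no batching. This is plain linear TD(λ), not the stream-TD(λ) algorithm from "Streaming Deep RL Finally Works" and not the author's experiment code; `stream_td_lambda` and its parameters are hypothetical names.

```python
import numpy as np


def stream_td_lambda(stream, n_features, alpha=0.1, gamma=0.9, lam=0.8):
    """Online linear TD(lambda) prediction with accumulating eligibility traces.

    Hypothetical helper: `stream` yields (x, r, x_next, done) tuples one at a
    time. Each sample is used for a single incremental update and then
    discarded (no replay buffer, no mini-batches).
    """
    w = np.zeros(n_features)  # value weights: v(s) is approximated by w @ x(s)
    z = np.zeros(n_features)  # eligibility trace
    for x, r, x_next, done in stream:
        v = w @ x
        v_next = 0.0 if done else w @ x_next
        delta = r + gamma * v_next - v  # TD error for this transition
        z = gamma * lam * z + x         # decay the trace, then add current features
        w += alpha * delta * z          # one update per observation
        if done:
            z[:] = 0.0                  # clear the trace at an episode boundary
    return w
```

For example, a single-feature state that pays reward 1 forever with `gamma=0.9` should drive the weight toward the true value 1 / (1 - 0.9) = 10, even though no transition is ever stored or revisited.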

technocracy

ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models

digitado ⋅ 21 April 2026

arXiv:2511.02757v2 Announce Type: replace-cross Abstract: Zeroth-order or derivative-free optimization (MeZO) is an attractive strategy for finetuning large language models (LLMs) because it eliminates the memory overhead of backpropagation. However, it converges slowly due to the inherent curse of dimensionality when searching for descent directions in the high-dimensional parameter space of billion-scale LLMs. We propose ConMeZO, a novel zeroth-order optimizer that accelerates convergence by adaptive directional sampling. Instead of drawing the direction uniformly at random, ConMeZO restricts the sampling […]
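MeZO-style optimizers replace backpropagation with finite differences of the loss along random directions. A minimal sketch of one such step, assuming a NumPy parameter vector and a scalar `loss_fn`: it draws a direction with no preferred orientation (an isotropic Gaussian, i.e. the undirected sampling that ConMeZO is described as improving on), estimates the directional derivative from two forward evaluations, and steps along that direction. `zo_sgd_step` is a hypothetical name; the sketch implements neither ConMeZO's restricted sampling nor MeZO's memory-saving trick of regenerating the perturbation from a seed.

```python
import numpy as np


def zo_sgd_step(loss_fn, theta, lr, eps, rng):
    """One zeroth-order SGD step using a two-point (SPSA-style) estimate.

    Hypothetical helper: needs only two forward evaluations of loss_fn and
    no gradients at all.
    """
    z = rng.standard_normal(theta.shape)  # random direction, isotropic Gaussian
    loss_plus = loss_fn(theta + eps * z)
    loss_minus = loss_fn(theta - eps * z)
    # Central-difference estimate of the directional derivative along z.
    proj_grad = (loss_plus - loss_minus) / (2.0 * eps)
    # Move against the estimated slope, along the sampled direction only.
    return theta - lr * proj_grad * z
```

On a small quadratic, repeated steps shrink the loss even though no gradient is ever computed; because each step only probes one random direction, the estimate gets noisier as the parameter count grows, which is the slow convergence the abstract attributes to high dimensionality.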

technocracy

Optimizing Reinforcement Learning Training over Digital Twin Enabled Multi-fidelity Networks

digitado ⋅ 9 March 2026

In this paper, we investigate a novel digital network twin (DNT) assisted deep learning (DL) model training framework. In particular, we consider a physical network where a base station (BS) uses several antennas to serve multiple mobile users, and a DNT that is a virtual representation of the physical network. The BS must adjust its antenna tilt angles to optimize the data rates of all users. Due to user mobility, the BS may not be able to accurately […]

technocracy

Adaptive Data Dropout: Towards Self-Regulated Learning in Deep Neural Networks

digitado ⋅ 14 April 2026

Deep neural networks are typically trained by uniformly sampling large datasets across epochs, despite evidence that not all samples contribute equally throughout learning. Recent work shows that progressively reducing the amount of training data can improve efficiency and generalization, but existing methods rely on fixed schedules that do not adapt during training. In this work, we propose Adaptive Data Dropout, a simple framework that dynamically adjusts the subset of training data based on performance feedback. Inspired by self-regulated […]

technocracy

Optimized Architectures for Kolmogorov-Arnold Networks

digitado ⋅ 22 April 2026

arXiv:2512.12448v2 Announce Type: replace-cross Abstract: Efforts to improve Kolmogorov–Arnold networks (KANs) with architectural enhancements have been stymied by the complexity those enhancements bring, undermining the interpretability that makes KANs attractive in the first place. Here we study overprovisioned architectures combined with sparsification, deep supervision, and depth selection, to learn compact, interpretable KANs without sacrificing accuracy. Crucially, we focus on differentiable mechanisms under a principled minimum description length objective, jointly optimizing activations, structure, and depth end-to-end. Experiments across function […]

technocracy

The TechBeat: People, Process, Context: The Operating Model Modern Defect Resolution Needs (3/4/2026)

digitado ⋅ 4 March 2026

How are you, hacker? 🪐 Want to know what’s trending right now? The TechBeat by HackerNoon has you covered with fresh content from our trending stories of the day! 6 Ways to Use a Crypto Exchange Aggregator and Save on Swaps, by @swapzone (7 min read): Maximize your crypto swaps! Learn 6 ways an exchange aggregator saves you money. Find the best rates on Bitcoin, Ethereum, and other cryptocurrencies on DEXs. […]

technocracy

Compounding Knowledge With LLMs. Karpathy’s Wiki Pattern in Action

digitado ⋅ 16 April 2026

Andrej Karpathy recently published a GitHub Gist that quietly named something every AI practitioner has felt but not quite articulated. When the Gist dropped, it spread fast, and for good reason. Here is what the Gist says, why it matters, and what it looks like when you build it. Andrej Karpathy, co-founder of OpenAI and former Director of AI at Tesla, simply thinks this way. Where most people see a tool, he sees a pattern. Where most […]

technocracy

The Devil Is in Gradient Entanglement: Energy-Aware Gradient Coordinator for Robust Generalized Category Discovery

digitado ⋅ 18 April 2026

arXiv:2604.14176v1 Announce Type: new Abstract: Generalized Category Discovery (GCD) leverages labeled data to categorize unlabeled samples from known or unknown classes. Most previous methods jointly optimize supervised and unsupervised objectives and achieve promising results. However, inherent optimization interference still limits their ability to improve further. Through quantitative analysis, we identify a key issue, i.e., gradient entanglement, which 1) distorts supervised gradients and weakens discrimination among known classes, and 2) induces representation-subspace overlap between known and novel classes, reducing […]

technocracy

SkillAttack: Automated Red Teaming of Agent Skills through Attack Path Refinement

digitado ⋅ 8 April 2026

arXiv:2604.04989v1 Announce Type: new Abstract: LLM-based agent systems increasingly rely on agent skills sourced from open registries to extend their capabilities, yet the openness of such ecosystems makes skills difficult to thoroughly vet. Existing attacks rely on injecting malicious instructions into skills, making them easily detectable by static auditing. However, non-malicious skills may also harbor latent vulnerabilities that an attacker can exploit solely through adversarial prompting, without modifying the skill itself. We introduce SkillAttack, a red-teaming framework that […]

technocracy

Online Statistical Inference of Constant Sample-averaged Q-Learning

digitado ⋅ 31 March 2026

arXiv:2603.26982v1 Announce Type: new Abstract: Reinforcement learning algorithms have been widely used for decision-making tasks in various domains. However, the performance of these algorithms can be impacted by high variance and instability, particularly in environments with noise or sparse rewards. In this paper, we propose a framework to perform statistical online inference for a sample-averaged Q-learning approach. We adapt the functional central limit theorem (FCLT) for the modified algorithm under some general conditions and then construct confidence intervals […]
