March 2026

FedBCGD: Communication-Efficient Accelerated Block Coordinate Gradient Descent for Federated Learning

digitado ⋅ March 5, 2026

Although Federated Learning has been widely studied in recent years, each communication round still carries high overhead for large-scale models such as Vision Transformer. To lower the communication complexity, we propose a novel Federated Block Coordinate Gradient Descent (FedBCGD) method. The proposed method splits the model parameters into several blocks, including a shared block, and has each client upload only a specific parameter block, which can significantly reduce communication overhead. Moreover, we also […]
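The block-splitting idea in the excerpt can be illustrated with a small sketch. This is not the authors' algorithm: the function names (`split_blocks`, `client_upload`, `aggregate`), the contiguous partitioning, and the plain per-coordinate averaging are all assumptions made for illustration. It only shows how uploading one block plus a shared block sends fewer values than uploading the full model.

```python
from typing import Dict, List


def split_blocks(num_params: int, num_blocks: int) -> List[range]:
    """Partition indices 0..num_params-1 into contiguous blocks (assumed layout)."""
    size, extra = divmod(num_params, num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        end = start + size + (1 if b < extra else 0)
        blocks.append(range(start, end))
        start = end
    return blocks


def client_upload(local_params: List[float], block: range, shared: range) -> Dict[int, float]:
    """Return only the coordinates in the client's assigned block and the shared block."""
    return {i: local_params[i] for i in list(block) + list(shared)}


def aggregate(global_params: List[float], uploads: List[Dict[int, float]]) -> List[float]:
    """Average each coordinate over the clients that uploaded it; keep the rest unchanged."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for upload in uploads:
        for i, value in upload.items():
            sums[i] = sums.get(i, 0.0) + value
            counts[i] = counts.get(i, 0) + 1
    new_params = list(global_params)
    for i, total in sums.items():
        new_params[i] = total / counts[i]
    return new_params


# Toy setup: 6 parameters, the last 2 form the shared block (hypothetical choice).
shared = range(4, 6)
blocks = split_blocks(4, 2)  # two private blocks over indices 0..3
uploads = [
    client_upload([1.0] * 6, blocks[0], shared),  # client 0 sends block 0 + shared
    client_upload([3.0] * 6, blocks[1], shared),  # client 1 sends block 1 + shared
]
new_global = aggregate([0.0] * 6, uploads)
```

In this toy setup each client sends 4 of 6 values per round; only the shared block is averaged across both clients.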

technocracy

Decoupling Task and Behavior: A Two-Stage Reward Curriculum in Reinforcement Learning for Robotics

digitado ⋅ March 5, 2026

Deep Reinforcement Learning is a promising tool for robotic control, yet practical application is often hindered by the difficulty of designing effective reward functions. Real-world tasks typically require optimizing multiple objectives simultaneously, necessitating precise tuning of their weights to learn a policy with the desired characteristics. To address this, we propose a two-stage reward curriculum where we decouple task-specific objectives from behavioral terms. In our method, we first train the agent on a simplified task-only reward function to […]

technocracy

How I Cut My LLM Costs by 80% Without Sacrificing Quality.

digitado ⋅ March 5, 2026

Section 1: From $847 to $159 a Month. The Bill That Made Me Stop Everything. I still remember the exact moment. It was a Tuesday morning. I opened my OpenAI billing dashboard expecting the usual: maybe $150, $200 tops. Instead I saw $847.32 staring back at me. For one month. For a side project with maybe 200 active users. I sat there for a full minute just staring at it. No image generation. No fine-tuning. No wild experiments. Just a standard RAG pipeline […]

technocracy

Reward-Conditioned Reinforcement Learning

digitado ⋅ March 5, 2026

RL agents are typically trained under a single, fixed reward function, which makes them brittle to reward misspecification and limits their ability to adapt to changing task preferences. We introduce Reward-Conditioned Reinforcement Learning (RCRL), a framework that trains a single agent to optimize a family of reward specifications while collecting experience under only one nominal objective. RCRL conditions the agent on reward parameterizations and learns multiple reward objectives from shared replay data, entirely off-policy, enabling a single […]
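One way to picture learning several reward objectives from a single replay buffer is reward relabeling. The sketch below is an assumption-laden toy, not the RCRL method: it assumes rewards are a weighted sum of stored per-term features, uses a tabular Q-function keyed by the weight vector, and all names (`Transition`, `relabel`, `q_update`) are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple

Weights = Tuple[float, ...]


class Transition(NamedTuple):
    state: int
    action: int
    features: Tuple[float, ...]  # per-term reward components, e.g. (progress, energy)
    next_state: int


def relabel(batch: List[Transition], weights: Weights) -> List[Tuple[int, int, float, int]]:
    """Recompute rewards for stored transitions under a given weight vector (assumed linear)."""
    return [
        (t.state, t.action, sum(w * f for w, f in zip(weights, t.features)), t.next_state)
        for t in batch
    ]


def q_update(
    q: Dict[Tuple[Weights, int, int], float],
    weights: Weights,
    sample: Tuple[int, int, float, int],
    n_actions: int,
    alpha: float = 0.5,
    gamma: float = 0.9,
) -> None:
    """One off-policy Q-learning step on a Q-table conditioned on the reward weights."""
    s, a, r, s2 = sample
    best_next = max(q.get((weights, s2, b), 0.0) for b in range(n_actions))
    key = (weights, s, a)
    old = q.get(key, 0.0)
    q[key] = old + alpha * (r + gamma * best_next - old)


# Experience collected once, then reused for two different reward specifications.
buffer = [Transition(state=0, action=1, features=(1.0, -2.0), next_state=1)]
q: Dict[Tuple[Weights, int, int], float] = {}
for w in [(1.0, 0.0), (1.0, 0.5)]:
    for sample in relabel(buffer, w):
        q_update(q, w, sample, n_actions=2)
```

The same stored transition yields a different learning target for each weight vector, which is the sense in which the data is shared across objectives here.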

technocracy

Deep Learning-Driven Friendly Jamming for Secure Multicarrier ISAC Under Channel Uncertainty

digitado ⋅ March 5, 2026

Integrated sensing and communication (ISAC) systems promise efficient spectrum utilization by jointly supporting radar sensing and wireless communication. This paper presents a deep learning-driven framework for enhancing physical-layer security in multicarrier ISAC systems under imperfect channel state information (CSI) and with unknown eavesdropper (Eve) locations. Unlike conventional ISAC-based friendly jamming (FJ) approaches that require Eve’s CSI or precise angle-of-arrival (AoA) estimates, our method exploits radar echo feedback to guide directional jamming without explicit information about Eve. […]

technocracy

Asymptotic Behavior of Multi-Task Learning: Implicit Regularization and Double Descent Effects

digitado ⋅ March 5, 2026

Multi-task learning seeks to reduce generalization error by leveraging the common information shared by multiple related tasks. One challenge in multi-task learning is identifying formulations capable of uncovering that common information across different but related tasks. This paper provides a precise asymptotic analysis of a popular multi-task formulation associated with misspecified perceptron learning models. The main contribution of this paper is to precisely determine the reasons behind the benefits gained from combining multiple related tasks. Specifically, […]

technocracy

Heterogeneous Agent Collaborative Reinforcement Learning

digitado ⋅ March 5, 2026

We introduce Heterogeneous Agent Collaborative Reinforcement Learning (HACRL), a new learning paradigm that addresses the inefficiencies of isolated on-policy optimization. HACRL enables collaborative optimization with independent execution: heterogeneous agents share verified rollouts during training to mutually improve, while operating independently at inference time. Unlike LLM-based multi-agent reinforcement learning (MARL), HACRL does not require coordinated deployment, and unlike on-/off-policy distillation, it enables bidirectional mutual learning among heterogeneous agents rather than one-directional teacher-to-student transfer. Building on this paradigm, we propose […]

technocracy

The Cost of Overfitting: Lessons Traders Can Learn From Data Science

digitado ⋅ March 5, 2026

In both data science and financial trading, one of the most persistent challenges is striking the right balance between model complexity and predictive accuracy. In data science, overfitting occurs when a model learns not only the true underlying patterns in the training data but also the noise — leading to poor generalization on new, unseen data. Similarly, traders can fall prey to overfitting when their strategies become too finely tuned to past market conditions, failing to hold up […]
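A toy illustration of the point about fitting noise, under stated assumptions: the data is hand-made (true signal 0, alternating noise of +1 and -1), and the two "models" (a nearest-neighbor memorizer and a constant-mean predictor) are chosen only to make the contrast visible. It is not drawn from the article.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def mse(predict: Callable[[float], float], data: List[Point]) -> float:
    """Mean squared error of a predictor over (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


# True signal is 0 everywhere; the observed values are pure noise (assumed toy data).
train: List[Point] = [(0.0, 1.0), (1.0, -1.0), (2.0, 1.0), (3.0, -1.0), (4.0, 1.0), (5.0, -1.0)]
test: List[Point] = [(0.4, -1.0), (1.4, 1.0), (2.4, -1.0), (3.4, 1.0), (4.4, -1.0), (5.4, 1.0)]


def memorizer(x: float) -> float:
    """Overfit model: return the training label of the nearest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


train_mean = sum(y for _, y in train) / len(train)


def mean_model(x: float) -> float:
    """Simple model: always predict the average training label."""
    return train_mean


results = {
    "memorizer": (mse(memorizer, train), mse(memorizer, test)),
    "mean_model": (mse(mean_model, train), mse(mean_model, test)),
}
```

The memorizer scores a perfect 0.0 on the data it was tuned to and 4.0 on fresh data, while the simpler model scores 1.0 on both. A trading strategy tuned tightly to one historical window can fail in the same way.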

technocracy

Amodei torches OpenAI in leaked memo

digitado ⋅ March 5, 2026

“Straight up lies.” That’s how Dario Amodei described OpenAI’s Pentagon messaging in a newly leaked internal memo sent to Anthropic employees on Friday. The 1,600-word document rips the controversial deal as “80% safety theater,” with personal shots at Sam Altman woven throughout, escalating a rivalry that was already one of the most heated in tech, far past an awkward hand-hold refusal. P.S. […]

technocracy

POMDPPlanners — open-source Python package for POMDP planning (POMCP, BetaZero, ConstrainedZero + more), with an arXiv paper

digitado ⋅ March 5, 2026

Every time I needed to run a POMDP experiment, I ended up gluing together half-maintained repos with incompatible interfaces and no clear way to swap planners or environments. So I built something more cohesive. POMDPPlanners is a unified Python framework for POMDP planning research and industrial applications. Among the included planners: POMCP, POMCPOW, POMCP-DPW, PFT-DPW, Sparse PFT, Sparse Sampling, Open Loop Planners, BetaZero (AlphaZero adapted to belief space), and ConstrainedZero (safety-constrained extension using conformal inference). Environments: Tiger, RockSample, […]
