February 2026

ESSAM: A Novel Competitive Evolution Strategies Approach to Reinforcement Learning for Memory Efficient LLMs Fine-Tuning

digitado ⋅ February 1, 2026

Reinforcement learning (RL) has become a key training step for improving mathematical reasoning in large language models (LLMs), but its high GPU memory usage makes it hard to use in resource-limited settings. To address this, we propose Evolution Strategies with Sharpness-Aware Maximization (ESSAM), a full-parameter fine-tuning framework that tightly combines the zero-order parameter-space search of Evolution Strategies (ES) with Sharpness-Aware Maximization (SAM) to improve generalization. We conduct fine-tuning […]

technocracy

DISPO: Enhancing Training Efficiency and Stability in Reinforcement Learning for Large Language Model Mathematical Reasoning

digitado ⋅ February 1, 2026

Reinforcement learning with verifiable rewards has emerged as a promising paradigm for enhancing the reasoning capabilities of large language models, particularly in mathematics. Current approaches in this domain present a clear trade-off: PPO-style methods (e.g., GRPO/DAPO) offer training stability but learn slowly because of their trust-region constraints on policy updates, while REINFORCE-style approaches (e.g., CISPO) learn more efficiently but suffer from performance instability, as they clip importance sampling weights while still permitting non-zero gradients outside […]

technocracy

Multimodal Scientific Learning Beyond Diffusions and Flows

digitado ⋅ February 1, 2026

Scientific machine learning (SciML) increasingly requires models that capture multimodal conditional uncertainty arising from ill-posed inverse problems, multistability, and chaotic dynamics. While recent work has favored highly expressive implicit generative models such as diffusion and flow-based methods, these approaches are often data-hungry, computationally costly, and misaligned with the structured solution spaces frequently found in scientific problems. We demonstrate that Mixture Density Networks (MDNs) provide a principled yet largely overlooked alternative for multimodal uncertainty quantification in SciML. As explicit […]

technocracy

From drift to adaptation to the failed ml model: Transfer Learning in Industrial MLOps

digitado ⋅ February 1, 2026

Although model adaptation to the production environment is critical for reliable Machine Learning Operations (MLOps), less attention has been paid to developing a systematic framework for updating ML models when they fail under data drift. This paper compares transfer-learning-enabled model update strategies, including ensemble transfer learning (ETL), all-layers transfer learning (ALTL), and last-layer transfer learning (LLTL), for updating a failed feedforward artificial neural network (ANN) model. The flue gas differential pressure across the air preheater unit installed in […]

technocracy

CLAMP: Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining

digitado ⋅ February 1, 2026

Leveraging pre-trained 2D image representations in behavior cloning policies has achieved great success and has become a standard approach for robotic manipulation. However, such representations fail to capture the 3D spatial information about objects and scenes that is essential for precise manipulation. In this work, we introduce Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining (CLAMP), a novel 3D pre-training framework that utilizes point clouds and robot actions. From the merged point cloud computed from RGB-D images […]

technocracy

Research roundup: 6 cool stories we almost missed

digitado ⋅ February 1, 2026

It’s a regrettable reality that there is never enough time to cover all the interesting scientific stories we come across each month. So every month, we highlight a handful of the best stories that nearly slipped through the cracks. January’s list includes a lip-syncing robot; using brewer’s yeast as scaffolding for lab-grown meat; hunting for Leonardo da Vinci’s DNA in his art; and new evidence that humans really did transport the stones to build Stonehenge from Wales and […]
