digitado – Page 52

Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction

digitado ⋅ 12 January 2026

arXiv:2601.05459v1 Announce Type: new Abstract: Large Language Models (LLMs) demonstrate strong reasoning and self-correction abilities in high-resource languages like English, but their performance remains limited in low-resource languages such as Korean. In this study, we investigate whether reinforcement learning (RL) can enhance Korean reasoning abilities to a degree comparable to English. Our findings reveal that RL alone yields limited improvements when applied to models lacking inherent Korean reasoning capabilities. To address this, we explore several fine-tuning strategies and […]

technocracy

Learning to Play Blackjack: A Curriculum Learning Perspective

digitado ⋅ 31 March 2026

Reinforcement Learning (RL) agents often struggle with efficiency and performance in complex environments. We propose a novel framework that uses a Large Language Model (LLM) to dynamically generate a curriculum over available actions, enabling the agent to incorporate each action individually. We apply this framework to the game of Blackjack, where the LLM creates a multi-stage training path that progressively introduces complex actions to a Tabular Q-Learning and a Deep Q-Network (DQN) agent. Our evaluation in a realistic […]
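The action curriculum described above can be sketched with a tabular Q-learning agent that may only select, and bootstrap from, the actions unlocked at its current stage. This is a minimal sketch under stated assumptions: the fixed `STAGES` list (hit/stand first, then double, then split) is a hypothetical, hand-written stand-in for the LLM-generated curriculum, and states are abstract hashable tuples rather than the output of a full Blackjack simulator.

```python
import random
from collections import defaultdict

# Hypothetical curriculum: each stage unlocks more actions.
# In the paper an LLM generates the curriculum; this fixed list is an assumption.
STAGES = [
    ["stand", "hit"],
    ["stand", "hit", "double"],
    ["stand", "hit", "double", "split"],
]


class CurriculumQLearner:
    def __init__(self, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> value estimate
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor
        self.epsilon = epsilon       # exploration probability
        self.stage = 0
        self.rng = random.Random(seed)

    @property
    def actions(self):
        # Only the actions unlocked by the current curriculum stage.
        return STAGES[self.stage]

    def advance_stage(self):
        # Unlock the next set of actions while keeping the learned Q-values.
        self.stage = min(self.stage + 1, len(STAGES) - 1)

    def act(self, state):
        # Epsilon-greedy selection restricted to unlocked actions.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done):
        # One-step Q-learning target; the max runs over unlocked actions only.
        if done:
            target = reward
        else:
            target = reward + self.gamma * max(
                self.q[(next_state, a)] for a in self.actions
            )
        key = (state, action)
        self.q[key] += self.alpha * (target - self.q[key])
        return self.q[key]
```

For example, with `epsilon=0.0`, a single losing "stand" in a hypothetical state `(15, 10, False)` (player sum, dealer up-card, usable ace) lowers that action's value to -0.1, so the greedy policy switches to "hit". Keeping Q-values across stages lets earlier learning carry over when "double" and "split" are introduced.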

technocracy

Retrieval-Augmented Multi-LLM Ensemble for Industrial Part Specification Extraction

digitado ⋅ 12 January 2026

arXiv:2601.05266v1 Announce Type: new Abstract: Industrial part specification extraction from unstructured text remains a persistent challenge in manufacturing, procurement, and maintenance, where manual processing is both time-consuming and error-prone. This paper introduces a retrieval-augmented multi-LLM ensemble framework that orchestrates nine state-of-the-art Large Language Models (LLMs) within a structured three-phase pipeline. RAGsemble addresses key limitations of single-model systems by combining the complementary strengths of model families including Gemini (2.0, 2.5, 1.5), OpenAI (GPT-4o, o4-mini), Mistral Large, and Gemma (1B, […]
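One simple way to combine the outputs of several extraction models is a field-wise majority vote. The excerpt does not say how RAGsemble aggregates its nine models, so this is a hypothetical illustration only: `majority_merge`, `normalize`, and the example field names are invented, and ties are broken by the order in which models are listed.

```python
from collections import Counter


def normalize(value):
    # Light normalization so "10 mm" and "10 MM " count as the same answer.
    return " ".join(str(value).split()).lower()


def majority_merge(predictions):
    """Merge per-model dicts of extracted fields by field-wise majority vote.

    predictions: list of dicts, one per model, in priority order.
    Returns {field: (winning_value, vote_share)}; missing/None values are ignored.
    """
    fields = {f for p in predictions for f in p}
    merged = {}
    for field in sorted(fields):
        votes = [normalize(p[field]) for p in predictions if p.get(field) is not None]
        if not votes:
            continue
        counts = Counter(votes)
        best = max(counts.values())
        # Tie-break: the first value (in model order) that reached the top count.
        winner = next(v for v in votes if counts[v] == best)
        merged[field] = (winner, best / len(votes))
    return merged
```

The vote share gives a crude per-field confidence that could be used to flag specifications for manual review when the models disagree.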

technocracy

OrthoAI v2: From Single-Agent Segmentation to Dual-Agent Treatment Planning for Clear Aligners

digitado ⋅ 18 March 2026

arXiv:2603.15663v1 Announce Type: new Abstract: We present OrthoAI v2, the second iteration of our open-source pipeline for AI-assisted orthodontic treatment planning with clear aligners, substantially extending the previously introduced single-agent framework. The first version established a proof of concept based on Dynamic Graph Convolutional Neural Networks (DGCNN) for tooth segmentation but was limited to per-tooth centroid extraction, lacked landmark-level precision, and produced a scalar quality score without staging simulation. OrthoAI v2 addresses all three limitations through three principal contributions: (i) a second […]
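Per-tooth centroid extraction, the limitation of the first version noted above, reduces a segmented scan to a single 3D point per tooth label. A minimal sketch, assuming the segmentation output is a list of `(x, y, z)` points with one integer label each and that label 0 marks gingiva or unlabeled points; the function name and the label convention are hypothetical, not OrthoAI's API.

```python
from collections import defaultdict


def tooth_centroids(points, labels, background=0):
    """Return {label: (cx, cy, cz)} for every non-background label."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for (x, y, z), label in zip(points, labels):
        if label == background:
            continue  # skip gingiva / unlabeled points
        s = sums[label]
        s[0] += x
        s[1] += y
        s[2] += z
        counts[label] += 1
    # Mean position of all points assigned to each tooth label.
    return {label: tuple(c / counts[label] for c in sums[label]) for label in sums}
```

A single centroid says nothing about cusp tips, contact points, or tooth axes, which is why a centroid-only representation lacks the landmark-level precision that treatment staging needs.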

technocracy

Training-Free Diffusion-Driven Modeling of Pareto Set Evolution for Dynamic Multiobjective Optimization

digitado ⋅ 31 March 2026

arXiv:2603.26749v1 Announce Type: new Abstract: Dynamic multiobjective optimization problems (DMOPs) feature time-varying objectives, which cause the Pareto optimal solution (POS) set to drift over time and make it difficult to maintain both convergence and diversity under limited response time. Many existing prediction-based dynamic multiobjective evolutionary algorithms (DMOEAs) either depend on learned models with nontrivial training cost or employ one-step population mapping, which may overlook the gradual nature of POS evolution. This paper proposes DD-DMOEA, a training-free diffusion-based dynamic […]
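For contrast with the proposed method, the one-step population mapping that the abstract mentions can be as simple as shifting every solution by the displacement of the population centroid between the last two environments. This is a hedged sketch of that generic baseline idea, not of DD-DMOEA; the function names and the optional Gaussian jitter are assumptions added for illustration.

```python
import random


def centroid(pop):
    # Component-wise mean of a population of decision vectors.
    n = len(pop)
    return [sum(ind[i] for ind in pop) / n for i in range(len(pop[0]))]


def one_step_predict(prev_pop, curr_pop, noise=0.0, seed=None):
    """Predict the next population by adding the centroid shift to each solution."""
    rng = random.Random(seed)
    c_prev, c_curr = centroid(prev_pop), centroid(curr_pop)
    shift = [b - a for a, b in zip(c_prev, c_curr)]
    predicted = []
    for ind in curr_pop:
        new = []
        for x, d in zip(ind, shift):
            # Optional Gaussian jitter to preserve some diversity.
            jitter = rng.gauss(0.0, noise) if noise > 0 else 0.0
            new.append(x + d + jitter)
        predicted.append(new)
    return predicted
```

Because the whole estimated drift is applied in a single jump, this kind of mapping ignores how the Pareto optimal set moves between environments, which is the gradual-evolution issue the abstract raises.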

technocracy

Personalized Spiking Neural Networks with Ferroelectric Synapses for EEG Signal Processing

digitado ⋅ 6 January 2026

arXiv:2601.00020v2 Announce Type: new Abstract: Electroencephalography (EEG)-based brain-computer interfaces (BCIs) are strongly affected by non-stationary neural signals that vary across sessions and individuals, limiting the generalization of subject-agnostic models and motivating adaptive and personalized learning on resource-constrained platforms. Programmable memristive hardware offers a promising substrate for such post-deployment adaptation; however, practical realization is challenged by limited weight resolution, device variability, nonlinear programming dynamics, and finite device endurance. In this work, we show that spiking neural networks (SNNs) can […]
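Limited weight resolution means a trained floating-point weight must be mapped onto one of a small number of programmable device states before it can be written to hardware. A minimal sketch, assuming evenly spaced levels in a symmetric range `[-w_max, w_max]`; real ferroelectric devices can have nonlinear, device-specific levels, and `quantize_weights` is a hypothetical helper rather than the authors' method.

```python
def quantize_weights(weights, n_levels, w_max):
    """Snap each weight to the nearest of n_levels evenly spaced values in [-w_max, w_max]."""
    if n_levels < 2:
        raise ValueError("need at least two levels")
    step = 2 * w_max / (n_levels - 1)
    out = []
    for w in weights:
        clipped = min(max(w, -w_max), w_max)   # saturate out-of-range weights
        idx = round((clipped + w_max) / step)  # index of the nearest level
        out.append(-w_max + idx * step)
    return out
```

With five levels and `w_max=1.0`, the only reachable values are -1, -0.5, 0, 0.5, and 1, so `[0.3, -0.9, 2.0, 0.1]` becomes `[0.5, -1.0, 1.0, 0.0]`. Note that Python's `round` rounds exact halves to the nearest even index.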

technocracy

Coupling Generative Modeling and an Autoencoder with the Causal Bridge

digitado ⋅ 15 January 2026

arXiv:2509.25599v2 Announce Type: replace Abstract: We consider inferring the causal effect of a treatment (intervention) on an outcome of interest in situations where there is potentially an unobserved confounder influencing both the treatment and the outcome. This is achievable by assuming access to two separate sets of control (proxy) measurements associated with treatment and outcomes, which are used to estimate treatment effects through a function termed the causal bridge (CB). We present a new theoretical perspective, associated […]
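The abstract's description matches the standard proximal causal inference setup, in which an outcome bridge function links the two proxy sets. Writing $A$ for the treatment, $Y$ for the outcome, $Z$ for the treatment-side proxies, and $W$ for the outcome-side proxies (notation assumed here, not taken from the paper), the bridge $h$ is a solution of the first integral equation below, and the interventional mean follows by averaging it:

```latex
\mathbb{E}[\,Y \mid Z, A\,] = \mathbb{E}[\,h(W, A) \mid Z, A\,],
\qquad
\mathbb{E}[\,Y(a)\,] = \mathbb{E}[\,h(W, a)\,].
```

Under that framework's identification assumptions (including a completeness condition on the proxies), the average treatment effect of a binary treatment is then $\mathbb{E}[h(W, 1)] - \mathbb{E}[h(W, 0)]$.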

technocracy

Robotic Assembly Using Deep Reinforcement Learning

digitado ⋅ 21 October 2020

Introduction. Disclaimer: This article is a cross-post from the PyTorch Medium blog. One of the most exciting advances to push the frontier of Artificial Intelligence (AI) in recent years is Deep Reinforcement Learning (DRL). DRL belongs to the family of machine learning algorithms. It assumes that intelligent machines can learn from their actions, much as humans learn from experience. In recent years we have witnessed some impressive real-world applications of DRL. The […]

technocracy

Breaking the $O(\sqrt{T})$ Cumulative Constraint Violation Barrier while Achieving $O(\sqrt{T})$ Static Regret in Constrained Online Convex Optimization

digitado ⋅ 24 March 2026

arXiv:2603.20671v1 Announce Type: cross Abstract: The problem of constrained online convex optimization is considered, where at each round, once a learner commits to an action $x_t \in \mathcal{X} \subset \mathbb{R}^d$, a convex loss function $f_t$ and a convex constraint function $g_t$ that drives the constraint $g_t(x) \le 0$ are revealed. The objective is to simultaneously minimize the static regret and cumulative constraint violation (CCV) compared to the benchmark that knows the loss functions and constraint functions $f_t$ and $g_t$ […]
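The two quantities being traded off can be computed directly once a trajectory of actions is fixed. A minimal sketch, assuming the common definitions $\mathrm{Regret}_T = \sum_t f_t(x_t) - \sum_t f_t(x^\star)$ against a fixed comparator $x^\star$ that satisfies every constraint, and $\mathrm{CCV}_T = \sum_t \max(0, g_t(x_t))$; the excerpt is cut off before the paper's exact definitions, so these are assumptions, and the example uses scalar actions ($d = 1$) with invented functions.

```python
from typing import Callable, Sequence

Func = Callable[[float], float]


def static_regret(fs: Sequence[Func], xs: Sequence[float], x_star: float) -> float:
    """Losses incurred by the learner minus the losses of a fixed comparator x_star."""
    return sum(f(x) for f, x in zip(fs, xs)) - sum(f(x_star) for f in fs)


def cumulative_constraint_violation(gs: Sequence[Func], xs: Sequence[float]) -> float:
    """Sum over rounds of the positive part of g_t(x_t)."""
    return sum(max(0.0, g(x)) for g, x in zip(gs, xs))
```

For instance, with $f_t(x) = (x - 0.5)^2$, $g_t(x) = x - 0.2$ for three rounds, actions $0.5, 0.3, 0.2$, and the best feasible comparator $x^\star = 0.2$, the regret is $0.13 - 0.27 = -0.14$ while the CCV is $0.3 + 0.1 + 0 = 0.4$: the learner beats the feasible benchmark only by violating the constraint, which is why both quantities must be bounded at the same time.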

technocracy

Stop Blaming Your Data. Your BERT Fine-Tuning Strategy Is the Problem.

digitado ⋅ 27 February 2026

I Fine-Tuned BERT 47 Times Before I Realized I Was the Problem. Fine-tuning BERT looks simple on Hugging Face. Running it in production looks like a different universe. Attempt number 47. Surely the learning rate is the only variable left to change. It was 1:47 AM. The sprint demo was in six hours. I had a BERT model fine-tuned on our customer support ticket dataset. I’d done everything by the book. Pre-trained weights from bert-base-uncased. Hugging Face Transformers. AdamW optimizer. Learning rate […]
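A setup like the one listed above (pre-trained `bert-base-uncased` weights with AdamW) is usually paired with a learning-rate schedule, and a common choice for BERT fine-tuning is linear warmup followed by linear decay to zero, the shape produced by Hugging Face's `get_linear_schedule_with_warmup`. The excerpt is cut off before the author's learning-rate details, so the schedule and the default numbers below are assumptions, written in plain Python to show the multiplier applied at each optimizer step.

```python
def linear_warmup_decay(step: int, warmup_steps: int, total_steps: int) -> float:
    """Learning-rate multiplier in [0, 1]: ramp up linearly, then decay linearly to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def lr_at(step: int, base_lr: float = 2e-5, warmup_steps: int = 100,
          total_steps: int = 1000) -> float:
    # base_lr, warmup_steps and total_steps are illustrative values, not the article's.
    return base_lr * linear_warmup_decay(step, warmup_steps, total_steps)
```

In PyTorch, a multiplier like this can be passed as the `lr_lambda` of `torch.optim.lr_scheduler.LambdaLR` wrapped around a `torch.optim.AdamW` optimizer, with `scheduler.step()` called once per optimizer step.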
