digitado – Page 494

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

digitado ⋅ 15 April 2026

While reinforcement learning with verifiable rewards (RLVR) significantly enhances LLM reasoning by optimizing the conditional distribution P(y|x), its potential is fundamentally bounded by the base model’s existing output distribution. Optimizing the marginal distribution P(y) in the Pre-train Space addresses this bottleneck by encoding reasoning ability and preserving broad exploration capacity. Yet, conventional pre-training relies on static corpora for passive learning, leading to a distribution shift that hinders targeted reasoning enhancement. In this paper, we introduce PreRL (Pre-train Space […]

technocracy

Teach Diffusion Language Models to Learn from Their Own Mistakes

digitado ⋅ 10 January 2026

Masked Diffusion Language Models (DLMs) achieve significant speed by generating multiple tokens in parallel. However, this parallel sampling approach, especially when using fewer inference steps, will introduce strong dependency errors and cause quality to deteriorate rapidly as the generation step size grows. As a result, reliable self-correction becomes essential for maintaining high-quality multi-token generation. To address this, we propose Decoupled Self-Correction (DSC), a novel two-stage methodology. DSC first fully optimizes the DLM’s generative ability before freezing the model […]

technocracy

Learning from imperfect quantum data via unsupervised domain adaptation with classical shadows

digitado ⋅ 30 March 2026

Learning from quantum data using classical machine learning models has emerged as a promising paradigm toward realizing quantum advantages. Despite extensive analyses on their performance, clean and fully labeled quantum data from the target domain are often unavailable in practical scenarios, forcing models to be trained on data collected under conditions that differ from those encountered at deployment. This mismatch highlights the need for new approaches beyond the common assumptions of prior work. In this work, we address […]

technocracy

Optimistic Training and Convergence of Q-Learning — Extended Version

digitado ⋅ 9 February 2026

arXiv:2602.06146v1 (cross-listed). Abstract: In recent work it is shown that Q-learning with linear function approximation is stable, in the sense of bounded parameter estimates, under the $(\varepsilon,\kappa)$-tamed Gibbs policy; $\kappa$ is inverse temperature, and $\varepsilon>0$ is introduced for additional exploration. Under these assumptions it also follows that there is a solution to the projected Bellman equation (PBE). Left open is uniqueness of the solution, and criteria for convergence outside of the standard tabular or linear MDP […]

technocracy

DABench-LLM: Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators for LLMs

digitado ⋅ 29 January 2026

arXiv:2601.19904v1. Abstract: The exponential growth of large language models has outpaced the capabilities of traditional CPU and GPU architectures due to the slowdown of Moore’s Law. Dataflow AI accelerators present a promising alternative; however, there remains a lack of in-depth performance analysis and standardized benchmarking methodologies for LLM training. We introduce DABench-LLM, the first benchmarking framework designed for evaluating LLM workloads on dataflow-based accelerators. By combining intra-chip performance profiling and inter-chip scalability analysis, DABench-LLM enables […]

technocracy

Do people expect different behavior from large language models acting on their behalf? Evidence from norm elicitations in two canonical economic games

digitado ⋅ 23 January 2026

arXiv:2601.15312v1. Abstract: While delegating tasks to large language models (LLMs) can save people time, there is growing evidence that offloading tasks to such models produces social costs. We use behavior in two canonical economic games to study whether people have different expectations when decisions are made by LLMs acting on their behalf instead of themselves. More specifically, we study the social appropriateness of a spectrum of possible behaviors: when LLMs divide resources on our behalf […]

technocracy

As AI Accelerates Execution, Product Failures Shift to a Crisis of Understanding

digitado ⋅ 24 January 2026

Execution is no longer the hard part. Understanding is. AI has collapsed the cost of building, shipping, and iterating. Code is faster. Content is instant. Decisions are suggested before we even ask for them. On the surface, this looks like progress. Underneath, it changes what actually breaks. Not the system. The meaning inside it. Everyone talks about scaling execution. Few design for shared understanding. That gap is where most modern product failures […]

technocracy

Is anyone interested in the RL ↔ neuroscience “spiral”? Thinking of writing a deep dive series

digitado ⋅ 11 March 2026

I’ve been thinking a lot about the relationship between reinforcement learning and neuroscience lately, and something about the usual framing doesn’t quite capture it. People often say the two fields developed in parallel. But historically it feels more like a spiral. Ideas move from neuroscience into computational models, then back again. Each turn sharpens the other. I’m considering writing a deep dive series about this, tentatively called “The RL Spiral.” The goal would be to trace how ideas […]

technocracy

Cultural Perspectives and Expectations for Generative AI: A Global Survey Approach

digitado ⋅ 9 March 2026

arXiv:2603.05723v1. Abstract: There is a lack of empirical evidence about global attitudes around whether and how GenAI should represent cultures. This paper assesses understandings and beliefs about culture as it relates to GenAI from a large-scale global survey. We gathered data about what culture means to different groups, and about how GenAI should approach the representation of cultural artifacts, concepts, or values. We distill working definitions of culture directly from these communities to build an […]

technocracy

One-Step Flow Policy: Self-Distillation for Fast Visuomotor Policies

digitado ⋅ 16 March 2026

arXiv:2603.12480v1. Abstract: Generative flow and diffusion models provide the continuous, multimodal action distributions needed for high-precision robotic policies. However, their reliance on iterative sampling introduces severe inference latency, degrading control frequency and harming performance in time-sensitive manipulation. To address this problem, we propose the One-Step Flow Policy (OFP), a from-scratch self-distillation framework for high-fidelity, single-step action generation without a pre-trained teacher. OFP unifies a self-consistency loss to enforce coherent transport across time intervals, and a […]
