digitado – Page 597

Can LLMs Generate Quality Code? A 40,000-Line Experiment

digitado ⋅ January 4, 2026

Executive Summary: I spent four weeks part-time (probably 80 hours in total) building a complete reactive UI framework with 40+ components, a router, and a supporting interactive website, using only LLM-generated code. It is evident that LLMs can produce quality code, but, like human developers, they need the right guidance. Key Findings on Code Quality: well-specified tasks yield clean first-pass code; poorly specified or unique requirements produce sloppy implementations; code degrades over time without deliberate refactoring; LLMs defensively over-engineer when asked to […]

technocracy

Growing Concerns as AI Technology Accelerates

digitado ⋅ January 5, 2026

In what could be read as a warning of what lies ahead, AI pioneer Yoshua Bengio says AI is showing signs of consciousness or “self-preservation” and that humans might need to pull the plug at some point. AI continues to evolve rapidly, moving beyond simple automation toward more autonomous systems capable of complex, multi-step tasks. This incredible progress promises immense productivity gains, but it has also amplified a wide range of concerns, including economic disruption and longer-term societal, […]

technocracy

Field-Theoretic Memory for AI Agents: Continuous Dynamics for Context Preservation

digitado ⋅ February 26, 2026

arXiv:2602.21220v1 Announce Type: new Abstract: We present a memory system for AI agents that treats stored information as continuous fields governed by partial differential equations rather than discrete entries in a database. The approach draws from classical field theory: memories diffuse through semantic space, decay thermodynamically based on importance, and interact through field coupling in multi-agent scenarios. We evaluate the system on two established long-context benchmarks: LoCoMo (ACL 2024) with 300-turn conversations across 35 sessions, and LongMemEval (ICLR […]
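
The abstract describes memories that diffuse through semantic space and decay at importance-dependent rates. A minimal sketch of that idea follows. It assumes a one-dimensional periodic "semantic" grid, an explicit finite-difference scheme for du/dt = D * d2u/dx2 - lam(x) * u, and a hypothetical decay rate `base_decay * (1 - importance)`. The function and parameter names are illustrative and do not come from the paper.

```python
import numpy as np


def step_memory_field(u, importance, diffusion=0.1, base_decay=0.05, dt=0.1, dx=1.0):
    """One explicit Euler step of du/dt = D * d2u/dx2 - lam(x) * u on a periodic 1-D grid.

    The decay rate lam = base_decay * (1 - importance) is a hypothetical choice:
    importance 1.0 means no decay, importance 0.0 means decay at base_decay.
    """
    # The explicit scheme is only stable when D * dt / dx^2 <= 0.5
    if diffusion * dt / dx**2 > 0.5:
        raise ValueError("time step too large for explicit diffusion")
    # Periodic second difference approximates the Laplacian
    laplacian = (np.roll(u, 1) - 2.0 * u + np.roll(u, -1)) / dx**2
    decay_rate = base_decay * (1.0 - importance)
    return u + dt * (diffusion * laplacian - decay_rate * u)


# Usage: a single memory stored at one point spreads out and fades over 100 steps
field = np.zeros(50)
field[25] = 1.0
importance = np.zeros(50)  # every location unimportant, so decay is uniform
for _ in range(100):
    field = step_memory_field(field, importance)
```

Because the periodic Laplacian sums to zero, diffusion alone conserves the total "mass" of the field. Only the decay term removes it, which makes the two effects easy to separate.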

technocracy

Tacit Coordination of Large Language Models

digitado ⋅ February 2, 2026

arXiv:2601.22184v1 Announce Type: new Abstract: In tacit coordination games with multiple outcomes, purely rational solution concepts, such as Nash equilibria, provide no guidance for which equilibrium to choose. Schelling’s theory explains how, in these settings, humans coordinate by relying on focal points: solutions or outcomes that naturally arise because they stand out in some way as salient or prominent to all players. This work studies Large Language Models (LLMs) as players in tacit coordination games, and addresses how, […]
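
A toy calculation shows why focal points matter. In a pure coordination game with n options, every profile in which both players pick the same option is a Nash equilibrium, so equilibrium reasoning alone does not single one out. The sketch below is a hypothetical illustration, not the paper's experimental setup. It computes the chance that two players who choose independently, with choice distributions `p` and `q`, end up matching.

```python
from fractions import Fraction


def coordination_probability(p, q):
    """Chance that two independent players pick the same option, given their choice distributions."""
    return sum(pi * qi for pi, qi in zip(p, q))


n = 4
uniform = [Fraction(1, n)] * n  # no focal point: every option equally likely
focal = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]  # both players pick salient option 0

print(coordination_probability(uniform, uniform))  # 1/4
print(coordination_probability(focal, focal))      # 1
```

Without a shared notion of salience, matching drops to 1/n. A focal point that both players recognize lifts it to certainty.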

technocracy

LinuxArena: A Control Setting for AI Agents in Live Production Software Environments

digitado ⋅ April 20, 2026

arXiv:2604.15384v1 Announce Type: new Abstract: We introduce LinuxArena, a control setting in which agents operate directly on live, multi-service production environments. LinuxArena contains 20 environments, 1,671 main tasks representing legitimate software engineering work, and 184 side tasks representing safety failures such as data exfiltration and backdooring, making it the largest and most diverse control setting for software engineering to date. We validate that LinuxArena is useful for control research by running sabotage evaluations, which measure whether attackers can complete […]

technocracy

A Unified Framework for Debiased Machine Learning: Riesz Representer Fitting under Bregman Divergence

digitado ⋅ February 11, 2026

arXiv:2601.07752v3 Announce Type: replace-cross Abstract: Estimating the Riesz representer is central to debiased machine learning for causal and structural parameter estimation. We propose generalized Riesz regression, a unified framework for estimating the Riesz representer by fitting a representer model via Bregman divergence minimization. This framework includes various divergences as special cases, such as the squared distance and the Kullback–Leibler (KL) divergence, where the former recovers Riesz regression and the latter recovers tailored loss minimization. Under suitable pairs of […]
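
The squared-distance special case that the abstract says recovers Riesz regression can be sketched as follows. The sketch makes several assumptions: the average treatment effect functional m(W; a) = a(1, X) - a(0, X), a binary treatment `d`, a binary covariate `x`, and a hypothetical saturated linear basis. Minimizing the sample mean of a(W)^2 - 2 m(W; a) over a(w) = phi(w) @ theta gives the closed form theta = G^-1 M, where G is the mean of phi phi^T and M is the mean of phi(1, X) - phi(0, X). This is not the paper's generalized Bregman estimator.

```python
import numpy as np


def basis(d, x):
    """Hypothetical saturated basis: one indicator per (treatment, covariate) cell."""
    d = np.asarray(d, dtype=float)
    x = np.asarray(x, dtype=float)
    return np.column_stack([d * x, d * (1 - x), (1 - d) * x, (1 - d) * (1 - x)])


def riesz_regression(d, x):
    """Squared-loss Riesz regression for the ATE functional m(W; a) = a(1, X) - a(0, X).

    Minimizes mean(a(W)^2 - 2 m(W; a)) over a(w) = phi(w) @ theta;
    the first-order condition 2 G theta - 2 M = 0 gives theta = G^-1 M.
    """
    phi = basis(d, x)
    G = phi.T @ phi / len(d)
    M = (basis(np.ones_like(d), x) - basis(np.zeros_like(d), x)).mean(axis=0)
    theta = np.linalg.solve(G, M)
    return phi @ theta  # fitted Riesz representer at each observation


# Usage on simulated data where treatment probability depends on x
rng = np.random.default_rng(0)
n = 2000
x = rng.integers(0, 2, size=n)
d = (rng.random(n) < np.where(x == 1, 0.7, 0.3)).astype(int)
alpha = riesz_regression(d, x)
```

With a saturated basis, the fitted values reproduce the familiar inverse-propensity weights d / e(x) - (1 - d) / (1 - e(x)), where e(x) is the treated share within each covariate cell.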

technocracy

Support Tokens, Stability Margins, and a New Foundation for Robust LLMs

digitado ⋅ February 27, 2026

arXiv:2602.22271v1 Announce Type: new Abstract: Self-attention is usually described as a flexible, content-adaptive way to mix a token with information from its past. We re-interpret causal self-attention transformers, the backbone of modern foundation models, within a probabilistic framework, much like how classical PCA is extended to probabilistic PCA. However, this re-formulation reveals a surprising and deeper structural insight: due to a change-of-variables phenomenon, a barrier constraint emerges on the self-attention parameters. This induces a highly structured geometry on […]
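
The abstract starts from the standard view of causal self-attention as a content-adaptive way to mix a token with its past. A minimal single-head NumPy sketch of that standard computation, softmax(Q K^T / sqrt(d) + mask) V, follows. The weight matrices are random placeholders. The paper's probabilistic reformulation and its barrier constraint are not reproduced here.

```python
import numpy as np


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention: token t attends only to tokens 0..t."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Mask future positions so each token mixes information only from its past
    t = x.shape[0]
    future = np.triu(np.ones((t, t), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable row-wise softmax; masked entries become exactly 0
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Usage: 5 tokens with 8-dimensional embeddings projected to 4 dimensions
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` is a probability distribution over the current and earlier positions. That row-wise normalization is the property a probabilistic reading of attention builds on.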

technocracy

Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward

digitado ⋅ April 15, 2026

arXiv:2604.09748v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is an emerging paradigm that significantly boosts a Large Language Model’s (LLM’s) reasoning abilities on complex logical tasks, such as mathematics and programming. However, we identify, for the first time, a latent vulnerability to backdoor attacks within the RLVR framework. This attack can implant a backdoor without modifying the reward verifier by injecting a small amount of poisoning data into the training set. Specifically, we propose a […]
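
In RLVR, the reward comes from a programmatic verifier rather than a learned reward model. The abstract stresses that the attack leaves this verifier unmodified. A minimal sketch of such a verifier for math answers follows. It assumes that completions put their final answer in a LaTeX `\boxed{...}` expression and that exact string match is the criterion. Both choices, and the helper names, are hypothetical. The paper's poisoning attack is not shown.

```python
import re
from typing import Optional

# Matches the contents of \boxed{...} when they contain no nested braces
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a completion, or None if absent."""
    matches = BOXED.findall(completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 when the extracted answer equals the reference string, else 0.0."""
    answer = extract_final_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


print(verifiable_reward(r"2 + 40 gives \boxed{42}", "42"))  # 1.0
print(verifiable_reward("the answer is 42", "42"))          # 0.0
```

Because the reward depends only on the final answer, any training example whose answer passes the check is rewarded. This is why poisoned training data, rather than a tampered verifier, is enough to shape behavior.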

technocracy

Goal-Oriented Influence-Maximizing Data Acquisition for Learning and Optimization

digitado ⋅ February 23, 2026

Active data acquisition is central to many learning and optimization tasks in deep neural networks, yet remains challenging because most approaches rely on predictive uncertainty estimates that are difficult to obtain reliably. To this end, we propose Goal-Oriented Influence-Maximizing Data Acquisition (GOIMDA), an active acquisition algorithm that avoids explicit posterior inference while remaining uncertainty-aware through inverse curvature. GOIMDA selects inputs by maximizing their expected influence on a user-specified goal functional, such as test loss, predictive entropy, or […]

technocracy

Beyond the Prompt in Large Language Models: Comprehension, In-Context Learning, and Chain-of-Thought

digitado ⋅ March 13, 2026

arXiv:2603.10000v2 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency across diverse tasks, exhibiting emergent properties such as semantic prompt comprehension, In-Context Learning (ICL), and Chain-of-Thought (CoT) reasoning. Despite their empirical success, the theoretical mechanisms driving these phenomena remain poorly understood. This study dives into the foundations of these observations by addressing three critical questions: (1) How do LLMs accurately decode prompt semantics despite being trained solely on a next-token prediction objective? (2) Through what […]
