technocracy

CircuitBuilder: From Polynomials to Circuits via Reinforcement Learning

digitado ⋅ 17 March 2026

Motivated by auto-proof generation and Valiant’s VP vs. VNP conjecture, we study the problem of discovering efficient arithmetic circuits to compute polynomials, using addition and multiplication gates. We formulate this problem as a single-player game, where an RL agent attempts to build the circuit within a fixed number of operations. We implement an AlphaZero-style training loop and compare two approaches: Proximal Policy Optimization with Monte Carlo Tree Search (PPO+MCTS) and Soft Actor-Critic (SAC). SAC achieves the highest success […]
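To make the circuit-size objective concrete, here is a minimal sketch of an arithmetic circuit built from addition and multiplication gates. The `Gate` type, the `evaluate` helper, and the example polynomial are illustrative assumptions, not the paper's representation. The sketch shows that x^2 + 2xy + y^2 can be computed with two gates via (x + y)^2, whereas evaluating the expanded form term by term takes more operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """One gate of a hypothetical circuit encoding (not the paper's format)."""
    op: str     # "add" or "mul"
    left: int   # index of an earlier node
    right: int  # index of an earlier node


def evaluate(num_inputs, gates, inputs):
    """Evaluate the circuit; nodes 0..num_inputs-1 are inputs, each gate appends a node."""
    if len(inputs) != num_inputs:
        raise ValueError("wrong number of inputs")
    values = list(inputs)
    for gate in gates:
        a, b = values[gate.left], values[gate.right]
        if gate.op == "add":
            values.append(a + b)
        elif gate.op == "mul":
            values.append(a * b)
        else:
            raise ValueError(f"unknown gate: {gate.op}")
    return values[-1]  # the last node is the circuit output


# Two-gate circuit: node 2 = x + y, node 3 = node 2 * node 2.
SQUARE_OF_SUM = [Gate("add", 0, 1), Gate("mul", 2, 2)]


def target(x, y):
    """The expanded polynomial the circuit is meant to compute."""
    return x * x + 2 * x * y + y * y
```

Under this encoding, a game of the kind the abstract describes would amount to choosing the next gate at each step until the output node matches the target polynomial.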

technocracy

Double Fairness Policy Learning: Integrating Action Fairness and Outcome Fairness in Decision-making

digitado ⋅ 11 February 2026

arXiv:2601.19186v2. Fairness is a central pillar of trustworthy machine learning, especially in domains where accuracy- or profit-driven optimization is insufficient. While most fairness research focuses on supervised learning, fairness in policy learning remains less explored. Because policy learning is interventional, it induces two distinct fairness targets: action fairness (equitable action assignments) and outcome fairness (equitable downstream consequences). Crucially, equalizing actions does not generally equalize outcomes when groups face different constraints or respond differently to […]
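The gap between the two fairness targets can be seen in a small numeric sketch. The groups, rates, and the additive outcome model below are hypothetical and chosen only for illustration; they are not the paper's definitions or data. Both groups receive the action at the same rate, yet their expected outcomes differ because they respond differently.

```python
def expected_outcome(baseline, treat_rate, effect):
    """Expected outcome under a simple additive model (an assumption of this sketch)."""
    return baseline + treat_rate * effect


# Hypothetical groups: identical action rates, different responses to the action.
groups = {
    "A": {"treat_rate": 0.5, "baseline": 0.2, "effect": 0.4},
    "B": {"treat_rate": 0.5, "baseline": 0.2, "effect": 0.1},
}

# Action fairness gap: difference in how often each group receives the action.
action_gap = abs(groups["A"]["treat_rate"] - groups["B"]["treat_rate"])

# Outcome fairness gap: difference in expected downstream outcomes.
outcome_gap = abs(
    expected_outcome(**groups["A"]) - expected_outcome(**groups["B"])
)

print(f"action gap:  {action_gap:.2f}")
print(f"outcome gap: {outcome_gap:.2f}")
```

In this toy setting the action gap is zero while the outcome gap is not, which is the situation the abstract describes.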

technocracy

Achieving $\varepsilon^{-2}$ Dependence for Average-Reward Q-Learning with a New Contraction Principle

digitado ⋅ 29 January 2026

We present the convergence rates of synchronous and asynchronous Q-learning for average-reward Markov decision processes, where the absence of contraction poses a fundamental challenge. Existing non-asymptotic results overcome this challenge by either imposing strong assumptions to enforce seminorm contraction or relying on discounted or episodic Markov decision processes as successive approximations, which either require unknown parameters or result in suboptimal sample complexity. In this work, under a reachability assumption, we establish optimal $\widetilde{O}(\varepsilon^{-2})$ sample complexity guarantees (up to […]

technocracy

Spatial-Temporal Nonlocal Traffic Dynamics: Analytical Properties, Adaptive Kernel Formulation, and Empirical Validation

digitado ⋅ 30 March 2026

arXiv:2603.25859v1. This paper presents a new spatial-temporal nonlocal traffic flow model formulated to overcome the boundedness limitations inherent in classical local formulations. The model introduces an adaptive kernel that captures both spatial and temporal nonlocal interactions, allowing the velocity at a given point to depend on aggregated downstream traffic conditions over a finite time horizon. This structure provides a more realistic representation of driver anticipation and reaction behavior. In addition to developing the model, […]

technocracy

GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models

digitado ⋅ 18 April 2026

arXiv:2604.14262v1. GUI grounding models report over 85% accuracy on standard benchmarks, yet drop 27-56 percentage points when instructions require spatial reasoning rather than direct element naming. Current benchmarks miss this because they evaluate each screenshot once with a single fixed instruction. We introduce GUI-Perturbed, a controlled perturbation framework that independently varies visual scenes and instructions to measure grounding robustness. Evaluating three 7B models from the same architecture lineage, we find that relational instructions cause […]

technocracy

Choosing Your AI Coding Engine in 2026

digitado ⋅ 5 February 2026

Author(s): Sandip Patel. Originally published on Towards AI. Why this guide: Enterprise development isn’t just about generating code; it’s about shipping secure, reliable software across large repos, long‑running tasks, and regulated environments. That’s where GPT‑5.2 Codex, running inside Microsoft Foundry, changes the conversation: it’s not merely “autocomplete,” but sustained reasoning designed for the realities of real‑world SDLCs. If you’ve been following the rapid evolution of AI coding models, you’ve probably noticed a shift: we’re no longer talking […]

technocracy

PRISM: Personalized Refinement of Imitation Skills for Manipulation via Human Instructions

digitado ⋅ 9 March 2026

arXiv:2603.05574v1. This paper presents PRISM: an instruction-conditioned refinement method for imitation policies in robotic manipulation. This approach bridges Imitation Learning (IL) and Reinforcement Learning (RL) frameworks into a seamless pipeline, such that an imitation policy on a broad generic task, generated from a set of user-guided demonstrations, can be refined through reinforcement to generate new unseen fine-grain behaviours. The refinement process follows the Eureka paradigm, where reward functions for RL are iteratively generated from […]

technocracy

What TPS Really Measures in Blockchains — and When It Misleads

digitado ⋅ 18 March 2026

TPS (transactions per second) has become a symbol of performance in the blockchain industry. Networks are often compared by a single number, with claims of “tens of thousands of TPS.” The problem is that TPS by itself says very little about a system’s real throughput. TPS is highly dependent on transaction types, block configuration, and measurement methodology. The same blockchain can report radically different TPS values without any architectural changes — simply by tuning execution parameters such as […]
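A back-of-the-envelope sketch shows how the reported figure moves with transaction type alone. It assumes a hypothetical gas-metered chain; the block gas limit, block time, and per-transaction gas costs are illustrative values, not measurements of any particular network.

```python
def tps(block_gas_limit, block_time_s, gas_per_tx):
    """Transactions per second when every block is filled with one transaction type."""
    tx_per_block = block_gas_limit // gas_per_tx  # whole transactions only
    return tx_per_block / block_time_s


# Hypothetical chain parameters (assumptions for illustration).
BLOCK_GAS_LIMIT = 30_000_000
BLOCK_TIME_S = 12

# Same chain, same configuration, two different workloads.
simple_transfer_tps = tps(BLOCK_GAS_LIMIT, BLOCK_TIME_S, gas_per_tx=21_000)
contract_call_tps = tps(BLOCK_GAS_LIMIT, BLOCK_TIME_S, gas_per_tx=300_000)

print(f"simple transfers: {simple_transfer_tps:.1f} TPS")
print(f"contract calls:   {contract_call_tps:.1f} TPS")
```

With these assumed numbers the same chain reports a figure more than ten times higher for simple transfers than for heavier contract calls, without any architectural change.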

technocracy

Lemon Agent Technical Report

digitado ⋅ 10 February 2026

arXiv:2602.07092v1. Recent advanced LLM-powered agent systems have exhibited their remarkable capabilities in tackling complex, long-horizon tasks. Nevertheless, they still suffer from inherent limitations in resource efficiency, context management, and multimodal perception. Based on these observations, Lemon Agent is introduced, a multi-agent orchestrator-worker system built on a newly proposed AgentCortex framework, which formalizes the classic Planner-Executor-Memory paradigm through an adaptive task execution mechanism. Our system integrates a hierarchical self-adaptive scheduling mechanism that operates at both […]

technocracy

FHECore: Rethinking GPU Microarchitecture for Fully Homomorphic Encryption

digitado ⋅ 27 February 2026

arXiv:2602.22229v1. Fully Homomorphic Encryption (FHE) enables computation directly on encrypted data but incurs massive computational and memory overheads, often exceeding plaintext execution by several orders of magnitude. While custom ASIC accelerators can mitigate these costs, their long time-to-market and the rapid evolution of FHE algorithms threaten their long-term relevance. GPUs, by contrast, offer scalability, programmability, and widespread availability, making them an attractive platform for FHE. However, modern GPUs are increasingly specialized for machine learning […]
