Why is Normalization Preferred? A Worst-Case Complexity Theory for Stochastically Preconditioned SGD under Heavy-Tailed Noise
arXiv:2602.13413v1 Announce Type: cross Abstract: We develop a worst-case complexity theory for stochastically preconditioned stochastic gradient descent (SPSGD) and its accelerated variants under heavy-tailed noise, a setting that encompasses widely used adaptive methods such as Adam, RMSProp, and Shampoo. We assume the stochastic gradient noise has a finite $p$-th moment for some $p \in (1,2]$, and measure convergence after $T$ iterations. While clipping and normalization are parallel tools for stabilizing SGD training under heavy-tailed noise, there is […]
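To make the contrast in the abstract concrete, below is a minimal sketch of the two stabilization schemes it compares: gradient clipping rescales the stochastic gradient only when its norm exceeds a threshold, while normalization always rescales it to unit norm. This is a generic illustration under standard textbook definitions, not the paper's SPSGD method or its accelerated/preconditioned variants; the step size `eta`, threshold `tau`, the quadratic toy objective, and the Pareto noise model (finite mean, infinite variance, i.e. a finite $p$-th moment only for $p < 1.5$) are assumptions made for the example.

```python
import numpy as np


def clipped_sgd_step(x, grad, eta=0.1, tau=1.0):
    """One SGD step with gradient clipping: rescale only if ||grad|| > tau."""
    norm = np.linalg.norm(grad)
    if norm > tau:
        grad = grad * (tau / norm)
    return x - eta * grad


def normalized_sgd_step(x, grad, eta=0.1, eps=1e-12):
    """One SGD step with gradient normalization: always rescale to unit norm."""
    return x - eta * grad / (np.linalg.norm(grad) + eps)


# Toy comparison on f(x) = 0.5 * ||x||^2 with heavy-tailed gradient noise:
# Pareto-distributed magnitudes with shape 1.5 have a finite mean but
# infinite variance, matching the finite p-th moment regime with p in (1, 2].
rng = np.random.default_rng(0)
x_clip = np.ones(10)
x_norm = np.ones(10)
for _ in range(1000):
    noise = rng.pareto(1.5, size=10) * rng.choice([-1.0, 1.0], size=10)
    g_clip = x_clip + noise  # stochastic gradient at the clipped iterate
    g_norm = x_norm + noise  # stochastic gradient at the normalized iterate
    x_clip = clipped_sgd_step(x_clip, g_clip)
    x_norm = normalized_sgd_step(x_norm, g_norm)

print("clipped SGD:    ||x|| =", np.linalg.norm(x_clip))
print("normalized SGD: ||x|| =", np.linalg.norm(x_norm))
```

Both updates bound the per-step movement regardless of how large a single noisy gradient is, which is what makes them candidates for handling heavy-tailed noise; the paper's question of why normalization is preferred concerns their worst-case complexity, which this sketch does not attempt to reproduce.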