digitado

Optimal Learning-Rate Schedules under Functional Scaling Laws: Power Decay and Warmup-Stable-Decay

digitado ⋅ 6 February 2026

We study optimal learning-rate schedules (LRSs) under the functional scaling law (FSL) framework introduced in Li et al. (2025), which accurately models the loss dynamics of both linear regression and large language model (LLM) pre-training. Within FSL, loss dynamics are governed by two exponents: a source exponent $s>0$ controlling the rate of signal learning, and a capacity exponent $β>1$ determining the rate of noise forgetting. Focusing on a fixed training horizon $N$, we derive the optimal LRSs and […]


technocracy

When Being the Most Reliable Leader Becomes a Liability

digitado ⋅ 27 February 2026

If you’re constantly fixing what’s broken, mediating conflict, and maintaining top performance, the organization may end up over-relying on you—to everyone’s detriment.


technocracy

Engaging the AI community through building, research, and shared learning

digitado ⋅ 2 February 2026

Machine learning ⋅ Staff writer ⋅ February 02, 02:51 PM ⋅ February 02, 03:34 PM

Advancing AI requires more than breakthrough models. It depends on communities of builders and researchers who experiment, test assumptions, and share what they learn. That belief is guiding how Amazon engages developers and academics around Amazon Nova, Amazon's portfolio of AI offerings including the Nova models, Nova Forge and Nova Act. Today, two Nova initiatives launch […]


technocracy

Q-learning with Adjoint Matching

digitado ⋅ 20 January 2026

We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm that tackles a long-standing challenge in continuous-action RL: efficient optimization of an expressive diffusion or flow-matching policy with respect to a parameterized Q-function. Effective optimization requires exploiting the first-order information of the critic, but it is challenging to do so for flow or diffusion policies because direct gradient-based optimization via backpropagation through their multi-step denoising process is numerically unstable. Existing methods work around this […]


technocracy

Cache Expiry is Eating Your AI Coding Budget

digitado ⋅ 27 January 2026

How cache TTL determines your bill

I was burning through my Claude Code budget way faster than I should have been. Same work, same sessions, just bleeding tokens for no reason. Took me a while to figure out why. I started digging through the .jsonl session files one night, checking token usage patterns. That’s when I saw it. Almost zero cache hits. Every turn paying full price for stuff that should’ve been cached. The problem wasn’t the tool. It was me! […]


technocracy

Coverage Path Planning for Autonomous Sailboats in Inhomogeneous and Time-Varying Oceans: A Spatiotemporal Optimization Approach

digitado ⋅ 19 February 2026

arXiv:2602.15901v1. Abstract: Autonomous sailboats are well suited for long-duration ocean observation due to their wind-driven endurance. However, their performance is highly anisotropic and strongly influenced by inhomogeneous and time-varying wind and current fields, limiting the effectiveness of existing coverage methods such as boustrophedon sweeping. Planning under these environmental and maneuvering constraints remains underexplored. This paper presents a spatiotemporal coverage path planning framework tailored to autonomous sailboats, combining (1) topology-based morphological constraints in the spatial domain […]


technocracy

A universal compression theory for lottery ticket hypothesis and neural scaling laws

digitado ⋅ 3 March 2026

arXiv:2510.00504v2. Abstract: When training large-scale models, the performance typically scales with the number of parameters and the dataset size according to a slow power law. A fundamental theoretical and practical question is whether comparable performance can be achieved with significantly smaller models and substantially less data. In this work, we provide a positive and constructive answer. We prove that a generic permutation-invariant function of $d$ objects can be asymptotically compressed into a function of $\operatorname{polylog} […]


technocracy

Learning Credal Ensembles via Distributionally Robust Optimization

digitado ⋅ 27 February 2026

arXiv:2602.08470v2. Abstract: Credal predictors are models that are aware of epistemic uncertainty and produce a convex set of probabilistic predictions. They offer a principled way to quantify predictive epistemic uncertainty (EU) and have been shown to improve model robustness in various settings. However, most state-of-the-art methods mainly define EU as disagreement caused by random training initializations, which mostly reflects sensitivity to optimization randomness rather than uncertainty from deeper sources. To address this, we define EU […]


technocracy

Tech workers urge DOD, Congress to withdraw Anthropic label as a supply chain risk

digitado ⋅ 2 March 2026

Tech workers have signed an open letter urging the Department of War to withdraw its designation of Anthropic as a “supply chain risk” and instead to settle the matter quietly.


technocracy

A Few-Shot LLM Framework for Extreme Day Classification in Electricity Markets

digitado ⋅ 20 February 2026

arXiv:2602.16735v1. Abstract: This paper proposes a few-shot classification framework based on Large Language Models (LLMs) to predict whether the next day will have spikes in real-time electricity prices. The approach aggregates system state information, including electricity demand, renewable generation, weather forecasts, and recent electricity prices, into a set of statistical features that are formatted as natural-language prompts and fed to an LLM along with general instructions. The model then determines the likelihood that the next […]
