technocracy

Evaluation of Agents under Simulated AI Marketplace Dynamics

digitado ⋅ 18 April 2026

arXiv:2604.14256v1 Announce Type: new Abstract: Modern information access ecosystems consist of mixtures of systems, such as retrieval systems and large language models, and increasingly rely on marketplaces to mediate access to models, tools, and data, making competition between systems inherent to deployment. In such settings, outcomes are shaped not only by benchmark quality but also by competitive pressure, including user switching, routing decisions, and operational constraints. Yet evaluation is still largely conducted on static benchmarks with accuracy-focused measures […]



A Jointly Efficient and Optimal Algorithm for Heteroskedastic Generalized Linear Bandits with Adversarial Corruptions

digitado ⋅ 12 February 2026

arXiv:2602.10971v1 Announce Type: cross Abstract: We consider the problem of heteroskedastic generalized linear bandits (GLBs) with adversarial corruptions, which subsumes various stochastic contextual bandit settings, including heteroskedastic linear bandits and logistic/Poisson bandits. We propose HCW-GLB-OMD, which consists of two components: an online mirror descent (OMD)-based estimator and Hessian-based confidence weights to achieve corruption robustness. This is computationally efficient in that it requires only $O(1)$ space and time per iteration. Under the self-concordance assumption on the link function, […]



Learning Centre Partitions from Summaries

digitado ⋅ 9 March 2026

arXiv:2509.16337v2 Announce Type: replace-cross Abstract: Multi-centre studies increasingly rely on distributed inference, where sites share only centre-level summaries. Homogeneity of parameters across centres is often violated, motivating methods that both \emph{test} for equality and \emph{learn} centre groupings before estimation. We develop multivariate Cochran-type tests that operate on summary statistics and embed them in a sequential, test-driven \emph{Clusters-of-Centres (CoC)} algorithm that merges centres (or blocks) only when equality is not rejected. We derive the asymptotic $\chi^2$-mixture distributions of the […]



Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents

digitado ⋅ 20 February 2026

arXiv:2602.16855v1 Announce Type: new Abstract: The paper introduces GUI-Owl-1.5, the latest native GUI agent model, which comes in instruct and thinking variants at multiple sizes (2B/4B/8B/32B/235B) and supports a range of platforms (desktop, mobile, browser, and more) to enable cloud-edge collaboration and real-time interaction. Among open-source models, GUI-Owl-1.5 achieves state-of-the-art results on more than 20 GUI benchmarks: (1) on GUI automation tasks, it obtains 56.5 on OSWorld, 71.6 on AndroidWorld, and 48.4 on WebArena; (2) on grounding tasks, it obtains 80.3 […]



Bounded Hyperbolic Tangent: A Stable and Efficient Alternative to Pre-Layer Normalization in Large Language Models

digitado ⋅ 16 January 2026

arXiv:2601.09719v1 Announce Type: new Abstract: Pre-Layer Normalization (Pre-LN) is the de facto choice for large language models (LLMs) and is crucial for stable pretraining and effective transfer learning. However, Pre-LN is inefficient due to repeated statistical calculations and suffers from the curse of depth. As layers grow, the magnitude and variance of the hidden state escalate, destabilizing training. Efficiency-oriented normalization-free methods such as Dynamic Tanh (DyT) improve speed but remain fragile at depth. To jointly address stability and […]



LLM Guardrails and Safety in Production AI Systems

digitado ⋅ 14 April 2026

The last post covered evaluation, monitoring, and model degradation. This one covers guardrails: how you prevent LLMs from hallucinating, leaking data, following malicious instructions, or generating harmful content in production systems. LLMs generate probabilistic outputs. In healthcare, finance, legal, or any other regulated domain, you can’t have the model hallucinating symptoms, giving medical advice it shouldn’t, or producing content that causes harm. Guardrails are the safety net between what the model generates and what reaches the user.

The Layered Architecture

No single guardrail catches […]



NASA shakes up its Artemis program to speed up lunar return

digitado ⋅ 27 February 2026

NASA Administrator Jared Isaacman announced sweeping changes to the Artemis program on Friday morning, including an increased cadence of missions and cancellation of an expensive rocket stage. The upheaval comes as NASA has struggled to fuel the massive Space Launch System rocket for the upcoming Artemis II lunar mission, and Isaacman has sought to revitalize an agency that has moved at a glacial pace on its deep space programs. There is ever-increasing concern that, absent a shake-up, China’s […]



Reinforcement Learning via Value Gradient Flow

digitado ⋅ 15 April 2026

We study behavior-regularized reinforcement learning (RL), where regularization toward a reference distribution (the dataset in offline RL or the base model in LLM RL finetuning) is essential to prevent value over-optimization caused by erroneous out-of-distribution extrapolation. Existing methods rely either on the reparameterized policy gradient, which is difficult to scale to large generative models, or on rejection sampling, which can be overly conservative when attempting to move beyond the behavior support. In this paper, we propose Value Gradient Flow […]



Optimize video semantic search intent with Amazon Nova Model Distillation on Amazon Bedrock

digitado ⋅ 17 April 2026

Optimizing models for video semantic search requires balancing accuracy, cost, and latency. Faster, smaller models lack routing intelligence, while larger, accurate models add significant latency overhead. In Part 1 of this series, we showed how to build a multimodal video semantic search system on AWS with intelligent intent routing using the Anthropic Claude Haiku model in Amazon Bedrock. While the Haiku model delivers strong accuracy for user search intent, it increases end-to-end search time to 2-4 seconds. This […]



Feature-Space Generative Models for One-Shot Class-Incremental Learning

digitado ⋅ 27 January 2026

arXiv:2601.17905v1 Announce Type: cross Abstract: Few-shot class-incremental learning (FSCIL) is a paradigm where a model, initially trained on a dataset of base classes, must adapt to an expanding problem space by recognizing novel classes with limited data. We focus on the challenging FSCIL setup where a model receives only a single sample (1-shot) for each novel class and no further training or model alterations are allowed after the base training phase. This makes generalization to novel classes particularly […]
