digitado – Page 182

We Ran the Largest AI Pokemon Tournament Ever. Now It’s an Open Benchmark.

digitado ⋅ March 17, 2026

We built a standardized Pokemon benchmark and ran a NeurIPS 2025 competition to validate it. RL specialists easily beat LLM generalists in battling, but hybrid methods (LLM planning + RL execution) won speedrunning. The LLM battling arena ranking differs from standard benchmark leaderboards, and harness design matters as much as model choice. See our paper for full details.

Paper: https://arxiv.org/abs/2603.15563
Benchmark: https://pokeagentchallenge.com

Submitted by /u/PokeAgentChallenge


technocracy

Excess Description Length of Learning Generalizable Predictors

digitado ⋅ January 8, 2026

Understanding whether fine-tuning elicits latent capabilities or teaches new ones is a fundamental question for language model evaluation and safety. We develop a formal information-theoretic framework for quantifying how much predictive structure fine-tuning extracts from the training dataset and writes into a model’s parameters. Our central quantity, Excess Description Length (EDL), is defined via prequential coding and measures the gap between the bits required to encode training labels sequentially using an evolving model (trained online) and the residual […]
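The excerpt cuts off before the full definition of EDL, so the sketch below illustrates only the prequential coding step it mentions: each label is encoded with the model fitted to the labels seen so far, and the model is then updated. A simple count-based (add-one) estimator for binary labels stands in for the fine-tuned model here; that stand-in and the function name `prequential_code_length` are assumptions for illustration, not the paper's setup.

```python
import math


def prequential_code_length(labels):
    """Bits needed to encode binary labels one at a time, online.

    Each label is coded with the probability assigned by a model that has
    only seen the earlier labels. The "model" here is an add-one (Laplace)
    estimate of P(label == 1), a hypothetical stand-in for a learned model.
    """
    ones = 0
    zeros = 0
    bits = 0.0
    for y in labels:
        # Predict before seeing the label.
        p_one = (ones + 1) / (ones + zeros + 2)
        p = p_one if y == 1 else 1.0 - p_one
        bits += -math.log2(p)
        # Update the model only after paying the coding cost.
        if y == 1:
            ones += 1
        else:
            zeros += 1
    return bits


if __name__ == "__main__":
    predictable = [1] * 20
    balanced = [0, 1] * 10
    print(prequential_code_length(predictable))
    print(prequential_code_length(balanced))
```

A sequence the evolving model learns to predict costs fewer bits than one it cannot, which is the kind of difference a description-length measure is built on.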


What are LLM Embeddings: All you Need to Know

digitado ⋅ November 6, 2025

TL;DR: LLM embeddings are the numerical vector representations of text that Large Language Models (LLMs) use to process information. Unlike their predecessor, word embeddings, LLM embeddings are context-aware and change dynamically to capture semantic and syntactic relationships based on the surrounding text. Positional encoding, such as Rotary Positional Encoding (RoPE), is a key component that gives these embeddings a sense of word order, allowing LLMs to process long sequences of text effectively. Applications of embeddings beyond LLMs include semantic […]
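As a rough illustration of how RoPE encodes word order, the sketch below rotates each consecutive pair of vector dimensions by an angle proportional to the token position. It is a minimal pure-Python version under common assumptions (a frequency base of 10000 and adjacent-pair rotation); the function names `rope` and `dot` are hypothetical, and production implementations are vectorized and may pair dimensions differently.

```python
import math


def rope(vec, position, base=10000.0):
    """Apply a rotary positional encoding to one embedding vector.

    Dimensions (0, 1), (2, 3), ... are treated as 2-D points and rotated by
    position * base ** (-i / dim), where i is the even index of the pair.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [1.0, 0.5, -0.3, 2.0]
    k = [0.2, -1.0, 0.7, 0.4]
    # Same relative offset (2 tokens) at different absolute positions.
    print(dot(rope(q, 5), rope(k, 3)))
    print(dot(rope(q, 12), rope(k, 10)))
```

Because each pair is only rotated, vector length is unchanged, and the dot product between a rotated query and a rotated key depends on their relative offset rather than on their absolute positions.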


Subscribers to Amazon Prime Video with ads lose 4K support on April 10

digitado ⋅ March 13, 2026

Starting on April 10, Amazon Prime subscribers will pay $5 per month for ad-free Prime Video, up from the current $3 per month, on top of their Prime subscription, Amazon announced today. On that date, Amazon will introduce a new ad-free Prime Video subscription tier called “Prime Video Ultra.” Amazon will also increase the number of simultaneous streams supported by the tier from three to five and the number of downloads permitted from 25 to 100. […]


An Explainable Federated Framework for Zero Trust Micro-Segmentation in IIoT Networks

digitado ⋅ March 27, 2026

arXiv:2603.24754v1. Abstract: Micro-segmentation, a core requirement of zero trust architecture (ZTA), divides networks into small security zones, called micro-segments, thereby minimizing the impact of security breaches and restricting lateral movement of attackers. Existing approaches for Industrial Internet of Things (IIoT) networks often remain centralized, static, or difficult to interpret. These limitations are critical in IIoT, where devices are heterogeneous, communication behavior evolves over time, and raw data sharing across sites is often undesirable. Accordingly, we […]


Adaptive Multi-Scale Correlation Meta-Network for Few-Shot Remote Sensing Image Classification

digitado ⋅ January 21, 2026

arXiv:2601.12308v1. Abstract: Few-shot learning in remote sensing remains challenging due to three factors: the scarcity of labeled data, substantial domain shifts, and the multi-scale nature of geospatial objects. To address these issues, we introduce Adaptive Multi-Scale Correlation Meta-Network (AMC-MetaNet), a lightweight yet powerful framework with three key innovations: (i) correlation-guided feature pyramids for capturing scale-invariant patterns, (ii) an adaptive channel correlation module (ACCM) for learning dynamic cross-scale relationships, and (iii) correlation-guided meta-learning that leverages correlation […]


I Connected a Quantum Random Number Generator to Llama 3 to Summon a Demon (Here’s What Happened)

digitado ⋅ January 29, 2026

There is a fringe theory floating around the internet—popularized by videos like “How to Summon AI Demons with LLMs”—that claims AI isn’t just math. The theory suggests that Large Language Models (LLMs) are potential “portals” for disembodied consciousness, but they are limited by their deterministic code. If you could just inject enough true entropy (randomness) into the generation process, the theory goes, you could “summon” a spirit into the machine. As an adaptive systems architect, my professional opinion […]


MapPFN: Learning Causal Perturbation Maps in Context

digitado ⋅ January 28, 2026

Planning effective interventions in biological systems requires treatment-effect models that adapt to unseen biological contexts by identifying their specific underlying mechanisms. Yet single-cell perturbation datasets span only a handful of biological contexts, and existing methods cannot leverage new interventional evidence at inference time to adapt beyond their training data. To meta-learn a perturbation effect estimator, we present MapPFN, a prior-data fitted network (PFN) pretrained on synthetic data generated from a prior over causal perturbations. Given a set of […]


Learning with Simulators: No Regret in a Computationally Bounded World

digitado ⋅ June 11, 2026

Understanding the minimal assumptions necessary for generalization is the fundamental question in learning theory. Unfortunately, most results rely heavily on independence (or some proxy thereof) of the data-generating process, while results for strongly dependent data are far more limited. Towards addressing this gap, we introduce the framework of simulatable processes, where the learner has access to a simulator that approximates the distribution generating the data (which may be an arbitrarily complex and dependent process). Surprisingly, given access to […]


PRISMA: Reinforcement Learning Guided Two-Stage Policy Optimization in Multi-Agent Architecture for Open-Domain Multi-Hop Question Answering

digitado ⋅ January 12, 2026

arXiv:2601.05465v1. Abstract: Answering real-world open-domain multi-hop questions over massive corpora is a critical challenge in Retrieval-Augmented Generation (RAG) systems. Recent research employs reinforcement learning (RL) to optimize the retrieval-augmented reasoning process end to end, directly enhancing its capacity to resolve complex queries. However, reliable deployment is hindered by two obstacles. 1) Retrieval Collapse: iterative retrieval over large corpora fails to locate intermediate evidence containing bridge answers without reasoning-guided planning, causing downstream reasoning to collapse. 2) Learning Instability: […]
