digitado – Page 495

Domain-level metacognitive monitoring in frontier LLMs: A 33-model atlas

digitado ⋅ 12 May 2026

arXiv:2605.06673v1. Abstract: Aggregate metacognitive quality scores mask within-model variation across MMLU benchmark domains. We administered 1,500 MMLU items (250 per domain, under an a priori six-domain grouping) to 33 frontier LLMs from eight model families and computed Type-2 AUROC per model-domain cell using verbalized confidence (0-100). Total observations: 47,151. Every model with above-chance aggregate monitoring showed non-trivial domain-level variation. Applied/Professional knowledge was reliably the easiest benchmark domain to monitor (mean AUROC = .742, ranked top-2 […]
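For readers unfamiliar with the metric, here is a minimal sketch of a Type-2 AUROC computed per model-domain cell. It assumes the standard rank-based definition (the probability that a correct answer carries higher confidence than an incorrect one, with ties counted as half); the excerpt does not show the paper's own code, and the function names, record layout, and labels below are hypothetical.

```python
from collections import defaultdict

def type2_auroc(confidences, correct):
    """AUROC of confidence as a predictor of correctness (ties count 0.5)."""
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        return None  # undefined when every answer in the cell is right, or every one is wrong
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def auroc_by_cell(records):
    """records: iterable of (model, domain, confidence, is_correct) tuples."""
    cells = defaultdict(lambda: ([], []))
    for model, domain, conf, ok in records:
        confs, oks = cells[(model, domain)]
        confs.append(conf)
        oks.append(ok)
    return {cell: type2_auroc(c, o) for cell, (c, o) in cells.items()}

# Hypothetical toy data: model and domain labels are placeholders.
records = [
    ("model-a", "domain-1", 90, True),
    ("model-a", "domain-1", 60, False),
    ("model-a", "domain-1", 80, True),
    ("model-a", "domain-1", 80, False),
    ("model-a", "domain-2", 50, True),
    ("model-a", "domain-2", 70, False),
]
scores = auroc_by_cell(records)
# scores[("model-a", "domain-1")] is 0.875; scores[("model-a", "domain-2")] is 0.0
```

A value of 0.5 means confidence carries no information about correctness. The pairwise loop is quadratic in cell size, which is fine for a few hundred items per cell; a rank-based implementation would scale better.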

How to Master Any Skill: Explaining the Biological Shortcut

digitado ⋅ 14 March 2026

You don’t walk up stairs in real time. If you did, you’d be slow, with each step requiring conscious processing of elevation, texture, muscle tension, and balance. Instead, your brain generates a high-resolution prediction of the next stair before your foot lands. You only become conscious of it when the simulation fails. When your foot hits the floor sooner than expected. Or when you find empty air where a step should be. It’s a prediction error that jolts you back to consciousness […]

Mimosa Framework: Toward Evolving Multi-Agent Systems for Scientific Research

digitado ⋅ 1 April 2026

arXiv:2603.28986v1. Abstract: Current Autonomous Scientific Research (ASR) systems, despite leveraging large language models (LLMs) and agentic architectures, remain constrained by fixed workflows and toolsets that prevent adaptation to evolving tasks and environments. We introduce Mimosa, an evolving multi-agent framework that automatically synthesizes task-specific multi-agent workflows and iteratively refines them through experimental feedback. Mimosa leverages the Model Context Protocol (MCP) for dynamic tool discovery, generates workflow topologies via a meta-orchestrator, executes subtasks through code-generating agents that […]

CoDCL: Counterfactual Data Augmentation Contrastive Learning for Continuous-Time Dynamic Network Link Prediction

digitado ⋅ 30 January 2026

The rapid growth and continuous structural evolution of dynamic networks make effective predictions increasingly challenging. To adapt to complex temporal environments, prediction models need to be robust to emerging structural changes. We propose a dynamic network learning framework, CoDCL, which combines counterfactual data augmentation with contrastive learning to address this deficiency. Furthermore, we devise a comprehensive strategy to generate high-quality counterfactual data, combining a dynamic treatments design with efficient structural neighborhood exploration to quantify the temporal […]

Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning

digitado ⋅ 27 January 2026

Real-world reinforcement learning often faces environment drift, but most existing methods rely on static entropy coefficients or target entropies. This causes over-exploration during stable periods and under-exploration after drift (and thus slow recovery), and it leaves unanswered the principled question of how exploration intensity should scale with drift magnitude. We prove that entropy scheduling under non-stationarity can be reduced to a one-dimensional, round-by-round trade-off: faster tracking of the optimal solution after drift versus avoiding gratuitous randomness when the environment is stable, so exploration […]

Head-wise Modality Specialization within MLLMs for Robust Fake News Detection under Missing Modality

digitado ⋅ 15 April 2026

arXiv:2604.09711v1. Abstract: Multimodal fake news detection (MFND) aims to verify news credibility by jointly exploiting textual and visual evidence. However, real-world news dissemination frequently suffers from missing modality due to deleted images, corrupted screenshots, and similar issues. Thus, robust detection in this scenario requires preserving strong verification ability for each modality, which is challenging in MFND due to insufficient learning of the low-contribution modality and scarce unimodal annotations. To address this issue, we propose Head-wise […]

Attention Isn’t All You Need for Emotion Recognition: Domain Features Outperform Transformers on the EAV Dataset

digitado ⋅ 2 February 2026

arXiv:2601.22161v1. Abstract: We present a systematic study of multimodal emotion recognition using the EAV dataset, investigating whether complex attention mechanisms improve performance on small datasets. We implement three model categories: baseline transformers (M1), novel factorized attention mechanisms (M2), and improved CNN baselines (M3). Our experiments show that sophisticated attention mechanisms consistently underperform on small datasets. M2 models scored 5 to 13 percentage points below baselines due to overfitting and destruction of pretrained features. In contrast, […]

Validating “Streaming Deep RL Finally Works” on 433k Observations of Real Attack Traffic

digitado ⋅ 13 February 2026

I’m learning the foundations of RL in alignment with the Alberta Plan for AI research and have been running through sets of experiments to both learn and experiment. To that end I spent the last month validating different methods for streaming deep RL on a non-stationary, adversarial dataset of real SSH honeypot observations. This work focuses on prediction and is in line with steps 1 & 2 of the Alberta Plan (Sutton, Bowling, & Pilarski 2022). After implementing […]

Nuisance Function Tuning and Sample Splitting for Optimally Estimating a Doubly Robust Functional

digitado ⋅ 10 March 2026

arXiv:2212.14857v4. Abstract: Estimators of doubly robust functionals typically rely on estimating two complex nuisance functions, such as the propensity score and conditional outcome mean for the average treatment effect functional. We consider the problem of how to estimate nuisance functions to obtain optimal rates of convergence for a doubly robust nonparametric functional that has witnessed applications across the causal inference and conditional independence testing literature. For several plug-in estimators and a first-order bias-corrected estimator, we […]
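To make "doubly robust" concrete, the sketch below shows the textbook augmented inverse-probability-weighted (AIPW) estimator for the average treatment effect, the example functional the abstract mentions. It is not the paper's estimator, and it ignores the nuisance tuning and sample splitting the paper studies; the function name, argument names, and toy data are all hypothetical, and the nuisance values are assumed to have been fitted elsewhere.

```python
def aipw_ate(y, a, propensity, mu1, mu0):
    """Textbook AIPW estimate of the average treatment effect.

    y: observed outcomes; a: binary treatment indicators (0 or 1);
    propensity: estimated P(A=1 | X); mu1, mu0: estimated E[Y | A=1, X] and E[Y | A=0, X].
    All arguments are equal-length sequences of per-unit values.
    """
    total = 0.0
    for yi, ai, ei, m1, m0 in zip(y, a, propensity, mu1, mu0):
        plug_in = m1 - m0                       # outcome-model contrast
        corr_treated = ai * (yi - m1) / ei      # residual correction for treated units
        corr_control = (1 - ai) * (yi - m0) / (1.0 - ei)  # and for control units
        total += plug_in + corr_treated - corr_control
    return total / len(y)

# Hypothetical toy data with treatment assigned with probability 0.5.
y = [3.0, 1.0, 4.0, 2.0]
a = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]

# Accurate outcome models: the residual corrections vanish.
ate_good_outcome = aipw_ate(y, a, e, mu1=[3.0, 3.0, 4.0, 4.0], mu0=[1.0, 1.0, 2.0, 2.0])

# Deliberately wrong outcome models (all zeros) with the correct propensity.
ate_bad_outcome = aipw_ate(y, a, e, mu1=[0.0] * 4, mu0=[0.0] * 4)
```

On this toy data both calls return 2.0, which illustrates the double robustness property: the estimate stays on target when either the outcome models or the propensity score is right.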

LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning

digitado ⋅ 16 April 2026

Reinforcement Learning (RL) has emerged as a critical driver for enhancing the reasoning capabilities of Large Language Models (LLMs). While recent advancements have focused on reward engineering or data synthesis, few studies exploit the model’s intrinsic representation characteristics to guide the training process. In this paper, we first observe the presence of high-magnitude activations within the query and key vectors when processing long contexts. Drawing inspiration from model quantization — which establishes the criticality of such high-magnitude activations […]
