May 2026

More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models

digitado ⋅ May 12, 2026

arXiv:2605.06672v1. Abstract: Chain-of-thought (CoT) reasoning and reasoning-tuned models such as DeepSeek-R1 are commonly assumed to reduce shallow heuristic biases by thinking carefully. We test this on position bias in multiple-choice QA and find a different story: within any reasoning-capable model, per-question position bias scales with the length of the reasoning trajectory. Across thirteen reasoning-mode configurations (two R1-distilled 7-8B models, two base models prompted with CoT, and DeepSeek-R1 at 671B) on MMLU, ARC-Challenge, and GPQA, twelve […]


technocracy

GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning

digitado ⋅ May 12, 2026

arXiv:2605.06671v1. Abstract: Large Language Models (LLMs) have demonstrated strong potential for many mathematical problems. However, their performance on graph algorithmic tasks is still unsatisfying, since graphs are naturally more complex in topology and often require systematic multi-step reasoning, especially on larger graphs. Motivated by this gap, we propose GraphDC, a Divide-and-Conquer multi-agent framework for scalable graph algorithm reasoning. Specifically, inspired by Divide-and-Conquer design, GraphDC decomposes an input graph into smaller subgraphs, assigns each subgraph to […]


technocracy

Evaluating Prompt Injection Defenses for Educational LLM Tutors: Security-Usability-Latency Trade-offs

digitado ⋅ May 12, 2026

arXiv:2605.06669v1. Abstract: Educational LLM tutors face a core AI alignment challenge: they must follow user intent while preserving pedagogical constraints and safety policies. We present an evaluation methodology for prompt-injection defenses in this setting, showing that guardrail design entails explicit trade-offs among adversarial robustness, benign-task usability, and response latency. We evaluate a domain-specific multi-layer safeguard pipeline combining deterministic pattern filters, structural validation, contextual sandboxing, and session-level behavioral checks. On a controlled holdout benchmark with 480 […]


technocracy

Euler function

digitado ⋅ May 12, 2026

This morning I wrote a post about the probability that a random matrix over a finite field is invertible. If the field has q elements and the matrix has dimensions n × n then the probability is p(q, n) = (1 − 1/q)(1 − 1/q^2) ⋯ (1 − 1/q^n). In that post I made the observation that p(q, n) converges very quickly as a function of n [1]. One way to see that the convergence is quick is to note that and John Baez pointed out in the comments that p(q, […]
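As a rough illustration of the quick convergence mentioned above, here is a minimal Python sketch. It assumes the standard product formula for the probability that a uniformly random n × n matrix over a field with q elements is invertible, p(q, n) = (1 − 1/q)(1 − 1/q^2) ⋯ (1 − 1/q^n). The function name `p_invertible` is made up for this example and does not come from the original post.

```python
from fractions import Fraction


def p_invertible(q, n):
    """Exact probability that a random n x n matrix over a field
    with q elements is invertible, using the standard product formula
    (an assumption of this sketch, not quoted from the post)."""
    prob = Fraction(1)
    for i in range(1, n + 1):
        # Row i must avoid the span of the previous rows,
        # which contributes a factor of (1 - 1/q^i).
        prob *= 1 - Fraction(1, q ** i)
    return prob


if __name__ == "__main__":
    # Print p(2, n) for a few n to watch the values settle down.
    for n in (1, 2, 5, 10, 20, 40):
        print(n, float(p_invertible(2, n)))
```

Each extra factor differs from 1 by only 1/q^n, so the product changes very little once n is moderately large, which is one way to read the convergence claim.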


technocracy

Thoughts on GitLab’s “workforce reduction” and “structural and strategic decisions”

digitado ⋅ May 12, 2026

GitLab Act 2 There’s a lot going on in this announcement from GitLab about the “workforce reduction” and “structural and strategic decisions” they are making with respect to the agentic era. They’re “planning to reduce the number of countries by up to 30% where we have small teams”. One of the most interesting things about GitLab is that they have employees spread across a large number of countries – 18 are listed in their public employee handbook but […]


technocracy

Building and Training a Kimi-K2 Model Using DeepSeek-V3 Components

digitado ⋅ May 12, 2026

Table of Contents: Building and Training a Kimi-K2 Model Using DeepSeek-V3 Components; Kimi-K2 vs DeepSeek-V3: Key Architecture Differences in LLM Design; Mixture of Experts Scaling in Kimi-K2: Model Size, Sparsity, and Efficiency; Attention Head Optimization in Kimi-K2 for Efficient Long-Context LLMs; MuonClip Optimizer: Stabilizing Large-Scale LLM Training in Kimi-K2; Token Efficiency in LLM Training: Why It Matters for Kimi-K2; Attention Logit Explosion in LLMs: Training Instability and Challenges; QK-Clip: Preventing Attention Logit Explosion in Kimi-K2 Training; Training […]


technocracy

Sakana AI and NVIDIA Introduce TwELL with CUDA Kernels for 20.5% Inference and 21.9% Training Speedup in LLMs

digitado ⋅ May 12, 2026

Scaling large language models (LLMs) is expensive. Every token processed during inference and every gradient computed during training flows through feedforward layers that account for over two-thirds of model parameters and more than 80% of total FLOPs in larger models. A team of researchers from Sakana AI and NVIDIA has published new research that directly targets this bottleneck, not by changing the architecture, but by making the computation inside feedforward layers significantly cheaper through unstructured sparsity. […]


technocracy

Meta and Stanford Researchers Propose Fast Byte Latent Transformer That Reduces Inference Memory Bandwidth by Over 50% Without Tokenization

digitado ⋅ May 12, 2026

A team of researchers from Meta, Stanford University, and the University of Washington has introduced three new methods that substantially accelerate generation in the Byte Latent Transformer (BLT), a language model architecture that operates directly on raw bytes instead of tokens.

Byte-Level Models Are Slow at Inference

To understand what this new research solves, you need to understand the tradeoff at the center of byte-level language modeling. Most language models today work on tokens: chunks of […]


technocracy

Enabling privacy-preserving AI training on everyday devices

digitado ⋅ May 12, 2026

A new method developed by MIT researchers can accelerate a privacy-preserving artificial intelligence training method by about 81 percent. This advance could enable a wider array of resource-constrained edge devices, like sensors and smartwatches, to deploy more accurate AI models while keeping user data secure. The MIT researchers boosted the efficiency of a technique known as federated learning, which involves a network of connected devices that work together to train a shared AI model. In federated learning, the […]
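To make the idea of devices jointly training a shared model more concrete, here is a minimal sketch of generic federated averaging in plain Python. It is not the MIT method described above, whose details are not given in this excerpt. The tiny linear model, the function names (`local_update`, `federated_average`, `train`), and the toy data are all assumptions chosen for illustration.

```python
def local_update(weights, data, lr=0.1):
    """Run one pass of gradient descent on a device's private data.
    The model is a 1-D linear fit y = w * x + b (an illustrative choice)."""
    w, b = weights
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Combine client models on the server, weighting each client
    by how many samples it trained on. Raw data never leaves a client."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def train(clients, rounds=200, lr=0.1):
    """Repeat: broadcast the shared weights, train locally, average."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data, lr) for data in clients]
        weights = federated_average(updates, [len(d) for d in clients])
    return weights


if __name__ == "__main__":
    # Two hypothetical devices, each holding samples of y = 2x + 1.
    clients = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
    print(train(clients))
```

Only model weights travel between the devices and the server in this sketch, which is the property that makes the approach attractive for privacy. Real systems add steps such as client sampling and secure aggregation that are omitted here.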


technocracy

Building web search-enabled agents with Strands and Exa

digitado ⋅ May 11, 2026

This post is co-written by Ishan Goswami and Nitya Sridhar from Exa. If you are building web search-enabled AI agents for research, fact-checking, or competitive intelligence, access to current and reliable information is critical. Most general-purpose search APIs are not designed for agent workflows. They return HTML-heavy pages and short snippets optimized for human browsing, not structured data that an agent can directly consume. As a result, developers often need to build additional layers, custom crawlers, parsers, […]
