March 2026

Locally Coherent Parallel Decoding in Diffusion Language Models

digitado ⋅ 24 de March de 2026

arXiv:2603.20216v1 Announce Type: new Abstract: Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) models, offering sub-linear generation latency and bidirectional capabilities that are particularly appealing for code generation and editing. Achieving sub-linear latency in discrete DLMs requires predicting multiple tokens in parallel. However, standard DLMs sample tokens independently from conditional marginal distributions, failing to capture the joint dependencies among concurrently generated tokens. As a result, they often lead to syntactic inconsistencies and break […]

Ver mais

Like 0

Liked Liked

technocracy

Multi-Agent Debate with Memory Masking

digitado ⋅ 24 de March de 2026

arXiv:2603.20215v1 Announce Type: new Abstract: Large language models (LLMs) have recently demonstrated impressive capabilities in reasoning tasks. Currently, mainstream LLM reasoning frameworks predominantly focus on scaling up inference-time sampling to enhance performance. In particular, among all LLM reasoning frameworks, *multi-agent debate* (MAD), which employs multiple LLMs as agents to perform reasoning in the way of multi-round debate, has emerged as a powerful reasoning paradigm since it allows agents to access previous memories to alleviate fallacious content and refine […]

Ver mais

Like 0

Liked Liked

technocracy

Beyond Detection: Governing GenAI in Academic Peer Review as a Sociotechnical Challenge

digitado ⋅ 24 de March de 2026

arXiv:2603.20214v1 Announce Type: new Abstract: Generative AI tools are increasingly entering academic peer review workflows, raising questions about fairness, accountability, and the legitimacy of evaluative judgment. While these systems promise efficiency gains amid growing reviewer overload, their use introduces new sociotechnical risks. This paper presents a convergent mixed-method study combining discourse analysis of 448 social media posts with interviews with 14 area chairs and program chairs from leading AI and HCI conferences to examine how GenAI is discussed […]

Ver mais

Like 0

Liked Liked

technocracy

AgenticGEO: A Self-Evolving Agentic System for Generative Engine Optimization

digitado ⋅ 24 de March de 2026

arXiv:2603.20213v1 Announce Type: new Abstract: Generative search engines represent a transition from traditional ranking-based retrieval to Large Language Model (LLM)-based synthesis, transforming optimization goals from ranking prominence towards content inclusion. Generative Engine Optimization (GEO), specifically, aims to maximize visibility and attribution in black-box summarized outputs by strategically manipulating source content. However, existing methods rely on static heuristics, single-prompt optimization, or engine preference rule distillation that is prone to overfitting. They cannot flexibly adapt to diverse content or the […]

Ver mais

Like 0

Liked Liked

technocracy

Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models

digitado ⋅ 24 de March de 2026

arXiv:2603.20212v1 Announce Type: new Abstract: Reward models (RMs) are critical for aligning Large Language Models via Reinforcement Learning from Human Feedback (RLHF). While Generative Reward Models (GRMs) achieve superior accuracy through chain-of-thought (CoT) reasoning, they incur substantial computational costs. Conversely, Scalar Reward Models (SRMs) offer efficiency but suffer from limited performance and adaptability in complex scenarios. We introduce Fast-Slow Thinking Reward Models (F/S-RM), a hybrid RM architecture inspired by Dual Process Theory. It trains a single model to […]

Ver mais

Like 0

Liked Liked

technocracy

Exploring Teacher-Chatbot Interaction and Affect in Block-Based Programming

digitado ⋅ 24 de March de 2026

arXiv:2603.20211v1 Announce Type: new Abstract: AI-based chatbots have the potential to accelerate learning and teaching, but may also have counterproductive consequences without thoughtful design and scaffolding. To better understand teachers’ perspectives on large language model (LLM)-based chatbots, we conducted a study with 11 teams of middle school teachers using chatbots for a science and computational thinking activity within a block-based programming environment. Based on a qualitative analysis of audio transcripts and chatbot interactions, we propose three profiles: explorer, […]

Ver mais

Like 0

Liked Liked

technocracy

CRoCoDiL: Continuous and Robust Conditioned Diffusion for Language

digitado ⋅ 24 de March de 2026

arXiv:2603.20210v1 Announce Type: new Abstract: Masked Diffusion Models (MDMs) provide an efficient non-causal alternative to autoregressive generation but often struggle with token dependencies and semantic incoherence due to their reliance on discrete marginal distributions. We address these limitations by shifting the diffusion process into a continuous sentence-level semantic space. We propose CRoCoDiL (Continuous and Robust Conditioned Diffusion for Language), a unified fine-tuning approach that jointly trains an encoder-demasker architecture, grounding the MDM demasking in continuous latent representations. This […]

Ver mais

Like 0

Liked Liked

technocracy

Children’s Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs

digitado ⋅ 24 de March de 2026

arXiv:2603.20209v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) combine the linguistic strengths of LLMs with the ability to process multimodal data, enbaling them to address a broader range of visual tasks. Because MLLMs aim at more general, human-like competence than language-only models, we take inspiration from the Wechsler Intelligence Scales – an established battery for evaluating children by decomposing intelligence into interpretable, testable abilities. We introduce KidGym, a comprehensive 2D grid-based benchmark for assessing five essential […]

Ver mais

Like 0

Liked Liked

technocracy

RedacBench: Can AI Erase Your Secrets?

digitado ⋅ 24 de March de 2026

arXiv:2603.20208v1 Announce Type: new Abstract: Modern language models can readily extract sensitive information from unstructured text, making redaction — the selective removal of such information — critical for data security. However, existing benchmarks for redaction typically focus on predefined categories of data such as personally identifiable information (PII) or evaluate specific techniques like masking. To address this limitation, we introduce RedacBench, a comprehensive benchmark for evaluating policy-conditioned redaction across domains and strategies. Constructed from 514 human-authored texts spanning […]

Ver mais

Like 0

Liked Liked

technocracy

Enhancing Safety of Large Language Models via Embedding Space Separation

digitado ⋅ 24 de March de 2026

arXiv:2603.20206v1 Announce Type: new Abstract: Large language models (LLMs) have achieved impressive capabilities, yet ensuring their safety against harmful prompts remains a critical challenge. Recent work has revealed that the latent representations (embeddings) of harmful and safe queries in LLMs typically exhibit linear separability, a property that has been exploited to construct attacks by perturbing the embeddings of harmful queries towards the safe subspace. Motivated by this observation, we propose a representation-level fine-tuning approach, named Embedding Space Separation […]

Ver mais

Like 0

Liked Liked