March 2026

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

digitado ⋅ 26 de March de 2026

arXiv:2603.23516v1 Announce Type: new Abstract: Long-term memory is a cornerstone of human intelligence. Enabling AI to process lifetime-scale information remains a long-standing pursuit in the field. Due to the constraints of full-attention architectures, the effective context length of large language models (LLMs) is typically limited to 1M tokens. Existing approaches, such as hybrid linear attention, fixed-size memory states (e.g., RNNs), and external storage methods like RAG or agent systems, attempt to extend this limit. However, they often suffer […]

Ver mais

Like 0

Liked Liked

technocracy

Training a Large Language Model for Medical Coding Using Privacy-Preserving Synthetic Clinical Data

digitado ⋅ 26 de March de 2026

arXiv:2603.23515v1 Announce Type: new Abstract: Improving the accuracy and reliability of medical coding reduces clinician burnout and supports revenue cycle processes, freeing providers to focus more on patient care. However, automating the assignment of ICD-10-CM and CPT codes from clinical documentation remains a challenge due to heterogeneous records, nuanced coding guidelines, and long-tail distributions. Large language models have been proposed to help or automate specific medical coding tasks. However, foundation models are not explicitly trained for medical coding […]

Ver mais

Like 0

Liked Liked

technocracy

DepthCharge: A Domain-Agnostic Framework for Measuring Depth-Dependent Knowledge in Large Language Models

digitado ⋅ 26 de March de 2026

arXiv:2603.23514v1 Announce Type: new Abstract: Large Language Models appear competent when answering general questions but often fail when pushed into domain-specific details. No existing methodology provides an out-of-the-box solution for measuring how deeply LLMs can sustain accurate responses under adaptive follow-up questioning across arbitrary domains. We present DepthCharge, a domain-agnostic framework that measures knowledge depth through three innovations: adaptive probing that generates follow-up questions based on concepts the model actually mentions, on-demand fact verification from authoritative sources, and […]

Ver mais

Like 0

Liked Liked

technocracy

Berta: an open-source, modular tool for AI-enabled clinical documentation

digitado ⋅ 26 de March de 2026

arXiv:2603.23513v1 Announce Type: new Abstract: Commercial AI scribes cost $99-600 per physician per month, operate as opaque systems, and do not return data to institutional infrastructure, limiting organizational control over data governance, quality improvement, and clinical workflows. We developed Berta, an open-source modular scribe platform for AI-enabled clinical documentation, and deployed a customized implementation within Alberta Health Services (AHS) integrated with their existing Snowflake AI Data Cloud infrastructure. The system combines automatic speech recognition with large language models […]

Ver mais

Like 0

Liked Liked

technocracy

S-Path-RAG: Semantic-Aware Shortest-Path Retrieval Augmented Generation for Multi-Hop Knowledge Graph Question Answering

digitado ⋅ 26 de March de 2026

arXiv:2603.23512v1 Announce Type: new Abstract: We present S-Path-RAG, a semantic-aware shortest-path Retrieval-Augmented Generation framework designed to improve multi-hop question answering over large knowledge graphs. S-Path-RAG departs from one-shot, text-heavy retrieval by enumerating bounded-length, semantically weighted candidate paths using a hybrid weighted $k$-shortest, beam, and constrained random-walk strategy, learning a differentiable path scorer together with a contrastive path encoder and lightweight verifier, and injecting a compact soft mixture of selected path latents into a language model via cross-attention. The […]

Ver mais

Like 0

Liked Liked

technocracy

DISCO: Document Intelligence Suite for COmparative Evaluation

digitado ⋅ 26 de March de 2026

arXiv:2603.23511v1 Announce Type: new Abstract: Document intelligence requires accurate text extraction and reliable reasoning over document content. We introduce textbf{DISCO}, a emph{Document Intelligence Suite for COmparative Evaluation}, that evaluates optical character recognition (OCR) pipelines and vision-language models (VLMs) separately on parsing and question answering across diverse document types, including handwritten text, multilingual scripts, medical forms, infographics, and multi-page documents. Our evaluation shows that performance varies substantially across tasks and document characteristics, underscoring the need for complexity-aware approach selection. […]

Ver mais

Like 0

Liked Liked

technocracy

Visuospatial Perspective Taking in Multimodal Language Models

digitado ⋅ 26 de March de 2026

arXiv:2603.23510v1 Announce Type: new Abstract: As multimodal language models (MLMs) are increasingly used in social and collaborative settings, it is crucial to evaluate their perspective-taking abilities. Existing benchmarks largely rely on text-based vignettes or static scene understanding, leaving visuospatial perspective-taking (VPT) underexplored. We adapt two evaluation tasks from human studies: the Director Task, assessing VPT in a referential communication paradigm, and the Rotating Figure Task, probing perspective-taking across angular disparities. Across tasks, MLMs show pronounced deficits in Level […]

Ver mais

Like 0

Liked Liked

technocracy

Internal Safety Collapse in Frontier Large Language Models

digitado ⋅ 26 de March de 2026

arXiv:2603.23509v1 Announce Type: new Abstract: This work identifies a critical failure mode in frontier large language models (LLMs), which we term Internal Safety Collapse (ISC): under certain task conditions, models enter a state in which they continuously generate harmful content while executing otherwise benign tasks. We introduce TVD (Task, Validator, Data), a framework that triggers ISC through domain tasks where generating harmful content is the only valid completion, and construct ISC-Bench containing 53 scenarios across 8 professional disciplines. […]

Ver mais

Like 0

Liked Liked

technocracy

Fast and Faithful: Real-Time Verification for Long-Document Retrieval-Augmented Generation Systems

digitado ⋅ 26 de March de 2026

arXiv:2603.23508v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) is increasingly deployed in enterprise search and document-centric assistants, where responses must be grounded in long and complex source materials. In practice, verifying that generated answers faithfully reflect retrieved documents is difficult: large language models can check long contexts but are too slow and costly for interactive services, while lightweight classifiers operate within strict context limits and frequently miss evidence outside truncated passages. We present the design of a real-time […]

Ver mais

Like 0

Liked Liked

technocracy

Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes

digitado ⋅ 26 de March de 2026

arXiv:2603.23507v1 Announce Type: new Abstract: While Masked Diffusion Language Models (MDLMs) relying on token masking and unmasking have shown promise in language modeling, their computational efficiency and generation flexibility remain constrained by the masking paradigm. In this paper, we propose Deletion-Insertion Diffusion language models (DID) that rigorously formulate token deletion and insertion as discrete diffusion processes, replacing the masking and unmasking processes in current MDLMs. DID improves training and inference efficiency by eliminating two major sources of computational […]

Ver mais

Like 0

Liked Liked