March 2026

Shattering the Shortcut: A Topology-Regularized Benchmark for Multi-hop Medical Reasoning in LLMs

digitado ⋅ March 16, 2026

arXiv:2603.12458v1 (announce type: new). Abstract: While Large Language Models (LLMs) achieve expert-level performance on standard medical benchmarks through single-hop factual recall, they severely struggle with the complex, multi-hop diagnostic reasoning required in real-world clinical settings. A primary obstacle is “shortcut learning”, where models exploit highly connected, generic hub nodes (e.g., “inflammation”) in knowledge graphs to bypass authentic micro-pathological cascades. To address this, we introduce ShatterMed-QA, a bilingual benchmark of 10,558 multi-hop clinical questions designed to rigorously evaluate deep […]

Operationalising Cyber Risk Management Using AI: Connecting Cyber Incidents to MITRE ATT&CK Techniques, Security Controls, and Metrics

digitado ⋅ March 16, 2026

arXiv:2603.12455v1 (announce type: new). Abstract: The escalating frequency of cyber-attacks poses significant challenges for organisations, particularly small enterprises constrained by limited in-house expertise, insufficient knowledge, and financial resources. This research presents a novel framework that leverages Natural Language Processing to address these challenges through automated mapping of cyber incidents to adversary techniques. We introduce the Cyber Catalog, a knowledge base that systematically integrates CIS Critical Security Controls, MITRE ATT&CK techniques, and SMART metrics. This integrated resource enables organisations […]

CSE-UOI at SemEval-2026 Task 6: A Two-Stage Heterogeneous Ensemble with Deliberative Complexity Gating for Political Evasion Detection

digitado ⋅ March 16, 2026

arXiv:2603.12453v1 (announce type: new). Abstract: This paper describes our system for SemEval-2026 Task 6, which classifies clarity of responses in political interviews into three categories: Clear Reply, Ambivalent, and Clear Non-Reply. We propose a heterogeneous dual large language model (LLM) ensemble via self-consistency (SC) and weighted voting, and a novel post-hoc correction mechanism, Deliberative Complexity Gating (DCG). This mechanism uses cross-model behavioral signals and exploits the finding that an LLM response-length proxy correlates strongly with sample ambiguity. To […]
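
As a rough illustration of the ensemble idea named in this abstract, the sketch below combines per-model self-consistency vote shares with a weighted vote across two models. The model names, weights, and sampled answers are hypothetical and not taken from the paper, and the DCG correction step is not modeled because the abstract is cut off before describing it.

```python
from collections import Counter

# The three categories named in the abstract.
LABELS = ("Clear Reply", "Ambivalent", "Clear Non-Reply")


def self_consistency(samples):
    """Vote share of each label among one model's sampled answers."""
    counts = Counter(samples)
    n = len(samples)
    return {label: counts.get(label, 0) / n for label in LABELS}


def weighted_vote(per_model_samples, weights):
    """Sum weighted vote shares across models and pick the top label.

    per_model_samples: dict mapping a model name to its list of sampled labels.
    weights: dict mapping a model name to a non-negative weight (assumed values).
    """
    scores = {label: 0.0 for label in LABELS}
    for model, samples in per_model_samples.items():
        shares = self_consistency(samples)
        for label in LABELS:
            scores[label] += weights[model] * shares[label]
    # Ties resolve to the label listed first in LABELS.
    best = max(LABELS, key=lambda label: scores[label])
    return best, scores


# Hypothetical samples from two models for a single interview response.
samples = {
    "model_a": ["Ambivalent", "Ambivalent", "Clear Reply"],
    "model_b": ["Clear Reply", "Clear Reply", "Clear Reply"],
}
weights = {"model_a": 0.6, "model_b": 0.4}

label, scores = weighted_vote(samples, weights)
print(label, scores)
```

Using vote shares rather than each model's single majority answer keeps some of the disagreement within a model visible to the final vote; whether the paper's system does this is not stated in the truncated abstract.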

Overcoming the Modality Gap in Context-Aided Forecasting

digitado ⋅ March 16, 2026

arXiv:2603.12451v1 (announce type: new). Abstract: Context-aided forecasting (CAF) holds promise for integrating domain knowledge and forward-looking information, enabling AI systems to surpass traditional statistical methods. However, recent empirical studies reveal a puzzling gap: multimodal models often fail to outperform their unimodal counterparts. We hypothesize that this underperformance stems from poor context quality in existing datasets, as verification is challenging. To address these limitations, we introduce a semi-synthetic data augmentation method that generates contexts both descriptive of temporal dynamics […]

Bridging the Gap Between Security Metrics and Key Risk Indicators: An Empirical Framework for Vulnerability Prioritization

digitado ⋅ March 16, 2026

arXiv:2603.12450v1 (announce type: new). Abstract: Organisations overwhelmingly prioritize vulnerability remediation using Common Vulnerability Scoring System (CVSS) severity scores, yet CVSS classifiers achieve an Area Under the Precision-Recall Curve (AUPRC) of 0.011 on real-world exploitation data, near random chance. We propose a composite Key Risk Indicator grounded in expected-loss decomposition, integrating dimensions of threat, impact, and exposure. We evaluated the KRI framework against the Known Exploited Vulnerabilities (KEV) catalog using a comprehensive dataset of 280,694 Common Vulnerabilities and Exposures […]
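
For readers unfamiliar with why a very small AUPRC can still be "near random chance": on imbalanced data, a ranking that carries no information scores an AUPRC of roughly the positive-class prevalence, not 0.5. The sketch below estimates AUPRC as average precision on synthetic data. The 1% prevalence, sample size, and seed are assumptions chosen for illustration and are not figures from the paper.

```python
import random


def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k of the positives."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


rng = random.Random(0)           # fixed seed so the run is repeatable
n, prevalence = 20_000, 0.01     # assumed values, not taken from the paper

# Exactly 1% positives, shuffled, scored by pure noise.
labels = [1] * int(n * prevalence) + [0] * (n - int(n * prevalence))
rng.shuffle(labels)
noise_scores = [rng.random() for _ in labels]

random_ap = average_precision(labels, noise_scores)
perfect_ap = average_precision(labels, labels)  # labels used as scores: ideal ranking

print(f"prevalence={prevalence:.3f} random_ap={random_ap:.3f} perfect_ap={perfect_ap:.3f}")
```

Under these assumptions the noise ranking lands close to the prevalence while the ideal ranking reaches 1.0, which is the scale against which a reported AUPRC should be read.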

RadEar: A Self-Supervised RF Backscatter System for Voice Eavesdropping and Separation

digitado ⋅ March 16, 2026

arXiv:2603.12446v1 (announce type: new). Abstract: Eavesdropping on voice conversations presents a growing threat to personal privacy and information security. In this paper, we present RadEar, a novel RF backscatter-based system designed to enable covert voice eavesdropping through walls. RadEar consists of two key components: (i) a batteryless RF backscatter tag covertly deployed inside the target space, and (ii) an RF reader located outside the room that performs signal demodulation, voice separation, and denoising. The tag features a compact, […]

KernelFoundry: Hardware-aware evolutionary GPU kernel optimization

digitado ⋅ March 16, 2026

arXiv:2603.12440v1 (announce type: new). Abstract: Optimizing GPU kernels presents a significantly greater challenge for large language models (LLMs) than standard code generation tasks, as it requires understanding hardware architecture, parallel optimization strategies, and performance profiling outputs. Most existing LLM-based approaches to kernel generation rely on simple prompting and feedback loops, incorporating hardware awareness only indirectly through profiling feedback. We introduce KernelFoundry, an evolutionary framework that efficiently explores the GPU kernel design space through three key mechanisms: (1) MAP-Elites […]

Compensation of Input/Output Delays for Retarded Systems by Sequential Predictors: A Lyapunov-Halanay Method

digitado ⋅ March 16, 2026

arXiv:2603.12439v1 (announce type: new). Abstract: This paper presents a Lyapunov-Halanay method to study global asymptotic stabilization (GAS) of nonlinear retarded systems subject to large constant delays in input/output – a challenging problem due to their inherent destabilizing effects. Under the conditions of global Lipschitz continuity (GLC) and global exponential stabilizability (GES) of the retarded system without input delay, a state feedback controller is designed based on sequential predictors to make the closed-loop retarded system GAS. Moreover, if the […]

DiscoRD: An Experimental Methodology for Quickly Discovering the Reliable Read Disturbance Threshold of Real DRAM Chips

digitado ⋅ March 16, 2026

arXiv:2603.12435v1 (announce type: new). Abstract: State-of-the-art DRAM read disturbance mitigations rely on the read disturbance threshold (RDT) (e.g., the number of aggressor row activations needed to induce the first read disturbance bitflip) to securely and performance- and energy-efficiently prevent read disturbance bitflips. However, accurately and exhaustively characterizing the RDT of every DRAM row in a chip is time intensive. Rapidly determining RDT is important for enabling secure, performance- and energy-efficient systems. Our goal is to develop and evaluate […]

Surg-R1: A Hierarchical Reasoning Foundation Model for Scalable and Interpretable Surgical Decision Support with Multi-Center Clinical Validation

digitado ⋅ March 16, 2026

arXiv:2603.12430v1 (announce type: new). Abstract: Surgical scene understanding demands not only accurate predictions but also interpretable reasoning that surgeons can verify against clinical expertise. However, existing surgical vision-language models generate predictions without reasoning chains, and general-purpose reasoning models fail on compositional surgical tasks without domain-specific knowledge. We present Surg-R1, a surgical Vision-Language Model that addresses this gap through hierarchical reasoning trained via a four-stage pipeline. Our approach introduces three key contributions: (1) a three-level reasoning hierarchy decomposing surgical […]
