January 2026

TRYLOCK: Defense-in-Depth Against LLM Jailbreaks via Layered Preference and Representation Engineering

digitado ⋅ January 8, 2026

arXiv:2601.03300v1 Announce Type: new Abstract: Large language models remain vulnerable to jailbreak attacks, and single-layer defenses often trade security for usability. We present TRYLOCK, the first defense-in-depth architecture that combines four heterogeneous mechanisms across the inference stack: weight-level safety alignment via DPO, activation-level control via Representation Engineering (RepE) steering, adaptive steering strength selected by a lightweight sidecar classifier, and input canonicalization to neutralize encoding-based bypasses. On Mistral-7B-Instruct evaluated against a 249-prompt attack set spanning five attack families, TRYLOCK […]

technocracy

130k Lines of Formal Topology in Two Weeks: Simple and Cheap Autoformalization for Everyone?

digitado ⋅ January 8, 2026

arXiv:2601.03298v1 Announce Type: new Abstract: This is a brief description of a project that has already autoformalized a large portion of the general topology from the Munkres textbook (which has in total 241 pages in 7 chapters and 39 sections). The project has been running since November 21, 2025 and has, as of January 4, 2026, produced 160k lines of formalized topology. Most of it (about 130k lines) has been done in two weeks, from December 22 to January […]

technocracy

AgentMark: Utility-Preserving Behavioral Watermarking for Agents

digitado ⋅ January 8, 2026

arXiv:2601.03294v1 Announce Type: new Abstract: LLM-based agents are increasingly deployed to autonomously solve complex tasks, raising urgent needs for IP protection and regulatory provenance. While content watermarking effectively attributes LLM-generated outputs, it fails to directly identify the high-level planning behaviors (e.g., tool and subgoal choices) that govern multi-step execution. Critically, watermarking at the planning-behavior layer faces unique challenges: minor distributional deviations in decision-making can compound during long-term agent operation, degrading utility, and many agents operate as black boxes […]

technocracy

Lightweight Transformer Architectures for Edge Devices in Real-Time Applications

digitado ⋅ January 8, 2026

arXiv:2601.03290v1 Announce Type: new Abstract: The deployment of transformer-based models on resource-constrained edge devices represents a critical challenge in enabling real-time artificial intelligence applications. This comprehensive survey examines lightweight transformer architectures specifically designed for edge deployment, analyzing recent advances in model compression, quantization, pruning, and knowledge distillation techniques. We systematically review prominent lightweight variants including MobileBERT, TinyBERT, DistilBERT, EfficientFormer, EdgeFormer, and MobileViT, providing detailed performance benchmarks on standard datasets such as GLUE, SQuAD, ImageNet-1K, and COCO. Our analysis […]

technocracy

Differentiation Between Faults and Cyberattacks through Combined Analysis of Cyberspace Logs and Physical Measurements

digitado ⋅ January 8, 2026

arXiv:2601.03289v1 Announce Type: new Abstract: In recent years, cyberattacks – along with physical faults – have become an increasing factor causing system failures, especially in DER (Distributed Energy Resources) systems. In addition, according to the literature, a number of faults have been reported to remain undetected. Consequently, unlike anomaly detection works that only identify abnormalities, differentiating undetected faults and cyberattacks is a challenging task. Although several works have studied this problem, they crucially fall short of achieving an […]

technocracy

How Real is Your Jailbreak? Fine-grained Jailbreak Evaluation with Anchored Reference

digitado ⋅ January 8, 2026

arXiv:2601.03288v1 Announce Type: new Abstract: Jailbreak attacks present a significant challenge to the safety of Large Language Models (LLMs), yet current automated evaluation methods largely rely on coarse classifications that focus mainly on harmfulness, leading to substantial overestimation of attack success. To address this problem, we propose FJAR, a fine-grained jailbreak evaluation framework with anchored references. We first categorized jailbreak responses into five fine-grained categories: Rejective, Irrelevant, Unhelpful, Incorrect, and Successful, based on the degree to which the […]

technocracy

Automated Post-Incident Policy Gap Analysis via Threat-Informed Evidence Mapping using Large Language Models

digitado ⋅ January 8, 2026

arXiv:2601.03287v1 Announce Type: new Abstract: Cybersecurity post-incident reviews are essential for identifying control failures and improving organisational resilience, yet they remain labour-intensive, time-consuming, and heavily reliant on expert judgment. This paper investigates whether Large Language Models (LLMs) can augment post-incident review workflows by autonomously analysing system evidence and identifying security policy gaps. We present a threat-informed, agentic framework that ingests log data, maps observed behaviours to the MITRE ATT&CK framework, and evaluates organisational security policies for adequacy and […]

technocracy

HyperCLOVA X 32B Think

digitado ⋅ January 8, 2026

arXiv:2601.03286v1 Announce Type: new Abstract: In this report, we present HyperCLOVA X 32B Think, a vision-language model designed with particular emphasis on reasoning within the Korean linguistic and cultural context, as well as agentic ability. HyperCLOVA X 32B Think is pre-trained with a strong focus on reasoning capabilities and subsequently post-trained to support multimodal understanding, enhanced reasoning, agentic behaviors, and alignment with human preferences. Experimental evaluations against comparably sized models demonstrate that our model achieves strong performance on […]

technocracy

Battery-time-space fragment-based formulation for the Electric Autonomous Dial-a-Ride Problem

digitado ⋅ January 8, 2026

arXiv:2601.03282v1 Announce Type: new Abstract: The Electric Autonomous Dial-A-Ride Problem (E-ADARP) optimizes routing and scheduling for electric autonomous vehicles to transport customers from origins to destinations. It features a combined objective that minimizes travel cost and excess user ride time, and allows partial recharging. Motivated by practical scenarios where time and battery data are available with limited precision, we introduce a discrete variant of the problem, termed D-E-ADARP, in which all time and battery parameters are discretized. This […]

technocracy

$\alpha^3$-Bench: A Unified Benchmark of Safety, Robustness, and Efficiency for LLM-Based UAV Agents over 6G Networks

digitado ⋅ January 8, 2026

arXiv:2601.03281v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly used as high-level controllers for autonomous Unmanned Aerial Vehicle (UAV) missions. However, existing evaluations rarely assess whether such agents remain safe, protocol-compliant, and effective under realistic next-generation networking constraints. This paper introduces $\alpha^3$-Bench, a benchmark for evaluating LLM-driven UAV autonomy as a multi-turn conversational reasoning and control problem operating under dynamic 6G conditions. Each mission is formulated as a language-mediated control […]
