February 2026

GradingAttack: Attacking Large Language Models Towards Short Answer Grading Ability

digitado ⋅ February 3, 2026

arXiv:2602.00979v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated remarkable potential for automatic short answer grading (ASAG), significantly boosting student assessment efficiency and scalability in educational scenarios. However, their vulnerability to adversarial manipulation raises critical concerns about automatic grading fairness and reliability. In this paper, we introduce GradingAttack, a fine-grained adversarial attack framework that systematically evaluates the vulnerability of LLM-based ASAG models. Specifically, we align general-purpose attack methods with the specific objectives of ASAG by […]

Organismal Agency and Rapid Adaptation: The Phenopoiesis Algorithm for Phenotype-First Evolution

digitado ⋅ February 3, 2026

arXiv:2602.00978v1 Announce Type: new Abstract: Evolutionary success depends on the capacity to adapt: organisms must respond to environmental challenges through both genetic innovation and lifetime learning. The gene-centric paradigm attributes evolutionary causality exclusively to genes, while Denis Noble’s phenotype-first framework argues that organisms are active agents capable of interpreting genetic resources, learning from experience, and shaping their own development. However, this framework has remained philosophically intuitive but algorithmically opaque. We show for the first time that organismal agency […]

Trust in One Round: Confidence Estimation for Large Language Models via Structural Signals

digitado ⋅ February 3, 2026

arXiv:2602.00977v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed in domains where errors carry high social, scientific, or safety costs. Yet standard confidence estimators, such as token likelihood, semantic similarity and multi-sample consistency, remain brittle under distribution shift, domain-specialised text, and compute limits. In this work, we present Structural Confidence, a single-pass, model-agnostic framework that enhances output correctness prediction based on multi-scale structural signals derived from a model’s final-layer hidden-state trajectory. By combining spectral, local-variation, […]

Forest-Guided Semantic Transport for Label-Supervised Manifold Alignment

digitado ⋅ February 3, 2026

arXiv:2602.00974v1 Announce Type: new Abstract: Label-supervised manifold alignment bridges the gap between unsupervised and correspondence-based paradigms by leveraging shared label information to align multimodal datasets. Still, most existing methods rely on Euclidean geometry to model intra-domain relationships. This approach can fail when features are only weakly related to the task of interest, leading to noisy, semantically misleading structure and degraded alignment quality. To address this limitation, we introduce FoSTA (Forest-guided Semantic Transport Alignment), a scalable alignment framework that […]

Exploration of Radar-based Obstacle Visualizations to Support Safety and Presence in Camera-Free Outdoor VR

digitado ⋅ February 3, 2026

arXiv:2602.00973v1 Announce Type: new Abstract: Outdoor virtual reality (VR) places users in dynamic physical environments where they must remain aware of real-world obstacles, including static structures and moving bystanders, while immersed in a virtual scene. This dual demand introduces challenges for both user safety and presence. Millimeter-wave (mmWave) radar offers a privacy-preserving alternative to camera-based sensing by detecting obstacles without capturing identifiable visual imagery, yet effective methods for communicating its sparse spatial information to users remain underexplored. In […]

Cast: Automated Resilience Testing for Production Cloud Service Systems

digitado ⋅ February 3, 2026

arXiv:2602.00972v1 Announce Type: new Abstract: The distributed nature of microservice architecture introduces significant resilience challenges. Traditional testing methods, limited by extensive manual effort and oversimplified test environments, fail to capture production system complexity. To address these limitations, we present Cast, an automated, end-to-end framework for microservice resilience testing in production. It achieves high test fidelity by replaying production traffic against a comprehensive library of application-level faults to exercise internal error-handling logic. To manage the combinatorial test space, Cast […]

Unveiling the Cognitive Compass: Theory-of-Mind-Guided Multimodal Emotion Reasoning

digitado ⋅ February 3, 2026

arXiv:2602.00971v1 Announce Type: new Abstract: Despite rapid progress in multimodal large language models (MLLMs), their capability for deep emotional understanding remains limited. We argue that genuine affective intelligence requires explicit modeling of Theory of Mind (ToM), the cognitive substrate from which emotions arise. To this end, we introduce HitEmotion, a ToM-grounded hierarchical benchmark that diagnoses capability breakpoints across increasing levels of cognitive depth. Second, we propose a ToM-guided reasoning chain that tracks mental states and calibrates cross-modal evidence […]

Verification Required: The Impact of Information Credibility on AI Persuasion

digitado ⋅ February 3, 2026

arXiv:2602.00970v1 Announce Type: new Abstract: Agents powered by large language models (LLMs) are increasingly deployed in settings where communication shapes high-stakes decisions, making a principled understanding of strategic communication essential. Prior work largely studies either unverifiable cheap-talk or fully verifiable disclosure, failing to capture realistic domains in which information has probabilistic credibility. We introduce MixTalk, a strategic communication game for LLM-to-LLM interaction that models information credibility. In MixTalk, a sender agent strategically combines verifiable and unverifiable claims to […]

On the Spectral Flattening of Quantized Embeddings

digitado ⋅ February 3, 2026

arXiv:2602.00969v1 Announce Type: new Abstract: Training Large Language Models (LLMs) at ultra-low precision is critically impeded by instability rooted in the conflict between discrete quantization constraints and the intrinsic heavy-tailed spectral nature of linguistic data. By formalizing the connection between Zipfian statistics and random matrix theory, we prove that the power-law decay in the singular value spectra of embeddings is a fundamental requisite for semantic encoding. We derive theoretical bounds showing that uniform quantization introduces a noise floor […]

Robust Adaptive Learning Control for a Class of Non-affine Nonlinear Systems

digitado ⋅ February 3, 2026

arXiv:2602.00968v1 Announce Type: new Abstract: We address the tracking problem for a class of uncertain non-affine nonlinear systems with high relative degrees, performing non-repetitive tasks. We propose a rigorously proven, robust adaptive learning control scheme that relies on a gradient descent parameter adaptation law to handle the unknown time-varying parameters of the system, along with a state estimator that estimates the unmeasurable state variables. Furthermore, despite the inherently complex nature of the non-affine system, we provide an explicit […]
