January 2026

Simulated Annealing Enhances Theory-of-Mind Reasoning in Autoregressive Language Models

digitado ⋅ 21 January 2026

arXiv:2601.12269v1 Announce Type: new Abstract: Autoregressive language models are next-token predictors and have been criticized for only optimizing surface plausibility (i.e., local coherence) rather than maintaining correct latent-state representations (i.e., global coherence). Because Theory of Mind (ToM) tasks crucially depend on reasoning about latent mental states of oneself and others, such models are therefore often thought to fail at ToM. While post-training methods can improve ToM performance, we show that strong ToM capability can be recovered directly from […]

technocracy

Opportunistic Scheduling for Optimal Spot Instance Savings in the Cloud

digitado ⋅ 21 January 2026

arXiv:2601.12266v1 Announce Type: new Abstract: We study the problem of scheduling delay-sensitive jobs over spot and on-demand cloud instances to minimize average cost while meeting an average delay constraint. Jobs arrive as a general stochastic process, and incur different costs based on the instance type. This work provides the first analytical treatment of this problem using tools from queuing theory, stochastic processes, and optimization. We derive cost expressions for general policies, prove queue length one is optimal for […]

technocracy

Statistical Firefly Algorithm for Truss Topology Optimization

digitado ⋅ 21 January 2026

arXiv:2601.12265v1 Announce Type: new Abstract: This study proposes a statistical firefly algorithm (SFA) for truss topology optimization. In the proposed algorithm, historical results of fireflies’ motions are used in hypothesis testing to restrict the motions suggested by current information exchanges between fireflies to those that are potentially useful. Hypothesis testing is applied to the mechanism of an ordinary firefly algorithm (FA) without changing its structure. As a result, the implementation […]

technocracy

Multimodal Generative Engine Optimization: Rank Manipulation for Vision-Language Model Rankers

digitado ⋅ 21 January 2026

arXiv:2601.12263v1 Announce Type: new Abstract: Vision-Language Models (VLMs) are rapidly replacing unimodal encoders in modern retrieval and recommendation systems. While their capabilities are well-documented, their robustness against adversarial manipulation in competitive ranking scenarios remains largely unexplored. In this paper, we uncover a critical vulnerability in VLM-based product search: multimodal ranking attacks. We present Multimodal Generative Engine Optimization (MGEO), a novel adversarial framework that enables a malicious actor to unfairly promote a target product by jointly optimizing imperceptible image […]

technocracy

Environment-Aware Code Generation: How far are We?

digitado ⋅ 21 January 2026

arXiv:2601.12262v1 Announce Type: new Abstract: Recent progress in large language models (LLMs) has improved code generation, but most evaluations still test isolated, small-scale code (e.g., a single function) under default or unspecified software environments. As a result, it is unclear whether LLMs can reliably generate executable code tailored to a user’s specific environment. We present the first systematic study of Environment-Aware Code Generation (EACG), where generated code must be functionally correct and directly executable under arbitrary software configurations. […]

technocracy

Docs2Synth: A Synthetic Data Trained Retriever Framework for Scanned Visually Rich Documents Understanding

digitado ⋅ 21 January 2026

arXiv:2601.12260v1 Announce Type: new Abstract: Visually rich document understanding (VRDU) in regulated domains is particularly challenging, since scanned documents often contain sensitive, evolving, and domain-specific knowledge. This leads to two major challenges: the lack of manual annotations for model adaptation and the difficulty for pretrained models to stay up-to-date with domain-specific facts. While Multimodal Large Language Models (MLLMs) show strong zero-shot abilities, they still suffer from hallucination and limited domain grounding. In contrast, discriminative Vision-Language Pre-trained Models (VLPMs) provide […]

technocracy

FutureX-Pro: Extending Future Prediction to High-Value Vertical Domains

digitado ⋅ 21 January 2026

arXiv:2601.12259v1 Announce Type: new Abstract: Building upon FutureX, which established a live benchmark for general-purpose future prediction, this report introduces FutureX-Pro, including FutureX-Finance, FutureX-Retail, FutureX-PublicHealth, FutureX-NaturalDisaster, and FutureX-Search. These together form a specialized framework extending agentic future prediction to high-value vertical domains. While generalist agents demonstrate proficiency in open-domain search, their reliability in capital-intensive and safety-critical sectors remains under-explored. FutureX-Pro targets four economically and socially pivotal verticals: Finance, Retail, Public Health, and Natural Disaster. We benchmark agentic Large […]

technocracy

Soft Shadow Diffusion (SSD): Physics-inspired Learning for 3D Computational Periscopy

digitado ⋅ 21 January 2026

arXiv:2601.12257v1 Announce Type: new Abstract: Conventional imaging requires a line of sight to create accurate visual representations of a scene. In certain circumstances, however, obtaining a suitable line of sight may be impractical, dangerous, or even impossible. Non-line-of-sight (NLOS) imaging addresses this challenge by reconstructing the scene from indirect measurements. Recently, passive NLOS methods that use an ordinary photograph of the subtle shadow cast onto a visible wall by the hidden scene have gained interest. These methods are […]

technocracy

Improving Large Molecular Language Model via Relation-aware Multimodal Collaboration

digitado ⋅ 21 January 2026

arXiv:2601.12256v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated their instruction-following capabilities and achieved powerful performance on various tasks. Inspired by their success, recent works in the molecular domain have led to the development of large molecular language models (LMLMs) that integrate 1D molecular strings or 2D molecular graphs into the language models. However, existing LMLMs often suffer from hallucination and limited robustness, largely due to inadequate integration of diverse molecular modalities such as 1D sequences, […]

technocracy

Confidence-based Filtering for Speech Dataset Curation with Generative Speech Enhancement Using Discrete Tokens

digitado ⋅ 21 January 2026

arXiv:2601.12254v1 Announce Type: new Abstract: Generative speech enhancement (GSE) models show great promise in producing high-quality clean speech from noisy inputs, enabling applications such as curating noisy text-to-speech (TTS) datasets into high-quality ones. However, GSE models are prone to hallucination errors, such as phoneme omissions and speaker inconsistency, which conventional error filtering based on non-intrusive speech quality metrics often fails to detect. To address this issue, we propose a non-intrusive method for filtering hallucination errors from discrete token-based […]
