January 2026

Exploring the Effects of Alignment on Numerical Bias in Large Language Models

digitado ⋅ January 26, 2026

arXiv:2601.16444v1 Announce Type: new Abstract: “LLM-as-a-judge,” which utilizes large language models (LLMs) as evaluators, has proven effective in many evaluation tasks. However, evaluator LLMs exhibit numerical bias, a phenomenon where certain evaluation scores are generated disproportionately often, leading to reduced evaluation performance. This study investigates the cause of this bias. Given that most evaluator LLMs are aligned through instruction tuning and preference tuning, and that prior research suggests alignment reduces output diversity, we hypothesize that numerical bias arises from […]


technocracy

Endless Terminals: Scaling RL Environments for Terminal Agents

digitado ⋅ January 26, 2026

arXiv:2601.16443v1 Announce Type: new Abstract: Environments are the bottleneck for self-improving agents. Current terminal benchmarks were built for evaluation, not training; reinforcement learning requires a scalable pipeline, not just a dataset. We introduce Endless Terminals, a fully autonomous pipeline that procedurally generates terminal-use tasks without human annotation. The pipeline has four stages: generating diverse task descriptions, building and validating containerized environments, producing completion tests, and filtering for solvability. From this pipeline we obtain 3255 tasks spanning file operations, […]


Masked Face Recognition under Different Backbones

digitado ⋅ January 26, 2026

arXiv:2601.16440v1 Announce Type: new Abstract: Erratum to the paper (Zhang et al., 2025): corrections to Table IV and the data on page 3, Section A. In the post-pandemic era, a high proportion of civil aviation passengers wear masks during security checks, posing significant challenges to traditional face recognition models. The backbone network serves as the core component of face recognition models. In standard tests, r100 series models excelled (98%+ accuracy at 0.01% FAR in face comparison, high top1/top5 […]


Two classes of LCD codes derived from $(\mathcal{L},\mathcal{P})$-TGRS codes

digitado ⋅ January 26, 2026

arXiv:2601.16438v1 Announce Type: new Abstract: Twisted generalized Reed-Solomon (TGRS) codes, as a flexible extension of classical generalized Reed-Solomon (GRS) codes, have attracted significant attention in recent years. In this paper, we construct two classes of LCD codes from the $(\mathcal{L},\mathcal{P})$-TGRS code $\mathcal{C}_h$ of length $n$ and dimension $k$, where $\mathcal{L}=\{0,1,\ldots,l\}$ for $l\leq n-k-1$ and $\mathcal{P}=\{h\}$ for $1\leq h\leq k-1$. First, we derive the parity check matrix of $\mathcal{C}_h$ and provide a necessary and sufficient condition for $\mathcal{C}_h$ to […]


MDAFNet: Multiscale Differential Edge and Adaptive Frequency Guided Network for Infrared Small Target Detection

digitado ⋅ January 26, 2026

arXiv:2601.16434v1 Announce Type: new Abstract: Infrared small target detection (IRSTD) plays a crucial role in numerous military and civilian applications. However, existing methods often face the gradual degradation of target edge pixels as the number of network layers increases, and traditional convolution struggles to differentiate between frequency components during feature extraction, leading to low-frequency backgrounds interfering with high-frequency targets and high-frequency noise triggering false detections. To address these limitations, we propose MDAFNet (Multi-scale Differential Edge and Adaptive Frequency […]


iPDB — Optimizing SQL Queries with ML and LLM Predicates

digitado ⋅ January 26, 2026

arXiv:2601.16432v1 Announce Type: new Abstract: Structured Query Language (SQL) has remained the standard query language for databases. SQL is highly optimized for processing structured data laid out in relations. Meanwhile, in the present application development landscape, it is highly desirable to utilize the power of learned models to perform complex tasks. Large language models (LLMs) have been shown to understand and extract information from unstructured textual data. However, SQL as a query language and accompanying relational database systems […]


AlphaFace: High Fidelity and Real-time Face Swapper Robust to Facial Pose

digitado ⋅ January 26, 2026

arXiv:2601.16429v1 Announce Type: new Abstract: Existing face-swapping methods often deliver competitive results in constrained settings but exhibit substantial quality degradation when handling extreme facial poses. To improve facial pose robustness, explicit geometric features are applied, but this approach remains problematic since it introduces additional dependencies and increases computational cost. Diffusion-based methods have achieved remarkable results; however, they are impractical for real-time processing. We introduce AlphaFace, which leverages an open-source vision-language model and CLIP image and text embeddings to […]


DCCS-Det: Directional Context and Cross-Scale-Aware Detector for Infrared Small Target

digitado ⋅ January 26, 2026

arXiv:2601.16428v1 Announce Type: new Abstract: Infrared small target detection (IRSTD), which aims to identify small, low-contrast targets against complex backgrounds, is critical for applications like remote sensing and surveillance. However, existing methods often struggle with inadequate joint modeling of local-global features (harming target-background discrimination) or feature redundancy and semantic dilution (degrading target representation quality). To tackle these issues, we propose DCCS-Det (Directional Context and Cross-Scale Aware Detector for Infrared Small Target), a novel detector that incorporates a Dual-stream […]


Safe Multitask Molecular Graph Networks for Vapor Pressure and Odor Threshold Prediction

digitado ⋅ January 26, 2026

arXiv:2601.16426v1 Announce Type: new Abstract: We investigate two important tasks in odor-related property modeling: Vapor Pressure (VP) and Odor Threshold (OP). To evaluate the model’s out-of-distribution (OOD) capability, we adopt the Bemis-Murcko scaffold split. In terms of features, we introduce the rich A20/E17 molecular graph features (20-dimensional atom features + 17-dimensional bond features) and systematically compare GINE and PNA backbones. The results show: for VP, PNA with a simple regression head achieves Val MSE $\approx$ 0.21 (normalized space); […]


Bayesian Experimental Design for Model Discrepancy Calibration: A Rivalry between Kullback–Leibler Divergence and Wasserstein Distance

digitado ⋅ January 26, 2026

arXiv:2601.16425v1 Announce Type: new Abstract: Designing experiments that systematically gather data from complex physical systems is central to accelerating scientific discovery. While Bayesian experimental design (BED) provides a principled, information-based framework that integrates experimental planning with probabilistic inference, the selection of utility functions in BED is a long-standing and active topic, where different criteria emphasize different notions of information. Although Kullback–Leibler (KL) divergence has been one of the most common choices, recent studies have proposed Wasserstein distance as […]
