March 2026

EARTalking: End-to-end GPT-style Autoregressive Talking Head Synthesis with Frame-wise Control

digitado ⋅ March 24, 2026

arXiv:2603.20307v1 Announce Type: new Abstract: Audio-driven talking head generation aims to create vivid and realistic videos from a static portrait and speech. Existing AR-based methods rely on intermediate facial representations, which limit their expressiveness and realism. Meanwhile, diffusion-based methods generate clip-by-clip, lacking fine-grained control and causing inherent latency due to overall denoising across the window. To address these limitations, we propose EARTalking, a novel end-to-end, GPT-style autoregressive model for interactive audio-driven talking head generation. Our method introduces a […]

Review and Analysis of Scientific Paper Embellishments

digitado ⋅ March 24, 2026

arXiv:2603.20306v1 Announce Type: new Abstract: We present a review and analysis of scientific paper embellishments: simple visual elements that are deeply integrated into the text of scientific publications. These embellishments are increasingly used in research papers and have the potential to enhance textual descriptions, strengthen connections between figures and content, and improve internal textual coherence, while also carrying the risk of disrupting the reading experience. As their exact impact is not yet well understood, we conducted a […]

The Global-Local loop: what is missing in bridging the gap between geospatial data from numerous communities?

digitado ⋅ March 24, 2026

arXiv:2603.20305v1 Announce Type: new Abstract: We face an unprecedented amount of geospatial data, describing directly or indirectly the Earth's surface at multiple spatial, temporal, and semantic scales, and stemming from numerous contributors, from satellites to citizens. The main challenge in all the geospatial-related communities lies in suitably leveraging a combination of some of the sources for either a generic or a thematic application. Certain data fusion schemes are predominantly exploited: they correspond to popular tasks with mainstream data […]

Transferable Multi-Bit Watermarking Across Frozen Diffusion Models via Latent Consistency Bridges

digitado ⋅ March 24, 2026

arXiv:2603.20304v1 Announce Type: new Abstract: As diffusion models (DMs) enable photorealistic image generation at unprecedented scale, watermarking techniques have become essential for provenance establishment and accountability. Existing methods face challenges: sampling-based approaches operate on frozen models but require costly $N$-step Denoising Diffusion Implicit Models (DDIM) inversion (typically $N=50$) for zero-bit-only detection; fine-tuning-based methods achieve fast multi-bit extraction but couple the watermark to a specific model checkpoint, requiring retraining for each architecture. We propose DiffMark, a plug-and-play watermarking method […]

InjectFlow: Weak Guides Strong via Orthogonal Injection for Flow Matching

digitado ⋅ March 24, 2026

arXiv:2603.20303v1 Announce Type: new Abstract: Flow Matching (FM) has recently emerged as a leading approach for high-fidelity visual generation, offering a robust continuous-time alternative to ordinary differential equation (ODE) based models. However, despite their success, FM models are highly sensitive to dataset biases, which cause severe semantic degradation when generating out-of-distribution or minority-class samples. In this paper, we provide a rigorous mathematical formalization of the “Bias Manifold” within the FM framework. We identify that this performance drop is […]

Voice Privacy from an Attribute-based Perspective

digitado ⋅ March 24, 2026

arXiv:2603.20301v1 Announce Type: new Abstract: Voice privacy approaches that preserve the anonymity of speakers modify speech in an attempt to break the link with the true identity of the speaker. Current benchmarks measure speaker protection based on signal-to-signal comparisons. In this paper, we introduce an attribute-based perspective, where we measure privacy protection in terms of comparisons between sets of speaker attributes. First, we analyze privacy impact by calculating speaker uniqueness for ground truth attributes, attributes inferred on the […]

From Human Interfaces to Agent Interfaces: Rethinking Software Design in the Age of AI-Native Systems

digitado ⋅ March 24, 2026

arXiv:2603.20300v1 Announce Type: new Abstract: Software systems have traditionally been designed for human interaction, emphasizing graphical user interfaces, usability, and cognitive alignment with end users. However, recent advances in large language model (LLM)-based agents are changing the primary consumers of software systems. Increasingly, software is no longer only used by humans, but also invoked autonomously by AI agents through structured interfaces. In this paper, we argue that software engineering is undergoing a paradigm shift from human-oriented interfaces to […]

HCAG: Hierarchical Abstraction and Retrieval-Augmented Generation on Theoretical Repositories with LLMs

digitado ⋅ March 24, 2026

arXiv:2603.20299v1 Announce Type: new Abstract: Existing Retrieval-Augmented Generation (RAG) methods for code struggle to capture the high-level architectural patterns and cross-file dependencies inherent in complex, theory-driven codebases, such as those in algorithmic game theory (AGT), leading to a persistent semantic and structural gap between abstract concepts and executable implementations. To address this challenge, we propose Hierarchical Code/Architecture-guided Agent Generation (HCAG), a framework that reformulates repository-level code generation as a structured, planning-oriented process over hierarchical knowledge. HCAG adopts a […]

Error-detecting solid codes

digitado ⋅ March 24, 2026

arXiv:2603.20298v1 Announce Type: new Abstract: A code is called solid if, roughly speaking, any correctly-transmitted codeword in an arbitrarily corrupted string of codewords can still be decoded correctly and unambiguously. So-called variable-length solid codes, in which codewords may differ in length, have been studied by various authors. In this short note, we observe that a recent construction of variable-length solid codes based on binary codes may be extended to arbitrary n-ary codes. We further prove an interesting error-detection […]

Transformer-Based Predictive Maintenance for Risk-Aware Instrument Calibration

digitado ⋅ March 24, 2026

arXiv:2603.20297v1 Announce Type: new Abstract: Accurate calibration is essential for instruments whose measurements must remain traceable, reliable, and compliant over long operating periods. Fixed-interval programs are easy to administer, but they ignore that instruments drift at different rates under different conditions. This paper studies calibration scheduling as a predictive maintenance problem: given recent sensor histories, estimate time-to-drift (TTD) and intervene before a violation occurs. We adapt the NASA C-MAPSS benchmark into a calibration setting by selecting drift-sensitive sensors, […]
