Open-TQ-Metal: Fused Compressed-Domain Attention for Long-Context LLM Inference on Apple Silicon
arXiv:2604.16957v1 Announce Type: new

Abstract: We present Open-TQ-Metal, the first implementation of fused compressed-domain attention on Apple Silicon, enabling 128K-context inference for Llama 3.1 70B on a single 64GB consumer Mac — a configuration impossible with any existing inference framework. Open-TQ-Metal quantizes the KV cache to int4 on the fly and computes attention directly on the compressed representation via custom Metal compute shaders, eliminating all intermediate dequantized matrices. Across 330 experiments spanning two model families (Gemma 4 31B […]
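The abstract's core idea — scoring attention against int4-quantized keys without materializing a dequantized matrix — can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's implementation: the group size, symmetric quantization scheme, and function names below are hypothetical choices for exposition, and the actual system runs this logic inside fused Metal compute shaders rather than NumPy.

```python
import numpy as np

GROUP = 32  # quantization group size (assumed; the paper does not state its choice)

def quantize_int4(k):
    """Symmetric per-group int4 quantization of a key vector.

    Returns int8 storage holding values in [-8, 7] plus one fp32
    scale per group of GROUP elements.
    """
    g = k.reshape(-1, GROUP)
    scale = np.abs(g).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on all-zero groups
    q = np.clip(np.round(g / scale), -8, 7).astype(np.int8)
    return q, scale.squeeze(1)

def score_compressed(query, q_keys, scales):
    """Attention logit q.k computed in the compressed domain.

    Each group contributes a partial dot product against the raw int4
    codes, folded back with a single scale multiply per group — no
    dequantized key matrix is ever materialized.
    """
    qg = query.reshape(-1, GROUP)
    partial = np.einsum('gd,gd->g', qg, q_keys.astype(np.float32))
    return float(partial @ scales)

# demo on one 128-dim head
rng = np.random.default_rng(0)
k = rng.standard_normal(128).astype(np.float32)
q = rng.standard_normal(128).astype(np.float32)
qk, s = quantize_int4(k)
approx = score_compressed(q, qk, s)

# reference: the same logit via an explicitly dequantized key
dequant = (qk.astype(np.float32) * s[:, None]).reshape(-1)
reference = float(q @ dequant)
```

The compressed-domain score matches the explicit-dequantization reference up to floating-point rounding; the win claimed by the paper is that the dequantized intermediate never touches memory, which matters at 128K context where the KV cache dominates the footprint.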