digitado

JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments

digitado ⋅ 24 February 2026

arXiv:2602.18527v1 Announce Type: new Abstract: Current audio-visual large language models (AV-LLMs) are predominantly restricted to 2D perception, relying on RGB video and monaural audio. This design choice introduces a fundamental dimensionality mismatch that precludes reliable source localization and spatial reasoning in complex 3D environments. We address this limitation by presenting JAEGER, a framework that extends AV-LLMs to 3D space, to enable joint spatial grounding and reasoning through the integration of RGB-D observations and multi-channel first-order ambisonics. A core […]


technocracy

Communication-Efficient Multi-Modal Edge Inference via Uncertainty-Aware Distributed Learning

digitado ⋅ 21 January 2026

Semantic communication is emerging as a key enabler for distributed edge intelligence due to its capability to convey task-relevant meaning. However, achieving communication-efficient training and robust inference over wireless links remains challenging. This challenge is further exacerbated for multi-modal edge inference (MMEI) by two factors: 1) prohibitive communication overhead for distributed learning over bandwidth-limited wireless links, due to the multi-modal nature of the system; and 2) limited robustness under varying channels and noisy multi-modal inputs. In this paper, […]


technocracy

Optimal Bias-variance Tradeoff in Matrix and Tensor Estimation

digitado ⋅ 9 February 2026

arXiv:2509.17382v3 Announce Type: replace Abstract: We study matrix and tensor denoising when the underlying signal is not necessarily low-rank. In the tensor setting, we observe \[ Y = X^\ast + Z \in \mathbb{R}^{p_1 \times p_2 \times p_3}, \] where $X^\ast$ is an unknown signal tensor and $Z$ is a noise tensor. We propose a one-step variant of the higher-order SVD (HOSVD) estimator, denoted $\widetilde{X}$, and show that, uniformly over any user-specified Tucker ranks $(r_1, r_2, r_3)$, with high probability, […]


technocracy

Exploring Anti-Aging Literature via ConvexTopics and Large Language Models

digitado ⋅ 25 February 2026

arXiv:2602.20224v1 Announce Type: new Abstract: The rapid expansion of biomedical publications creates challenges for organizing knowledge and detecting emerging trends, underscoring the need for scalable and interpretable methods. Common clustering and topic modeling approaches such as K-means or LDA remain sensitive to initialization and prone to local optima, limiting reproducibility and evaluation. We propose a reformulation of a convex optimization based clustering algorithm that produces stable, fine-grained topics by selecting exemplars from the data and guaranteeing a global […]


technocracy

Robust Online Learning

digitado ⋅ 6 February 2026

We study the problem of learning robust classifiers where the classifier will receive a perturbed input. Unlike robust PAC learning studied in prior work, here the clean data and its label are also adversarially chosen. We formulate this setting as an online learning problem and consider both the realizable and agnostic learnability of hypothesis classes. We define a new dimension of classes and show it controls the mistake bounds in the realizable setting and the regret bounds in […]


technocracy

MH-FLOCKE v0.5.0: Replaced mathematical CPG with Izhikevich half-center oscillators

digitado ⋅ 14 April 2026

Update on MH-FLOCKE. This version brought two things: a 60% SNN speedup and a neural CPG to replace the sine waves. Long nights. The speedup came from wrapping the SNN step in torch.no_grad(), switching to dense matmul for small networks, and caching time constants. The 232-neuron Freenove SNN now runs at 1.2 ms/step in simulation. Along the way I found that setting output neurons to Fast Spiking (Izhikevich a=0.1) destabilized the Go2: motoneurons are biologically Regular Spiking, not […]
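To make the Fast Spiking versus Regular Spiking distinction concrete, here is a minimal sketch of a single Izhikevich neuron in plain Python. It is not MH-FLOCKE code. The (a, b, c, d) values are the standard parameter sets from Izhikevich's published model; the function name count_spikes, the input current, the simulation length, and the Euler step size are assumptions chosen only for illustration.

```python
def count_spikes(a, b, c, d, current=10.0, t_ms=1000.0, dt=0.25):
    """Euler-integrate one Izhikevich neuron and return its spike count.

    current, t_ms and dt are illustrative assumptions, not values
    taken from the post.
    """
    v = c          # membrane potential (mV), started at the reset value
    u = b * v      # recovery variable
    spikes = 0
    for _ in range(int(t_ms / dt)):
        if v >= 30.0:          # spike cutoff: count the spike, then reset
            spikes += 1
            v = c
            u += d
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
    return spikes


# Regular Spiking: slow recovery (a=0.02) and a large after-spike jump (d=8).
rs_spikes = count_spikes(a=0.02, b=0.2, c=-65.0, d=8.0)
# Fast Spiking: fast recovery (a=0.1) and a small after-spike jump (d=2).
fs_spikes = count_spikes(a=0.1, b=0.2, c=-65.0, d=2.0)

print("RS spikes in 1 s:", rs_spikes)
print("FS spikes in 1 s:", fs_spikes)
```

Under the same constant input, the Fast Spiking setting fires at a noticeably higher rate with little adaptation, which gives some intuition for why it could overdrive an output stage that expects Regular Spiking behavior.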


technocracy

Reconstruction-Guided Slot Curriculum: Addressing Object Over-Fragmentation in Video Object-Centric Learning

digitado ⋅ 24 March 2026

Video Object-Centric Learning seeks to decompose raw videos into a small set of object slots, but existing slot-attention models often suffer from severe over-fragmentation. This is because the model is implicitly encouraged to occupy all slots to minimize the reconstruction objective, thereby representing a single object with multiple redundant slots. We tackle this limitation with a reconstruction-guided slot curriculum (SlotCurri). Training starts with only a few coarse slots and progressively allocates new slots where reconstruction error remains high, […]


technocracy

Efficient Variance-reduced Estimation from Generative EHR Models: The SCOPE and REACH Estimators

digitado ⋅ 4 February 2026

arXiv:2602.03730v1 Announce Type: new Abstract: Generative models trained using self-supervision of tokenized electronic health record (EHR) timelines show promise for clinical outcome prediction. This is typically done using Monte Carlo simulation for future patient trajectories. However, existing approaches suffer from three key limitations: sparse estimate distributions that poorly differentiate patient risk levels, extreme computational costs, and high sampling variance. We propose two new estimators: the Sum of Conditional Outcome Probability Estimator (SCOPE) and Risk Estimation from Anticipated Conditional […]


technocracy

Fog of War Chess

digitado ⋅ 28 January 2026

arXiv:2601.18813v1 Announce Type: new Abstract: Fog of War chess is a popular variant of classical chess, in which both players have only partial information about the position of the opponent’s pieces. This study provides the first theoretical analysis of endgames in Fog of War chess. In particular, we analyze the setups king and queen versus king, king and rook versus king, and king and two rooks versus king. We show that a king and queen can always guarantee […]


technocracy

Rethinking Reinforcement fine-tuning of LLMs: A Multi-armed Bandit Learning Perspective

digitado ⋅ 21 January 2026

A large number of heuristics have been proposed to optimize the reinforcement fine-tuning of LLMs. However, inconsistent claims are made from time to time, making this area elusive. Reflecting on this situation, two fundamental questions still lack a clear understanding: 1) what is the role of each optimizing choice? 2) which ones are the bottlenecks? This paper aims to shed light on them, and it faces the challenge of several entangled confounding factors in the fine-tuning process. To […]
