February 2026

Convex Dominance in Deep Learning I: A Scaling Law of Loss and Learning Rate

digitado ⋅ February 10, 2026

arXiv:2602.07145v1 Announce Type: new Abstract: Deep learning has a non-convex loss landscape, and its optimization dynamics are hard to analyze or control. Nevertheless, the dynamics can be empirically convex-like across various tasks, models, optimizers, hyperparameters, etc. In this work, we examine the applicability of convexity and Lipschitz continuity in deep learning, in order to precisely control the loss dynamics via learning rate schedules. We illustrate that deep learning quickly becomes weakly convex after a short period of training, […]
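
The abstract links convexity and Lipschitz continuity to learning rate schedules. As background only, the classical guarantee for (sub)gradient descent on a convex, G-Lipschitz objective, with initial distance at most D to a minimizer, bounds the suboptimality of the learning-rate-weighted average iterate by (D² + G² Σ η_t²) / (2 Σ η_t). The sketch below evaluates that textbook bound for a few schedules. It is not the paper's scaling law; the schedule helpers and constants are illustrative assumptions.

```python
import math
from typing import List


def convex_lipschitz_bound(lrs: List[float], D: float, G: float) -> float:
    """Classical bound on f(x_bar) - f* for (sub)gradient descent on a convex,
    G-Lipschitz f. x_bar is the lr-weighted average iterate and D bounds the
    initial distance to a minimizer. Illustrative only, not the paper's law."""
    total = sum(lrs)
    if total <= 0:
        raise ValueError("schedule must have a positive total learning rate")
    sq = sum(lr * lr for lr in lrs)
    return (D * D + G * G * sq) / (2.0 * total)


def constant_schedule(lr: float, steps: int) -> List[float]:
    return [lr] * steps


def cosine_schedule(peak: float, steps: int) -> List[float]:
    # Hypothetical helper: cosine decay from `peak` toward 0, no warmup.
    return [peak * 0.5 * (1.0 + math.cos(math.pi * t / steps)) for t in range(steps)]


if __name__ == "__main__":
    D, G, T = 1.0, 1.0, 100
    for lr in (0.05, 0.1, 0.2):
        bound = convex_lipschitz_bound(constant_schedule(lr, T), D, G)
        print(f"constant lr={lr}: bound={bound:.4f}")
    bound = convex_lipschitz_bound(cosine_schedule(0.1, T), D, G)
    print(f"cosine peak=0.1: bound={bound:.4f}")
```

For a constant schedule the bound is minimized at η = D / (G√T), where it equals DG/√T (0.1 for the values above).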

technocracy

Massive Sound Embedding Benchmark (MSEB)

digitado ⋅ February 10, 2026

arXiv:2602.07143v1 Announce Type: new Abstract: Audio is a critical component of multimodal perception, and any truly intelligent system must demonstrate a wide range of auditory capabilities. These capabilities include transcription, classification, retrieval, reasoning, segmentation, clustering, reranking, and reconstruction. Fundamentally, each task involves transforming a raw audio signal into a meaningful ‘embedding’ – be it a single vector, a sequence of continuous or discrete representations, or another structured form – which then serves as the basis for generating the […]
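
To make the "single vector versus sequence" distinction concrete, here is a minimal sketch that mean-pools a sequence of frame-level embeddings into one clip-level vector and ranks candidates by cosine similarity, one common way a retrieval task consumes such embeddings. The frame values are toy numbers, and `mean_pool` and `retrieve` are hypothetical helpers, not part of MSEB.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_pool(frames: List[Vector]) -> Vector:
    """Collapse a sequence of frame embeddings into a single clip-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Vector, corpus: Dict[str, Vector], k: int = 1) -> List[str]:
    """Return the ids of the k corpus items most similar to the query."""
    ranked = sorted(corpus, key=lambda cid: cosine(query, corpus[cid]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy frame sequences standing in for per-frame outputs of some audio encoder.
    clips = {
        "dog_bark": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
        "rain": [[0.0, 0.1, 0.9], [0.1, 0.0, 0.8]],
        "speech": [[0.2, 0.9, 0.1], [0.1, 0.8, 0.2]],
    }
    corpus = {cid: mean_pool(frames) for cid, frames in clips.items()}
    query = mean_pool([[0.85, 0.15, 0.05]])
    print(retrieve(query, corpus, k=2))
```

Mean pooling discards temporal order, which is acceptable for clip-level retrieval but not for tasks such as segmentation or transcription that need the full sequence.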

technocracy

Exploring Teachers’ Perspectives on Using Conversational AI Agents for Group Collaboration

digitado ⋅ February 10, 2026

arXiv:2602.07142v1 Announce Type: new Abstract: Collaboration is a cornerstone of 21st-century learning, yet teachers continue to face challenges in supporting productive peer interaction. Emerging generative AI tools offer new possibilities for scaffolding collaboration, but their role in mediating in-person group work remains underexplored, especially from the perspective of educators. This paper presents findings from an exploratory qualitative study with 33 K-12 teachers who interacted with Phoenix, a voice-based conversational agent designed to function as a near-peer in face-to-face […]

technocracy

Featured Reproducing Kernel Banach Spaces for Learning and Neural Networks

digitado ⋅ February 10, 2026

arXiv:2602.07141v1 Announce Type: new Abstract: Reproducing kernel Hilbert spaces provide a foundational framework for kernel-based learning, where regularization and interpolation problems admit finite-dimensional solutions through classical representer theorems. Many modern learning models, however — including fixed-architecture neural networks equipped with non-quadratic norms — naturally give rise to non-Hilbertian geometries that fall outside this setting. In Banach spaces, continuity of point-evaluation functionals alone is insufficient to guarantee feature representations or kernel-based learning formulations. In this work, we develop a […]
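
For the Hilbert-space baseline the abstract starts from, the classical representer theorem says that the minimizer of Σ_i (f(x_i) − y_i)² + λ‖f‖² over an RKHS has the form f(x) = Σ_i α_i k(x_i, x), so learning reduces to the finite linear system (K + λI)α = y. The sketch below shows only that classical RKHS case with a Gaussian kernel; it does not implement the paper's Banach-space construction, and the kernel width, λ, and helper names are illustrative assumptions.

```python
import math
from typing import Callable, List

Kernel = Callable[[float, float], float]


def gaussian_kernel(width: float) -> Kernel:
    return lambda a, b: math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_kernel_ridge(xs: List[float], ys: List[float], k: Kernel,
                     lam: float) -> Callable[[float], float]:
    """Minimize sum_i (f(x_i) - y_i)^2 + lam * ||f||_H^2 over the RKHS of k.
    By the representer theorem f = sum_i alpha_i k(x_i, .), with (K + lam I) alpha = y."""
    n = len(xs)
    K = [[k(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, ys)
    return lambda x: sum(a * k(xi, x) for a, xi in zip(alpha, xs))


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [math.sin(x) for x in xs]
    f = fit_kernel_ridge(xs, ys, gaussian_kernel(1.0), lam=1e-3)
    print([round(f(x), 3) for x in (0.5, 1.5, 2.5)])
```

The point of contrast with the abstract: the finite expansion here relies on the inner-product (Hilbert) structure, which is exactly what is missing for the non-quadratic norms it mentions.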

technocracy

ImmCOGNITO: Identity Obfuscation in Millimeter-Wave Radar-Based Gesture Recognition for IoT Environments

digitado ⋅ February 10, 2026

arXiv:2602.07139v1 Announce Type: new Abstract: Millimeter-Wave (mmWave) radar enables camera-free gesture recognition for Internet of Things (IoT) interfaces, with robustness to lighting variations and partial occlusions. However, recent studies reveal that its data can inadvertently encode biometric signatures, raising critical privacy challenges for IoT applications. In particular, we demonstrate that mmWave radar point cloud data can leak identity-related information in the absence of explicit identity labels. To address this risk, we propose ImmCOGNITO, a graph-based autoencoder that transforms […]

technocracy

Landscaper: Understanding Loss Landscapes Through Multi-Dimensional Topological Analysis

digitado ⋅ February 10, 2026

arXiv:2602.07135v1 Announce Type: new Abstract: Loss landscapes are a powerful tool for understanding neural network optimization and generalization, yet traditional low-dimensional analyses often miss complex topological features. We present Landscaper, an open-source Python package for arbitrary-dimensional loss landscape analysis. Landscaper combines Hessian-based subspace construction with topological data analysis to reveal geometric structures such as basin hierarchy and connectivity. A key component is the Saddle-Minimum Average Distance (SMAD) for quantifying landscape smoothness. We demonstrate Landscaper’s effectiveness across various architectures […]
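
The abstract mentions Hessian-based subspace construction. One standard building block for such subspaces (an assumption here, not Landscaper's documented API) is finding the top Hessian eigenvector by power iteration on Hessian-vector products, which can be approximated by central differences of the gradient. The sketch uses a quadratic toy loss whose Hessian is known, so the result can be checked; `hvp` and `top_hessian_eigenpair` are hypothetical names.

```python
import math
import random
from typing import Callable, List, Tuple

Vec = List[float]


def hvp(grad: Callable[[Vec], Vec], x: Vec, v: Vec, eps: float = 1e-4) -> Vec:
    """Approximate H(x) v by central differences of the gradient."""
    plus = grad([xi + eps * vi for xi, vi in zip(x, v)])
    minus = grad([xi - eps * vi for xi, vi in zip(x, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]


def top_hessian_eigenpair(grad: Callable[[Vec], Vec], x: Vec,
                          iters: int = 200, seed: int = 0) -> Tuple[float, Vec]:
    """Power iteration: eigenvalue of largest magnitude and its unit eigenvector."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in x]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    eig = 0.0
    for _ in range(iters):
        hv = hvp(grad, x, v)
        eig = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(c * c for c in hv))
        if norm == 0.0:
            break
        v = [c / norm for c in hv]
    return eig, v


if __name__ == "__main__":
    A = [[3.0, 1.0], [1.0, 2.0]]  # Hessian of the toy loss 0.5 * x^T A x
    grad = lambda x: [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    lam, vec = top_hessian_eigenpair(grad, [0.5, -0.5])
    print(lam, vec)  # lam should be close to (5 + sqrt(5)) / 2
```

Further directions can be obtained by deflating the found eigenvector and repeating; the resulting orthonormal directions span a low-dimensional subspace in which the loss can be sampled.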

technocracy

Finding Connections: Membership Inference Attacks for the Multi-Table Synthetic Data Setting

digitado ⋅ February 10, 2026

arXiv:2602.07126v1 Announce Type: new Abstract: Synthetic tabular data has gained attention for enabling privacy-preserving data sharing. While substantial progress has been made in single-table synthetic generation where data are modeled at the row or item level, most real-world data exists in relational databases where a user’s information spans items across multiple interconnected tables. Recent advances in synthetic relational data generation have emerged to address this complexity, yet the release of these data introduces unique privacy challenges as information can […]
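
A common baseline for membership inference against synthetic data is distance to closest record (DCR): a target whose nearest synthetic record is unusually close is flagged as a likely training member. Since the abstract stresses that a user's information spans several tables, the sketch below first joins each user's rows from a hypothetical `users`/`transactions` schema into one feature vector and then applies DCR at the user level. The schema, features, and threshold are illustrative assumptions, not the paper's attack.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

User = Tuple[str, float]  # (user_id, age) in a hypothetical `users` table
Txn = Tuple[str, float]   # (user_id, amount) in a hypothetical `transactions` table


def user_features(users: List[User], txns: List[Txn]) -> Dict[str, List[float]]:
    """Join the two tables on user_id: [age, number of transactions, mean amount]."""
    amounts: Dict[str, List[float]] = defaultdict(list)
    for uid, amount in txns:
        amounts[uid].append(amount)
    feats: Dict[str, List[float]] = {}
    for uid, age in users:
        a = amounts.get(uid, [])
        feats[uid] = [age, float(len(a)), sum(a) / len(a) if a else 0.0]
    return feats


def dcr(target: List[float], synthetic: List[List[float]]) -> float:
    """Euclidean distance from the target vector to its closest synthetic vector."""
    return min(math.dist(target, s) for s in synthetic)


def infer_membership(target: List[float], synthetic: List[List[float]],
                     threshold: float) -> bool:
    """Flag the target as a likely member if some synthetic user is within threshold."""
    return dcr(target, synthetic) <= threshold


if __name__ == "__main__":
    syn_users = [("s1", 34.0), ("s2", 51.0)]
    syn_txns = [("s1", 20.0), ("s1", 40.0), ("s2", 5.0)]
    synthetic = list(user_features(syn_users, syn_txns).values())
    target = user_features([("t", 34.0)], [("t", 25.0), ("t", 35.0)])["t"]
    print(dcr(target, synthetic), infer_membership(target, synthetic, threshold=1.0))
```

A per-table DCR would compare the target's rows in isolation; aggregating across the join is what lets cross-table patterns (here, transaction count and mean amount) contribute to the match.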

technocracy

Reasoning-Augmented Representations for Multimodal Retrieval

digitado ⋅ February 10, 2026

arXiv:2602.07125v1 Announce Type: new Abstract: Universal Multimodal Retrieval (UMR) seeks any-to-any search across text and vision, yet modern embedding models remain brittle when queries require latent reasoning (e.g., resolving underspecified references or matching compositional constraints). We argue this brittleness is often data-induced: when images carry “silent” evidence and queries leave key semantics implicit, a single embedding pass must both reason and compress, encouraging spurious feature matching. We propose a data-centric framework that decouples these roles by externalizing reasoning […]

technocracy

Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model

digitado ⋅ February 10, 2026

arXiv:2602.07120v1 Announce Type: new Abstract: Modern language models (LMs) tend to memorize portions of their training data and emit verbatim spans. When the underlying sources are sensitive or copyright-protected, such reproduction raises issues of consent and compensation for creators and compliance risks for developers. We propose Anchored Decoding, a plug-and-play inference-time method for suppressing verbatim copying: it enables decoding from any risky LM trained on mixed-license data by keeping generation in bounded proximity to a permissively trained safe […]
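
The abstract describes keeping generation "in bounded proximity" to a permissively trained safe model. One simple per-token reading of that idea, offered as a sketch under stated assumptions and not necessarily the paper's Anchored Decoding rule, is a geometric mixture p_w ∝ p_safe^(1−w) · p_risky^w, with w chosen by bisection as the largest value whose KL divergence from p_safe stays within a budget. Along this path KL(p_w ‖ p_safe) is nondecreasing in w, so bisection is valid. The toy distributions, `anchored_step`, and `budget` are illustrative assumptions.

```python
import math
from typing import List

Dist = List[float]


def geometric_mix(p_safe: Dist, p_risky: Dist, w: float) -> Dist:
    """p_w proportional to p_safe^(1-w) * p_risky^w (all probabilities must be > 0)."""
    logits = [(1.0 - w) * math.log(s) + w * math.log(r) for s, r in zip(p_safe, p_risky)]
    m = max(logits)  # subtract the max for numerical stability
    unnorm = [math.exp(lg - m) for lg in logits]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def kl(p: Dist, q: Dist) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def anchored_step(p_safe: Dist, p_risky: Dist, budget: float, iters: int = 50) -> Dist:
    """Hypothetical rule: largest w in [0, 1] with KL(p_w || p_safe) <= budget."""
    if kl(p_risky, p_safe) <= budget:
        return list(p_risky)
    lo, hi = 0.0, 1.0  # invariant: w = lo satisfies the budget
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl(geometric_mix(p_safe, p_risky, mid), p_safe) <= budget:
            lo = mid
        else:
            hi = mid
    return geometric_mix(p_safe, p_risky, lo)


if __name__ == "__main__":
    safe = [0.7, 0.2, 0.1]    # toy next-token distribution of the safe model
    risky = [0.05, 0.05, 0.9]  # toy next-token distribution of the risky model
    p = anchored_step(safe, risky, budget=0.1)
    print([round(x, 3) for x in p], round(kl(p, safe), 4))
```

The budget trades fluency from the risky model against proximity to the safe one: a budget of 0 reproduces the safe model exactly.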

technocracy

ShallowJail: Steering Jailbreaks against Large Language Models

digitado ⋅ February 10, 2026

arXiv:2602.07107v1 Announce Type: new Abstract: Large Language Models (LLMs) have been successful in numerous fields. Alignment is usually applied to prevent them from being used for harmful purposes. However, aligned LLMs remain vulnerable to jailbreak attacks that deliberately mislead them into producing harmful outputs. Existing jailbreaks are either black-box, using carefully crafted, unstealthy prompts, or white-box, requiring resource-intensive computation. In light of these challenges, we introduce ShallowJail, a novel attack that exploits shallow alignment in LLMs. ShallowJail can misguide LLMs’ responses […]
