SAD Neural Networks: Divergent Gradient Flows and Asymptotic Optimality via o-minimal Structures
arXiv:2505.09572v3 Announce Type: replace-cross Abstract: We study gradient flows for loss landscapes of fully connected feedforward neural networks with commonly used continuously differentiable activation functions such as the logistic, hyperbolic tangent, softplus, or GELU function. We prove that the gradient flow either converges to a critical point or diverges to infinity while the loss converges to an asymptotic critical value. Moreover, we prove the existence of a threshold $\varepsilon>0$ such that the loss value of any gradient flow […]
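The divergent regime described in the abstract can be illustrated numerically. The sketch below (not from the paper; all names and the toy setup are my own) uses forward-Euler gradient descent with a small step size as a discretization of the gradient flow on a single logistic unit fitting an unattainable target: the parameter $w$ diverges to infinity while the loss converges to its asymptotic critical value $0$.

```python
# Minimal sketch, assuming a one-parameter logistic model s(w) with squared
# error against the target y = 1. The infimum loss 0 is reached only as
# w -> infinity, so the (discretized) gradient flow diverges while the loss
# converges to an asymptotic critical value.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(w):
    # squared error against target 1; not attained at any finite w
    return (sigmoid(w) - 1.0) ** 2

def grad(w):
    # d/dw (sigmoid(w) - 1)^2 = 2 (s - 1) s (1 - s)
    s = sigmoid(w)
    return 2.0 * (s - 1.0) * s * (1.0 - s)

w, lr = 0.0, 0.5  # Euler step of the gradient flow dw/dt = -grad(w)
for step in range(20001):
    w -= lr * grad(w)
    if step % 5000 == 0:
        print(f"step {step:6d}  w = {w:10.4f}  loss = {loss(w):.3e}")
```

Running this, $w$ grows without bound (roughly logarithmically in the number of steps) while the loss decays toward $0$, a toy instance of the trichotomy the abstract states: converge to a critical point, or diverge while the loss approaches an asymptotic critical value.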