technocracy

Understanding the difficulty of training deep feedforward neural networks

digitado ⋅ 31 March 2010

Whereas before 2006 it appears that deep multi-layer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures. All these experimental results were obtained with new initialization or training mechanisms. Our objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes […]

[P] I Trained a Language Model on CPU for 40 Hours – It Beat the GPU Baseline

digitado ⋅ 22 February 2026

For those who have been following this project, you may recall FlashLM v3, then v4 “Bolt”, and v5.2 “Nova-Ignition”. I am pleased to announce that FlashLM v5 “Thunderbolt” is now complete.

Results

Metric          Value
Final PPL       1.36
Final BPC       0.44
Parameters      29.7M (26.5M ternary)
Training Time   ~40 hours
Hardware        AMD Ryzen 7950X3D

FlashLM v5 achieves a validation perplexity of 1.36, which beats the TinyStories-1M baseline (PPL 1.59). This represents the first instance of a CPU-trained model beating this […]
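
The two headline metrics can be cross-checked with a line of arithmetic. The sketch below assumes that the reported perplexity and bits-per-character figures are measured over the same unit, in which case bits per unit is the base-2 logarithm of perplexity. The excerpt does not state this, so treat it as a consistency check under that assumption rather than a description of how FlashLM computes its metrics. The function name `bits_from_perplexity` is made up for illustration.

```python
import math


def bits_from_perplexity(perplexity: float) -> float:
    """Convert perplexity to bits per unit, assuming both use the same unit."""
    return math.log2(perplexity)


# Figures quoted in the post's results table.
reported_ppl = 1.36
reported_bpc = 0.44

derived_bpc = bits_from_perplexity(reported_ppl)

# Under the same-unit assumption, the two reported values agree to two decimals.
is_consistent = abs(derived_bpc - reported_bpc) < 0.005
print(f"derived bits per unit: {derived_bpc:.4f}, consistent: {is_consistent}")
```

If the two figures had been measured over different units (for example perplexity per token and bits per character), this check would not be expected to hold.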

The HTML Partial Trap: Why HTMX is Only Half the Story

digitado ⋅ 3 February 2026

Introduction: The Hypermedia Renaissance

We are living in the middle of a hypermedia renaissance. Frameworks like HTMX have correctly identified that the “Single Page Application” (SPA) model often introduces unnecessary complexity, pushing developers to manage state in two places and build expensive APIs for what should be simple UI updates. The core premise of HTMX is brilliant: Hypermedia as the Engine of Application State (HATEOAS). By returning HTML fragments (partials) from the server instead of raw data, we keep […]
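
To make the contrast between "raw data" and "HTML fragments" concrete, here is a minimal sketch of the two response styles for the same record. It uses only the Python standard library and no web framework. The item shape, the `/items/<id>` route and both function names are hypothetical; only the `hx-delete`, `hx-target` and `hx-swap` attributes come from HTMX itself.

```python
import json
from html import escape


def render_item_json(item: dict) -> str:
    """SPA style: the server sends data and the client builds the markup."""
    return json.dumps(item)


def render_item_partial(item: dict) -> str:
    """Hypermedia style: the server sends a ready-to-swap HTML fragment."""
    item_id = int(item["id"])
    name = escape(item["name"])  # escape user-supplied text before embedding it
    return (
        f'<li id="item-{item_id}">{name} '
        f'<button hx-delete="/items/{item_id}" '  # hypothetical route
        f'hx-target="closest li" hx-swap="outerHTML">Remove</button></li>'
    )


item = {"id": 7, "name": "<b>Tea</b>"}
as_json = render_item_json(item)
as_partial = render_item_partial(item)
print(as_json)
print(as_partial)
```

In the second style the fragment already carries its own follow-up action, so the client needs no separate knowledge of how to delete the item.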

Statsformer: Validated Ensemble Learning with LLM-Derived Semantic Priors

digitado ⋅ 6 February 2026

arXiv:2601.21410v2 (announce type: replace)

Abstract: We introduce Statsformer, a principled framework for integrating large language model (LLM)-derived knowledge into supervised statistical learning. Existing approaches are limited in adaptability and scope: they either inject LLM guidance as an unvalidated heuristic, which is sensitive to LLM hallucination, or embed semantic information within a single fixed learner. Statsformer overcomes both limitations through a guardrailed ensemble architecture. We embed LLM-derived feature priors within an ensemble of linear and nonlinear learners, adaptively calibrating […]

Decoding Data: The Technical Secrets of LSEnet Algorithms and Benchmarks

digitado ⋅ 19 February 2026

Table of Links

Abstract and 1. Introduction
2. Related Work
3. Preliminaries and Notations
4. Differentiable Structural Information
   4.1. A New Formulation
   4.2. Properties
   4.3. Differentiability & Deep Graph Clustering
5. LSEnet
   5.1. Embedding Leaf Nodes
   5.2. Learning Parent Nodes
   5.3. Hyperbolic Partitioning Tree
6. Experiments
   6.1. Graph Clustering
   6.2. Discussion on Structural Entropy
7. Conclusion, Broader Impact, and References

Appendix
A. Proofs
B. Hyperbolic Space
C. Technical Details
D. Additional Results

C. Technical Details

C.1. Notations

The mathematical notations are described in Table […]

Grokking as a Phase Transition between Competing Basins: a Singular Learning Theory Approach

digitado ⋅ 1 March 2026

Grokking, the abrupt transition from memorization to generalisation after extended training, suggests the presence of competing solution basins with distinct statistical properties. We study this phenomenon through the lens of Singular Learning Theory (SLT), a Bayesian framework that characterizes the geometry of the loss landscape via the local learning coefficient (LLC), a measure of the local degeneracy of the loss surface. SLT links lower-LLC basins to higher posterior mass concentration and lower expected generalisation error. Leveraging this theory, […]

The Value of Variance: Mitigating Debate Collapse in Multi-Agent Systems via Uncertainty-Driven Policy Optimization

digitado ⋅ 10 February 2026

arXiv:2602.07186v1 (announce type: new)

Abstract: Multi-agent debate (MAD) systems improve LLM reasoning through iterative deliberation, but remain vulnerable to debate collapse, a failure type where final agent decisions are compromised on erroneous reasoning. Existing methods lack principled mechanisms to detect or prevent such failures. To address this gap, we first propose a hierarchical metric that quantifies behavioral uncertainty at three levels: intra-agent (individual reasoning uncertainty), inter-agent (interactive uncertainty), and system-level (output uncertainty). Empirical analysis across several benchmarks reveals […]

Partial Rewriting and Value Interpretation of Logically Constrained Terms (Full Version)

digitado ⋅ 2 February 2026

arXiv:2601.22191v1 (announce type: new)

Abstract: Logically constrained term rewrite systems (LCTRSs) are a rewriting formalism that naturally supports built-in data structures, including integers and bit-vectors. The recent framework of existentially constrained terms and most general constrained rewriting on them (Takahata et al., 2025) has many advantages over the original approach of rewriting constrained terms. In this paper, we introduce partial constrained rewriting, a variant of rewriting existentially constrained terms whose underlying idea has already appeared implicitly in previous […]

You probably don’t need a Vector Database (Yet) for your RAG

digitado ⋅ 21 February 2026

Author(s): Thomas Reid. Originally published on Towards AI.

NumPy and/or scikit-learn might meet all your retrieval needs

Right now, off the back of Retrieval Augmented Generation (RAG), vector databases are getting a lot of attention in the AI world.

The article discusses the emerging popularity of vector databases in the AI landscape, particularly in the context of Retrieval Augmented Generation (RAG). It argues that while these tools are crucial for large-scale enterprise applications with extensive […]
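
As a rough illustration of the claim that NumPy alone can cover small-scale retrieval, the sketch below does brute-force cosine-similarity search over an in-memory matrix. It is not taken from the article. The function name `top_k_cosine` and the toy three-dimensional vectors are made up for illustration, and a real pipeline would obtain its vectors from an embedding model.

```python
import numpy as np


def top_k_cosine(query, doc_vectors, k=3):
    """Return (indices, scores) of the k rows most similar to the query."""
    docs = np.asarray(doc_vectors, dtype=float)
    q = np.asarray(query, dtype=float)
    # Normalise rows and query so a dot product equals cosine similarity.
    docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    q_unit = q / np.linalg.norm(q)
    scores = docs_unit @ q_unit
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]


# Toy "embeddings", invented for this example.
doc_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
indices, scores = top_k_cosine([1.0, 0.0, 0.0], doc_vectors, k=2)
print(indices, scores)
```

Every query here scans all stored vectors, so the cost grows linearly with the collection. That is the trade-off this kind of approach accepts in exchange for having no extra infrastructure.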

CHIME: Chiplet-based Heterogeneous Near-Memory Acceleration for Edge Multimodal LLM Inference

digitado ⋅ 29 January 2026

arXiv:2601.19908v1 (announce type: new)

Abstract: The proliferation of large language models (LLMs) is accelerating the integration of multimodal assistants into edge devices, where inference is executed under stringent latency and energy constraints, often exacerbated by intermittent connectivity. These challenges become particularly acute in the context of multimodal LLMs (MLLMs), as high-dimensional visual inputs are transformed into extensive token sequences, thereby inflating the key-value (KV) cache and imposing substantial data movement overheads to the LLM backbone. To address these […]
