digitado

Monkey Jump : MoE-Style PEFT for Efficient Multi-Task Learning

digitado ⋅ 10 de January de 2026

Mixture-of-experts variants of parameter-efficient fine-tuning enable per-token specialization, but they introduce additional trainable routers and expert parameters, increasing memory usage and training cost. This undermines the core goal of parameter-efficient fine-tuning. We propose Monkey Jump, a method that brings mixture-of-experts-style specialization to parameter-efficient fine-tuning without introducing extra trainable parameters for experts or routers. Instead of adding new adapters as experts, Monkey Jump treats the adapters already present in each Transformer block (such as query, key, value, up, and […]

Ver mais

Like 0

Liked Liked

technocracy

Deep Neural Nets: 33 years ago and 33 years from now

digitado ⋅ 14 de March de 2022

The Yann LeCun et al. (1989) paper Backpropagation Applied to Handwritten Zip Code Recognition is I believe of some historical significance because it is, to my knowledge, the earliest real-world application of a neural net trained end-to-end with backpropagation. Except for the tiny dataset (7291 16×16 grayscale images of digits) and the tiny neural network used (only 1,000 neurons), this paper reads remarkably modern today, 33 years later – it lays out a dataset, describes the neural net […]

Ver mais

Like 0

Liked Liked

technocracy

Meta showcases hands-free teleprompter built into Ray-Ban Display glasses and more at CES 2026

digitado ⋅ 9 de January de 2026

We all knew that this year’s CES would be dominated by AI, and the reality is nothing short of that. Major companies and startups have unveiled some of the most unique AI products at the largest consumer tech conference in Las Vegas. When we talk of AI, Meta is among the names that pops up in everyone’s head, and rightly so. Meta teleprompter brings notes right to your eyes Well, Meta showed up at the CES 2026 event, […]

Ver mais

Like 0

Liked Liked

technocracy

Researchers and Founders

digitado ⋅ 19 de June de 2020

I spent many years working with founders and now I work with researchers. Although there are always individual exceptions, on average it’s surprising to me how different the best people in these groups are (including in some qualities that I had assumed were present in great people everywhere, like very high levels of self-belief). So I’ve been thinking about the ways they’re the same, because maybe there is something to learn about qualities of really effective people in […]

Ver mais

Like 0

Liked Liked

technocracy

Lower Bounds for the Algorithmic Complexity of Learned Indexes

digitado ⋅ 10 de January de 2026

Learned index structures aim to accelerate queries by training machine learning models to approximate the rank function associated with a database attribute. While effective in practice, their theoretical limitations are not fully understood. We present a general framework for proving lower bounds on query time for learned indexes, expressed in terms of their space overhead and parameterized by the model class used for approximation. Our formulation captures a broad family of learned indexes, including most existing designs, as […]

Ver mais

Like 0

Liked Liked

technocracy

GSM8K-Platinum: Revealing Performance Gaps in Frontier LLMs

digitado ⋅ 6 de March de 2025

<!– –> Dataset Code Recently, we introduced Platinum Benchmarks as a step toward quantifying the reliability of large language models (LLMs). In that work, we revised older benchmarks to minimize label noise, such as ambiguous or mislabeled examples, and showed that frontier LLMs still make genuine errors on simple questions. For example, as part of that work we revised a 300-problem subset of GSM8K, a dataset of grade school math word problems, and found that all […]

Ver mais

Like 0

Liked Liked

technocracy

Unit testing in Python

digitado ⋅ 14 de December de 2013

Let’s be honest, code testing is everything but a joyful task. However, a good unit testing framework makes this process as smooth as possible. Eventually, testing becomes a regular and continuous process, accompanied by the assurance that our code will operate just as exact and seamlessly as a Swiss clockwork.

Ver mais

Like 0

Liked Liked

technocracy

Beyond Euclidean Space: Optimizing Hierarchical Data in Hyperbolic HSVMs

digitado ⋅ 10 de January de 2026

Table of Links Abstract and 1. Introduction Related Works Convex Relaxation Techniques for Hyperbolic SVMs 3.1 Preliminaries 3.2 Original Formulation of the HSVM 3.3 Semidefinite Formulation 3.4 Moment-Sum-of-Squares Relaxation Experiments 4.1 Synthetic Dataset 4.2 Real Dataset Discussions, Acknowledgements, and References A. Proofs B. Solution Extraction in Relaxed Formulation C. On Moment Sum-of-Squares Relaxation Hierarchy D. Platt Scaling [31] E. Detailed Experimental Results F. Robust Hyperbolic Support Vector Machine Abstract Hyperbolic spaces have increasingly been recognized for their outstanding […]

Ver mais

Like 0

Liked Liked

technocracy

Universal computation is intrinsic to language model decoding

digitado ⋅ 14 de January de 2026

arXiv:2601.08061v1 Announce Type: new Abstract: Language models now provide an interface to express and often solve general problems in natural language, yet their ultimate computational capabilities remain a major topic of scientific debate. Unlike a formal computer, a language model is trained to autoregressively predict successive elements in human-generated text. We prove that chaining a language model’s autoregressive output is sufficient to perform universal computation. That is, a language model can simulate the execution of any algorithm on […]

Ver mais

Like 0

Liked Liked

technocracy

Position: Don’t be Afraid of Over-Smoothing And Over-Squashing

digitado ⋅ 13 de January de 2026

arXiv:2601.07419v1 Announce Type: new Abstract: Over-smoothing and over-squashing have been extensively studied in the literature on Graph Neural Networks (GNNs) over the past years. We challenge this prevailing focus in GNN research, arguing that these phenomena are less critical for practical applications than assumed. We suggest that performance decreases often stem from uninformative receptive fields rather than over-smoothing. We support this position with extensive experiments on several standard benchmark datasets, demonstrating that accuracy and over-smoothing are mostly uncorrelated […]

Ver mais

Like 0

Liked Liked