Getting High-Quality Output from 7B Models: A Production-Grade Prompting Playbook
## 7B Models: Cheap, Fast… and Brutally Honest About Your Prompting

If you’ve deployed a 7B model locally (or on a modest GPU), you already know the trade:

**Pros**
- low cost
- low latency
- easy to self-host

**Cons**
- patchy world knowledge
- weaker long-chain reasoning
- worse instruction-following
- unstable formatting (“JSON… but not really”)

The biggest mistake is expecting 7B models to behave like frontier models. They won’t. But you can get surprisingly high-quality output if you treat prompting like systems design, […]
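The “JSON… but not really” failure mode can be handled defensively on the application side. Here is a minimal sketch of best-effort JSON extraction from a small model’s reply; the function name and the fallback order are illustrative assumptions, not part of any particular library:

```python
import json
import re

def extract_json(raw: str):
    """Best-effort JSON extraction from a small model's reply.

    7B models often wrap JSON in prose or markdown fences, so we try
    progressively looser parses instead of trusting the raw output.
    (Hypothetical helper for illustration.)
    """
    # 1. Optimistic: the reply is already valid JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # 2. Strip a markdown code fence, a common small-model habit.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    # 3. Last resort: grab the outermost {...} span.
    braced = re.search(r"\{.*\}", raw, re.DOTALL)
    if braced:
        try:
            return json.loads(braced.group(0))
        except json.JSONDecodeError:
            pass
    return None  # caller decides whether to re-prompt the model
```

A fallback chain like this is usually cheaper than a retry: you only pay for another model call when every parse attempt fails.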