Beyond Gradient Descent: A Practical Guide to SGD, Momentum, RMSProp, and Adam (with Worked Examples)

When we train machine learning models, we rarely use “vanilla” gradient descent. In practice, we almost always reach for improved variants that converge faster, behave better with noisy gradients, and handle tricky loss landscapes more reliably. Modern training commonly relies on stochastic gradient descent (mini-batch SGD), plus optimizers like Momentum, RMSProp, and Adam.

1) The baseline: Gradient Descent vs. Stochastic (Mini-batch) Gradient Descent

1.1 What […]
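To make the contrast concrete before the details, here is a minimal sketch of the two update schemes on least-squares linear regression. All names and values here (X, y, grad, lr, batch_size, the epoch counts) are illustrative assumptions, not taken from the article.

```python
import numpy as np

# Synthetic regression problem: 1000 samples, 5 features, noisy targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

def grad(w, Xb, yb):
    """Gradient of the mean squared error 0.5 * mean((Xb @ w - yb)**2)."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

lr = 0.1

# Full-batch gradient descent: one update per pass over ALL the data.
w_gd = np.zeros(5)
for _ in range(100):
    w_gd -= lr * grad(w_gd, X, y)

# Mini-batch SGD: many cheap, noisy updates per pass (epoch).
w_sgd = np.zeros(5)
batch_size = 32
for _ in range(100):                      # epochs
    idx = rng.permutation(len(y))         # reshuffle each epoch
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]
        w_sgd -= lr * grad(w_sgd, X[b], y[b])

print("GD  distance to true weights:", np.linalg.norm(w_gd - true_w))
print("SGD distance to true weights:", np.linalg.norm(w_sgd - true_w))
```

The structural difference is just where the loop over data sits: gradient descent averages the gradient over the full dataset before each step, while mini-batch SGD takes a step after every small, shuffled batch, trading exact gradients for many more (noisier) updates per epoch.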