digitado – Page 356

A Grammar of Machine Learning Workflows

digitado ⋅ 11 March 2026

Data leakage affected 294 published papers across 17 scientific fields (Kapoor & Narayanan, 2023). The dominant response has been documentation: checklists, linters, best-practice guides. Documentation does not prevent these failures. This paper proposes a structural remedy: a grammar that decomposes the supervised learning lifecycle into seven kernel primitives connected by a typed directed acyclic graph (DAG), with four hard constraints that reject the two most damaging leakage classes at call time. The grammar’s core contribution is the terminal […]

technocracy

Laravel 12 Prompts Guide: Prompt Types, Validation, and an Interactive Seeder Generator Example

digitado ⋅ 16 January 2026

Key Takeaways: Laravel Prompts provides a beautiful, user-friendly interface for command-line applications with zero dependencies. The package offers multiple input types, including text, password, select, multiselect, confirm, search, and progress bars. Laravel 12 includes Prompts natively, making CLI interactions more intuitive and visually appealing. Prompts automatically handles validation, error messages, and keyboard navigation. It is well suited to creating installation wizards, configuration tools, and interactive artisan commands. Index: Introduction to Laravel Prompts Understanding Laravel Prompts Components Statistics Available Prompt Types Practical […]

technocracy

What If a Tiny Always-On Local AI Became Ambient Infrastructure?

digitado ⋅ 8 May 2026

Raising a capable AI agent on a limited local compute budget. (kaibot: an AI agent running locally on constrained hardware.) Two years ago I made a deliberate choice to go small on local compute. I swapped out my workstation for a $300 Windows mini PC: AMD Ryzen 5, 8GB RAM (about 5GB actually usable by the time the OS takes its cut), no discrete GPU. The philosophy was simple: everything compute-intensive lives in the cloud. LLMs run on Claude, OpenAI, Gemini, OpenRouter. […]

technocracy

Post-hoc Self-explanation of CNNs

digitado ⋅ 31 March 2026

arXiv:2603.28466v1 Announce Type: cross Abstract: Although standard Convolutional Neural Networks (CNNs) can be mathematically reinterpreted as Self-Explainable Models (SEMs), their built-in prototypes do not on their own accurately represent the data. Replacing the final linear layer with a $k$-means-based classifier addresses this limitation without compromising performance. This work introduces a common formalization of $k$-means-based post-hoc explanations for the classifier, the encoder’s final output (B4), and combinations of intermediate feature activations. The latter approach leverages the spatial consistency of […]

technocracy

DISCO: Document Intelligence Suite for COmparative Evaluation

digitado ⋅ 26 March 2026

arXiv:2603.23511v1 Announce Type: new Abstract: Document intelligence requires accurate text extraction and reliable reasoning over document content. We introduce DISCO, a Document Intelligence Suite for COmparative Evaluation, that evaluates optical character recognition (OCR) pipelines and vision-language models (VLMs) separately on parsing and question answering across diverse document types, including handwritten text, multilingual scripts, medical forms, infographics, and multi-page documents. Our evaluation shows that performance varies substantially across tasks and document characteristics, underscoring the need for complexity-aware approach selection. […]

technocracy

Glitter: Visualizing Lexical Surprisal for Readability in Administrative Texts

digitado ⋅ 12 January 2026

arXiv:2601.05411v1 Announce Type: new Abstract: This work investigates how measuring the information entropy of text can be used to estimate its readability. We propose a visualization framework that approximates the information entropy of text using multiple language models and visualizes the result. The end goal is to use this method to estimate and improve the readability and clarity of administrative or bureaucratic texts. Our toolset is available as libre software at https://github.com/ufal/Glitter.

technocracy

Using the Path of Least Resistance to Explain Deep Networks

digitado ⋅ 27 February 2026

arXiv:2502.12108v3 Announce Type: replace-cross Abstract: Integrated Gradients (IG), a widely used axiomatic path-based attribution method, assigns importance scores to input features by integrating model gradients along a straight path from a baseline to the input. While effective in some cases, we show that straight paths can lead to flawed attributions. In this paper, we identify the cause of these misattributions and propose an alternative approach that equips the input space with a model-induced Riemannian metric (derived from the […]
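The straight-path integration that the abstract describes for standard Integrated Gradients can be sketched in a few lines. This is only an illustration of the baseline method, not the paper's proposed Riemannian alternative. The toy model, its hand-written gradient, and all function names below are hypothetical, and the integral is approximated with a midpoint Riemann sum.

```python
from typing import Callable, List, Sequence


def integrated_gradients(
    grad_fn: Callable[[Sequence[float]], List[float]],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
) -> List[float]:
    """Approximate IG along the straight path from baseline to x.

    attribution_i = (x_i - baseline_i) * mean over alpha of df/dx_i,
    evaluated at baseline + alpha * (x - baseline).
    """
    diffs = [xi - bi for xi, bi in zip(x, baseline)]
    totals = [0.0] * len(diffs)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * d for bi, d in zip(baseline, diffs)]
        for i, g in enumerate(grad_fn(point)):
            totals[i] += g
    return [d * t / steps for d, t in zip(diffs, totals)]


# Hypothetical toy model (an assumption for illustration only).
def toy_model(x: Sequence[float]) -> float:
    return x[0] * x[1] + x[2] ** 2


def toy_grad(x: Sequence[float]) -> List[float]:
    # Analytic gradient of toy_model.
    return [x[1], x[0], 2.0 * x[2]]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_grad, x, baseline)
# Completeness axiom: attributions should sum to f(x) - f(baseline).
gap = sum(attributions) - (toy_model(x) - toy_model(baseline))
```

For this toy model the attributions sum to the difference between the model output at the input and at the baseline, which is the completeness property that path-based methods are built around.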

technocracy

What do near-optimal learning rate schedules look like?

digitado ⋅ 11 March 2026

A basic unanswered question in neural network training is: what is the best learning rate schedule shape for a given workload? The choice of learning rate schedule is a key factor in the success or failure of the training process, but beyond having some kind of warmup and decay, there is no consensus on what makes a good schedule shape. To answer this question, we designed a search procedure to find the best shapes within a parameterized schedule […]
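For readers unfamiliar with the "warmup and decay" pattern mentioned above, here is a minimal sketch of one common instance: linear warmup followed by cosine decay. The shape, the function name, and the parameters are assumptions chosen for illustration. The excerpt does not say which shapes the search procedure found, and this is not the paper's parameterized schedule family.

```python
import math


def warmup_cosine_lr(
    step: int,
    total_steps: int,
    peak_lr: float,
    warmup_steps: int,
    final_lr: float = 0.0,
) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine


# Example: 100 steps, 10 of them warmup, peak learning rate 1.0.
schedule = [warmup_cosine_lr(s, 100, 1.0, 10) for s in range(101)]
```

The warmup length, peak value, and decay curve are exactly the kind of shape parameters a schedule search would vary.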

technocracy

Geometric Manifold Rectification for Imbalanced Learning

digitado ⋅ 13 February 2026

Imbalanced classification presents a formidable challenge in machine learning, particularly when tabular datasets are plagued by noise and overlapping class boundaries. From a geometric perspective, the core difficulty lies in the topological intrusion of the majority class into the minority manifold, which obscures the true decision boundary. Traditional undersampling techniques, such as Edited Nearest Neighbours (ENN), typically employ symmetric cleaning rules and uniform voting, failing to capture the local manifold structure and often inadvertently removing informative minority samples. […]
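To make "symmetric cleaning rules and uniform voting" concrete, the following is a minimal sketch of a Wilson-style Edited Nearest Neighbours pass. It assumes the same rule is applied to every class and that each of the k neighbours casts an equal vote. The function names and the toy data are hypothetical, and this illustrates the traditional baseline the abstract criticizes, not the proposed geometric method.

```python
from collections import Counter
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def edited_nearest_neighbours(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 3,
) -> List[int]:
    """Return indices of samples kept after one ENN pass.

    A sample is kept only if the majority label among its k nearest
    neighbours (uniform votes, same rule for every class) matches its own.
    """
    keep = []
    for i, (p, y) in enumerate(zip(points, labels)):
        neighbours = sorted(
            (sq_dist(p, q), labels[j])
            for j, q in enumerate(points)
            if j != i
        )
        votes = Counter(label for _, label in neighbours[:k])
        majority_label, _ = votes.most_common(1)[0]
        if majority_label == y:
            keep.append(i)
    return keep


# Toy 1-D data: class 0 near 0, class 1 near 5, plus one class-1
# sample (index 7) sitting inside the class-0 cluster.
points = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (5.1,), (5.2,), (0.15,)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
kept = edited_nearest_neighbours(points, labels, k=3)
```

Because the rule treats all classes alike, the class-1 sample inside the class-0 cluster is removed whether or not it is informative. That is the behaviour the abstract describes as inadvertently removing minority samples.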

technocracy

Towards a Theoretical Understanding to the Generalization of RLHF

digitado ⋅ 26 January 2026

arXiv:2601.16403v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) and its variants have emerged as the dominant approaches for aligning Large Language Models with human intent. While empirically effective, the theoretical generalization properties of these methods in high-dimensional settings remain to be explored. To this end, we build the generalization theory on RLHF of LLMs under the linear reward model, through the framework of algorithmic stability. In contrast to the existing works built upon the consistency […]
