Pretrain Your Own RoBERTa Model from Scratch
Masked Language Models (MLMs) like BERT, RoBERTa, and MPNet have revolutionized the way we understand and process language. These models are foundational for tasks such as text classification, named-entity recognition (NER), and many other NLP applications where the entire input sequence matters. But what if you want to create your own MLM, tailored to a specific domain such as legal documents, medical texts, or tweets? Langformers makes this process straightforward, flexible, and efficient! In this guide, you’ll learn […]
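For orientation, here is a minimal sketch of what MLM pretraining involves under the hood, written against the Hugging Face `transformers` API rather than Langformers' own interface (which the rest of this guide covers). Paths like `./my-tokenizer` and `train.txt`, and the small model dimensions, are placeholders for illustration.

```python
# A minimal sketch of RoBERTa-style MLM pretraining with Hugging Face
# `transformers`, shown only to illustrate the moving parts that a
# library like Langformers wraps up for you. Paths are placeholders.
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Assumes a tokenizer was already trained on your corpus and saved.
tokenizer = RobertaTokenizerFast.from_pretrained("./my-tokenizer")

# A deliberately small RoBERTa configuration, initialized from scratch
# (random weights), so the example runs quickly.
config = RobertaConfig(
    vocab_size=tokenizer.vocab_size,
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    max_position_embeddings=514,  # RoBERTa reserves 2 extra positions
)
model = RobertaForMaskedLM(config)

# Tokenize a plain-text corpus (one document per line).
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# The collator applies dynamic masking (15% of tokens) at batch time,
# which is the heart of the masked-language-modeling objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./mlm-checkpoints",
        num_train_epochs=1,
        per_device_train_batch_size=16,
    ),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
```

Every piece above (tokenizer training, model configuration, dynamic masking, the training loop) is a step you would otherwise manage by hand; the sections that follow show how Langformers handles them.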