March 2026

Starting Off on the Wrong Foot: Pitfalls in Data Preparation

digitado ⋅ 20 March 2026

arXiv:2603.18190v1 Announce Type: new Abstract: When working with real-world insurance data, practitioners often encounter challenges during the data preparation stage that can undermine the statistical validity and reliability of downstream modeling. This study illustrates that conventional data preparation procedures, such as random train-test partitioning, often yield unreliable and unstable results when confronted with highly imbalanced insurance loss data. To mitigate these limitations, we propose a novel data preparation framework leveraging two recent statistical advancements: support points for representative […]
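The instability of random partitioning on imbalanced data can be illustrated with a small simulation. This is only a sketch under assumed numbers (10,000 hypothetical policies, 1% with a claim, a 20% test fraction); it is not the paper's framework, and the stratified split shown for contrast is a standard baseline swapped in for illustration, not the support-points approach the abstract names.

```python
import random


def random_split_counts(labels, test_frac, repeats, rng):
    """Count positives that land in the test set over repeated random splits."""
    n_test = round(len(labels) * test_frac)
    counts = []
    for _ in range(repeats):
        test_idx = rng.sample(range(len(labels)), n_test)
        counts.append(sum(labels[i] for i in test_idx))
    return counts


def stratified_split_counts(labels, test_frac, repeats, rng):
    """Same count, but sampling positives and negatives separately."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    counts = []
    for _ in range(repeats):
        test_idx = rng.sample(pos, round(len(pos) * test_frac))
        test_idx += rng.sample(neg, round(len(neg) * test_frac))
        counts.append(sum(labels[i] for i in test_idx))
    return counts


# Hypothetical portfolio: 100 policies with a claim, 9,900 without.
rng = random.Random(0)
labels = [1] * 100 + [0] * 9900

random_counts = random_split_counts(labels, 0.2, 20, rng)
stratified_counts = stratified_split_counts(labels, 0.2, 20, rng)

print("random split, claims in test set:    ", min(random_counts), "to", max(random_counts))
print("stratified split, claims in test set:", min(stratified_counts), "to", max(stratified_counts))
```

Under these assumptions the number of claims in a random test set changes from split to split, while the stratified split holds it fixed. Any metric computed on the rare class inherits that variation.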

technocracy

ResNets of All Shapes and Sizes: Convergence of Training Dynamics in the Large-scale Limit

digitado ⋅ 20 March 2026

arXiv:2603.18168v1 Announce Type: new Abstract: We establish convergence of the training dynamics of residual neural networks (ResNets) to their joint infinite depth L, hidden width M, and embedding dimension D limit. Specifically, we consider ResNets with two-layer perceptron blocks in the maximal local feature update (MLU) regime and prove that, after a bounded number of training steps, the error between the ResNet and its large-scale limit is O(1/L + sqrt(D/(L M)) + 1/sqrt(D)). This error rate is empirically […]

technocracy

LLM Use, Cheating, and Academic Integrity in Software Engineering Education

digitado ⋅ 20 March 2026

arXiv:2603.17060v2 Announce Type: new Abstract: Background: Cheating in university education is commonly described as context dependent and influenced by assessment design, institutional norms, and student interpretation. In software engineering education, programming oriented coursework has historically involved ambiguity around collaboration, reuse, and external assistance. Recently, large language models (LLMs) have introduced additional mediation in the production of code and related artifacts. Aims: This study investigates how software engineering students describe experiences of using LLMs in ways they perceived as […]

technocracy

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

digitado ⋅ 20 March 2026

arXiv:2603.17024v2 Announce Type: new Abstract: Vision-language models (VLMs) show strong multimodal capabilities but still struggle with fine-grained vision-language reasoning. We find that long chain-of-thought (CoT) reasoning exposes diverse failure modes, including perception, reasoning, knowledge, and hallucination errors, which can compound across intermediate steps. However, most existing vision-language data used for reinforcement learning with verifiable rewards (RLVR) does not involve complex reasoning chains that rely on visual evidence throughout, leaving these weaknesses largely unexposed. We therefore propose HopChain, a […]

technocracy

Embodied Foundation Models at the Edge: A Survey of Deployment Constraints and Mitigation Strategies

digitado ⋅ 20 March 2026

arXiv:2603.16952v2 Announce Type: new Abstract: Deploying foundation models in embodied edge systems is fundamentally a systems problem, not just a problem of model compression. Real-time control must operate within strict size, weight, and power constraints, where memory traffic, compute latency, timing variability, and safety margins interact directly. The Deployment Gauntlet organizes these constraints into eight coupled barriers that determine whether embodied foundation models can run reliably in practice. Across representative edge workloads, autoregressive Vision-Language-Action policies are constrained primarily […]

technocracy

DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management

digitado ⋅ 20 March 2026

Deep Reinforcement Learning (DRL) provides a general-purpose methodology for training inventory policies that can leverage big data and compute. However, off-the-shelf implementations of DRL have seen mixed success, often plagued by high sensitivity to the hyperparameters used during training. In this paper, we show that by imposing policy regularizations, grounded in classical inventory concepts such as "Base Stock", we can significantly accelerate hyperparameter tuning and improve the final performance of several DRL methods. We report details from a […]

technocracy

On Performance Guarantees for Federated Learning with Personalized Constraints

digitado ⋅ 20 March 2026

Federated learning (FL) has emerged as a communication-efficient algorithmic framework for distributed learning across multiple agents. While standard FL formulations capture unconstrained or globally constrained problems, many practical settings involve heterogeneous resource or model constraints, leading to optimization problems with agent-specific feasible sets. Here, we study a personalized constrained federated optimization problem in which each agent is associated with a convex local objective and a private constraint set. We propose PC-FedAvg, a method in which each agent maintains […]

technocracy

Decorrelation, Diversity, and Emergent Intelligence: The Isomorphism Between Social Insect Colonies and Ensemble Machine Learning

digitado ⋅ 20 March 2026

Social insect colonies and ensemble machine learning methods represent two of the most successful examples of decentralized information processing in nature and computation respectively. Here we develop a rigorous mathematical framework demonstrating that ant colony decision-making and random forest learning are isomorphic under a common formalism of stochastic ensemble intelligence. We show that the mechanisms by which genetically identical ants achieve functional differentiation (through stochastic response to local cues and positive feedback) map precisely onto the […]

technocracy

ARMOR: Adaptive Resilience Against Model Poisoning Attacks in Continual Federated Learning for Mobile Indoor Localization

digitado ⋅ 20 March 2026

Indoor localization has become increasingly essential for applications ranging from asset tracking to delivering personalized services. Federated learning (FL) offers a privacy-preserving approach by training a centralized global model (GM) using distributed data from mobile devices without sharing raw data. However, real-world deployments require a continual federated learning (CFL) setting, where the GM receives continual updates under device heterogeneity and evolving indoor environments. In such dynamic conditions, erroneous or biased updates can cause the GM to deviate from […]

technocracy

SQLite Tags Benchmark: Comparing 5 Tagging Strategies

digitado ⋅ 20 March 2026

Research: SQLite Tags Benchmark: Comparing 5 Tagging Strategies. I had Claude Code run a micro-benchmark comparing different approaches to implementing tagging in SQLite. Traditional many-to-many tables won, but FTS5 came a close second. Full table scans with LIKE queries performed better than I expected, but full table scans with JSON arrays and json_each() were much slower. Tags: json, sqlite
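To make three of the compared strategies concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema, table names, column names, and sample rows are all hypothetical and are not taken from the benchmark itself; FTS5 is left out because not every SQLite build includes it, and the json_each() query assumes a build with the JSON functions available. No timings are measured here, only that the three queries return the same items.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one items table carrying both denormalized forms,
    -- plus the classic many-to-many pair of tables.
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        title TEXT,
        tags_csv TEXT,   -- e.g. ',json,sqlite,' (delimiters on both ends)
        tags_json TEXT   -- e.g. '["json", "sqlite"]'
    );
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE item_tags (
        item_id INTEGER REFERENCES items(id),
        tag_id INTEGER REFERENCES tags(id),
        PRIMARY KEY (item_id, tag_id)
    );
    CREATE INDEX idx_item_tags_tag ON item_tags(tag_id);
""")

sample = {"post one": ["json", "sqlite"], "post two": ["sqlite"], "post three": ["python"]}
for title, tag_names in sample.items():
    cur = conn.execute(
        "INSERT INTO items (title, tags_csv, tags_json) VALUES (?, ?, ?)",
        (title, "," + ",".join(tag_names) + ",", json.dumps(tag_names)),
    )
    item_id = cur.lastrowid
    for name in tag_names:
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        tag_id = conn.execute("SELECT id FROM tags WHERE name = ?", (name,)).fetchone()[0]
        conn.execute("INSERT INTO item_tags (item_id, tag_id) VALUES (?, ?)", (item_id, tag_id))


def by_join(tag):
    """Many-to-many lookup through the join table (can use the index)."""
    sql = """SELECT i.title FROM items AS i
             JOIN item_tags AS link ON link.item_id = i.id
             JOIN tags AS t ON t.id = link.tag_id
             WHERE t.name = ? ORDER BY i.id"""
    return [row[0] for row in conn.execute(sql, (tag,))]


def by_like(tag):
    """Full table scan over the delimited string column."""
    sql = "SELECT title FROM items WHERE tags_csv LIKE ? ORDER BY id"
    return [row[0] for row in conn.execute(sql, (f"%,{tag},%",))]


def by_json_each(tag):
    """Full table scan that expands each JSON array with json_each()."""
    sql = """SELECT title FROM items
             WHERE EXISTS (SELECT 1 FROM json_each(items.tags_json) WHERE value = ?)
             ORDER BY id"""
    return [row[0] for row in conn.execute(sql, (tag,))]


for lookup in (by_join, by_like, by_json_each):
    print(lookup.__name__, lookup("sqlite"))
```

Wrapping the stored string in delimiters keeps the LIKE pattern from matching a tag that is merely a substring of another tag. Note that LIKE is case-insensitive for ASCII by default and treats `%` and `_` in a tag as wildcards, so this form only suits simple tag names.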
