digitado


https://www.digitado.com.br


Spark-LLM-Eval: A Distributed Framework for Statistically Rigorous Large Language Model Evaluation

digitado ⋅ April 1, 2026

arXiv:2603.28769v1. Abstract: Evaluating large language models at scale remains a practical bottleneck for many organizations. While existing evaluation frameworks work well for thousands of examples, they struggle when datasets grow to hundreds of thousands or millions of samples. This scale is common when assessing model behavior across diverse domains or conducting comprehensive regression testing. We present Spark-LLM-Eval, a distributed evaluation framework built natively on Apache Spark. The system treats evaluation as a data-parallel problem, partitioning examples across executors and aggregating results with proper statistical […]


technocracy

CRAFT: Cost-aware Expert Replica Allocation with Fine-Grained Layerwise Estimations

digitado ⋅ April 1, 2026

arXiv:2603.28768v1. Abstract: Mixture-of-Experts (MoE) has recently emerged as the mainstream architecture for efficiently scaling large language models while maintaining near-constant computational cost. Expert parallelism distributes parameters by partitioning experts across devices, but this introduces token-level load imbalance during inference. Expert replication is a widely adopted load-balancing technique in serving frameworks that alleviates load imbalance in large-scale deployments by replicating experts with high loads. In this work, we demonstrate that existing replication schemes often over-replicate, with […]


technocracy

datasette-llm-usage 0.2a0

digitado ⋅ April 1, 2026

Release: datasette-llm-usage 0.2a0
Removed features relating to allowances and estimated pricing. These are now the domain of datasette-llm-accountant.
Now depends on datasette-llm for model configuration. #3
Full prompts, responses, and tool calls can now be logged to the llm_usage_prompt_log table in the internal database if you set the new datasette-llm-usage.log_prompts plugin configuration setting.
Redesigned the /-/llm-usage-simple-prompt page, which now requires the llm-usage-simple-prompt permission.
Tags: llm, datasette


technocracy

Learning Humanoid Navigation from Human Data

digitado ⋅ April 1, 2026

We present EgoNav, a system that enables a humanoid robot to traverse diverse, unseen environments by learning entirely from 5 hours of human walking data, with no robot data or finetuning. A diffusion model predicts distributions of plausible future trajectories conditioned on the past trajectory, a 360-degree visual memory fusing color, depth, and semantics, and video features from a frozen DINOv3 backbone that capture appearance cues invisible to depth sensors. A hybrid sampling scheme achieves real-time inference in […]


technocracy

The Loop: How an AI Swarm Surfaced a Governance Limitation, Then Tested the Fix

digitado ⋅ April 1, 2026

AgentGate is a runtime accountability layer for AI agents: before an agent can execute a high-impact action, it must lock a bond as collateral. Good outcomes release the bond. Bad outcomes slash it. The mechanism makes bad behavior economically irrational. In March 2026, a coordinated swarm of nine AI agents ran 97 attacks against AgentGate. One team — Beta — spent 48 clean bond cycles building reputation and earned nothing for it. Bond capacity was mathematically enforced but not reputation-gated: a brand-new […]
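The lock, release, and slash cycle described in this excerpt can be sketched in a few lines of Python. This is only an illustration of the general bonding idea, not AgentGate's actual code: the `BondLedger` class, its method names, and the balances used are all hypothetical, and the sketch deliberately leaves out reputation and bond-capacity rules, which the excerpt mentions but does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class BondLedger:
    """Hypothetical ledger: tracks each agent's free balance and locked collateral."""
    balances: dict[str, float] = field(default_factory=dict)
    locked: dict[str, float] = field(default_factory=dict)

    def lock(self, agent: str, amount: float) -> None:
        """Move `amount` from the agent's free balance into locked collateral."""
        if amount <= 0:
            raise ValueError("bond amount must be positive")
        if self.balances.get(agent, 0.0) < amount:
            raise ValueError("insufficient balance to post bond")
        self.balances[agent] -= amount
        self.locked[agent] = self.locked.get(agent, 0.0) + amount

    def resolve(self, agent: str, amount: float, good_outcome: bool) -> None:
        """Release the bond on a good outcome; slash (forfeit) it on a bad one."""
        if self.locked.get(agent, 0.0) < amount:
            raise ValueError("no such bond is locked")
        self.locked[agent] -= amount
        if good_outcome:
            self.balances[agent] += amount  # collateral returned in full
        # On a bad outcome the collateral is simply not returned.


# Illustrative numbers only: one clean cycle, then one slashed cycle.
ledger = BondLedger(balances={"beta": 100.0})
ledger.lock("beta", 10.0)
ledger.resolve("beta", 10.0, good_outcome=True)
balance_after_good = ledger.balances["beta"]

ledger.lock("beta", 10.0)
ledger.resolve("beta", 10.0, good_outcome=False)
balance_after_bad = ledger.balances["beta"]
```

In this sketch a clean cycle leaves the agent's balance unchanged and earns nothing extra, which mirrors the limitation the excerpt reports for team Beta.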


technocracy

Why Drug Toxicity Can’t Be Predicted in Isolation — Building EIRION with Graph Neural Networks

digitado ⋅ April 1, 2026

How we built a graph neural network that finally sees the whole play, not just the audition.

Every year, drugs that passed early safety tests go on to harm people in ways nobody predicted. Not because the chemistry was wrong. Not because the researchers were careless. But because we kept evaluating drugs the way a talent agent judges an actor from a solo audition tape. Isolated. Out of context. No script. No co-stars. No stage. In real theatre, a performance is never […]


technocracy

The Cognitive Dissonance Agent: Why the Best AI Reasoning Starts With Self-Doubt

digitado ⋅ April 1, 2026

Part 1 of 2: The psychology, the positioning, and the architecture.

What if the most powerful thing an AI agent could do was not give you an answer but sit with the contradiction?

For years, we have trained machines to converge upon the answer, reduce uncertainty, and optimise. However, cognitive science tells us something we find hard to believe: that the discomfort arising from […]


technocracy

Is convergence always dependent on initial exploration?

digitado ⋅ April 1, 2026

I’m new to RL and have been attempting to teach a simulated robot how to travel through randomly generated mazes using DQN. Sometimes when I run my program it quickly diverges into a terrible policy where it just slams into walls unintelligently, but maybe 1/3 of the time it actually learns a pretty decent policy. I’m not changing the code at all, simply rerunning it and getting drastically different behavior. My question is this: Is this unreliability an […]


technocracy

Gradient-Based Data Valuation Improves Curriculum Learning for Game-Theoretic Motion Planning

digitado ⋅ April 1, 2026

We demonstrate that gradient-based data valuation produces curriculum orderings that significantly outperform metadata-based heuristics for training game-theoretic motion planners. Specifically, we apply TracIn gradient-similarity scoring to GameFormer on the nuPlan benchmark and construct a curriculum that weights training scenarios by their estimated contribution to validation loss reduction. Across three random seeds, the TracIn-weighted curriculum achieves a mean planning ADE of $1.704 \pm 0.029$ m, significantly outperforming the metadata-based interaction-difficulty curriculum ($1.822 \pm 0.014$ m; paired $t$-test $p=0.021$, Cohen’s $d_z=3.88$) while exhibiting lower variance than […]
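The reported effect size and p-value can be checked against each other with a short calculation. The sketch below assumes the paired test was run over the three seeds (n = 3, so 2 degrees of freedom) and was two-sided; the excerpt does not state either point explicitly. It uses the standard identity t = d_z · √n for paired designs and the closed-form CDF of Student's t with 2 degrees of freedom, so no external libraries are needed. The function names are illustrative.

```python
import math


def paired_t_from_dz(d_z: float, n: int) -> float:
    """For a paired design, the t statistic equals d_z * sqrt(n)."""
    return d_z * math.sqrt(n)


def two_sided_p_df2(t: float) -> float:
    """Two-sided p-value for Student's t with exactly 2 degrees of freedom.

    Uses the closed-form CDF F(t) = 1/2 + t / (2 * sqrt(t^2 + 2)),
    which gives p = 1 - |t| / sqrt(t^2 + 2).
    """
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)


# Values taken from the excerpt; n = 3 is an assumption (three random seeds).
t_stat = paired_t_from_dz(3.88, 3)
p_value = two_sided_p_df2(t_stat)
print(round(t_stat, 2), round(p_value, 3))  # prints: 6.72 0.021
```

Under these assumptions the numbers agree: a $d_z$ of 3.88 over three pairs gives a p-value that rounds to the reported 0.021.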


technocracy

GUIDE: Reinforcement Learning for Behavioral Action Support in Type 1 Diabetes

digitado ⋅ April 1, 2026

Type 1 Diabetes (T1D) management requires continuous adjustment of insulin and lifestyle behaviors to maintain blood glucose within a safe target range. Although automated insulin delivery (AID) systems have improved glycemic outcomes, many patients still fail to achieve recommended clinical targets, warranting new approaches to improve glucose control in patients with T1D. While reinforcement learning (RL) has been utilized as a promising approach, current RL-based methods focus primarily on insulin-only treatment and do not provide behavioral recommendations for […]
