digitado – Page 587

Scenario-Adaptive Evaluation of Trustworthy Fine-Tuned Text Models Across Knowledge-Grounded Generation and Misinformation Detection

digitado ⋅ May 11, 2026

Large language models (LLMs) increasingly require robust evaluation under realistic instruction-following conditions, particularly for fine-tuned task-specific adapters operating in multilingual environments. This study proposes a scenario-adaptive evaluation framework for assessing the reliability of fine-tuned text models across two application regimes: misinformation detection (disinfo) and knowledge-grounded factual biography generation (heroes). The framework integrates automated generation of balanced risk-oriented scenarios, bilingual evaluation in English and Ukrainian, the LLM-as-a-Judge paradigm, and multidimensional robustness analysis through the Alignment Robustness Index (ARI). Six […]

technocracy

Almost Sure Convergence of Differential Temporal Difference Learning for Average Reward Markov Decision Processes

digitado ⋅ February 18, 2026

The average reward is a fundamental performance metric in reinforcement learning (RL) focusing on the long-run performance of an agent. Differential temporal difference (TD) learning algorithms are a major advance for average reward RL as they provide an efficient online method to learn the value functions associated with the average reward in both on-policy and off-policy settings. However, existing convergence guarantees require a local clock in learning rates tied to state visit counts, which practitioners do not use […]

technocracy

This New Decomposition Framework Makes Multi-Agent Pathfinding More Scalable

digitado ⋅ February 18, 2026

Authors: Zhuo Yao, Wei Wang. Abstract: Generally, the computation and memory space required for multi-agent path finding (MAPF) grow exponentially as the number of agents increases. This often leaves some MAPF instances unsolvable under limited computational resources and memory space, thereby limiting the application of MAPF in […]

technocracy

How do you test AI agents in production? The unpredictability is overwhelming. [D]

digitado ⋅ April 27, 2026

I’ve been in QA for almost a decade. My mental model for quality was always: given input X, assert output Y. Now I’m on a team that’s shipping an LLM-based agent that handles multi-step tasks. I genuinely do not know how to test this in a way that feels rigorous. The thing works. But the output isn’t deterministic. The same input can produce different reasoning chains across runs. Hell, even with temp=0 I see variation in tool selection […]

technocracy

How AI-Powered Demand Sensing Is Transforming Real-Time Supply Chain Planning

digitado ⋅ April 10, 2026

Traditionally, supply chains operated within reasonably stable and predictable demand environments. Businesses used historical sales patterns to plan production, inventory, and logistics, typically relying on historical data and monthly forecasts. This has changed significantly. Because of rapidly shifting consumer buying patterns, demand spikes driven by promotional activity, and conditions outside businesses’ control (e.g., weather-related disruptions, economic indicators), the way in which consumers purchase goods […]

technocracy

Energy-Aware Reinforcement Learning for Robotic Manipulation of Articulated Components in Infrastructure Operation and Maintenance

digitado ⋅ February 16, 2026

arXiv:2602.12288v1. Abstract: With the growth of intelligent civil infrastructure and smart cities, operation and maintenance (O&M) increasingly requires safe, efficient, and energy-conscious robotic manipulation of articulated components, including access doors, service drawers, and pipeline valves. However, existing robotic approaches either focus primarily on grasping or target object-specific articulated manipulation, and they rarely incorporate explicit actuation energy into multi-objective optimisation, which limits their scalability and suitability for long-term deployment in real O&M settings. Therefore, this paper […]

technocracy

How to Fine-Tune an LLM: SFT, LoRA, QLoRA and DPO Explained

digitado ⋅ May 17, 2026

This blog post explains what fine-tuning is, why it’s needed, and how to fine-tune an LLM, with practical examples. Fine-tuning is what brings an LLM to life. It’s a technique for adapting a model to a specific task, such as coding, writing poems or songs, classifying objects in an image, etc. A typical LLM training lifecycle is depicted below. During pretraining, the model only learns to predict the next token, […]

technocracy

The Productivity Trap Engineers Fall Into: Work That Never Ships

digitado ⋅ February 5, 2026

There’s a very specific kind of productivity that only engineers know. It looks like progress. It feels like progress. It even *logs* like progress. But nothing ships. I know this because I’ve lived there. For weeks. Sometimes months. Writing code. Rewriting code. Refactoring code that hasn’t offended anyone yet. Stashing changes like they’re radioactive. Telling myself, *“I’m being careful.”* No. I was being scared. Let’s call it what it is. This isn’t engineering. This is avoidance in a […]

technocracy

Beyond Passive Aggregation: Active Auditing and Topology-Aware Defense in Decentralized Federated Learning

digitado ⋅ March 19, 2026

Decentralized Federated Learning (DFL) remains highly vulnerable to adaptive backdoor attacks designed to bypass traditional passive defense metrics. To address this limitation, we shift the defensive paradigm toward a novel active, interventional auditing framework. First, we establish a dynamical model to characterize the spatiotemporal diffusion of adversarial updates across complex graph topologies. Second, we introduce a suite of proactive auditing metrics: stochastic entropy anomaly, randomized smoothing Kullback-Leibler divergence, and activation kurtosis. These metrics utilize private probes to stress-test […]

technocracy

Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning

digitado ⋅ February 20, 2026

Generative models have recently demonstrated remarkable success across diverse domains, motivating their adoption as expressive policies in reinforcement learning (RL). While they have shown strong performance in offline RL, particularly where the target distribution is well defined, their extension to online fine-tuning has largely been treated as a direct continuation of offline pre-training, leaving key challenges unaddressed. In this paper, we propose Flow Matching with Injected Noise for Offline-to-Online RL (FINO), a novel method that leverages flow matching-based […]
