digitado

Automated structural testing of LLM-based agents: methods, framework, and case studies

digitado ⋅ January 28, 2026

arXiv:2601.18827v1 Announce Type: new Abstract: LLM-based agents are being adopted rapidly across diverse domains. Because they interact with users without supervision, they must be tested extensively. Current testing approaches focus on acceptance-level evaluation from the user's perspective. While intuitive, these tests require manual evaluation, are difficult to automate, do not facilitate root-cause analysis, and depend on expensive test environments. In this paper, we present methods that enable structural testing of LLM-based agents. Our approach uses traces (based on […]
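The excerpt stops before it says what the traces are based on, so the following is only a generic sketch of what a structural check over a recorded agent trace might look like, not the paper's method. The `Span` record, its fields and the helper functions are hypothetical. The idea is that assertions target the internal sequence of LLM and tool calls rather than only the final answer, which makes them automatable and can help localize the step that failed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """One recorded step of an agent run (hypothetical trace schema)."""
    kind: str                    # assumed values: "llm" or "tool"
    name: str                    # model or tool name
    error: Optional[str] = None  # error message if the step failed


def tool_calls(trace: List[Span]) -> List[str]:
    """Names of tool calls, in the order they occurred."""
    return [s.name for s in trace if s.kind == "tool"]


def called_before(trace: List[Span], first: str, second: str) -> bool:
    """True if the first call to `first` precedes the first call to `second`."""
    calls = tool_calls(trace)
    return first in calls and second in calls and calls.index(first) < calls.index(second)


def failed_steps(trace: List[Span]) -> List[str]:
    """Names of steps that recorded an error, useful for root-cause analysis."""
    return [s.name for s in trace if s.error is not None]


# Hand-written example trace, for illustration only
trace = [
    Span("llm", "planner"),
    Span("tool", "search"),
    Span("tool", "calculator"),
    Span("llm", "planner"),
]
assert called_before(trace, "search", "calculator")
assert failed_steps(trace) == []
```

Because each check inspects the trace directly, a test like this can run in CI without a human judging the agent's final output.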

technocracy

Shortcut Learning in Binary Classifier Black Boxes: Applications to Voice Anti-Spoofing and Biometrics

digitado ⋅ January 25, 2026

The widespread adoption of deep-learning models in data-driven applications has drawn attention to the potential risks associated with biased datasets and models. Neglected or hidden biases within datasets and models can lead to unexpected results. This study addresses the challenges of dataset bias and explores "shortcut learning", or the "Clever Hans effect", in binary classifiers. We propose a novel framework for analyzing black-box classifiers and for examining the impact of both training and test data on classifier scores. […]

technocracy

MIT engineers design an aerial microrobot that can fly as fast as a bumblebee

digitado ⋅ December 3, 2025

In the future, tiny flying robots could be deployed to aid in the search for survivors trapped beneath the rubble after a devastating earthquake. Like real insects, these robots could flit through tight spaces that larger robots can't reach, while simultaneously dodging stationary obstacles and pieces of falling rubble. Until now, aerial microrobots have only been able to fly slowly along smooth trajectories, far from the swift, agile flight of real insects. MIT researchers have demonstrated […]

technocracy

LLM Powered Autonomous Agents

digitado ⋅ June 23, 2023

Building agents with an LLM (large language model) as their core controller is a cool concept. Several proof-of-concept demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potential of LLMs extends beyond generating well-written copy, stories, essays and programs; an LLM can be framed as a powerful general problem solver.

Agent System Overview

In an LLM-powered autonomous agent system, the LLM functions as the agent's brain, complemented by several key components:

Planning

Subgoal and decomposition: The agent breaks […]
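To make the "LLM as the agent's brain" framing concrete, here is a minimal sketch of a controller loop in which a planner decomposes a task into subgoals and each subgoal is dispatched to a tool. The planner is a hypothetical stub standing in for an LLM call, and the convention that the first word of a subgoal names the tool is an assumption made for illustration; the sketch does not reproduce AutoGPT, GPT-Engineer or BabyAGI.

```python
from typing import Callable, Dict, List


def stub_planner(task: str) -> List[str]:
    """Hypothetical stand-in for the LLM: split a task into ordered subgoals.

    A real agent would prompt a language model here; this stub just splits on " then ".
    """
    return [part.strip() for part in task.split(" then ") if part.strip()]


def run_agent(
    task: str,
    planner: Callable[[str], List[str]],
    tools: Dict[str, Callable[[str], str]],
) -> List[str]:
    """Plan once, then execute each subgoal with the tool named by its first word."""
    results = []
    for subgoal in planner(task):
        tool_name = subgoal.split()[0]  # assumed convention: first word selects the tool
        tool = tools.get(tool_name)
        results.append(tool(subgoal) if tool else f"no tool for: {subgoal}")
    return results


tools = {
    "search": lambda goal: f"searched ({goal})",
    "write": lambda goal: f"wrote ({goal})",
}
print(run_agent("search papers then write summary", stub_planner, tools))
```

Swapping `stub_planner` for a function that calls a real model is the only change needed to turn the stub into an LLM-driven planner; the dispatch loop stays the same.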

technocracy

Conditional Distribution Compression via the Kernel Conditional Mean Embedding

digitado ⋅ January 19, 2026

arXiv:2504.10139v4 Announce Type: replace Abstract: Existing distribution compression methods, like Kernel Herding (KH), were originally developed for unlabelled data. However, no existing approach directly compresses the conditional distribution of labelled data. To address this gap, we first introduce the Average Maximum Conditional Mean Discrepancy (AMCMD), a metric for comparing conditional distributions, and derive a closed-form estimator. Next, we make a key observation: in the context of distribution compression, the cost of constructing a compressed set targeting the […]
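For context, Kernel Herding, the unlabelled-data baseline named above, greedily builds a compressed set whose kernel mean embedding tracks that of the full data: after t points have been chosen, it picks the x that maximizes (1/n) Σ_i k(x, x_i) - (1/(t+1)) Σ_s k(x, x_s), where x_s are the points already chosen. Below is a minimal pure-Python sketch, assuming a Gaussian kernel with a fixed bandwidth and restricting candidates to the data points themselves. It illustrates standard (unconditional) herding only, not the paper's AMCMD-based conditional method; `rbf` and `kernel_herding` are illustrative names.

```python
import math
from typing import List, Sequence


def rbf(x: Sequence[float], y: Sequence[float], bandwidth: float = 1.0) -> float:
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kernel_herding(data: List[Sequence[float]], m: int, bandwidth: float = 1.0) -> List[int]:
    """Greedily select m indices from `data` (candidates are the data points)."""
    n = len(data)
    # Empirical kernel mean embedding evaluated at each candidate: (1/n) * sum_i k(x, x_i)
    mean_embed = [sum(rbf(x, xi, bandwidth) for xi in data) / n for x in data]
    # Running sum over already-selected points: sum_s k(x, x_s)
    selected_sum = [0.0] * n
    chosen: List[int] = []
    for t in range(m):
        # Herding objective with t points already chosen
        scores = [mean_embed[j] - selected_sum[j] / (t + 1) for j in range(n)]
        best = max(range(n), key=lambda j: scores[j])
        chosen.append(best)
        for j in range(n):
            selected_sum[j] += rbf(data[j], data[best], bandwidth)
    return chosen
```

On the 1-D points 0, 1, 2, 3, 4 with bandwidth 1, the first pick is index 2, the most central point, because it has the largest average kernel value; later picks are pushed away from points already chosen by the second term.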

technocracy

Language Models Entangle Language and Culture

digitado ⋅ January 23, 2026

arXiv:2601.15337v1 Announce Type: new Abstract: Users should not be systematically disadvantaged by the language they use to interact with LLMs; i.e., users across languages should get responses of similar quality regardless of the language used. In this work, we create a set of real-world open-ended questions based on our analysis of the WildChat dataset and use it to evaluate whether responses vary by language; specifically, whether answer quality depends on the language used to query the model. We also […]

technocracy

U.S. Virgin Islands Lawsuit Finally Calls Time On Meta’s Profitable Scam Ad Machine

digitado ⋅ January 2, 2026

Meta's Scam Ads Are Finally Being Challenged, and It's Long Overdue

After years of warnings from consumer advocates, regulators and defrauded users, Meta Platforms is finally being dragged into court over what critics say has been an open-secret business model: knowingly allowing scam advertisements to run across Facebook and Instagram in the name of profit. The U.S. Virgin Islands has filed a lawsuit against the social media giant, alleging Meta deliberately profited from scam advertising while publicly […]

technocracy

How to Build a DAO from Scratch with Solidity and Foundry, Part 1: Designing the Governance Token

digitado ⋅ January 16, 2026

A DAO (Decentralized Autonomous Organization) is a system that enables collective decision-making through code, without relying on traditional organizational hierarchies such as boards of directors, CEOs, or CTOs. Rather than trusting individuals or institutions, DAOs rely on smart contracts deployed on a blockchain. At its core, a DAO allows participants to propose, vote on, and execute decisions in a transparent and verifiable way. Voting power is typically derived from tokens held by participants, where each token represents a unit of voting weight. A typical on-chain DAO is composed […]
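The token-weighted voting described above can be modelled in a few lines. This is a minimal Python sketch, not Solidity and not the article's contract: `ToyDAO`, `Proposal` and their methods are hypothetical names. It snapshots balances when a proposal is created so that tokens transferred afterwards cannot be used to vote a second time; on-chain governance contracts commonly address the same problem by reading voting power at a past block.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Proposal:
    description: str
    snapshot: Dict[str, int]  # voting power frozen when the proposal was created
    votes_for: int = 0
    votes_against: int = 0
    voters: Set[str] = field(default_factory=set)


class ToyDAO:
    """Hypothetical in-memory model of token-weighted voting (not a smart contract)."""

    def __init__(self, balances: Dict[str, int]) -> None:
        self.balances = dict(balances)  # each token = one unit of voting weight
        self.proposals: List[Proposal] = []

    def transfer(self, sender: str, recipient: str, amount: int) -> None:
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            raise ValueError("invalid transfer")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount

    def propose(self, description: str) -> int:
        # Snapshot balances so tokens moved after this point cannot vote twice.
        self.proposals.append(Proposal(description, snapshot=dict(self.balances)))
        return len(self.proposals) - 1

    def vote(self, proposal_id: int, voter: str, support: bool) -> None:
        p = self.proposals[proposal_id]
        weight = p.snapshot.get(voter, 0)
        if weight == 0 or voter in p.voters:
            raise ValueError("no voting power at snapshot, or already voted")
        p.voters.add(voter)
        if support:
            p.votes_for += weight
        else:
            p.votes_against += weight

    def passed(self, proposal_id: int) -> bool:
        """Simple majority of token weight; real DAOs usually add quorum and timelocks."""
        p = self.proposals[proposal_id]
        return p.votes_for > p.votes_against
```

For example, with balances `{"alice": 60, "bob": 40}`, if Bob transfers his 40 tokens to Carol after a proposal is created, Bob still votes with weight 40 from the snapshot, and Carol's attempt to vote raises `ValueError`.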

technocracy

Gamifying Cyber Governance: A Virtual Escape Room to Transform Cybersecurity Policy Education

digitado ⋅ January 19, 2026

arXiv:2601.10852v1 Announce Type: new Abstract: Serious games are gaining popularity as effective teaching and learning tools, providing engaging, interactive, and practical experiences for students. Gamified learning experiences, such as virtual escape rooms, have emerged as powerful tools for bridging theory and practice, fostering deeper understanding and engagement among students. This paper presents the design, implementation, and evaluation of a virtual escape room tailored specifically for cybersecurity governance and policy education. Developed as a 3D immersive environment, the escape […]

technocracy

How many steps are needed to show progress in locomotion?

digitado ⋅ January 15, 2026

My problem is this: I have to train my agent on the CPU, so running 1600 steps per episode on BipedalWalker, HalfCheetah, etc. is out of the question. Are 200 steps fine as a starting point? (Assuming the agent can reach a score of 300 in 1600 steps, that would put the equivalent score at 37.5 for 200 steps.) So if the agent is able to reach a score of 40, then for testing I could just […]
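The 37.5 figure is a linear rescaling, 300 × 200 / 1600 = 37.5, which implicitly assumes reward accumulates at a roughly constant rate per step. For locomotion tasks that is a simplification (reward depends on forward progress and penalties, which need not be spread evenly over an episode), so the rescaled number is best treated as a rough heuristic. A small helper, with hypothetical names, that makes the assumption explicit by comparing per-step reward rates:

```python
def scaled_target(full_score: float, full_steps: int, short_steps: int) -> float:
    """Rescale a target score to a shorter episode, assuming constant reward per step."""
    if full_steps <= 0 or short_steps <= 0:
        raise ValueError("step counts must be positive")
    return full_score * short_steps / full_steps


def meets_target(score: float, steps: int, full_score: float = 300.0, full_steps: int = 1600) -> bool:
    """Compare per-step reward rates instead of raw episode scores."""
    return score / steps >= full_score / full_steps


print(scaled_target(300, 1600, 200))  # 37.5
print(meets_target(40, 200))          # True: 0.2 reward/step vs 0.1875 needed
```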
