digitado

About digitado

https://www.digitado.com.br

Posts by digitado:

Beyond Static Benchmarks: Synthesizing Harmful Content via Persona-based Simulation for Robust Evaluation

digitado ⋅ 21 April 2026

arXiv:2604.17020v1 Announce Type: new Abstract: Static benchmarks for harmful content detection face limitations in scalability and diversity, and may also be affected by contamination from web-scale pre-training corpora. To address these issues, we propose a framework for synthesizing harmful content, leveraging persona-guided large language model (LLM) agents. Our approach constructs two-dimensional user personas by integrating demographic identities and topical interests with situational harmful strategies, enabling the simulation of diverse and contextually grounded harmful interactions. We evaluate the framework […]

technocracy

Mini-BEHAVIOR-Gran: Revealing U-Shaped Effects of Instruction Granularity on Language-Guided Embodied Agents

digitado ⋅ 21 April 2026

arXiv:2604.17019v1 Announce Type: new Abstract: Instruction granularity is an important yet poorly controlled variable in language-guided embodied AI. Existing benchmarks typically pair each task with a single static instruction, making it difficult to study how agent behavior changes when the same task is described at different levels of detail. We introduce Mini-BEHAVIOR-Gran, a new benchmark for controlled studies of instruction granularity that extends Mini-BEHAVIOR with multiple instruction variants per task, ranging from high-level goal descriptions to step-by-step guidance. […]

technocracy

HELO-APR: Enhancing Low-Resource Program Repair through Cross-Lingual Knowledge Transfer

digitado ⋅ 21 April 2026

arXiv:2604.17016v1 Announce Type: new Abstract: Large Language Models (LLMs) perform well on automatic program repair (APR) for high-resource programming languages (HRPLs), but their effectiveness drops sharply in low-resource programming languages (LRPLs), due to a lack of sufficient verified buggy-fixed pairs for APR training. To address this challenge, we propose HELO-APR (High-resource Enabled LOw-resource APR), a two-stage APR framework that enables cross-lingual transfer of repair knowledge from HRPLs to LRPLs. HELO-APR (1) constructs high-quality LRPL training data by synthesizing […]

technocracy

False Security Confidence in Benign LLM Code Generation

digitado ⋅ 21 April 2026

arXiv:2604.17014v1 Announce Type: new Abstract: Prior work has demonstrated that functionally correct yet vulnerable outputs arise systematically in threat-oriented settings, where adversarial or implicit channels are used to induce security failures in code agents and automated patching workflows. This note introduces a complementary but distinct framing: False Security Confidence (FSC), which studies the same surface phenomenon from a measurement-first perspective in ordinary, non-attack-framed generation tasks. Our interest is not in whether attacks can produce such outputs, but in […]

technocracy

Towards Universal Skeleton-Based Action Recognition

digitado ⋅ 21 April 2026

arXiv:2604.17013v1 Announce Type: new Abstract: With the development of robotics, skeleton-based action recognition has become increasingly important, as human-robot interaction requires understanding the actions of humans and humanoid robots. Due to different sources of human skeletons and structures of humanoid robots, skeleton data naturally exhibit heterogeneity. However, previous works overlook the data heterogeneity of skeletons and solely construct models using homogeneous skeletons. Moreover, open-vocabulary action recognition is also essential for real-world applications. To this end, this work studies […]

technocracy

Net Load Forecasting Using Machine Learning with Growing Renewable Power Capacity Features: A Comparative Study of Direct and Indirect Methods

digitado ⋅ 21 April 2026

arXiv:2604.17012v1 Announce Type: new Abstract: Renewable energy adoption has increased significantly over the past few years. As adoption grows, however, forecasting the net load has become a major challenge because of the inherent uncertainty of renewable sources. To mitigate the impact of these uncertainties, this study uses a long short-term memory (LSTM) model and fully connected neural networks (FCNN) to predict net load with two independent approaches: the direct method and the indirect method. […]
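The teaser is cut off before it defines the two approaches, so the following is only a minimal sketch under a common assumption: net load is demand minus renewable generation, the direct method forecasts the net-load series itself, and the indirect method forecasts demand and renewable generation separately and subtracts. The function names and the `persistence` placeholder model are hypothetical and stand in for the LSTM and FCNN models named in the abstract.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a forecaster maps a history to a one-step-ahead value.
Forecaster = Callable[[Sequence[float]], float]


def persistence(history: Sequence[float]) -> float:
    """Placeholder model (assumption): predict the last observed value."""
    return history[-1]


def net_load(load: Sequence[float], renewable: Sequence[float]) -> List[float]:
    """Net load assumed to be demand minus renewable generation."""
    if len(load) != len(renewable):
        raise ValueError("load and renewable series must have equal length")
    return [l - r for l, r in zip(load, renewable)]


def direct_forecast(load: Sequence[float], renewable: Sequence[float],
                    model: Forecaster = persistence) -> float:
    """Direct method: forecast the net-load series itself."""
    return model(net_load(load, renewable))


def indirect_forecast(load: Sequence[float], renewable: Sequence[float],
                      load_model: Forecaster = persistence,
                      renewable_model: Forecaster = persistence) -> float:
    """Indirect method: forecast each component, then subtract."""
    return load_model(load) - renewable_model(renewable)


if __name__ == "__main__":
    load = [100.0, 110.0, 120.0]      # made-up demand values
    renewable = [20.0, 30.0, 25.0]    # made-up generation values
    print(direct_forecast(load, renewable))
    print(indirect_forecast(load, renewable))
```

With a linear placeholder such as `persistence` the two methods return the same value; with nonlinear models they generally differ, which is what makes a comparison between them meaningful.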

technocracy

Improving LLM Code Reasoning via Semantic Equivalence Self-Play with Formal Verification

digitado ⋅ 21 April 2026

arXiv:2604.17010v1 Announce Type: new Abstract: We introduce a self-play framework for semantic equivalence in Haskell, utilizing formal verification to guide adversarial training between a generator and an evaluator. The framework leverages Liquid Haskell proofs for validating equivalence and execution-based counterexamples for inequivalence, organized via a difficulty-aware curriculum. To facilitate this, we release OpInstruct-HSx, a synthetic dataset of approximately 28k validated Haskell programs. Empirical experiments show that our evaluator transfers effectively to downstream tasks, achieving up to 13.3pp accuracy gain […]

technocracy

Small Model as Master Orchestrator: Learning Unified Agent-Tool Orchestration with Parallel Subtask Decomposition

digitado ⋅ 21 April 2026

arXiv:2604.17009v1 Announce Type: new Abstract: Multi-agent systems (MAS) demonstrate clear advantages in tackling complex problems by coordinating diverse agents and external tools. However, most existing orchestration methods rely on static workflows or serial agent scheduling, and are further constrained by heterogeneous interface protocols between tools and agents. This leads to high system complexity and poor extensibility. To mitigate these issues, we propose Agent-as-Tool, a unified parallel orchestration paradigm that abstracts both agents and tools into a standardized, learnable […]

technocracy

BIASEDTALES-ML: A Multilingual Dataset for Analyzing Narrative Attribute Distributions in LLM-Generated Stories

digitado ⋅ 21 April 2026

arXiv:2604.17008v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly used to generate narrative content, including children’s stories, which play an important role in social and cultural learning. Despite growing interest in AI safety and alignment, most existing evaluations focus primarily on English, leaving the cross-lingual generalization of aligned behavior underexplored. In this work, we introduce BiasedTales-ML, a large-scale parallel corpus of approximately 350,000 children’s stories generated across eight typologically and culturally diverse languages using a full-permutation […]

technocracy

MobileAgeNet: Lightweight Facial Age Estimation for Mobile Deployment

digitado ⋅ 21 April 2026

arXiv:2604.17007v1 Announce Type: new Abstract: Mobile deployment of facial age estimation requires models that balance predictive accuracy with low latency and compact size. In this work, we present MobileAgeNet, a lightweight age-regression framework that achieves an MAE of 4.65 years on the UTKFace held-out test set while maintaining efficient on-device inference with an average latency of 14.4 ms measured using the AI Benchmark application. The model is built on a pretrained MobileNetV3-Large backbone combined with a compact regression […]
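For readers unfamiliar with the metric, MAE (mean absolute error) is the average absolute difference between predicted and true ages, so an MAE of 4.65 years means predictions are off by 4.65 years on average. The sketch below shows only the standard definition; the function name and the sample ages are made up for illustration and are not taken from the paper or from UTKFace.

```python
from typing import Sequence


def mean_absolute_error(true_ages: Sequence[float],
                        predicted_ages: Sequence[float]) -> float:
    """Average of |true - predicted| over all samples, in years."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must have the same length")
    if not true_ages:
        raise ValueError("inputs must not be empty")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


if __name__ == "__main__":
    # Illustrative values only (assumption), not real dataset entries.
    true_ages = [25, 40, 60]
    predicted_ages = [30, 38, 53]
    print(mean_absolute_error(true_ages, predicted_ages))
```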
