digitado

Test-Driven AI Agent Definition (TDAD): Compiling Tool-Using Agents from Behavioral Specifications

digitado ⋅ 11 March 2026

arXiv:2603.08806v1 Announce Type: new Abstract: We present Test-Driven AI Agent Definition (TDAD), a methodology that treats agent prompts as compiled artifacts: engineers provide behavioral specifications, a coding agent converts them into executable tests, and a second coding agent iteratively refines the prompt until tests pass. Deploying tool-using LLM agents in production requires measurable behavioral compliance that current development practices cannot provide. Small prompt changes cause silent regressions, tool misuse goes undetected, and policy violations emerge only after deployment. […]


technocracy

Introducing Nova Forge SDK, a seamless way to customize Nova models for enterprise AI

digitado ⋅ 18 March 2026

Large language models (LLMs) have transformed how we interact with AI, but one size doesn’t fit all. Out-of-the-box LLMs are trained with broad, general knowledge and optimized for a wide range of use cases, but they often fall short when it comes to domain-specific tasks, proprietary workflows, or unique business requirements. Enterprise customers increasingly need specialized LLMs that deeply understand their proprietary data, business processes, and domain-specific terminology. Without customization, you’re forced to choose between accepting generic […]


technocracy

Don’t Always Pick the Highest-Performing Model: An Information Theoretic View of LLM Ensemble Selection

digitado ⋅ 10 February 2026

arXiv:2602.08003v1 Announce Type: cross Abstract: Large language models (LLMs) are often ensembled together to improve overall reliability and robustness, but in practice models are strongly correlated. This raises a fundamental question: which models should be selected when forming an LLM ensemble? We formulate budgeted ensemble selection as maximizing the mutual information between the true label and predictions of the selected models. Furthermore, to explain why performance can saturate even with many models, we model the correlated errors of […]
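The selection objective named in the abstract can be illustrated with a minimal sketch. It assumes discrete labels and predictions, estimates mutual information from empirical counts, and searches every subset of a fixed size exhaustively. The exhaustive search and the names `mutual_information` and `select_ensemble` are illustrative assumptions, not the paper's algorithm.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(labels, joint_preds):
    """Plug-in estimate of I(Y; X) in bits from paired samples.

    labels[i] is the true label and joint_preds[i] is the tuple of
    predictions made by the selected models on example i.
    """
    n = len(labels)
    count_y = Counter(labels)
    count_x = Counter(joint_preds)
    count_yx = Counter(zip(labels, joint_preds))
    # sum over observed (y, x) of p(y, x) * log2(p(y, x) / (p(y) * p(x)))
    return sum(
        (c / n) * log2(c * n / (count_y[y] * count_x[x]))
        for (y, x), c in count_yx.items()
    )


def select_ensemble(labels, preds_by_model, budget):
    """Assumed brute-force search: return (subset, score) for the
    size-`budget` subset of models with the largest estimated MI."""
    best = None
    for subset in combinations(sorted(preds_by_model), budget):
        joint = list(zip(*(preds_by_model[m] for m in subset)))
        score = mutual_information(labels, joint)
        if best is None or score > best[1]:
            best = (subset, score)
    return best
```

Under this objective, two accurate models whose predictions are identical add no information beyond one of them, so a less accurate model with different errors can be the better second pick.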


technocracy

Two-dimensional Entanglement-assisted Quantum Quasi-cyclic Low-density Parity-check Codes

digitado ⋅ 15 January 2026

arXiv:2601.08927v1 Announce Type: new Abstract: For any positive integer $g \ge 2$, we derive general conditions for the existence of a $2g$-cycle in the Tanner graph of two-dimensional ($2$-D) classical quasi-cyclic (QC) low-density parity-check (LDPC) codes. Based on these conditions, we construct a family of $2$-D classical QC-LDPC codes with girth greater than $4$ by stacking $p \times p \times p$ tensors, where $p$ is an odd prime. Furthermore, for composite values of $p$, we propose two additional […]


technocracy

VegaChat: A Robust Framework for LLM-Based Chart Generation and Assessment

digitado ⋅ 23 January 2026

arXiv:2601.15385v1 Announce Type: new Abstract: Natural-language-to-visualization (NL2VIS) systems based on large language models (LLMs) have substantially improved the accessibility of data visualization. However, their further adoption is hindered by two coupled challenges: (i) the absence of standardized evaluation metrics makes it difficult to assess progress in the field and compare different approaches; and (ii) natural language descriptions are inherently underspecified, so multiple visualizations may be valid for the same query. To address these issues, we introduce VegaChat, a […]


technocracy

Simplify ModelOps with Amazon SageMaker AI Projects using Amazon S3-based templates

digitado ⋅ 30 January 2026

Managing ModelOps workflows can be complex and time-consuming. If you’ve struggled with setting up project templates for your data science team, you know that the previous approach using AWS Service Catalog required configuring portfolios and products and managing complex permissions, adding significant administrative overhead before your team could start building machine learning (ML) pipelines. Amazon SageMaker AI Projects now offers an easier path: Amazon S3-based templates. With this new capability, you can store AWS CloudFormation templates directly in Amazon […]


technocracy

Searching of Training Images with Rich Features Required for Generalization Performance of CNN Models Using Interactive Genetic Algorithms

digitado ⋅ 20 April 2026

Selecting training parameters for convolutional neural networks (CNNs) and determining the amount of training data required for reliable generalization remain challenging and often time-consuming tasks, typically relying on manual trial-and-error. While genetic algorithms (GAs) have been applied to hyperparameter tuning, less attention has been given to how the proportion of training data influences generalization performance. In this study, we propose an interactive GA-based framework that simultaneously optimizes key training parameters and the image usage rate, defined as the […]


technocracy

Bandit Allocational Instability

digitado ⋅ 10 February 2026

arXiv:2602.07472v1 Announce Type: cross Abstract: When multi-armed bandit (MAB) algorithms allocate pulls among competing arms, the resulting allocation can exhibit huge variation. This is particularly harmful in modern applications such as learning-enhanced platform operations and post-bandit statistical inference. Thus motivated, we introduce a new performance metric of MAB algorithms termed allocation variability, which is the largest (over arms) standard deviation of an arm’s number of pulls. We establish a fundamental trade-off between allocation variability and regret, the canonical […]
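The allocation variability metric defined above can be estimated empirically with a short sketch. It assumes the pull counts of each arm have been recorded over several independent runs of the same algorithm, and it uses the population standard deviation; both the data layout and the function name `allocation_variability` are illustrative assumptions rather than anything specified in the abstract.

```python
from statistics import pstdev


def allocation_variability(pull_counts):
    """Largest (over arms) standard deviation of an arm's number of pulls.

    pull_counts[r][a] is the number of times arm a was pulled in
    independent run r (an assumed layout for this sketch).
    """
    n_arms = len(pull_counts[0])
    per_arm_std = [
        pstdev([run[a] for run in pull_counts]) for a in range(n_arms)
    ]
    return max(per_arm_std)
```

An algorithm that pulls each arm the same number of times in every run scores zero, while one whose allocation swings between arms from run to run scores high even if its average allocation looks balanced.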


technocracy

Joint Parameter and State-Space Bayesian Optimization: Using Process Expertise to Accelerate Manufacturing Optimization

digitado ⋅ 23 February 2026

arXiv:2602.17679v1 Announce Type: new Abstract: Bayesian optimization (BO) is a powerful method for optimizing black-box manufacturing processes, but its performance is often limited when dealing with high-dimensional multi-stage systems, where we can observe intermediate outputs. Standard BO models the process as a black box and ignores the intermediate observations and the underlying process structure. Partially Observable Gaussian Process Networks (POGPN) model the process as a Directed Acyclic Graph (DAG). However, using intermediate observations is challenging when the observations […]


technocracy

Ultrametric OGP – parametric RDT \emph{symmetric} binary perceptron connection

digitado ⋅ 22 April 2026

arXiv:2604.19712v1 Announce Type: cross Abstract: In [97,99,100], an fl-RDT framework is introduced to characterize \emph{statistical computational gaps} (SCGs). Studying \emph{symmetric binary perceptrons} (SBPs), [100] obtained an \emph{algorithmic} threshold estimate $\alpha_a \approx \alpha_c^{(7)} \approx 1.6093$ at the 7th lifting level (for $\kappa=1$ margin), closely approaching $1.58$ local entropy (LE) prediction [18]. In this paper, we further connect parametric RDT to overlap gap properties (OGPs), another key geometric feature of the solution space. Specifically, for any positive integer $s$, we consider $s$-level […]
