March 2026

ReLope: KL-Regularized LoRA Probes for Multimodal LLM Routing

digitado ⋅ March 27, 2026

arXiv:2603.24787v1 Announce Type: new Abstract: Routing has emerged as a promising strategy for balancing performance and cost in large language model (LLM) systems that combine lightweight models with powerful but expensive large models. Recent studies show that probe routing, which predicts the correctness of a small model using its hidden states, provides an effective solution in text-only LLMs. However, we observe that these probes degrade substantially when applied to multimodal LLMs (MLLMs). Through empirical analysis, we find that […]

technocracy

Cyber-Physical System Design Space Exploration for Affordable Precision Agriculture

digitado ⋅ March 27, 2026

arXiv:2603.24785v1 Announce Type: new Abstract: Precision agriculture promises higher yields and sustainability, but adoption is slowed by the high cost of cyber-physical systems (CPS) and the lack of systematic design methods. We present a cost-aware design space exploration (DSE) framework for multimodal drone-rover platforms to integrate budget, energy, sensing, payload, computation, and communication constraints. Using integer linear programming (ILP) with SAT-based verification, our approach trades off among cost, coverage, and payload while ensuring constraint compliance and a multitude […]

Transformers in the Dark: Navigating Unknown Search Spaces via Bandit Feedback

digitado ⋅ March 27, 2026

arXiv:2603.24780v1 Announce Type: new Abstract: Effective problem solving with Large Language Models (LLMs) can be enhanced when they are paired with external search algorithms. By viewing the space of diverse ideas and their follow-up possibilities as a tree structure, the search algorithm can navigate such a search space and guide the LLM toward better solutions more efficiently. While the search algorithm enables an effective balance between exploitation and exploration of a tree-structured space, the need for an external […]

AIP: Agent Identity Protocol for Verifiable Delegation Across MCP and A2A

digitado ⋅ March 27, 2026

arXiv:2603.24775v1 Announce Type: new Abstract: AI agents increasingly call tools via the Model Context Protocol (MCP) and delegate to other agents via Agent-to-Agent (A2A), yet neither protocol verifies agent identity. A scan of approximately 2,000 MCP servers found all lacked authentication. In our survey, we did not identify a prior implemented protocol that jointly combines public-key verifiable delegation, holder-side attenuation, expressive chained policy, transport bindings across MCP/A2A/HTTP, and provenance-oriented completion records. We introduce Invocation-Bound Capability Tokens (IBCTs), a […]

From Untestable to Testable: Metamorphic Testing in the Age of LLMs

digitado ⋅ March 27, 2026

arXiv:2603.24774v1 Announce Type: new Abstract: This article discusses the challenges of testing software systems with increasingly integrated AI and LLM functionalities. LLMs are powerful but unreliable, and labeled ground truth for testing rarely scales. Metamorphic Testing solves this by turning relations among multiple test executions into executable test oracles.
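The idea of using relations among executions as oracles can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: `classify_sentiment` is a hypothetical keyword-based stand-in for an LLM-backed component, and the two transformations are example metamorphic relations chosen because this toy classifier should be invariant to them.

```python
def classify_sentiment(text: str) -> str:
    """Hypothetical stand-in for the system under test (for example, an LLM call)."""
    positive = {"good", "great", "excellent"}
    negative = {"bad", "poor", "terrible"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def change_case_and_pad(text: str) -> str:
    """Follow-up input: same words, different casing and surrounding whitespace."""
    return f"  {text.upper()}  "


def append_neutral_sentence(text: str) -> str:
    """Follow-up input: the source text plus a sentence carrying no sentiment."""
    return f"{text} This note was added automatically."


def same_label(a: str, b: str) -> bool:
    """Output relation: both executions must yield the same label."""
    return a == b


def holds(source: str, transform, relation) -> bool:
    """Run the system on a source input and on its follow-up input, then check
    the relation between the two outputs. No labeled expected output is used."""
    return relation(classify_sentiment(source), classify_sentiment(transform(source)))


if __name__ == "__main__":
    for source in ["the food was great", "terrible service and poor value"]:
        for transform in (change_case_and_pad, append_neutral_sentence):
            print(f"{transform.__name__}({source!r}): {holds(source, transform, same_label)}")
```

The check never states what the correct label is; it only requires that the source and follow-up executions agree, which is why no ground-truth labels are needed. Which relations are valid depends on the system under test, so the two used here should be read as examples only.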

Bridging the Gap Between Agility and Planning

digitado ⋅ March 27, 2026

arXiv:2603.24773v1 Announce Type: new Abstract: Milestone Driven Agile Execution is a hybrid management framework where the empirical control component of agile development is retained but the prioritization of the backlog is done according to a macro or strategic (milestone) plan that drives the execution of the project. MDAX is method agnostic, in the sense that the development approach is not embedded in the execution mechanism but in the plan that drives it. This allows organizations using it to […]

Evaluating Fine-Tuned LLM Model For Medical Transcription With Small Low-Resource Languages Validated Dataset

digitado ⋅ March 27, 2026

arXiv:2603.24772v1 Announce Type: new Abstract: Clinical documentation is a critical factor for patient safety, diagnosis, and continuity of care. The administrative burden of EHRs is a significant factor in physician burnout. This is a critical issue for low-resource languages, including Finnish. This study aims to investigate the effectiveness of a domain-aligned natural language processing (NLP) large language model for medical transcription in Finnish by fine-tuning LLaMA 3.1-8B on a small validated corpus of simulated clinical conversations by students […]

DRoPS: Dynamic 3D Reconstruction of Pre-Scanned Objects

digitado ⋅ March 27, 2026

arXiv:2603.24770v1 Announce Type: new Abstract: Dynamic scene reconstruction from casual videos has seen recent remarkable progress. Numerous approaches have attempted to overcome the ill-posedness of the task by distilling priors from 2D foundational models and by imposing hand-crafted regularization on the optimized motion. However, these methods struggle to reconstruct scenes from extreme novel viewpoints, especially when highly articulated motions are present. In this paper, we present DRoPS, a novel approach that leverages a static pre-scan of the dynamic […]

Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineering Design

digitado ⋅ March 27, 2026

arXiv:2603.24768v1 Announce Type: new Abstract: The engineering design research community has studied agentic AI systems that use Large Language Model (LLM) agents to automate the engineering design process. However, these systems are prone to some of the same pathologies that plague humans. Just like human designers, LLM design agents can fixate on existing paradigms and fail to explore alternatives when solving design challenges, potentially leading to suboptimal solutions. In this work, we propose (1) a novel Self-Regulation Loop […]

Fine-Tuning A Large Language Model for Systematic Review Screening

digitado ⋅ March 27, 2026

arXiv:2603.24767v1 Announce Type: new Abstract: Systematic reviews traditionally have taken considerable amounts of human time and energy to complete, in part due to the extensive number of titles and abstracts that must be reviewed for potential inclusion. Recently, researchers have begun to explore how to use large language models (LLMs) to make this process more efficient. However, research to date has shown inconsistent results. We posit this is because prompting alone may not provide sufficient context for the […]
