digitado

Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation

digitado ⋅ 12 January 2026

Post-training algorithms based on deep reinforcement learning can push the limits of robotic models for specific objectives, such as generalizability, accuracy, and robustness. However, Intervention-requiring Failures (IR Failures) (e.g., a robot spilling water or breaking fragile glass) during real-world exploration happen inevitably, hindering the practical deployment of such a paradigm. To tackle this, we introduce Failure-Aware Offline-to-Online Reinforcement Learning (FARL), a new paradigm minimizing failures during real-world reinforcement learning. We create FailureBench, a benchmark that incorporates common failure […]

technocracy

Recursive Knowledge Synthesis for Multi-LLM Systems: Stability Analysis and Tri-Agent Audit Framework

digitado ⋅ 15 January 2026

arXiv:2601.08839v1 Announce Type: new Abstract: This paper presents a tri-agent cross-validation framework for analyzing stability and explainability in multi-model large language systems. The architecture integrates three heterogeneous LLMs (used for semantic generation, analytical consistency checking, and transparency auditing) into a recursive interaction cycle. This design induces Recursive Knowledge Synthesis (RKS), where intermediate representations are continuously refined through mutually constraining transformations irreducible to single-model behavior. Across 47 controlled trials using public-access LLM deployments (October 2025), we evaluated system stability via four […]

technocracy

Prompt Management Using Jinja

digitado ⋅ 24 January 2026

A guide on how Jinja2 templates can be used to manage prompts. Introduction: How do we usually manage prompts? We mostly rely on static prompts embedded within the code with a few custom variables here and there. The standard practice is to retype the whole prompt manually when making a change. As the system scales up, it becomes a […]

technocracy

Statistical description and dimension reduction of continuous time categorical trajectories with multivariate functional principal components

digitado ⋅ 6 February 2026

arXiv:2502.09986v4 Announce Type: replace-cross Abstract: Tools that allow simple representations and comparisons of a set of categorical trajectories are of major interest for statisticians. Without losing any information, we associate with each state a binary random indicator function, taking values in $\{0,1\}$, and turn the problem of statistical description of the categorical trajectories into a multivariate functional principal components analysis. This viewpoint encompasses experimental frameworks where two or more states can be observed simultaneously. The sample paths […]

technocracy

Topological Collapse: Persistent Localization of Cryptographic Preimages in Deep Neural Manifolds incl: Appendix A – C

digitado ⋅ 23 February 2026

We demonstrate deterministic localization of cryptographic hash preimages within specific layers of deep neural networks trained on information-geometric principles. Using a modified Spin-Glass architecture, MD5 and SHA-256 password preimages are consistently identified in layers ES15-ES20 with >90% accuracy for passwords and >85% for hash values. Analysis reveals linear scaling where longer passwords occupy proportionally expanded layer space, with systematic replication in higher-dimensional layers showing exact topological correspondence. Critically, independent network runs with fresh initialization maintain 41.8% information persistence across […]

technocracy

Charting Empirical Laws for LLM Fine-Tuning in Scientific Multi-Discipline Learning

digitado ⋅ 13 February 2026

arXiv:2602.11215v1 Announce Type: new Abstract: While large language models (LLMs) have achieved strong performance through fine-tuning within individual scientific domains, their learning dynamics in multi-disciplinary contexts remain poorly understood, despite the promise of improved generalization and broader applicability through cross-domain knowledge synergy. In this work, we present the first systematic study of multi-disciplinary LLM fine-tuning, constructing a five-discipline corpus and analyzing learning patterns of full fine-tuning, LoRA, LoRA-MoE, and LoRA compositions. Particularly, our study shows that multi-disciplinary learning […]

technocracy

AI’s Diminishing Returns: Avoiding the Overreliance Trap in BFSI

digitado ⋅ 18 February 2026

“I find it hard to see, how there can be a good return on investment given, the current math.” This warning about AI infrastructure economics is not just for tech investors; it has a direct bearing on banking and financial services, where the current wave of AI adoption risks building systemic fragility under the banner of innovation. Unlike earlier generations of AI solutions, which were often developed in-house, fine-tuned on proprietary datasets, […]

technocracy

UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs

digitado ⋅ 25 February 2026

Reinforcement Learning with Verifiable Rewards (RLVR) has improved the reasoning abilities of large language models (LLMs) on mathematics and programming tasks, but standard approaches that optimize single-attempt accuracy can inadvertently suppress response diversity across repeated attempts, narrowing exploration and overlooking underrepresented strategies. We introduce UpSkill, a training-time method that adapts Mutual Information Skill Learning (MISL) to LLMs for optimizing pass@k correctness. We propose a novel reward that we implement within Group Relative Policy Optimization (GRPO): a token-level […]

technocracy

DeepResearch-Slice: Bridging the Retrieval-Utilization Gap via Explicit Text Slicing

digitado ⋅ 8 January 2026

arXiv:2601.03261v1 Announce Type: new Abstract: Deep Research agents predominantly optimize search policies to maximize retrieval probability. However, we identify a critical bottleneck: the retrieval-utilization gap, where models fail to use gold evidence even after it is retrieved, due to context blindness in noisy environments. To bridge this gap, we propose DeepResearch-Slice, a simple yet effective neuro-symbolic framework. Unlike implicit attention, our approach predicts precise span indices to perform a deterministic hard filter before reasoning. Extensive evaluations across six […]

technocracy

GraphAllocBench: A Flexible Benchmark for Preference-Conditioned Multi-Objective Policy Learning

digitado ⋅ 28 January 2026

Preference-Conditioned Policy Learning (PCPL) in Multi-Objective Reinforcement Learning (MORL) aims to approximate diverse Pareto-optimal solutions by conditioning policies on user-specified preferences over objectives. This enables a single model to flexibly adapt to arbitrary trade-offs at run-time by producing a policy on or near the Pareto front. However, existing benchmarks for PCPL are largely restricted to toy tasks and fixed environments, limiting their realism and scalability. To address this gap, we introduce GraphAllocBench, a flexible benchmark built on a […]
