digitado

MedPI: Evaluating AI Systems in Medical Patient-facing Interactions

digitado ⋅ 9 January 2026

arXiv:2601.04195v1 (announce type: new). Abstract: We present MedPI, a high-dimensional benchmark for evaluating large language models (LLMs) in patient-clinician conversations. Unlike single-turn question-answer (QA) benchmarks, MedPI evaluates the medical dialogue across 105 dimensions comprising the medical process, treatment safety, treatment outcomes and doctor-patient communication across a granular, accreditation-aligned rubric. MedPI comprises five layers: (1) Patient Packets (synthetic EHR-like ground truth); (2) an AI Patient instantiated through an LLM with memory and affect; (3) a Task Matrix spanning encounter […]

technocracy

Bridging the Gap: Empowering Small Models in Reliable OpenACC-based Parallelization via GEPA-Optimized Prompting

digitado ⋅ 15 January 2026

arXiv:2601.08884v1 (announce type: new). Abstract: OpenACC lowers the barrier to GPU offloading, but writing high-performing pragmas remains complex, requiring deep domain expertise in memory hierarchies, data movement, and parallelization strategies. Large Language Models (LLMs) present a promising potential solution for automated parallel code generation, but naive prompting often results in syntactically incorrect directives, uncompilable code, or performance that fails to exceed CPU baselines. We present a systematic prompt optimization approach to enhance OpenACC pragma generation without the prohibitive […]

Attribution Techniques for Mitigating Hallucinated Information in RAG Systems: A Survey

digitado ⋅ 29 January 2026

arXiv:2601.19927v1 (announce type: new). Abstract: Large Language Model (LLM)-based question answering (QA) systems play a critical role in modern AI, demonstrating strong performance across various tasks. However, LLM-generated responses often suffer from hallucinations, unfaithful statements lacking reliable references. Retrieval-Augmented Generation (RAG) frameworks enhance LLM responses by incorporating external references but also introduce new forms of hallucination due to complex interactions between the retriever and generator. To address these challenges, researchers have explored attribution-based techniques that ensure responses are […]

Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization

digitado ⋅ 21 January 2026

arXiv:2601.12707v1 (announce type: cross). Abstract: Estimating the unknown reward functions driving agents’ behaviors is of central interest in inverse reinforcement learning and game theory. To tackle this problem, we develop a unified framework for reward function recovery in two-player zero-sum matrix games and Markov games with entropy regularization, where we aim to reconstruct the underlying reward functions given observed players’ strategies and actions. This task is challenging due to the inherent ambiguity of inverse problems, the non-uniqueness of […]
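The ambiguity mentioned in this abstract can be illustrated with a small sketch. It is not the paper's method; it assumes Shannon-entropy regularization with a temperature `tau`, under which each player's regularized best response is a softmax over expected payoffs. The function names, the matching-pennies payoff matrix, and the strategies `x` and `y` are all made up for illustration. Adding a constant to every payoff leaves both responses unchanged, so observed strategies alone cannot pin down that constant.

```python
import math

def softmax(values, tau):
    """Softmax of values / tau, shifted by the max for numerical stability."""
    m = max(values)
    exps = [math.exp((v - m) / tau) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def row_response(A, y, tau):
    """Entropy-regularized best response of the row (maximizing) player to y."""
    payoffs = [sum(a * q for a, q in zip(row, y)) for row in A]
    return softmax(payoffs, tau)

def col_response(A, x, tau):
    """Entropy-regularized best response of the column (minimizing) player to x."""
    n_rows, n_cols = len(A), len(A[0])
    negated = [-sum(A[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols)]
    return softmax(negated, tau)

# Hypothetical example: matching pennies and the same game shifted by +5.
A = [[1.0, -1.0], [-1.0, 1.0]]
shifted = [[a + 5.0 for a in row] for row in A]
x, y, tau = [0.6, 0.4], [0.3, 0.7], 0.5

resp_row = row_response(A, y, tau)
resp_row_shifted = row_response(shifted, y, tau)
resp_col = col_response(A, x, tau)
resp_col_shifted = col_response(shifted, x, tau)
```

Here `resp_row` and `resp_row_shifted` agree up to floating-point error, as do the two column responses, which is one simple source of non-uniqueness in reward recovery.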

oculomix: Hierarchical Sampling for Retinal-Based Systemic Disease Prediction

digitado ⋅ 29 January 2026

arXiv:2601.19939v1 (announce type: new). Abstract: Oculomics – the concept of predicting systemic diseases, such as cardiovascular disease and dementia, through retinal imaging – has advanced rapidly due to the data efficiency of transformer-based foundation models like RETFound. Image-level mixed sample data augmentations, such as CutMix and MixUp, are frequently used for training transformers, yet these techniques perturb patient-specific attributes, such as medical comorbidity and clinical factors, since they only account for images and labels. To address this limitation, […]
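For readers unfamiliar with MixUp, the following is a minimal sketch of the standard image-level technique the abstract refers to, not of the paper's proposed method. The function name, the `alpha` default, and the toy images are assumptions chosen for illustration. Note that only pixel values and labels enter the blend, so any patient-level attributes attached to the two source images are not represented in the mixed sample.

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Blend two (image, one-hot label) pairs with a single Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)
    # The same weight is applied to the pixels and to the labels.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# Two flattened toy "images" with one-hot labels (made-up values).
img_a, lab_a = [0.0, 0.2, 0.4, 0.6], [1.0, 0.0]
img_b, lab_b = [1.0, 0.8, 0.6, 0.4], [0.0, 1.0]

mixed_img, mixed_lab, lam = mixup(img_a, lab_a, img_b, lab_b, rng=random.Random(0))
```

The mixed label is a soft label whose entries sum to one, with the first entry equal to the sampled weight `lam`.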

Constrained Object Hierarchies as a Universal World Model for AGI

digitado ⋅ 29 January 2026

Constrained Object Hierarchies (COH) presents a neuroscience-grounded theoretical framework for artificial general intelligence that models intelligent systems as hierarchical structures of objects constrained by multi-domain rules. This paper demonstrates how COH, implemented through the General Intelligent System Modelling Language (GISMOL), serves as a universal world model capable of representing complex world systems across diverse domains including healthcare, finance, manufacturing, climate science, education, and urban governance. We present a comprehensive analysis of six complex world systems modelled using the […]

Best Financial APIs for 2026

digitado ⋅ 27 January 2026

The best financial APIs in 2026 empower developers, analysts, and fintech startups with real-time and historical data for stocks, forex, crypto, and commodities. From Marketstack’s global market coverage to Intrinio’s financial statements, these APIs streamline trading, portfolio tracking, and AI-powered analysis, making enterprise-grade market insights accessible to all.

Linguistic Analysis of Texts

digitado ⋅ 20 June 2016

Not long ago, Google released their new parser, oddly named Parsey McParseface. For a couple of days, popular media was swamped with announcements about Google solving all AI problems with their new magical software that understands language [e.g. 1, 2]. Well, that’s not quite what it does. In this post, I will explain the different steps applied in analyzing sentence structure. These are usually used as a preprocessing step for higher-level tasks that try to understand the meaning […]

On the Nonasymptotic Scaling Guarantee of Hyperparameter Estimation in Inhomogeneous, Weakly-Dependent Complex Network Dynamical Systems

digitado ⋅ 23 January 2026

arXiv:2601.15603v1 (announce type: cross). Abstract: Hierarchical Bayesian models are increasingly used in large, inhomogeneous complex network dynamical systems by modeling parameters as draws from a hyperparameter-governed distribution. However, theoretical guarantees for these estimates as the system size grows have been lacking. A critical concern is that hyperparameter estimation may diverge for larger networks, undermining the model’s reliability. Formulating the system’s evolution from a measure transport perspective, we propose a theoretical framework for estimating hyperparameters with mean-type observations, which […]

Controlling Underestimation Bias in Constrained Reinforcement Learning for Safe Exploration

digitado ⋅ 17 January 2026

Constrained Reinforcement Learning (CRL) aims to maximize cumulative rewards while satisfying constraints. However, existing CRL algorithms often encounter significant constraint violations during training, limiting their applicability in safety-critical scenarios. In this paper, we identify the underestimation of the cost value function as a key factor contributing to these violations. To address this issue, we propose the Memory-driven Intrinsic Cost Estimation (MICE) method, which introduces intrinsic costs to mitigate underestimation and control bias to promote safer exploration. Inspired by […]
