technocracy

Evaluating Reward Model Generalization via Pairwise Maximum Discrepancy Competitions

digitado ⋅ 27 January 2026

arXiv:2601.16987v1 (new submission). Abstract: Reward models (RMs) are central to aligning large language models, yet their practical effectiveness hinges on generalization to unseen prompts and shifting distributions. Most existing RM evaluations rely on static, pre-annotated preference datasets, which provide limited coverage and often fail to faithfully assess generalization in open-world settings. We introduce Pairwise Maximum Discrepancy Competition (PMDC), a dynamic and annotation-efficient framework for evaluating RM generalization using a large, unlabeled, open-domain prompt pool. PMDC actively selects […]

Detecting Batch Heterogeneity via Likelihood Clustering

digitado ⋅ 16 January 2026

arXiv:2601.09758v1 (cross-list). Abstract: Batch effects represent a major confounder in genomic diagnostics. In copy number variant (CNV) detection from NGS, many algorithms compare read depth between test samples and a reference sample, assuming they are process-matched. When this assumption is violated, with causes ranging from reagent lot changes to multi-site processing, the reference becomes inappropriate, introducing false CNV calls or masking true pathogenic variants. Detecting such heterogeneity before downstream analysis is critical for reliable clinical interpretation. […]

IWLV-Ramayana: A Sarga-Aligned Parallel Corpus of Valmiki’s Ramayana Across Indian Languages

digitado ⋅ 16 April 2026

arXiv:2604.13078v1 (new submission). Abstract: The Ramayana is among the most influential literary traditions of South and Southeast Asia, transmitted across numerous linguistic and cultural contexts over two millennia. Despite extensive scholarship on regional Ramayana traditions, computational resources enabling systematic cross-linguistic analysis remain limited. This paper introduces the IWLV Ramayana Corpus, a structured parallel corpus aligning Valmiki’s Ramayana across multiple Indian languages at the level of the sarga (chapter). The corpus currently includes complete English and Malayalam layers, […]

Right to History: A Sovereignty Kernel for Verifiable AI Agent Execution

digitado ⋅ 25 February 2026

arXiv:2602.20214v1 (new submission). Abstract: AI agents increasingly act on behalf of humans, yet no existing system provides a tamper-evident, independently verifiable record of what they did. As regulations such as the EU AI Act begin mandating automatic logging for high-risk AI systems, this gap carries concrete consequences, especially for agents running on personal hardware, where no centralized provider controls the log. Extending Floridi’s informational rights framework from data about individuals to actions performed on their behalf, […]

Towards Fair and Efficient De-identification: Quantifying the Efficiency and Generalizability of De-identification Approaches

digitado ⋅ 19 February 2026

arXiv:2602.15869v1 (new submission). Abstract: Large language models (LLMs) have shown strong performance on clinical de-identification, the task of identifying sensitive identifiers to protect privacy. However, previous work has not examined their generalizability between formats, cultures, and genders. In this work, we systematically evaluate fine-tuned transformer models (BERT, ClinicalBERT, ModernBERT), small LLMs (Llama 1-8B, Qwen 1.5-7B), and large LLMs (Llama-70B, Qwen-72B) at de-identification. We show that smaller models achieve comparable performance while substantially reducing inference cost, making them […]

Investigating Bystander Privacy in Chinese Smart Home Apps

digitado ⋅ 11 February 2026

arXiv:2602.09254v1 (new submission). Abstract: Bystander privacy in smart homes has been widely studied in Western contexts, yet it remains underexplored in non-Western countries such as China. In this study, we analyze 49 Chinese smart home apps using a mixed-methods approach, including privacy policy review, UX/UI evaluation, and assessment of Apple App Store privacy labels. While most apps nominally comply with national regulations, we identify significant gaps between written policies and actual implementation. Our traceability analysis highlights inconsistencies […]

Using a Knowledge Graph to Generate Predictive Models for the Oscars

digitado ⋅ 4 March 2026

A layered semantic data foundation for agent-driven forecasting. Git repo: https://github.com/SteveHedden/fckg (the full ontology, data pipelines, and modeling code are available in the repository). Using AI to predict the Oscars is easy. Building the infrastructure that lets anyone (human or agent) produce forecasts is harder. This post is not about a single model that forecasts winners. It’s about constructing a reusable semantic data foundation that makes prediction, analysis, and reasoning straightforward. The Oscars are simply a case […]

Grok 4.2 vs. Sonnet 4.6: Early Impressions From Hands-On Testing

digitado ⋅ 24 February 2026

We got new model releases from xAI and Anthropic last week, and I wanted to give my quick impressions to help you know if/when you should care. This is just after a half day of testing, so my impressions may change, but… we’re usually locked in on the vibe pretty quickly. By the way, even if you aren’t interested in Grok, take a read of the analysis below — we’ll talk about subagent systems in a way that will […]

Blockchain Federated Learning for Sustainable Retail: Reducing Waste through Collaborative Demand Forecasting

digitado ⋅ 4 February 2026

Effective demand forecasting is crucial for reducing food waste. However, data privacy concerns often hinder collaboration among retailers, limiting the potential for improved predictive accuracy. In this study, we explore the application of Federated Learning (FL) in Sustainable Supply Chain Management (SSCM), with a focus on the grocery retail sector dealing with perishable goods. We develop a baseline predictive model for demand forecasting and waste assessment in an isolated retailer scenario. Subsequently, we introduce a Blockchain-based FL model, […]

FHIRPath-QA: Executable Question Answering over FHIR Electronic Health Records

digitado ⋅ 2 March 2026

arXiv:2602.23479v1 (new submission). Abstract: Though patients are increasingly granted digital access to their electronic health records (EHRs), existing interfaces may not support precise, trustworthy answers to patient-specific questions. Large language models (LLMs) show promise in clinical question answering (QA), but retrieval-based approaches are computationally inefficient, prone to hallucination, and difficult to deploy over real-life EHRs. In this work, we introduce FHIRPath-QA, the first open dataset and benchmark for patient-specific QA that includes open-standard FHIRPath queries over real-world […]
