digitado

Case-Aware LLM-as-a-Judge Evaluation for Enterprise-Scale RAG Systems

digitado ⋅ 25 February 2026

arXiv:2602.20379v1 Announce Type: new Abstract: Enterprise Retrieval-Augmented Generation (RAG) assistants operate in multi-turn, case-based workflows such as technical support and IT operations, where evaluation must reflect operational constraints, structured identifiers (e.g., error codes, versions), and resolution workflows. Existing RAG evaluation frameworks are primarily designed for benchmark-style or single-turn settings and often fail to capture enterprise-specific failure modes such as case misidentification, workflow misalignment, and partial resolution across turns. We present a case-aware LLM-as-a-Judge evaluation framework for enterprise multi-turn […]


technocracy

An AI Monkey Gets Grapes for Sure — Sphere Neural Networks for Reliable Decision-Making

digitado ⋅ 6 January 2026

arXiv:2601.00142v1 Announce Type: new Abstract: This paper compares three methodological categories of neural reasoning: LLM reasoning, supervised learning-based reasoning, and explicit model-based reasoning. LLMs remain unreliable and struggle with simple decision-making that animals can master without extensive corpora training. Through disjunctive syllogistic reasoning testing, we show that reasoning via supervised learning is less appealing than reasoning via explicit model construction. Concretely, we show that an Euler Net trained to achieve 100.00% in classic syllogistic reasoning can be trained […]


technocracy

Contextual Intelligence: The Next Leap for Reinforcement Learning

digitado ⋅ 7 April 2026

arXiv:2604.02348v1 Announce Type: new Abstract: Reinforcement learning (RL) has produced spectacular results in games, robotics, and continuous control. Yet, despite these successes, learned policies often fail to generalize beyond their training distribution, limiting real-world impact. Recent work on contextual RL (cRL) shows that exposing agents to environment characteristics — contexts — can improve zero-shot transfer. So far, the community has treated context as a monolithic, static observable, an approach that constrains the generalization capabilities of RL agents. To […]


technocracy

The Discovery Gap: Why ChatGPT Knows Your Startup But Won’t Recommend It

digitado ⋅ 8 January 2026

TL;DR: I tested 112 Product Hunt startups with 2,240 queries across ChatGPT and Perplexity. The results challenge conventional wisdom about “Generative Engine Optimization” (GEO). The Discovery Gap: ChatGPT recognizes 99% of products directly but recommends only 3% organically (30:1 ratio). GEO doesn’t work (yet): Zero correlation between GEO optimization and ChatGPT discovery. Traditional SEO wins: Backlinks (r=+0.32) and Reddit presence (r=+0.40) are the strongest predictors. Full paper: arXiv:2601.00912. Code & Data: GitHub. The Problem: ChatGPT Visibility for Startups […]


technocracy

Palantir employees are talking about company’s “descent into fascism”

digitado ⋅ 25 April 2026

It took just a few months of President Donald Trump’s second term for Palantir employees to question their company’s commitments to civil liberties. Last fall, Palantir seemed to become the technological backbone of Trump’s immigration enforcement machinery, providing software identifying, tracking, and helping deport immigrants on behalf of the Department of Homeland Security, when current and former employees started ringing the alarm. Around that time, two former employees reconnected by phone. Right as they picked up the call, […]


technocracy

Wasserstein GAN

digitado ⋅ 2 May 2019

[Editor’s Note: We are especially proud of this one. James and his group went above and beyond the call of duty and made a guide from their class that we feel is especially superb for understanding their target paper. Moving forward, he has forced us to up our game because it will be hard to release a curriculum that is not as strong as this one. We highly recommend earnestly studying with this at hand.] A number of […]


technocracy

On the role of memorization in learned priors for geophysical inverse problems

digitado ⋅ 23 March 2026

arXiv:2603.19629v1 Announce Type: new Abstract: Learned priors based on deep generative models offer data-driven regularization for seismic inversion, but training them requires a dataset of representative subsurface models — a resource that is inherently scarce in geoscience applications. Since the training objective of most generative models can be cast as maximum likelihood on a finite dataset, any such model risks converging to the empirical distribution — effectively memorizing the training examples rather than learning the underlying geological distribution. […]


technocracy

Power Consumption Patterns Using Telemetry Data

digitado ⋅ 27 February 2026

arXiv:2602.22339v1 Announce Type: new Abstract: This paper examines the analysis of package power consumption using Intel’s telemetry data. It challenges the prevailing belief that hardware choice is the primary determinant of a device’s power consumption and instead emphasizes the significant role of user behavior. The paper includes two sections: Exploratory Data Analysis (EDA) and a linear model for power consumption. The EDA section provides valuable insights from Intel’s telemetry data, comparing power consumption across countries, with a specific […]


technocracy

Algorithm Selection with Zero Domain Knowledge via Text Embeddings

digitado ⋅ 23 April 2026

arXiv:2604.19753v1 Announce Type: new Abstract: We propose a feature-free approach to algorithm selection that replaces hand-crafted instance features with pretrained text embeddings. Our method, ZeroFolio, proceeds in three steps: it reads the raw instance file as plain text, embeds it with a pretrained embedding model, and selects an algorithm via weighted k-nearest neighbors. The key to our approach is the observation that pretrained embeddings produce representations that distinguish problem instances without any domain knowledge or task-specific training. This […]


technocracy

Reinforcement Learning with Function Approximation for Non-Markov Processes

digitado ⋅ 6 January 2026

arXiv:2601.00151v1 Announce Type: new Abstract: We study reinforcement learning methods with linear function approximation under non-Markov state and cost processes. We first consider the policy evaluation method and show that the algorithm converges under suitable ergodicity conditions on the underlying non-Markov processes. Furthermore, we show that the limit corresponds to the fixed point of a joint operator composed of an orthogonal projection and the Bellman operator of an auxiliary Markov decision process. For Q-learning with linear function approximation, […]
