digitado – Page 206

DIALEVAL: Automated Type-Theoretic Evaluation of LLM Instruction Following

digitado ⋅ 5 March 2026

arXiv:2603.03321v1 Announce Type: new Abstract: Evaluating instruction following in Large Language Models requires decomposing instructions into verifiable requirements and assessing their satisfaction, tasks that currently depend on manual annotation and on uniform criteria that do not align with human judgment patterns. We present DIALEVAL, a type-theoretic framework that uses dual LLM agents to automate instruction decomposition into typed predicates and to implement type-specific satisfaction semantics. The framework enforces formal atomicity and independence constraints during automated extraction, then applies differentiated evaluation criteria: semantic equivalence for content […]

technocracy

Thomas Berry: Evening Thoughts: Reflecting on Earth as Sacred Community

digitado ⋅ 9 November 2024

Thomas Berry challenges us to rethink humanity’s story on Earth: we are not masters of a mechanical world but participants in a sacred community of life. Framing ecological collapse as a spiritual crisis, he invites us to embrace a new cosmology of meaning, belonging, and partnership with the planet, and to reclaim our role as the Earth’s “consciousness.”

technocracy

The Hidden Failure Mode in Multi-Agent Review

digitado ⋅ 17 March 2026

A few days ago my review stage did the most dangerous thing a multi‑agent system can do: it looked like it worked. The UI showed progress. The pipeline marched forward. And yet one of the agents had effectively returned “nothing,” which meant my final decision was being computed from a lie, an average that quietly pretended a missing opinion existed. That’s the moment you stop thinking about “LLM evals” and start thinking about defensive systems engineering. This post is […]
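The rest of the post is truncated here, but the defensive idea can be sketched in a few lines. This is a minimal illustration, assuming each review agent yields an optional numeric score; `Review`, `aggregate`, and `MissingReviewError` are hypothetical names, not the author's code. The point is that the aggregator refuses to average when a reviewer is absent or silent, instead of dividing by whoever happened to answer.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


class MissingReviewError(Exception):
    """Raised when the review stage cannot produce an honest decision."""


@dataclass(frozen=True)
class Review:
    agent: str
    score: Optional[float]  # None models an agent that effectively returned "nothing"


def aggregate(reviews: Sequence[Review], expected: int) -> float:
    """Average reviewer scores, failing loudly instead of averaging over gaps."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    # A reviewer that never reported is as bad as one that reported nothing.
    if len(reviews) != expected:
        raise MissingReviewError(f"expected {expected} reviews, got {len(reviews)}")
    missing = [r.agent for r in reviews if r.score is None]
    if missing:
        raise MissingReviewError(f"no usable score from: {', '.join(missing)}")
    scores = [r.score for r in reviews if r.score is not None]  # narrows Optional for type checkers
    return sum(scores) / len(scores)


# Two healthy reviewers average normally...
print(aggregate([Review("security", 1.0), Review("style", 0.5)], expected=2))  # 0.75
# ...but a silent reviewer stops the pipeline instead of skewing the result.
try:
    aggregate([Review("security", 1.0), Review("style", None)], expected=2)
except MissingReviewError as err:
    print("blocked:", err)
```

Raising an exception is one design choice; a real pipeline might instead mark the decision as "incomplete" and route it to a human, as long as the gap is never silently absorbed into the average.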

technocracy

Directly from Alpha to Omega: Controllable End-to-End Vector Floor Plan Generation

digitado ⋅ 25 February 2026

arXiv:2602.20377v1 Announce Type: new Abstract: Automated floor plan generation aims to create residential layouts by arranging rooms within a given boundary, balancing topological, geometric, and aesthetic considerations. Existing methods typically use a multi-step pipeline with intermediate representations to decompose the prediction process into several sub-tasks, limiting model flexibility and imposing predefined solution paths. This often results in unreasonable outputs when applied to data unsuitable for these predefined paths, making it challenging for these methods to match human […]

technocracy

A Cache-Aware Hybrid Sieve Combining Segmentation and Bit-Packing for Fast Prime Generation

digitado ⋅ 29 January 2026

arXiv:2601.19909v1 Announce Type: new Abstract: Prime generation is a fundamental task in cryptography, number theory, and randomized algorithms. While the classical Sieve of Eratosthenes is simple and efficient in theory, its practical performance on modern CPUs is often limited by memory access inefficiencies. This paper introduces a cache-aware hybrid sieve that integrates segmentation, bit-packing, and cache-line-aligned block processing to optimize memory bandwidth and L1 and L2 cache locality. The proposed approach reduces memory usage […]
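The abstract names three ingredients: segmentation, bit-packing, and cache-line-aligned blocks. A minimal Python sketch of the first two follows, assuming an odds-only layout with one bit per odd number; `primes_up_to` and `segment_odds` are hypothetical names, not the paper's code, and pure Python will not exhibit the cache effects the paper measures. The default segment size is chosen only so that each segment's bitmap stays small (4 KiB), which is the property a cache-aware implementation exploits.

```python
import math


def simple_sieve(limit: int) -> list[int]:
    """Plain Sieve of Eratosthenes, used only for the base primes up to sqrt(n)."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            # Zero out every multiple of i from i*i upward in one slice assignment.
            flags[i * i :: i] = bytes(len(range(i * i, limit + 1, i)))
    return [i for i, is_prime in enumerate(flags) if is_prime]


def primes_up_to(n: int, segment_odds: int = 1 << 15) -> list[int]:
    """Segmented, bit-packed (odds-only) sieve: one bit per odd number per segment."""
    if n < 2:
        return []
    primes = [2]
    odd_base = [p for p in simple_sieve(math.isqrt(n)) if p > 2]
    low = 3  # always odd: first value covered by the current segment
    while low <= n:
        high = min(low + 2 * segment_odds, n + 1)  # exclusive upper bound
        count = (high - low + 1) // 2  # odd values low, low+2, ... below high
        bits = bytearray((count + 7) // 8)  # bit set = composite; 4 KiB at the default size
        for p in odd_base:
            if p * p >= high:
                break  # larger base primes have no composites to mark in this segment
            first = ((low + p - 1) // p) * p  # smallest multiple of p >= low
            if first % 2 == 0:
                first += p  # skip to the next odd multiple
            for v in range(max(p * p, first), high, 2 * p):
                idx = (v - low) >> 1
                bits[idx >> 3] |= 1 << (idx & 7)
        for idx in range(count):
            if ((bits[idx >> 3] >> (idx & 7)) & 1) == 0:
                primes.append(low + 2 * idx)
        low += 2 * segment_odds
    return primes


print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Shrinking `segment_odds` (for example to 64) must yield exactly the same primes, which is a cheap way to test the segment-boundary arithmetic.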

technocracy

Bayesian Recovery for Probabilistic Coalition Structures

digitado ⋅ 12 January 2026

arXiv:2601.05273v1 Announce Type: new Abstract: Probabilistic Coalition Structure Generation (PCSG) is NP-hard and can be recast as an $\ell_0$-type sparse recovery problem by representing coalition structures as sparse coefficient vectors over a coalition-incidence design. A natural question is whether standard sparse methods, such as $\ell_1$ relaxations and greedy pursuits, can reliably recover the optimal coalition structure in this setting. We show that the answer is negative in a PCSG-inspired regime where overlapping coalitions generate highly coherent, near-duplicate columns: […]
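The "highly coherent, near-duplicate columns" problem can be quantified with the mutual coherence μ of the design, the largest absolute cosine between two distinct columns. The sketch below assumes a plain 0/1 coalition-incidence matrix with at least two non-empty coalitions (all function names are hypothetical) and computes μ together with the classical sufficient condition k < (1 + 1/μ)/2, under which both ℓ1 minimization and orthogonal matching pursuit recover every k-sparse vector in the noiseless case. It is textbook background, not the paper's analysis.

```python
import itertools
import math


def incidence_columns(n_agents: int, coalitions: list[set[int]]) -> list[list[float]]:
    """One 0/1 column per coalition: entry a is 1.0 if agent a belongs to it."""
    return [[1.0 if a in c else 0.0 for a in range(n_agents)] for c in coalitions]


def mutual_coherence(columns: list[list[float]]) -> float:
    """Largest |cosine| between two distinct columns (columns must be non-zero)."""
    def abs_cosine(u: list[float], v: list[float]) -> float:
        dot = sum(x * y for x, y in zip(u, v))
        return abs(dot) / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))
    return max(abs_cosine(u, v) for u, v in itertools.combinations(columns, 2))


def coherence_recovery_bound(mu: float) -> float:
    """Sparsity levels k < (1 + 1/mu) / 2 are guaranteed exact recovery by l1 and OMP (noiseless)."""
    return 0.5 * (1.0 + 1.0 / mu)


# Two heavily overlapping coalitions plus two small ones over 6 agents.
cols = incidence_columns(6, [{0, 1, 2}, {0, 1, 2, 3}, {3, 4}, {4, 5}])
mu = mutual_coherence(cols)
print(round(mu, 3), round(coherence_recovery_bound(mu), 3))  # 0.866 1.077
```

Here the overlapping pair {0, 1, 2} and {0, 1, 2, 3} gives μ = √3/2, so the guarantee covers only 1-sparse structures. As coalitions become near-duplicates, μ approaches 1 and the bound approaches 1, so eventually not even a single coalition is covered by this guarantee.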

technocracy

Prototype-driven fusion of pathology and spatial transcriptomics for interpretable survival prediction

digitado ⋅ 16 February 2026

arXiv:2602.12441v1 Announce Type: new Abstract: Whole slide images (WSIs) enable weakly supervised prognostic modeling via multiple instance learning (MIL). Spatial transcriptomics (ST) preserves in situ gene expression, providing a spatial molecular context that complements morphology. As paired WSI-ST cohorts scale to population level, leveraging their complementary spatial signals for prognosis becomes crucial; however, principled cross-modal fusion strategies remain limited for this paradigm. To this end, we introduce PathoSpatial, an interpretable end-to-end framework integrating co-registered WSIs and ST to […]

technocracy

Exploration Hacking: Can LLMs Learn to Resist RL Training?

digitado ⋅ 30 April 2026

Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignment. Successful RL relies on sufficient exploration of diverse actions by the model during training, which creates a potential failure mode: a model could strategically alter its exploration during training to influence the subsequent training outcome. In this paper we study this behavior, called exploration hacking. First, we create model organisms of selective RL resistance by fine-tuning LLMs to […]

technocracy

Operational Intelligence: The Real Moat in Venture-Backed Startups

digitado ⋅ 27 March 2026

In 2026, the conversation around startups feels noticeably more disciplined than it did a few years ago. Founders still talk about speed and innovation, but investor conversations now circle back to margin structure, cash visibility, staffing efficiency and capital allocation far earlier in a company’s lifecycle. Teams are leaner by design. AI has compressed certain workflows. Expectations for output per employee are higher. What used to be considered “later-stage operational maturity” is now being scrutinized in seed and […]

technocracy

How to Use GitHub Copilot Code Review in Pull Requests

digitado ⋅ 3 June 2026

GitHub offers several AI tools under the Copilot umbrella that cover your entire development workflow. Copilot can provide an AI-powered code review shortly after you open a pull request on GitHub. Rather than waiting for a teammate, you can add Copilot as a reviewer to receive context-aware feedback. With access to your entire codebase, it delivers actionable suggestions that you can apply in just a few clicks: Pull requests are the standard collaborative workflow provided by GitHub and […]