digitado

IslamicLegalBench: Evaluating LLMs Knowledge and Reasoning of Islamic Law Across 1,200 Years of Islamic Pluralist Legal Traditions

digitado ⋅ 26 February 2026

arXiv:2602.21226v1 Announce Type: new Abstract: As millions of Muslims turn to LLMs like GPT, Claude, and DeepSeek for religious guidance, a critical question arises: Can these AI systems reliably reason about Islamic law? We introduce IslamicLegalBench, the first benchmark evaluating LLMs across seven schools of Islamic jurisprudence, with 718 instances covering 13 tasks of varying complexity. Evaluation of nine state-of-the-art models reveals major limitations: the best model achieves only 68% correctness with 21% hallucination, while several models fall […]

technocracy

Opus 4.6 and Codex 5.3

digitado ⋅ 5 February 2026

Two major new model releases today, within about 15 minutes of each other. Anthropic released Opus 4.6. Here’s its pelican: OpenAI released GPT-5.3-Codex, albeit only via their Codex app, not yet in their API. Here’s its pelican: I’ve had a bit of preview access to both of these models and to be honest I’m finding it hard to find a good angle to write about them – they’re both really good, but so were their predecessors Codex 5.2 […]

technocracy

Authority Signals in AI Cited Health Sources: A Framework for Evaluating Source Credibility in ChatGPT Responses

digitado ⋅ 27 January 2026

arXiv:2601.17109v1 Announce Type: new Abstract: Health information seeking has fundamentally changed since the onset of Large Language Models (LLMs), with nearly one third of ChatGPT’s 800 million users asking health questions weekly. Understanding the sources of those AI-generated responses is vital, as health organizations and providers are also investing in digital strategies to organically improve their ranking, reach and visibility in LLM systems like ChatGPT. As AI search optimization strategies are gaining maturity, this study introduces an […]

technocracy

Sparse Additive Model Pruning for Order-Based Causal Structure Learning

digitado ⋅ 18 February 2026

arXiv:2602.15306v1 Announce Type: new Abstract: Causal structure learning, also known as causal discovery, aims to estimate causal relationships between variables in the form of a causal directed acyclic graph (DAG) from observational data. One of the major frameworks is the order-based approach, which first estimates a topological order of the underlying DAG and then prunes spurious edges from the fully connected DAG induced by the estimated topological order. Previous studies often focus on the former ordering step because it […]

technocracy

Decoupling Dynamical Richness from Representation Learning: Towards Practical Measurement

digitado ⋅ 3 March 2026

arXiv:2410.04264v3 Announce Type: replace Abstract: Dynamic feature transformation (the rich regime) does not always align with predictive performance (better representation), yet accuracy is often used as a proxy for richness, limiting analysis of their relationship. We propose a computationally efficient, performance-independent metric of richness grounded in the low-rank bias of rich dynamics, which recovers neural collapse as a special case. The metric is empirically more stable than existing alternatives and captures known lazy-to-rich transitions (e.g., grokking) without relying […]

technocracy

NVIDIA Releases DreamDojo: An Open-Source Robot World Model Trained on 44,711 Hours of Real-World Human Video Data

digitado ⋅ 21 February 2026

Building simulators for robots has been a long-term challenge. Traditional engines require manual coding of physics and perfect 3D models. NVIDIA is changing this with DreamDojo, a fully open-source, generalizable robot world model. Instead of using a physics engine, DreamDojo ‘dreams’ the results of robot actions directly in pixels. https://arxiv.org/pdf/2602.06949 Scaling Robotics with 44k+ Hours of Human Experience: The biggest hurdle for AI in robotics is data. Collecting robot-specific data is expensive and slow. DreamDojo solves this […]

technocracy

Qwen3.5-9B: A Small Model With a Massive Context Window

digitado ⋅ 5 March 2026

Qwen3.5-9B is a compact vision-capable LLM with 262K native context, MoE efficiency, strong math/coding, and 201-language support.

technocracy

Learning to Collaborate via Structures: Cluster-Guided Item Alignment for Federated Recommendation

digitado ⋅ 25 February 2026

Federated recommendation facilitates collaborative model training across distributed clients while keeping sensitive user interaction data local. Conventional approaches typically rely on synchronizing high-dimensional item representations between the server and clients. This paradigm implicitly assumes that precise geometric alignment of embedding coordinates is necessary for collaboration across clients. We posit that establishing relative semantic relationships among items is more effective than enforcing shared representations. Specifically, global semantic relations serve as structural constraints for items. Within these constraints, the framework […]

technocracy

Scalable Preconditioners for the Pseudo-4D DFN Lithium-ion Battery Model

digitado ⋅ 10 February 2026

arXiv:2602.07225v1 Announce Type: new Abstract: The pseudo-4D Doyle-Fuller-Newman (DFN) model enables predictive simulation of lithium-ion batteries with three-dimensional electrode architectures and particle-scale diffusion, extending the standard pseudo-2D (P2D) formulation to fully resolve cell geometry. This leads to large, nonlinear systems with strong coupling across multiple physical scales, posing significant challenges for scalable numerical solution. We introduce block-structured preconditioning strategies that exploit the mathematical properties of the coupled system, employing multigrid techniques for electrode-level operators and localized solvers for […]

technocracy

Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning

digitado ⋅ 12 February 2026

arXiv:2510.03534v4 Announce Type: replace-cross Abstract: We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro River as a representative use case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for […]
