digitado – Page 174

The Dunning-Kruger Effect in Large Language Models: An Empirical Study of Confidence Calibration

digitado ⋅ 12 March 2026

arXiv:2603.09985v1 (announce type: new). Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, yet their ability to accurately assess their own confidence remains poorly understood. We present an empirical study investigating whether LLMs exhibit patterns reminiscent of the Dunning-Kruger effect, a cognitive bias in which individuals with limited competence tend to overestimate their abilities. We evaluate four state-of-the-art models (Claude Haiku 4.5, Gemini 2.5 Pro, Gemini 2.5 Flash, and Kimi K2) across four benchmark datasets […]

technocracy

Former astronaut on lunar spacesuits: “I don’t think they’re great right now”

digitado ⋅ 26 January 2026

Crew members traveling to the lunar surface on NASA’s Artemis missions should be gearing up for a grind. They will wear heavier spacesuits than those worn by the Apollo astronauts, and NASA will ask them to do more than the first Moonwalkers did more than 50 years ago. The Moonwalking experience will amount to an “extreme physical event” for crews selected for the Artemis program’s first lunar landings, a former NASA astronaut told a panel of researchers, physicians, […]

technocracy

Differentiable Cyclic Causal Discovery Under Unmeasured Confounders

digitado ⋅ 26 January 2026

arXiv:2508.08450v3 (announce type: replace-cross). Abstract: Understanding causal relationships between variables is fundamental across scientific disciplines. Most causal discovery algorithms rely on two key assumptions: (i) all variables are observed, and (ii) the underlying causal graph is acyclic. While these assumptions simplify theoretical analysis, they are often violated in real-world systems, such as biological networks. Existing methods that account for confounders either assume linearity or struggle with scalability. To address these limitations, we propose DCCD-CONF, a novel framework for […]

technocracy

Quoting Armin Ronacher

digitado ⋅ 24 May 2026

The most frustrating failure mode right now is that people submit issues that are not in their own voice. They contain an observed problem somewhere, but it has been thrown into a clanker and the clanker reworded it and made a huge mess of it. Typically, it was prompted so badly that the conclusions produced are more often than not inaccurate but always full of confidence. The result is complete guesswork on root causes, fake-minimal repros, suggested implementation […]

technocracy

Gamers react with overwhelming disgust to DLSS 5’s generative AI glow-ups

digitado ⋅ 17 March 2026

Since deep-learning super-sampling (DLSS) launched on 2018’s RTX 2080 cards, gamers have been generally bullish on the technology as a way to effectively use machine-learning upscaling techniques to increase resolutions or juice frame rates in games. With yesterday’s tease of the upcoming DLSS 5, though, Nvidia has crossed a line from mere upscaling into complete lighting and texture overhauls influenced by “generative AI.” The result is a bland, uncanny gloss that has received an instant and overwhelmingly negative […]

technocracy

B-DENSE: Branching For Dense Ensemble Network Learning

digitado ⋅ 17 February 2026

Inspired by non-equilibrium thermodynamics, diffusion models have achieved state-of-the-art performance in generative modeling. However, their iterative sampling nature results in high inference latency. While recent distillation techniques accelerate sampling, they discard intermediate trajectory steps. This sparse supervision leads to a loss of structural information and introduces significant discretization errors. To mitigate this, we propose B-DENSE, a novel framework that leverages multi-branch trajectory alignment. We modify the student architecture to output $K$-fold expanded channels, where each subset corresponds to […]

technocracy

FADE: Selective Forgetting via Sparse LoRA and Self-Distillation

digitado ⋅ 10 February 2026

arXiv:2602.07058v1 (announce type: new). Abstract: Machine Unlearning aims to remove the influence of specific data or concepts from trained models while preserving overall performance, a capability increasingly required by data protection regulations and responsible AI practices. Despite recent progress, unlearning in text-to-image diffusion models remains challenging due to high computational costs and the difficulty of balancing effective forgetting with retention of unrelated concepts. We introduce FADE (Fast Adapter for Data Erasure), a two-stage unlearning method for image generation […]

technocracy

Social Engineering Attacks: A Systemisation of Knowledge on People Against Humans

digitado ⋅ 9 January 2026

arXiv:2601.04215v1 (announce type: new). Abstract: Our systematisation of knowledge on Social Engineering Attacks (SEAs) identifies the human, organisational, and adversarial dimensions of cyber threats. It addresses the growing risks posed by SEAs, which are highly relevant in the context of physical cyber places, such as travellers at airports and residents in smart cities, and synthesises findings from peer-reviewed studies and industry and government reports to inform effective countermeasures that can be embedded into future smart city strategies. SEAs increasingly sidestep technical […]

technocracy

OpenPRC: A Unified Open-Source Framework for Physics-to-Task Evaluation in Physical Reservoir Computing

digitado ⋅ 10 April 2026

arXiv:2604.07423v1 (announce type: new). Abstract: Physical Reservoir Computing (PRC) leverages the intrinsic nonlinear dynamics of physical substrates (mechanical, optical, spintronic, and beyond) as fixed computational reservoirs, offering a compelling paradigm for energy-efficient and embodied machine learning. However, the practical workflow for developing and evaluating PRC systems remains fragmented: existing tools typically address only isolated parts of the pipeline, such as substrate-specific simulation, digital reservoir benchmarking, or readout training. What is missing is a unified framework that can represent […]

technocracy

Memory Retention Is Not Enough to Master Memory Tasks in Reinforcement Learning

digitado ⋅ 21 January 2026

Effective decision-making in the real world depends on memory that is both stable and adaptive: environments change over time, and agents must retain relevant information over long horizons while also updating or overwriting outdated content when circumstances shift. Existing Reinforcement Learning (RL) benchmarks and memory-augmented agents focus primarily on retention, leaving the equally critical ability of memory rewriting largely unexplored. To address this gap, we introduce a benchmark that explicitly tests continual memory updating under partial observability, i.e. […]
