technocracy

Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile Edge

digitado ⋅ 20 April 2026

arXiv:2604.15357v1. Abstract: Precise estimation of model inference latency is crucial for time-critical mobile edge applications, enabling devices to calculate latency margins against deadlines and trade them for enhanced model performance or resource savings. However, the ubiquity of Dynamic Voltage and Frequency Scaling (DVFS) renders traditional static profiling invalid in real-world deployments, as inference latency fluctuates with varying processor (CPU and GPU) frequencies. While extensive profiling across frequency combinations is theoretically possible, it is prohibitively expensive, […]

technocracy

Adaptive Test-Time Compute Allocation via Learned Heuristics over Categorical Structure

digitado ⋅ 5 February 2026

arXiv:2602.03975v1. Abstract: Test-time computation has become a primary driver of progress in large language model (LLM) reasoning, but it is increasingly bottlenecked by expensive verification. In many reasoning systems, a large fraction of verifier calls are spent on redundant or unpromising intermediate hypotheses. We study reasoning under a verification-cost-limited setting and ask how verification effort should be allocated across intermediate states. We propose a state-level selective verification framework that combines (i) deterministic feasibility gating over […]

technocracy

LinuxArena: A Control Setting for AI Agents in Live Production Software Environments

digitado ⋅ 20 April 2026

arXiv:2604.15384v1. Abstract: We introduce LinuxArena, a control setting in which agents operate directly on live, multi-service production environments. LinuxArena contains 20 environments, 1,671 main tasks representing legitimate software engineering work, and 184 side tasks representing safety failures such as data exfiltration and backdooring, making it the largest and most diverse control setting for software engineering to date. We validate that LinuxArena is useful for control research by running sabotage evaluations, which measure whether attackers can complete […]

technocracy

Multi-Robot Decentralized Collaborative SLAM in Planetary Analogue Environments: Dataset, Challenges, and Lessons Learned

digitado ⋅ 30 January 2026

arXiv:2601.21063v1. Abstract: Decentralized collaborative simultaneous localization and mapping (C-SLAM) is essential to enable multirobot missions in unknown environments without relying on preexisting localization and communication infrastructure. This technology is anticipated to play a key role in the exploration of the Moon, Mars, and other planets. In this article, we share insights and lessons learned from C-SLAM experiments involving three robots operating on a Mars analogue terrain and communicating over an ad hoc network. We examine […]

technocracy

Can coding agents relicense open source through a “clean room” implementation of code?

digitado ⋅ 5 March 2026

Over the past few months it’s become clear that coding agents are extraordinarily good at building a weird version of a “clean room” implementation of code. The most famous version of this pattern is when Compaq created a clean-room clone of the IBM BIOS back in 1982. They had one team of engineers reverse engineer the BIOS to create a specification, then handed that specification to another team to build a new ground-up version. This process used to […]

technocracy

Learning Machine Learning

digitado ⋅ 16 September 2018

Machine learning is one of the most interesting and fastest-advancing areas of engineering and computer science. It shows the potential of machines in the modern world. Machine learning is essentially making a machine learn to perform different tasks by feeding it enormous amounts of data. To accomplish this, multiple algorithms are used so that machines can learn and adapt, performing a task more easily as new inputs arrive. One of these algorithms is the neural network. A Neural […]

technocracy

All of DOGE’s work could be undone as lawsuit against Musk proceeds

digitado ⋅ 24 March 2026

Elon Musk must defend himself against a lawsuit alleging that he unlawfully seized too much power as the leader of the Department of Government Efficiency (DOGE), a judge ruled Monday. According to the plaintiffs, Musk needed Senate confirmation before directing DOGE on drastic actions like eliminating agencies, mass firings, and steep budget cuts. Allegedly going far beyond the authority granted in President Donald Trump’s most expansive DOGE executive orders, Musk took every inch of power granted and then […]

technocracy

MongoDB vs ScyllaDB: Architecture Comparison

digitado ⋅ 26 January 2026

benchANT compares MongoDB and ScyllaDB architectures, with a focus on what the differences mean for performance and scalability. When choosing a NoSQL database, the options can be overwhelming. One of the most popular choices is MongoDB, known for its ease of use. But the highly performance-oriented ScyllaDB is one of the rising challengers. This benchANT report takes a closer technical look at both databases, comparing their architectures from an independent, technical angle. Both MongoDB and ScyllaDB promise a highly available, performant […]

technocracy

StratRAG: A Multi-Hop Retrieval Evaluation Dataset for Retrieval-Augmented Generation Systems

digitado ⋅ 28 April 2026

arXiv:2604.22757v1. Abstract: We introduce StratRAG, an open-source retrieval evaluation dataset for benchmarking Retrieval-Augmented Generation (RAG) systems on multi-hop reasoning tasks under realistic, noisy document-pool conditions. Derived from HotpotQA (distractor setting), StratRAG comprises 2,200 examples across three question types (bridge, comparison, and yes-no), each paired with a pool of 15 candidate documents containing exactly 2 gold documents and 13 topically related distractors. We benchmark three retrieval strategies: BM25, dense retrieval (all-MiniLM-L6-v2), and hybrid […]

technocracy

Althea: Human-AI Collaboration for Fact-Checking and Critical Reasoning

digitado ⋅ 13 February 2026

arXiv:2602.11161v1. Abstract: The web’s information ecosystem demands fact-checking systems that are both scalable and epistemically trustworthy. Automated approaches offer efficiency but often lack transparency, while human verification remains slow and inconsistent. We introduce Althea, a retrieval-augmented system that integrates question generation, evidence retrieval, and structured reasoning to support user-driven evaluation of online claims. On the AVeriTeC benchmark, Althea achieves a Macro-F1 of 0.44, outperforming standard verification pipelines and improving discrimination between supported and refuted claims. […]
