ReDiPrune: Relevance-Diversity Pre-Projection Token Pruning for Efficient Multimodal LLMs

digitado ⋅ 27 March 2026

arXiv:2603.24680v1. Abstract: Recent multimodal large language models are computationally expensive because Transformers must process a large number of visual tokens. We present ReDiPrune, a training-free token pruning method applied before the vision-language projector, where visual features remain rich and discriminative. Unlike post-projection pruning methods that operate on compressed representations, ReDiPrune selects informative tokens directly from vision encoder outputs, preserving fine-grained spatial and semantic cues. Each token is scored by a lightweight rule that jointly consider […]

technocracy

From PDF to RAG-Ready: Evaluating Document Conversion Frameworks for Domain-Specific Question Answering

digitado ⋅ 8 April 2026

arXiv:2604.04948v1. Abstract: Retrieval-Augmented Generation (RAG) systems depend critically on the quality of document preprocessing, yet no prior study has evaluated PDF processing frameworks by their impact on downstream question-answering accuracy. We address this gap through a systematic comparison of four open-source PDF-to-Markdown conversion frameworks (Docling, MinerU, Marker, and DeepSeek OCR) across 19 pipeline configurations for extracting text and other contents from PDFs, varying the conversion tool, cleaning transformations, splitting strategy, and metadata enrichment. Evaluation was […]

technocracy

Safety Generalization Under Distribution Shift in Safe Reinforcement Learning: A Diabetes Testbed

digitado ⋅ 28 January 2026

Safe Reinforcement Learning (RL) algorithms are typically evaluated under fixed training conditions. We investigate whether training-time safety guarantees transfer to deployment under distribution shift, using diabetes management as a safety-critical testbed. We benchmark safe RL algorithms on a unified clinical simulator and reveal a safety generalization gap: policies satisfying constraints during training frequently violate safety requirements on unseen patients. We demonstrate that test-time shielding, which filters unsafe actions using learned dynamics models, effectively restores safety across algorithms and […]

technocracy

Expert Available: Why Do We Love Upsets in Sports But Fear Them in Tech?

digitado ⋅ 8 December 2025

As sports fans throughout the country gear up for rivalries and the playoffs this holiday season, Cato Institute senior fellow in technology policy Jennifer Huddleston’s new blog post, “What Sports Can Teach Us About Competition Policy,” compares competition in sports to competition in the technology market: “Competition doesn’t only exist on the field. It also exists in the market. So why then do we seem not to greet technology disruptors’ success with the same sense of pride and […]

technocracy

Learning to Explore: Policy-Guided Outlier Synthesis for Graph Out-of-Distribution Detection

digitado ⋅ 28 February 2026

Detecting out-of-distribution (OOD) graphs is crucial for ensuring the safety and reliability of Graph Neural Networks. In unsupervised graph-level OOD detection, models are typically trained using only in-distribution (ID) data, resulting in incomplete feature space characterization and weak decision boundaries. Although synthesizing outliers offers a promising solution, existing approaches rely on fixed, non-adaptive sampling heuristics (e.g., distance- or density-based), limiting their ability to explore informative OOD regions. We propose a Policy-Guided Outlier Synthesis (PGOS) framework that replaces static […]

technocracy

Linguistic Blind Spots in Clinical Decision Extraction

digitado ⋅ 5 February 2026

arXiv:2602.03942v1. Abstract: Extracting medical decisions from clinical notes is a key step for clinical decision support and patient-facing care summaries. We study how the linguistic characteristics of clinical decisions vary across decision categories and whether these differences explain extraction failures. Using MedDec discharge summaries annotated with decision categories from the Decision Identification and Classification Taxonomy for Use in Medicine (DICTUM), we compute seven linguistic indices for each decision span and analyze span-level extraction recall of […]

technocracy

We Were Promised Jetpacks: Why AI Isn’t Accelerating Feature Delivery

digitado ⋅ 6 April 2026

Where are the engineering productivity and velocity gains we were promised with AI coding tools? AI tools assist in writing half of Google’s code. Microsoft is not far behind at 30%. With so many more lines of code generated by AI, you may wonder: where are the massive engineering productivity improvements? Where are the spades of new features being delivered? Consider this: shovelware apps have not exploded since coding tools emerged. They’ve actually declined. In fact, there’s a growing […]

technocracy

Pro-ZD: A Transferable Graph Neural Network Approach for Proactive Zero-Day Threats Mitigation

digitado ⋅ 10 February 2026

arXiv:2602.07073v1. Abstract: In today’s enterprise network landscape, the combination of perimeter and distributed firewall rules governs connectivity. To address challenges arising from increased traffic and diverse network architectures, organizations employ automated tools for firewall rule and access policy generation. Yet, effectively managing risks arising from dynamically generated policies, especially concerning critical asset exposure, remains a major challenge. This challenge is amplified by evolving network structures due to trends like remote users, bring-your-own devices, and cloud […]

technocracy

Robotic Assembly Using Deep Reinforcement Learning

digitado ⋅ 21 October 2020

Introduction. Disclaimer: This article is a cross-post from the PyTorch Medium blog. One of the most exciting advancements that has pushed the frontier of Artificial Intelligence (AI) in recent years is Deep Reinforcement Learning (DRL). DRL belongs to the family of machine learning algorithms. It assumes that intelligent machines can learn from their actions, similar to the way humans learn from experience. In recent years, we have witnessed some impressive real-world applications of DRL. The […]

technocracy

Global Web, Local Privacy? An International Review of Web Tracking

digitado ⋅ 22 April 2026

arXiv:2604.18633v1. Abstract: Web tracking by ad networks, social networks, and other third parties is privacy-invasive. To protect users’ privacy, an increasing number of countries are adopting new privacy laws. However, a major reason why their application on the web is so challenging is that privacy laws are local while the web is global. To that end, we evaluate websites’ tracker connections for ten countries for two sets of sites: the global Common Top 525 […]
