February 2026

RAGdb: A Zero-Dependency, Embeddable Architecture for Multimodal Retrieval-Augmented Generation on the Edge

digitado ⋅ 27 February 2026

arXiv:2602.22217v1. Abstract: Retrieval-Augmented Generation (RAG) has established itself as the standard paradigm for grounding Large Language Models (LLMs) in domain-specific, up-to-date data. However, the prevailing architecture for RAG has evolved into a complex, distributed stack requiring cloud-hosted vector databases, heavy deep learning frameworks (e.g., PyTorch, CUDA), and high-latency embedding inference servers. This “infrastructure bloat” creates a significant barrier to entry for edge computing, air-gapped environments, and privacy-constrained applications where data sovereignty is paramount. This paper […]

Retrieval-Augmented Generation Assistant for Anatomical Pathology Laboratories

digitado ⋅ 27 February 2026

arXiv:2602.22216v1. Abstract: Accurate and efficient access to laboratory protocols is essential in Anatomical Pathology (AP), where up to 70% of medical decisions depend on laboratory diagnoses. However, static documentation such as printed manuals or PDFs is often outdated, fragmented, and difficult to search, creating risks of workflow errors and diagnostic delays. This study proposes and evaluates a Retrieval-Augmented Generation (RAG) assistant tailored to AP laboratories, designed to provide technicians with context-grounded answers to protocol-related queries. […]

Graph Your Way to Inspiration: Integrating Co-Author Graphs with Retrieval-Augmented Generation for Large Language Model Based Scientific Idea Generation

digitado ⋅ 27 February 2026

arXiv:2602.22215v1. Abstract: Large Language Models (LLMs) demonstrate potential in the field of scientific idea generation. However, the generated results often lack controllable academic context and traceable inspiration pathways. To bridge this gap, this paper proposes a scientific idea generation system called GYWI, which combines author knowledge graphs with retrieval-augmented generation (RAG) to form an external knowledge base that gives LLMs controllable context and a traceable inspiration path when generating new scientific ideas. We first […]

Adaptive Prefiltering for High-Dimensional Similarity Search: A Frequency-Aware Approach

digitado ⋅ 27 February 2026

arXiv:2602.22214v1. Abstract: High-dimensional similarity search underpins modern retrieval systems, yet uniform search strategies fail to exploit the heterogeneous nature of real-world query distributions. We present an adaptive prefiltering framework that leverages query frequency patterns and cluster coherence metrics to dynamically allocate computational budgets. Our approach partitions the query space into frequency tiers following Zipfian distributions and assigns differentiated search policies based on historical access patterns and local density characteristics. Experiments on ImageNet-1k using CLIP embeddings […]
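
The idea of frequency tiers with differentiated search policies can be illustrated with a small sketch. This is not the paper's method: the function name `assign_tiers`, the cumulative-traffic thresholds, the tier labels, and the budget values in `SEARCH_BUDGET` (including which tier gets the larger budget) are all assumptions made for illustration, and the cluster coherence and local density signals mentioned in the abstract are left out.

```python
from collections import Counter

def assign_tiers(query_log, head_share=0.5, torso_share=0.9):
    """Rank queries by frequency and label them by cumulative traffic share.

    The thresholds are hypothetical; the abstract does not state them.
    """
    counts = Counter(query_log)
    total = sum(counts.values())
    tiers = {}
    cumulative = 0
    for query, n in counts.most_common():
        # The tier depends on the traffic already covered by more
        # frequent queries, so the most popular queries land in "head".
        if cumulative < head_share * total:
            tiers[query] = "head"
        elif cumulative < torso_share * total:
            tiers[query] = "torso"
        else:
            tiers[query] = "tail"
        cumulative += n
    return tiers

# Hypothetical per-tier search budgets (e.g. number of clusters probed).
SEARCH_BUDGET = {"head": 4, "torso": 16, "tail": 64}

log = ["cat"] * 6 + ["dog"] * 3 + ["emu"]
tiers = assign_tiers(log)
budgets = {query: SEARCH_BUDGET[tier] for query, tier in tiers.items()}
print(tiers)
print(budgets)
```

With a Zipf-like log, a handful of queries cover most of the traffic, which is why a tiered policy can treat them separately from the long tail.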

Enriching Taxonomies Using Large Language Models

digitado ⋅ 27 February 2026

arXiv:2602.22213v1. Abstract: Taxonomies play a vital role in structuring and categorizing information across domains. However, many existing taxonomies suffer from limited coverage and outdated or ambiguous nodes, reducing their effectiveness in knowledge retrieval. To address this, we present Taxoria, a novel taxonomy enrichment pipeline that leverages Large Language Models (LLMs) to enhance a given taxonomy. Unlike approaches that extract internal LLM taxonomies, Taxoria uses an existing taxonomy as a seed and prompts an LLM to […]

General Bayesian Policy Learning

digitado ⋅ 27 February 2026

This study proposes the General Bayes framework for policy learning. We consider decision problems in which a decision-maker chooses an action from an action set to maximize its expected welfare. Typical examples include treatment choice and portfolio selection. In such problems, the statistical target is a decision rule, and the prediction of each outcome $Y(a)$ is not necessarily of primary interest. We formulate this policy learning problem by loss-based Bayesian updating. Our main technical device is a squared-loss […]

Active Learning for Planet Habitability Classification under Extreme Class Imbalance

digitado ⋅ 27 February 2026

The increasing size and heterogeneity of exoplanet catalogs have made systematic habitability assessment challenging, particularly given the extreme scarcity of potentially habitable planets and the evolving nature of their labels. In this study, we explore the use of pool-based active learning to improve the efficiency of habitability classification under realistic observational constraints. We construct a unified dataset from the Habitable World Catalog and the NASA Exoplanet Archive and formulate habitability assessment as a binary classification problem. A supervised […]
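
For readers unfamiliar with pool-based active learning, one query step might look like the sketch below. The abstract is truncated before it names a query strategy, so uncertainty sampling is used here purely as an assumed example; `select_most_uncertain` and the probability values are hypothetical.

```python
def select_most_uncertain(pool_probs, batch_size=2):
    """Return indices of unlabeled pool items whose predicted probability
    of the positive class is closest to 0.5 (uncertainty sampling)."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: abs(pool_probs[i] - 0.5))
    return ranked[:batch_size]

# Hypothetical classifier outputs P(habitable) for five unlabeled planets.
# Under extreme class imbalance, most scores sit near 0.
probs = [0.02, 0.48, 0.91, 0.60, 0.01]
picked = select_most_uncertain(probs)
print(picked)
```

The selected items would then be labeled, added to the training set, and the classifier retrained before the next query round.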

Geodesic Semantic Search: Learning Local Riemannian Metrics for Citation Graph Retrieval

digitado ⋅ 27 February 2026

We present Geodesic Semantic Search (GSS), a retrieval system that learns node-specific Riemannian metrics on citation graphs to enable geometry-aware semantic search. Unlike standard embedding-based retrieval that relies on fixed Euclidean distances, GSS learns a low-rank metric tensor $\mathbf{L}_i \in \mathbb{R}^{d \times r}$ at each node, inducing a local positive semi-definite metric $\mathbf{G}_i = \mathbf{L}_i \mathbf{L}_i^\top + \varepsilon \mathbf{I}$. This parameterization guarantees valid metrics while keeping the model tractable. Retrieval proceeds via multi-source Dijkstra on the learned geodesic […]
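
To see why a low-rank factor plus a scaled identity always yields a valid metric, the following pure-Python sketch builds the matrix from a hypothetical factor and evaluates the squared local length of a displacement. It is an illustration only, not the authors' code; the helper names, the toy factor `L_i`, and the value of `eps` are assumptions.

```python
def metric_from_factor(L, eps=1e-3):
    """Return G = L L^T + eps * I for a d x r factor L given as a list of rows."""
    d = len(L)
    return [
        [sum(a * b for a, b in zip(L[i], L[j])) + (eps if i == j else 0.0)
         for j in range(d)]
        for i in range(d)
    ]

def local_sq_dist(G, v):
    """Squared length of displacement v under metric G, i.e. v^T G v."""
    d = len(v)
    return sum(v[i] * G[i][j] * v[j] for i in range(d) for j in range(d))

# Hypothetical factor with d = 3 and r = 1.
L_i = [[1.0], [0.0], [2.0]]
G_i = metric_from_factor(L_i, eps=0.01)

along = local_sq_dist(G_i, [1.0, 0.0, 0.0])       # picks up L L^T and eps
orthogonal = local_sq_dist(G_i, [0.0, 1.0, 0.0])  # only the eps term remains
print(along, orthogonal)
```

Because v^T G v = ||L^T v||^2 + eps ||v||^2, the squared length can never be negative, and the eps term keeps it strictly positive for any nonzero displacement, even in directions the low-rank factor ignores.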

Disentangled Mode-Specific Representations for Tensor Time Series via Contrastive Learning

digitado ⋅ 27 February 2026

Multi-mode tensor time series (TTS) can be found in many domains, such as search engines and environmental monitoring systems. Learning representations of a TTS benefits various applications, but it is also challenging since the complexities inherent in the tensor hinder the realization of rich representations. In this paper, we propose a novel representation learning method designed specifically for TTS, namely MoST. Specifically, MoST uses a tensor slicing approach to reduce the complexity of the TTS structure and learns […]

A Review of YOLO Family from YOLOv1 to YOLO26

digitado ⋅ 27 February 2026

Object detection technologies form the foundation of real-time performance across a broad spectrum of applications, ranging from autonomous systems to medical imaging. This study analyzes the extensive architectural evolution of the YOLO series, the benchmark for this field, from its initial version to the current YOLO26 model. Throughout the paper, structural transformations in the backbone, neck, and detection head components are examined chronologically. The review focuses on critical technical milestones, including the transition from anchor-based to anchor-free […]
