digitado

Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement

digitado ⋅ 5 March 2026

arXiv:2603.03323v1 Announce Type: new Abstract: Large language models (LLMs) aligned for safety often suffer from over-refusal, the tendency to reject seemingly toxic or benign prompts by misclassifying them as toxic. This behavior undermines models’ helpfulness and restricts usability in sensitive or nuanced contexts. While prior work has proposed mitigation strategies such as data augmentation and activation steering, these approaches often face a trade-off: reducing over-refusal typically degrades the model’s ability to reject genuinely harmful content. We argue that […]

technocracy

Adaptive Double-Booking Strategy for Outpatient Scheduling Using Multi-Objective Reinforcement Learning

digitado ⋅ 7 March 2026

Patient no-shows disrupt outpatient clinic operations, reduce productivity, and may delay necessary care. Clinics often adopt overbooking or double-booking to mitigate these effects. However, poorly calibrated policies can increase congestion and waiting times. Most existing methods rely on fixed heuristics and fail to adapt to real-time scheduling conditions or patient-specific no-show risk. To address these limitations, we propose an adaptive outpatient double-booking framework that integrates individualized no-show prediction with multi-objective reinforcement learning. The scheduling problem is formulated as […]

technocracy

Beyond the Next Port: A Multi-Task Transformer for Forecasting Future Voyage Segment Durations

digitado ⋅ 14 January 2026

arXiv:2601.08013v1 Announce Type: new Abstract: Accurate forecasts of segment-level sailing durations are fundamental to enhancing maritime schedule reliability and optimizing long-term port operations. However, conventional estimated time of arrival (ETA) models are primarily designed for the immediate next port of call and rely heavily on real-time automatic identification system (AIS) data, which is inherently unavailable for future voyage segments. To address this gap, the study reformulates future-port ETA prediction as a segment-level time-series forecasting problem. We develop a […]

technocracy

Introduction to Tiny ML

digitado ⋅ 7 June 2023

TinyML – machine learning on embedded devices – is rapidly gaining popularity. According to ABI Research’s latest whitepaper, TinyML device shipments will grow to 2.5 billion in 2030, up from 15 million in 2020. With the proliferation of intelligent devices, it has become vital to equip them with machine-learning capabilities that run smoothly and fast, even with limited memory and computational power. This post will explain how TinyML works and why it is important. TinyML Vs. […]

technocracy

CD-PIM: A High-Bandwidth and Compute-Efficient LPDDR5-Based PIM for Low-Batch LLM Acceleration on Edge-Device

digitado ⋅ 21 January 2026

arXiv:2601.12298v1 Announce Type: new Abstract: Edge deployment of low-batch large language models (LLMs) faces critical memory bandwidth bottlenecks when executing memory-intensive general matrix-vector multiplication (GEMV) operations. While digital processing-in-memory (PIM) architectures promise to accelerate GEMV operations, existing PIM-equipped edge devices still suffer from three key limitations: limited bandwidth improvement, component under-utilization in mixed workloads, and low compute capacity of computing units (CUs). In this paper, we propose CD-PIM to address these challenges through three key innovations. First, we […]

technocracy

E3VA: Enhancing Emotional Expressiveness in Virtual Conversational Agents

digitado ⋅ 27 February 2026

arXiv:2602.22362v1 Announce Type: new Abstract: With the advent of generative AI and large language models, embodied conversational agents are becoming synonymous with online interactions. These agents possess vast amounts of knowledge but exhibit limited emotional expressiveness. Without adequate expressions, agents might fail to adapt to users’ emotions, which may result in a sub-optimal user experience and engagement. Most current systems prioritize content-based responses, neglecting the emotional context of conversations. Research in this space is currently […]

technocracy

Unconditionally Long-Time Stable Variable-Step Second-Order Exponential Time-Differencing Schemes for the Incompressible NSE

digitado ⋅ 12 February 2026

arXiv:2602.10268v1 Announce Type: new Abstract: We develop an efficient, unconditionally stable, variable-step, second-order exponential time-differencing scheme for the incompressible Navier-Stokes equations in two and three spatial dimensions under periodic boundary conditions, together with an embedded adaptive time-stepping variant. The scheme is unconditionally uniform-in-time stable in the sense that the numerical solution admits a time-uniform bound in $L^\infty$ in time with values in $(L^2)^d$ whenever the external […]

technocracy

Rethinking Soft Compression in Retrieval-Augmented Generation: A Query-Conditioned Selector Perspective

digitado ⋅ 19 February 2026

arXiv:2602.15856v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) effectively grounds Large Language Models (LLMs) with external knowledge and is widely applied to Web-related tasks. However, its scalability is hindered by excessive context length and redundant retrievals. Recent research on soft context compression aims to address this by encoding long documents into compact embeddings, yet these methods often underperform non-compressed RAG because they rely on auto-encoder-like full compression that forces the encoder to compress all document information regardless of relevance […]

technocracy

Flatter Tokens are More Valuable for Speculative Draft Model Training

digitado ⋅ 28 January 2026

arXiv:2601.18902v1 Announce Type: new Abstract: Speculative Decoding (SD) is a key technique for accelerating Large Language Model (LLM) inference, but it typically requires training a draft model on a large dataset. We approach this problem from a data-centric perspective, finding that not all training samples contribute equally to the SD acceptance rate. Specifically, our theoretical analysis and empirical validation reveal that tokens inducing flatter predictive distributions from the target model are more valuable than those yielding sharply peaked […]
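
To make the idea of "flatter" token distributions concrete, here is a minimal sketch that ranks tokens by the Shannon entropy of the target model's predictive distribution and keeps the flattest fraction. The abstract is truncated before it states the paper's actual criterion, so using entropy as the flatness measure, and the names `select_flat_tokens` and `keep_fraction`, are assumptions for illustration only.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_flat_tokens(target_dists, keep_fraction):
    """Return positions of the tokens whose target distributions are flattest.

    target_dists: one probability distribution per token position.
    keep_fraction: share of tokens to keep (hypothetical knob, not from the paper).
    """
    scores = [entropy(p) for p in target_dists]
    k = max(1, int(len(scores) * keep_fraction))
    # Highest entropy first, i.e. flattest distributions first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy distributions over a 3-word vocabulary, one per token position.
dists = [
    [0.98, 0.01, 0.01],  # sharply peaked
    [0.40, 0.30, 0.30],  # fairly flat
    [0.34, 0.33, 0.33],  # nearly uniform
    [0.90, 0.05, 0.05],  # peaked
]
selected = select_flat_tokens(dists, keep_fraction=0.5)
```

In this toy example the two flattest positions are retained and the peaked ones are dropped from the draft model's training set.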

technocracy

Accelerating Large Language Model Inference with Self-Supervised Early Exits

digitado ⋅ 13 February 2026

arXiv:2407.21082v2 Announce Type: replace-cross Abstract: This paper presents a modular approach to accelerate inference in large language models (LLMs) by adding early exit heads at intermediate transformer layers. Each head is trained in a self-supervised manner to mimic the main model’s predictions, allowing computation to stop early when a calibrated confidence threshold is reached. We evaluate several confidence metrics and show that entropy provides the most reliable separation between correct and incorrect predictions. Experiments on the Pythia model […]
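
The early-exit mechanism described in this abstract can be sketched roughly as follows: each intermediate head produces logits, and computation stops at the first head whose predictive entropy falls at or below a threshold. This is a simplified illustration, not the paper's implementation; the function names, the list-of-logits interface, and the single shared `threshold` are assumptions (the paper speaks of a calibrated threshold, and calibration is not shown here).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats); low entropy means a confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_predict(head_logits, final_logits, threshold):
    """Return (predicted_token, exit_layer); exit_layer is None for the final head.

    head_logits: logits from the intermediate exit heads, in layer order.
    final_logits: logits from the full model's output head.
    threshold: entropy cutoff (hypothetical; assumed to be calibrated elsewhere).
    """
    for layer_idx, logits in enumerate(head_logits):
        probs = softmax(logits)
        if entropy(probs) <= threshold:
            # Confident enough: stop here and skip the remaining layers.
            return max(range(len(probs)), key=probs.__getitem__), layer_idx
    probs = softmax(final_logits)
    return max(range(len(probs)), key=probs.__getitem__), None


heads = [[0.1, 0.0, 0.2], [8.0, 0.0, 0.0]]  # first head unsure, second confident
final = [0.0, 5.0, 0.0]
result = early_exit_predict(heads, final, threshold=0.1)
```

In a real model the logits would be computed lazily, layer by layer, so that an early exit actually saves the cost of the remaining layers; the precomputed lists here only keep the sketch short.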
