digitado

A Regularized Actor-Critic Algorithm for Bi-Level Reinforcement Learning

digitado ⋅ 23 January 2026

We study a structured bi-level optimization problem where the upper-level objective is a smooth function and the lower-level problem is policy optimization in a Markov decision process (MDP). The upper-level decision variable parameterizes the reward of the lower-level MDP, and the upper-level objective depends on the optimal induced policy. Existing methods for bi-level optimization and RL often require second-order information, impose strong regularization at the lower level, or inefficiently use samples through nested-loop procedures. In this work, we […]

CADO: From Imitation to Cost Minimization for Heatmap-based Solvers in Combinatorial Optimization

digitado ⋅ 10 February 2026

arXiv:2602.08210v1. Abstract: Heatmap-based solvers have emerged as a promising paradigm for Combinatorial Optimization (CO). However, we argue that the dominant Supervised Learning (SL) training paradigm suffers from a fundamental objective mismatch: minimizing imitation loss (e.g., cross-entropy) does not guarantee solution cost minimization. We dissect this mismatch into two deficiencies: Decoder-Blindness (being oblivious to the non-differentiable decoding process) and Cost-Blindness (prioritizing structural imitation over solution quality). We empirically demonstrate that these intrinsic flaws impose a hard […]

A Lightweight Defense Mechanism against Next Generation of Phishing Emails using Distilled Attention-Augmented BiLSTM

digitado ⋅ 27 February 2026

arXiv:2602.22250v1. Abstract: The current generation of large language models produces sophisticated social-engineering content that bypasses standard text screening systems in business communication platforms. Our proposed solution for mail gateway and endpoint deception detection operates in a privacy-protective manner while handling the performance requirements of network and mobile security systems. A MobileBERT teacher is fine-tuned and then distilled into a BiLSTM model with multi-head attention, which maintains semantic discrimination with only 4.5 million parameters. The hybrid […]

Building an RL Agent for Prince of Persia (1989)

digitado ⋅ 9 February 2026

I’ve been working on a reinforcement learning project around the original Prince of Persia (1989) using SDLPoP. Instead of using raw pixels, I built a grid-based observation directly from the game state. Each room becomes a small multi-channel grid showing platforms, hazards, gates, exits, items, and character positions. The idea is to reduce the CNN’s burden of trying to understand interactable platforms and hazards from just a few pixels and instead give structured spatial information. On the action […]
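The grid-based observation described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `RoomState` class, the channel order, and the 3 x 10 room size are hypothetical and are not taken from the post or from SDLPoP's actual data layout.

```python
from dataclasses import dataclass, field

# Channel names follow the post's list; their order is an assumption.
CHANNELS = ("platform", "hazard", "gate", "exit", "item", "character")
ROWS, COLS = 3, 10  # assumed room size in tiles


@dataclass
class RoomState:
    """Hypothetical snapshot of one room, not SDLPoP's real structures."""
    tiles: dict = field(default_factory=dict)       # (row, col) -> channel name
    characters: list = field(default_factory=list)  # [(row, col), ...]


def encode_room(state):
    """Return a [channel][row][col] grid of 0.0/1.0 flags."""
    grid = [[[0.0] * COLS for _ in range(ROWS)] for _ in CHANNELS]
    # Mark each static tile in the channel that matches its kind.
    for (row, col), kind in state.tiles.items():
        grid[CHANNELS.index(kind)][row][col] = 1.0
    # Characters get their own channel so they can overlap a platform tile.
    for row, col in state.characters:
        grid[CHANNELS.index("character")][row][col] = 1.0
    return grid


room = RoomState(
    tiles={(2, 0): "platform", (2, 1): "hazard", (0, 9): "exit"},
    characters=[(2, 0)],
)
obs = encode_room(room)
```

A grid like `obs` gives a convolutional policy one binary plane per object type, so it does not have to infer platforms and hazards from a handful of pixels.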

Digital Transformation Awards 2026 Now Open for Global Entries

digitado ⋅ 11 March 2026

London, United Kingdom | The Digital Transformation Awards, an independent, global programme celebrating excellence in digital innovation, have officially opened entries for the 2026 Awards. The prestigious gala ceremony will take place on 16 June 2026 in London, United Kingdom, with the final entry deadline set for 14 May 2026. Recognising outstanding digital achievements across all industries, the awards honour businesses, teams, and individuals who have successfully leveraged technology to transform operations, enhance customer experiences, and drive meaningful cultural change. The programme is open to […]

MirrorMark: A Distortion-Free Multi-Bit Watermark for Large Language Models

digitado ⋅ 2 February 2026

arXiv:2601.22246v1. Abstract: As large language models (LLMs) become integral to applications such as question answering and content creation, reliable content attribution has become increasingly important. Watermarking is a promising approach, but existing methods either provide only binary signals or distort the sampling distribution, degrading text quality; distortion-free approaches, in turn, often suffer from weak detectability or robustness. We propose MirrorMark, a multi-bit and distortion-free watermark for LLMs. By mirroring sampling randomness in a measure-preserving manner, […]

Statistical-Geometric Degeneracy in UAV Search: A Physics-Aware Asymmetric Filtering Approach

digitado ⋅ 19 February 2026

arXiv:2602.15893v1. Abstract: Post-disaster survivor localization using Unmanned Aerial Vehicles (UAVs) faces a fundamental physical challenge: the prevalence of Non-Line-of-Sight (NLOS) propagation in collapsed structures. Unlike standard Gaussian noise, signal reflection from debris introduces strictly non-negative ranging biases. Existing robust estimators, typically designed with symmetric loss functions (e.g., Huber or Tukey), implicitly rely on the assumption of error symmetry. Consequently, they experience a theoretical mismatch in this regime, leading to a phenomenon we formally identify as […]

PlotChain: Deterministic Checkpointed Evaluation of Multimodal LLMs on Engineering Plot Reading

digitado ⋅ 17 February 2026

arXiv:2602.13232v1. Abstract: We present PlotChain, a deterministic, generator-based benchmark for evaluating multimodal large language models (MLLMs) on engineering plot reading: recovering quantitative values from classic plots (e.g., Bode/FFT, step response, stress-strain, pump curves) rather than OCR-only extraction or free-form captioning. PlotChain contains 15 plot families with 450 rendered plots (30 per family), where every item is produced from known parameters and paired with exact ground truth computed directly from the generating process. A central contribution is […]

The Big Picture of AI Research

digitado ⋅ 18 January 2024

More papers on AI are published than ever before, but each paper tends to present only its part of the picture, and it becomes difficult to recognize the larger story to which a paper is connected. To encourage the community to explore broader research narratives, we co-organized the Big Picture Workshop at EMNLP 2023. We received a number of high-quality submissions that distill important research topics, from narrative understanding to modern generation techniques. My favorite part of the workshop, however, was the invited talks. We had asked […]

ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation

digitado ⋅ 23 January 2026

arXiv:2601.15330v1. Abstract: Large Language Models (LLMs) in multi-turn conversations often suffer from a “lost-in-conversation” phenomenon, where they struggle to recover from early incorrect assumptions, particularly when users provide ambiguous initial instructions. We find that standard post-training techniques like Reinforcement Learning with Verifiable Rewards (RLVR) exacerbate this issue by rewarding confident, direct answers, thereby inducing overconfidence and discouraging the model from seeking clarification. To address this, we propose Illocution-Calibrated Policy Optimization (ICPO), a novel training framework […]
