digitado

Yahtzee: Reinforcement Learning Techniques for Stochastic Combinatorial Games

digitado ⋅ 6 January 2026

arXiv:2601.00007v1 Announce Type: new Abstract: Yahtzee is a classic dice game with a stochastic, combinatorial structure and delayed rewards, making it an interesting mid-scale RL benchmark. While an optimal policy for solitaire Yahtzee can be computed using dynamic programming methods, the multiplayer game is intractable, motivating approximation methods. We formulate Yahtzee as a Markov Decision Process (MDP), and train self-play agents using various policy gradient methods: REINFORCE, Advantage Actor-Critic (A2C), and Proximal Policy Optimization (PPO), all using a multi-headed network […]
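The listed policy-gradient methods all have to handle Yahtzee's delayed rewards. Each action's log-probability is weighted by the return that follows it. Below is a minimal REINFORCE sketch under an assumption that is not taken from the paper: the reward is the points scored when a category is filled, and zero for reroll decisions. `returns_to_go` and `reinforce_loss` are hypothetical names.

```python
def returns_to_go(rewards, gamma=1.0):
    """Return G_t = sum_{k >= t} gamma**(k - t) * r_k for every step t."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each G_t reuses G_{t+1}.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, rewards, gamma=1.0, baseline=0.0):
    """Surrogate loss whose gradient is the REINFORCE estimator.

    loss = -sum_t (G_t - b) * log pi(a_t | s_t)

    In a real agent, log_probs would be autodiff tensors from the policy
    network; plain floats are used here only to show the arithmetic.
    A constant baseline b is a simplification; A2C replaces it with a
    learned critic estimate V(s_t).
    """
    g = returns_to_go(rewards, gamma)
    return -sum((gt - baseline) * lp for gt, lp in zip(g, log_probs))


# Hypothetical episode: no reward while choosing which dice to keep,
# then points when a category is filled (e.g. 25 for a full house).
episode_rewards = [0, 0, 25, 0, 0, 30]
print(returns_to_go(episode_rewards))  # [55.0, 55.0, 55.0, 30.0, 30.0, 30.0]
```

Every dice-keeping decision made before a scoring step inherits that step's points through G_t. This is how credit for a delayed reward reaches the earlier reroll choices.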


technocracy

The Causal Impact of Tool Affordance on Safety Alignment in LLM Agents

digitado ⋅ 24 March 2026

arXiv:2603.20320v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed as agents with access to executable tools, enabling direct interaction with external systems. However, most safety evaluations remain text-centric and assume that compliant language implies safe behavior, an assumption that becomes unreliable once models are allowed to act. In this work, we empirically examine how executable tool affordance alters safety alignment in LLM agents using a paired evaluation framework that compares text-only chatbot behavior with tool-enabled […]


technocracy

E2HiL: Entropy-Guided Sample Selection for Efficient Real-World Human-in-the-Loop Reinforcement Learning

digitado ⋅ 29 January 2026

arXiv:2601.19969v1 Announce Type: new Abstract: Human-in-the-loop guidance has emerged as an effective approach for enabling faster convergence in online reinforcement learning (RL) of complex real-world manipulation tasks. However, existing human-in-the-loop RL (HiL-RL) frameworks often suffer from low sample efficiency, requiring substantial human interventions to achieve convergence and thereby leading to high labor costs. To address this, we propose a sample-efficient real-world human-in-the-loop RL framework named E2HiL, which requires fewer human interventions by actively selecting informative samples. Specifically, stable […]


technocracy

Retrieval Heads are Dynamic

digitado ⋅ 13 February 2026

arXiv:2602.11162v1 Announce Type: new Abstract: Recent studies have identified “retrieval heads” in Large Language Models (LLMs) responsible for extracting information from input contexts. However, prior works largely rely on static statistics aggregated across datasets, identifying heads that perform retrieval on average. This perspective overlooks the fine-grained temporal dynamics of autoregressive generation. In this paper, we investigate retrieval heads from a dynamic perspective. Through extensive analysis, we establish three core claims: (1) Dynamism: Retrieval heads vary dynamically across timesteps; […]


technocracy

Securing AI systems under today’s and tomorrow’s conditions

digitado ⋅ 24 March 2026

Evidence cited in an eBook titled "AI Quantum Resilience", published by Utimaco [email wall], shows that organisations consider security risks the leading barrier to effective adoption of AI on the data they hold. AI's value depends on the data amassed by an organisation. However, building models and training them on that data carries security risks. These risks come in addition to the better-publicised threats to intellectual property that exist around the point of inference (prompt engineering, for example). The […]


technocracy

Converting XQuery to SQL with Local LLMs: Do I Need Fine-Tuning or a Better Approach? [P]

digitado ⋅ 19 April 2026

I am trying to convert XQuery statements into SQL queries within an enterprise context, with the constraint that the solution must rely on locally run LLMs. A key challenge is the limited availability of training data (pairs of XQueries and their corresponding SQL queries), especially with enough diversity to cover different patterns. I initially experimented with a parsing-based approach. The idea was to extract elements such as table names, columns, and conditions from the XQuery (using a […]


technocracy

RoSLAC: Robust Simultaneous Localization and Calibration of Multiple Magnetometers

digitado ⋅ 18 April 2026

arXiv:2604.14353v1 Announce Type: new Abstract: Localization of autonomous mobile robots (AMRs) in enclosed or semi-enclosed environments such as offices, hotels, hospitals, indoor parking facilities, and underground spaces where GPS signals are weak or unavailable remains a major obstacle to the deployment of fully autonomous systems. Infrastructure-based localization approaches, such as QR codes and RFID, are constrained by high installation and maintenance costs as well as limited flexibility, while onboard sensor-based methods, including LiDAR- and vision-based solutions, are affected […]


technocracy

Limits of Residual-Based Detection for Physically Consistent False Data Injection

digitado ⋅ 12 February 2026

arXiv:2602.10162v1 Announce Type: new Abstract: False data injection attacks (FDIAs) pose a persistent challenge to AC power system state estimation. In current practice, detection relies primarily on topology-aware residual-based tests that assume malicious measurements can be distinguished from normal operation through physical inconsistency reflected in abnormal residual behavior. This paper shows that this assumption does not always hold: when FDIA scenarios produce manipulated measurements that remain on the measurement manifold induced by AC power flow relations and measurement […]
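The blind spot of residual-based tests is easiest to see in the linearised (DC) state-estimation model. This is a simplification of the AC setting the paper studies. With measurements z = Hx + e, weighted least squares gives the residual r = z - H x̂. An injection lying on the measurement manifold, which in the linear case is the column space of H (a = Hc), shifts x̂ by c but leaves r unchanged. The sketch below uses NumPy. H, the noise variances, and c are random or hypothetical values, and the statistic J = rᵀ W r with W = R⁻¹ is assumed as the bad-data detector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical DC (linear) model: 8 measurements, 3 state variables.
m, n = 8, 3
H = rng.normal(size=(m, n))                # measurement Jacobian
sigma2 = rng.uniform(0.01, 0.05, size=m)   # measurement noise variances
W = np.diag(1.0 / sigma2)                  # WLS weights, W = R^{-1}


def wls_estimate(z):
    """Weighted least squares: x_hat = (H^T W H)^{-1} H^T W z."""
    return np.linalg.solve(H.T @ W @ H, H.T @ W @ z)


def residual(z):
    """Measurement residual r = z - H x_hat (a linear function of z)."""
    return z - H @ wls_estimate(z)


def residual_statistic(z):
    """J = r^T W r, the quantity a chi-square bad-data test thresholds."""
    r = residual(z)
    return float(r @ W @ r)


x_true = rng.normal(size=n)
z = H @ x_true + rng.normal(scale=np.sqrt(sigma2))

a_naive = np.zeros(m)
a_naive[0] = 1.0                  # arbitrary injection on one meter
c = np.array([0.5, -0.2, 0.1])    # attacker's intended state shift
a_stealthy = H @ c                # injection inside the column space of H

# Because r is linear in z, an attack changes the residual by residual(a).
print(np.linalg.norm(residual(a_naive)))     # nonzero: the residual moves
print(np.linalg.norm(residual(a_stealthy)))  # ~0: the test cannot see it
print(residual_statistic(z + a_stealthy) - residual_statistic(z))  # ~0
print(wls_estimate(z + a_stealthy) - wls_estimate(z))               # ~c
```

In the AC case the manifold is nonlinear, so the condition is harder to satisfy exactly. The same principle applies: measurements that stay consistent with the physical model leave the residual unremarkable.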


technocracy

Next project doubt

digitado ⋅ 6 February 2026

I think I have two options for my next project: either build something like my passion project to showcase my skills, or build a project that solves a real problem but won't let me show my skills as much as the former. Which do you think would be more impactful and better for a portfolio (RL project)? And tbh I can only create a prototype. I was thinking of some RL project for my college... or do […]


technocracy

52-Hz Whale Song: An Embodied VR Experience for Exploring Misunderstanding and Empathy

digitado ⋅ 25 February 2026

arXiv:2602.20348v1 Announce Type: new Abstract: Experiences of being misunderstood often stem not from a lack of voice, but from mismatches between how individuals express themselves and how others listen. Such communicative mismatches arise across many social settings, including situations involving linguistic and cultural displacement. While prior HCI research has explored empathy through virtual reality, many approaches rely on narrative explanation, positioning users as observers rather than embodied participants. We present 52-Hz Whale Song, an embodied VR experience that […]
