February 2026

Flow Actor-Critic for Offline Reinforcement Learning

digitado ⋅ 20 de February de 2026

The dataset distributions in offline reinforcement learning (RL) often exhibit complex and multi-modal distributions, necessitating expressive policies to capture such distributions beyond widely-used Gaussian policies. To handle such complex and multi-modal datasets, in this paper, we propose Flow Actor-Critic, a new actor-critic method for offline RL, based on recent flow policies. The proposed method not only uses the flow model for actor as in previous flow policies but also exploits the expressive flow model for conservative critic acquisition […]

Ver mais

Like 0

Liked Liked

technocracy

NIMMGen: Learning Neural-Integrated Mechanistic Digital Twins with LLMs

digitado ⋅ 20 de February de 2026

Mechanistic models encode scientific knowledge about dynamical systems and are widely used in downstream scientific and policy applications. Recent work has explored LLM-based agentic frameworks to automatically construct mechanistic models from data; however, existing problem settings substantially oversimplify real-world conditions, leaving it unclear whether LLM-generated mechanistic models are reliable in practice. To address this gap, we introduce the Neural-Integrated Mechanistic Modeling (NIMM) evaluation framework, which evaluates LLM-generated mechanistic models under realistic settings with partial observations and diversified task […]

Ver mais

Like 0

Liked Liked

technocracy

Read-Modify-Writable Snapshots from Read/Write operations

digitado ⋅ 20 de February de 2026

arXiv:2602.16903v1 Announce Type: new Abstract: In the context of asynchronous concurrent shared-memory systems, a snapshot algorithm allows failure-prone processes to concurrently and atomically write on the entries of a shared array MEM , and also atomically read the whole array. Recently, Read-Modify-Writable (RMWable) snapshot was proposed, a variant of snapshot that allows processes to perform operations more complex than just read and write, specifically, each entry MEM[k] is an arbitrary readable object. The known RMWable snapshot algorithms heavily […]

Ver mais

Like 0

Liked Liked

technocracy

LLM-WikiRace: Benchmarking Long-term Planning and Reasoning over Real-World Knowledge Graphs

digitado ⋅ 20 de February de 2026

arXiv:2602.16902v1 Announce Type: new Abstract: We introduce LLM-Wikirace, a benchmark for evaluating planning, reasoning, and world knowledge in large language models (LLMs). In LLM-Wikirace, models must efficiently navigate Wikipedia hyperlinks step by step to reach a target page from a given source, requiring look-ahead planning and the ability to reason about how concepts are connected in the real world. We evaluate a broad set of open- and closed-source models, including Gemini-3, GPT-5, and Claude Opus 4.5, which achieve […]

Ver mais

Like 0

Liked Liked

technocracy

AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks

digitado ⋅ 20 de February de 2026

arXiv:2602.16901v1 Announce Type: new Abstract: LLM agents are increasingly deployed in long-horizon, complex environments to solve challenging problems, but this expansion exposes them to long-horizon attacks that exploit multi-turn user-agent-environment interactions to achieve objectives infeasible in single-turn settings. To measure agent vulnerabilities to such risks, we present AgentLAB, the first benchmark dedicated to evaluating LLM agent susceptibility to adaptive, long-horizon attacks. Currently, AgentLAB supports five novel attack types including intent hijacking, tool chaining, task injection, objective drifting, and […]

Ver mais

Like 0

Liked Liked

technocracy

Evidotes: Integrating Scientific Evidence and Anecdotes to Support Uncertainties Triggered by Peer Health Posts

digitado ⋅ 20 de February de 2026

arXiv:2602.16900v1 Announce Type: new Abstract: Peer health posts surface new uncertainties, such as questions and concerns for readers. Prior work focused primarily on improving relevance and accuracy fails to address users’ diverse information needs and emotions triggered. Instead, we propose directly addressing these by information augmentation. We introduce Evidotes, an information support system that augments individual posts with relevant scientific and anecdotal information retrieved using three user-selectable lenses (dive deeper, focus on positivity, and big picture). In a […]

Ver mais

Like 0

Liked Liked

technocracy

MALLVI: a multi agent framework for integrated generalized robotics manipulation

digitado ⋅ 20 de February de 2026

arXiv:2602.16898v1 Announce Type: new Abstract: Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine tuning, or prompt tuning, and often operate in an open loop manner without robust environmental feedback, making them fragile in dynamic settings.We present MALLVi, a Multi Agent Large Language and Vision framework that enables closed loop feedback driven robotic manipulation. Given a natural language instruction and an image of the environment, MALLVi […]

Ver mais

Like 0

Liked Liked

technocracy

Connecting the Dots: Surfacing Structure in Documents through AI-Generated Cross-Modal Links

digitado ⋅ 20 de February de 2026

arXiv:2602.16895v1 Announce Type: new Abstract: Understanding information-dense documents like recipes and scientific papers requires readers to find, interpret, and connect details scattered across text, figures, tables, and other visual elements. These documents are often long and filled with specialized terminology, hindering the ability to locate relevant information or piece together related ideas. Existing tools offer limited support for synthesizing information across media types. As a result, understanding complex material remains cognitively demanding. This paper presents a framework for […]

Ver mais

Like 0

Liked Liked

technocracy

CreateAI Insights from an NSF Workshop on K12 Students, Teachers, and Families as Designers of Artificial Intelligence and Machine Learning Applications

digitado ⋅ 20 de February de 2026

arXiv:2602.16894v1 Announce Type: new Abstract: In response to the exponential growth in the use of artificial intelligence and machine learning applications, educators, researchers and policymakers have taken steps to integrate artificial intelligence applications into K-12 education. Among these efforts, one equally important approach has received little, if any attention: What if students and teachers were not just learning to be competent users of AI but also its creators? This question is at the heart of CreateAI in which […]

Ver mais

Like 0

Liked Liked

technocracy

CalmReminder: A Design Probe for Parental Engagement with Children with Hyperactivity, Augmented by Real-Time Motion Sensing with a Watch

digitado ⋅ 20 de February de 2026

arXiv:2602.16893v1 Announce Type: new Abstract: Families raising children with ADHD often experience heightened stress and reactive parenting. While digital interventions promise personalization, many remain one-size-fits-all and fail to reflect parents’ lived practices. We present CalmReminder, a watch-based system that detects children’s calm moments and delivers just-in-time prompts to parents. Through a four-week deployment with 16 families (twelve completed) of children with ADHD, we compared notification strategies ranging from hourly to random to only when the child was inferred […]

Ver mais

Like 0

Liked Liked