digitado

CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs

digitado ⋅ February 3, 2026

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a key approach for enhancing LLM reasoning. However, standard frameworks like Group Relative Policy Optimization (GRPO) typically employ a uniform rollout budget, leading to resource inefficiency. Moreover, existing adaptive methods often rely on instance-level metrics, such as task pass rates, failing to capture the model’s dynamic learning state. To address these limitations, we propose CoBA-RL, a reinforcement learning algorithm designed to adaptively allocate rollout budgets based on the model’s […]
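Why a uniform rollout budget can waste compute is easiest to see from GRPO's group-relative advantage. Each rollout's reward is normalized by the mean and standard deviation of its prompt's group. The sketch below is a minimal illustration under stated assumptions: binary verifiable rewards, 8 rollouts per prompt, and population standard deviation (some implementations use the sample std or drop the std term). It is not CoBA-RL's allocation method, which the excerpt does not describe. The helper names are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: (r_i - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


def informative(rewards):
    """A group yields nonzero advantages only if its rewards are not all identical."""
    return len(set(rewards)) > 1


# Hypothetical groups: binary rewards (1 = verified correct), same budget of 8 rollouts each.
groups = {
    "easy": [1, 1, 1, 1, 1, 1, 1, 1],    # pass rate 1.0
    "medium": [1, 0, 1, 0, 0, 1, 0, 0],  # pass rate 0.375
    "hard": [0, 0, 0, 0, 0, 0, 0, 0],    # pass rate 0.0
}

for name, rewards in groups.items():
    adv = group_relative_advantages(rewards)
    print(name, informative(rewards), [round(a, 2) for a in adv])
```

Under these assumptions the "easy" and "hard" groups produce all-zero advantages, so their 16 rollouts contribute no policy-gradient signal; only the mixed group does. This is the kind of waste that motivates allocating rollouts adaptively rather than uniformly.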

technocracy

Vibe Coding Didn’t Break These Startups. Vibe Deploying Did!

digitado ⋅ March 6, 2026

Seven real incidents: what actually went wrong, and the simple rules that would have prevented all of them. Continue reading on Towards AI.

technocracy

STaR: Scalable Task-Conditioned Retrieval for Long-Horizon Multimodal Robot Memory

digitado ⋅ February 11, 2026

arXiv:2602.09255v1 Announce Type: new Abstract: Mobile robots are often deployed over long durations in diverse open, dynamic scenes, including indoor settings such as warehouses and manufacturing facilities, and outdoor settings such as agricultural and roadway operations. A core challenge is to build a scalable long-horizon memory that supports an agentic workflow for planning, retrieval, and reasoning over open-ended instructions at variable granularity, while producing precise, actionable answers for navigation. We present STaR, an agentic reasoning framework that (i) […]

technocracy

Energy-Based Injury Protection Database: Including Shearing Contact Thresholds for Hand and Finger Using Porcine Surrogates

digitado ⋅ February 25, 2026

arXiv:2602.20362v1 Announce Type: new Abstract: While robotics research continues to propose strategies for collision avoidance in human-robot interaction, the reality of constrained environments and future humanoid systems makes contact inevitable. To mitigate injury risks, energy-constraining control approaches are commonly used, often relying on safety thresholds derived from blunt impact data in EN ISO 10218-2:2025. However, this dataset does not extend to edged or pointed collisions. Without scalable, clinically grounded datasets covering diverse contact scenarios, safety validation remains limited. […]
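Energy-constraining control turns an energy threshold into a motion limit. One common way to do this is to bound the kinetic energy of the relative motion, E = ½ μ v², where μ = (1/m₁ + 1/m₂)⁻¹ is the reduced mass of the two colliding bodies. Solving for v gives the largest permissible relative speed. The sketch below assumes this simple two-body model. All numeric values are hypothetical placeholders, not thresholds from the paper or from EN ISO 10218-2:2025.

```python
import math


def reduced_mass(m_human_kg, m_robot_kg):
    """Effective (reduced) mass of two colliding bodies: (1/m1 + 1/m2)^-1."""
    return 1.0 / (1.0 / m_human_kg + 1.0 / m_robot_kg)


def max_contact_speed(energy_threshold_j, m_human_kg, m_robot_kg):
    """Largest relative speed v with 0.5 * mu * v^2 <= energy threshold."""
    mu = reduced_mass(m_human_kg, m_robot_kg)
    return math.sqrt(2.0 * energy_threshold_j / mu)


# Hypothetical placeholder values for illustration only.
E_MAX_J = 0.5     # assumed energy threshold for a hand contact, in joules
M_HAND_KG = 0.6   # assumed effective mass of the hand, in kg
M_ROBOT_KG = 5.0  # assumed effective mass of the moving robot link, in kg

v = max_contact_speed(E_MAX_J, M_HAND_KG, M_ROBOT_KG)
print(f"max relative speed: {v:.3f} m/s")
```

A shearing or edged-contact threshold of the kind the paper collects would slot in as `E_MAX_J`. The point of such datasets is that this single number differs by body region and contact geometry, which blunt-impact data alone does not capture.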

technocracy

Approximately Optimal Global Planning for Contact-Rich SE(2) Manipulation on a Graph of Reachable Sets

digitado ⋅ January 19, 2026

arXiv:2601.10827v1 Announce Type: new Abstract: If we consider human manipulation, it is clear that contact-rich manipulation (CRM), the ability to use any surface of the manipulator to make contact with objects, can be far more efficient and natural than relying solely on end-effectors (i.e., fingertips). However, state-of-the-art model-based planners for CRM are still focused on feasibility rather than optimality, limiting their ability to fully exploit CRM’s advantages. We introduce a new paradigm that computes approximately optimal manipulator plans. This approach […]

technocracy

Lost in Execution: On the Multilingual Robustness of Tool Calling in Large Language Models

digitado ⋅ January 12, 2026

arXiv:2601.05366v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed as agents that invoke external tools through structured function calls. While recent work reports strong tool-calling performance under standard English-centric evaluations, the robustness of tool calling under multilingual user interactions remains underexplored. In this work, we introduce MLCL, a diagnostic benchmark, and conduct a systematic evaluation of multilingual tool calling across Chinese, Hindi, and the low-resource language Igbo. Through fine-grained error analysis, we show that many […]

technocracy

Quoting David Crawshaw

digitado ⋅ February 7, 2026

I am having more fun programming than I ever have, because so many more of the programs I wish I could find the time to write actually exist. I wish I could share this joy with the people who are fearful about the changes agents are bringing. The fear itself I understand, I have fear more broadly about what the end-game is for intelligence on tap in our society. But in the limited domain of writing computer programs […]

technocracy

LLMs as In-Context Meta-Learners for Model and Hyperparameter Selection

digitado ⋅ February 16, 2026

arXiv:2510.26510v3 Announce Type: replace-cross Abstract: Model and hyperparameter selection are critical but challenging in machine learning, typically requiring expert intuition or expensive automated search. We investigate whether large language models (LLMs) can act as in-context meta-learners for this task. By converting each dataset into interpretable metadata, we prompt an LLM to recommend both model families and hyperparameters. We study two prompting strategies: (1) a zero-shot mode relying solely on pretrained knowledge, and (2) a meta-informed mode augmented with […]
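The first step, converting a dataset into interpretable metadata and prompting an LLM in zero-shot mode, can be sketched as follows. This is a hypothetical illustration. The metadata fields, the prompt wording, and the JSON answer format are assumptions, not the paper's schema. No LLM is called; the function only builds the prompt text.

```python
from collections import Counter


def dataset_metadata(X, y):
    """Summarize a tabular classification dataset as human-readable metadata.
    The chosen fields are illustrative assumptions, not the paper's schema."""
    n_rows = len(X)
    n_features = len(X[0]) if X else 0
    counts = Counter(y)
    majority_share = max(counts.values()) / n_rows if n_rows else 0.0
    return {
        "n_rows": n_rows,
        "n_features": n_features,
        "n_classes": len(counts),
        "majority_class_share": round(majority_share, 3),
    }


def zero_shot_prompt(meta):
    """Build a zero-shot prompt asking for a model family and hyperparameters,
    relying only on the LLM's pretrained knowledge (no prior-task examples)."""
    lines = [f"- {k}: {v}" for k, v in meta.items()]
    return (
        "Dataset metadata:\n" + "\n".join(lines) + "\n"
        "Recommend a model family and hyperparameters for this dataset. "
        "Answer as JSON with keys 'model' and 'hyperparameters'."
    )


X = [[0.1, 2.0], [0.4, 1.5], [0.9, 0.3], [0.7, 0.8]]
y = ["a", "a", "b", "a"]
print(zero_shot_prompt(dataset_metadata(X, y)))
```

The meta-informed mode described in the abstract would augment this prompt with additional context; the excerpt is truncated before saying what, so it is not sketched here.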

technocracy

SpaceX Acquires xAI: Elon Musk’s Biggest Bet on AI Beyond Earth

digitado ⋅ February 3, 2026

Elon Musk, the richest person in the world, owns six tech and engineering companies. First up, there’s Tesla, an electric vehicle and energy company. Next, there are SpaceX (aerospace and satellite internet), X (formerly Twitter), Neuralink (brain-computer interfaces), The Boring Company (tunnels), and xAI (artificial intelligence). Speaking of xAI, it has now been acquired by SpaceX. Yes, that’s the same SpaceX I mentioned above. […]

technocracy

Not All Negative Samples Are Equal: LLMs Learn Better from Plausible Reasoning

digitado ⋅ February 3, 2026

Learning from negative samples holds great promise for improving Large Language Model (LLM) reasoning capability, yet existing methods treat all incorrect responses as equally informative, overlooking the crucial role of sample quality. To address this, we propose Plausible Negative Samples (PNS), a method that synthesizes high-quality negative samples exhibiting expected format and structural coherence while ultimately yielding incorrect answers. PNS trains a dedicated model via reverse reinforcement learning (RL) guided by a composite reward combining format compliance, accuracy […]
