technocracy

IPD: Boosting Sequential Policy with Imaginary Planning Distillation in Offline Reinforcement Learning

digitado ⋅ 4 de March de 2026

Decision transformer based sequential policies have emerged as a powerful paradigm in offline reinforcement learning (RL), yet their efficacy remains constrained by the quality of static datasets and inherent architectural limitations. Specifically, these models often struggle to effectively integrate suboptimal experiences and fail to explicitly plan for an optimal policy. To bridge this gap, we propose textbf{Imaginary Planning Distillation (IPD)}, a novel framework that seamlessly incorporates offline planning into data generation, supervised training, and online inference. Our framework […]

Ver mais

Like 0

Liked Liked

technocracy

From Coder to Engineer: The Hidden Skills Every Developer Needs (But Nobody Teaches You)

digitado ⋅ 7 de March de 2026

Why writing code is only 20% of the job — and what the other 80% actually looks like Meet Amith. Amith is a smart developer. He can write Python functions, debug logic errors, and push features under pressure. But right before a major deadline, his project breaks — and he has no idea how to recover. His project folder looks like this: numbers.pynumbers1.pynumbers2.pynumbers3.pynumbers_final.pynumbers_final_v2.pynumbers_ACTUAL_FINAL.py Sound familiar? Amith isn’t struggling because he’s a bad programmer. He’s struggling because nobody taught him software engineering — and those are two very […]

Ver mais

Like 0

Liked Liked

technocracy

On the Role of DAG topology in Energy-Aware Cloud Scheduling : A GNN-Based Deep Reinforcement Learning Approach

digitado ⋅ 10 de April de 2026

Cloud providers must assign heterogeneous compute resources to workflow DAGs while balancing competing objectives such as completion time, cost, and energy consumption. In this work, we study a single-workflow, queue-free scheduling setting and consider a graph neural network (GNN)-based deep reinforcement learning scheduler designed to minimize workflow completion time and energy usage. We identify specific out-of-distribution (OOD) conditions under which GNN-based deep reinforcement learning schedulers fail and provide a principled explanation of why these failures occur. Through controlled […]

Ver mais

Like 0

Liked Liked

technocracy

Continuous-Utility Direct Preference Optimization

digitado ⋅ 3 de February de 2026

arXiv:2602.00931v1 Announce Type: new Abstract: Large language model reasoning is often treated as a monolithic capability, relying on binary preference supervision that fails to capture partial progress or fine-grained reasoning quality. We introduce Continuous Utility Direct Preference Optimization (CU-DPO), a framework that aligns models to a portfolio of prompt-based cognitive strategies by replacing binary labels with continuous scores that capture fine-grained reasoning quality. We prove that learning with K strategies yields a Theta(K log K) improvement in sample […]

Ver mais

Like 0

Liked Liked

technocracy

EFF Tells Patent Office: Don’t Cut the Public Out of Patent Review

digitado ⋅ 8 de December de 2025

EFF has submitted its formal comment to the U.S. Patent and Trademark Office (USPTO) opposing a set of proposed rules that would sharply restrict the public’s ability to challenge wrongly granted patents. These rules would make inter partes review (IPR)—the main tool Congress created to fix improperly granted patents—unavailable in most of the situations where it’s needed most. If adopted, they would give patent trolls exactly what they want: a way to keep questionable patents alive and out […]

Ver mais

Like 0

Liked Liked

technocracy

The life of a prescription at Amazon Pharmacy

digitado ⋅ 30 de September de 2024

The life of a prescription at Amazon Pharmacy From pricing estimation and regulatory compliance to inventory management and chatbot assistants, machine learning models help Amazon Pharmacy customers stay healthy and save time and money. Conversational AI Alexandre Alves Anita Vila September 30, 01:32 PM October 02, 11:42 AM Pharmacies play a vital role in ensuring patients health, but the process of dispensing medications is far more complex than it may appear. At Amazon Pharmacy, we are using artificial […]

Ver mais

Like 0

Liked Liked

technocracy

RLVR for code execution prediction

digitado ⋅ 27 de February de 2026

Hi everyone, I’m currently training a small language model to improve its accuracy on code execution prediction (i.e., predicting the exact output from the code and input). I’m working with the Qwen3-4B model and have been using GRPO for training. By combining various dense reward signals, I was able to increase the accuracy to around 72%. This approach also helped eliminate the infinite Repeat Curse(a common problem in smaller Qwen models), and overall training has been stable and […]

Ver mais

Like 0

Liked Liked

technocracy

CausalEvolve: Towards Open-Ended Discovery with Causal Scratchpad

digitado ⋅ 31 de March de 2026

arXiv:2603.14575v2 Announce Type: replace-cross Abstract: Evolve-based agent such as AlphaEvolve is one of the notable successes in using Large Language Models (LLMs) to build AI Scientists. These agents tackle open-ended scientific problems by iteratively improving and evolving programs, leveraging the prior knowledge and reasoning capabilities of LLMs. Despite the success, existing evolve-based agents lack targeted guidance for evolution and effective mechanisms for organizing and utilizing knowledge acquired from past evolutionary experience. Consequently, they suffer from decreasing evolution efficiency […]

Ver mais

Like 0

Liked Liked

technocracy

MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs

digitado ⋅ 25 de February de 2026

arXiv:2602.20191v1 Announce Type: new Abstract: Changing runtime complexity on cloud and edge devices necessitates elastic large language model (LLM) deployment, where an LLM can be inferred with various quantization precisions based on available computational resources. However, it has been observed that the calibration parameters for quantization are typically linked to specific precisions, which presents challenges during elastic-precision calibration and precision switching at runtime. In this work, we attribute the source of varying calibration parameters to the varying token-level […]

Ver mais

Like 0

Liked Liked

technocracy

Vernor Vinge: The Coming Technological Singularity: How to Survive in the Post-Human Era

digitado ⋅ 11 de August de 2025

Within thirty years, we will have the technological means to create superhuman intelligence. Shortly after, the human era will be ended. Is such progress avoidable? If not to be avoided, can events be guided so that we may survive? These questions are investigated. Some possible answers (and some further dangers) are presented.

Ver mais

Like 0

Liked Liked