digitado – Page 112

Preventing Learning Stagnation in PPO by Scaling to 1 Million Parallel Environments

digitado ⋅ 6 March 2026

Plateaus, where an agent’s performance stagnates at a suboptimal level, are a common problem in deep on-policy RL. Focusing on PPO due to its widespread adoption, we show that plateaus in certain regimes arise not because of known exploration, capacity, or optimization challenges, but because sample-based estimates of the loss eventually become poor proxies for the true objective over the course of training. As a recap, PPO switches between sampling rollouts from several parallel environments online using the […]

technocracy

Ross Ashby: Design for a Brain: The Origin of Adaptive Behavior

digitado ⋅ 8 November 2024

W. Ross Ashby argues that the brain’s genius lies not in mystical purpose, but in mechanism. It is a self-tuning machine, a web of feedback loops constantly testing and correcting itself—adapting not by foresight but by iteration, by trial, by the quiet mathematics of survival. Through ideas like essential variables, homeostasis, and adaptive mechanisms, Ashby shows how the mind sustains its delicate balance amid chaos—how order, astonishingly, can arise from pure mechanics.

technocracy

How You Can Test Your Kids’ Smart Toys For Privacy

digitado ⋅ 24 January 2026

The Markup, now a part of CalMatters, uses investigative reporting, data analysis, and software engineering to challenge technology to serve the public good. It’s a new era in entertainment for kids. Everywhere you look, new toys and devices are marketed toward children with smart features. Gift guides and store shelves tout Bluetooth- and Wi-Fi-enabled devices for kids, which promise an iPad-native generation […]

technocracy

An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models

digitado ⋅ 25 February 2026

arXiv:2602.20324v1. Abstract: Phenotyping is fundamental to rare disease diagnosis, but manual curation of structured phenotypes from clinical notes is labor-intensive and difficult to scale. Existing artificial intelligence approaches typically optimize individual components of phenotyping but do not operationalize the full clinical workflow of extracting features from clinical text, standardizing them to Human Phenotype Ontology (HPO) terms, and prioritizing diagnostically informative HPO terms. We developed RARE-PHENIX, an end-to-end AI framework for rare disease phenotyping that integrates […]

technocracy

Small-Margin Preferences Still Matter - If You Train Them Right

digitado ⋅ 3 February 2026

arXiv:2602.00954v1. Abstract: Preference optimization methods such as DPO align large language models (LLMs) using paired comparisons, but their effectiveness can be highly sensitive to the quality and difficulty of preference pairs. A common heuristic treats small-margin (ambiguous) pairs as noisy and filters them out. In this paper, we revisit this assumption and show that pair difficulty interacts strongly with the optimization objective: when trained with preference-based losses, difficult pairs can destabilize training and harm alignment, […]

technocracy

Probabilistic Retrofitting of Learned Simulators

digitado ⋅ 2 March 2026

Dominant approaches for modelling Partial Differential Equations (PDEs) rely on deterministic predictions, yet many physical systems of interest are inherently chaotic and uncertain. While training probabilistic models from scratch is possible, it is computationally expensive and fails to leverage the significant resources already invested in high-performing deterministic backbones. In this work, we adopt a training-efficient strategy to transform pre-trained deterministic models into probabilistic ones via retrofitting with a proper scoring rule: the Continuous Ranked Probability Score (CRPS). Crucially, […]

technocracy

Project Idea: Learning Origami Folding Strategies via Reinforcement Learning

digitado ⋅ 5 February 2026

I am taking a course on reinforcement learning, and to pass the exam I need to propose and implement a project. After some thought, I came up with the idea of applying reinforcement learning to the problem of finding a sequence of actions, specifically paper folds, that transforms a flat sheet of paper into a desired target shape, given an origami model. It is a kind of inverse kinematics problem, but instead of robots, it is for sheets […]

technocracy

Introducing Chronos-2: From univariate to universal forecasting

digitado ⋅ 20 October 2025

In-context learning enables a model that can solve forecasting tasks with an arbitrary number of dimensions in a zero-shot manner. By Abdul Fatir Ansari, Oleksandr Shchur, and Jaris Küken (October 20, 09:53 AM; January 06, 09:54 AM). Time series forecasting is crucial to numerous applications in business, science, and engineering. Recently, foundation models have led to a paradigm shift in time series forecasting. Unlike statistical models, which extrapolate from a single […]

technocracy

Computing Perfect Bayesian Equilibria, with Application to Empirical Game-Theoretic Analysis

digitado ⋅ 18 February 2026

arXiv:2602.15233v1. Abstract: Perfect Bayesian Equilibrium (PBE) is a refinement of the Nash equilibrium for imperfect-information extensive-form games (EFGs) that enforces consistency between the two components of a solution: agents’ strategy profile describing their decisions at information sets and the belief system quantifying their uncertainty over histories within an information set. We present a scalable approach for computing a PBE of an arbitrary two-player EFG. We adopt the definition of PBE enunciated by Bonanno in 2011 […]

technocracy

User Acceptance Analysis of a Static-Dynamic Employment Recommendation System for Computer Science Graduates

digitado ⋅ 9 April 2026

Employment recommendation systems are increasingly used to support graduate job matching. However, limited research has examined how graduating computer science students perceive and respond to a proposed employment recommendation approach that combines static information matching with dynamic interactive functions. Drawing on the Technology Acceptance Model (TAM) and Information System (IS) Success Model, this study conducted a questionnaire-based survey of 386 graduating students and included an exploratory assessment of the questionnaire’s internal consistency and construct structure. The findings show […]
