technocracy

Alternating Reinforcement Learning with Contextual Rubric Rewards

digitado ⋅ 18 March 2026

arXiv:2603.15646v1. Abstract: Reinforcement Learning with Rubric Rewards (RLRR) is a framework that extends conventional reinforcement learning from human feedback (RLHF) and verifiable rewards (RLVR) by replacing scalar preference signals with structured, multi-dimensional, contextual rubric-based evaluations. However, existing approaches in RLRR are limited to linearly compressing vector rewards into a scalar reward with fixed weightings, which is sensitive to artificial score design and fails to capture correlations among reward dimensions. To overcome the limitations of […]


Boosting Maximum Entropy Reinforcement Learning via One-Step Flow Matching

digitado ⋅ 2 February 2026

Diffusion policies are expressive yet incur high inference latency. Flow Matching (FM) enables one-step generation, but integrating it into Maximum Entropy Reinforcement Learning (MaxEnt RL) is challenging: the optimal policy is an intractable energy-based distribution, and the efficient log-likelihood estimation required to balance exploration and exploitation suffers from severe discretization bias. We propose Flow-based Log-likelihood-Aware Maximum Entropy RL (FLAME), a principled framework that addresses these challenges. First, we derive a Q-Reweighted FM objective that bypasses partition function estimation […]


The TechBeat: Microsoft Generative AI Report: The 40 Most Disrupted Jobs & The 40 Most Secure Jobs (4/14/2026)

digitado ⋅ 14 April 2026

How are you, hacker? 🪐 Want to know what’s trending right now? The Techbeat by HackerNoon has got you covered with fresh content from our trending stories of the day! AI Products Have Terrible UX: Here’s Why, by @deeflect (8 min read): Most AI products have terrible UX – not because the AI is bad, but because no one who understands both AI and design is building them. OpenAI Bought […]


GeMM-GAN: A Multimodal Generative Model Conditioned on Histopathology Images and Clinical Descriptions for Gene Expression Profile Generation

digitado ⋅ 23 January 2026

arXiv:2601.15392v1. Abstract: Biomedical research increasingly relies on integrating diverse data modalities, including gene expression profiles, medical images, and clinical metadata. While medical images and clinical metadata are routinely collected in clinical practice, gene expression data presents unique challenges for widespread research use, mainly due to stringent privacy regulations and costly laboratory experiments. To address these limitations, we present GeMM-GAN, a novel Generative Adversarial Network conditioned on histopathology tissue slides and clinical metadata, designed to synthesize […]


From Metadata to Meaning: A Semantic Units Knowledge Graph for the Biodiversity Exploratories

digitado ⋅ 6 January 2026

arXiv:2601.00002v1. Abstract: Knowledge Graphs (KGs) bear great potential for ecology and biodiversity researchers in their ability to support synthesis and integration efforts, meta-analyses, reasoning tasks, and overall machine interoperability of research data. However, this potential is yet to be realized as KGs are notoriously difficult to interact with via their query language SPARQL for many user groups alike. Additionally, a further hindrance for user-KG interaction is the fundamental disconnect between user requirements and requirements KGs […]


Learning Inference Concurrency in DynamicGate MLP Structural and Mathematical Justification

digitado ⋅ 15 April 2026

Conventional neural networks strictly separate learning and inference because if parameters are updated during inference, outputs become unstable and even the inference function itself is not well defined [1, 2, 3]. This paper shows that DynamicGate MLP structurally permits learning inference concurrency [4, 5]. The key idea is to separate routing (gating) parameters from representation (prediction) parameters, so that the gate can be adapted online while inference stability is preserved, or weights can be selectively updated only within […]


Descriptor-Injected Cross-Modal Learning: A Systematic Exploration of Audio-MIDI Alignment via Spectral and Melodic Features

digitado ⋅ 11 April 2026

Cross-modal retrieval between audio recordings and symbolic music representations (MIDI) remains challenging because continuous waveforms and discrete event sequences encode different aspects of the same performance. We study descriptor injection, the augmentation of modality-specific encoders with hand-crafted domain features, as a bridge across this gap. In a three-phase campaign covering 13 descriptor-mechanism combinations, 6 architectural families, and 3 training schedules, the best configuration reaches a mean S of 84.0 percent across five independent seeds, improving the descriptor-free baseline […]


Formulating Reinforcement Learning for Human-Robot Collaboration through Off-Policy Evaluation

digitado ⋅ 4 February 2026

arXiv:2602.02530v1. Abstract: Reinforcement learning (RL) has the potential to transform real-world decision-making systems by enabling autonomous agents to learn from experience. Deploying RL in real-world settings, especially in the context of human-robot interaction, requires defining state representations and reward functions, which are critical for learning efficiency and policy performance. Traditional RL approaches often rely on domain expertise and trial-and-error, necessitating extensive human involvement as well as direct interaction with the environment, which can be costly […]


Exahash, Zettahash, Yottahash

digitado ⋅ 22 February 2026

When I first heard of cryptographic hash functions, they were called “one-way functions” and seemed like a mild curiosity. I had no idea that one day the world would compute a mind-boggling number of hashes every second. Because Bitcoin mining requires computing hash functions to solve proof-of-work problems, the world currently computes around 1,000,000,000,000,000,000,000 hashes, one zettahash, per second. Other cryptocurrencies use hash functions for proof-of-work as well, but they contribute a negligible amount of hashes per second […]
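As a quick check on the figure quoted above, here is a minimal sketch of the arithmetic behind the three prefixes in the title. The `hashes_per_day` helper and the per-day extrapolation are illustrative assumptions, not part of the original post; only the one-zettahash-per-second rate comes from the text.

```python
# SI prefixes named in the title, as powers of ten.
SI_PREFIXES = {"exa": 10**18, "zetta": 10**21, "yotta": 10**24}

SECONDS_PER_DAY = 24 * 60 * 60


def hashes_per_day(rate_per_second: int) -> int:
    """Hypothetical helper: scale a constant per-second hash rate to one day."""
    return rate_per_second * SECONDS_PER_DAY


# The rate quoted in the excerpt: one zettahash per second.
zettahash = SI_PREFIXES["zetta"]

# Confirm the long number in the text really is 10**21.
print(zettahash == 1_000_000_000_000_000_000_000)

# At a constant rate, one day of hashing expressed in whole yottahashes.
print(hashes_per_day(zettahash) // SI_PREFIXES["yotta"])
```

Python integers have arbitrary precision, so these values are exact rather than floating-point approximations.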


Cross-talk based multi-task learning for fault classification of physically coupled machine system

digitado ⋅ 5 February 2026

Machine systems inherently generate signals in which fault conditions and various physical variables are physically coupled. Although many existing fault classification studies rely solely on direct fault labels, the aforementioned signals naturally embed additional information shaped by other physically coupled information. Herein, we leverage this coupling through a multi-task learning (MTL) framework that jointly learns fault conditions and the related physical variables. Among MTL architectures, crosstalk structures have distinct advantages because they allow for controlled information exchange between […]
