
Should reward functions always show a sigmoid function-like outcome?

digitado ⋅ 19 June 2026

My reward function looks more like this

I'm also curious which checkpoint you would use for inference. Going for the peak might give the best reward, but the model there does not seem robust, whereas the model from the region where the curve plateaus may be more reliable.
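The peak-versus-plateau trade-off can be made concrete with a small checkpoint-selection sketch. It assumes you log one mean evaluation reward per saved checkpoint. The function names, the window size, and the `mean - penalty * std` score are illustrative choices rather than a standard method, and the reward values below are made up.

```python
from statistics import mean, pstdev


def rolling_scores(rewards, window, penalty):
    """Score every full trailing window of evaluations as mean - penalty * std.

    Returns (index_of_last_checkpoint_in_window, score) pairs.
    """
    if not 1 <= window <= len(rewards):
        raise ValueError("window must be between 1 and len(rewards)")
    scores = []
    for end in range(window, len(rewards) + 1):
        chunk = rewards[end - window:end]
        # Penalize windows whose rewards fluctuate, favoring flat regions.
        scores.append((end - 1, mean(chunk) - penalty * pstdev(chunk)))
    return scores


def pick_peak(rewards):
    """Index of the single highest-reward checkpoint."""
    return max(range(len(rewards)), key=lambda i: rewards[i])


def pick_stable(rewards, window=3, penalty=1.0):
    """Index of the last checkpoint in the best-scoring window."""
    best_index, _ = max(rolling_scores(rewards, window, penalty), key=lambda s: s[1])
    return best_index


# Hypothetical per-checkpoint mean evaluation rewards: a spike at index 3,
# then a lower but steady plateau.
rewards = [0.1, 0.3, 0.6, 0.95, 0.5, 0.7, 0.8, 0.82, 0.81, 0.83]

print("peak checkpoint:", pick_peak(rewards))      # 3
print("stable checkpoint:", pick_stable(rewards))  # 9
```

Raising `penalty` favors flatter regions, while setting it to 0 reduces the rule to picking the best rolling mean. A smoothed training curve is still only a proxy for robustness; re-evaluating the candidate checkpoints on fresh episodes or seeds is a more direct check.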

submitted by /u/Markovvy



Digitado © 2026