Principles and Values

Let me start off by saying that I've only just started studying RL, so I don't know whether what I'm about to describe is already a thing or whether there's an analogue to it in the DL world.

Now, onto the idea:

Humans have an ability to tell right from wrong and a general sense of what's good for them and what's bad. Even babies seem to behave in ways that indicate this knowledge.

e.g. babies preferring helpers over hinderers, avoiding bad actors, liking punishers of bad actors, being surprised at unfair distributions, etc.

What we're born with is a set of principles and values: a sort of guidebook compiled from generations of human experience. For example, helping others because you know the bond formed by helping will be very beneficial later. This is why early communities formed (the sum of individual outputs is far less than the output of an organisation made up of those individuals). That surplus output (safety, better goods and services thanks to specialisation, etc.) was the reward.

The observation: humans can produce reward for themselves at will. Your nervous system calms down when you name who or what you're grateful for; you get that good feeling after you've helped someone (say, donated money to the needy). You recall what you did and feel proud of it (the reward). No eyes are on you and there is no external reward; you consciously decide that doing this was good, and that decision is a reward in itself. Similarly, when you do something bad, you feel guilty and sad. Something primitive is at play here. I propose that this is the most prominent outcome of the evolutionary process: principles and values that are inherent to us, notions of good and bad developed over generations, which drive the self-reward mechanisms described above. When you choose to reward yourself (feeling proud, that tingly feeling when you list things you're grateful for) or punish yourself (feeling guilty when you cause some harm), your biology is being guided by this primitive values-based system.

Coming back to RL: are there any systems or architectures that give an agent a general sense of whether its current state is good or bad, so that the model itself can use a self-reward mechanism to navigate and explore its environment effectively, without having to reach the end state, see the result, and only then update itself? This value-based system needn't correlate strongly with the actual outcome; it only has to act as a guide for when the agent releases its own reward. (A rough sketch of what I mean is below.)
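From the little RL I've read so far, the closest standard idea I've come across is potential-based reward shaping (Ng et al., 1999), and more broadly intrinsic reward: the agent adds an internal bonus based on its own estimate of how good a state is, on top of whatever the environment hands back. Here is a minimal sketch of that idea; the gridworld, the `phi` heuristic, and the numbers are all made up for illustration, not a definitive implementation.

```python
# Rough sketch of a "self-reward" via potential-based reward shaping.
# phi(s) plays the role of the innate value system: a heuristic guess at how
# good a state is, which need not match the true final outcome.
# Everything here (the gridworld, GOAL, phi) is illustrative only.

GAMMA = 0.99
GOAL = (4, 4)

def phi(state):
    # Innate heuristic: the closer to the goal, the "better" the state feels.
    x, y = state
    return -(abs(GOAL[0] - x) + abs(GOAL[1] - y))

def self_reward(state, next_state, env_reward):
    # The agent rewards itself whenever its own estimate of goodness improves,
    # without waiting for the terminal outcome.
    internal_bonus = GAMMA * phi(next_state) - phi(state)
    return env_reward + internal_bonus

# Moving one step toward the goal already yields a positive signal,
# even though the environment itself returns nothing until the end.
print(self_reward((0, 0), (1, 0), env_reward=0.0))  # positive (about +1.07)
```

One nice property of this particular form, if I understand it correctly, is that it doesn't change which policies are optimal; it only changes how quickly the agent finds them, which matches the idea of the values being a guide rather than the outcome itself.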

For example, in chess there might be a computation that gauges how strong the agent's current position is. That measure of positional strength could be one of the many things captured by our value-based model, letting the agent reward or punish itself (instead of the reward being provided by our system).
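To make the chess example concrete, here is what a crude version of such a "how strong is my position" signal might look like, using plain material balance. The piece values and the `evaluate` function are made up for illustration and not taken from any real engine; in practice a learned value function would usually stand in for the hand-written heuristic.

```python
# Illustrative only: a hand-written position evaluator standing in for the
# agent's internal value system. A real agent would typically learn this
# (a value function), but the role is the same.

PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def evaluate(pieces):
    # Material balance from the agent's point of view: uppercase pieces are
    # the agent's, lowercase the opponent's. Positive means "good for me".
    score = 0
    for piece in pieces:
        value = PIECE_VALUES.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

def internal_reward(pieces_before, pieces_after):
    # The agent "pats itself on the back" when its own evaluation improves,
    # long before checkmate determines the real, external reward.
    return evaluate(pieces_after) - evaluate(pieces_before)

# Capturing the opponent's queen feels good immediately:
print(internal_reward(["Q", "R", "q", "r"], ["Q", "R", "r"]))  # +9
```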

