Everyone has in mind the image of an animal conditioned by a scientist, adjusting its behavior based on a reward. That is essentially what RPE (Reward Prediction Error) is: an assisted dopaminergic reprogramming. And it is exactly on this principle that RLHF—OpenAI’s dearest achievement and the current paradigm of AI is based. The problem is, it is always claimed that this process requires real, objective feedback.
The intuition of the contrary came to me while running: as I was warming up—and since I have a non-standard way of doing so—I was adjusting my movements with a slight transition just to not look crazy. As it turns out, no one was watching me. Yet, I was constantly adjusting, as if a gaze existed.
In its classical formulation, RPE appears after an event: a behavior produces a result, and the gap between expected and obtained reward serves to correct the next action. But in this specific case, no feedback existed. The adjustment was triggered by the anticipation of a possible gap between two internal representations: that of my current behavior and that of what would be perceived as normal behavior.
In other words, the system was not comparing a prediction to reality, but rather two internal states. And if we push this reasoning a bit further, the boundary between internal and external feedback has never truly existed. But as we merge with our own reference frames (and yet even when reading the same thing that will form that reference, comprehension is never universal: we understand and reject what suits us), it becomes easier to tell ourselves that others are preventing us from being free rather than noticing that what imprisons us is ourselves.
R-RPE is the mathematical translation of this intuition: a framework where prediction errors can emerge from internal gaps between representations, independent of any real feedback. A mechanism that opens the possibility for forms of self-learning in artificial intelligence, but which also describes how humans—and machines—can end up trapped within their own narratives.
For a long time, I envied what I saw on TV—the way everyone seemed so alive—and this impression grew without me even realizing it as I lost my capacity to feel, as I disconnected from my body. I was always taught to persevere, to put in more effort than others.
When in reality, for a goal to manifest, it must simply first become a somatic reality. If the body does not already simulate the success of the intention through a complete alignment—what some call grounding—one ends up moving forward while constantly fighting against oneself.
This work was born out of a personal disgust: that of human self-justification. This tendency people have to always convince themselves they are in the right, to blame their failures on fate, and to perceive the world in a binary way—scavenging for 'villains' to justify their own mediocrity.