The second log entry of the Mad Machine Learning Scientist of Cybercom – Embodiment and Agency

2017-06-29, 14:36 Posted by: Tero Keski-Valkama

This is the second log entry; it handles embodiment and agency, building on the first log entry, which was about the substrate of consciousness.

Embodiment means that an intelligent agent has a physical or virtual body which it can use to explore the world, learn, and project its will onto the world.

A cybernetic control loop of sense-think-act is often used to model such systems.

[Figure: the sense-think-act cybernetic control loop]
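The loop above can be sketched in a few lines of Python. The environment and the three stage functions here are hypothetical placeholders for illustration, not any particular library's API:

```python
# A minimal sense-think-act loop: a hypothetical agent moves along a
# line toward a goal position, one step per cycle of the loop.

def sense(environment):
    # Read the current observable state of the world.
    return environment["position"]

def think(observation, goal):
    # Decide a direction of movement toward the goal.
    return 1 if observation < goal else -1

def act(environment, action):
    # Project the decision back onto the world.
    environment["position"] += action

environment = {"position": 0}
goal = 3
for _ in range(10):
    observation = sense(environment)
    if observation == goal:
        break
    act(environment, think(observation, goal))

print(environment["position"])  # → 3
```

Each pass through the loop closes the cybernetic circle once: the agent perceives, decides, and changes the world it will perceive on the next pass.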

Having a body of some kind is a requirement for reinforcement learning. A reinforcement learning system generally has an embodiment, a policy, and a reward system. In reinforcement learning, the policy choices made by the agent affect the rewards it receives. The expected rewards are modelled and used as the basis for an optimizer that tunes the policy. The policy decides which actions to take in the situations the agent finds itself in.
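As a minimal sketch, a policy can be a table mapping states to action preferences; the epsilon-greedy rule below is one common way to turn such preferences into action choices. The states, actions, and preference values are invented for illustration:

```python
import random

# A tabular policy: for each state, a learned preference value per action.
# States, actions, and numbers here are made up for illustration.
preferences = {
    "hungry": {"eat": 1.0, "sleep": 0.1},
    "tired":  {"eat": 0.2, "sleep": 0.9},
}

def choose_action(state, epsilon=0.1):
    # With probability epsilon explore a random action; otherwise
    # exploit the action with the highest learned preference.
    actions = preferences[state]
    if random.random() < epsilon:
        return random.choice(list(actions))
    return max(actions, key=actions.get)

print(choose_action("hungry", epsilon=0.0))  # → eat
print(choose_action("tired", epsilon=0.0))   # → sleep
```

The optimizer's job is then to adjust the preference values so that the chosen actions collect more of the expected reward.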

Both in machine learning and in the human brain, the most important component of the reward system is the propagation of expected rewards backwards in time. The expected reward thus affects activities that precede the reward, through the points where the decisions leading to the reward are made. This is how these systems are able to learn decision policies as if future rewards were immediate at the points of decision.
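One concrete way to propagate rewards backwards in time is to compute discounted returns from the end of an episode toward its start; the reward sequence below is invented for illustration:

```python
# Propagate a reward backwards in time as discounted returns.
# gamma < 1 makes distant rewards count less at earlier decision points.
# The reward sequence is an invented example where only the last step pays off.

def discounted_returns(rewards, gamma=0.5):
    returns = []
    future = 0.0
    for r in reversed(rewards):
        future = r + gamma * future
        returns.append(future)
    return list(reversed(returns))

print(discounted_returns([0.0, 0.0, 1.0]))  # → [0.25, 0.5, 1.0]
```

Even though the reward arrives only at the final step, every earlier step now carries a share of it, so decisions made there can be credited for the eventual outcome.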

The operation of such a reward system is far from trivial. In practice, it needs an internal representation of the world incorporating at least the expected rewards, all the interacting agents, and branching sequences for the different physically plausible eventualities. Such models are hierarchical and complex. The model needs to remember past situations, be capable of varying them, and plan ahead. This kind of predictive model works better if it includes a model of self, because expecting itself to act in a specific fashion along the expected reward paths makes the chosen policies more viable.

The human brain is a reinforcement learning machine. How does the human brain handle this?

This is a vastly simplified description, but in the human brain, the reward system tunes dopamine levels to match the expected future reward of the situation. If the future reward happens as expected, the dopamine level stays roughly the same or increases slightly. If the expected reward fails to happen, the dopamine level drops sharply, and the reward system tunes its models to better predict future rewards.
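In temporal-difference terms, this dopamine signal corresponds to the reward prediction error. A minimal sketch, with made-up state values and rewards:

```python
# A reward prediction error in the temporal-difference sense:
# delta = reward + gamma * V(next_state) - V(state).
# The values and rewards below are made-up illustration numbers.

def td_error(value, next_value, reward, gamma=0.9):
    return reward + gamma * next_value - value

# The reward arrives exactly as predicted: the error stays at zero.
print(td_error(value=0.9, next_value=0.0, reward=0.9))  # → 0.0

# The expected reward fails to appear: a sharp negative error,
# analogous to the dopamine dip described above.
print(td_error(value=0.9, next_value=0.0, reward=0.0))  # → -0.9
```

A learning system nudges its value estimates in the direction of this error, which is exactly the model-tuning step described above.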

High dopamine levels generally tune the neural connectivity of the brain so that the states associated with the rewarding events become more probable, increasing the network's confidence in the choices it has made.


So, we have an embodied system which intelligently optimizes its actions in the environment. Where does the embodiment end and the environment begin?

Of course, there is no sharp distinction between the system and its environment. This is why the reward system models decisions with a flexible model of self. What matters is the level of control the brain can exert on specific parameters in the world. A human can feel a bicycle to be an extension of her body, because the brain observes having direct control over the bicycle. The limits of the perceived embodiment lie at the points where control becomes so indirect and complex that the brain can no longer exert mechanistic, goal-oriented control.

So, as an embodied agent is an inseparable part of its environment, modelling the environment leads to a requirement of modelling the self. The self is basically the limited part of the universe which the brain believes it can directly control.

Paradoxically, the illusion of a separate self can only arise in a system that is inseparably integrated with its environment.

A satirical photo of Tero on a robotic horse by Tanja Lankia.
