Abstract:
The social alignment of the human mind is omnipresent in our everyday life and culture. Yet which mechanisms of the brain allow humans to be social, and how do they work and interact? Despite the apparent importance of this question, the nexus of cognitive processes underlying social intelligence is still largely unknown. A system of mirror neurons has been under intense, interdisciplinary consideration in recent years, and far-reaching contributions to social cognition have been suggested, including the understanding of others' actions, intentions, and emotions. Theories of embodied cognition emphasize that our minds develop by processing bodily experiences and inferring structure from them. It has been suggested that action understanding, too, is possible by simulating others' actions by means of one's own embodied representations. Nonetheless, it remains largely unknown how the brain manages to map the visually perceived biological motion of others onto inherently embodied states such as intentions and motor representations, and which processes foster suitable simulations thereof. Since our minds are generative and predictive in nature, and cognition is fundamentally anticipatory, principles of predictive coding have also been suggested to be involved in action understanding.

This thesis puts forward a unifying hypothesis of embodied simulation, predictive coding, and perceptual inference, and supports it with a neural network model. The model (i) learns encodings of embodied, self-centered visual and proprioceptive, modal and submodal perceptions as well as kinematic intentions in separate modules, (ii) learns temporal, recurrent predictions within and across these modules to foster distributed and consistent simulations of unobservable embodied states, and (iii) applies top-down expectations to drive perceptual inferences and imagery processes that establish the correspondence between action observations and the unfolding, simulated self-representations.

All components of the network are evaluated separately and in complete scenarios on motion capture data of human subjects. In the results, I show that the model becomes capable of simulating and reenacting observed actions based on its embodied experience, leading to action understanding in terms of motor preparations and the inference of kinematic intentions. Furthermore, I show that perceptual inferences by means of perspective-taking and feature binding can establish the correspondence between self and other and might thus be deeply anchored in action understanding and other abilities attributed to the mirror neuron system. In conclusion, the model shows that it is indeed possible to develop embodied, neurocomputational models of the alleged principles of social cognition, providing support for the above hypotheses and opening opportunities for further investigation.
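The three ingredients named in points (i)-(iii) can be illustrated with a minimal, hypothetical sketch; it is not the thesis implementation, and all names (ModalityModule, EmbodiedSimulator, infer_perspective), layer sizes, and the toy data are illustrative assumptions. The sketch pairs modality-specific recurrent modules with a cross-modular intention readout and adapts a latent view-rotation parameter by gradient descent on prediction error, as a stand-in for top-down perspective-taking.

    # Minimal sketch (assumed, not the thesis model): modular encodings (i),
    # recurrent within- and cross-module predictions (ii), and top-down
    # inference of a perspective parameter from prediction error (iii).
    import torch
    import torch.nn as nn

    class ModalityModule(nn.Module):
        """Encodes one modality (e.g., visual or proprioceptive) and predicts its next state."""
        def __init__(self, obs_dim, hidden_dim):
            super().__init__()
            self.encoder = nn.Linear(obs_dim, hidden_dim)   # (i) modality-specific encoding
            self.rnn = nn.GRUCell(hidden_dim, hidden_dim)   # (ii) temporal, recurrent dynamics
            self.decoder = nn.Linear(hidden_dim, obs_dim)   # prediction of the next observation

        def step(self, obs, h):
            h = self.rnn(torch.tanh(self.encoder(obs)), h)
            return self.decoder(h), h

    class EmbodiedSimulator(nn.Module):
        """Couples visual and proprioceptive modules and reads out a kinematic intention."""
        def __init__(self, vis_dim=6, prop_dim=4, intent_dim=3, hidden_dim=32):
            super().__init__()
            self.visual = ModalityModule(vis_dim, hidden_dim)
            self.proprio = ModalityModule(prop_dim, hidden_dim)
            # (ii) cross-module coupling: hidden states jointly predict the intention
            self.intention_head = nn.Linear(2 * hidden_dim, intent_dim)

        def forward(self, vis_seq, prop_seq):
            h_v = torch.zeros(1, self.visual.rnn.hidden_size)
            h_p = torch.zeros(1, self.proprio.rnn.hidden_size)
            vis_preds, intents = [], []
            for t in range(vis_seq.shape[0]):
                vis_pred, h_v = self.visual.step(vis_seq[t:t+1], h_v)
                _, h_p = self.proprio.step(prop_seq[t:t+1], h_p)
                vis_preds.append(vis_pred)
                intents.append(self.intention_head(torch.cat([h_v, h_p], dim=-1)))
            return torch.cat(vis_preds), torch.cat(intents)

    def infer_perspective(model, observed_vis, own_prop, steps=50, lr=0.05):
        """(iii) Top-down inference: rotate the observed motion until the model's own
        predictions match it, establishing a self-other correspondence."""
        model.requires_grad_(False)                        # only the perspective parameter adapts
        angle = torch.zeros(1, requires_grad=True)         # latent view-rotation parameter
        opt = torch.optim.Adam([angle], lr=lr)
        for _ in range(steps):
            c, s = torch.cos(angle), torch.sin(angle)
            rot = torch.stack([torch.cat([c, -s]), torch.cat([s, c])])   # 2-D rotation matrix
            vis = (observed_vis.reshape(-1, 2) @ rot.T).reshape(observed_vis.shape)
            preds, _ = model(vis, own_prop)
            loss = ((preds[:-1] - vis[1:]) ** 2).mean()    # prediction error drives the inference
            opt.zero_grad(); loss.backward(); opt.step()
        return angle.detach()

    model = EmbodiedSimulator()
    vis = torch.randn(20, 6)      # toy stand-in for motion-capture visual features (3 markers x 2D)
    prop = torch.randn(20, 4)     # toy proprioceptive signal
    print("inferred view angle:", infer_perspective(model, vis, prop).item())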