Gestalt Perception of Biological Motion with a Generative Artificial Neural Network Model

Document type: Dissertation
Date: 2022-05-09
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät
Department: Informatik
Advisor: Butz, Martin (Prof. Dr.)
Day of Oral Examination: 2022-04-01
DDC Classification: 004 - Data processing and computer science
Other Keywords:
Generative Recurrent Neural Network
Feature Binding
Perspective Taking
Gestalt perception
Behavior Inference
Spatial Encoding
Social Cognition
License: Publishing license including print on demand


In cognitive modelling, understanding biological motion by drawing on one's own sensorimotor skills is highly valued and is regarded as a fundamental element of social intelligence. It has been suggested that proper Gestalt perception depends on suitably binding visual features, adapting the matching perspective, and mapping the bound features onto the correct Gestalt templates. This thesis introduces a generative artificial neural network model that implements such Gestalt perception mechanisms and thereby proposes an algorithmic explanation. The architecture extends, modifies, and further investigates previous work by Fabian Schrodt (2018), which relies on the principles of active inference and predictive coding, coupled with suitable inductive learning and processing biases. First, we train the model to learn sufficiently accurate generative models of dynamic biological, or other harmonic, motion patterns. Afterwards, we scramble the input and vary the perspective onto it. To route the input properly and adapt the internal perspective to a known frame of reference, the modularized architecture propagates the prediction error back onto a binding matrix, which consists of hidden neural states that determine feature binding, and further back onto perspective-taking neurons, which rotate and translate the input features. The resulting process ensures that various types of biological motion are inferred upon observation, resolving the challenges of (I) feature binding into Gestalten, (II) perspective taking, and (III) behavior interpretation.
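The idea of adapting the perspective by propagating prediction error back onto perspective parameters can be illustrated with a deliberately small sketch. It is not the model from the thesis: it assumes a fixed 2D template, a single rotation angle as the only free parameter, and plain gradient descent on the squared prediction error, whereas the thesis uses a neural network that also handles translation and feature binding. The names `template`, `true_view`, and `infer_perspective` are hypothetical.

```python
import math

def rotate(points, theta):
    """Rotate 2D points counter-clockwise by theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def prediction_error(observed, template, theta):
    """Mean squared distance between the rotated observation and the template."""
    rotated = rotate(observed, theta)
    return sum((rx - tx) ** 2 + (ry - ty) ** 2
               for (rx, ry), (tx, ty) in zip(rotated, template)) / len(template)

def error_gradient(observed, template, theta):
    """Analytic derivative of prediction_error with respect to theta."""
    c, s = math.cos(theta), math.sin(theta)
    grad = 0.0
    for (x, y), (tx, ty) in zip(observed, template):
        rx, ry = c * x - s * y, s * x + c * y      # rotated feature
        drx, dry = -s * x - c * y, c * x - s * y   # derivative of rotated feature
        grad += 2.0 * ((rx - tx) * drx + (ry - ty) * dry)
    return grad / len(template)

def infer_perspective(observed, template, theta=0.0, lr=0.1, steps=300):
    """Adapt the internal rotation by descending the prediction error."""
    for _ in range(steps):
        theta -= lr * error_gradient(observed, template, theta)
    return theta

# Hypothetical template of five 2D features (assumption, not thesis data).
template = [(0.0, 1.0), (0.5, 0.2), (-0.5, 0.2), (0.3, -0.9), (-0.3, -0.9)]
true_view = 0.8                           # the observer's unknown rotation
observed = rotate(template, true_view)    # features seen from that viewpoint
theta_hat = infer_perspective(observed, template)
```

In this toy setting the inferred angle `theta_hat` approaches `-true_view`, the rotation that maps the observation back into the template's frame of reference. With a single angle the error surface has only one minimum per full turn, which is what makes plain gradient descent sufficient here; that simplicity should not be expected once binding and translation are inferred jointly.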
Ablation studies underline that (1) the separation of spatial input encodings into relative positional, directional, and motion magnitude pathways boosts the quality of Gestalt perception, (2) population encodings implicitly enable the parallel testing of alternative interpretation hypotheses and therefore further improve inference accuracy, and (3) a temporal predictive processing module operating on the autoencoder-compressed stimuli enables the retrospective inference of the unfolding behavior. We believe that similar components should be employed in other architectures where temporal bindings of information sources are beneficial. Moreover, given that binding, perspective taking, and intention interpretation are universal problems in cognitive science, the introduced mechanisms may be useful for addressing similar challenges in domains beyond biological motion patterns.
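How a population encoding can hold alternative hypotheses side by side may be easier to see in a toy example. The sketch below assumes Gaussian tuning curves over evenly spaced preferred values and a scalar feature in the range 0 to 1; the abstract does not specify the encoding used in the thesis, so `encode`, `decode`, `find_peaks`, the tuning width `sigma`, and the chosen values are all illustrative assumptions.

```python
import math

def encode(value, centers, sigma=0.1):
    """Normalized Gaussian population code for a scalar value."""
    acts = [math.exp(-((value - c) ** 2) / (2 * sigma ** 2)) for c in centers]
    total = sum(acts)
    return [a / total for a in acts]

def decode(acts, centers):
    """Activation-weighted mean of the preferred values."""
    return sum(a * c for a, c in zip(acts, centers))

def find_peaks(acts):
    """Indices of strict interior local maxima of the activation profile."""
    return [i for i in range(1, len(acts) - 1)
            if acts[i] > acts[i - 1] and acts[i] > acts[i + 1]]

centers = [i / 10 for i in range(11)]   # preferred values 0.0, 0.1, ..., 1.0

code_a = encode(0.2, centers)           # hypothesis A: the feature is at 0.2
code_b = encode(0.8, centers)           # hypothesis B: the feature is at 0.8

# An equal mixture keeps both hypotheses active in one activation vector.
mixed = [0.5 * (a + b) for a, b in zip(code_a, code_b)]
peaks = find_peaks(mixed)               # one peak per surviving hypothesis
```

A single scalar could only carry the average of the two candidates, while the mixed population code keeps one activation peak per hypothesis, so later evidence can suppress one peak without discarding the other.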
