Abstract:
The present research is concerned with learning from text and pictures, that is, multimedia learning. Several recommendations have been made on how to present text and pictures to learners in order to enhance learning outcomes. The most prominent theory concerning multimedia learning is the Cognitive Theory of Multimedia Learning (CTML; Mayer, 2009). This theory is based on an older version of Baddeley’s working memory model (e.g., 1992).
The assumption that working memory plays a crucial role in learning from multimedia is not only theoretically plausible but has also been confirmed empirically (e.g., Gyselinck, Cornoldi, Dubois, De Beni, & Ehrlich, 2002). However, one problem arises within the framework of CTML with regard to how the processing of written text is interpreted: According to the CTML, written text is initially represented as an image in the visuo-spatial sketchpad (VSSP), resulting in an overload when written text is presented together with pictures. With spoken text, however, no overload of the VSSP occurs because spoken text is not processed in the VSSP at all. Based on these theoretical assumptions, Mayer (2009) explains the modality effect, that is, the empirical finding that pictures presented together with spoken text lead to better performance than pictures presented together with written text. However, according to the working memory model, written text is processed in the phonological loop from the outset; that is, no overload should occur when pictures are presented together with written text (cf. Rummer, Schweppe, Scheiter, & Gerjets, 2008).
In the present study, an alternative explanation for the modality effect was developed for the first time, based on specifications of the structure of the VSSP. According to these specifications, the VSSP can be divided into a spatial and a visual component, which process spatial and visual information, respectively, from both pictures and text. Furthermore, the control of eye movements is located in the spatial VSSP. In the present research, these structural assumptions were incorporated into the CTML, resulting in an extended CTML, the ECTML. Based on the ECTML, a modality effect is expected only when text containing spatial information is presented, because the processing of spatial text contents, the processing of spatial picture information, and the control of eye movements associated with reading interfere with each other in the spatial VSSP, resulting in an overload. There is some empirical evidence supporting the assumption of a modality effect only with text containing spatial information (e.g., Eddy & Glass, 1981; Kürschner & Schnotz, 2007).
In addition to predicting a modality effect only with spatial text contents, the ECTML also predicts that text contents in general can influence the effectiveness of text-picture presentations. Accordingly, spatial text contents should interfere with picture information in the spatial VSSP, whereas no interference is expected when non-spatial text contents are presented.
In the present study, three experiments were conducted to test these assumptions. The results did not support the hypothesized moderation of the modality effect by text contents. However, the general influence of text modality on learning predicted by the CTML was not observed either. The prediction of worse performance when text containing spatial information is presented together with pictures was confirmed: Although the texts were equal in difficulty, learners with spatial text contents recalled both the pictures and the text contents less well than learners with visual text contents. With regard to the involvement of working memory in multimedia learning, recall performance correlated only weakly with the capacities of the phonological loop, the spatial VSSP, and the visual VSSP. However, a dual task confirmed our assumption that working memory is important when learning with text and pictures.
To conclude, research on the cognitive foundations of multimedia learning is a prerequisite for theoretically describing the processes associated with learning from text and pictures. Thus, it seems necessary to focus on the empirical validation of the assumed cognitive processes. The present study is a first step in that direction; however, future work is needed to optimize the empirical approach.