Reprinted with permission of copyright holder.
Visual Testing-Visual Literacy's Second Dimension.
Szabo, M., DeMelo, H. T., & Dwyer, F. M. Visual testing - Visual literacy's second dimension. Educational Communication and Technology Journal, 1981, 29, 177-187.
One dimension of Visual Literacy is concerned with the use of visual materials for the improvement of student learning. Might not the use of visual testing employing similar types of visual materials provide a more valid assessment of the level of information acquisition achieved by students receiving visualized instruction?
The use of the visual medium for instructional purposes has become an instructional strategy employed world-wide within the teaching-learning process. However, one of the major criticisms of this phenomenon is that most evaluation strategies currently in use to evaluate mediated instruction are of the paper-and-pencil type and are highly verbal, rather than visual, in nature. There is an urgent need for systematic examination of the relationship which exists among different types of visual teaching-testing formats and the level of student achievement of different types of educational objectives.
A considerable amount of research has been conducted in an attempt to determine whether, in fact, there is a difference in achievement scores between students who receive visualized instruction and are evaluated verbally and those who are evaluated visually (Thurstone, 1941; Thalen, 1945; Brown, 1947, 1949; Ebel, 1951; Lefkowith, 1955a; Dwyer, 1978). Researchers investigating the visual testing phenomenon have explored a number of different dimensions. Studies conducted on the reliability, validity (Carpenter, 1954; Lefkowith, 1955b; Torrence, 1976), and administration (Pessinger, 1969; Hill, 1976) of visual tests have established that visual tests are indeed reliable, valid, and capable of being administered with a reasonable amount of success.
Strategies involving a number of different formats have been employed in investigating the parameters of visual testing: film (Carpenter, 1954), realistic photographs and line drawings (Lefkowith, 1955a; Dwyer, 1978), television (Hopkins, Lefever & Hopkins, 1967; Stallings, 1972), and slides (Lumsdaine & Gladstone, 1958; Stoker et al., 1968; Tanner & Dwyer, 1977; Dwyer & Tanner, 1978). Most of the findings from these studies indicate that no significant differences occur in student performance as a result of their being evaluated visually rather than in the conventional verbal format. These general findings, along with the time and expense involved in constructing visual tests, have delayed, if only temporarily, the impact that visual testing may eventually make on educational evaluation as we know it today.
A THEORETICAL FOUNDATION FOR VISUAL TESTING
One of the perennial problems associated with the teaching-learning process is the determination of how learners acquire, store and recall information. As a result, a number of information acquisition strategies have been proposed which attempt to explain how learners acquire and retrieve information. For example, Tversky (1969, 1973) has found that verbal and visual information are encoded differently depending on the learner's perceived use of the information. Glanzer and Clark (1963b) have advanced the notion of a single information processing system (verbal-loop hypothesis) which contends that visual information is translated into and stored in verbal/symbolic form. When this information is to be retrieved, it is retranslated from the verbal symbolic form back to the original visualization. A number of specific research studies have been conducted which can be interpreted to be supportive of this orientation (Glanzer & Clark, 1963a,b, 1964; Lantz & Stefflre, 1964; Smith & Larson, 1970).
Paivio, Rogers and Smythe (1968) have suggested the possible existence of dual encoding and retrieval systems each functioning as a separate entity with the capability of working in unison with each other. Basically, this orientation (Paivio et al., 1968; Paivio, 1971) proposes a model involving two independent memory systems: one having the capability of processing verbal symbols, the other having the capability of processing visual information. Although the dual encoding and retrieval systems are perceived as functioning as separate entities, they also possess the capability of functioning in unison with each other. Depending on the nature (form) of the information to be retrieved, action with the specific memory system would be initiated. Similarly, a number of research studies have been conducted which may be interpreted as being supportive of the dual encoding and retrieval systems (Bahrick & Boucher, 1968; Paivio & Csapo, 1969; Bahrick & Bahrick, 1971; Cermak, 1971; Ternes & Yuille, 1972; Levie & Levie, 1975).
Keele (1973, p. 17) in commenting on the literature relating to the area of information acquisition states:
The evidence reviewed indicates that once information is perceived a great deal of flexibility exists for the form of recoding. Verbal materials are often coded in an articulatory form, even when presentation is visual. Spatial material may be visually coded, even when presentation is auditory. Although the new code may be a different mode than the presentation mode, evidence is accumulating that a transformation in mode is not necessary.
The justification for using visual testing in situations where visualization is used to complement oral/print instruction appears to have its generic roots in the sign similarity hypothesis and the cue summation principle of learning.
In general, the essence of the cue summation principle of learning is that (Severin, 1967b, p. 237) ". . . learning is increased as the number of available cues or stimuli is increased." The strategy of attempting to use visualization both in the presentation and evaluation phases of instruction is an attempt to implement the stimulus generalization phenomenon, which contends that the amount of information that will be acquired by students increases as the testing situation becomes more similar to the situation in which the students received their instruction (Hartman, 1961; Severin, 1967a).
The purpose of this study was to investigate the role of visuals in the instructional and evaluation phases of classroom learning. Investigated were: (a) the effect of verbal instruction alone vs. verbal instruction complemented by means of simple line drawings; (b) the validity of using visual tests in measuring the level of information acquisition of different educational objectives achieved by learners from visualized instruction; and (c) the interaction between mode of instruction (visual and non-visual) and mode of testing (visual and non-visual).
The content material used in this study was a 2,000-word instructional unit describing the human heart, its parts, and the internal processes which occur during the systolic and diastolic phases (Dwyer, 1978). This content was selected because it permitted the evaluation of several types of learning objectives. The heart script was analyzed to identify the portions that presented critical information. Thirty-seven such areas were identified. Illustrations were designed specifically to portray each item of information.
CRITERION MEASURES
Each student in each treatment participated in one of the instructional presentations and then, twenty-four hours later, received three individual criterion tests. Scores on the three individual criterion tests were combined into a 60-item total criterion test. Students were permitted to take as much time as they needed to complete the instructional unit and the criterion tests.
The test items contained in the verbal version of the criterion measures consisted of sixty multiple-choice questions. The visual version of this test was constructed so that for each of the sixty verbal items there was a matching visual item. For each of the verbal distractors on each of the sixty multiple-choice items an "equivalent" visual distractor was constructed. The stems of both the verbal and visual test questions were verbal and asked the same question; however, the stems of the visual test items were modified slightly to make them appropriate to the visual distractors. Figure 1 presents a sample of the verbal and visual formats for question #42 on the comprehension test.
Figure 1. Item 42 from the verbal test A (top) and visual test B (bottom).
A. VERBAL TEST ITEM
42. When blood is being forced out the right ventricle, in what position is the tricuspid valve?
A. partially opened
B. partially closed
C. open
D. closed
B. VISUAL TEST ITEM

42. The position of the tricuspid valve when blood is forced out of the right ventricle
TREATMENT GROUP
The sample for this study consisted of 96 high school biology students from an urban school in central Pennsylvania. Of the 57 females and 39 males, 77 were 10th graders, 15 were 11th graders, and 4 were 12th graders. The students were volunteers who had not previously studied biology, nor had they ever studied the physiology and functions of the human heart. The 96 students were randomly assigned to the following treatment conditions:
Treatment 1: Text with visuals. Students (N=48) in this treatment received the 2,000-word instructional unit in a self-paced booklet format which was complemented by means of 37 visuals (line drawings).
Treatment 2: Text minus visuals. Students (N=48) receiving this treatment received the same instruction as did students in Treatment 1; however, the verbal instruction in their self-instructional booklet was not complemented by the 37 visuals.
In the initial analysis the effect of visualization (verbal plus visualization vs. verbal alone) contained in the instructional sequence was the independent variable. The dependent variables were test scores on the identification, terminology, comprehension and total test. Subsequently, an attempt was made to investigate the validity of employing visual tests to access information presented by means of visualized instruction. Students in Treatment 1 (text with visuals) were randomly assigned to two sub-treatment groups each containing 24 students. One sub-group received the verbal criterion measures, the other the visual criterion measures. Students in Treatment 2 (text minus visuals) were also randomly assigned to the same two sub-treatment groups as Treatment 1.
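The assignment procedure described above can be illustrated with a short sketch. It is not the authors' procedure: the function name, the student identifiers, and the fixed seed are assumptions made for illustration, and the two-stage assignment (treatment first, then sub-treatment) is collapsed into a single shuffle, which yields the same four equal-sized cells.

```python
import random

# The four cells of the design: (instruction condition, testing condition).
CELLS = [
    ("text with visuals", "verbal test"),
    ("text with visuals", "visual test"),
    ("text minus visuals", "verbal test"),
    ("text minus visuals", "visual test"),
]

def assign_to_cells(student_ids, seed=0):
    """Randomly split students into four equal-sized cells.

    `seed` is a hypothetical parameter added only for reproducibility.
    """
    ids = list(student_ids)
    if len(ids) % len(CELLS) != 0:
        raise ValueError("number of students must be divisible by 4")
    rng = random.Random(seed)
    rng.shuffle(ids)
    n = len(ids) // len(CELLS)
    # Slice the shuffled list into consecutive blocks of n students.
    return {cell: ids[i * n:(i + 1) * n] for i, cell in enumerate(CELLS)}

# 96 hypothetical student identifiers, 24 per cell.
groups = assign_to_cells(range(1, 97))
```

With 96 students, each cell receives 24, and every student appears in exactly one cell.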
The first independent variable (verbal instruction plus visualization vs. verbal instruction alone) had two levels: with and without visuals. The second independent variable, testing mode, also had two levels: with and without visuals. These variables were crossed in a 2x2 randomized factorial posttest-only design. Two-way ANOVAs were conducted on each criterion measure. Where significant F-ratios (p<.05) were found to exist, differences between pairs of means were analyzed via Tukey's W-Procedure. Figure 2 shows this design in its expanded form. Figure 3 shows the simplified design; each of the four cells consisted of 24 students for a total of 96 subjects.
Figure 2. Experimental Design with Criteria
| Instruction | Non-Visual Testing (S1 S2 S3 S4) | Visual Testing (S1 S2 S3 S4) |
| Non-Visual | 12 females, 12 males | 7 females, 17 males |
| Visual | 10 males, 14 females | 10 males, 14 females |
| S1=identification, S2=terminology, S3=comprehension, S4=total crit. test |
Figure 3. Simplified Experimental Design Illustrating the Means by Cells, Rows, and Columns
| Instructional Conditions | Non-visual Testing | Visual Testing | Rows |
| Non-visual | M1 | M2 | M12 |
| Visual | M3 | M4 | M34 |
| Columns | M13 | M24 | |
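The 2x2 analysis described above can be sketched as a balanced two-way ANOVA. This is a minimal illustration, not the authors' computation: the function name and the toy scores are assumptions, and no data from the study are reproduced. Each effect has one degree of freedom, so its mean square equals its sum of squares; with 24 scores per cell the error term has 4 x 23 = 92 degrees of freedom, matching the F(1,92) ratios reported in the article.

```python
from statistics import mean

def two_way_anova_2x2(cells):
    """Balanced 2x2 ANOVA.

    cells[i][j] holds the scores for instruction level i and testing level j.
    Every cell must contain the same number of scores.
    """
    n = len(cells[0][0])
    if any(len(cells[i][j]) != n for i in range(2) for j in range(2)):
        raise ValueError("design must be balanced")
    cell_means = [[mean(cells[i][j]) for j in range(2)] for i in range(2)]
    row_means = [mean(cell_means[i]) for i in range(2)]
    col_means = [mean([cell_means[0][j], cell_means[1][j]]) for j in range(2)]
    grand = mean(row_means)
    # Sums of squares for the two main effects and the interaction.
    ss_instruction = 2 * n * sum((m - grand) ** 2 for m in row_means)
    ss_testing = 2 * n * sum((m - grand) ** 2 for m in col_means)
    ss_interaction = n * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(2) for j in range(2)
    )
    # Within-cell (error) sum of squares and its degrees of freedom.
    ss_within = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(2) for j in range(2) for x in cells[i][j]
    )
    df_within = 4 * (n - 1)
    ms_within = ss_within / df_within
    return {
        "df_within": df_within,
        "F_instruction": ss_instruction / ms_within,
        "F_testing": ss_testing / ms_within,
        "F_interaction": ss_interaction / ms_within,
    }

# Toy scores (hypothetical), two per cell, chosen so the result is easy to verify by hand.
toy = [[[1, 3], [2, 4]], [[5, 7], [6, 8]]]
result = two_way_anova_2x2(toy)

# Placeholder cells of 24 scores each, used only to show the error degrees of freedom.
placeholder = [[[0, 1] * 12 for _ in range(2)] for _ in range(2)]
df_error = two_way_anova_2x2(placeholder)["df_within"]
```

In the toy data the instruction effect is large, the testing effect is small, and the interaction is zero because the cell means are perfectly additive.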
On the total criterion test and on the three individual criterion tests, significant differences were found to exist among the means of the treatment groups: Total criterion test, F(1,92)=12.5, p<.05; Identification, F(1,92)=11.1, p<.05; Terminology, F(1,92)=10.6, p<.05; and Comprehension, F(1,92)=4.2, p<.05. These results strongly support the role of visuals in the encoding stage of human learning.
The visual instruction-visual testing hypothesis--that students receiving appropriately designed visuals in the instruction and testing modes would score higher--was supported for the Identification test. A significant interaction between instruction and testing, F(1,92)=6.1, p<.05, led to a follow-up analysis (Tukey) which revealed the superior performance of the visual instruction-visual testing group over each of the others (Figure 4). Nonsignificant interactions were obtained for the terminology, comprehension, and total criterion measures.
The K-R 20 reliability coefficients obtained for the non-visual and visual test formats on each criterion measure were as follows: identification .74, .79; terminology .62, .66; comprehension .54, .42; total criterion .82, .85. A complete description of each criterion test can be found in the Spring, 1976 issue of AVCR, pp. 52-53. Table 1 presents the means and standard deviations achieved by students receiving the non-visual and visual test formats for each criterion measure.
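For readers unfamiliar with the K-R 20 coefficient, the following sketch shows the standard Kuder-Richardson Formula 20 for items scored right or wrong. The function name and the small response matrix are hypothetical, and the sketch assumes the population variance of total scores; some texts use the sample variance instead, which gives slightly different values. It does not reproduce the coefficients reported above.

```python
from statistics import pvariance

def kr20(responses):
    """Kuder-Richardson Formula 20.

    responses: one list per student, each containing 0/1 item scores.
    Uses the population variance of total scores (an assumption).
    """
    n_students = len(responses)
    k = len(responses[0])
    totals = [sum(r) for r in responses]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    # Sum over items of p * q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for i in range(k):
        p = sum(r[i] for r in responses) / n_students
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)

# Hypothetical responses of four students to a three-item test.
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
reliability = kr20(example)
```

For this small matrix the total-score variance is 1.25 and the item variances sum to 0.625, so the coefficient works out to 1.5 x (1 - 0.5) = 0.75.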

Figure 4. Interaction between mode of instruction and mode of testing on the identification test.
Table 1. Means and Standard Deviations Obtained from the Non-Visual and Visual Test Formats
| A. Non-Visual Test Format | ||||
| Variables | Identification | Terminology | Comprehension | Total Criterion Test |
| No. of Items | 20 | 20 | 20 | 60 |
| Mean | 6.9 | 6.8 | 7.0 | 20.7 |
| S.D. | 3.7 | 3.2 | 3.0 | 8.13 |
| B. Visual Test Format | ||||
| Variables | Identification | Terminology | Comprehension | Total Criterion Test |
| No. of Items | 20 | 20 | 20 | 60 |
| Mean | 8.3 | 7.5 | 6.7 | 22.6 |
| S.D. | 4.2 | 3.3 | 2.7 | 8.8 |
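Because the total criterion score is the sum of the three individual test scores, the total mean in Table 1 should equal the sum of the three subtest means, up to rounding. The sketch below checks this for both formats; the function name and the rounding tolerance are assumptions introduced for illustration.

```python
# Subtest and total means as reported in Table 1 (one decimal place).
TABLE_1 = {
    "non-visual": {"subtests": [6.9, 6.8, 7.0], "total": 20.7},
    "visual": {"subtests": [8.3, 7.5, 6.7], "total": 22.6},
}

def total_is_consistent(subtest_means, reported_total, decimals=1):
    """Return True if the reported total matches the summed subtest means
    within the error that rounding alone could introduce."""
    # Each rounded value may be off by half a unit in the last place;
    # three subtests plus the total give four such half-units.
    tolerance = 4 * 0.5 * 10 ** (-decimals)
    return abs(sum(subtest_means) - reported_total) <= tolerance + 1e-9

checks = {
    fmt: total_is_consistent(row["subtests"], row["total"])
    for fmt, row in TABLE_1.items()
}
```

The non-visual means sum to 20.7 exactly, and the visual means sum to 22.5 against a reported 22.6, a gap small enough to be explained by rounding.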
Verbal instruction with complementing visuals was found to have a significant positive effect in improving student information acquisition on all objectives measured in this study. For the type of students employed in this study, these results strongly support the role of visuals in the encoding stage of human learning.
The hypothesis that students receiving visuals in instruction and testing would score higher was supported for objectives measured by the Identification test. This hypothesis, however, was not supported for those objectives measured by terminology, comprehension and total criterion tests. Most of the research data available on visual testing indicates that there are no statistically significant differences in achievement scores between students who received visualized instruction and are evaluated verbally and those who are evaluated visually. These null results may be a function of how the visual materials are used in the teaching-learning process. If visualization is used to complement instruction, i.e., present redundant information visually, then visualization is making sure that the verbal instruction is conveying the message optimally. Under these circumstances the use of visualization may simply be providing an alternative iconic base from which students can comprehend complex content material. This line of reasoning seems to coincide with the verbal loop hypothesis (Glanzer & Clark, 1963a,b, 1964) which contends that a stimulus (object or illustration) viewed by the learner is translated into a series of words which are held in memory until they are needed by the learner in making a covert or overt response.
The results of this preliminary research indicate that visual testing is a feasible, reliable, and valid way of measuring student achievement of the type measured by the Identification test. The implication is that even though visualization is extremely effective in assisting in the conveyance of that content material, visual testing is not necessarily the most valid format for accessing retained information for all types of educational objectives. Visual testing needs to be implemented in those situations in which typical paper-pencil tests are found to be invalid for assessing optimum student performance levels of specific types of educational tasks. Additionally, the validity and effectiveness of visual testing might depend intimately on (a) the degree of realism contained in the visualization used to complement the instruction, (b) the method of presentation employed to present the content material to students (externally paced vs. self-paced instruction), (c) individual difference variables, i.e., amount of prior knowledge of the content area, reading and/or oral comprehension level, intelligence, etc., (d) the level of educational objectives being assessed, and (e) the type and number of cueing techniques employed in the instructional environment (Dwyer, 1978).
SUMMARY AND IMPLICATIONS
The purpose of this study was to assess the theoretical justification for visual testing by examining the sign similarity hypothesis and the cue summation principle of learning--that the amount of information that will be acquired by students will increase as the testing situation becomes more similar to the situation in which students received their instruction. The results of this study reinforce the instructional strategy of implementing visualization both in the presentation and evaluation phases of instruction as a viable instructional variable. Although significant differences occurred only on the Identification test, it is important to remember that only one type of "visual testing" was employed on all criterion tests. It may be that different visual testing formats are necessary if valid assessment of students' level of achievement of different educational objectives is to be realized. Additionally, a number of limitations inherent in the present study may account for the results, i.e., (a) students of the type employed in this study may not have developed skills in learning from visualized instruction; therefore, their potential for profiting from visual testing was minimal, and (b) visual distractors in the visual test format were designed to be congruent with the verbal distractors of the verbal items--this severely limited the investigators' ability to design visual distractors which could have more validly assessed the level of students' information retention.
An implication can be made for further study of individual differences in information processing ability and their relationship with acquisition of understanding. It is interesting to speculate whether differential results would be obtained with college level students or with learners of different aptitudes. It can be hypothesized that students with reduced ability to recreate instructional environments will perform better under visual content conditions. Interactions of visual placement with inductive reasoning ability have been noted by Koran and Koran (1976).
Holliday et al. (1977) have argued from a zero-sum standpoint that displaying visuals in text may result in more attention being paid to one cue and less to another, resulting in reduced effectiveness of visuals. These notions bear further investigation. If memory for visuals is greater than memory for text, the duration of that memory differential should be quantified with delayed retention tests.
Recent research suggests that training students to create mental images during instruction results in superior achievement (Canelos, 1979). A legitimate question to be investigated is the relative effectiveness of induced imagery versus visuals provided in instruction as a function of individual learner differences. A corollary question addresses the effectiveness of asking students to generate images for which there is no personal knowledge or experience (i.e., episodic memory).
The results of this experiment have suggested a new direction for visuals in instruction and additional ideas for further reasoned study.
Bahrick, H.P., & Bahrick, P. Independence of verbal and visual codes of the same stimuli. Journal of Experimental Psychology, 1971, 91, 344-346.
Bahrick, H.P., & Boucher, B. Retention of visual and verbal codes of the same stimuli. Journal of Experimental Psychology, 1968, 78, 417-442.
Brown, J. W. A comparison of verbal and projected verbal-pictorial tests as measures of the ability to apply science principles. Unpublished doctoral dissertation, The University of Chicago, 1947.
Brown, J. W. Visualized testing. Educational Screening, 1949, 28, 116-117.
Canelos, J. The instructional effectiveness of differentiated imagery learning strategies on different levels of information processing when learners received visualized instruction consisting of varying stimulus complexity. Unpublished doctoral dissertation, The Pennsylvania State University, 1979.
Carpenter, C.R. Evaluation of the film: military police support in emergencies (riot control) TF19-1701. Technical Report. SDC-269-7-52. Port Washington, N.Y.: Special Devices Center, Office of Naval Research, 1954.
Cermak, G. W. Short-term recognition memory for complex free-form figures. Psychonomic Science, 1971, 25, 209-211.
Dwyer, F.M. Strategies for improving visual learning. State College, PA.: Learning Services, Box 784, 1978.
Dwyer, F.M., & Tanner, G. Visual testing: a viable instructional variable. British Journal of Educational Technology, 1978, 1, 34-37.
Ebel, R.L. Writing the test item. In E. F. Lindquist (Ed.), Educational measurement. Washington, D.C.: American Council on Education, 1951.
Glanzer, M., & Clark, W. H. Accuracy of perceptual recall: an analysis of organization. Journal of Verbal Learning and Verbal Behavior, 1963(a), 1, 289-299.
Glanzer, M., & Clark, W. H. The verbal loop hypothesis: binary numbers. Journal of Verbal Learning and Verbal Behavior, 1963(b), 2, 301-309.
Glanzer, M., & Clark, W. H. The verbal loop hypothesis: conventional figures. American Journal of Psychology, 1964, 77, 621-626.
Hartman, F. R. Recognition learning under multiple channel presentation and testing conditions. Audio Visual Communications Review, 1961, 9, 24-43.
Hill, R.T. The development and implementation of a model for administering a visual test of achievement over broadcast television. Unpublished master's thesis, The Pennsylvania State University, 1976.
Holliday, W.G., et al. Differential cognitive and affective responses to flow diagrams in science. Journal of Research in Science Teaching, 1977, 14, 129-134.
Hopkins, K.D., Lefever, D.W., & Hopkins, B.R. TV vs. teacher administration of standardized tests: comparability of scores. Journal of Educational Measurement, 1967, 4, 35-40.
Keele, S. W. Attention and human performance. Pacific Palisades: Goodyear Publishing Company, 1973.
Koran, M. L., & Koran, J. J. Interaction of learner aptitudes with question pacing in learning from prose. Journal of Educational Psychology, 1975, 67, 76-82.
Lantz, D., & Stefflre, V. Language and cognition revisited. Journal of Abnormal and Social Psychology, 1964, 69, 472-481.
Lefkowith, E.F. The effect of pictorial stimuli similarity in teaching and testing. Unpublished doctoral dissertation, The Pennsylvania State University, 1955(a).
Lefkowith, E.G. The validity of pictorial tests and their interaction with audio-visual teaching methods. Technical Report. SDC-169-7-49. Port Washington, N.Y.: Special Devices Center, Office of Naval Research, 1955(b).
Levie, W.H., & Levie, D. Pictorial memory processes. Audio Visual Communications Review, 1975, 23, 81-97.
Lumsdaine, A. A., & Gladstone, A. Overt practice and audiovisual embellishments. In M. A. May and A.A. Lumsdaine (Eds.). Learning from films. New Haven: Yale University Press, 1958.
Paivio, A., & Csapo, K. Concrete-image and verbal memory codes. Journal of Experimental Psychology, 1969, 80, 279-285.
Paivio, A., Rogers, T. B., & Smythe, P.C. Why are pictures easier to recall than words? Psychonomic Science, 1968, 11, 137-138.
Paivio, A. Imagery and verbal processes. New York: Holt, Rinehart and Winston, 1971.
Pessinger, G. Test administration by video tape. Educational Television, 1969, 1, 19-20.
Severin, W.J. Cue summation in multiple channel communications. Unpublished doctoral dissertation, University of Wisconsin, 1967(a).
Severin, W.J. Another look at cue summation. Audio Visual Communications Review, 1967(b), 15, 233-245.
Smith, E.E., & Larson, D.E. The verbal loop hypothesis and the effects of similarity on recognition and communication in adults and children. Journal of Verbal Learning and Verbal Behavior, 1970, 9, 237-242.
Stallings, W.M. A comparison of television and audio presentations of the MLA French listening examination. Journal of Educational Research, 1972, 65, 472-474.
Stoker, H.W., Kropp, R.P., & Bashaw, W.L. A comparison of scores obtained through normal and visual administrations of the occupational interest inventory, 1968. (ED 015837)
Tanner, J., & Dwyer, F. Students' perception of visual testing. Perceptual and Motor Skills, 1977, 45, 744-746.
Ternes, W., & Yuille, J. C. Words and pictures in an STM task. Journal of Experimental Psychology, 1972, 96, 78-86.
Thalen, R.A. Testing by means of film slides with synchronized recorded sound. Educational and Psychological Measurement, 1945, 5, 33-48.
Thurstone, L.L. A micro-film projector method for psychological tests. Psychometrika, 1941, 6, 235-248.
Torrence, D.R. The television test of science processes. Unpublished doctoral dissertation, The Pennsylvania State University, 1976.
Tversky, B. Pictorial and verbal encoding in a short-term memory task. Perception and Psychophysics, 1969, 6, 225-233.
Tversky, B. Encoding processes in recognition and recall. Cognitive Psychology, 1973, 5, 275-287.