Journal of Autism and Developmental Disorders. Vol. 29. No. 1. 1999
The TOM Test: A New Instrument for Assessing Theory of Mind in Normal Children and Children with Pervasive Developmental Disorders
Peter Muris,1,4 Pim Steerneman,2 Cor Meesters,3 Harald Merckelbach,1 Robert Horselenberg,1 Tanja van den Hogen,3 and Lieke van Dongen3
This article describes a first attempt to investigate the reliability and validity of the TOM test, a new instrument for assessing theory of mind ability in normal children and children with pervasive developmental disorders (PDDs). In Study 1, TOM test scores of normal children (n = 70) correlated positively with their performance on other theory of mind tasks. Furthermore, young children only succeeded on TOM items that tap the basic domains of theory of mind (e.g., emotion recognition), whereas older children also passed items that measure the more mature areas of theory of mind (e.g., understanding of humor, understanding of second-order beliefs). Taken together, the findings of Study 1 suggest that the TOM test is a valid measure. Study 2 showed for a separate sample of normal children (n = 12) that the TOM test possesses sufficient test-retest stability. Study 3 demonstrated for a sample of children with PDDs (n = 10) that the interrater reliability of the TOM test is good. Study 4 found that children with PDDs (n = 20) had significantly lower TOM test scores than children with other psychiatric disorders (e.g., children with Attention-deficit Hyperactivity Disorder; n = 32), a finding that underlines the discriminant validity of the TOM test. Furthermore, Study 4 showed that intelligence as indexed by the Wechsler Intelligence Scale for Children was positively associated with TOM test scores. Finally, in all studies, the TOM test was found to be reliable in terms of internal consistency. Altogether, results indicate that the TOM test is a reliable and valid instrument that can be employed to measure various aspects of theory of mind.
KEY WORDS: Theory of mind; pervasive developmental disorders; reliability.
1 Department of Psychology, University of Limburg, P.O. Box 616, 6200 MD Maastricht, The Netherlands.
2 South-Limburg Centre of Autism, c/o RIAGG-OZL, P.O. Box 165, 6400 AD Heerlen, The Netherlands.
3 Department of Experimental Abnormal Psychology, University of Limburg, P.O. Box 616, 6200 MD Maastricht, The Netherlands.
4 Address all correspondence to Peter Muris, Department of Psychology, University of Limburg, P.O. Box 616, 6200 MD Maastricht, The Netherlands.
0162-3257/99/0200-0067$16.00/0 © 1999 Plenum Publishing Corporation

INTRODUCTION
Recently, children's understanding of their own and others' mental states has been the focus of considerable interest. Research in this area is described under the general heading "theory of mind." Premack and Woodruff (1978) were the first to use the term to refer to the child's ability to ascribe thoughts, feelings, ideas, and intentions to others and to employ this ability to anticipate the behavior of others. According to Wellman (1990), theory of mind is a prerequisite for the understanding of the social environment and for engaging in socially competent behavior (see also Astington & Jenkins, 1995). It has been proposed that autistic children are socially impaired precisely because they lack a theory of mind (Frith, 1989). In a series of studies, Baron-Cohen,
Leslie, and Frith (1985, 1986) demonstrated that the ability of autistic children to attribute mental states to others is seriously impaired. These researchers found that about 80% of the autistic children were unable to correctly predict the ideas of others, whereas most mentally retarded and normal controls of lower mental age were able to do so.
Specific programs have been developed to train theory of mind skills in autistic children. For example, in a study by Ozonoff and Miller (1995), five autistic children received a training program in which they were not only taught specific interactional and conversational skills but also received explicit and systematic instruction regarding the underlying social-cognitive principles necessary to infer the mental states of others (i.e., theory of mind). Pre- and posttreatment assessment demonstrated that the trained children improved on a number of false belief tasks compared to control children who had received no treatment. Similar positive results were obtained by Swettenham (1996), Hadwin, Baron-Cohen, Howlin, and Hill (1996), Bowler, Strom, and Urquhart (1993), and Whiten, Irving, and Macintyre (1993). All these studies were successful in that autistic children who had received training were able to pass theory of mind tasks. Furthermore, in a recent study by Steerneman, Jackson, Pelzer, and Muris (1996), socially immature (but not autistic) children were given a social skills intervention program that incorporated theory of mind principles. Results showed that this type of training produced positive effects on theory of mind tests. Yet, it should be added that the treatment effects found in these studies do not always generalize to nonexperimental settings or to tasks in domains where children received no teaching (see, for a discussion of this issue, Slaughter & Gopnik, 1996).
Given the availability of reasonably successful treatment programs, theory of mind assessment instruments are important for two reasons.
First, such instruments can be used to identify those children who display deficits in theory of mind. Second, such instruments can be employed to evaluate the efficacy of theory of mind training programs. The assessment of theory of mind in children has been predominantly confined to so-called "false belief" tasks. Such tasks are intended to test children's comprehension of another person's wrong belief. An example is the so-called Smarties test (e.g., Hogrefe, Wimmer, & Perner, 1986). During this test, children are presented with a Smarties box and asked what it contains. Children are highly familiar with these boxes and know that they usually contain Smarties, a desirable chocolate candy. When
children give an answer in this sense, they are shown that the box actually contains a pencil. Next, children are told that another child will be asked what is in the box. They are then asked the crucial question: "What do you think the other child will say?" From their answer to this question, one can infer whether children are able to make a judgment about another person's false expectation. That is, an understanding of another individual's false belief (and presence of theory of mind) is demonstrated if children predict that another person will think that there are Smarties in the box. Conceptual difficulty with false belief attribution (and absence of theory of mind) is revealed if children assume that another person will think that there is a pencil in the box.
Several authors have argued that theory of mind is more than just the comprehension of false belief. For example, Perner and Wimmer (1985) have described two other types of belief that play a crucial role in children's understanding of social interactions: first-order beliefs that refer to what children think about real events (e.g., "Michael thinks that Sophie is angry") and second-order beliefs that pertain to what children think about other people's thoughts (e.g., "Michael thinks that Sophie thinks that he's angry with her"). Flavell, Miller, and Miller (1993) argue that children develop a theory of mind along five successive stages. During the first stage, children adopt the concept of mind, that is, they attribute needs, emotions, and other mental states to people and use cognitive terms such as "know," "remember," and "think." During the second stage, children acknowledge that the mind has connections to the physical world. More specifically, they understand that certain stimuli lead to certain mental states, that these mental states lead to behavior, and that mental states can be inferred from stimulus-behavior links.
During the third stage, children recognize that the mind is separate from and differs from the physical world. For example, they realize that a person can think about an object even though the object is not physically present. During the fourth stage, children learn that the mind can represent objects and events accurately or inaccurately. Thus, a representation can be false with respect to a real object or event (e.g., in a false belief task), behavior can be false with respect to a mental state (e.g., when a sad person smiles), and two people's perceptual views or beliefs can differ (i.e., perspective taking). During the fifth and final stage, children learn to understand that the mind actively mediates the interpretation of reality. For instance, children recognize that prior experiences affect current mental states which in turn affect emotions and social inferences. According to Flavell et al. (1993)
Stages 1-3 can best be regarded as theory of mind precursors. These authors assume that these stages "probably emerge in quick succession, for they are very closely related concepts having to do with the differentiation of, and relations between, the mind and the external world" (p. 101). The step from Stage 3 to 4, the emergence of a "real" theory of mind, probably comes more slowly (around the age of 6); Stage 5, the "more mature" theory of mind, would emerge still later.
Taken together, theory of mind refers to the child's capacity to analyze the behavior of others by recognizing the mental states (i.e., desires and beliefs) that underlie intentional behavior. Thus, theory of mind is a complex, developmental phenomenon, which certainly implies more than just the understanding of false belief. Obviously, there is a need for assessment tools that measure the developmental progression of theory of mind in a broader age range. One promising candidate in this respect is the Theory-of-Mind test (TOM test) designed by Steerneman (1994). The TOM test contains a variety of items that can be allocated to three subscales which correspond with the three main theory of mind stages as proposed by Flavell et al. (1993): (a) precursors of theory of mind (e.g., emotion recognition), (b) first manifestations of a real theory of mind (e.g., understanding of false belief), and (c) mature aspects of theory of mind (e.g., second-order beliefs). As a practical tool, the test provides information about the extent to which a child possesses social understanding, insight and sensibility, and the extent to which he or she takes the feelings and thoughts of others into account. The present article is concerned with the reliability and validity of the TOM test.
STUDY 1
The purpose of Study 1 was twofold. First, the construct validity of the TOM test was investigated. The TOM test is intended to be a developmental scale. Therefore, it was anticipated that TOM test scores would correlate positively with age. That is, as children grow older, their theory of mind develops, and hence they pass more TOM test items. Furthermore, one expects that younger children predominantly succeed on TOM items that tap the basic domains of theory of mind (e.g., emotion recognition), whereas older children should increasingly pass items that measure the more mature aspects of theory of mind (e.g., understanding of false belief, understanding of humor, second-order belief). A second purpose of Study 1 was to evaluate the concurrent validity of the TOM test. More specifically, its relationship with other, more traditional, indices of theory of mind and social development was examined.

Materials and Method

Subjects and Procedure
Seventy children (46 boys and 24 girls) recruited from a regular primary school ('De Driesprong' in Geleen, the Netherlands) participated in the study. The children ranged in age from 5 to 12 years. Ten children of each age level (i.e., 5, 6, 7, 8, 9, 10, and 11/12 years) were selected. All children were healthy and socially well-functioning, and none had learning difficulties. Thus, it can be assumed that they had normal intelligence. Children were tested at school in a private room with only the experimenter present. The assessment took place in two sessions. In one session, children underwent the TOM test. In another session, a series of alternative theory of mind or social development tasks was administered. The order of the sessions was counterbalanced within each age level group (i.e., half of the children started with the TOM test, while the other half first received the alternative battery of tests).

The New Theory of Mind Test
The TOM test comprises an interview that can be used with children between 5 and 12 years of age. The TOM test consists of vignettes, stories, and drawings about which the child has to answer a number of questions. The test lasts about 35 minutes and contains 78 items (i.e., questions). The TOM test contains three subscales: (a) precursors of theory of mind (i.e., TOM 1; 29 items; e.g., recognition of emotions, pretense), (b) first manifestations of a real theory of mind (i.e., TOM 2; 33 items; e.g., first-order belief, understanding of false belief), and (c) more advanced aspects of theory of mind (i.e., TOM 3; 16 items; e.g., second-order belief, understanding of humor). In the Appendix, examples of items of the three subscales are shown. Each TOM test item is scored as either failed (0) or passed (1). Accordingly, total TOM scores range between 0 and 78, with higher scores indicating a more mature theory of mind. TOM 1, TOM 2, and TOM 3 subscale scores vary between 0 and 29, 0 and 33, and 0 and 16, respectively.

Alternative, More Traditional, Indices of Theory of Mind and Social Development
A number of alternative indices of theory of mind and social development were employed in the current study.
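The TOM scoring rule given above (78 items, each scored 0 or 1, summed into a total score and three subscale scores) can be illustrated with a short Python sketch. It is not the authors' scoring procedure: the function name `score_tom` is hypothetical, and the sketch assumes that item scores are stored in subscale order (TOM 1 items first, then TOM 2, then TOM 3), which the article does not state.

```python
# Subscale sizes as reported in the article: TOM 1 = 29, TOM 2 = 33, TOM 3 = 16.
SUBSCALE_SIZES = {"TOM 1": 29, "TOM 2": 33, "TOM 3": 16}


def score_tom(item_scores):
    """Return the total and subscale scores for one child.

    item_scores: list of 78 values, each 0 (failed) or 1 (passed).
    Assumption (not from the article): items are ordered by subscale.
    """
    expected = sum(SUBSCALE_SIZES.values())  # 78 items in total
    if len(item_scores) != expected:
        raise ValueError(f"expected {expected} item scores, got {len(item_scores)}")
    if any(score not in (0, 1) for score in item_scores):
        raise ValueError("each item must be scored 0 (failed) or 1 (passed)")

    scores = {}
    start = 0
    for name, size in SUBSCALE_SIZES.items():
        # Subscale score = number of passed items within that subscale.
        scores[name] = sum(item_scores[start:start + size])
        start += size
    scores["Total"] = sum(item_scores)
    return scores
```

Under these assumptions, a child who passes every item obtains the maximum scores of 29, 33, 16, and 78 mentioned in the text.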
The Sally and Anne test (see Baron-Cohen et al., 1985) is a false belief task. It consists of a comic-strip story in which Sally and Anne are first introduced: Sally with a basket in front of her and Anne with a box. Next, Sally is shown placing a ball in the basket and leaving the room. Anne is then shown taking the ball from the basket and placing it in the box. Following this, Sally returns and children are asked: "Where will Sally look for her ball?" If the children point to the previous location of the ball, they pass the task because they acknowledge Sally's false belief (score = 1). If, however, they point to the ball's current location, they fail the task by not taking into account Sally's false belief (score = 0).
The Smarties test (Hogrefe et al., 1986) was used as an alternative false belief task (see Introduction). Scores on this test also vary between 0 (failed) and 1 (passed).
Two tests of emotion recognition (Spence, 1980), the "Test of perception of emotion from facial expression" and the "Test of perception of emotion from posture cues," were administered. Children were asked to identify four basic emotions (happiness, fear, anger, and sadness) on pictures showing facial expressions or bodily postures. Scores on each test range between 0 and 4.
The Social Interpretation Test (SIT; Vijftigschild, Berger, & Spaendonck, 1969) examines the child's ability to interpret social situations adequately. The test consists of a colored picture depicting a street in which a number of events take place. The child has to answer 9 questions about the picture (e.g., 'What has happened here?', 'Why is the ambulance driving in the street?'). The answers are recorded and classified into 24 categories. For each category, 1 point is given. SIT test scores range between 0 and 24, with higher scores reflecting greater ability to interpret social situations.
The Picture Arrangement subtest of the Wechsler Intelligence Scale for Children-Revised (WISC-R; Wechsler, 1974) was used as a measure of social sensibility. This subtest asks children to order 12 series of 4 pictures in such a way that each series of pictures depicts a sensible story (range 0-12).
The Role Taking test (Selman & Byrne, 1974) taps role taking skills of children. The test comprises a story of a social dilemma (a young girl has to save a little cat from a high tree, although she has just promised her father not to climb in trees anymore). Children are questioned about this story. From their answers to these questions, one can derive the level of role taking: egocentric role taking (i.e., the child is not able to differentiate between his/her own point of view and that of others, level 0); subjective role taking (i.e., the child recognizes his own point of view and that of others, level 1); self-reflective role taking (i.e., the child is able to adopt another person's perspective, level 2); and reciprocal role taking (i.e., the child weighs his perspective against that of others and finds a solution for the social dilemma, level 3).
The John and Mary test (Perner & Wimmer, 1985) assesses children's understanding of second-order beliefs. The test is an acted story in which two characters (John and Mary) are independently informed about an object's (an ice cream van) unexpected transfer to a new location. Hence both John and Mary know where the van is, but there is a mistake in John's second-order belief about Mary's belief: "John thinks that Mary thinks that the van is still at the old place." Children's understanding of this second-order belief was tested by asking: 'Where does John think Mary will go for the ice cream?' Scores on this test are either 0 (failed) or 1 (passed).
RESULTS AND DISCUSSION

General Results

Reliability of the TOM Test
The internal consistency of the TOM test was satisfactory, that is, Cronbach's alphas were .92 for the total TOM-scale, .84 for TOM 1, .86 for TOM 2, and .85 for TOM 3.

Age and Theory of Mind
Table I (right column) presents Pearson product-moment and point-biserial correlations between age, on the one hand, and theory of mind measures, on the other hand. As can be seen from this table, except for the Smarties test, all measures were positively and significantly associated with age. The absence of a connection between age and Smarties test performance is due to the fact that nearly all children in the present study, even the 5- to 6-year-olds, passed this test. As expected, there was a robust correlation between TOM test and age: r(70) = .80, p < .001. Inspection of mean TOM scores per age level (see Table I) showed that theory of mind capability increased linearly as children grew older. This indicates that the TOM test has one crucial property of a developmental scale, namely, it is sensitive to maturation. With respect to this result,
Table I. Mean Scores of Children on Theory of Mind and Social Development Measures for Different Age Levels, and Pearson Product-Moment and Point-Biserial Correlations Between Age and Various Measures

                                             Age (in years)
                                   5-6          7-8          9-10         11-12
Measure                          M     SD     M     SD     M     SD     M     SD    r with age
TOM test                        42.5   7.4   59.3   6.9   63.9   5.2   68.1   4.8   .80a
Emotion recognition-face         3.1   0.9    3.4   0.7    3.9   0.3    3.9   0.3   .50a
Emotion recognition-posture      2.4   1.1    2.7   1.2    3.4   0.9    3.7   0.7   .46a
Sally and Anne test              0.4   0.5    0.7   0.5    0.9   0.2    0.8   0.4   .48a
Smarties test                    0.9   0.3    0.9   0.2    1.0   0.0    1.0   0.0   .25
Social Interpretation test       7.2   3.0    8.8   2.6   13.5   2.8   14.7   2.4   .74a
WISC-R picture arrangement       3.2   3.0    8.3   2.0    9.7   1.6    9.4   1.2   .72a
Role taking test                 0.5   0.6    1.6   0.8    2.0   0.6    2.3   0.7   .73a
John and Mary test               0.4   0.5    0.9   0.3    0.9   0.3    0.9   0.3   .44a

a p < .05/9 (i.e., Bonferroni correction).
two further remarks are in order. To begin with, it should be noted that the most pronounced increase in theory of mind took place between ages 6 and 7. This is in line with the findings of previous studies showing that children of that age display marked improvement in their performance on more complicated theory of mind tasks (e.g., Perner & Wimmer, 1985). Second, the TOM test also proved suitable to index differential development of theory of mind in older age groups (i.e., in 9-10- and 11-12-year-old children). Note that a number of the alternative tasks tap an aspect of theory of mind that most normal children master at a relatively early age. For example, from age 7 onwards about 90% of the children successfully pass the John and Mary test, whereas from age 8 onwards most children recognize the four basic emotions from facial expression (see Table I). This indicates that these tests are less sensitive to differential development of theory of mind in older age groups.
Construct Validity of the TOM Test
As the TOM test is intended to measure three successive developmental stages of children's theory of mind (i.e., precursors of theory of mind, first manifestations of a real theory of mind, mature theory of mind), one would expect that young children predominantly succeed on items that index the precursors of theory of mind, while at the same time they fail to pass items that measure the more mature aspects of theory of mind. For older ages, one would predict that an increasing number of children succeed on items that tap the more advanced areas of theory of mind. To examine this issue, for each age level (i.e., 5, 6, 7, 8, 9, 10, and 11/12 years) success percentages of the three TOM subscales were calculated (i.e., number of passed items on a subscale divided by the total number of items of that subscale). Figure 1 shows mean success percentages on the three TOM subscales for the various age levels. A 3 (Subscales) × 7 (Age Levels) multivariate analysis of variance performed on these data revealed a significant effect of age, F(6, 63) = 32.1, p < .001, indicating that TOM test performance improves with age. Furthermore, a significant effect of subscale, Fhot(2, 62) = 133.2, p < .001, emerged due to the fact that success percentages on TOM 1 (i.e., precursors of theory of mind) were higher than those on TOM 3 (i.e., mature theory of mind), whereas success percentages of TOM 2 (i.e., first manifestations of a real theory of mind) were in between. Finally, the interaction of subscale with age also reached significance, Fhot(12, 122) = 2.3, p < .05. As can be seen, 7-year-old children succeeded on the vast majority of TOM 1 and TOM 2 items (>80%), indicating that most of these children have passed the first two stages of theory of mind development. Note also that the mean success percentage on TOM 3 items in 5-year-old children was only 23.8%, whereas in 11- to 12-year-old children a success percentage of more than 80% is reached. Thus, as expected, children acquire advanced aspects of theory of mind at a relatively later age (i.e., after they have learned the more basic principles of theory of mind).

Concurrent Validity of the TOM Test
The relationships between TOM test and alternative indices of theory of mind were studied by means of
Fig. 1. Mean success percentages on the three TOM subscales calculated per age level
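The success percentages plotted in Fig. 1 are defined in the text as the number of passed items on a subscale divided by the number of items of that subscale, averaged per age level. A minimal Python sketch of that computation follows. The function names and the input layout are hypothetical, and the example data in the usage note are invented for illustration only.

```python
from collections import defaultdict

# Number of items per subscale, as reported in the article.
SUBSCALE_ITEMS = {"TOM 1": 29, "TOM 2": 33, "TOM 3": 16}


def success_percentages(subscale_scores):
    """Convert one child's subscale scores (items passed) into percentages."""
    return {
        name: 100.0 * subscale_scores[name] / n_items
        for name, n_items in SUBSCALE_ITEMS.items()
    }


def mean_success_by_age(children):
    """Average success percentages per age level.

    children: list of (age_level, subscale_scores) pairs, where
    subscale_scores maps "TOM 1", "TOM 2", "TOM 3" to items passed.
    Returns {age_level: {subscale: mean success percentage}}.
    """
    grouped = defaultdict(list)
    for age_level, subscale_scores in children:
        grouped[age_level].append(success_percentages(subscale_scores))

    means = {}
    for age_level, rows in grouped.items():
        means[age_level] = {
            name: sum(row[name] for row in rows) / len(rows)
            for name in SUBSCALE_ITEMS
        }
    return means
```

For instance, two hypothetical 5-year-olds who pass 4 and 0 of the 16 TOM 3 items have success percentages of 25% and 0%, giving a mean of 12.5% for that age level.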
Table II. Pearson Product-Moment and Point-Biserial Correlations Between TOM Test and Alternative Theory of Mind and Social Development Measures

Variable                          TOM     1      2      3      4      5      6      7      TOMa
1. Emotion recognition-face       .55b    -                                                .34
2. Emotion recognition-posture    .46b    .27    -                                         .30
3. Sally and Anne test            .50b    .42b   .30    -                                  .17
4. Smarties test                  .37b    .45b   .30    .16    -                           .29
5. Social Interpretation Test     .61b    .38b   .48b   .29    .10    -                    .22
6. WISC-R picture arrangement     .77b    .45b   .44b   .49b   .27    .55b   -             .30
7. Role taking test               .75b    .55b   .40b   .40b   .27    .57b   .63b   -      .40
8. John and Mary test             .55b    .44b   .23    .45b   .20    .29    .54b   .54b   .18

a To control for age effects, Pearson and point-biserial correlations were computed for each age level and then averaged. Mean correlations thus obtained are shown in this column.
b p < .05/36 (i.e., Bonferroni correction).
Pearson product-moment correlations. In cases where dichotomous variables were involved, point-biserial correlations were used. As can be seen in Table II, most theory of mind indices are significantly correlated with each other. At first sight, it seems appropriate to compute correlations between TOM test and alternative indices of theory of mind while controlling for age (i.e., partial
correlations). However, by selecting 10 children of each age level, the design of Study 1 capitalized on the developmental progression of theory of mind. Thus, controlling for age would imply the elimination of an intrinsically important factor in both TOM and alternative tests (i.e., the developmental progression of theory of mind). To circumvent this problem, Pearson and point-biserial correlations between TOM test and concurrent measures were computed for each age level separately. The means of these separate correlations are presented in the right column of Table II. As can be seen, correlations attenuated considerably. Nevertheless, the TOM test was still positively associated with concurrent theory of mind indices. This result suggests that, as intended, the TOM test covers a broad range of theory of mind aspects.
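The age-stratified procedure described above (a correlation computed within each age level, then averaged across levels) can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis script: the function names and record layout are hypothetical, and the article does not say how age levels with an undefined correlation were handled (e.g., when every child in a level passed a dichotomous test), so the sketch simply skips them. A point-biserial correlation is obtained from the same formula when one variable is coded 0/1.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; returns None if either variable is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined without variance
    return sxy / sqrt(sxx * syy)


def mean_within_age_correlation(records):
    """Average the within-age-level correlations.

    records: list of (age_level, tom_score, other_score) tuples.
    Assumption: age levels with an undefined correlation are skipped.
    """
    by_age = defaultdict(list)
    for age_level, tom_score, other_score in records:
        by_age[age_level].append((tom_score, other_score))

    correlations = []
    for pairs in by_age.values():
        r = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
        if r is not None:
            correlations.append(r)
    if not correlations:
        return None
    return sum(correlations) / len(correlations)
```

Because each correlation is computed among children of the same age, shared growth with age cannot inflate it, which is consistent with the attenuation reported in the right column of Table II.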
STUDY 2
Study 2 intended to investigate another aspect of the reliability of the TOM test, namely, its test-retest stability. To examine this issue, 12 normal primary school children were tested twice with the TOM test, 8 weeks apart.
Table III. Demographic Variables of Normal Children in Study 2, and Their Total TOM Test Scores on Both Occasions

                        TOM test scores (8 weeks apart)
Child    Sex    Age     Occasion 1    Occasion 2
1        M      5       40            41
2        M      6       46            48
3        M      6       46            54
4        F      7       41            45
5        M      8       56            56
6        F      8       62            67
7        M      9       62            65
8        M      9       63            68
9        M      10      66            67
10       F      11      65            71
11       M      11      73            74
12       F      12      71            77
M                       60.5          64.4
SD                      10.7          10.4
Method

Subjects and Procedure
Twelve children (8 boys and 4 girls) varying in age between 5 and 12 years from a regular primary school (De Pater van de Geld in Waalwijk, the Netherlands) participated in the study. All children were healthy, normal-functioning children. Children were interviewed with the TOM test twice, 8 weeks apart. Both interviews were conducted by the same experimenter in a separate room at school.

Results and Discussion

Internal Consistency
Internal consistency of the TOM test appeared to be sufficient: Cronbach's alphas were .95 for the total score, .62 for TOM 1, .94 for TOM 2, and .77 for TOM 3.
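Cronbach's alpha, the internal consistency index reported here, is conventionally computed as alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The Python sketch below applies that standard formula to a children-by-items matrix of 0/1 scores. The article does not describe how the authors computed alpha, so the function name and input layout are assumptions made for illustration.

```python
from statistics import variance


def cronbach_alpha(item_matrix):
    """Cronbach's alpha for a matrix of item scores.

    item_matrix: list of rows, one row per child, each row holding that
    child's item scores (here 0 = failed, 1 = passed).
    """
    n_children = len(item_matrix)
    n_items = len(item_matrix[0])
    if n_children < 2 or n_items < 2:
        raise ValueError("need at least two children and two items")

    # Sample variance of each item across children.
    item_variances = [
        variance([row[i] for row in item_matrix]) for i in range(n_items)
    ]
    # Sample variance of the children's total scores.
    total_variance = variance([sum(row) for row in item_matrix])
    if total_variance == 0:
        raise ValueError("total scores have no variance; alpha is undefined")

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)
```

Applied to the items of a single subscale, the same function would give a subscale alpha such as those reported for TOM 1, TOM 2, and TOM 3.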
Test-Retest Reliability
Table III shows demographic variables (age and sex) of the children as well as their total TOM test scores on both occasions. As can be seen, TOM test scores increased with age; the Pearson correlation was .88 (p < .001). Note further that most children slightly improved their score on Occasion 2. A paired t test showed that this improvement was significant, t(11) = 5.4, p < .01. Most important, test-retest reliability for the TOM test was satisfactory; intraclass correlation (ICC) coefficients were .99 (p < .001) for the total score, .80 (p < .005) for TOM 1, .98 (p < .001) for TOM 2, and .91 (p < .001) for TOM 3. These results indicate that the TOM test has sufficient test-retest stability and that the test can be used to measure children's development or improvement in theory of mind capability.

STUDY 3
The results presented so far suggest that the TOM test can be used as a measure of the efficacy of theory of mind training programs in children with pervasive developmental disorders (PDDs). Yet, as the TOM test is based on an interview with the child, data about the interrater reliability are needed. Study 3 addressed this issue. Ten children with PDDs were tested with the TOM test. Two independent observers classified the reactions of the children to each TOM test item as either failed or passed.
Method

Subjects and Procedure
Ten children (all boys) with PDDs were randomly selected for the purpose of the present study. Age of the children ranged between 7 and 13 years. All children were treated in one of the AUTI-groups of the Pediatric Center Overbunde, Maastricht, The Netherlands. After
Table IV. Demographic Characteristics of 10 Boys and TOM Test Scores as Obtained by Both Observers

                                                          TOM test score
Child    Age (years;months)    DSM-III-R diagnosisa    IQb    Observer 1    Observer 2    Kappac
1        13;3                  PDDNOS                  92     75            75            1.00
2        12;9                  PDDNOS                  93     70            70            1.00
3        10;11                 AD                      82     44            45            0.87
4        7;6                   AD                      86     32            33            0.98
5        8;1                   PDDNOS                  93     61            59            0.97
6        11;2                  PDDNOS                  119    71            71            1.00
7        10;8                  PDDNOS                  92     60            59            0.96
8        12;3                  PDDNOS                  97     69            68            0.90
9        6;9                   PDDNOS                  96     35            33            0.90
10       7;10                  PDDNOS                  92     40            38            0.95

a PDDNOS = pervasive developmental disorder not otherwise specified; AD = autistic disorder.
b As indexed by the WISC-R.
c Interrater reliability (Cohen's kappa).
extensive psychodiagnostic and psychiatric screening, the children were assigned a diagnosis of Autistic Disorder or Pervasive Developmental Disorder Not Otherwise Specified (PDDNOS). The children fulfilled the relevant DSM-III-R criteria (American Psychiatric Association, 1987). Diagnoses were made by a specialized, multidisciplinary team of professionals of the Center of Autism South-Limburg. The main demographic characteristics of the children are shown in Table IV. Children were tested in a silent room with two experimenters present. Five children were tested by Experimenter 1, while Experimenter 2 observed from a distance. For the other five children, Experimenter 2 administered the TOM test, while Experimenter 1 observed. Both experimenters monitored the responses and reactions of the children on-line. They were not able to observe each other's registrations.

Results and Discussion

Internal Consistency
Internal consistency of the TOM test was good; Cronbach's alphas were .98 for the total score, .95 for TOM 1, .97 for TOM 2, and .95 for TOM 3.

Interrater Reliability
Interrater reliability of the TOM test was examined by computing Cohen's kappa using scores of both observers for the 78 items of the test. Kappas were calculated for each child separately because this makes it possible to evaluate whether interrater reliability is affected by the level of theory of mind development of each child. As can be seen in the right column of Table IV, the kappa values were high (i.e., all exceeded .87). Furthermore, both observers produced a highly similar rank order of the children with regard to theory of mind; Spearman rank correlation was .99, p < .001. Altogether, the results of Study 3 indicate that the interrater reliability of the TOM test is good.
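Cohen's kappa corrects the observed proportion of agreement between two raters for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The per-child computation described above, in which each observer's 78 pass/fail judgments for one child are compared, could be sketched in Python as below. The function name is hypothetical, and returning 1.0 when both observers assign every item to the same single category is a convention chosen for this sketch, since kappa is formally undefined in that case.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' judgments of the same items.

    rater_a, rater_b: equal-length lists of category labels, here
    0 (failed) or 1 (passed) for each TOM test item of one child.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty item set")
    n = len(rater_a)

    # Observed agreement: proportion of items with identical judgments.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions,
    # summed over categories.
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is
        # undefined, treated here as perfect agreement (an assumption).
        return 1.0
    return (observed - expected) / (1 - expected)
```

Identical judgments on every item yield a kappa of 1.00, as for the children in Table IV whose totals agree across observers and whose kappa is 1.00.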
STUDY 4
Study 4 examined the discriminant validity of the TOM test. Various studies have concluded that a substantial proportion of the children with PDDs exhibit deficits in theory of mind. In most of these studies, theory of mind deficits have been demonstrated by means of false belief tasks (Baron-Cohen et al., 1985; Eisenmajer & Prior, 1991; Leslie & Frith, 1988; Perner, Frith, Leslie, & Leekam, 1989; Prior, Dahlstrom, & Squires, 1990). To investigate whether the TOM test is able to detect this specific deficit in children with PDDs, Study 4 compared TOM test scores of children with autism and PDDNOS with those of children who suffered from other psychiatric disorders (i.e., Attention-deficit/Hyperactivity Disorder, Anxiety Disorder).
There is evidence to suggest that intelligence is a relevant variable in performance on theory of mind tests (see, for a review, Happe, 1995). For example, Happe (1994) investigated the WISC-R scores of autistic children who either passed or failed a false belief task. Her results showed that passers had significantly higher IQ scores than failers. Most researchers in this domain
Table V. Demographic Characteristics and Mean TOM Test Scores for Children with Attention-deficit/Hyperactivity Disorder (ADHD), Children with an Anxiety Disorder (AnxD), and Children with a Pervasive Developmental Disorder (PDD)

Variablea    ADHD children (n = 14)    AnxD children (n = 18)    PDD children (n = 20)    F or χ²    p
Age          8.5 (0.9)                 9.1 (1.9)                 9.3 (2.4)                0.7        ns
Sex (m/f)    12/2                      11/7                      17/3                     3.8        ns
TIQ          86.9 (7.1)                93.6 (12.7)               85.4 (12.9)              2.6        <.10
VIQ          91.6 (12.0)               90.5 (11.9)               84.3 (16.1)              1.5        ns
PIQ          83.4 (9.1)                97.4 (14.3)               86.6 (10.9)              6.6        <.005
TOM          61.1 (8.4)                58.9 (9.9)                39.1 (24.9)              9.2        <.001
TOM 1        23.5 (3.2)                23.1 (3.1)                16.9 (8.6)               7.2        <.005
TOM 2        27.5 (3.8)                26.7 (4.5)                16.8 (11.3)              10.9       <.001
TOM 3        9.5 (2.2)                 8.5 (3.2)                 4.9 (5.4)                6.4        <.005

Post hoc comparisons (only fragments legible): PDD ... PDD ... PDD

a m = male; f = female; TOM = TOM total score; TOM 1 = precursors of theory of mind; TOM 2 = first manifestations of the 'real' theory of mind; TOM 3 = mature theory of mind. Levels of intelligence were measured with the WISC-R.
assume that it is especially verbal IQ that plays a role in the performance on false belief tasks (Happe, 1995). This may be relevant for the TOM test, as this test is essentially an interview instrument. Thus, it may well be the case that children's scores on this test are critically dependent on their verbal ability (i.e., language comprehension and/or expression ability). To examine this issue, WISC-R scores of the children in Study 4 were also obtained.

Method

Subjects and Procedure

The subjects of Study 4 consisted of three groups: a group of anxiety-disordered children, a group of children with Attention-deficit/Hyperactivity Disorder (ADHD), and a group of children with pervasive developmental disorders. From the database (1996) of the children and youth section of the Community Mental Health Center, Eastern South-Limburg in Heerlen, The Netherlands, all children suffering from ADHD (n = 14) or an anxiety disorder (AnxD, i.e., obsessive-compulsive disorder, overanxious disorder, specific phobia, posttraumatic stress disorder, and separation anxiety disorder; n = 18) were selected. Children were classified on the basis of the DSM-III-R after extensive psychodiagnostic and psychiatric screening. As part of the intake procedure, all children completed the TOM test and the revised version of the Wechsler Intelligence Scale for Children (WISC-R; Wechsler, 1974). Twenty high-functioning children with PDDs (i.e., 8 children with Autistic Disorder and 12 children with PDDNOS) also participated in Study 4. These children were chosen randomly from the database of the Center of Autism South-Limburg (see Study 3) and then interviewed with the TOM test. WISC-R scores of the PDD children were also available. Demographic characteristics (i.e., age, sex distribution, and IQ scores) of the three groups are shown in the upper part of Table V.

Results and Discussion

Internal Consistency

As in the previous studies, the internal consistency of the TOM test was satisfactory; Cronbach's alphas of the total scale and the various TOM subscales varied between .87 and .96 for the total group, .95 and .98 for the children with PDD, and .72 and .80 for psychiatric control children.

Discriminant Validity

The lower part of Table V shows mean TOM test scores for the three groups. Analyses of variance followed up by post hoc t tests revealed that children with PDD had significantly lower TOM test scores than children with ADHD and AnxD. For this sample, the Pearson product-moment correlation between TOM test scores and age was only .24 (p < .10). Correlations between TOM test scores, on the one hand, and Total IQ, Verbal IQ, and Performance IQ, on the other hand, however, were all positive and significant; r(52)s were .58 (p < .001), .61 (p < .001), and .45 (p < .001), respectively. Thus, children with higher intelligence scores performed better on the TOM test.
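The one-way analysis of variance on the TOM total score can be approximately re-derived from the group sizes and descriptive statistics in Table V. The sketch below is a hedged illustration, not the authors' analysis: it assumes that the values in parentheses in Table V are standard deviations, and the function name is hypothetical.

```python
def one_way_anova_from_summary(groups):
    """One-way ANOVA from summary statistics.

    groups: list of (n, mean, sd) tuples, one per group.
    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares, recovered from each group's SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# TOM total score as (n, mean, assumed SD) for ADHD, AnxD, and PDD (Table V).
tom_total = [(14, 61.1, 8.4), (18, 58.9, 9.9), (20, 39.1, 24.9)]
f_value, df_b, df_w = one_way_anova_from_summary(tom_total)
print(f"F({df_b}, {df_w}) = {f_value:.1f}")
```

With the rounded values from Table V this yields an F of about 9.1 on 2 and 49 degrees of freedom, close to the 9.2 reported in the table; a gap of this size is what rounding of the published means and standard deviations would be expected to produce.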
To examine the unique contribution of the diagnosis Pervasive Developmental Disorder to TOM test performance, two additional analyses were performed. First, a multiple regression analysis (forward stepwise) was carried out with Diagnosis Autism, Diagnosis PDDNOS (both dummy variables), Verbal IQ, Performance IQ, and Age as the predictors, and TOM test score as the dependent variable. Results showed that Diagnosis Autism entered on the first step, r(52) = -.69, p < .001, accounting for 47.6% of the variance in TOM test scores. Verbal IQ (partial r = .32, p < .01), Age (partial r = .24, p < .05), and Diagnosis PDDNOS (partial r = -.23, p < .05) entered on the second, third, and fourth steps of the regression equation, accounting for significant proportions of the variance (10.2, 5.8, and 4.4%, respectively). Second, an additional multiple regression analysis was performed while forcing Verbal IQ, Performance IQ, and Age into the equation at Step 1. Still, both Diagnosis Autism and Diagnosis PDDNOS contributed significantly to TOM test scores, partial rs being -.45 (p < .001) and -.22 (p < .05). Thus, even when controlling for IQ level and age, diagnoses still predicted TOM test performance; the more severe children's pervasive developmental disorder, the worse they performed on the TOM test.

Altogether, the results of Study 4 support the discriminant validity of the TOM test in that children with a PDD performed worse on the test than children with other psychiatric disorders. Furthermore, the findings indicate that this difference in TOM test performance is not carried by differences in intelligence. Even when controlling for intelligence, a significant and negative association between diagnoses of autism and PDDNOS, on the one hand, and TOM test performance, on the other hand, emerged.

GENERAL DISCUSSION

Theory of mind pertains to children's capacity to analyze the behavior of others by recognizing mental states (i.e., desires and beliefs) that underlie intentional and social behavior.
Clearly, then, theory of mind consists of various aspects, such as the recognition of emotions, the assessment of how others think, and the understanding of the motives underlying behavior of others. The TOM test has been constructed to measure this broad range of aspects from a developmental perspective. The test intends to tap three successive stages in the development of theory of mind: precursors of theory of mind, first manifestations of a real theory of mind, and more advanced aspects of theory of mind.
The current study was a first attempt to investigate the reliability and validity of the TOM test. The main results can be summarized as follows. To begin with, the TOM test was found to be a reliable instrument; internal consistency was good, test-retest reliability was sufficient, and interrater reliability was high. Second, TOM test scores increased with age, indicating that the test is sensitive to developmental progression. In line with this, young children only succeeded on TOM items that tap basic domains of theory of mind, whereas older children also passed items that measure the more advanced areas of theory of mind. Third, evidence was obtained that supports the concurrent validity of the TOM test. That is, TOM test scores correlated positively and significantly with the performance on several other theory of mind tasks (i.e., tests of emotion recognition, understanding of false and second-order beliefs, and role taking). Fourth and finally, children with a PDD performed worse on the test than children with other psychiatric disorders. This suggests that the TOM test possesses discriminant validity.

The TOM test can be used in three ways. First, the test can be employed to screen children for deficits in theory of mind. There is some evidence to suggest that a poorly developed theory of mind can have negative social-emotional consequences, even in normal children (Lalonde & Chandler, 1995). Consequently, an instrument that measures the maturity of children's theory of mind at different age levels is important. Second, because the TOM test is informative about the developmental phase of children's theory of mind, it enables clinicians to tailor their intervention to specific problems of each child. For example, when the TOM test indicates that a child fails even on items that measure precursors of theory of mind, it would be futile to teach this child understanding of false beliefs.
Third, the TOM test can be used to evaluate the efficacy of theory of mind training programs. Altogether, the present findings imply that the TOM test is a reliable and valid instrument that can be employed to screen the development of theory of mind in 5- to 12-year-old normal children, children with pervasive developmental disorders, and other socially immature children.

APPENDIX

Examples of TOM Test Items

Each question represents a TOM test item which is scored as either failed (0) or passed (1). The subscale to which each item belongs is mentioned between parentheses.

Example 1
Instruction: Take a look at this picture.
Fig. A1. Picture of Example 1.
Question 1: What has happened? Can you tell something about it? (TOM 1)
Question 2: Who in this picture is afraid? (TOM 1)
Question 3: Why is this person afraid? (TOM 2)
Question 4: Who in this picture is happy? (TOM 1)
Question 5: Why is this person happy? (TOM 2)
Question 6: Who in this picture is sad? (TOM 1)
Question 7: Why is this person sad? (TOM 2)
Question 8: Who in this picture is angry? (TOM 1)
Question 9: Why is this person angry? (TOM 2)

Example 2
Instruction: I will read you a short story. Listen carefully.
Story: Pim is one year old. He's at home, playing on the ground. Mother has given him a piece of apple. Suddenly, Pim bites his lip and he starts to cry. He throws the piece of apple on the ground. Mother lifts Pim up, comforts him, and puts the piece of apple on the table. When father arrives at home, mother is on the phone. Father lifts Pim up and hugs him. Then he puts Pim back on the ground, and gives him the piece of apple which is still lying on the table. As soon as Pim sees the piece of apple, he starts to cry.
Question 1: Why is Pim crying when father gives him the piece of apple? (TOM 1)
Question 2: Does father know why Pim is crying? (TOM 2)
Question 3: Does father know that Pim has bitten his lip when he wanted to eat the apple? (TOM 2)

Fig. A2. Picture of Example 3.
Example 3
Instruction: Take a look at this picture.
Question 1: What, do you think, is happening in this picture? (TOM 1)
Story: The two boys in the foreground gossip about the other boy. Suddenly, that boy approaches them and hears what they are saying. The two boys are startled.
Question 1: How does this boy feel? (point at the boy in the background) (TOM 1)
Question 2: How does this boy feel? (point at one of the boys in the foreground) (TOM 1)

Example 4
Instruction: Take a look at this picture.
Fig. A3. Picture of Example 4.
Question 1: What has happened in this picture? (TOM 1)
Question 2: How do you feel when you hurt yourself? (TOM 1)
Question 3: Can you see from the girl's face how she really feels? (TOM 2)
Question 4: Is it possible to look happy, when you have hurt yourself? (TOM 2)

Example 5
Instruction: Take a look at this picture.
Story: This is Ben. Ben wants to play with his bricks.
Question 1: Which box will Ben open to play with his bricks? (TOM 1)
Story: Ben opens the box of bricks, and surprisingly he finds out that it is filled with washing powder! He closes the box, and opens the other smaller box. There are his bricks! He takes out some bricks, and goes playing with them in his bedroom. Then his brother Tim is entering the room. Tim also wants to play with the bricks...
Question 2: Which box will Tim open to play with his bricks? (TOM 2)
Question 3: Do you know where the bricks really are? (TOM 2)

Example 6
Instruction: I will read you a short story. Listen carefully.
Story: Father and mother are at a birthday party. They only know a few people, and think the music is too loud. "Wow," says father, "It's a pleasure to be here!"
Question 1: What does father mean? (TOM 3)
Question 2: Why does father say: "It's a pleasure to be here!" (TOM 3)

Example 7
Question: Do as if you comb your hair. (TOM 1)
Question: Do as if you brush your teeth. (TOM 1)
Question: Do as if you are feeling cold. (TOM 1)
Question: How can I see that you are feeling cold? (TOM 2)
Question: Do as if you have a nasty drink. (TOM 1)
Question: How can I see that your drink is nasty? (TOM 2)
Question: Do as if you are scared. (TOM 1)
Question: How can I see that you are scared? (TOM 2)
Example 8
Instruction: Take a look at this picture.
Story: This is John. John often dreams. Sometimes he dreams about a new bike that he likes to have.
Question 1: Is John able to touch the bike that he dreams about? (TOM 1)
Story: Sometimes John has a frightening dream. Then he dreams about shadows.
Question 2: Does John really see these shadows with his eyes? (TOM 1)
Question 3: Can somebody else see the shadows or the bike of John's dreams? (TOM 1)

Fig. A4. Pictures of Example 5.
Example 9
Instruction: I will read you a short story. Listen carefully.
Story: It is summer. Will and Mike have their holidays. They go out for a bicycle ride. Suddenly, there is a downpour and they have to shelter in a bus station. There are two men in the bus station who also shelter from the rain. One of the men remarks: "Wow, we have nice weather today!"
Question 1: What does the man mean? (TOM 3)
Question 2: Is it true what the man says? (TOM 3)
Question 3: Why does the man say: "Wow, we have nice weather today!" (TOM 3)
REFERENCES

American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., Rev.). Washington, DC: Author.
Astington, J. W., & Jenkins, J. M. (1995). Theory-of-mind development and social understanding. Cognition and Emotion, 9, 151-165.
Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a 'theory of mind'? Cognition, 21, 37-46.
Baron-Cohen, S., Leslie, A. M., & Frith, U. (1986). Mechanical, behavioral and intentional understanding of picture stories in autistic children. British Journal of Developmental Psychology, 4, 113-125.
Bowler, D. M., Strom, E., & Urquhart, L. (1993). Elicitation of first-order "theory of mind" in children with autism. Paper presented at the SRCD Conference, New Orleans, LA.
Eisenmajer, R., & Prior, M. (1991). Cognitive linguistic correlates of "theory of mind" ability in autistic children. British Journal of Developmental Psychology, 9, 351-364.
Flavell, J. H., Miller, P. H., & Miller, S. (1993). Cognitive development. Englewood Cliffs, NJ: Prentice-Hall.
Frith, U. (1989). Autism: Explaining the enigma. Oxford: Blackwell.
Hadwin, J., Baron-Cohen, S., Howlin, P., & Hill, K. (1996). Can we teach children with autism to understand emotions, belief, or pretence? Development and Psychopathology, 8, 345-365.
Fig. A5. Picture of Example 8.
Happe, F. (1994). Wechsler IQ profile and theory of mind in autism: A research note. Journal of Child Psychology and Psychiatry, 35, 1461-1471.
Happe, F. (1995). The role of age and verbal ability in the theory-of-mind task performance of subjects with autism. Child Development, 66, 567-582.
Hogrefe, G. J., Wimmer, H., & Perner, J. (1986). Ignorance versus false belief: A developmental lag in attribution of epistemic states. Child Development, 57, 567-582.
Lalonde, C. E., & Chandler, M. J. (1995). False belief understanding goes to school: On the social-emotional consequences of coming early or late to a first theory of mind. Cognition and Emotion, 9, 167-185.
Leslie, A. M., & Frith, U. (1988). Autistic children's understanding of seeing, knowing and believing. British Journal of Developmental Psychology, 6, 315-324.
Ozonoff, S., & Miller, J. N. (1995). Teaching theory of mind: A new approach to social skills training for individuals with autism. Journal of Autism and Developmental Disorders, 25, 415-433.
Perner, J., Frith, U., Leslie, A. M., & Leekam, S. (1989). Exploration of the autistic child's theory of mind: Knowledge, belief and communication. Child Development, 60, 689-700.
Perner, J., & Wimmer, H. (1985). 'John thinks that Mary thinks that...' Attribution of second-order beliefs by 5- to 10-year-old children. Journal of Experimental Child Psychology, 39, 437-471.
Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioural and Brain Sciences, 4, 515-526.
Prior, M., Dahlstrom, B., & Squires, T. (1990). Autistic children's knowledge of thinking and feeling states in other people. Journal of Child Psychology and Psychiatry, 31, 587-601.
Selman, R. L., & Byrne, D. F. (1974). A structural-developmental analysis of levels of role taking in middle childhood. Child Development, 45, 803-806.
Slaughter, V., & Gopnik, A. (1996). Conceptual coherence in the child's theory of mind: Training children to understand belief. Child Development, 67, 2967-2988.
Spence, S. (1980). Social skills training with children and adolescents: A counselor's manual. Windsor: NFER/Nelson.
Steerneman, P. (1994). Theory-of-mind screening-schaal [Theory-of-mind screening-scale]. Leuven/Apeldoorn: Garant.
Steerneman, P., Jackson, S., Pelzer, H., & Muris, P. (1996). Children with social handicaps: An intervention program using a theory-of-mind approach. Clinical Child Psychology and Psychiatry, 1, 251-263.
Swettenham, J. (1996). Can children with autism be taught to understand false belief using computers? Journal of Child Psychology and Psychiatry, 37, 157-165.
Vijftigschild, W., Berger, H. J. C., & van Spaendonck, J. A. S. (1969). Sociale Interpretatie Test [Social Interpretation Test]. Amsterdam: Swets & Zeitlinger.
Wechsler, D. (1974). Wechsler Intelligence Scale for Children (Rev.). New York: Psychological Corp.
Wellman, H. (1990). The child's theory of mind. Cambridge, MA: MIT Press.
Whiten, A., Irving, K., & Macintyre, K. (1993). Can three-year-olds and people with autism learn to predict the consequences of false belief? Paper presented at the British Psychological Society Developmental Section Annual Conference, Birmingham, UK.