This research investigated the value added to middle school public education by pedagogically trained college students. An experimental design was employed in which 680 middle school pupils were randomly assigned to instructional groups. University arts and sciences students were divided into two groups: those with formal teacher training and those without. Each student taught four lessons to his or her instructional group. Pupils were administered pre- and posttest measures on the content delivered in the four lessons and a reflection scale on lesson difficulty. Teachers' behaviors were recorded and scored independently by two trained observers. Results indicated that pupils' achievement was influenced by their perceptions of task difficulty and that teaching behaviors had a statistically significant influence on adjusted pupil achievement outcomes among students with formal pedagogical training. These results support the contention that pedagogical preparation of teachers adds value to middle school public education when measured in terms of pupil academic learning.

Keywords: achievement; teacher education; teaching behavior; student outcomes

Community leaders across the country work tirelessly every year to hire and retain effective teachers for their primary, elementary, and secondary schools. The challenges are complicated dramatically by little or no consensus on the definition of effective teachers and by a lack of evidence for the conditions that foster their development. Even well-informed, well-intentioned people often disagree on these matters. In recent years, conceptions of pupil learning have driven the debate; that is, when pupils master academic content, their teachers are thought to be effective, and in turn, the teacher education these teachers receive might be said to add value to the PreK-12 enterprise. (1)

Critics have long questioned the value of formal teacher education programs (Bestor, 1985; Conant, 1963; Rickover, 1960). Teacher education has been panned as a haven for academically weak college students; a jargon-laden, commonsense enterprise masquerading as a profession; too long on theory, too short on practical knowledge; a protectionist alliance that works to exclude nonmembers from assuming teaching jobs; just plain unnecessary; and even harmful (Hess, 2001, 2006). Secretary of Education Paige (2002) deemed the number of state-required education courses "shocking" and labeled them the "Achilles Heel of the certification system." His report concluded by calling on states to end the "exclusive franchise" of schools of education. Paige's excoriation of education schools is simply one of the more recent and more visible in a litany of disparagement.

The No Child Left Behind legislation couches the debate about the value of teacher education in terms of pupils' performances. Teachers who produce pupil learning, as measured by standardized academic achievement tests, are said to "add value" to schools (McCaffrey, Lockwood, Koretz, & Hamilton, 2004; Millman, 1997; Odden, 2004; Wainer, 2004). By implication, teacher educators who produce such teachers can claim some credit. There is, however, scant empirical data in support of this proposition (Cochran-Smith & Zeichner, 2005).

Vartan Gregorian (2004), president of the Carnegie Corporation of New York, has asked,

How is it possible that the United States, which claims to have three-fourths of the world's finest universities--and boasts 1,300 schools of education--has, in recent years, not only lacked qualified teachers but also had to venture beyond its own borders to find them? (p. 16)

Concealed beneath the surface of this troubling question is the belief that the teacher education enterprise is duty bound to demonstrate clearly that it adds value to PreK-12 schooling. One might argue that teacher educators do far better than Gregorian gives them credit for. Regardless, the drive to determine if and how teacher education programs advance teacher performance, and in turn pupil learning, is shaping modern professional practice.

The focus of the present investigation was to examine the value added to pupil learning by contrasting teachers with and without formal pedagogical training and to do so within the context of a theoretical model that more fully accounts for the complexity of teachers' and pupils' educational lives than do other value-added schemes. We used the conceptual model in Figure 1 to guide our work (R. McNergney, Cohen, Hallahan, Kneedler, & Luftig, 2002). The basic dimensions of the model were first envisioned nearly a half century ago (Mitzel, 1960). More recently, others have amplified this view (Brophy, 1999; Brophy & Good, 1986; Dunkin & Biddle, 1974; Powell & Beard, 1984). Although the components or steps of the model guided our investigation of one university's school of education program, the model might generalize to many other teacher education efforts, thereby providing a framework for thinking about where sources of added value can be examined in particular programs.

Step 1: Presage Variables

Students enter the university's School of Education in their second year from the College of Arts and Sciences or by transferring from other institutions of higher education. They arrive with a set of formative experiences (educational and environmental factors that influence teachers' behaviors), demographic characteristics (race, ethnicity, gender, age, etc.), and personal properties (e.g., personality characteristics, attitudes, beliefs). These factors are often referred to collectively in the literature as presage variables, or variables that come before the act of teaching (Dunkin & Biddle, 1974). Presage variables are assumed to influence teaching processes or behaviors.

Formative experiences, as Dunkin and Biddle (1974) note, are both historical and contemporaneous. Teachers may have grown up in lower or higher socioeconomic communities, gone to more and less desirable schools, learned to speak one or more languages, and so forth. These factors are likely to influence their professional personas and their own development as teachers, once they are enrolled. During their program tenure, by virtue of their participation in all aspects of university life, they acquire knowledge that is likely to influence their professional practice. Academic and pedagogical coursework, field experiences, technology use, participation in volunteer activities, and the attitudes and abilities of those with whom they study, among many other factors, can be expected to shape or form teachers' planning and teaching behaviors.

Both demographic characteristics and personal properties periodically wax and wane in importance in the literature (Domas & Tiedeman, 1951; Getzels & Jackson, 1963; Guthrie, 1999; Steinberg, Grigorenko, Jarvin, & Stemler, 2004; Travers, 1973; Wittrock, 1986). Teachers' gender, physical characteristics, dispositions, and cultural or ethnic backgrounds have been thought to influence life in classrooms. Studies of teacher properties such as psychological traits and states, motives, abilities, propensities, beliefs, and attitudes are plentiful. Although research on teachers' demographic characteristics and on teacher properties continues to rise and fall in prominence, investigations might be of particular relevance in the context of this model. The model suggests that such characteristics are relevant for what they might reveal about teachers' capacities to demonstrate particular teaching processes. Many factors, then, not the least of which might be the academic and professional components of teacher education programs, can be expected to shape teachers' abilities to behave in ways that encourage pupil learning.

It is important to note, however, that the model does not assume that such factors produce uniform effects. Shulman's (1986) conception of "pedagogical content knowledge" is a case in point; good teaching in one subject does not always look like good teaching in another subject. Teachers are taught specialized knowledge of the good, better, and best ways to teach particular content in their own disciplines. Reason and evidence suggest that a kind of "pedagogical pupil knowledge" can exert similar effects. What is good teaching for one student is not always good for another. The factors of context noted in Figure 1 represent these possibilities. The balloon in Figure 1 that denotes teacher thinking represents characteristics or dispositions that are typically described as more immediate or temporally closer to teaching processes than are the presage variables noted above. There is a rich literature on these concepts (see Clark & Peterson, 1986). We did not, however, attempt to account for teacher thinking of this type in the research reported here.

Step 2: Teaching Process Variables

Once in education school courses and field experiences, students demonstrate teaching processes that are presumably influenced in part by their presage characteristics. For instance, when a student selects content for a demonstration lesson with her peers, the choice may be shaped by a course she took in her academic major area. Or, when a student teacher plans a test of his pupils' performance on a chapter, he may draw on knowledge acquired in a tests and measurements course and on old tests created on the same material by his supervising teacher. Teachers plan and interact with their pupils in particular ways for many reasons, some of which are attributable to their training backgrounds.

Innumerable volumes have been written on what teaching is, how to do it, and how not to do it. Teaching processes, or behaviors variously defined, occupy a central, instrumental role in the model. Teaching processes are assumed to be influenced by presage variables, including but not limited to academic preparation and professional training. As such, teaching process variables might be characterized as dependent measures or as sources of information about students' abilities to apply the knowledge they have acquired in preparation programs. In turn, teaching processes are also assumed to influence Step 2.1 (pupil thinking and behavior) and ultimately Step 3 (pupil product variables). How well teachers behave--plan what is to be taught and learned, establish and maintain a social system within a classroom, communicate with pupils, evaluate pupil progress, and so on--can be inferred from pupil thoughts, actions, and achievement. In other words, the model posits that teaching success can be measured and judged in terms of pupils' performances.

Step 2.1: Pupil Thinking and Behavior

How pupils think about instruction and how they behave as a result can be described in various ways; two of the more useful terms are task difficulty and self-efficacy. Task difficulty, or lesson difficulty, can be thought of as a measure of pupils' perceptions of their own abilities to learn from the provided instruction. Self-efficacy is concerned with pupils' judgments of their personal capabilities. Self-efficacy has also been characterized as an assessment of one's own ability to perform a specific task (Pajares, 1996, 1997). Not surprisingly, pupils have been shown to be reasonably good judges of what does and does not constitute an intellectual challenge--they often know what they can and cannot do (Pressley & McCormick, 1995). Success raises pupils' efficacy, and failure lowers efficacy. Educational experiences where pupils master material are the most powerful sources of efficacy information (Atkinson, 1978; Woolfolk, 2004). Pupils' perceptions of their own abilities to perform a task are potentially related to achievement (Bandura, 1997; Schunk, 1984).

Step 3: Pupil Product Variables

Academic achievement tests are the most common standardized tests given to pupils. They are designed for both individuals and for groups and are intended to measure how much material a pupil has learned in particular content areas. Such tests can also be designed to assess basic skills. Roundly criticized for overemphasizing the acquisition of knowledge at the expense of other desirable outcomes, achievement tests have increasingly reflected concerns for the measurement of knowledge application in real-world conditions.

Other measures of academic achievement may not be standardized and may include more tightly circumscribed assessments of content mastery. Chapter tests, for example, focus on a set of objectives addressed over a period of weeks.

The model in Figure 1 does not specify what particular pupil outcomes are to be addressed. There are of course a host of other products that might serve--attitudes toward the subject matter, attendance, motivation to undertake additional work, and so forth. In all cases, pupil products are viewed as indications of the influence of teaching behaviors.

The bidirectional arrows in Figure 1 suggest that the variables exert reciprocal effects on one another to form a kind of programmatic feedback loop. For instance, knowledge of pupil learning can force a change in teaching behaviors; teaching behaviors that work can be expected to inform training programs.

Figure 2 represents our definitions of some of the key elements detailed in Figure 1 that are the focus of the present investigation. Teaching processes are operationalized in terms of five key teaching behaviors that are believed to be influential to the process of student learning (Eggen & Kauchak, 2006; Gunter, Estes, & Schwab, 1999; Joyce, Weil, & Calhoun, 2004). These processes are emphasized in a set of five general methods courses (courses that are not content specific) in the program we studied, which all education students take regardless of their academic specialties. An important question to be addressed by this work is whether teachers with and without formal teacher training display different levels of positive teaching behaviors. Pupil perceived lesson difficulty is included in our model as one indicator of pupil thinking described in relation to Figure 1. We operationalized this variable through pupil self-reports pertaining to internal perceptions of effort and attention to tasks. Both teaching behaviors and lesson difficulty are hypothesized to influence pupil learning in our model. An equally important question to be addressed is whether these hypothesized relationships remain invariant across groups of teachers with and without formal pedagogical training. The context of this analysis can be described at a variety of levels. The subject matter we consider is data representation and interpretation as taught to a group of typically developing students in Grades 6 through 8 attending public school. We chose this variable because it is related to learning mathematics, and it is also relevant to the study of science and social studies.

Method

Participants

Over the course of 2 academic years, we recruited students (n = 43) from the education school's bachelor of arts and master of teaching program (a 5-year teacher education program) and from the postgraduate master of teaching program (a 2-year teacher education program for those who already have a bachelor's degree). In addition, we recruited students from arts and sciences (n = 47). Students with pedagogical training were in their next-to-last year of study. Those with no pedagogical training were in their last year of study. Because pedagogical training requires that students complete extra coursework before graduation, both groups were at comparable points in their arts and sciences studies, even though education students may have been slightly older and hence slightly more mature.

The academic majors of students in both groups included English, environmental science, mathematics, psychology, Spanish, and studio arts. Due to some participant dropout during the study, the final sample of 90 included 39 matched pairs of students, 3 arts and sciences students who did not have matched counterparts from the education school, and 8 education school students who did not have a matched counterpart from arts and sciences. The group was composed of men (n = 16) and women (n = 74), the majority of whom were White (n = 75). All students received compensation for their participation.

The university students taught 680 sixth-, seventh-, and eighth-grade pupils in the full range of ability-grouped mathematics classes. The pupil population of the middle schools in which they worked was 51% boys and 49% girls, and 61% White and 39% minority. Pupils ranged in socioeconomic status as much as those in any large suburban school district in a state on the mid-Atlantic coast. The number of pupils taught ranged from 3 to 12 (M = 7.89) for each arts and sciences student and from 3 to 13 (M = 8.45) for each education student.

Procedures

We employed an experimental design in which 680 middle school pupils were randomly assigned, by classroom, to smaller units; that is, we identified intact classrooms of pupils and subdivided pupils randomly into instructional groups. This strategy enabled us to place individual members of pairs of university students--matched by academic background and mismatched by pedagogical training background--to instructional groups of pupils with similar abilities. Our intent was to create a situation in which university students with and without pedagogical training taught data representation and interpretation to pupils of comparable ability in the mastery of these concepts.
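
The within-classroom randomization described above can be sketched as follows. This is a minimal illustration and not the authors' actual procedure: the function name `assign_groups`, the pupil identifiers, the class size, and the fixed seed are all assumptions introduced here.

```python
import random

def assign_groups(pupil_ids, n_groups, seed=None):
    """Randomly split one intact classroom into n_groups instructional groups."""
    rng = random.Random(seed)
    shuffled = list(pupil_ids)
    rng.shuffle(shuffled)
    # Deal pupils round-robin so that group sizes differ by at most one.
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical classroom of 16 pupils split between a matched pair of
# university students (one with pedagogical training, one without).
classroom = [f"pupil_{i:02d}" for i in range(1, 17)]
groups = assign_groups(classroom, n_groups=2, seed=0)
```

Because every pupil in a classroom has the same chance of landing in either group, the two members of a matched pair can be expected to face pupils of comparable ability.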

We provided all students with common packets of materials necessary to instruct pupils in the subject matter. We also told students that they could use whatever outside resources they could find to develop the lessons. We were careful not to provide teaching tips or suggestions for managing the instructional groups.

We implemented the design detailed above in 2 consecutive years, with different students and pupils participating in each of the 2 years. Data collection in both years consisted of pre- and posttesting all pupils on the content (i.e., data representation and interpretation) delivered during the four lesson interventions. In an effort to ensure that the university students who served as teachers were blind to the actual items and were unable to teach to the test, all pupils were administered the pretest by their regular classroom teachers or by one of the investigators. In addition, we created alternate forms of these measures with the intent to assess the same content domains. Here again, this strategy was followed to reduce item exposure among teacher participants. We adjusted pupil achievement outcomes (i.e., posttest scores) in the investigated model to control for their initial status as determined by their pretest scores and to represent their predicted posttest achievement scores as a function of their pretest performance.
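
One common way to obtain pretest-adjusted scores of this kind is an ordinary least-squares regression of posttest on pretest. The sketch below assumes that approach, since the exact computation is not spelled out here. The helper `fit_line` and the scores are hypothetical, made-up values on a 19-item scale.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up pretest and posttest scores (0-19 scale), for illustration only.
pretest = [4, 7, 9, 10, 12, 15]
posttest = [6, 8, 12, 11, 15, 17]

intercept, slope = fit_line(pretest, posttest)
# Posttest score predicted from pretest performance alone.
predicted = [intercept + slope * x for x in pretest]
# Part of the posttest score that pretest performance does not predict.
residual = [y - p for y, p in zip(posttest, predicted)]
```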

We videotaped either the second, third, or fourth instructional lesson of each university student. We avoided recording the first lesson in an effort to minimize any possible advantage that might accrue to those who were more familiar with working with children by virtue of their past experiences. All videographers were trained to place their cameras in similar positions in classrooms and to record both teachers and pupils at the beginning, middle, and end of lessons. We subjected the videotapes to scrutiny by two independent observers using the Teaching Performance Record, or TPR (University of Virginia, 2006). The TPR provides measures of five teacher quality indicators (i.e., focus/capacity, syntax, principles of reaction, social system, and evaluation). Following each instructional lesson, pupils completed a self-report questionnaire that contained items pertaining to the difficulty of the lesson presented. Although four teaching episodes were the most we could schedule, given student time constraints, we believe this was a sufficient number of experiences to adequately cover the material. At the same time, the possibility remains that more or fewer opportunities to teach might have affected the current findings.

Instrumentation

Pupil academic learning. Although the pre- and posttests were composed of different items in Years 1 and 2, both consisted of 19 items designed to be similar in terms of the outcomes that were assessed and their levels of difficulty. Correlations between pre- and posttests were large and positive in Year 1 (r = .88) and Year 2 (r = .89). Additional supporting evidence that the alternate forms were of equal difficulty came by way of nonsignificant mean differences between Year 1 and Year 2 pretest scores, t(88) = 1.29, p = .20, and posttest scores, t(88) = -0.149, p = .88.

The tests assessed a general understanding of data representation and interpretation (e.g., central tendency and variability); they were constructed to reflect the content represented in the state's Standards of Learning for the appropriate grade levels. Items consisted of three types: calculation, interpretation, and a combination of calculation and interpretation. They were vetted by a professor of mathematics, a mathematics educator, and a teacher educator with knowledge of science and social studies. The assessments were pilot tested with groups of middle school pupils.

Teaching Performance Record. The TPR is a low-inference instrument of 105 items used to observe teaching and pupil behaviors relevant to lessons in any discipline (University of Virginia, 2006). The TPR emanates from literature on teacher effectiveness (Brophy, 1999; Brophy & Good, 1986; Good, 1979; R. McNergney, 1988). Its content validity has been shaped by the teacher effectiveness literature, the teacher preparation standards from the school of education program goals, public school teaching expectations, and the Classroom Assessment Scoring System (Pianta, 2003). A panel of experts reviewed these multiple sources and compiled the dimensions of the TPR. The same panel specified the behaviors of each dimension of the TPR by creating a matrix of teaching constructs and their critical attributes. In addition, text and video examples were created that typified these behaviors.

The TPR provides opportunities for an observer to record data on four organizing concepts: classroom context, instructional planning, interactive teaching, and pupil behavior (involvement and misbehavior). In a pre-observation conference, the observer acquires classroom context information, including an estimate of the challenge the teacher faces in terms of the number of students with special needs, with limited English proficiency, and so on. Context also reflects the challenge of the lesson content, that is, whether it requires either lower level or higher level thinking from pupils. The observer records information on instructional planning from the pre-observation conference and from a written lesson plan. Three cycles of observation yield information on interactive teaching. A cycle consists of one 4-minute period of observing interactive teaching in real time, followed by one 4-minute period of reflection on the teaching just observed, followed by a 2-minute period of observing pupil behavior. Observers note pupil behavior for indications of both involvement and misbehavior. One cycle is conducted at the beginning of a lesson, one in the middle of the lesson, and one near the end of the lesson. Thus, a total of 18 minutes of direct observation of teachers and pupils and 12 minutes of reflection, or three cycles, equals one observational episode. The TPR is a sign system; that is, within a set period of time, an observer records the sign of a behavior and does not return to that behavior until a new time period of observation begins (R. F. McNergney & Carrier, 1981; Medley, Coker, & Soar, 1984). Data can be collected from either live or videotaped episodes of teaching and recorded on either paper forms or electronically.
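
The timing of one observational episode follows directly from the cycle structure described above. The short sketch below reproduces that arithmetic; the dictionary keys are labels chosen here for illustration.

```python
# Minutes per activity within one TPR observation cycle, as described in the text.
CYCLE = {"observe_teaching": 4, "reflect": 4, "observe_pupils": 2}
CYCLES_PER_EPISODE = 3  # beginning, middle, and end of the lesson

direct_observation = CYCLES_PER_EPISODE * (CYCLE["observe_teaching"] + CYCLE["observe_pupils"])
reflection = CYCLES_PER_EPISODE * CYCLE["reflect"]
episode_total = direct_observation + reflection

print(direct_observation, reflection, episode_total)  # 18 12 30
```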

Items on the TPR can be combined to produce five measures of strategic teaching: focus/capacity, syntax, principles of reaction, social system, and evaluation (see Table 1). A strategic view of teaching advances the use of instructional strategies to plan, teach, and assess teaching (Eggen & Kauchak, 2006; Gunter, Estes, & Schwab, 1999; Joyce et al., 2004). Such strategies are networks of decisions (both planning and interactive) about organizing people, materials, and ideas to produce learning. These strategies shape the objectives of classroom instruction, the means that will be employed, and the ways results will be evaluated. Some strategies involve relatively long-term decisions, such as a mathematics curriculum planned by a committee for a full school year. Others are fairly short-term decisions, for example, a single social studies lesson planned by a single teacher.

Although there is no general consensus on the combination of events that makes a good lesson or on the combination of qualities that makes a good teacher, the operative assumption is that potentially better teachers can plan and control their professional behavior or think and behave strategically (Joyce & Harootunian, 1967; R. F. McNergney & McNergney, 2007). The reasoning goes as follows: Because teachers face diverse learners who must accomplish a variety of outcomes, they are well served by being able to teach different ways or by possessing a rich repertoire of teaching strategies. Different strategies of teaching behavior are useful for different educational purposes, different content, and different learners. Teachers who possess such a repertoire have the potential to call up appropriate strategies when conditions warrant.

The strategic teaching constructs are defined by items not grouped together on the observation form, thus protecting against observer bias. The integrity of the five scales was established using Cronbach's alpha (average = .7). Each videotape was scored independently by two trained observers (generalizability coefficient = .78). The average of these two raters' scores served as the unit of analysis for each teacher.
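
For readers unfamiliar with these statistics, the sketch below shows how Cronbach's alpha is computed for one scale and how two observers' scores are averaged into a single score per teacher. The function name and all scores are hypothetical, and the generalizability coefficient reported above is not computed here.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns (one list per item)."""
    k = len(items)
    # Total scale score for each teacher, summed across items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Made-up scores: three items rated for five teachers.
items = [
    [2, 3, 4, 4, 5],
    [1, 3, 3, 5, 5],
    [2, 2, 4, 4, 4],
]
alpha = cronbach_alpha(items)

# Unit of analysis: the mean of two observers' scale scores for each teacher.
rater_a = [3.0, 2.5, 4.0]
rater_b = [3.5, 2.0, 4.5]
teacher_scores = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]
```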

Student Reflection Scale. This 3-point Likert-type inventory was designed as a consumer self-report on perceptions of challenge of the work and effort expended on assigned tasks (J. M. McNergney & McNergney, 2005). Nottingham (2005) isolated three factors that measured pupils' perceptions of the instructor challenge presented during the lesson, their own engagement with the content, and their motivation to learn. The four items from this scale that were used to measure lesson difficulty in the present study are shown in Figure 2. Internal consistency reliability estimates for these items were favorable (α = .82).

Data Analyses

The primary purpose of this research was to examine the influence of teaching behaviors and pupil perceived lesson difficulty on adjusted achievement scores. The equality of measurement properties of the two predictors and their influence on pupil achievement were also examined through multigroup procedures to account for the multilevel nature of teachers nested within programs (i.e., arts and sciences students vs. education students).

A graphic representation of the full model that was investigated is presented in Figure 2. The five measured variables of teaching behaviors are enclosed in boxes on the left side of the figure, and the latent Teaching Behaviors factor is enclosed in an ellipse. The single-headed arrows running from the ellipse to the boxes represent the influence of this factor on its indicators, to be measured by the estimated factor loadings. The four items used to measure the latent perceived Lesson Difficulty factor were modeled in similar fashion. These factors were freely correlated with one another, as indicated by the curved double-headed arrows; their direct influence on pupil achievement is indicated by the single-headed arrows. The achievement variable in this model represents pupils' unstandardized posttest scores as predicted from their performance on the preinstruction test. This procedure was followed to control for any preexisting achievement differences among pupils.

Enclosed in circles, the e's (i.e., e1, e2, etc.) depict residual sources of variance in the measured variables that are unaccounted for by the explicitly posited influences in the model. Scaling of the latent variables was accomplished by setting a single path for each to one, so that it assumed the scale of that variable.

The quality of the measurement structures underlying the latent variables of Teaching Behaviors and perceived Lesson Difficulty was also investigated by imposing equality constraints on the factor loadings across groups of teachers that differed in their type of preparation (i.e., arts and sciences students vs. education students). To this end, four separate models were examined. The first (Model 1) examined the quality of the full model depicted in Figure 2 through inclusion of all teachers, regardless of their preparation. Model 2 was an unconstrained multigroup model that evaluated the general form of the full model for teachers from different preparation programs. This model only investigated the degree of congruence between groups in terms of the free and fixed parameter estimates depicted in Figure 2. Model 3 added the additional restriction that the factor loadings linking Teaching Behaviors and Lesson Difficulty to their measured variables be constrained to be equal across groups. A nonsignificant comparison between Models 2 and 3 would indicate that the observed variables were functioning in the same way to measure their respective constructs for teachers from different academic programs. Thereafter, the multilevel nature of teachers nested within programs was modeled by estimating the influences of Teaching Behaviors and Perceived Lesson Difficulty on pupil achievement separately for each of the two groups of teachers. Model 4 consisted of a structured mean analysis that sought to determine whether the latent means of Teaching Behaviors and Lesson Difficulty varied for the two groups of teachers.

Numerous measures of model fit exist for evaluating the quality of measurement models, most developed under a somewhat different theoretical framework focusing on different components of fit (cf. Browne & Cudeck, 1993; Hu & Bentler, 1995). For this reason, it is generally recommended that multiple measures of fit be considered to highlight different aspects of fit (Tanaka, 1993). Given the well-known problems with chi-square (χ²) as a stand-alone measure of fit (Hu & Bentler, 1995; Kaplan, 1990), use of this statistic was limited to testing differences (χ²_D) between competing models. In addition, the normed chi-square statistic (NC), Tucker-Lewis Index (TLI), comparative fit index (CFI), and root mean square error of approximation (RMSEA) are reported for each model. NC values ranging from 1.0 to 3.0 are reflective of good-fitting models (Kline, 1998).

The TLI and CFI provide measures of model fit by comparing a given hypothesized model to a null model that assumes no relationship among the observed variables (Kranzler & Keith, 1999). These two measures generally range between 0 and 1.0, with larger values reflecting better fit. Values of .90 or greater are often taken as evidence of good-fitting models (Bentler & Bonett, 1980). Alternately, smaller RMSEA values support better fitting models. Here, values of .05 or below are generally taken to indicate good fit, although values of .08 or below are considered reasonable (Browne & Cudeck, 1993). All models were estimated with the AMOS (Analysis of Moment Structures) program using maximum likelihood estimation on covariance matrices (Arbuckle & Wothke, 1999).
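
The fit measures named above can be computed from the chi-square statistics of the hypothesized and null models. The sketch below uses the standard textbook formulas. It is not the AMOS implementation, and some programs use N rather than N - 1 in the RMSEA denominator. The function name and the input values are hypothetical and are not the study's results.

```python
import math

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """NC, TLI, CFI, and RMSEA from model (m) and null/baseline (b) chi-squares."""
    nc = chi2_m / df_m                                   # normed chi-square
    tli = (chi2_b / df_b - nc) / (chi2_b / df_b - 1)     # Tucker-Lewis Index
    d_m = max(chi2_m - df_m, 0.0)                        # model noncentrality
    d_b = max(chi2_b - df_b, d_m)                        # baseline noncentrality
    cfi = 1.0 - d_m / d_b if d_b > 0 else 1.0            # comparative fit index
    rmsea = math.sqrt(d_m / (df_m * (n - 1)))            # RMSEA (N - 1 version)
    return {"NC": nc, "TLI": tli, "CFI": cfi, "RMSEA": rmsea}

# Hypothetical chi-square values and degrees of freedom, for illustration only.
indices = fit_indices(chi2_m=45.0, df_m=33, chi2_b=400.0, df_b=45, n=90)
print(indices)
```

Under the thresholds cited above, these illustrative values would count as good fit on NC, TLI, and CFI and as reasonable fit on RMSEA.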

Results

Correlations, means, and standard deviations for all measured variables are presented in Table 2. Model 1 examined the quality of the model depicted in Figure 2 without regard to teacher group membership. Fit statistics for this model were exceptional (see Table 3). NC was below 2.0; the TLI and CFI were well above .95; and the RMSEA was .05.

Model 2 examined the extent to which the model conceptualization illustrated in Figure 2 held for arts and sciences students versus education students. Results of this analysis indicated that this representation was viable for both groups. NC remained below 2.0, the TLI and CFI were above .90, and the RMSEA was .05. When equality constraints between arts and sciences students versus education students were placed on the factor loadings of the model (i.e., Model 3), measures of stand-alone fit remained encouraging (see Table 3). Moreover, a comparison between Models 2 and 3 failed to reveal a statistically significant decline in fit, χ²_D(7) = 13.7, p > .05. Thus, measurement of Teaching Behaviors and Lesson Difficulty can be considered invariant for these two groups of teachers.
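The comparison of Models 2 and 3 is a chi-square difference test for nested models: the difference between the two model chi-squares is referred to a chi-square distribution whose degrees of freedom equal the difference in model degrees of freedom. The sketch below reproduces that arithmetic from the Table 3 values. It is an illustration rather than the authors' AMOS procedure; the function names `chi_square_sf` and `chi_square_difference` are hypothetical, and the tail probability is computed from the standard closed-form series for integer degrees of freedom so that only the Python standard library is needed.

```python
import math


def chi_square_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_k x^k / (1*3*...*(2k+1))
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def chi_square_difference(chi_restricted, df_restricted, chi_free, df_free):
    """Return (difference, df difference, p value) for two nested models."""
    delta_chi = chi_restricted - chi_free
    delta_df = df_restricted - df_free
    return delta_chi, delta_df, chi_square_sf(delta_chi, delta_df)


# Model 3 (constrained loadings) versus Model 2 (unconstrained), from Table 3.
delta, ddf, p = chi_square_difference(95.31, 73, 81.60, 66)
print(f"chi-square difference({ddf}) = {delta:.2f}, p = {p:.3f}")
```

With the Table 3 values the difference is 13.71 on 7 degrees of freedom, which falls just short of the .05 critical value and so agrees with the reported p > .05.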

Standardized factor loadings linking Teaching Behaviors and Lesson Difficulty to their respective indicators are shown in Figure 2. All factor loadings were statistically significant and ranged from a low of .61 to a high of .84 on Teaching Behaviors and from .59 to .90 on Lesson Difficulty. Values above the measured variables (enclosed in rectangles in Figure 2) are squared multiple correlations (SMC); these represent the amount of a measured variable's variance accounted for by a factor. The majority of SMCs were appreciable, ranging from a low of .38 to a high of .71 for variables measuring the Teaching Behaviors factor and from a low of .35 to a high of .82 for those measuring Lesson Difficulty. The correlation between Teaching Behaviors and Lesson Difficulty (-.17) was nonsignificant, suggesting that these two factors were relatively independent of one another.
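The link between the loadings and the SMCs can be checked by hand. Assuming each indicator loads on a single factor (an assumption about Figure 2, which is not reproduced here), the SMC of an indicator is the square of its standardized loading. The short sketch below, with the hypothetical helper `squared_multiple_correlation`, squares the extreme loadings reported above and compares them with the extreme SMCs; small discrepancies are expected because both sets of values were rounded to two decimals.

```python
def squared_multiple_correlation(loading: float) -> float:
    """Proportion of an indicator's variance explained by its single factor."""
    return loading ** 2


# Lowest and highest standardized loadings and SMCs reported in the text.
reported = {
    "Teaching Behaviors": {"loadings": (0.61, 0.84), "smc": (0.38, 0.71)},
    "Lesson Difficulty": {"loadings": (0.59, 0.90), "smc": (0.35, 0.82)},
}

for factor, values in reported.items():
    for loading, smc in zip(values["loadings"], values["smc"]):
        implied = squared_multiple_correlation(loading)
        print(f"{factor}: loading {loading:.2f} -> implied SMC {implied:.3f} (reported {smc:.2f})")
```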

The paths of primary substantive interest in this model were those linking Teaching Behaviors and Lesson Difficulty to pupils' adjusted achievement scores. Both were found to have statistically significant influences when the entire sample of teachers was considered (see Figure 2). Separate analysis by groups indicated that Teaching Behaviors continued to have a statistically significant influence on pupils' adjusted achievement scores (standardized regression weight = .36) and that Lesson Difficulty was no longer statistically significant (standardized regression weight = -.24) among teachers from education programs. Combined, these two factors accounted for 20% of the variance in pupils' adjusted achievement scores. In contrast, neither the influence of Teaching Behaviors nor of Lesson Difficulty on pupils' adjusted achievement scores was statistically significant (standardized regression weight = .31 and -.28, respectively) for teachers from arts and sciences programs.

Model 4 imposed additional constraints on Model 3 that allowed for an examination of whether one of the two teacher groups demonstrated more positive teaching behaviors or elicited a perception of greater lesson difficulty among their pupil groups. Results of this structured mean comparison revealed good model fit as evidenced by a statistically nonsignificant chi-square, χ²(80) = 101.2, p > .05; no statistically significant decline in fit when compared to Model 3, χ²_D(7) = 5.86, p > .05; and other measures of model fit (see Table 3). Of substantive interest, Model 4 revealed a nonsignificant Lesson Difficulty difference of .08 units greater for teachers without formal teacher training, p > .05. However, the Teaching Behaviors factor mean for teachers without formal teacher training was 6.4 units below that for teachers with formal training, p < .05. In combination, these model comparisons reveal that teachers with formal training demonstrate more positive teaching behaviors and that these behaviors have a statistically significant influence on pupil knowledge of statistical concepts.

Discussion

These data tell a seemingly simple story: Teaching behaviors matter--they matter a great deal. In our investigation, the combined influences of Teaching Behaviors and Lesson Difficulty served to account for 20% of the variance in pupils' adjusted achievement scores among teachers with formal pedagogical training. Our results cannot be generalized to all teacher education programs. Yet it is reasonable to expect that how teachers plan instruction and interact with their pupils during class periods devoted to the mastery of data representation and interpretation influences pupil academic achievement. Moreover, there is reason to continue to explore the proposition that teachers who have participated in teacher education programs are more likely to behave in ways that help pupils learn than are teachers who have not participated in teacher education programs.

State standards of learning contain lists of tasks pupils must master to be deemed successful. Standards are organized by grade level and subject matter to suggest a scope and sequence of material. The challenges imposed by the tasks themselves may be reasonably uniform within particular content foci but may vary markedly within and between other content areas. Pupils in the present study needed both to compute and interpret material in order to solve problems of central tendency and variability. As both pupils and teachers would attest, the demands placed on middle school pupils as they try to master these concepts are quite different from the demands of the tasks they face in English classes as they punctuate sentences and differentiate between fact and opinion. An analysis of the skills and subskills required to accomplish such tasks would substantiate this proposition. Tasks demand varying levels of pupil understanding and effort if they are to be mastered. Teacher educators routinely encourage teachers to attend to these realities (Woolfolk, 2004).

As effective teachers learn in their preparation programs and as researchers have documented, the most important indications of task difficulty and its consequences may be present in the minds of pupils themselves. Pupils' perceptions can be influenced by both internal and external factors, including but not limited to their motives to succeed, the strength of their belief that if they try they are likely to succeed, and the incentive value of success (Atkinson, 1978). Some clues to what pupils in our study thought about the tasks they faced are present in the concept labeled Lesson Difficulty (see Figure 2).

Lesson Difficulty relates negatively to pupil achievement (-.24), which indicates that as pupils' perceptions of the difficulty of a lesson increased, their academic performance decreased. This may simply be an acknowledgment of the obvious: the easier the lesson in the minds of pupils, the higher their performance. Despite the fact that the SRS items failed to relate significantly to pupil learning outside the context of the full model, it is useful to speculate on the meaning of the items within the context of the full model. Two of the items ("Did you make yourself think during the lesson?" and "Did you try hard on this lesson?") may have tapped internal perceptions of effort that revolve around the perceived challenge. In this instance, higher academic performance was associated with lower effort. But might this mean instead that pupils who did well believed they were capable of doing well and thus did not have to exert undue effort to succeed? In other words, were these pupils taught to behave as though they were efficacious?

The remaining two SRS items ("Did the lesson make you think?" and "Did you have to work hard to understand the lesson?") reflect attention to attributes of the tasks themselves. In this case, lower challenges are associated with higher academic success. Pupils may have perceived tasks as inherently more or less doable based on their reading of the challenge involved. Although the relationship between Teaching Behaviors and Lesson Difficulty is not significant, researchers and teacher educators might justifiably increase their attention to teaching behaviors directed toward scaffolding instruction or helping pupils understand that they can handle the work. These complicated issues merit further exploration.

The multidimensional nature of teaching behaviors was operationalized in this study to include five aspects of effective teaching (i.e., lesson focus/capacity, syntax, principles of reaction, social system, and evaluation). Each of these five indicators was found to be a good measure of teaching behavior as evidenced by the moderate to high factor loadings on the Teaching Behaviors factor. Moreover, these five measures were found to be equally effective measures of teaching behavior for teachers with and without pedagogical training. The outline of a teacher who scored high on the attributes might be sketched in terms of a person who provided support for instruction. These teachers gave clues and reminders, encouraged serious thought, broke problems into steps, provided examples, asked questions, gave feedback, and the like. Generally speaking, teaching behaviors that worked appeared to be responsive to pupil understanding of the material and aimed at producing independent learners.

Teaching behaviors were found to have a statistically significant influence on pupils' acquisition of content knowledge, application, and interpretation of basic data analysis concepts. However, this result was found to obtain only among students enrolled in a teacher education program. Students who were not enrolled in a teacher education program failed to demonstrate teaching behaviors that were significantly related to pupil learning. In other words, teacher education in this setting added value to the schools, as measured in terms of pupils' learning of data representation and analysis.

As the use of case method teaching and learning increases in teacher preparation, teacher educators, like professionals in business, medicine, and law, may exert greater influence on teachers' abilities to solve real-life problems. The influences of teacher education programs and faculty on teachers' thoughts and behaviors that ultimately affect pupils' learning have been linked to preparing teachers to identify key pupil problems--a central feature of much case-based work. Kessler's (2005) research revealed that teacher candidates' abilities to identify instructional problems in a case-based exercise are a greater predictor of pupil achievement than are the candidates' abilities to propose teaching actions. This finding "intimates that novice teachers' propensity to act--to leap before looking--might be detrimental to pupil achievement" (Kessler, 2005, p. iii). Teacher education then might be defined in part by encouraging candidates to identify problems and opportunities before they act on them. The challenge is to begin to identify why case-based approaches, methods of advancing strategic teaching, and other forms of teacher education make a difference in how teachers behave.

Programmatically speaking, there is little to be learned by examining the long jump between teacher characteristics and pupil learning (see Figure 1). Boyd, Grossman, Lankford, Loeb, and Wyckoff (2005) have demonstrated that, at least in the first few years of practice, teacher education background is associated with pupil achievement. We have not asked in these analyses whether pupils of teachers who are trained learn more than pupils of teachers who are untrained, because the answer would offer little if any direction to improve life in schools. The more important questions, we believe, are whether teacher education programs produce teachers who are likely to behave in ways that will influence pupil learning and whether these teachers are likely to change their behaviors when confronted with pupils who are not learning. Answers to these questions will push investigators to seek the reasons underlying the behaviors in the many contexts in which education occurs. Such evidence will inform the improvement of both teaching and teacher preparation.

References

Arbuckle, J. L., & Wothke, W. (1999). AMOS 4.0 user's guide. Chicago: Small Waters.

Atkinson, J. W. (1978). The mainspring of achievement-oriented activity. In J. W. Atkinson & J. O. Raynor (Eds.), Personality, motivation, and achievement (pp. 11-38). Washington, DC: Hemisphere.

Bandura, A. (1997). Self-efficacy: The exercise of control. New York: W. H. Freeman.

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness-of-fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.

Bestor, A. (1985). Educational wastelands: The retreat from learning in our public schools (2nd ed.). Urbana: University of Illinois Press.

Boyd, D., Grossman, P., Lankford, H., Loeb, S., & Wyckoff, J. (2005, November). How changes in entry requirements alter the teacher workforce and affect student achievement. Retrieved December 12, 2006, from http://www.teacherpolicyresearch.org/portals/1/pdfs/how_changes_in_entry_requirements_alter_the_teacher_workforce.pdf

Brophy, J. (1999). Teaching (Educational Practices Series No. 1). Geneva: International Bureau of Education. Retrieved January 29, 2005, from http://www.ibe.unesco.org/International/Publications/EducationalPractices/prachome.htm

Brophy, J., & Good, T. L. (1986). Teacher behavior and student achievement. In M. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 340-370). New York: Macmillan.

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.

Clark, C. M., & Peterson, P. L. (1986). Teachers' thought processes. In M. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 255-296). New York: Macmillan.

Cochran-Smith, M., & Zeichner, K. M. (Eds.). (2005). Studying teacher education: The report of the AERA Panel on Research and Teacher Education. Mahwah, NJ: Lawrence Erlbaum.

Conant, J. B. (1963). The education of American teachers. New York: McGraw-Hill.

Domas, S. J., & Tiedeman, D. V. (1951). Teacher competence: An annotated bibliography. Journal of Experimental Education, 19, 103-218.

Dunkin, M. J., & Biddle, B. J. (1974). The study of teaching. New York: Holt, Rinehart and Winston.

Eggen, P. D., & Kauchak, D. P. (2006). Strategies and models for teachers: Teaching content and thinking skills (5th ed.). Boston: Allyn & Bacon.

Getzels, J. W., & Jackson, P. W. (1963). The teacher's personality and characteristics. In N. L. Gage (Ed.), Handbook of research on teaching: A project of the American Educational Research Association (pp. 506-582). Chicago: Rand McNally.

Good, T. L. (1979). Teacher effectiveness in the elementary school. Journal of Teacher Education, 30, 52-64.

Gregorian, V. (2004, November 10). No more silver bullets: Let's fix teacher education. Education Week, 24(11), 36, 48. Retrieved January 1, 2005, from http://www.edweek.org/ew/articles/2004/11/10/11gregorian.h24.html

Gunter, M. A., Estes, T. H., & Schwab, J. (1999). Instruction: A models approach (3rd ed.). Boston: Allyn & Bacon.

Guthrie, J. W. (1999). A response to John Goodlad's Whither Schools of Education? Unless other changes occur, they might well wither. Journal of Teacher Education, 50, 363-376.

Hess, F. M. (2001, November). Tear down this wall: The case for a radical overhaul of teacher certification. Washington, DC: Progressive Policy Institute, 21st Century Schools Project. Retrieved December 12, 2006, from http://www.ppionline.org/ppi_ci.cfm?knlgAreaID=110&subsecID=135&contentID=3964

Hess, F. M. (2006, February 5). Schools of reeducation? Washington Post, p. B07. Retrieved December 12, 2006, from http://www.washingtonpost.com/wp-dyn/content/article/2006/02/03/AR2006020302603_pf.html

Hu, L., & Bentler, P. M. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 76-99). Thousand Oaks, CA: Sage.

Joyce, B. R., & Harootunian, B. (1967). The structure of teaching. Chicago: Science Research Associates.

Joyce, B., Weil, M., & Calhoun, E. (2004). Models of teaching (7th ed.). Boston: Allyn & Bacon.

Kaplan, D. (1990). Evaluating and modifying covariance structure models: A review and recommendation. Multivariate Behavioral Research, 25, 137-155.

Kessler, L. W. (2005). Investigating teachers' decision-making abilities, in-class behaviors, and pupils' academic achievement. Unpublished doctoral dissertation, University of Virginia, Charlottesville.

Kline, R. B. (1998). Principles and practice of structural equation modeling. New York: Guilford.

Kranzler, J. H., & Keith, T. Z. (1999). Independent confirmatory factor analysis of the Cognitive Assessment System (CAS): What does the CAS measure? School Psychology Review, 28, 117-144.

McCaffrey, D. F., Lockwood, J. R., Koretz, D. M., & Hamilton, L. S. (2004). The promise and peril of using value-added modeling to measure teacher effectiveness. Santa Monica, CA: RAND.

McNergney, J. M., & McNergney, R. F. (2005). Student reflection scales. Charlottesville: University of Virginia.

McNergney, R. (Ed.). (1988). Guide to classroom teaching. Boston: Allyn & Bacon.

McNergney, R. F., & Carrier, C. A. (1981). Teacher development. New York: Macmillan.

McNergney, R., Cohen, S., Hallahan, D., Kneedler, R., & Luftig, V. (2002). The teaching assessment initiative of the teachers for a new era. Unpublished manuscript, University of Virginia, Charlottesville.

McNergney, R. F., & McNergney, J. M. (2007). The practice and profession of teaching. Boston: Allyn & Bacon.

Medley, D. M., Coker, H., & Soar, R. S. (1984). Measurement-based evaluation of teacher performance: An empirical approach. New York: Longman.

Millman, J. (Ed.). (1997). Grading teachers, grading schools. Thousand Oaks, CA: Corwin Press.

Mitzel, H. E. (1960). Teacher effectiveness. In C. W. Harris (Ed.), Encyclopedia of educational research (3rd ed., pp. 1481-1486). New York: Macmillan.

Nottingham, A. J. (2005). Investigating relationships between teaching behaviors and pupil outcomes (engagement, instructional challenge, and motivation to learn). Unpublished doctoral dissertation, University of Virginia, Charlottesville.

Odden, A. (Ed.). (2004). Assessing teacher, classroom, and school effects [Special issue]. Peabody Journal of Education, 79(4).

Paige, R. (2002). Meeting the highly qualified teachers challenge, the secretary's annual report on teacher quality. Washington, DC: U.S. Department of Education, Office of Postsecondary Education.

Pajares, F. (1996). Self-efficacy beliefs in academic settings. Review of Educational Research, 66, 543-578.

Pajares, F. (1997). Current directions in self-efficacy research. In M. L. Maehr & P. R. Pintrich (Eds.), Advances in motivation and achievement (Vol. 10, pp. 1-49). Greenwich, CT: JAI.

Pianta, R. C. (2003, March). Professional development and observations of classroom process. Paper presented at the SEED Symposium on Early Childhood Professional Development, Washington, DC.

Powell, M., & Beard, J. W. (1984). Teacher effectiveness: An annotated bibliography and guide to research. New York: Garland.

Pressley, M., & McCormick, C. B. (1995). Advanced educational psychology for educators, researchers, and policymakers. New York: HarperCollins College.

Rickover, H. G. (1960). Education and freedom. New York: E. P. Dutton.

Schunk, D. H. (1984). Self-efficacy perspective on achievement behavior. Educational Psychologist, 19, 48-58.

Shulman, L. S. (1986, February). Those who understand: Knowledge growth in teaching. Educational Researcher, 15(2), 4-14.

Sternberg, R. J., Grigorenko, E., Jarvin, L., & Stemler, S. (2004). An evaluation of the utility of the theory of successful intelligence for predicting the effectiveness of schools as intelligent systems. Retrieved December 12, 2006, from http://www.yale.edu/pace/resources/Summary%20of%20Results.pdf

Tanaka, J. S. (1993). Multifaceted conceptions of fit in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 10-39). Newbury Park, CA: Sage.

Travers, R. M. W. (Ed.). (1973). Second handbook of research on teaching. Chicago: Rand McNally College.

University of Virginia. (2006). Teaching performance record. Charlottesville, VA: CaseNEX, LLC. Retrieved December 12, 2006, from http://tpr.casenex.com.

Wainer, H. (Ed.). (2004). Value-added assessment [Special issue]. Journal of Educational and Behavioral Statistics, 29(1).

Wittrock, M. (Ed.). (1986). Handbook of research on teaching (3rd ed.). New York: Macmillan.

Woolfolk, A. (2004). Educational psychology (9th ed.). Boston: Allyn & Bacon.

Tim Konold

University of Virginia

Brian Jablonski

Lynchburg Public Schools

Anthony Nottingham

University of Virginia

Lara Kessler

Columbia College

Stephen Byrd

Elon University

Scott Imig

University of North Carolina Wilmington

Robert Berry

Robert McNergney

University of Virginia

Note

(1.) These days, the word pupil sounds stilted, but the convention of designating learners in PreK-12 as pupils and learners in universities as students helps minimize confusion.

Tim Konold is an associate professor and the program director of Research, Statistics, and Evaluation at the University of Virginia, Charlottesville. His research interests include large-scale test use as it pertains to construction, interpretation, classification, errors of measurement, and informant bias.

Brian Jablonski is a faculty member of the Lynchburg Public Schools. His research focuses on special education, including accommodations and instruction.

Anthony Nottingham is an assistant principal at Freedom High School in Woodbridge, Virginia. His research interests include dropout prevention and teacher retention.

Lara Kessler is an assistant professor at Columbia College, Columbia, Georgia. Her research focuses on middle grades and gifted education.

Stephen Byrd is an assistant professor of special education at Elon University, Elon, North Carolina. His research interests include international special education, learning disabilities, and response to intervention.

Scott Imig is an assistant professor in the School of Education at the University of North Carolina, Wilmington. His research interests include teacher education, classroom observation, and teacher supervision. He is currently studying the links between preservice teachers' social justice beliefs and their classroom practices.

Robert Berry, III, is an assistant professor at the University of Virginia, Charlottesville. His research interests include equity issues in mathematics education and understanding the roles that elementary mathematics specialists have on mathematics teaching and learning.

Robert McNergney is a professor at the University of Virginia, Charlottesville. His research interests include teacher evaluation, online teacher development, and case-method teaching and learning.

Authors' Note: This research was conducted when we were at the University of Virginia; it was made possible in part by a grant from Carnegie Corporation of New York, the Ford Foundation, and the Annenberg Foundation. We thank the members of the Teachers for a New Era Research Advisory Council at the University of Virginia, Dan Fallon, and David Berliner for their helpful criticism of this work. The statements made and views expressed are solely the responsibility of the authors.

Table 1
Strategic Teaching Constructs

Focus/capacity: A teacher's attention to the subject matter, goals, and objectives of the lesson. This concept also includes the support system or capacity needed to create the conditions necessary for teaching the particular content--support such as special teaching skills, technology, and books. (17 items)

Syntax: The sequence of teaching activities. This concept of strategy in action reveals itself in the steps or procedures teachers implement as instruction unfolds. (17 items)

Principles of reaction: The tactics of teaching. Teachers use rules of thumb, often from moment to moment, to gauge student intellectual engagement, motivation, frustration, and the like. These guidelines help teachers, in turn, to fashion responses to what students do. (27 items)

Social system: Attention to roles and relationships. The social system deals with the parameters of authority relationships and behavioral norms and sanctions. (25 items)

Provisions for evaluation: Attention to formative and/or summative judgments of teaching and of learning. (25 items)

Table 2
Correlations and Descriptive Statistics for the Observed Measures of Teaching Behavior, Lesson Difficulty, and Predicted Achievement

                                  1      2      3      4      5      6      7      8      9     10
Teaching Behaviors
  1. Social system
  2. Syntax                     .25
  3. Focus/capacity             .38    .58
  4. Principles of reaction     .56    .53    .55
  5. Evaluation                 .46    .36    .48    .58
Lesson Difficulty
  6. Item 1                    -.12   -.20   -.12   -.03   -.11
  7. Item 2                    -.08   -.21   -.09   -.07   -.10    .76
  8. Item 3                    -.06   -.07   -.03   -.08   -.18    .51    .52
  9. Item 4                    -.16   -.20   -.11   -.14   -.15    .52    .59    .42
 10. Achievement                .25    .16    .23    .25    .11   -.23   -.25   -.02   -.36
M                                50     50     50     50     50   1.14   1.05   1.41   0.69  11.93
SD                               10     10     10     10     10   0.23   0.23   0.26   0.31   2.08

Note: All values rounded to second decimal place for ease of presentation.

Table 3
Model Fit Statistics for the Prediction of Adjusted Pupil Achievements From Learning Behavior and Perceived Lesson Difficulty

               χ²     df     NC    TLI    CFI  RMSEA
Model 1     39.96     33   1.21    .97    .98    .05
Model 2     81.60     66   1.24    .93    .95    .05
Model 3     95.31     73   1.31    .90    .92    .06
Model 4    101.17     80   1.26    .90    .92    .06

Note: Model 1 = combined groups correlated factor model; Model 2 = general form (unconstrained multigroup model for arts and sciences students vs. education students); Model 3 = constrained factor loading multigroup model for arts and sciences students versus education students; Model 4 = structured mean analysis; NC = normed chi-square statistic; TLI = Tucker-Lewis Index; CFI = comparative fit index; RMSEA = root mean square error of approximation.