Linda Darling-Hammond. 21st Century Education: A Reference Handbook. Editor: Thomas L. Good. Sage Publications. 2008.
As the 21st century dawns, it is increasingly clear that schools must become more successful with a wider range of learners if citizens are to acquire the sophisticated skills they need to participate in a knowledge-based society. Whereas 95% of jobs in 1900 and 50% in 1950 were low-skilled jobs requiring only the ability to follow basic procedures designed by others, today such jobs comprise only about 10% of the U.S. economy. At least 70% of jobs require specialized knowledge and skills, including the capacity to design and manage one’s own work, communicate effectively and collaborate with others, research ideas, collect, synthesize, and analyze information, develop new products, and apply many bodies of knowledge to novel problems that arise (Drucker, 1994).
This poses a new set of challenges for education. The kind of teaching needed to help students to think critically, solve complex problems, and master ambitious subject matter content is much more demanding than that needed to impart routine skills (Darling-Hammond et al., 2008). And in an era when the student population is more diverse than ever before, teachers are being asked to achieve these goals for all children, not just the minority who have traditionally been selected into “gifted and talented” or “honors” programs. Teachers must not only be knowledgeable in their content areas, but also skillful in using a range of methods to teach students with very divergent prior knowledge who learn in a wide variety of ways. The question for policy makers, practitioners, and preparers of teachers is, what makes a teacher effective in this challenging context?
The importance of this question is increasingly clear. Recent studies of teacher effects have found that teachers strongly determine differences in student learning, far outweighing the effects of differences in class size and composition (Rivkin, Hanushek, & Kain, 2005; Sanders & Rivers, 1996), and sometimes matching the sizable effects of student background variables like family income and education (Clotfelter, Ladd, & Vigdor, 2007; Ferguson, 1991). Teacher effects appear to be sustained and cumulative; that is, the effects of a very good or poor teacher spill over into later years, influencing student learning for a substantial period of time, and the effects of multiple teachers in a row who are similarly effective or ineffective produce large changes in students’ achievement trajectories.
Furthermore, in the United States, teachers are the most inequitably distributed resource. On any measure of qualifications—extent of preparation, level of experience, certification, content background in the field taught, advanced degrees, selectivity of educational institution, or test scores on college admissions and teacher licensure tests—studies show that students of color, low-income and low-performing students, particularly in urban and poor rural areas, are disproportionately taught by less qualified teachers (Darling-Hammond, 2004; Hanushek, Kain, & Rivkin, 2001; Lankford, Loeb, & Wyckoff, 2002). In some high-minority schools, a majority of teachers are inexperienced and uncertified, and in those with more than 90% students of color, the odds of having a math or science teacher with a certification and a degree in the field taught are less than 50% (Oakes, 1990).
These disparities are largely a function of the nation’s inequitable funding of education. In most states, the wealthiest districts have revenues and expenditures per pupil that are two or three times those of the poorest districts (Educational Testing Service, 1991; Kozol, 2005). Poor rural districts typically spend the least, and urban districts serving students with multiple needs spend much less than surrounding suburbs, where students and families have far fewer challenges. These inequities translate into differentials in salaries and working conditions—resources that greatly affect teacher labor markets.
Because of public attention to the importance of teacher quality for student learning and the unequal access U.S. students have to well-qualified teachers (see, e.g., National Commission on Teaching and America’s Future [NCTAF], 1996), Congress included a provision in the No Child Left Behind Act of 2002 that states should create plans to ensure that all students have access to “highly qualified teachers,” defined as teachers with full certification and demonstrated competence in the subject matter field(s) they teach (defined as completing a college major or passing a test in the field). This provision was historic, especially because the students targeted by federal legislation (students who are low-income, low-achieving, new English language learners, or identified with special education needs) have been in many communities least likely to be served by experienced and well-prepared teachers (NCTAF, 1996).
Debates about Teacher Quality
Various lines of research looking at teacher effectiveness since the 1960s have suggested that many kinds of teacher knowledge and experiences may contribute to teacher effects, including teachers’ general academic and verbal ability; subject matter knowledge; knowledge about teaching and learning; teaching experience; and the set of qualifications measured by teacher certification, which typically includes the preceding factors and others (for reviews, see Darling-Hammond, 2000b; Rice, 2003; Wilson, Floden, & Ferrini-Mundy, 2001).
While there is research supporting the importance of each of these traits or areas of knowledge, there are debates about the relative importance of various elements and about how strong the research is supporting different correlates of teacher effectiveness. One reason for these debates is that few studies have examined multiple elements of teacher knowledge, skills, and abilities at the same time. For example, some analysts argue on the basis of studies in the 1960s through 1980s that verbal or general academic ability matters most for teacher effectiveness, as there were a number of studies finding small but significant effects of teachers’ test scores on both general ability and tests of teaching knowledge during this time. However, none of these studies included measures of teacher education or certification that became available in large data sets during the 1990s. Later studies that include measures of teacher preparation find effects of teachers’ knowledge about subject matter and teaching and learning, going beyond general intellectual ability (for a review, see Darling-Hammond & Youngs, 2002).
Particularly contentious has been the debate about whether teacher preparation and certification are related to teacher effectiveness. For example, in his Annual Report on Teacher Quality (U.S. Department of Education [USDOE], 2002), Secretary of Education Rod Paige argued for the redefinition of teacher qualifications to include little preparation for teaching. Stating that current teacher certification systems are “broken,” and that they impose “burdensome requirements” for education coursework comprising “the bulk of current teacher certification regimes” (p. 8), the report suggested that certification should be redefined to emphasize verbal ability and content knowledge and to deemphasize requirements for education coursework, making student teaching and attendance at schools of education optional and eliminating “other bureaucratic hurdles” (p. 19).
Other commentators have also argued that certification of teachers should be abandoned by states in order to remove “regulatory barriers” to teaching (see, e.g., Ballou & Podgursky, 1997; Walsh, 2001). Their arguments are linked to concerns that state requirements for teacher preparation are burdensome (Walsh, 2001, pp. 1-2), that “professionalization” of teaching is an unnecessary barrier to school choice (Ballou & Podgursky, 1997, p. 44), and that increases in public school funding for low-spending urban districts would be unproductive (Hanushek, 1996). For example, Walsh (2001) argued that advocates demanding more fully credentialed teachers for Baltimore, Maryland, schoolchildren were misguided, because teacher certification does not mean that teachers are more effective.
While debates about the value of teachers’ preparation and experience have often been conducted around technical analyses of studies on the topic (see, e.g., Ballou & Podgursky, 1997; Darling-Hammond, 2000a, 2002; Darling-Hammond & Youngs, 2002; Walsh, 2001), they have strong social, political, and economic implications. Evidence on inequities in the distribution of fully qualified teachers has been prominent in a large number of school finance lawsuits, and significant costs could be associated with creating the salaries, working conditions, and other incentives needed to supply qualified teachers to all communities.
A final aspect of the debate is whether teacher effectiveness should be judged on the basis of teachers’ formal qualifications, evaluations of their practice, or student achievement, in particular student test scores. Recent legislative proposals have called for using “value-added” methods that examine gains in student test scores as the basis for making employment and merit pay decisions. In what follows, I discuss the evidence regarding these various conceptions of teacher quality, along with distinctions between teacher quality and teaching quality, and their implications for both policy and practice.
Effective Teachers and Teaching
At the outset, it is important to distinguish between the related but distinct ideas of teacher quality and teaching quality. Teacher quality might be thought of as the bundle of personal traits, skills, and understandings an individual brings to teaching, including dispositions to behave in certain ways. The traits desired of a teacher may vary depending on conceptions of and goals for education; thus, it might be more productive to think of teacher qualities that seem associated with what teachers are expected to be and do.
As noted earlier, research on teacher effectiveness, based on teacher ratings and student achievement gains, has found the following qualities important:
- Strong general intelligence and verbal ability that help teachers organize and explain ideas, as well as observe and think diagnostically;
- Strong content knowledge—up to a threshold level that relates to what is to be taught;
- Knowledge of how to teach others in that area (content pedagogy), in particular how to use hands-on learning techniques (e.g., lab work in science and manipulatives in mathematics) and how to develop higher-order thinking skills;
- An understanding of learners and their learning and development—including how to assess and scaffold learning, how to support students who have learning differences or difficulties, and how to support the learning of language and content for those who are not already proficient in the language of instruction;
- Adaptive expertise that allows teachers to make judgments about what is likely to work in a given context in response to students’ needs. (Darling-Hammond & Bransford, 2005)
Although less directly studied, most educators would include in this list a set of dispositions to support learning for all students, to teach in a fair and unbiased manner, to be willing and able to adapt instruction to help students succeed, to strive to continue to learn and improve, and to be willing and able to collaborate with other professionals and parents in the service of individual students and the school as a whole.
Teaching quality has to do with strong instruction that enables students to learn. Such instruction meets the demands of the discipline, the goals of instruction, and the needs of students in a particular context. Teaching quality is in part a function of teacher quality—teachers’ knowledge, skills, and dispositions—but it is also strongly influenced by the context of instruction. Key to considerations of context are “fit” and teaching conditions. A “high-quality” teacher may not be able to offer high-quality instruction in a context where there is a mismatch in terms of the demands of the situation and his/her knowledge and skills; for example, an able math teacher asked to teach a science class for which s/he is not prepared may teach poorly; a teacher who is prepared and effective at the high school level may be unable to teach small children; and a teacher who is able to teach high-ability students or affluent students well may be quite unable to teach students who struggle to learn or who do not have the resources at home that the teacher is accustomed to assuming are available. Thus, a high-quality teacher in one circumstance may not be able to provide high-quality teaching in another.
A second major consideration in the quality of teaching has to do with the conditions for instruction. If high-quality teachers lack strong curriculum materials, necessary supplies and equipment, reasonable class sizes, and the opportunity to plan with other teachers to create both appropriate lessons and a coherent curriculum across grades and subject areas, the quality of teaching students experience may be suboptimal, even if the quality of teachers is high. Many conditions of teaching are out of the control of teachers and depend on the administrative and policy systems in which they work.
Strong teacher quality may heighten the probability of strong teaching quality but does not guarantee it. Initiatives to develop teaching quality must consider not only how to identify and develop the skills and abilities that are important for teachers, but also how to develop teaching contexts that enable good practice on the part of teachers.
Evidence about Teachers’ Formal Qualifications
There are many ways to conceptualize teachers’ knowledge, skills, and abilities. As the state’s legal vehicle for establishing competence for members of professions, including teaching, licensing or certification is meant to represent the minimum standard for responsible practice. The requirements for licensing include measures of many of the variables noted above, such as basic skills and general academic ability, knowledge about subject matter, knowledge about teaching and learning, and some teaching experience during a student teaching or internship placement.
Since the mid-1980s, states have taken steps to strengthen their licensure requirements, which are now substantially stronger than they were 20 years ago. In most states, candidates for teaching must now earn a minimum grade point average and/or achieve a minimum test score on tests of basic skills, general academic ability, or general knowledge in order to be admitted to teacher education or gain a credential. In addition, they must generally secure a major or minor in the subject(s) to be taught and/or pass a subject matter test, take specified courses in education and, sometimes, pass a test of teaching knowledge and skill. In the course of teacher education and student teaching, candidates are typically judged on their teaching skill, professional conduct, and the appropriateness of their interactions with children.
Despite questions that have been raised about the extent to which licensure or certification reflects qualities that are important to teacher effectiveness, the preponderance of evidence indicates there are significant links between teacher education and licensure measures (including education coursework, credential status, and scores on licensure tests) and student achievement. These relationships have been found at the level of the individual teacher (e.g., Goldhaber & Brewer, 2000; Hawk, Coble, & Swanson, 1985; Monk, 1994); the school (Betts, Reuben, & Danenberg, 2000; Fetler, 1999; Goe, 2002); the school district (Ferguson, 1991; Strauss & Sawyer, 1986); and the state (Darling-Hammond, 2000a). These multilevel findings reinforce the inferences that might be drawn from any single study.
A review commissioned by USDOE’s Office of Educational Research and Improvement, which analyzed 57 studies that met specific research criteria and were published after 1980 in peer-reviewed journals, concluded that the available evidence demonstrates a relationship between teacher education and teacher effectiveness (Wilson, Floden, & Ferrini-Mundy, 2001). The review shows that empirical relationships between teacher qualifications and student achievement have been found across studies using different units of analysis and different measures of preparation and in studies that employ controls for students’ socioeconomic status and prior academic performance.
Some well-controlled studies have been able to compare the relative influences on student achievement of various aspects of teacher qualifications. Goldhaber and Brewer (2000) concluded, for example, that the effects of teachers’ certification on student achievement exceed those of a content major in the field, suggesting that what licensed teachers learn in the pedagogical portion of their training adds to what they gain from a strong subject matter background:
[We] find that the type (standard, emergency, etc.) of certification a teacher holds is an important determinant of student outcomes. In mathematics, we find the students of teachers who are either not certified in their subject … or hold a private school certification do less well than students whose teachers hold a standard, probationary, or emergency certification in math. Roughly speaking, having a teacher with a standard certification in mathematics rather than a private school certification or a certification out of subject results in at least a 1.3 point increase in the mathematics test. This is equivalent to about 10% of the standard deviation on the twelfth grade test, a little more than the impact of having a teacher with a BA and MA in mathematics. Though the effects are not as strong in magnitude or statistical significance, the pattern of results in science mimics that in mathematics. (p. 139, emphasis added)
This study also found that beginning teachers on probationary certificates (those who were fully prepared and completing their initial 2- to 3-year probationary period) from states with more rigorous certification exam requirements had positive effects on student achievement, suggesting the potential value of recent reforms to strengthen certification.
Three recent, large, well-controlled studies, using longitudinal individual-level student data from New York City and Houston, Texas, have found that teachers who enter teaching without full preparation (as emergency hires or alternative route candidates) are less effective than fully prepared beginning teachers working with similar students, especially in teaching reading (Boyd et al., 2006; Darling-Hammond et al., 2005; Kane, Rockoff, & Staiger, 2006). All three studies also found that the effectiveness of alternative route teachers who were completing training while they taught generally improved by their second or third year, when most were certified. These effects are likely due to both the training the recruits had completed and their greater experience. A number of studies have found that teachers become more effective after their first couple of years in the classroom (Kain & Singleton, 1996; Rivkin, Hanushek, & Kain, 2005).
Teachers learn in many ways—from formal preparation, personal experience, on-the-job professional development, and from colleagues. The individual and cumulative effects of various kinds of teacher qualifications were estimated in a large-scale study using North Carolina data to examine learning gains of high school students. This study found that teachers were more effective if they held a standard license (as compared to those who entered without having completed training), had a license in the specific field taught, had higher scores on the teacher licensing test (especially in mathematics), had taught for more than 2 years, had graduated from a more competitive college, and went through the process of National Board certification (Clotfelter, Ladd, & Vigdor, 2007).
While each of these variables was statistically significant in its own right, their combined influence was substantial: student achievement with a teacher having low overall qualifications (no experience, low licensure test scores, no prior teacher preparation, certification in a field other than the one taught, no Board certification, no graduate degree, and graduation from an uncompetitive college) was 0.30 standard deviations lower than with a teacher having most of these qualifications. Using a more conservative measure representing a comparison between teachers whose mix of qualifications was in the top 10% versus those in the bottom 10%, the effect on student achievement of 0.18 standard deviations was larger than the effects of lowering class size by five students or of students’ race and parent education (e.g., the average difference in achievement between a White student with college-educated parents and a Black student with high school-educated parents).
These very large effects suggest the importance of focusing on what teachers have had the opportunity to learn through their general education, subject matter training, and preparation for teaching, as well as their experience and professional learning opportunities such as National Board certification (discussed further below). This and other evidence suggest that it is a mistake to believe that only one or two characteristics of teachers can explain their effects on student achievement. The message from the research is that multiple factors are involved and teachers with a combination of attributes—knowing how to instruct, motivate, manage, and assess diverse students; strong verbal ability; sound subject matter; and knowledge of effective methods for teaching that subject matter—hold the greatest promise for producing student learning.
Clotfelter and colleagues estimated that the combined effects of the formal qualifications they measured might account for about one third of the total teacher effect on student learning. This suggests that what teachers know and do to succeed with students is also determined by other factors and may be revealed by other kinds of measures. Among these are finer-grained measures of qualifications. For example, although master’s degrees in general show little effect, master’s degrees earned in the field a teacher teaches (e.g., a degree in mathematics or mathematics education) are associated with greater effectiveness (Goldhaber & Brewer, 1997). Similarly, coursework in subject-specific teaching methods has a bearing on teacher effectiveness (Monk, 1994) as does preparation in how to work with diverse student populations, in particular, training in cultural diversity, teaching limited English proficient students, and teaching students with special needs (Wenglinsky, 2002).
Specific kinds of preparation have been found to enable teachers to engage in practices that strongly influence student learning. For example, teachers trained to use formative assessment to provide feedback to students and opportunities for them to revise their work have been found in many dozens of studies to have large effect sizes on student learning gains (Black & Wiliam, 1998). Math and science teachers who engage in more hands-on learning, such as the use of manipulatives in math or laboratory experiments in science, and who emphasize higher-order thinking skills produce stronger student achievement (Wenglinsky, 2002). Teachers who teach students specific metacognitive strategies for reading, writing, and mathematical problem solving have been found to produce increased student learning of complex skills (for a review, see Darling-Hammond & Bransford, 2005).
Evidence of Teacher Performance
Although there is a relationship between the easily measurable aspects of teachers’ qualifications and their effectiveness, it makes sense that what teachers do with what they have learned should be even more directly related to their students’ learning. To the extent that teacher preparation or training influences student learning, it is because it influences teacher practices first. In recent years, systematic evaluation of teachers’ practice has emerged as a way of judging teaching quality, in large part because of efforts to develop research-based standards for effective teaching and build these into performance assessments that examine how teachers plan and implement instruction.
Some well-designed performance-based assessments of teaching have been found to detect aspects of teaching that are significantly related to teacher effectiveness, as measured by student achievement gains. These include standardized teacher performance assessments like those used for National Board certification and for beginning teacher licensure in states like Connecticut and California, as well as standards-based teacher evaluation systems used in some local districts. The value of using such assessments is that they can both document broader aspects of teacher effectiveness and can be used to help teachers develop greater effectiveness, as participation in these assessments has been found to support learning both for teachers who are being evaluated and educators who are trained to serve as evaluators.
Teacher Performance Assessments
A standards-based approach to assessing teachers was initially developed and made systematic through the work of the National Board for Professional Teaching Standards, which developed standards for accomplished teaching in more than 30 teaching areas defined by subject matter and developmental level of students. The Board then developed an assessment of accomplished teaching that assembles evidence of teachers’ practice and performance in a portfolio that includes videotapes of teaching, accompanied by commentary, lesson plans, and evidence of student learning. These pieces of evidence are scored by trained raters who are experts in the same teaching field, using rubrics that define critical dimensions of teaching as the basis of the evaluation. National Board certification is designed to identify experienced, accomplished teachers, and a number of states and districts use it as the basis for salary bonuses or other forms of teacher recognition, such as selection as a mentor or lead teacher.
A number of recent studies have found that the National Board certification assessment process identifies teachers who are more effective in raising student achievement than others who have not achieved certification (Bond et al., 2000; Cavaluzzo, 2004; Goldhaber & Anthony, 2005; Smith et al., 2005; Vandevoort, Amrein-Beardsley, & Berliner, 2004). Perhaps equally important, many studies have found that teachers’ participation in the National Board process supports their professional learning and stimulates changes in their practice. Teachers note that the process of analyzing their own and their students’ work in light of standards enhances their abilities to assess student learning and to evaluate the effects of their own actions, while causing them to adopt new practices that are called for in the standards and assessments (Athanases, 1994; Lustick & Sykes, 2006). Teachers report significant improvements in their performance in each area assessed—planning, designing, and delivering instruction, managing the classroom, diagnosing and evaluating student learning, using subject matter knowledge, and participating in a learning community—and observational studies have documented that these changes do indeed occur (Sato, Chung, & Darling-Hammond, in press).
National Board participants often say that they have learned more about teaching from their participation in the assessments than they have learned from any other previous professional development experience. David Haynes’s (1995) statement is typical of many:
Completing the portfolio for the Early Adolescence/Generalist Certification was, quite simply, the single most powerful professional development experience of my career. Never before have I thought so deeply about what I do with children, and why I do it. I looked critically at my practice, judging it against a set of high and rigorous standards. Often in daily work, I found myself rethinking my goals, correcting my course, moving in new directions. I am not the same teacher as I was before the assessment, and my experience seems to be typical. (p. 60)
Following on the work of the National Board, a consortium of more than 30 states, working under the auspices of the Council of Chief State School Officers (CCSSO), created the Interstate New Teacher Assessment and Support Consortium (INTASC) standards for beginning teacher licensing. Most states have now adopted these into their licensing systems. In some states, teacher performance assessments for new teachers, modeled after the National Board assessments, are being used either in teacher education, as a basis for the initial licensing recommendation (California, Oregon), or in the teacher induction period, as a basis for moving from a probationary to a professional license (Connecticut).
These assessments require teachers to document their plans and teaching for a unit of instruction, videotape and critique lessons, and collect and evaluate evidence of student learning. Like the National Board assessments, beginning teachers’ ratings on the Connecticut Beginning Educator Support and Training (BEST) assessment have been found to significantly predict their students’ value-added achievement on state tests (Wilson & Hallum, 2006). This finding is especially significant because the lowest-scoring candidates who do not pass the assessment are not allowed to gain a professional license or gain tenure in Connecticut, so the analysis had to deal with a truncated range that did not include most of those teachers. (Those who do not pass have the opportunity to attempt the assessment again, but must pass by their third year in teaching to remain in the profession.) About 10% of candidates in Connecticut do not pass the assessment.
These assessments have also been found to help teachers improve their practice. Connecticut’s process of implementing INTASC-based portfolios for beginning teacher licensing involves virtually all educators in the state in the assessment process: as beginning teachers taking the assessment, as school-based mentors who work with beginners, as assessors who are trained to score the portfolios, or as expert teachers who convene regional support seminars to help candidates learn about the standards. Educators throughout the system develop similar knowledge about teaching and learn how principles of good instruction are applied in classrooms. These processes can have far-reaching effects. By 2010, an estimated 80% of elementary teachers, and nearly as many secondary teachers, will have participated in the new assessment system as candidates, support providers, or assessors (Pecheone & Stansbury, 1996).
A beginning teacher in Connecticut described the power of the process, which requires planning and teaching a unit, and reflecting daily on the day’s lesson to consider how it met the needs of each student and what should be changed in the next day’s plans. He noted: “Although I was the reflective type anyway, it made me go a step further. I would have to say, okay, this is how I’m going to do it differently. It made more of an impact on my teaching and was more beneficial to me than just one lesson in which you state what you’re going to do…. The process makes you think about your teaching and reflect on your teaching. And I think that’s necessary to become an effective teacher.”
The same learning effects are recorded in research on the similar Performance Assessment for California Teachers (PACT) assessment used in California teacher education programs. Beginning teachers observe that they have learned to reflect on and modify their teaching in response to student needs. Faculty and supervisors who score the portfolios using standardized rubrics note that this representation of teaching quality helps them improve teacher education quality:
This [scoring] experience … has forced me to revisit the question of what really matters in the assessment of teachers, which—in turn—means revisiting the question of what really matters in the preparation of teachers.
—Teacher education faculty member
[The scoring process] forces you to be clear about “good teaching”; what it looks like, sounds like. It enables you to look at your own practice critically, with new eyes.
When assessments both predict teacher effectiveness and support individual and institutional learning, they may be able to help to create an engine for stimulating greater teacher effectiveness in the system as a whole.
Standards-Based Evaluations of Teaching
Similarly, standards-based teacher evaluations used by some districts have been found to be significantly related to the achievement gains of teachers’ students and to help teachers improve their practice and effectiveness (Milanowski, Kimball, & White, 2004). Like the teacher performance assessments described above, these systems for observing teachers’ classroom practice are based on professional teaching standards grounded in research on teaching and learning. They use systematic observation protocols to examine teaching along a number of dimensions. All of the career ladder plans noted earlier use such evaluations as part of their systems, and many use the same or similar rubrics for observing teaching.
In a study of three districts using standards-based evaluation systems, researchers found positive correlations between teachers’ ratings and their students’ gain scores on standardized tests (Milanowski, Kimball, & White, 2004). In the schools and districts studied, assessments of teachers are based on well-articulated standards of practice evaluated through evidence including observations of teaching along with teacher interviews and, sometimes, artifacts such as lesson plans, assignments, and samples of student work.
The Teacher Advancement Program (TAP) offers one well-developed example of a highly structured teacher evaluation system that was developed based on the standards of the National Board and INTASC and the assessment rubrics developed in Connecticut and Rochester (New York), among others. In the TAP system of “instructionally focused accountability,” each teacher is evaluated four to six times a year by master/mentor teachers or principals who are trained and certified evaluators using a system that examines designing and planning instruction, the learning environment, classroom instruction, and teacher responsibilities. The training is a rigorous 4-day process, and trainers must be certified based on their ability to evaluate teaching accurately and reliably. Teachers also study the rubric and its implications for teaching and learning, look at and evaluate videotaped teaching episodes using the rubric, and engage in practice evaluations. After each observation, the evaluator and teacher meet to discuss the findings and to make a plan for ongoing growth. TAP provides ongoing professional development, mentoring, and classroom support to help teachers meet these standards. Teachers in TAP schools report that this system, along with the intensive professional development offered, is substantially responsible for improvements in their practice and the gains in student achievement that have occurred in many TAP schools (Solomon et al., 2007). As described later, data from this extensive teacher evaluation and development system are combined with evidence about schoolwide and individual teacher student achievement gains in making judgments about teachers’ appointment to specific roles in the career ladder.
The set of studies on standards-based teacher evaluation suggests that the more teachers’ classroom activities and behaviors reflect professional standards of practice, the more effective they are in supporting student learning. This finding points to the desirability of focusing on such professional standards in the preparation, professional development, and evaluation of teachers.
Evidence of Student Learning
Meanwhile, interest in including evidence of student learning in evaluations of teachers has been growing. After all, if student learning is the primary goal of teaching, it appears straightforward that it ought to be taken into account in determining a teacher’s competence. At the same time, the literature includes many cautions about the problems of basing teacher evaluations substantially on student test scores. In addition to the fact that curriculum-specific tests that would allow gain score analyses are not typically available in many teaching areas, these include concerns about overemphasis on teaching to the test at the expense of other kinds of learning; problems of attributing student gains to specific teachers; and disincentives for teachers to serve high-need students, for example, those who do not yet speak English and those who have special education needs (and whose test scores therefore may not accurately reflect their learning). This could inadvertently reinforce current practices in which inexperienced teachers are disproportionately assigned to the neediest students or schools discourage high-need students from entering or staying. Nonetheless, some innovative career ladder and compensation programs (in Rochester, New York, and Denver, Colorado, for example, as well as the TAP system described earlier) have found ways to include evidence of student learning in teacher evaluations. These are discussed below.
The Use of Value-Added Achievement Test Scores to Evaluate Teachers
Because of a desire to recognize and reward teachers’ contributions to student learning, a prominent proposal is to use value-added student achievement test scores from state or district standardized tests as a key measure of teachers’ effectiveness. The value-added concept is important, as it reflects a desire to acknowledge teachers’ contributions to students’ progress, taking into account where students begin. Furthermore, value-added methods are proving valuable for research on the effectiveness of specific populations of teachers (for example, those who are National Board certified or those who have had particular preparation or professional development experiences) and on the outcomes of various curriculum and teaching interventions.
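The core idea of taking into account where students begin can be illustrated with a minimal sketch. It is not drawn from any of the studies cited here: the teachers, scores, and function names are hypothetical, and a simple average of score gains stands in for the far more elaborate statistical models used in actual value-added research.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, prior-year score, current-year score).
# These numbers are invented for illustration only.
records = [
    ("Teacher A", 40, 55),
    ("Teacher A", 45, 58),
    ("Teacher B", 80, 84),
    ("Teacher B", 85, 88),
]

def mean_final_score(records):
    """Average current-year score per teacher (ignores where students began)."""
    by_teacher = defaultdict(list)
    for teacher, _prior, current in records:
        by_teacher[teacher].append(current)
    return {teacher: mean(scores) for teacher, scores in by_teacher.items()}

def mean_gain(records):
    """Average gain (current minus prior score) per teacher."""
    by_teacher = defaultdict(list)
    for teacher, prior, current in records:
        by_teacher[teacher].append(current - prior)
    return {teacher: mean(gains) for teacher, gains in by_teacher.items()}

final_scores = mean_final_score(records)
gains = mean_gain(records)
print(final_scores)
print(gains)
```

In this invented data set, Teacher B's students end the year with higher scores, but Teacher A's students show larger gains. A status measure and a gain measure can therefore rank the same teachers differently, which is the motivation for the value-added concept.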
However, there are serious technical and educational challenges associated with using this approach to make strong inferences about individual teacher effectiveness, especially for high-stakes purposes, as opposed to studying the effectiveness of groups of teachers in a research context. For example, when researchers are aggregating data about large groups of teachers for research rather than decision-making purposes, they make various assumptions about how to treat missing student data, which students to include, or how to choose among models using different statistical controls that change the results of their estimates. Researchers may be concerned from an intellectual perspective about whether their models are indeed capturing teacher effects (as opposed to student variables or testing artifacts or the results of school practices outside the classroom), but they need not worry about whether their decisions disadvantage particular teachers in the way they would need to if these analyses were to be used to make individual personnel decisions.
Indeed, the emergent strategies being used to analyze student learning data to assess potential teacher effectiveness produce very different results depending on the different decisions researchers make about how to handle the data (e.g., whether to control for student demographic characteristics or school effects, whether and how to interpolate missing data for students, whether to include or exclude special needs learners or new English language learners, whether to use tests that do not measure the specific curriculum a teacher teaches). Leading researchers agree that, while it is useful for research purposes, value-added modeling (VAM) is not appropriate as a primary measure for evaluating individual teachers. Summarizing the results of many studies, including a recent wide-ranging review by the RAND Corporation, Henry Braun (2005) of the Educational Testing Service concluded:
VAM results should not serve as the sole or principal basis for making consequential decisions about teachers. There are many pitfalls to making causal attributions of teacher effectiveness on the basis of the kinds of data available from typical school districts. We still lack sufficient understanding of how seriously the different technical problems threaten the validity of such interpretations. (p. 17)
The problems researchers have identified with using value-added testing models as a primary determinant of teacher effectiveness, especially those drawing on once-a-year large-scale assessments, include the following:
• Teachers’ ratings are affected by differences in the students who are assigned to them. Students are not randomly assigned to teachers—and statistical models cannot fully adjust for the fact that some teachers will have a disproportionate number of students who may be exceptionally difficult to teach (students with poor attendance, who are homeless, who have severe problems at home, etc.) and whose scores on traditional tests are problematic to interpret (e.g., those who have special education needs or who are English language learners). This can create both misestimates of teachers’ effectiveness and disincentives for them to want to teach the students who have the greatest needs.
• VAM requires scaled tests, which most states don’t use. Furthermore, many experts think such tests are less useful than tests that are designed to measure specific curriculum goals. In order to be scaled, tests must evaluate content that is measured along a continuum from year to year. This reduces their ability to measure the breadth of curriculum content in a particular course or grade level. As a result, most states have been moving away from scaled tests and toward tests that measure standards based on specific curriculum content, such as end-of-course tests in high school that evaluate standards more comprehensively (e.g., separate tests in algebra, geometry, algebra 2, and in biology, chemistry, and physics). These curriculum-based tests are more useful for evaluating instruction and guiding teaching, but they do not allow VAM.
• Value-added models do not produce stable ratings of teachers. Teachers look very different in their measured effectiveness when different statistical methods are used. Different teachers appear effective depending on whether student characteristics are controlled, whether school effects are controlled, and what kinds of students teachers teach (e.g., the proportion of special education students or English language learners). In addition, a given teacher may appear to have differential effectiveness from class to class and from year to year, depending on these factors and others. Braun notes that ratings are most unstable at the upper and lower ends of the scale, where many would like to use them to determine high or low levels of effectiveness.
• Most teachers and many students are not covered by relevant tests. Scaled annual tests are not available in most states for teachers of science, social studies, foreign language, music, art, physical education, special education, vocational/technical education, and other electives in any grades, or for teachers in Grades K-3 and nearly all teachers in Grades 9-12. Furthermore, because the scores are unstable, experts recommend at least 3 years of data for a given teacher to smooth out the variability. With many grades and subjects uncovered by scaled tests, and with 3 years of data needed to get a reasonably stable estimate for a teacher (thus excluding first- and second-year teachers), at best only about 30% of elementary teachers and 10% of high school teachers are covered by data in most states.
• Missing data threatens the validity of results for individual teachers. Once teacher and student mobility are factored in, the number of teachers who can be followed in these models is reduced further. In low-income communities, especially, student mobility rates are often extremely high, with a minority of students stable from one year to the next. Although researchers can make assumptions about score values for missing student data for research purposes, these kinds of adjustments are not appropriate for the purposes of making individual teacher judgments.
• Many desired learning outcomes are not covered by the tests that are widely used. Tests in the United States are generally much narrower than assessments used in other high-achieving countries (which feature a much wider variety of more ambitious written, oral, and applied tasks), and scaled tests are narrower than some other kinds of tests. Thus, many important goals of education—including untested areas such as writing, research, science investigations, social studies, and the arts, or skills such as data collection, analysis, and synthesis, or complex problem solving—are not captured by widely used tests.
• It is impossible to fully separate out the influences of students’ other teachers, as well as school conditions, on their apparent learning. Prior teachers have lasting effects, for good or ill, on students’ later learning, and current teachers also interact to produce students’ knowledge and skills. For example, the essay writing a student learns through his history teacher may be credited to his English teacher, even if she assigns no writing; the math he learns in his physics class may be credited to his math teacher. Specific skills and topics taught in one year may not be tested until later years. A teacher who works in a well-resourced school with specialist supports may appear to be more effective than one whose students don’t receive these supports. A teacher who teaches large classes without adequate textbooks or materials may appear to be less effective than one who has a small class size and plentiful supplies. As Braun (2005) notes, “it is always possible to produce estimates of what the model designates as teacher effects. These estimates, however, capture the contributions of a number of factors, those due to teachers being only one of them. So treating estimated teacher effects as accurate indicators of teacher effectiveness is problematic” (p. 17). To understand the influences on student learning, more data about teachers’ practices and context are needed.
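The sensitivity of teacher ratings to modeling decisions, noted in the list above, can be illustrated with a small sketch. Everything in it is hypothetical: the data are invented, and the "adjustment" (subtracting the average gain of each student group) is a deliberately crude stand-in for the statistical controls used in real value-added models, not a method taken from the studies cited.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, student is an English language learner?, score gain).
# Teacher A is assigned mostly English language learners; Teacher B mostly is not.
records = [
    ("Teacher A", True, 6), ("Teacher A", True, 8),
    ("Teacher A", True, 7), ("Teacher A", False, 11),
    ("Teacher B", False, 10), ("Teacher B", False, 9),
    ("Teacher B", False, 10), ("Teacher B", True, 4),
]

def unadjusted_estimates(records):
    """Mean gain per teacher, with no control for student characteristics."""
    by_teacher = defaultdict(list)
    for teacher, _ell, gain in records:
        by_teacher[teacher].append(gain)
    return {teacher: mean(gains) for teacher, gains in by_teacher.items()}

def adjusted_estimates(records):
    """Mean gain per teacher after subtracting each student group's average gain."""
    by_group = defaultdict(list)
    for _teacher, ell, gain in records:
        by_group[ell].append(gain)
    group_mean = {group: mean(gains) for group, gains in by_group.items()}

    by_teacher = defaultdict(list)
    for teacher, ell, gain in records:
        by_teacher[teacher].append(gain - group_mean[ell])
    return {teacher: mean(resid) for teacher, resid in by_teacher.items()}

unadjusted = unadjusted_estimates(records)
adjusted = adjusted_estimates(records)
print(unadjusted)
print(adjusted)
```

With these invented numbers, Teacher B has the higher unadjusted average gain, while Teacher A comes out ahead once gains are compared within student groups. Neither figure is the "true" teacher effect; the sketch shows only that the ordering of two teachers can reverse with a single analytic choice.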
Thus, while value-added models are useful for looking at groups of teachers for research purposes—for example, to examine the results of preparation or professional development programs or to look at student progress at the school or district level—and they may provide one measure of teacher effectiveness among several, they are problematic as the primary or sole measure for making evaluation decisions for individual teachers. In the few systems where such measures are used for personnel decisions such as performance pay, they are often used for the entire group of teachers in a school, rather than for individuals. Where they are used, they need to be accompanied by an analysis of the teachers’ students and teaching context, and an evaluation of the teachers’ practices if a judgment about teacher quality is to reflect all the factors that influence teachers’ effectiveness.
Debates about teacher quality have arisen from both technical and political disagreements about what qualifications predict effectiveness and what policies should guide teacher preparation, selection, hiring, retention, and rewards. At the root of some of these debates is the question of whether all students are equally entitled to teachers of comparable qualifications, as well as the question of which qualifications matter most.
A review of the research illustrates that the search for a single measure of teacher quality is rather like the search for the Holy Grail. Studies show that there are many teacher characteristics and qualities which, in combination, predict teaching effectiveness. These include not only individual teacher qualifications, but also features of the teaching environment—such as resources, class sizes, and student characteristics, as well as the fit between the teaching assignment and teachers’ knowledge and skills. When teachers have more knowledge resources on which to draw and more conducive teaching conditions, they are more effective in promoting student learning.
The fact that teachers’ effectiveness is greatly enhanced when they have had many opportunities to learn (including high-quality general education, deepening of both content and pedagogical knowledge, teaching experience, and opportunities to develop specific practices through professional development and assessment) suggests a multifaceted approach to policy development on behalf of stronger teaching. If policies support the recruitment of well-educated candidates into high-quality preparation programs that ensure substantial opportunities to learn subject matter and pedagogy, and support both their retention in teaching and their ongoing learning focused on effective practices, the overall quality of teaching can be expected to be significantly higher.
The most successful approaches to identifying teacher quality for purposes of recognizing and rewarding effective teachers use multiple measures of performance, typically considering at least three kinds of evidence in combination with one another:
• Knowledge and skills that are associated with desired student outcomes and achievement of school goals, as reflected in specific kinds of content and pedagogical knowledge, certifications, and desired abilities of teachers;
• Teachers’ performance on teaching assessments measuring standards known to be associated with student learning (including national assessments, such as National Board certification, and locally managed standards-based teacher evaluations);
• Contributions to growth in student learning (from classroom assessments and documentation as well as tests, when appropriate).
Finally, some strategies for evaluating teacher quality—such as standards-based teacher evaluations and assessments like those of the National Board for Professional Teaching Standards—have been found not only to measure features of teaching associated with effectiveness, but actually to help develop effectiveness at the same time—not only for the participants but also for those involved in mentoring and assessing these performances. These approaches may be particularly valuable targets for policy investments, as they may provide an engine for developing teaching quality across the profession while providing a useful measure of how teachers contribute to student learning.