Teacher accountability in the United States is in a period of transformation. In July 2012, the 26th state received an Elementary and Secondary Education Act Flexibility Waiver, marking relief for more than half of the states from many of the requirements of the No Child Left Behind Act. In exchange, these states promised to implement rigorous new teacher evaluation systems that, among other things, include measures of student learning growth. Similarly, transforming teacher evaluation was a consistent priority for the United States Department of Education through the award of grants such as Race to the Top, the Teacher Incentive Fund, and School Improvement Grants. Since 2009, to improve their eligibility for federal funding while also pursuing their school improvement goals, 36 states plus Washington, DC, and hundreds of school districts have passed teacher evaluation reforms, and 33 states have additionally passed principal evaluation reforms. For many states and districts, the question of how to measure student learning as one aspect of teacher effectiveness, in ways that are accurate, acceptable to teachers, and feasible for teachers whose grades or subject areas are not systematically tested, has consumed much of their time and resources over the last few years.


A meaningful, accurate evaluation system achieves a number of important purposes. As in any field, evaluations give those managing the organization a clearer sense of each employee’s strengths and weaknesses so that decisions about promotion, professional development, assignment, and, when necessary, dismissal can be made in a more thoughtful manner. In schools, there is an additional emphasis on the role of evaluations in providing detailed, constructive feedback to all teachers, including those who are already considered generally effective, with data that can inform continuous improvement in practice. It is now commonly understood that teacher effectiveness is the single most important school-level factor affecting student achievement, with principal effectiveness a close second. It is clear, therefore, that the continuous improvement of teacher and principal effectiveness must be an integral part of any effort aimed at raising student achievement.


While improvements in educator evaluation are still evolving, the research and policy communities agree that a high quality teacher evaluation system includes several features. First, it assesses teacher effectiveness on multiple performance levels; that is, teachers are placed on a four- or five-point scale, as opposed to binary ratings that limit the evaluator to choosing between “satisfactory” and “unsatisfactory.” High quality teacher evaluation systems also include multiple measures of effectiveness (see box), and each of these measures must be carefully developed and tested for its validity (i.e., accuracy) and reliability (i.e., consistency). Evaluators must be rigorously trained to use the measures appropriately. Multiple evaluators should spend adequate amounts of time observing teachers on more than one occasion, comparing notes, and sharing detailed written feedback with teachers, while also coaching them to improve in areas of weakness.


Multiple Measures of Teacher Effectiveness


Teacher evaluations may include some combination of the following measures:

  • Classroom observations. Used by evaluators to make consistent judgments of teachers’ instructional practice, classroom observations are the most common measure of teacher effectiveness and vary widely in how they are conducted and what they assess. High quality classroom observation instruments are standards-based and contain well-specified rubrics that delineate consistent assessment criteria for each standard of practice. To be accurate, evaluators should be trained to ensure consistency in scoring.
  • Student growth on standardized tests. Student growth on standardized tests refers to the change in test scores from one point in time to another. The related concept of value-added measures refers to student growth measures that include a pre-test score and a post-test score as well as a number of other variables about students (e.g., poverty, special needs) that are outside of a teacher’s control yet tend to affect students’ academic growth.
  • Other student growth data. Other student growth data includes information about the change in students’ performance on some measure such as a teacher- or district-developed test over two or more points in time. It may also include growth in terms of behavior, musical performances, or portfolios of student work.
  • Instructional artifacts. Instructional artifacts are used by evaluators to rate lesson plans, teacher assignments, teacher-created assessments, scoring rubrics, or student work on particular criteria, such as rigor, authenticity, intellectual demand, alignment to standards, clarity, and comprehensiveness. Evaluators typically use an evaluation tool or rubric to make judgments about the quality of these artifacts.
  • Teacher portfolios. Portfolios are collections of materials that exhibit evidence of exemplary teaching practice, school activities, and student progress. They are usually compiled by the teacher himself or herself and may include teacher-created lesson or unit plans, descriptions of the classroom context, assignments, student work samples, videos of classroom instruction, notes from parents, and teachers’ analyses of their students’ learning in relation to their instruction. Evidence binders are similar to portfolios, but they often provide specific requirements for inclusion and require a final teacher-led presentation of the work to an evaluation team.
  • Teacher self-assessments. Self-assessments consist of surveys, instructional logs, or interviews in which teachers report on their work in the classroom, the extent to which they are meeting standards, and in some cases the impact of their practice. Self-assessments may include checklists, rating scales, rubrics, and may require teachers to indicate the frequency of particular practices.
  • Student surveys. Student surveys are questionnaires that typically ask students to rate teachers on a rating scale (e.g., from 1 to 5, where 1 = very effective, and 5 = not at all effective) regarding various aspects of teachers’ practice (e.g., course content, usefulness of feedback) as well as how much students say they learned or the extent to which they were engaged.
  • Parent surveys. Parent surveys are questionnaires that typically ask parents to rate teachers on a rating scale (e.g., from 1 to 5, where 1 = very effective, and 5 = not at all effective) regarding various aspects of teachers’ practice (e.g., course content, usefulness of feedback, quality of homework, quality of communication) as well as the extent to which they are satisfied with the teachers’ instruction (Goe, Bell, & Little, 2008).
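
The difference between simple student growth and a value-added measure, as described in the box above, can be made concrete with a small sketch. The Python example below is illustrative only and is not the model used by any state or district: the student records, the field names (`pre`, `post`, `poverty`, `teacher`), and the choice to summarize each teacher by the average residual from a least-squares fit are all assumptions made for the example. Operational value-added models are considerably more elaborate.

```python
from statistics import mean

# Hypothetical student records, invented for illustration only.
students = [
    {"teacher": "A", "pre": 40, "post": 55, "poverty": 1},
    {"teacher": "A", "pre": 50, "post": 63, "poverty": 1},
    {"teacher": "A", "pre": 60, "post": 70, "poverty": 0},
    {"teacher": "B", "pre": 70, "post": 76, "poverty": 0},
    {"teacher": "B", "pre": 80, "post": 85, "poverty": 0},
    {"teacher": "B", "pre": 65, "post": 72, "poverty": 1},
]

def gain_scores(pre, post):
    """Simple growth: post-test score minus pre-test score per student."""
    return [b - a for a, b in zip(pre, post)]

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def value_added(records):
    """Average residual per teacher after adjusting for pre-test and poverty."""
    rows = [[1.0, s["pre"], s["poverty"]] for s in records]  # 1.0 = intercept
    y = [s["post"] for s in records]
    beta = ols(rows, y)
    residuals = {}
    for s, row, actual in zip(records, rows, y):
        predicted = sum(b * x for b, x in zip(beta, row))
        residuals.setdefault(s["teacher"], []).append(actual - predicted)
    return {teacher: mean(vals) for teacher, vals in residuals.items()}

# Raw growth for every student, then the adjusted estimate per teacher.
print(gain_scores([s["pre"] for s in students], [s["post"] for s in students]))
print(value_added(students))
```

The gain scores show how much each student’s score changed, while the adjusted figure asks whether a teacher’s students scored above or below what their pre-test scores and poverty status would predict. A positive value suggests students did better than predicted, and a negative value suggests the opposite.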


A number of reform-minded districts charted an early path by implementing comprehensive changes to their evaluation systems. For example, to address concerns about the fairness of using student test scores to evaluate teachers, Hillsborough County Public Schools, in Tampa, Florida, decided early on to focus on the growth in test scores between two points in time rather than a static achievement measure captured only once a year. That way, teachers of special education or struggling students would not be at a disadvantage compared to teachers of more gifted or high-performing students. The district adopted pre- and post-tests in each grade and subject, comprising over 600 assessments. Meanwhile, TAP™: The System for Teacher and Student Advancement, adopted by districts across the country, created a system of master teachers and mentor teachers to help alleviate some of the time burden on principals. These teachers receive full- or part-time release hours to conduct teacher evaluations; provide extensive feedback and instructional demonstrations; identify context-relevant, research-based instructional strategies; analyze student data; create school-wide academic achievement plans; and interact with parents. Many more examples of new state and district policies on teacher and principal evaluation are available at www.tqsource.org, all of which offer innovative ideas and lessons learned for the benefit of other education leaders around the country.


Nevertheless, creating more robust teacher and principal evaluation systems will not, in isolation, lead to significant improvements in educator quality. For instance, what if some teachers are not willing or not able to improve enough to fully meet students’ needs, or if there is not a ready supply of excellent teachers and principals to replace those who are consistently not meeting expectations? To ensure that all students receive a great education, education reformers must see these new and improved evaluation systems as the beginning and not the end of a larger, systemic set of initiatives to attract and retain educators. Teacher preparation, compensation, induction and support, strategic recruitment, and the professional environment in schools must all be enhanced. For example, assessing teacher effectiveness should occur through annual evaluations, but also at the time of hiring and as part of the responsibility of the preparation programs that matriculated the new teachers in the first place.


Another critical aspect of redesigning evaluation systems is how to meaningfully involve teachers in the process. Engaging teachers, as well as principals, is essential in order to create evaluations that are well designed, implemented with fidelity, and sustainable for the long term. Unfortunately, genuinely engaging teachers in the evaluation redesign process is perhaps the most neglected aspect of the reform process to date. However, resources such as Everyone at the Table: Engaging Teachers in Evaluation Reform (www.EveryoneAtTheTable.org) have been developed to assist school systems with teacher engagement (see box).


Everyone at the Table: Engaging Teachers in Evaluation Reform

Everyone at the Table: Engaging Teachers in Evaluation Reform is an initiative of American Institutes for Research and Public Agenda, with funding from the Bill & Melinda Gates Foundation.

This free online resource center provides an easy-to-use model for widespread teacher-led conversations on evaluation reform that are constructive and solutions-oriented, using structured conversation tools and activities, with the end goal of increasing teacher input into the policies that are developed. It includes:

  • A two-minute video that captures the enthusiasm of education leaders around the country for broader, more genuine involvement of teachers in evaluation reform, and why they consider it important (www.everyoneatthetable.org/leadersVideo.php).
  • An eight-minute teacher discussion-starter video (www.everyoneatthetable.org/gtt_video.php) that gives teachers the chance to think and talk about the pros and cons of different kinds of evaluation systems.
  • Materials such as moderator’s guides, PowerPoint presentations, and discussion summary templates to help leaders organize discussions with teachers and bring their voices to the table.

Everyone at the Table has been used with success in Los Angeles, Detroit, Washington state, and elsewhere. To read their stories and learn more about this innovative approach to teacher engagement around evaluation, visit www.everyoneatthetable.org.


Closing persistent achievement gaps as well as raising achievement for all students will simply not be possible without recruiting and retaining sufficient teachers of the highest quality for every classroom. An effective accountability system must be anchored in a teacher evaluation system that is informed by research and best practice and includes teacher voice in the design and implementation. Of course, transforming teacher accountability systems as one part of a comprehensive approach to educator talent management and development requires thoughtful planning, prioritizing, and resource allocation. Based on financial data collected through the Bill & Melinda Gates Foundation’s initiative to build comprehensive educator evaluation systems, Harvard professor Tom Kane estimates that a high quality teacher evaluation system, done well, is likely to consume two percent of a school district’s budget. Given the potential for new evaluation systems to produce data that can truly inform continuous improvements in teacher practice and feed into an aligned system of educator talent management strategies that attract and retain greater numbers of excellent teachers, the investment may well be worth the cost.



Sabrina W. M. Laine, Ph.D., is Vice President, Education, Human Development, and the Workforce at the American Institutes for Research (AIR). She oversees numerous efforts to contribute to policy research and resource development related to every aspect of managing and supporting educator talent, including recruitment, compensation, evaluation, distribution, and professional development. Dr. Laine served as the Director of the National Comprehensive Center for Teacher Quality and is the primary author of the book Improving Teacher Quality: A Guide for Education Leaders, published by Jossey-Bass in 2011. Dr. Laine earned her doctorate in educational leadership and policy studies from Indiana University.

Ellen Behrstock-Sherratt, Ph.D., is a researcher at AIR where she leads the organization’s compensation reform and educator talent management initiatives. Dr. Sherratt has presented on teacher incentives, Generation Y teachers, human capital management, and equitable teacher distribution and is co-author of the book Improving Teacher Quality: A Guide for Education Leaders. Dr. Sherratt earned her doctoral degree in education from the University of Oxford.