Heraclitus, a Greek philosopher (ca. 535-475 BCE), believed change was central to what transpires in the universe. An expression often attributed to Heraclitus, although there is no record of his having written or uttered it, is that “everything is in a state of flux.” More than 2,500 years ago, this Greek thinker clearly grasped the constancy of change.

The ways we determine the quality of our schools, and particularly the approaches employed by those who accredit schools, have certainly been in a state of flux during the past century. When I began my teaching career back in the early 1950s, schools were accredited very differently from the way they are today. In this brief analysis, I’ll indicate what the chief change has been in the way we evaluate our schools. Thereafter, I will identify another change that must be made in our school-accreditation strategies if we want them to be defensible.

A Preoccupation with Input Variables

My first serious brush with the accreditation of schools did not take place during my years as a beginning teacher in Oregon. Actually, my own high school didn’t go through an accreditation process while I was there, so I never seriously thought about what was involved in the appraisal of schools. However, once I enrolled in a doctoral program at Indiana University, I experienced an instant collision with school accreditation. A professor in one of my very first courses assigned a term project in which his students were to evaluate a real or fictitious school using the appraisal model then employed by the North Central Association (NCA). Our professor supplied us with the materials used in actual NCA accreditations, and my fellow students and I were to develop a school-accreditation report based on the evaluative criteria then employed by NCA.

I had never done anything of this sort before. Indeed, I hadn’t even known there was a formal accreditation process used to evaluate schools. It was all new and, I admit, somewhat intimidating. But as I became immersed in the NCA accreditation materials, I quickly discerned that NCA’s accreditations were based almost completely on input variables. Those input variables included such factors as the number of books in a school’s library, the number of instructional hours in a school’s academic year, and the number of degrees and/or staff-development courses completed by a school’s teachers. There was no attention paid—none—to what a school’s students had learned, and this struck me as strange. But who was I, a rookie graduate student, to quarrel with the strategy of what I soon learned was a widely respected school-accreditation association? NCA was, after all, a “North Central Association,” and Indiana was, geographically, a north-central state. I assumed NCA’s preoccupation with input variables was a legitimate way to tackle the evaluation of schools. As I said, I was a rookie.

The appraisal of schools during those years seemed to be predicated on a reasonable notion: if a school provided its students with the right kinds of inputs, appropriate outputs were likely to emerge. This sort of means-ends reasoning, of course, is far from absurd. Good inputs, in most realms, typically trigger good outputs. But because good inputs don’t always yield good outputs, changes began to take place in the way we appraised our schools.

A Shift to Output Variables

I suspect it was because of my early brush with the school-accreditation process that I have been attentive through the years to the ways accreditation associations have tried to separate school-wheat from school-chaff. What I’ve witnessed in the past half-century is a steady and decisive shift away from reliance on input variables and a clear move toward the evaluative use of output variables. Gingerly at first, and then with greater conviction, accreditation associations have begun asking schools to show what happens to students as a consequence of a school’s instruction. In short, schools have been asked to come up with evidence of students’ learning to show a school is doing a good instructional job.

Indeed, given the current preoccupation with output variables, it is accurate to characterize today’s school-accreditation process as definitely tilted in the direction of outputs and, more specifically, focused on outputs in the form of students’ test performances. I remember first encountering this shift toward output variables when schools began being accredited on the basis of a “school-improvement” evaluation model. In this approach to accreditation, a school’s staff first identified the kinds of improvements it hoped to bring about in students’ performances, then collected evidence regarding whether those intended outcomes had, in fact, been realized. This sort of school-improvement accreditation strategy clearly revolved around outputs, not inputs.

As I consider the most prominent approaches to the accreditation of schools these days, it seems apparent that the evaluative process now emphasizes what happens to students because of their instruction, and that the most common way to ascertain these effects is through the use of student assessment. This half-century accreditation shift—from looking at inputs to looking at outputs—is a whopping change or, in Heraclitean terms, a nontrivial flux phenomenon.

A Flaw in the Flux

The shift toward appraising schools on the basis of what their students learn instead of how those students have been taught reeks of good sense. Why gamble on whether good inputs yield appropriate outputs? Why not go directly to the outputs themselves? Nonetheless, a serious flaw lurks today in what otherwise represents a reasonable change in the way we evaluate our schools.

In most settings, the dominant form of output evidence is the performance of students on governmentally imposed accountability tests. Other kinds of assessment data may be at hand, but by far the most important student-achievement evidence used is students’ scores on external accountability tests. In the U.S., those are the federally approved accountability tests used to satisfy the requirements of the No Child Left Behind Act (NCLB). High scores on a state’s annual NCLB tests signify successful schooling; low scores on those tests indicate the opposite. But are these accountability tests providing accurate evaluative evidence?

Regrettably, almost all of today’s accountability tests are unable to accurately distinguish between effective and ineffective instruction. That’s right: when students score well on these tests, those scores are apt to be more influenced by the composition of a school’s student body than by the quality of instruction those students have received. The vast majority of our current accountability tests, then, are instructionally insensitive. That is, they are unable to accurately detect the effectiveness with which students have been taught.

Roughly half of today’s accountability tests are traditional standardized achievement tests whose measurement mission is to compare test-takers’ performances. These exams often contain too many items closely linked to students’ socioeconomic status (SES) or to those students’ inherited academic aptitudes. (Such items optimize the “score-spread” so necessary for comparative score-interpretation.) Schools serving more affluent students, therefore, will tend to look good on such tests irrespective of the effectiveness with which students were taught. Schools serving less affluent students will tend to perform poorly on such tests, no matter how stellar the school’s instruction might be. The remaining accountability tests are “standards-based” exams that attempt to measure students’ mastery of far too many state-approved curricular aims and, therefore, can assess only a sample of these too-numerous curricular targets. Teachers must consequently guess about what will be tested each year, and many who guess wrong soon give up on the usefulness of such tests; once more, the chief determinant of students’ performances is what students bring to school, not how well they are taught once they arrive.

Because instructionally insensitive accountability tests provide misleading data for the accreditation of schools, even a well-conceived accreditation strategy focused on outputs is certain to stumble whenever the wrong tests are used.

What To Do?

Faced with the prospect of being evaluated with the wrong kinds of accountability tests, a school’s faculty can do three things. First, it can attempt to replace instructionally insensitive accountability tests with instructionally sensitive ones; in Wyoming, this has been done for the state’s NCLB tests. Second, it can mount a serious assessment-literacy program so that educators, parents, other citizens, and students discover why some accountability tests are instructionally sensitive and some aren’t. And, finally, it can attempt to supplement the data supplied by the wrong kinds of accountability tests with evidence from more appropriate tests generated at the state, district, or school level. In other words, a local school’s staff should employ multiple measures to indicate students’ achievement. Moreover, by using parents or members of the business community to score students’ responses to many of these tests, schools can provide a more credible picture of students’ actual achievements.

Heraclitus had it right. Change takes place constantly. With respect to the appraisal of schools, it is definitely time to change some of the previously made changes.

William James Popham was a leading educator in the movement that promoted criterion-referenced measurements and was active and productive in the area of educational test development. His philosophy and beliefs about educational evaluation and assessment can be gleaned from his numerous writings. A partial listing of his textbooks and articles follows: Classroom Assessment: What Teachers Need to Know (1995); Educational Evaluation (3rd edition, 1993); Understanding Statistics in Education, with Kenneth A. Sirotnik (1992); and Criterion-Referenced Measurement (1978). See also the following articles by Popham: “Circumventing the High Costs of Authentic Assessment,” Phi Delta Kappan (February 1993); “Educational Testing in America: What’s Right, What’s Wrong? A Criterion-Referenced Perspective,” Educational Measurement (January 1993); “Two-plus Decades of Educational Objectives,” International Journal of Educational Research (January 1987); “Well-Crafted Criterion-Referenced Tests,” Educational Leadership (February 1978); and, with T.R. Husek, “Implications of Criterion-Referenced Measurement,” Journal of Educational Measurement (January 1969).