High-Stakes Testing: Opportunities and Risks

[1]

Jay P. Heubert, J.D., Ed.D.
Teachers College, Columbia University
Columbia Law School

This paper was written, in part, with support from the National Center on Accessing the General Curriculum, U.S. Department of Education, Office of Special Education Programs (OSEP). Opinions expressed herein are those of the author.

It will be published as a chapter in Pines, M., ed., The Continuing Challenge: Moving the Youth Agenda Forward (Policy Issues Monograph 00-02, Sar Levitan Center for Social Policy Studies). Baltimore, MD: Johns Hopkins University Press. It is posted with permission of the publisher.

A shorter version was published as "Graduation and promotion testing: Potential benefits and risks for minority students, English-language learners, and students with disabilities." (2000, September/October). Poverty and Race 9 (5): 1-2, 5-7. Washington, DC: Poverty and Race Research Action Council.

Introduction

The stated objective of the "standards" movement in American public education is to hold all schools, teachers and students to high standards of teaching and learning [2]. The movement reflects awareness that student proficiencies in literacy and mathematics largely determine success in school and employment (Murnane and Levy, 1996; Sum, 1999).

Accountability can take many forms. It is now common, for example, for schools and school districts to receive favorable or adverse publicity based on student test scores. In some states, school districts or schools are subject to specific rewards or sanctions based on student performance. This paper focuses not on school or teacher accountability but only on tests that have high stakes for individual students. They are "high-stakes" tests because they are used in making decisions about which students will be promoted or retained in grade and which will receive high-school diplomas.

Section 1 below briefly describes the growth and current scope of promotion and graduation testing in the United States. Section 2 explores current controversies regarding the likely effects of promotion and graduation tests on minority students (especially blacks, Latinos, and Native-Americans), low-SES students, English-language learners, and students with disabilities. While many agree that high-stakes testing will affect such students in significant ways, there are disputes over whether the effects will be beneficial or harmful. Section 3 describes some important and broadly accepted norms of appropriate test use, which, if observed, would reduce the negative effects of high-stakes testing. Section 4 describes some elements of a sound testing program.

1. The extent of graduation and promotion testing in the U.S.

Graduation testing has gone through several stages of development in the U.S., and varies considerably from state to state. In the 1970s and 1980s, a number of states adopted requirements under which students had to pass "minimum competency tests" as a condition of getting high-school diplomas, even if the students had satisfied all other requirements for graduation. In the late 1980s and 1990s - responding in part to A Nation At Risk, a report that warned of "a rising tide of mediocrity" in American public education, and to the rise of today's "standards" movement, which emphasizes high standards for all students - some states replaced minimum competency tests with graduation exams measuring knowledge and skills at the tenth-grade level or higher. At present, about 23 states require students to pass graduation tests (American Federation of Teachers (AFT), 1999), up from eighteen in 1998 (National Research Council (NRC), 1999). The number is expected to increase to 29 by 2003 (Shore et al., 2000). Of the 23, fourteen now set graduation-test standards at the tenth-grade level or higher (AFT, 1999).

In response to concerns about "social promotion," a rapidly growing number of states - thirteen, about twice as many as a year ago - now require students to pass standardized tests as a condition of grade-to-grade promotion (AFT, 1999). In addition, many school districts, particularly in urban areas, have also adopted promotion-test policies. This means that large numbers of the nation's minority students and English-language learners are now subject to state or local promotion-test programs.

Further, under current federal law, students with disabilities and English-language learners - whom many states and school districts have traditionally exempted from large-scale assessments - must now be included in state and local testing programs, with accommodation and alternative assessment where necessary. To serve this objective, states and school districts must not only assess such students but also publish disaggregated data on their performance (Individuals with Disabilities Education Act, 1997; Improving America's Schools Act, 1994). Significantly, federal law takes no position on whether states and districts should use test results to determine whether individual students will receive high-school diplomas or be promoted to the next grade.

2. Effects of high-stakes testing

Many researchers and practitioners believe that standards-based reform and high-stakes testing will have the greatest impact on blacks, Latinos, English-language learners, students with disabilities, and low-SES students. There are serious disputes, however, over whether promotion and graduation testing will help such students or hurt them.

Proponents of standards-based reform and high-stakes testing point out that these students are among those who are most often educated poorly, and who therefore have the most to gain from a movement whose central objective is to hold all schools, teachers and students to high standards of teaching and learning. Meanwhile, critics of high-stakes testing fear that many such children will be harmed by high-stakes tests: that they will disproportionately be retained in grade or denied high-school diplomas - both of which have highly negative consequences for students - because their schools do not expose them to the knowledge and skills that students need to pass the tests.

Both arguments are plausible and, as discussed below, both find support in the literature. The story is complex, however, and the evidence incomplete.

Even on graduation tests that measure basic skills, for example, minority students and students with disabilities usually fail at higher rates than other students, especially in the years after such tests are first introduced. For example, in the 1970s, when minimum competency tests gained popularity, 20 percent of black students, compared with 2 percent of white students - a discrepancy of ten to one - initially failed Florida's graduation tests and were denied high-school diplomas (Debra P. v. Turlington, 1979). And while many students with disabilities were excluded from state graduation-test programs (NRC, 1999), those who did participate failed at rates over 50 percent (McLaughlin, 2000).

For a variety of reasons, failure rates typically decline among all groups in the years after a new graduation test is introduced (Linn, 2000). This was true of the early minimum competency tests; after a few years, for example, black failure rates were far lower than 20 percent. It also appears to be true for graduation tests adopted more recently. Texas, for example, which has a graduation test set at the seventh- or eighth-grade level (Schrag, 2000), reports that pass rates of blacks and Latinos roughly doubled between 1994 and 1998, and that the gaps in failure rates among whites, blacks, and Latinos narrowed considerably during that time (Viadero, 2000). Even so, 1998 data from the Texas graduation tests show continuing disparities: cumulative failure rates of 17.6 percent for black students, 17.4 percent for Hispanic students, and 6.7 percent for white students (Natriello and Pallas, 1999).

Data for students with disabilities are harder to find, but they show a similar pattern. On one hand, there is evidence that many students with disabilities do pass state tests in higher numbers over time (Ysseldyke et al., 1998); New York reports, for example, that the number of students with disabilities who passed the state's English Regents exam in 1998-99 was nearly twice as high as the number who took the exam two years earlier (Keller, 2000). On the other hand, 1998 data from fourteen states show gaps that remain quite high: Students with disabilities consistently fail state graduation tests at rates 35 to 40 percentage points higher than those for nondisabled students (Ysseldyke et al., 1998).

An important, largely unanswered question concerns the extent to which improved pass rates on graduation tests actually reflect improved teaching and learning on the part of teachers and students. Such improvements are plainly one explanation, and the most desirable one. During the 1980s, however, when many states reported sharply improved pass rates on graduation tests, scores on the National Assessment of Educational Progress (NAEP) - a highly regarded nationally administered examination - showed little or no improvement in student learning. Indeed, evidence that minimum competency tests were not producing improved student performance on the NAEP is one reason why the current standards movement emphasizes higher standards, and why some states have been raising graduation-test standards. More recent fourth- and eighth-grade NAEP scores suggest improvements in student mathematics performance - especially for black students, Latino students and low-SES students - during the period 1990-96, particularly in some states (including Texas and North Carolina) that invested heavily in smaller class sizes, preschool programs, and better resources for teachers (Grissmer et al., 2000). Gains reported on state tests continued to exceed the improvements measured by NAEP, however, and it is unclear to what extent improved fourth- and eighth-grade NAEP scores are due to high-stakes graduation testing rather than to the specific educational interventions just mentioned.

What factors other than improved achievement may explain increased pass rates on state tests? First, it is well known that scores on a test can increase as students become familiar with that test's format, "with or without real improvement in the broader achievement constructs that tests and assessments are intended to measure" (Linn, 2000: 4). Studies show that improvements on a state's tests may not be confirmed when students take other tests that supposedly measure the same knowledge and skills (Koretz et al., 1991; Koretz and Barron, 1998). In such circumstances, increases on state tests could be due in part to "teaching to the test," i.e., a focus on the subject matter and formats that appear on the test, and to students' growing familiarity with that test's format (Mehrens, 1998).

Second, some states may reduce high failure rates, actual or projected, by making the state graduation tests easier or by setting lower cutoff scores that students must achieve to pass. In New York, for example, failure rates on a state test dropped substantially after the state created a temporary "low pass" category for students who were below the state's original passing score. Similarly, increased pass rates in Texas may be due in part to changes in the test that made it easier for students to pass (Schrag, 2000).

Third, if low-achieving students are not part of the test-taking population, then the pass rates of those who remain will be higher - even if the achievement of those who actually take the test has not improved. Thus, reported pass rates should be viewed in the context of such factors as (a) dropout rates; (b) whether states count among dropouts, or include in computing graduation rates, students who choose (or are even encouraged) to leave school to pursue general equivalency diplomas [3]; (c) exemptions of students with disabilities or English-language learners from the test-taking population, which are far higher in some states than in others (Ysseldyke et al., 1998) [4]; and (d) improper testing accommodations that may artificially inflate some students' scores (Sack, 2000; Allington, 2000).
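The arithmetic behind this third factor can be made concrete with a minimal sketch. All of the figures below are hypothetical and chosen only for illustration; they are not drawn from any state's data. The sketch assumes that every student who leaves the test-taking population would have failed, which is the extreme case.

```python
def pass_rate(num_passing: int, num_tested: int) -> float:
    """Share of tested students who pass."""
    if num_tested <= 0:
        raise ValueError("num_tested must be positive")
    return num_passing / num_tested


# Hypothetical cohort: every number here is invented for illustration.
COHORT_SIZE = 1000
WOULD_PASS = 700        # students who would pass if everyone were tested
LEFT_BEFORE_TEST = 150  # dropouts or exempted students, all assumed to be non-passers

# Pass rate when the whole cohort is tested.
rate_everyone_tested = pass_rate(WOULD_PASS, COHORT_SIZE)

# Pass rate when 150 likely non-passers are no longer among the test takers.
# The number of passing students is unchanged; only the denominator shrinks.
rate_after_attrition = pass_rate(WOULD_PASS, COHORT_SIZE - LEFT_BEFORE_TEST)

print(f"All students tested:            {rate_everyone_tested:.1%}")
print(f"After likely non-passers leave: {rate_after_attrition:.1%}")
```

In this hypothetical case the reported pass rate rises from 70 percent to about 82 percent even though no student has learned anything more.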

Not surprisingly, there is also a spirited debate about whether graduation testing causes increased dropout rates. On one hand, it appears that many low-achievers start to disengage from school well before graduation tests loom. On the other hand, there are reputable scholars who argue - credibly - that fear of failing a graduation test increases the likelihood that low achievers will leave school (Clarke et al., 2000) [5]. Also, the current climate of accountability places new pressures on schools to increase student pass rates, which in turn can lead to increased and/or understated dropout rates (Schrag, 2000). Unfortunately, this critical issue is complicated by a lack of uniformity among the states in defining and counting dropouts (Viadero, 2000).

Even as these debates continue, other developments are fundamentally changing the landscape. One such development, already noted, is that some states are raising the bar: setting higher standards on state graduation exams. The most ambitious states are adopting graduation tests that reflect "world-class" standards such as those embodied in NAEP.

Based on national NAEP data, about 38 percent of all students would fail tests that reflect such "world-class" standards if they were administered today (Linn, 2000).

For minority students and English-language learners, moreover, there is clear evidence that failure rates on tests embodying "world-class" standards would be extremely high - about 80 percent [6] - at least at first. These predictions are consistent with recent data from Massachusetts, where students have begun taking graduation tests that reflect "world-class" standards. [7] For students with disabilities, it is also reasonable to assume that initial failure rates on such tests would be very high: in the 75 to 80 percent range. [8]

A second development is the proliferation of large-scale promotion testing, which is especially pronounced in large, urban school districts (AFT, 1999) and has led to sharply higher rates of retention in grade, especially for black students, Latino students, and English-language learners. In New York City, Chicago, and other cities, hundreds of thousands of students, the vast majority black, Latino, and/or English-language learners, have failed promotion tests and been retained in grade, and it is reasonable to expect that students with disabilities would also be retained in large numbers.

The single strongest predictor of whether students will drop out of school is whether they have been retained in grade. The rapid growth of promotion testing, particularly in our large cities, is therefore likely to create an increasingly large class of students - composed disproportionately of blacks, Latinos, English-language learners, students with disabilities, and low-SES students - who are at increased risk of dropout by virtue of having been retained in grade one or more times. Those retained in grade even once are much likelier to drop out later than are students not retained, and the effects are even greater for students retained more than once (NRC, 1999; Hauser, 1999; Shepard and Smith, 1989).[9] Moreover, much of the increase in dropout rates shows up only years later, and the harm is thus largely invisible at the time retention occurs. In this sense, retention in grade is somewhat like high blood pressure.

Promotion testing is thus likely to increase, perhaps significantly, the numbers of students who suffer the serious consequences of dropping out. [10] It is also likely to reduce the numbers of students who remain in school long enough to take graduation tests. It would be unfortunate - and hardly evidence of success - if states, school districts, or schools achieved high pass rates on graduation tests because large numbers of low achievers had already left school and were no longer among the test takers. Given the relationships between promotion testing, retention in grade, and increased dropout rates, promotion-testing policies warrant closer attention than they have received thus far.

Promotion and graduation testing may also have unintended consequences for teachers. As noted above, high-stakes testing is intended to raise teacher motivation and effectiveness, and there is evidence that with appropriate professional development, support, resources, and time, teaching effectiveness can improve significantly (Elmore, 2000). There is already evidence, however, that the negative publicity associated with poor test scores can lead experienced teachers to leave urban schools for the suburbs (see, e.g., Lee, 1998). Plainly, efforts to improve low-performing urban schools - and to educate all children effectively - will be undermined if those schools lose strong teachers.

As noted above, policies that lead to improved teaching and learning are likely to benefit minority students, English-language learners, and students with disabilities even more than they do other students. In New York, Education Commissioner Richard Mills defends stringent graduation-test requirements partly because he hopes they will bring an end to low-track classes, in which students - most of them black students, Latino students and/or English-language learners - typically receive poor quality, low-level instruction. This position is grounded in solid evidence that placement in typical low-track classes is educationally harmful for students (NRC, 1999; Oakes, Gamoran and Page, 1992), and that students will learn more if they are placed in more demanding classes (NRC, 1999; Weckstein, 1999).

Advocates for minority children and low-SES children hope that high standards will provide the political and legal leverage needed to improve resources and school effectiveness so that all children receive the high-quality instruction they need to be able to meet demanding academic standards. Disability-rights groups likewise hope that state standards and tests will drive teachers to upgrade the individualized education programs (IEPs) of students with disabilities, so that IEPs reflect more of the knowledge and skills that nondisabled students are expected to acquire - and here, too, there is evidence that higher expectations and improved instruction lead to improved achievement (Individuals With Disabilities Education Act, 1997; Ysseldyke et al., 1998). Moreover, some proponents of high-stakes testing argue that the fear of negative consequences - retention or diploma denial for students, negative publicity and (in rare instances) adverse personnel action for educators - can be a positive force, one that increases the motivation of teachers to teach and students to learn.

3. Standards of appropriate test use: widely accepted, sometimes ignored

Whether graduation testing helps or hurts low achievers depends largely on whether such tests are used to promote high-quality education for all children - the stated objective of standards-based reform - or to penalize students for not having the knowledge and skills that they have not been taught in school.

This is the principal theme that Education Secretary Richard Riley, a strong proponent of standards-based reform, emphasized in his February 22, 2000 "State of American Education" address. Riley called for a "midcourse review" of the standards movement, a step he said was needed "because there is a gap between what we know we should be doing and what we are doing" (Riley, 2000: 6).

Specifically, Secretary Riley said that state standards should be "challenging but realistic….[Y]ou have to help students and teachers prepare for these [high-stakes] tests - they need the preparation time and resources to succeed, and the test must be on matters that they have been taught" (Riley, 2000: 7). He also advised states not to rely on any single measure of students' knowledge in making high-stakes decisions: "All states should incorporate multiple ways of measuring learning" (Riley, 2000: 6).

Not coincidentally, perhaps, these concerns are also reflected in norms of appropriate test use that the testing profession, the National Research Council, and the American Educational Research Association (AERA) have articulated, and only quite recently. The Standards for Educational and Psychological Testing, issued in December 1999 by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education (and referred to here as the Joint Standards), assert that promotion and graduation tests should cover only the "content and skills that students have had an opportunity to learn" (AERA, APA, and NCME, 1999: 146, Standard 13.5). The Congressionally mandated NRC study, High Stakes: Testing for Tracking, Promotion, and Graduation, reached a similar conclusion in 1999: "Tests should be used for high-stakes decisions…only after schools have implemented changes in teaching and curriculum that ensure that students have been taught the knowledge and skills on which they will be tested" (NRC, 1999). So does the AERA, which, in a July 2000 Policy Statement Concerning High Stakes Testing, recommends the following "condition[] essential to sound implementation of high-stakes educational testing programs": "When content standards and associated tests are introduced as a reform to…improve current practice, opportunities to access appropriate materials and retraining consistent with the intended changes should be provided before…students are sanctioned for failing to meet the new standards" (AERA, 2000: 2).

Unfortunately, there are often discrepancies between what high-stakes tests measure and what students have been taught. Results of a recent ten-state study led by Andrew Porter suggest that there is surprisingly little overlap between a state's standards and what teachers in the state say they are actually teaching students. The actual overlap ranged from a low of 5 percent to a high of 46 percent, depending on the subject, grade level, and state (Boser, 2000). If these states use promotion or graduation testing, or are representative of practice elsewhere in the U.S., then some states and school districts appear to be using promotion and graduation tests in a manner inconsistent with widely accepted norms of appropriate test use. Moreover, such discrepancies are likely to be particularly high where minority students, English-language learners, and students with disabilities are concerned,[11] and where students are expected to master "world-class" standards.[12]

Similarly, as noted above, increasing numbers of states and school districts automatically deny promotion or high-school diplomas to students who fail state or local tests, regardless of how well the students have performed on other measures of achievement, such as course grades. Secretary Riley is not alone in believing that states and school districts should weigh information other than test scores in making high-stakes decisions about promotion and graduation. The NRC study (1999: 279) emphasizes that educators should always buttress test score information with "other relevant information about the student's knowledge and skills, such as grades, teacher recommendations, and extenuating circumstances" when making high-stakes decisions about individual students. This is also consistent with the testing profession's Joint Standards, which state that "in elementary or secondary education, a decision or characterization that will have a major impact on a test taker should not automatically be made on the basis of a single test score. Other relevant information… should be taken into account if it will enhance the overall validity of the decision" (AERA, APA, and NCME, 1999: 146, Standard 13.7). Similarly, the AERA Policy Statement (AERA, 2000: 2) provides that "[d]ecisions that affect individual students' life chances or educational opportunities should not be made on the basis of test scores alone. Other relevant information should be taken into account to enhance the overall validity of such decisions."

Why is it so important to use multiple measures in making important decisions about individuals? The answer is that any single measure is inevitably imprecise and limited in the information it provides. Proponents of high-stakes testing sometimes point out the problems associated with exclusive reliance on student grades in making promotion and graduation decisions: there has been considerable grade inflation during the last three decades, for example, and there is considerable variation between teachers, schools, and school districts in what particular grades mean. They are right. But that does not mean that grades should be ignored altogether.

For standardized tests, like grades, are limited in what they measure. It is well known, for example, that grades are a far better measure than standardized tests of student motivation over time, a factor critical to later success in school and in the workplace. Moreover, as these examples illustrate, even the best standardized tests are far less precise than most people realize:

  • First, what are the chances that two students with identical "real achievement" will score more than 10 percentile points apart on the same Stanford 9 test? For two ninth graders who are really at the 45th percentile in math, the answer is 57 percent of the time. In 4th grade reading, the probability is 42 percent.
  • Second, how often will a student who really belongs at the 50th percentile according to national test norms actually score within 5 percentile points of that ranking on a test? The answer is only about 30 percent of the time in mathematics and 42 percent in reading (Viadero, 1999: 3, citing Rogosa, 1999).
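
Probabilities of this kind can be approximated with a short simulation. The sketch below is not Rogosa's method and does not use Stanford 9 data; it assumes a simple classical model in which each observed score is a true score plus independent normal error, with scores expressed in standard-deviation units and a hypothetical test reliability supplied by the caller. The function name and its parameters are illustrative only.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def prob_scores_differ(true_percentile: float, reliability: float,
                       gap_points: float = 10.0, trials: int = 20000,
                       seed: int = 0) -> float:
    """Estimate how often two students with the same true standing
    score more than `gap_points` percentile points apart.

    Assumptions (not from the source): observed z-score = true z-score
    + normal error with standard deviation sqrt(1 - reliability).
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must be strictly between 0 and 1")
    rng = random.Random(seed)                       # seeded for repeatability
    true_z = STD_NORMAL.inv_cdf(true_percentile / 100)
    error_sd = (1.0 - reliability) ** 0.5
    hits = 0
    for _ in range(trials):
        # Each student's observed percentile rank on one test administration.
        rank_a = 100 * STD_NORMAL.cdf(true_z + rng.gauss(0.0, error_sd))
        rank_b = 100 * STD_NORMAL.cdf(true_z + rng.gauss(0.0, error_sd))
        if abs(rank_a - rank_b) > gap_points:
            hits += 1
    return hits / trials


# Hypothetical reliabilities; real values depend on the test and grade.
more_reliable = prob_scores_differ(true_percentile=45, reliability=0.90)
less_reliable = prob_scores_differ(true_percentile=45, reliability=0.70)
print(f"reliability 0.90: {more_reliable:.2f}")
print(f"reliability 0.70: {less_reliable:.2f}")
```

Under these assumptions, the less reliable the test, the more often two equally capable students land more than 10 percentile points apart.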


Given the imprecision of grades and test scores, judgments based on combinations of both are more accurate and reliable than those based on either by itself. To use either one by itself when both are readily available is like telling one's physician to conduct a physical exam relying only on a thermometer or only on a single blood test. Unfortunately, as Secretary Riley noted, "there is a gap between what we know we should be doing and what we are doing." This is the case in the many states and school districts that make promotion or graduation decisions relying solely on student test scores. Such practices, though widespread, do not seem consistent with norms of appropriate test use.
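
One way to see why a combination can be more precise than either measure alone is the standard result for averaging two independent, unbiased measurements with weights inversely proportional to their error variances. The sketch below is illustrative only: the error variances are hypothetical, and real grades and test scores need not have independent or unbiased errors.

```python
def combined_error_variance(var_a: float, var_b: float) -> float:
    """Error variance of the inverse-variance-weighted average of two
    independent, unbiased measures with error variances var_a and var_b."""
    if var_a <= 0 or var_b <= 0:
        raise ValueError("error variances must be positive")
    return 1.0 / (1.0 / var_a + 1.0 / var_b)


def inverse_variance_weights(var_a: float, var_b: float) -> tuple[float, float]:
    """Weights that give the more precise measure the larger say."""
    if var_a <= 0 or var_b <= 0:
        raise ValueError("error variances must be positive")
    precision_a, precision_b = 1.0 / var_a, 1.0 / var_b
    total = precision_a + precision_b
    return precision_a / total, precision_b / total


# Hypothetical error variances, in squared standard-deviation units.
TEST_ERROR_VAR = 0.25
GRADE_ERROR_VAR = 0.40

combined_var = combined_error_variance(TEST_ERROR_VAR, GRADE_ERROR_VAR)
test_weight, grade_weight = inverse_variance_weights(TEST_ERROR_VAR, GRADE_ERROR_VAR)

print(f"test alone:  {TEST_ERROR_VAR:.3f}")
print(f"grade alone: {GRADE_ERROR_VAR:.3f}")
print(f"combined:    {combined_var:.3f}")
```

Under these assumptions the combined error variance is always smaller than that of the better single measure, because the combined precision is the sum of the two individual precisions.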

To complicate matters, there is at present no satisfactory mechanism for ensuring that states and school districts respect even widely accepted norms of appropriate, nondiscriminatory test use. The two existing mechanisms - professional discipline through the associations that produce the Joint Standards, or legal enforcement through the courts or administrative agencies - have complementary shortcomings. Professional associations such as the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education have detailed standards, but lack mechanisms for monitoring or enforcing compliance with those standards. For courts and federal civil-rights agencies, the reverse is true; they have complaint procedures and enforcement power, but lack specific, legally enforceable standards on the appropriate use of high-stakes tests. Recognizing the problem, the U.S. Department of Education's Office for Civil Rights has released a draft resource guide that, while not legally binding, aims to promote appropriate use of high-stakes tests.[13]

4. Elements of a sound testing policy

Given these concerns, what are some elements of a sound high-stakes testing policy within the larger context of standards-based reform? First, states should adopt standards for what students should know and be able to do. And while such standards continually evolve, this is something virtually all the states have done (AFT, 1999). Second, policymakers and educators should strive to bring the curriculum into alignment with the state's standards; according to Lauren Resnick, a national leader of the standards movement, many states are still experiencing problems at this stage.[14] A third step is to bring actual instruction into line with the state standards and curriculum. This objective is a challenging one, requiring substantial investments in staff development. Teachers - and administrators, who are increasingly called upon to serve as instructional leaders (Elmore, 2000) - need considerable training in how to enact the new curriculum, how to identify the aspects of the curriculum that create problems for students, and how best to address students' learning needs. Some schools will also have to upgrade facilities. Here, too, there is evidence of major gaps (Natriello, 1998), including Andrew Porter's recent findings (Boser, 2000) about the limited overlap between state standards and what teachers say they teach.

Note that the steps mentioned thus far do not mention high-stakes testing. There is no reason why states cannot use large-scale assessments to help drive changes in curriculum and instruction, and many states do. But the Joint Standards, the 1999 NRC study, and the July 2000 AERA Policy Statement all assume that the preceding measures will be in place before such instruments become high-stakes tests for students. As noted above, all three say that tests should be used to decide whether individual students will be promoted or given high-school diplomas only after students have been taught the kinds of knowledge and skill that the tests measure. This is not the situation in every state. Often, graduation testing and promotion testing precede the alignment of curriculum and instruction with state standards (Elmore, 2000), and in many cases the tests are not well aligned with state standards: "There is little evidence to suggest that exit exams in current use have been validated properly against the defined curriculum and actual instruction; rather, it appears that many states may not have taken adequate steps to validate their assessment instruments, and that proper studies would reveal important weaknesses" (NRC, 1999: 179, citing Stake, 1998).

The Joint Standards (1999), the NRC study (1999), and the AERA Policy Statement (2000) describe measures a state or school district should take if it elects to use tests for high-stakes purposes. One, just noted, is not to use tests for high-stakes purposes until schools are actually teaching students the relevant knowledge and skills. Second, test users should make sure that a high-stakes test is valid for its intended purpose. This may sound obvious, but it is not something every test user does. Chicago, for example, has gotten national publicity for its use of the Iowa Test of Basic Skills (ITBS) in making promotion decisions, but the district's chief accountability officer has candidly acknowledged that the ITBS is not valid as a measure of which students should be promoted or held back. Third, test developers should take students with disabilities, English-language learners, minority students and other groups into account beginning with initial test development, and should take steps to ensure that the test is equally valid for all major student populations that will take it (NRC, 1999; AERA, APA, NCME, 1999; AERA, 2000).

Fourth, test users should not rely solely on test-score information in making promotion and graduation decisions (NRC, 1999; AERA, APA, NCME, 1999; AERA, 2000). Instead, as colleges do, states and school districts should look at multiple measures of student achievement and readiness, and allow high achievement on one measure to balance lower performance on another.

Further, some states measure not only absolute achievement - in the form of a percentage of students passing a test - but also improvement over time (i.e., higher percentages of students passing a test). And some states measure whether school districts or schools are succeeding in closing the gap between high-achieving and low-achieving students. Each of these measures adds something important. An absolute standard signals that schools set high expectations for all students rather than lower expectations for some. A standard based on improvement recognizes that different students, schools, and school districts start out at different places, and rewards progress. A standard based on whether schools are closing the achievement gap - between white students and minority students, between nondisabled students and students with disabilities, between native English speakers and English-language learners - encourages schools to pay more attention to these very important goals.

Fifth, a test use is inappropriate unless it leads to the best available treatment or placement for students (NRC, 1999). This means that states and school districts should refrain from using test scores (or other information) to justify educational decisions that are demonstrably harmful to students. Based on the weight of research evidence, two placements or treatments that typically harm students are retention in grade and placement in typical low-track classes (NRC, 1999; Hauser, 1999; Oakes, Gamoran, and Page, 1992). Retention and low-track placements are inimical to the goal of helping all students reach high levels of achievement. Both are inconsistent with principles of appropriate test use.

Sixth, the debate over high-stakes testing often frames the issue in "either-or" terms: Either we promote a student who is not ready or we retain him in grade. Either we give someone a diploma or we deny a diploma. Neither alternative is attractive, of course, but there is almost always another, better, approach. Any information schools can use to make a promotion or graduation decision can be used years earlier - before the "gate" is reached - to determine which children are performing poorly and to help get them the support they will need to be able to meet high standards. Teachers typically know, long before a promotion or graduation test, which students will need help if they are to pass. Effective early intervention is critical, as recent research shows (Grissmer et al., 2000).

Seventh, tests by themselves do not improve learning, any more than a thermometer reduces fever. At best, good tests provide information. It is important that this information, along with information from other sources, be available - in an understandable form - to policymakers, educators, parents and students. And it is equally important for all concerned to know which policies and practices are likeliest to produce improved teaching and learning (Elmore, 2000; Grissmer et al., 2000). Educators and parents also need access to the resources that it takes to make the necessary changes in teaching and learning. Unfortunately, it is well known that many school districts and schools lack the resources they need to enable all children to reach high levels of achievement (National Academy of Education, 1995; NRC, 1999).

Last but not least, these questions all call for additional research: on what interventions work, on how treatments effective in some settings can be implemented widely, and, not least, on how high-stakes testing policies affect student learning and dropout rates, for students generally and for such important groups as students of color, English-language learners, and students with disabilities.[15]

In conclusion

The standards movement and high-stakes testing present both opportunities and risks to students of color, English-language learners, and students with disabilities. These students are among those who stand to benefit most if all students receive high-quality instruction. Such students are also at great risk, however, especially in states that administer high-stakes promotion and graduation tests before having made the improvements in instruction that will enable all students to meet the standards. Even failure rates far below 75 to 80 percent are plainly unacceptable, for these students and for our entire society.

Educating all students to high levels is something no society has achieved to date, and reaching that objective will obviously be no simple matter. Promotion and graduation tests are one part of this picture, and there are those who question the necessity or desirability of such testing even as it becomes more widespread.

One thing is clear, however: If states and school districts are going to use high-stakes testing, then it is critical that such testing be done well. The basic principles of appropriate test use are relatively clear and enjoy broad support among researchers (NRC, 1999; AERA, APA, NCME, 1999; AERA, 2000).

States and school districts that disregard these principles put their students at risk - and also themselves. The prospect of high failure rates has already produced a political backlash against some states' high-stakes testing programs, and lawsuits are also likely, if only because there exist no alternatives by which to ensure appropriate use of tests that affect students' life chances in such important ways.

The stakes are high.

References

Allington, R. (2000). "Letters: On Special Education Accommodations." Education Week 19 (35): 48.

American Educational Research Association (2000). AERA Position Statement Concerning High-Stakes Testing in PreK-12 Education. Available: http://www.aera.net/about/policy/stakes.htm

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (1999). Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.

American Federation of Teachers (1999). Making Standards Matter 1999. Washington, DC: American Federation of Teachers.

American Federation of Teachers (1998). Making Standards Matter 1998. Washington, DC: American Federation of Teachers.

Boser, U. (2000). "Teaching to the Test?" Education Week 19 (39): 1, 10.

Clarke, M., W. Haney, and G. Madaus (2000). "High Stakes Testing and High-School Completion." Boston: National Board on Educational Testing and Public Policy 1 (3), 1-11.

Council of Chief State School Officers (1999). Trends in State Student Assessment Programs. Washington, DC: Council of Chief State School Officers.

Cronbach, L. J. (1971). Test validation. In Educational Measurement, 2d Edition, R. L. Thorndike, ed. Washington, DC: American Council on Education.

Debra P. v. Turlington, 474 F. Supp. 244 (M.D. Fla. 1979); aff'd in part and rev'd in part, 644 F.2d 397 (5th Cir. 1981); rem'd, 564 F. Supp. 177 (M.D. Fla. 1983); aff'd, 730 F.2d 1405 (11th Cir. 1984).

Elmore, R. (2000). Building a New Structure For School Leadership. Washington, DC: The Albert Shanker Institute.

Grissmer, D., A. Flanagan, J. Kawata, and S. Williamson (2000). Improving Student Achievement: What State NAEP Scores Tell Us. Santa Monica, CA: Rand.

GI Forum v. Texas Education Agency, 87 F. Supp. 2d 667 (W.D. Tex. 2000).

Harvard Educational Review. (1994). Symposium: equity in educational assessment. Harvard Educational Review: 64 (1).

Hauser, R. (1999). "Should We End Social Promotion? Truth and Consequences." In Orfield, G. and M. Kornhaber, eds., Raising Standards or Raising Barriers? Inequality and High Stakes Testing in Education. New York: The Century Fund.

Improving America's Schools Act of 1994, 20 U.S.C. sections 6301 et seq.

Individuals with Disabilities Education Act, 20 U.S.C. section 1401 et seq. (1997).

Keller, B. (2000). "More N.Y. Special Education Students Passing State Tests." Education Week 19 (31): 33.

Kober, J. and M. Feuer (1996). Title I Testing and Assessment: Challenging Standards for Disadvantaged Children. Washington, DC: National Academy Press.

Koretz, D., R. Linn, S. Dunbar, and L. Shepard (1991). "The Effects of High-Stakes Testing on Achievement: Preliminary Findings About Generalization Across Tests." Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

Koretz, D. and S. Barron (1998). The Validity of Gains on the Kentucky Instructional Results Information Systems (KIRIS). Santa Monica, CA: Rand.

Lee, J. (1998). "Using High Stakes Test Results to Give Disadvantaged Kids Access to Outstanding Responsive Teachers." Paper presented at the Harvard Civil Rights Project/Teachers College Conference on High-Stakes Testing and Civil Rights, December 4, 1998, New York.

Linn, R. (2000). "Assessments and accountability." Educational Researcher 29 (2), 4-16.

McLaughlin, M. (2000). "High Stakes Testing and Students with Disabilities." Presentation at the National Research Council Conference on the Role of the Law in Achieving High Standards for All. Washington, DC, June 30.

McNeil, L. and A. Valenzuela (2000). "The Harmful Impact of the TAAS System of Testing in Texas: Beyond the Accountability Rhetoric." In Orfield, G. and M. Kornhaber, eds., Raising Standards or Raising Barriers? Inequality and High Stakes Testing in Education. New York: The Century Fund.

Mehrens, W. A. (1998). Consequences of Assessment: What is the Evidence? Vice Presidential Address for Division D, annual meeting of the American Educational Research Association, San Diego.

Murnane, R. and F. Levy (1996). Teaching the New Basic Skills. New York: The Free Press.

National Academy of Education (1995). Improving Education Through Standards-Based Reform, M. McLaughlin, L. Shepard, and J. O'Day, eds. Washington, DC: National Academy of Education.

National Research Council, Heller, K.A., W.H. Holtzman, and S. Messick, eds. (1982). Placing Children in Special Education: A Strategy for Equity. Committee on Child Development Research and Public Policy, National Research Council. Washington, DC: National Academy Press.

National Research Council, Heubert, J., and R. Hauser, eds. (1999). High Stakes: Testing for Tracking, Promotion, and Graduation. Committee on Appropriate Test Use. Washington, DC: National Academy Press.

Natriello, G. (1998). The New Regents High School Graduation Requirements: Estimating the Resources Necessary to Meet the New Standards. New York: The Community Service Society.

Natriello, G. and A. Pallas (1999). The Development and Impact of High Stakes Testing. Paper presented at the Conference on Civil Rights Implications of High-Stakes Testing, sponsored by the Harvard Civil Rights Project, Teachers College, and Columbia Law School.

Oakes, J., A. Gamoran, and R. Page (1992). "Curriculum Differentiation: Opportunities, Outcomes, and Meanings." In Jackson, P., ed., Handbook of Research on Curriculum. New York: Macmillan Publishing Company.

Reese, C. M., Miller, K. E., Mazzeo, J., and Dossey, J. A. (1997). NAEP 1996 Mathematics Report Card for the Nation and the States. Washington, DC: National Center for Education Statistics.

Riley, R. W. (2000). Setting New Expectations. Seventh Annual State of American Education Address. Paper presented at Southern High School (Durham, NC, February 22, 2000).

Sack, J. (2000). "Researchers Warn of Possible Pitfalls in Spec. Ed. Testing." Education Week 19 (32): 12.

Schrag, P. (2000). "Too Good to Be True." The American Prospect 4 (11), 46.

Shepard, L.A. (1993). "Evaluating Test Validity." In L. Darling-Hammond (ed.), The Review of Research in Education, 19, 405-450.

Shepard, L. A. and M. L. Smith, eds. (1989). Flunking Grades: Research and Policies on Retention. London: Falmer Press.

Shore, A., G. Madaus, and M. Clarke (2000). Guidelines for Policy Research on Educational Testing. Boston: National Board on Educational Testing and Public Policy 1 (4), 1-7.

Stake, R. (1998, July). "Some Comments on Assessment in U.S. Education." Educational Policy Analysis Archives [on-line serial], 6 (14). Available http://epaa.asu.edu/epaa/v6n14.html

Sum, A. (1999). Literacy in the Labor Force. Washington, DC: National Center for Education Statistics.

Title VI, Civil Rights Act of 1964, 42 U.S.C. section 2000d.

Title VI regulations, 34 C.F.R. sections 100 et seq.

Viadero, D. (1999). "Stanford Report Questions Accuracy of Tests." Education Week 19 (6): 3.

Viadero, D. (2000). "Testing System in Texas Yet to Get Final Grade." Education Week, May 31: 1.

Weckstein, P. (1999). "School Reform and Enforceable Rights to an Adequate Education." In Law and School Reform: Six Strategies for Promoting Educational Equity, J. Heubert, ed. New Haven: Yale University Press.

Wilgoren, J. (2000). "National Study Examines Reasons Why Pupils Excel." New York Times, July 26: 14.

Ysseldyke, J. E., M. L. Thurlow, K. L. Langenfeld, J. R. Nelson, E. Teelucksingh, and A. Seyfarth. (1998). Educational Results for Students with Disabilities: What Do the Data Tell Us? Minneapolis, MN: National Center on Educational Outcomes.

Footnotes

1. This chapter will appear in Pines, M., ed. (forthcoming, fall 2000), The Continuing Challenge: Moving the Youth Agenda Forward (Policy Issues Monograph 00-02, Sar Levitan Center for Social Policy Studies). Baltimore, MD: Johns Hopkins University Press.

2. In principle, standards-based reform has three key elements: (1) state standards that identify what students should know and be able to do, (2) efforts to align teaching and learning with the state standards, and (3) student assessments, also aligned with the state standards, the results of which are used to hold school systems, schools, educators and students "accountable" for improvements in teaching and learning (Elmore, 2000).

3. It is well known that the general equivalency diploma, or GED, has far less value than a regular high-school diploma in terms of an individual's future opportunities for education or employment.

4. In 1998, for example, New York and Massachusetts included over 90 percent of students with disabilities in their state assessment programs, compared with 50 percent in Texas (Ysseldyke et al., 1998).

5. Such fears are presumably greater in states where graduation-test standards are higher.

6. These estimates are based on the proportion of students scoring below "basic" on the NAEP. For example, in 1996, 40 percent of students taking the eighth grade math test scored below "basic," and in the District of Columbia public schools roughly 80 percent scored below "basic" (Linn, 2000, citing Reese et al., 1997).

7. In Massachusetts, roughly 40 percent of white students failed the "MCAS" in 1999, compared with 80 percent of black students and 82 percent of Hispanic students. Passing the MCAS is not now required for graduation, but soon will be.

8. As noted earlier, students with disabilities consistently fail state tests at rates 35 to 40 percentage points higher than those for nondisabled students (Ysseldyke et al., 1998). If the failure rate for nondisabled students is 38 percent, adding 35 to 40 percentage points gives an estimated failure rate for students with disabilities of about 73 to 78 percent, or roughly in the range of 75 to 80 percent.

9. Retention has other negative consequences as well. Strong evidence indicates that retained students are less well off academically and socially than similar low-performing students who are promoted (NRC, 1999; Hauser, 1999; Shepard and Smith, 1989).

10. These include sharply reduced earnings, reduced prospects for employment and further education, and significantly increased risk of involvement with the criminal justice system.

11. Minority students are often overrepresented among those who do not receive high quality curriculum and instruction, including those assigned to low-track classes. There are also many students with disabilities whose IEPs do not ensure that students receive the instruction they need to pass large-scale promotion and graduation tests, partly because such students have not traditionally been included in large-scale assessment programs. Similarly, many English-language learners have not had the opportunity to acquire the subject-matter knowledge or the levels of English proficiency they need to pass such tests.

12. In most of the nation, much needs to be done before world-class curriculum and instruction will be in place (National Academy of Education, 1995).

13. The draft, dated June 6, 2000, is entitled The Use of Tests When Making High-Stakes Decisions for Students: A Resource Guide for Educators and Policymakers. It draws heavily on the Joint Standards and the 1999 NRC study.

14. Personal conversation with Lauren Resnick, January 7, 2000, Washington, DC.

15. As the NRC study (1999: 281) notes, "[h]igh-stakes testing programs should routinely include a well-designed evaluation component. Policymakers should monitor both the intended and unintended consequences of high stakes assessments on all students and on significant subgroups of students, including minorities, English-language learners, and students with disabilities."

This content was developed pursuant to cooperative agreement #H324H990004 under CFDA 84.324H between CAST and the Office of Special Education Programs, U.S. Department of Education. However, the opinions expressed herein do not necessarily reflect the position or policy of the U.S. Department of Education or the Office of Special Education Programs and no endorsement by that office should be inferred.

Citation

Cite this page as

Heubert, J. P. (2002). High-stakes testing: opportunities and risks for students of color, English-Language Learners, and students with disabilities. Wakefield, MA: National Center on Accessing the General Curriculum. Retrieved [insert date] from http://www.cast.org/publications/ncac/ncac_highstakes.html