Editorial

On the Importance of Educational Tests

Stephen G. Sireci

Center for Educational Assessment, College of Education, University of Massachusetts, Amherst, MA, USA

and

Samuel Greiff

Cognitive Science and Assessment (COSA), University of Luxembourg, Luxembourg

Published Online: August 12, 2019. https://doi.org/10.1027/1015-5759/a000549

Educational tests are an indispensable component in virtually all educational systems across the world for evaluating educational achievement. At the same time, they face regular backlash from teachers, policy makers, educationalists, and scientists alike. Sometimes they are even considered the enemy of good education. We build on this debate and explain these opposing views, where they come from, how they can be resolved, and, most importantly, how educational tests can be integrated as an important tool for improving learning and instruction.

A Common Enemy?

It has been said that nothing brings people closer together than a common enemy. In education, teachers, students, parents, and even the media, have found camaraderie in the common enemy called standardized testing. This archetypal portrayal of educational tests as evil is unfortunate, because standardized tests are actually designed to improve education by providing unbiased and accurate information about how well our children are doing in school. Thus, current calls to “abandon standardized tests” or to encourage students to “opt out” (Hammond, 2015; Rowland-Woods, Wixom, & Aragon, 2015) of tests essentially suggest we bury our heads in the sand and ignore information about how well our children are doing. Given the long history of educational tests, and their ubiquity, an analysis of their criticisms and their benefits might help to move the current dialog from “How can we get rid of standardized tests?” to “How can we properly use educational tests to maximize their benefits?” As scholars in educational and psychological research, we believe educational tests, if used properly, are key components in any high-quality educational system.

What Is a Standardized Test?

In testing, the term “standardized” simply means the test content and administration conditions are the same for everyone, as are the scoring and score reporting processes. All tests that follow these conditions are standardized, whether they are described as performance assessments, multiple-choice tests, diagnostic tests, and so forth. Thus, “standardized” merely refers to the fact that all test takers are given the same test under the same conditions. Contrary to popular belief, the goal of standardization is fairness. “Standard” conditions provide a level playing field for all who take the test. Arguing against standardized tests, therefore, is tantamount to arguing against objectivity.

Different Tests for Different Purposes

In education, standardized tests fall into two general categories: “norm-referenced” and “criterion-referenced.” In norm-referenced testing, a person’s test score is compared to (i.e., referenced to) the performance of other people who took the same test. These other people are the “norm group,” which typically refers to a carefully selected sample of people who previously took the test. Norm-referenced tests often report results in percentiles, such as “Irma’s performance was at the 93rd percentile,” which means she performed as well as or better than 93% of the people in the norm group who took the test. Although such information is helpful for determining how well Irma did compared to others, it does not tell us how well she did with respect to the knowledge and skills the test was measuring. For example, if the norm group consisted primarily of low-achieving students, Irma’s high rank may not be very impressive.
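
To make the percentile interpretation concrete, the following is a minimal sketch of a percentile rank under the definition used above (the percentage of the norm group scoring at or below a given score). The function name and both norm groups are hypothetical and invented for illustration; operational testing programs may use other conventions, such as counting only half of the tied scores.

```python
def percentile_rank(score, norm_group_scores):
    """Return the percentage of the norm group scoring at or below `score`."""
    if not norm_group_scores:
        raise ValueError("norm group must not be empty")
    at_or_below = sum(1 for s in norm_group_scores if s <= score)
    return 100.0 * at_or_below / len(norm_group_scores)


# Hypothetical norm groups (illustrative data only, not from any real test).
high_achieving_norms = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
low_achieving_norms = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]

raw_score = 62
print(percentile_rank(raw_score, high_achieving_norms))  # 20.0
print(percentile_rank(raw_score, low_achieving_norms))   # 90.0
```

The same raw score lands at the 20th percentile in one norm group and the 90th in the other, which is why a percentile alone says little about what a student actually knows.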

Criterion-referenced tests, on the other hand, describe what a person can or cannot do with respect to the specific skills measured on a test. Thus, examinees’ scores are interpreted in terms of “mastery” of the knowledge and skills tested. For example, most licensure tests determine whether candidates pass or fail based on how well they answer the test questions, not based on whether they did better than others. Thus, 100% of the examinees who take a criterion-referenced test can pass or fail, because their performance is referenced to the content tested, not to other examinees. It is not enough for a medical school student to score in the top 50% of test takers, for example. Her performance on the test needs to illustrate that she has the skills to safely practice medicine.

Achievement tests administered in public schools are other examples of criterion-referenced tests. They allow parents and teachers to determine whether their children are learning what was taught. As part of the No Child Left Behind Act in the United States, for example, all public school students are classified into achievement levels such as “proficient” or “advanced” depending on how they performed on statewide achievement tests. These achievement levels are criterion-referenced descriptions of how well students have mastered the material they were to be taught. The score reports from these tests provide millions of parents across the USA with useful, independent information about their children’s performance in school. Take away these tests, and all parents have left are the report cards created by their children’s teachers. Report cards are certainly useful, but why revolt against the additional and more comprehensive information provided by standardized tests?
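
The classification into achievement levels can be sketched as a lookup against fixed cut scores. The cut scores, the scale, and the two lower level names below are assumptions made up for illustration; real programs set their own cut scores through formal standard-setting procedures.

```python
from bisect import bisect_right

# Hypothetical cut scores on a hypothetical score scale (assumptions only).
# A score at or above a cut score reaches the corresponding level.
CUT_SCORES = [220, 240, 260]
LEVELS = ["below basic", "basic", "proficient", "advanced"]


def achievement_level(scale_score):
    """Map a scale score to an achievement level using fixed cut scores."""
    return LEVELS[bisect_right(CUT_SCORES, scale_score)]


# Illustrative class in which every student clears the "proficient" cut.
class_scores = [241, 250, 258, 263]
levels = [achievement_level(s) for s in class_scores]
print(levels)  # ['proficient', 'proficient', 'proficient', 'advanced']
```

Because each score is compared only with the cut scores and never with other students, every student in a class can be classified as “proficient” or higher, something a percentile-based report cannot show.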

Benefits of Educational Tests

Norm- and criterion-referenced tests both provide useful information. Norm-referenced tests are useful when there are many people competing for a limited number of jobs or openings in a school, or a special program. In such cases, employers, admissions officers, and others need a way of selecting people based on objective, rather than subjective, criteria. College admissions tests such as the ACT and SAT are examples of norm-referenced tests used for admissions decisions. Businesses also frequently use employment tests to select or screen job applicants.

Aside from admissions testing, criterion-referenced tests are more common in most educational settings. For example, many states use tests in math, reading, and science as requirements for high school graduation. Teachers and other experts set the standards for passing these exams based on their expectations about the level of performance on the test that signifies competence in the subject area. Such tests build confidence in students by certifying they have successfully achieved academic goals; the tests also demonstrate schools are not simply graduating students without ensuring they have the knowledge and skills needed for success beyond school. In general, criterion-referenced tests provide important information to educators (and parents) about what students have learned and how well they learned it. For this reason, even norm-referenced tests are incorporating criterion-referenced information into their reporting practices. “College readiness” benchmarks set on the ACT and SAT are two examples of this phenomenon.

International assessments are other examples of the information educational tests can provide, and are highly relevant from an educational policy perspective. For example, the Trends in International Mathematics and Science Study (TIMSS) ranks countries based on their performance, but also reports the percentages of students who meet proficiency “benchmarks.” The 2015 TIMSS results showed that 4th-grade students in the United States finished 14th out of 49 countries (norm-referenced information), but that only 47% of US 4th-graders met the benchmark of “high” performance on the test (TIMSS & PIRLS International Study Center, 2015). Similarly, the 2015 PISA Reading study found that 15-year-olds from the USA and Luxembourg finished 24th and 36th out of 70 countries, respectively (OECD, 2016), and that 81% of USA students and 78% of Luxembourg students met the Level 2 proficiency benchmark “at which students begin to demonstrate the reading skills that will enable them to participate effectively and productively in life” (OECD, 2016, p. 164). Group results on tests also make us aware of achievement gaps and other examples of where educators are falling short of their goals. Without test results, such gaps remain hidden from public debate.

Limitations and Dangers

Educational tests can provide important benefits, but they can also lead to adverse effects if they are misused. The greatest danger in educational testing is using a test for a purpose other than that for which it was designed. For this reason, the Standards for Educational and Psychological Testing define “validity” as “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014, p. 11). Therefore, it should be clear that an educational test is not inherently “good” or “bad”; its validity depends on how it is used.

There are many examples of the misuse of educational tests. One example is the use of achievement tests in the USA as the primary criterion for evaluating the quality of teachers, which occurs in about 22 of the 50 states (Sireci & Soto, 2016). Given that students are not randomly assigned to teachers, differences between students that have nothing to do with a teacher will be reflected in test scores. For this reason, both the American Statistical Association (2014) and the American Educational Research Association (2015) issued warnings about this practice. Another example is the use of a test as the sole criterion for making admissions decisions. Most colleges use multiple criteria for making admissions decisions, such as high school grades, test scores, and other measures of achievement. However, in New York City, Mayor Bill de Blasio pointed out that using an admissions test as the single criterion at the competitive high schools resulted in only 10 African-American students being enrolled into a class of over 1,000 students (Harris, 2018). Such adverse impact cannot be justified based on the objectivity of the test. On the contrary, decision makers must realize that tests provide useful, but limited, information, and are best used in conjunction with other measures of student performance.

Using Educational Tests to Improve Instruction

Given that tests can provide useful information, but can also be misused, it is time to ask, “What is the proper role of standardized tests in education?” We believe the answer lies in (a) integrating instruction and assessment; and (b) being more thoughtful about what tests to administer, when to administer them, and how the results should be used. Tests should be thought of as part of the educational process, rather than as something outside the system. Children should not dread the tests they are “forced to take next week,” but instead look forward to assessments that give them feedback and reinforcement to help them learn. In short, educational tests must be seen as an integral part of the instructional process.

An example of integrating assessment and instruction is presented in Figure 1, which is the Curriculum-Instruction-Assessment Triangle (Pellegrino, Chudowsky, & Glaser, 2001). In this triangle, assessment is an equal partner to the learning goals that come from the curriculum, and the instruction designed to meet those goals. If instruction is aligned with the intended curricula, and the knowledge and skills measured by the test are similarly aligned, the test results will provide information that teachers and administrators can use to improve teaching, and that parents can use to know when they should be proud and when they should intervene.

**Figure 1 The curriculum-instruction-assessment triangle.**

With respect to more responsible test use, we recommend that educators and education policy makers discuss the types of information that are needed, and then design tests tailored to provide that information. Descriptions of intended uses of test scores, and potential misuses, should also be discussed. For tests to be properly integrated into and aligned with instruction, they must be developed to measure content recently taught, and to provide results close to the time when the instruction occurred. Technology offers exciting opportunities to customize tests for students, provide a more engaging testing experience, and deliver tests and their results on the spot. There are many “apps” that provide instruction to students via computers, tablets, and smartphones; assessments should be integrated with these instructional devices.

Conclusion

Educational tests have long demonstrated their utility, but in this 21st century we are also becoming more aware of their limitations. Our challenge is to ensure the tests we use provide the types of information we are looking for, and that the results are not used for purposes for which they have not been validated. Every educational test should prove its worth by providing useful information that benefits students, teachers, parents, or the public. We know how to develop good tests. It is time to know how to use them properly. By matching the intended purpose of a test to its use, and evaluating its impact, we can properly integrate tests into the instructional process, and thus maximize their benefits.

References

American Educational Research Association. (2015). AERA statement on use of value-added models (VAM) for the evaluation of educators and educator preparation programs. Retrieved from http://www.aera.net/Newsroom/NewsReleasesandStatements/AERAIssuesStatementontheUseofValue-AddedModelsinEvaluationofEducatorsandEducatorPreparationPrograms/tabid/16120/Default.aspx
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
American Statistical Association. (2014). ASA statement on using value-added models for educational assessment. Retrieved from https://www.amstat.org/asa/files/pdfs/POL-ASAVAM-Statement.pdf
Hammond, B. (2015, June 9). Oregon risks losing $140 million for enabling kids to skip Common Core tests, feds warn. The Oregonian. Retrieved from https://www.oregonlive.com/education/2015/06/new_oregon_testing_law_could_j.html
Harris, E. A. (2018, June 2). De Blasio proposes changes to New York’s elite high schools. New York Times. Retrieved from https://www.nytimes.com/2018/06/02/nyregion/de-blasio-new-york-schools.html
OECD. (2016). Reading performance among 15-year-olds. In OECD (Ed.), PISA 2015 results (Vol. 1, pp. 145–173). Paris: OECD. https://doi.org/10.1787/9789264266490-8-en
Pellegrino, J., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academies Press.
Rowland-Woods, J., Wixom, M., & Aragon, S. (2015). Assessment opt-out policies: State responses to parent pushback. Washington, DC: Education Commission of the States. Retrieved from http://www.ecs.org/assessment-opt-out-policies-state-responses-to-parent-pushback/
Sireci, S. G., & Soto, A. (2016). Validity and accountability: Test validation for 21st-century educational assessments. In H. Braun (Ed.), Meeting the challenges to measurement in an era of accountability (pp. 149–167). New York, NY: Routledge.
TIMSS & PIRLS International Study Center. (2015). Performance at the international benchmarks of mathematics achievement. Retrieved from http://timssandpirls.bc.edu/timss2015/international-results/timss-2015/mathematics/performance-at-international-benchmarks/item-map-and-summary-of-international-benchmarks/

Stephen G. Sireci, College of Education, University of Massachusetts, 813 North Pleasant St., Amherst, MA, 01003, USA, E-mail sireci@acad.umass.edu

Samuel Greiff, Cognitive Science and Assessment (COSA), University of Luxembourg, 2, avenue de l’Université, 4365 Esch sur Alzette, Luxembourg, E-mail samuel.greiff@uni.lu

Volume 35, Issue 3, May 2019

ISSN: 1015-5759, eISSN: 2151-2426

History

Received: May 29, 2019
Published online: August 12, 2019
