Standardized Tests
April 2, 2012


Objectives: After completing this class, you will be able to:

  • Provide a knowledgeable, scientifically based complaint about many of the tests you have taken as a student (a fringe benefit of this course)
  • Understand and explain the differences between item reliability, standard setting, and item selection in standardized tests
  • Apply basic concepts in learning theory to the development of test items for standardized tests of knowledge, skills, and ability
  • Apply at least one of three commonly used techniques to test items for reliability and discriminatory power
  • Develop a scoring rubric for tests of skills, knowledge and abilities and apply at least one of three common techniques to set standards for the test
  • Develop reasonably (probably not perfectly) valid, reliable tests for use in training and teaching
  • Know how to measure learning through pre- and post-testing with participants
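
As a rough illustration of the last objective, the sketch below computes one common summary of pre- and post-test results: the raw gain for each participant and the normalized gain (the share of the possible improvement that was achieved). The function name, the scores, and the 20-point maximum are hypothetical, and normalized gain is only one of several ways to summarize learning; the readings may recommend others.

```python
def learning_gains(pre, post, max_score):
    """Return (raw_gains, normalized_gains) for paired pre/post scores.

    Normalized gain = (post - pre) / (max_score - pre), i.e. the fraction
    of the available improvement a participant actually achieved.
    Participants who already had the maximum score on the pre-test get None.
    """
    raw = [after - before for before, after in zip(pre, post)]
    normalized = [
        (after - before) / (max_score - before) if before < max_score else None
        for before, after in zip(pre, post)
    ]
    return raw, normalized


# Hypothetical scores for three participants on a 20-point test.
pre_scores = [10, 12, 16]
post_scores = [15, 16, 18]
raw_gains, normalized_gains = learning_gains(pre_scores, post_scores, max_score=20)
print(raw_gains)         # points gained by each participant
print(normalized_gains)  # fraction of possible improvement achieved
```

Here the three participants gain different numbers of points, yet each closes the same share of the gap to a perfect score, which is why the two summaries can tell different stories.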

Assigned Readings

Coniam, D. (2009) Investigating the quality of teacher-produced tests for EFL students and the effects of training in test development principles and practices on improving test quality. System 37(2), 226-242.

Jones, A.T. (2011) Comparing methods for item analysis: The impact of different item-selection statistics on test difficulty. Applied Psychological Measurement 35(7), 566-571.

Cizek, G.J. & Bunch, M.B. (2007). Standard Setting: A Guide to Establishing and Evaluating Performance Standards on Tests. Sage, Thousand Oaks, CA. What Is Standard Setting?, pp. 13-34. e-reserve

Research Methods Reviews

Leighton, J.P., Heffernan, C., Cor, M.K., Gokiert, R.J. & Cui, Y. (2011) An experimental test of student verbal reports and teacher evaluations as a source of validity evidence for test development. Applied Measurement in Education 24(4), 324-348.

Streiner, D.L. (2010) Measure for measure: New developments in measurement and item response theory. Canadian Journal of Psychiatry 55(3), 180-186.

Class Preparation

Schultz, K.S. & Whitney, D.J. (2005) Measurement Theory in Action. Thousand Oaks, CA: Sage Publications. Pp. 171-213. e-reserve. Bring this to class. Read pp. 171-179; these few pages explain how to develop valid test questions (which you may have experienced rarely in your life as a student!). Then read pp. 191-203 (up to where the case studies on item analysis start).

There are quite a few good web-based resources on how to construct and score tests. Here are some of them. Feel free to use these now and in the future. I consult them all the time in my professional work.

How to find (and buy, mostly) just about any test ever developed.

Computer Assisted Assessment Center: This site provides a lot of detail about question development. It includes specific sections on many different question formats: multiple choice, true/false, short answer, etc.

The Faculty Development Center at the University of Pittsburgh has a good discussion of how to write and grade essay questions and tests.

Kehoe, J. (1995). Basic item analysis for multiple-choice tests. Practical Assessment, Research & Evaluation, 4(10). Retrieved February 8, 2006 from http://PAREonline.net/getvn.asp?v=4&n=10 .

Zurawski, R.M. (1998) Making the most of exams. Procedures for item analysis. National Teaching & Learning Forum 7(6). Retrieved February 8, 2006 from http://www.ntlf.com/html/pi/9811/exams_1.htm

The Scoring Office of Michigan State University provides an excellent discussion of item analysis.

Free, on-line software from California State University for computing item analysis statistics.
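
For readers who want to see what item analysis statistics look like in practice, here is a minimal sketch of two of the most basic ones: item difficulty (the proportion of examinees who answer an item correctly) and the upper-lower discrimination index (the difference in that proportion between the highest- and lowest-scoring groups). The function names, the 27% group size, and the response data are assumptions made for illustration; this is not the method used by any of the tools or sites listed here.

```python
from statistics import mean


def item_difficulty(responses, item):
    """Proportion of examinees who answered the item correctly (1 = correct)."""
    return mean(row[item] for row in responses)


def discrimination_index(responses, item, fraction=0.27):
    """Upper-lower discrimination index for one item.

    Examinees are ranked by total score; the index is the proportion correct
    in the top group minus the proportion correct in the bottom group.
    The 27% group size is a common convention, assumed here.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    return mean(row[item] for row in upper) - mean(row[item] for row in lower)


# Hypothetical data: six examinees (rows) by four items (columns).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]

for item in range(4):
    print(item, item_difficulty(scores, item), discrimination_index(scores, item))
```

An item with an index near zero (or below it) does not separate stronger from weaker examinees and is a candidate for revision, while a very high or very low difficulty value suggests the item is too easy or too hard for the group tested.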

The Professional Testing Corporation provides a brief but excellent discussion of different approaches to establishing passing scores on tests.
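
To make the idea of establishing a passing score concrete, the sketch below works through one widely described judgment-based approach, the Angoff method: each judge estimates the probability that a minimally competent examinee would answer each item correctly, the estimates are averaged per item, and the averages are summed to give the cut score. The choice of this method, the function name, and the ratings are assumptions made for illustration; the resource above may describe it differently or emphasize other approaches.

```python
def angoff_cut_score(ratings):
    """Compute a cut score from Angoff-style judge ratings.

    ratings[j][i] is judge j's estimate of the probability that a minimally
    competent examinee answers item i correctly. The cut score is the sum,
    over items, of the mean rating across judges.
    """
    n_judges = len(ratings)
    n_items = len(ratings[0])
    item_means = [
        sum(judge[i] for judge in ratings) / n_judges for i in range(n_items)
    ]
    return sum(item_means)


# Hypothetical ratings from two judges on a four-item test.
judge_ratings = [
    [0.5, 0.75, 1.0, 0.25],
    [0.5, 0.25, 1.0, 0.75],
]
cut_score = angoff_cut_score(judge_ratings)
print(cut_score)  # expected number of items a borderline examinee gets right
```

In practice the resulting value is usually rounded and reviewed by the panel before it is adopted as the passing score.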

Additional Resources

Boodoo, G.M. (1998) Addressing cultural context in the development of performance-based assessments and computer-adaptive testing: preliminary validity considerations. Journal of Negro Education 67 (3), 211-219.

Burisch, M. (1997). Test length and validity revisited. European Journal of Personality 11, 303-315.

Casbergue, R.M. (2010/2011). Assessment and instruction in early childhood education: Early literacy as a microcosm of shifting perspectives. Journal of Education 190(1/2), 13-20.

Downing, S.M. (2003). Item response theory: applications of modern test theory in medical education. Medical Education 37 (8), 739-745.

Downing, S.M. & Haladyna, T.M. (2004) Validity threats: overcoming interference with proposed interpretations of assessment data. Medical Education 38 (3), 327-333.

Hambleton, R.K. & Patsula, L. (1998). Adapting tests for use in multiple languages and cultures. Social Indicators Research 45, 153-171.

Innes, C.R.H., Jones, R.D. & Anderson, T.J. (2009). Performance in normal subjects on a novel battery of driving-related sensory-motor and cognitive tests. Behavior Research Methods 41(2), 284-294.

Messick, S. (1995). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues & Practice 14(4), 5-8.

Michalos, A.C., Creech, H., McDonald, C. & Kahlke, P.M.H. (2011). Knowledge, attitudes and behaviours concerning education for sustainable development: Two exploratory studies. Social Indicators Research 100 (3), 391-413.

Morales, M. C. & Saenz, R. (2007). Correlates of Mexican American students' standardized test scores. Hispanic Journal of Behavioral Sciences 29(3), 349-365.

Mucherah, W. & Yoder, A. (2008). Motivation for reading and middle school students' performance on standardized testing in reading. Reading Psychology 29(3), 214-235.

Patz, R.J. (2006). Building NCLB science assessments: Psychometric and practical considerations. Measurement 4(4), 199-239.

Smoline, D.V. (2008). Some problems of computer-aided testing and "interview-like tests." Computers & Education 51(2), 743-756.

Solley, B.A. (2007). On standardized testing: An ACEI position paper. Childhood Education 84(1), 31-37.

Sommer, R. & Sommer, B. (2002) A Practical Guide to Behavioral Research: Tools and Techniques. New York: Oxford University Press. Read Ch. 16, pp. 224-233. Get from Dr. Swisher.

Van de Vijver, F. & Hambleton, R.K. (1996). Translating tests: some practical guidelines. European Psychologist 1 (2), 89-99.

Visone, J.D. (2009). The validity of standardized testing in science. American Secondary Education 38(1), 46-61.

Wright, R.E. (2010). Standardized testing for outcome assessment: Analysis of the Educational Testing Systems MBA tests.

Research Articles

Bailey, A.J. (2006) What kind of assessment for what kind of geography? Advanced placement human geography. The Professional Geographer 58 (1), 70-77.

Cahan, S. (2001) Schooling and the norming of intelligence test scores. Educational Measurement: Issues & Practice 19 (3), 26-33.

Impara, J.C. & Plake, B.S. (1997) Standard setting: an alternative approach. Journal of Educational Measurement 34(4), 353-366.

Jordan, E.R., Atkins, S., van Niekerk, A. & Seedat, M. (2005) The development of an instrument measuring unintentional injuries in young children in low-income settings to serve as an evaluation tool for a childhood home injury prevention program. Journal of Safety Research 36 (3), 269-280.

Kritikos, V., Pharm, B., Pharm, M., Krass, I. et al. (2005) The validity and reliability of two asthma knowledge questionnaires. Journal of Asthma 42 (9), 795-801.

Landa, R.J. (2005) Assessment of social communication skills in preschoolers. Mental Retardation and Developmental Disabilities Research Reviews 11 (3), 247-252.

LeFevre, J., Smith-Chant, B.L., Fast, L., Skwarchuk, S. et al. (2005) What counts as knowing? The development of conceptual and procedural knowledge of counting from kindergarten through Grade 2. Journal of Experimental Child Psychology 93(4), 285-303.

Lumley, T. & O'Sullivan, B. (2005) The effect of test-taker gender, audience and topic on task performance in tape-mediated assessment of speaking. Language Testing 55, 415-437.

Singh-Manoux, A., Richards, M. & Marmot, M. (2005) Socioeconomic position across the lifecourse: how does it relate to cognitive function in mid-life?

Sireci, S.G., Scarpati, S.E. & Li, S. (2005) Test accommodations for students with disabilities: an analysis of the interaction hypothesis. Review of Educational Research 75(4), 457-490.

Stiggins, R.J. (2001) The unfulfilled promise of classroom assessment. Educational Measurement: Issues & Practice 20 (3), 5-15.