Standardized Tests
April 2, 2012
Objectives: After completing this class, you will be able to:
Assigned Readings
Coniam, D. (2009). Investigating the quality of teacher-produced tests for EFL students and the effects of training in test development principles and practices on improving test quality. System 37(2), 226-242.
Jones, A.T. (2011). Comparing methods for item analysis: The impact of different item-selection statistics on test difficulty. Applied Psychological Measurement 35(7), 566-571.
Cizek, G.J. & Bunch, M.B. (2007). Standard Setting: A Guide to Establishing and Evaluating Performance Standards on Tests. Sage, Thousand Oaks, CA. "What Is Standard Setting?", pp. 13-34. e-reserve

Research Methods Reviews
Leighton, J.P., Heffernan, C., Cor, M.K., Gokiert, R.J. & Cui, Y. (2011). An experimental test of student verbal reports and teacher evaluations as a source of validity evidence for test development. Applied Measurement in Education 24(4), 324-348.
Streiner, D.L. (2010). Measure for measure: New developments in measurement and item response theory. Canadian Journal of Psychiatry 55(3), 180-186.

Class Preparation
Schultz, K.S. & Whitney, D.J. (2005). Measurement Theory in Action. Thousand Oaks, CA: Sage Publications. Pp. 171-213. e-reserve. Bring this to class.
Read pp. 171-179; these pages explain how to develop valid test questions (which you may have experienced rarely in your life as a student!). Then read pp. 191-203 (up to where the case studies on item analysis start).

There are quite a few good web-based resources on how to construct and score tests. Here are some of them. Feel free to use these now and in the future. I consult them all the time in my professional work.
How to find (and buy, mostly) just about any test ever developed.
Computer Assisted Assessment Center: This site provides a lot of detail about question development. It includes specific sections on many different question formats: multiple choice, true/false, short answer, etc.
The Faculty Development Center at the University of Pittsburgh has a good discussion of how to write and grade essay questions and tests.
Kehoe, J. (1995). Basic item analysis for multiple-choice tests. Practical Assessment, Research & Evaluation 4(10). Retrieved February 8, 2006 from http://PAREonline.net/getvn.asp?v=4&n=10
Zurawski, R.M. (1998). Making the most of exams: Procedures for item analysis. National Teaching & Learning Forum 7(6). Retrieved February 8, 2006 from http://www.ntlf.com/html/pi/9811/exams_1.htm
The Scoring Office of Michigan State University provides an excellent discussion of item analysis.
Free, online software from California State University for computing item analysis statistics.
The Professional Testing Corporation provides a brief but excellent discussion of different approaches to establishing passing scores on tests.
Additional Resources
Boodoo, G.M. (1998). Addressing cultural context in the development of performance-based assessments and computer-adaptive testing: Preliminary validity considerations. Journal of Negro Education 67(3), 211-219.
Burisch, M. (1997). Test length and validity revisited. European Journal of Personality 11, 303-315.
Casbergue, R.M. (2010/2011). Assessment and instruction in early childhood education: Early literacy as a microcosm of shifting perspectives. Journal of Education 190(1/2), 13-20.
Downing, S.M. (2003). Item response theory: Applications of modern test theory in medical education. Medical Education 37(8), 739-745.
Downing, S.M. & Haladyna, T.M. (2004). Validity threats: Overcoming interference with proposed interpretations of assessment data. Medical Education 38(3), 327-333.
Hambleton, R.K. & Patsula, L. (1998). Adapting tests for use in multiple languages and cultures. Social Indicators Research 45, 153-171.
Innes, C.R.H., Jones, R.D. & Anderson, T.J. (2009). Performance in normal subjects on a novel battery of driving-related sensory-motor and cognitive tests. Behavior Research Methods 41(2), 284-294.
Messick, S. (1995). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues & Practice 14(4), 5-8.
Michalos, A.C., Creech, H., McDonald, C. & Kahlke, P.M.H. (2011). Knowledge, attitudes and behaviours concerning education for sustainable development: Two exploratory studies. Social Indicators Research 100(3), 391-413.
Morales, M.C. & Saenz, R. (2007). Correlates of Mexican American students' standardized test scores. Hispanic Journal of Behavioral Sciences 29(3), 349-365.
Mucherah, W. & Yoder, A. (2008). Motivation for reading and middle school students' performance on standardized testing in reading. Reading Psychology 29(3), 214-235.
Patz, R.J. (2006). Building NCLB science assessments: Psychometric and practical considerations. Measurement 4(4), 199-239.
Smoline, D.V. (2008). Some problems of computer-aided testing and "interview-like tests." Computers & Education 51(2), 743-756.
Solley, B.A. (2007). On standardized testing: An ACEI position paper. Childhood Education 84(1), 31-37.
Sommer, R. & Sommer, B. (2002). A Practical Guide to Behavioral Research: Tools and Techniques. New York: Oxford University Press. Read Ch. 16, pp. 224-233. Get from Dr. Swisher.
Van de Vijver, F. & Hambleton, R.K. (1996). Translating tests: Some practical guidelines. European Psychologist 1(2), 89-99.
Visone, J.D. (2009). The validity of standardized testing in science. American Secondary Education 38(1), 46-61.
Wright, R.E. (2010). Standardized testing for outcome assessment: Analysis of the Educational Testing Systems MBA tests.
Research Articles
Bailey, A.J. (2006). What kind of assessment for what kind of geography? Advanced placement human geography. The Professional Geographer 58(1), 70-77.
Cahan, S. (2001). Schooling and the norming of intelligence test scores. Educational Measurement: Issues & Practice 19(3), 26-33.
Impara, J.C. & Plake, B.S. (1997). Standard setting: An alternative approach. Journal of Educational Measurement 34(4), 353-366.
Jordan, E.R., Atkins, S., van Niekerk, A. & Seedat, M. (2005). The development of an instrument measuring unintentional injuries in young children in low-income settings to serve as an evaluation tool for a childhood home injury prevention program. Journal of Safety Research 36(3), 269-280.
Kritikos, V., Pharm, B., Pharm, M., Krass, I. et al. (2005). The validity and reliability of two asthma knowledge questionnaires. Journal of Asthma 42(9), 795-801.
Landa, R.J. (2005). Assessment of social communication skills in preschoolers. Mental Retardation and Developmental Disabilities Research Reviews 11(3), 247-252.
LeFevre, J., Smith-Chant, B.L., Fast, L., Skwarchuk, S. et al. (2005). What counts as knowing? The development of conceptual and procedural knowledge of counting from kindergarten through Grade 2. Journal of Experimental Child Psychology 93(4), 285-303.
Lumley, T. & O'Sullivan, B. (2005). The effect of test-taker gender, audience and topic on task performance in tape-mediated assessment of speaking. Language Testing 55, 415-437.
Singh-Manoux, A., Richards, M. & Marmot, M. (2005). Socioeconomic position across the lifecourse: How does it relate to cognitive function in mid-life?
Sireci, S.G., Scarpati, S.E. & Li, S. (2005). Test accommodations for students with disabilities: An analysis of the interaction hypothesis. Review of Educational Research 75(4), 457-490.
Stiggins, R.J. (2001). The unfulfilled promise of classroom assessment. Educational Measurement: Issues & Practice 20(3), 5-15.