Questionnaires, Scales & Indices - Procedures for Getting It RIGHT

Objectives: After completing this module you will be able to:

Apply the principles of item-reduction to the development of questionnaires, scales and indices
Use rigorous procedures like cognitive testing to make sure that your questions (items) can yield both valid and reliable data
Pre-test the reliability, validity and discriminatory power of the scores produced using both quantitative and qualitative assessment procedures

Required Materials: Item Development

Class Preparation: Bring a list of at least three questions about the REQUIRED and SUGGESTED readings that you want to discuss in class. We will use your questions to guide class discussion. Please think about these questions in depth and bring questions that will encourage your classmates and you to think about the specific procedures involved in the development of instruments, what can "go wrong" in the process, and how we test for inadequacies in the instruments we create.

Required Materials: Testing for Reliability & Validity

How do we make sure we got it right? Refer to pp. 28-57, The Foundations of Social Research, in the textbook by H. Russell Bernard. Everyone read: The Language and Logic of Social Research, p. 28. Seven of you will be assigned one other short section in the textbook to read, typically 3 or 4 pages at the most. See the Week 5 Discussion Board for your assignment. Be prepared to BRIEFLY state the key features of the topic assigned to you. I want a few sentences only because this must be easy for people to find and use during our work this semester. Post your comments to the Week 5 discussion board before class on February 04.

Viswanathan, M. (2005). Measurement Error and Research Design. Sage, Thousand Oaks, CA. Pp. 1-41. e-reserve. I know this is an old reading -- but it is frankly still the best I see for people new to the concepts. We will be using the procedures described in this reading for your group project. Make sure you understand the basics of the procedures described. Use the information in this document in your assignments.

Beaton, D.E., Wright, J.G., Katz, J.N. (2005) Development of the QuickDASH: Comparison of three item-reduction approaches. Journal of Bone and Joint Surgery 87-A(5):1038-1045. I know it sounds odd, but this is the best example I have seen of item reduction. Read for understanding the concepts. I do not expect you to master the statistical procedures. I want you to understand how and why we take measures to reduce the number of items (better for response, for example) and keep the best items. This is critical to success, especially in the contemporary setting where people do NOT want to spend a lot of time answering our questions.

Castillo-Diaz, M. & Padilla, J.L. (2013) How cognitive interviewing can provide validity evidence of the response processes to scale items. Social Indicators Research 114(3), 963-975. You will conduct a cognitive interview in Assignment 1. This will be very helpful in constructing your assessment instrument.

Suggested Materials: These are in some ways repetitive of the material we covered last week. I put them here because you need to use these very clear, step-by-step procedures as you develop the index for Assignment 1. They are critical to success.

DeVellis, Robert F. (2003) Scale Development. 2nd Edition. pp. 60-100, "Guidelines in Scale Development." Sage Publications, Thousand Oaks, CA. e-reserve. This chapter in DeVellis has good discussions of the topics covered in my documents Approaches to Measurement, Steps in Instrument Development, and Testing Procedures. If you found my discussions confusing, this is a good place to get some clarification. If you are comfortable with the materials I provided, focus on the sections on pp. 66-101. Some sections are critical to your success in the group project. An example is the discussion entitled "Step 3: Determine the Format for Measurement" that begins on page 71. We will cover some of these topics in class, but you have to read DeVellis to be able to take a sophisticated approach to your assignments. Use this publication to guide your work on the Group Project -- consider it a requirement. It complements the discussion in Bernard very well.

Barry, A.E., Chaney, E.H., Stellefson, M.L. & Chaney, J.D. (2011) So you want to develop a survey: Practical recommendations for scale development. American Journal of Health Studies 26(2), 97-105. Barry et al. provide step-by-step procedures that can be very helpful in your assignments. This reading is briefer than DeVellis, but does not have all of the information you will need to do a good job on your projects. Please look at the tables before class in particular. They are very good. Table 1, for example, provides five different kinds of scalar response options. Table 2 gives suggestions for making any research instrument "flow." Helping people with the "flow problem" is one of my main contributions when I work with people as a methodologist. Table 3 makes basic recommendations for item development and Table 4 provides an excellent list of diagnostic questions you should ask yourself about your items. These questions can also be used as the basis for much of your cognitive test for all three assignments.

Additional Useful Materials

Afolabi, Olukayode Ayooluwa (2017) Indigenous emotional intelligence scale: Development and validation. Psychological Thought 10(1), 138-154. doi:10.5964/psyct.v10i1.184 This is a good example and discussion of why it is important to consider context in developing research instruments and why we cannot assume that instruments that provide reliable and valid data in one context will do so in another.

Bendixen, M. & Ottesen Kennair, L.E. (2017) When less is more: Psychometric properties of Norwegian short-forms of the Ambivalent Sexism Scales (ASI and AMI) and the Illinois Rape Myth Acceptance (IRMA) Scale. Scandinavian Journal of Psychology 58, 541-554. Because lengthy instruments suffer from non-response and drop-out during completion, there is a continual quest to develop shortened forms of "long" instruments that have demonstrated good ability to generate reliable, valid data. This is one example of an attempt to develop a "short form."

Bentley-Edwards, K.L. & Stevenson, H.W. Jr. (2016) The multidimensionality of racial/ethnic socialization: Scale construction for the cultural and racial experiences of socialization (CARES). Journal of Child & Family Studies 25(1), 96-108. This article provides insights into how constructs can change meaning over time due to societal changes, creating a need to develop new or revise existing instruments, even those that have provided reliable, valid data in the past.

Berg, C.J., Nehl, E., Sterling, K., Buchanan, T. et al. (2011) The development and validation of a scale assessing individual schemas used in classifying a smoker: Implications for research and practice. Nicotine & Tobacco Research 13(12), 1257-1265. Ignore the topic -- smoking. Focus on the use of discriminant and convergent validity. Note that there are examples of several kinds of statistical tests you can use to test for reliability and validity. Shows how to use demographic characteristics to test for the contextual appropriateness of an instrument.

Deng, L. & Chan, W. (2017) Testing the difference between reliability coefficients Alpha and Omega. Educational & Psychological Measurement 77(2), 185-203. DOI: 10.1177/0013164416658325 Focuses on the use of various measures of reliability with a good discussion of Cronbach's alpha.

Dijkstra, W. & Ongena, Y. (2006) Question-answer sequences in survey-interviews. Quality & Quantity 40(6), 983-1011. DOI: 10.1007/s11135-005-5076-4 This is a nice piece that examines why respondents do not answer questions as we "expect them to." Some good ideas you can use for all of your projects.
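Another side note from me, not from the readings: several of the articles above (Deng & Chan especially) lean on Cronbach's alpha, and the coefficient is simple enough to compute yourself. Here is a minimal sketch in Python with made-up data of my own, applying the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). Seeing it run may demystify the discussion.

```python
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) / variance(totals))
# Illustrative data only -- rows are respondents, columns are items.
from statistics import pvariance

responses = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 4],
    [2, 2, 1],
    [4, 3, 4],
    [1, 2, 2],
]

k = len(responses[0])                       # number of items
totals = [sum(row) for row in responses]    # each respondent's scale score
item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]

alpha = (k / (k - 1)) * (1 - sum(item_variances) / pvariance(totals))
print(f"Cronbach's alpha = {alpha:.3f}")
```

The toy items here move together closely, so alpha comes out high (above 0.9). Remember from the readings that a high alpha indicates internal consistency, not validity.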

Freund, P.A., Tietjens, M. & Strauss, B. (2013) Using rating scales for the assessment of physical self-concept: Why the number of response categories matters. Measurement in Physical Education & Exercise Science 17, 249-263. DOI: 10.1080/1091367X.2013.807265 Discusses item response theory which you need to include in your discussion in Assignment 1. This is a good discussion of the issue of how many response categories to include.

Galasinski, D. & Kozlowska, O. (2013) Interacting with a questionnaire: Respondents' constructions of questionnaire completion. Quality & Quantity 47(6), 3509-3520. Very good piece that takes us beyond cognitive testing to understand the processes that people use as they try to answer our questions.

Garb, H.N., Wood, J.M. & Fiedler, E.R. (2011) A comparison of three strategies for scale construction to predict a specific behavioral outcome. Assessment 18(4), 399-411. I honestly provide you with only one of several ways of assessing the validity and reliability of scores produced by an instrument. This article compares and contrasts three ways of doing so, only the first of which I have included in my instructions for assignments. You may want to use one of the other two in your semester project. To be quite honest, I selected the internal assessment because it was "doable" in the context of a one-semester course.

Hohne, W. & Ongena, Y. (2017) Investigating cognitive effort and response quality of question formats in web surveys using paradata. Field Methods 29(4), 365-382. DOI: 10.1177/1525822X17710640

Joo, Min-Ho and Dennen, Vanessa P. (2017) Measuring university students' group work contribution: Scale development and validation. Small Group Research 48(3), 288-310. DOI: 10.1177/1046496416685159 I suspect the topic may be interesting given that you are doing group work. However, I selected this reading because it provides a very detailed discussion of how to use statistical tests for validity and discriminatory power.

Kelly, P., Fitzsimons, C. & Baker, G. (2016) Should we reframe how we think about physical activity and sedentary behaviour measurement? Validity and reliability reconsidered. International Journal of Behavioral Nutrition and Physical Activity 13:32. DOI: 10.1186/s12966-016-0351-4

Pelli Paiva, P.C., Neves de Paiva, H., Messias de Oliveira Filho, P., Lamounier, J.A., Ferreira e Ferreira, E., Conceicao Ferreira, R., Kawachi, I. & Zarzar, M. (2014) Development and validation of a social capital questionnaire for adolescent students (SCQ-AS). PLoS ONE 9(8): e103785. DOI: 10.1371/journal.pone.0103785

Priede, C. & Farrall, S. (2011) Comparing results from different styles of cognitive interviewing: "verbal probing" vs. "thinking aloud." International Journal of Social Research Methodology 14(4), 271-287. There are lots of specific techniques one can use in cognitive interviewing, but this article provides a good explanation of two quite distinct general approaches.

Revilla, M.A., Saris, W.E. & Krosnick, J.A. (2014) Choosing the number of categories in agree-disagree scales. Sociological Methods & Research 43(1), 73-97. DOI: 10.1177/0049124113509605 This article discusses some of the issues involved in the "Likert-type response" approach to measurement. I personally find this approach cumbersome and overused, and the approach is criticized by many for the high intellectual demand it places on respondents. This reading specifically addresses how many response categories to use, which bears directly on that intellectual demand issue.

Saylor, R. (2013) Concepts, measures, and measuring well: An alternative outlook. Sociological Methods & Research 42(3):354-391. DOI: 10.1177/0049124113500476. A nice analysis of the failure to consider the first key steps in measurement when we focus on making sure we are measuring the right things.

Xu, H. & Tracey, T.J.G. (2017) Use of multi-group confirmatory factor analysis in examining measurement invariance in counseling psychology research. European Journal of Counselling Psychology. 6(1):75-82. DOI:10.5964/ejcop.v6i1.120

Zhang, W. & Watanabe-Galloway, S. (2014) Using mixed methods effectively in prevention science: Designs, procedures and examples. Prevention Science 14:654-662. DOI: 10.1007/s11121-013-0415-5