How Many Response Options are Enough?

A frequent question during item development is how many response options should be presented for each item. The answer: it depends.

Ideally, each item includes only plausible responses to the question posed: the correct response(s) and plausible but incorrect responses (distractors). Plausible response options help to provide valuable information about the examinee’s abilities. For some items, however, only a few reasonable yet incorrect responses exist, which limits how many plausible options can be written. As cited by Rodriguez (2005), Haladyna and Downing (1988) define a plausible (i.e., functional) distractor “as one that has (a) a significant negative point-biserial correlation with the total test score, (b) a negatively sloping item characteristic curve, and (c) a frequency of response greater than 5% for the total group” (p. 5). In other words, a plausible distractor should (a) be negatively related to the total test score, (b) be selected more frequently by candidates with low ability as measured by the exam and less frequently by candidates with high ability, and (c) be chosen by more than 5% of the candidates taking the exam.
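As a rough illustration of how criteria (a) and (c) might be checked for a single item, the sketch below computes each distractor's share of responses and its point-biserial correlation with the total score. The function names, the sample data, and the simplifications are assumptions for illustration only: the correlation is checked for sign but not tested for statistical significance, and criterion (b), the item characteristic curve, is not modeled.

```python
from math import sqrt


def point_biserial(chose_option, total_scores):
    """Pearson correlation between a 0/1 indicator and total test scores."""
    n = len(total_scores)
    mean_x = sum(chose_option) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(chose_option, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in chose_option)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation undefined; treat as no relationship
    return cov / sqrt(var_x * var_y)


def screen_distractors(responses, total_scores, options, key, min_share=0.05):
    """Return {option: (share, r_pb, passes)} for every option except the key.

    Simplification (assumption): "passes" means the share exceeds min_share
    and the point-biserial is negative; no significance test is applied.
    """
    n = len(responses)
    results = {}
    for option in options:
        if option == key:
            continue
        chose = [1 if r == option else 0 for r in responses]
        share = sum(chose) / n
        r_pb = point_biserial(chose, total_scores)
        results[option] = (share, r_pb, share > min_share and r_pb < 0)
    return results


# Hypothetical data: ten examinees, one four-option item keyed "A".
responses = ["A", "B", "A", "C", "A", "B", "A", "A", "C", "A"]
total_scores = [9, 4, 8, 3, 7, 5, 9, 8, 2, 6]

results = screen_distractors(responses, total_scores, "ABCD", key="A")
for option, (share, r_pb, passes) in results.items():
    print(option, round(share, 2), round(r_pb, 2), passes)
```

In this made-up data set, option D is never selected, so it fails the frequency criterion and would be a candidate for removal rather than revision.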

The current recommendation (Haladyna & Rodriguez, 2013) is that three response options are appropriate when there is a correct response and only two plausible but incorrect response options. If more plausible response options exist, they should be included; however, not every plausible response needs to be included. Haladyna and Downing (1989) concluded that the important factor is “not the number of distractors but the quality of distractors” (p. 59).

Item performance for items with three response options is similar to that of items with four or more response options if the additional options are not plausible. (For further information, see Rodriguez [2005].) According to Haladyna and Rodriguez (2013), “As most test takers have an option elimination strategy that is part of their testwiseness, random guessing in the presence of complete ignorance is very rare” (p. 67). Measurement precision is not degraded with the use of three response options. A fourth, non-plausible option will not draw examinees and therefore will not provide additional information. Further, fewer response options per item could enable test developers to increase the number of items on the test form, which potentially increases exam reliability (Haladyna & Rodriguez, 2013).
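One common way to estimate how much reliability might change when a form is lengthened is the Spearman-Brown prophecy formula. The source does not name this formula, so the sketch below is offered only as an illustration; it assumes the added items are comparable in quality to the existing ones, and the form lengths and starting reliability are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    Assumes the added items are parallel to (as good as) the existing items.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical example: a 60-item form with reliability 0.80 is lengthened
# to 80 items because shorter items leave room in the testing window.
current_items = 60
new_items = 80
current_reliability = 0.80

predicted = spearman_brown(current_reliability, new_items / current_items)
print(round(predicted, 3))
```

The predicted gain depends entirely on the quality of the added items, which is why the time saved on writing implausible distractors matters only if it goes toward well-written new items.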

We recommend that each item include the key (i.e., correct answer) and as many incorrect but plausible response options as is reasonable and appropriate, without attempting to exhaust every possible plausible response option. Spending item development time on crafting implausible response options to look plausible decreases the time available for developing additional items and does not add value. Having a different number of response options for different items on an exam is appropriate and acceptable.

Instead of having a set “rule” for the number of response options to be included with items for a particular exam, it is more efficient to focus on including plausible and attractive but incorrect response options, allowing items to have three, four, five, or more response options as appropriate.

For specific research on this topic, consider the following sources:
Haladyna, T. M., & Downing, S. M. (1988). Functional distractors: Implications for test-item writing and test design. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.

Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51–78.

Haladyna, T. M., & Downing, S. M. (1993). How many options is enough for a multiple-choice item? Educational and Psychological Measurement, 53, 999–1010.

Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. New York, NY: Routledge.

Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24(2), 3–13.
