A result of a classical test theory (CTT) analysis might be that a particular item was not psychometrically fit (say, because its discrimination parameter was negative). If preliminary scores have not been released, then any decision by the instructor regarding the use of the item in the overall scores will likely have little apparent impact on the student. In fact, a student's response might be that the instructor was wise to weed out "tricky" items to make the overall scores more reliable and valid. However, if scores were released prior to psychometric analysis, a student might be surprised at their revised score. Let's look at an example to illustrate the point.
Say an instructor has a 10-item instrument, with items Q1 through Q10. Amanda gets all of the items correct, so she receives 10/10 = 100%. Babak misses item Q1, so he receives 9/10 = 90%. Chanda misses item Q2, so she also receives 9/10 = 90%. Note these are preliminary scores because the instructor has not yet performed a psychometric analysis.
Assume Q1 is deemed psychometrically unfit. The instructor has two options: (1) toss out the item, or (2) give everyone credit for the item. Let's see how each of these decisions plays out with our three students.
For Amanda, if the item is tossed out, she still has a perfect 9/9 = 100%. Giving her credit on an item for which she has already received credit would not change her 10/10 = 100%.
For Babak, if the item is tossed out, he now has 9/9 = 100% because Q1 was the item he missed. If he is given credit for the missed item, he would earn 10/10 = 100%.
For Chanda, if the item is tossed out, she now has 8/9 = 88.9% because Q1 was not the item she missed. Giving her credit on an item for which she has already received credit would not change her 9/10 = 90% score.
Now, let's consider the consequences. If the instructor released the preliminary scores, and then the revised scores based on the "toss out" decision, Amanda would stay at 100%, Babak would increase from 90% to 100%, and Chanda would decrease from 90% to 88.9%. Babak has reason to be pleased, but Chanda might think the instructor's decision was unfair because she fell below the 90% criterion that often determines the difference between assignment of a grade of A or a grade of B.
If the instructor releases the revised scores based on the "give everyone credit" decision, Amanda would stay at 100%, Babak would increase from 90% to 100%, and Chanda would stay at 90%. Again, Babak has reason to be pleased with the instructor’s decision, but Chanda might think the instructor's decision was unfair because his score increased but her score stayed the same.
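The two adjustment policies above can be expressed as a short calculation. The sketch below (hypothetical data matching the worked example; the function name `adjusted_scores` is an illustration, not part of any standard package) reproduces the preliminary, toss-out, and give-everyone-credit scores for all three students:

```python
def adjusted_scores(responses, bad_item):
    """Return (preliminary, toss_out, give_credit) score proportions.

    responses: list of 1 (correct) / 0 (incorrect) per item.
    bad_item:  index of the psychometrically unfit item.
    """
    preliminary = sum(responses) / len(responses)
    # Policy 1: toss out the item entirely (rescale to the remaining items).
    kept = [r for i, r in enumerate(responses) if i != bad_item]
    toss_out = sum(kept) / len(kept)
    # Policy 2: give everyone credit for the item.
    give_credit = (sum(responses) - responses[bad_item] + 1) / len(responses)
    return preliminary, toss_out, give_credit

students = {
    "Amanda": [1] * 10,            # all correct
    "Babak":  [0] + [1] * 9,       # missed Q1
    "Chanda": [1, 0] + [1] * 8,    # missed Q2
}

for name, r in students.items():
    pre, toss, credit = adjusted_scores(r, bad_item=0)  # Q1 deemed unfit
    print(f"{name}: preliminary {pre:.1%}, toss-out {toss:.1%}, credit {credit:.1%}")
```

Running this shows Chanda as the only student whose score moves differently under the two policies: tossing Q1 drops her to 8/9, while crediting Q1 leaves her at 90%.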
Tossing out psychometrically unfit items may negatively impact the scores of some students, whereas giving everyone credit may artificially elevate the scores of other students. Neither decision is “correct.” Nevertheless, instructors should carefully weigh their decision to release preliminary scores, as well as what to do after analyzing the items.
More importantly, any impact is reduced as the number of items increases (left as an exercise for the reader). This property, along with the increase in internal consistency (e.g., as measured by Cronbach's alpha) that comes with more items, should help counter complaints from students that a test is too long.
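The "exercise for the reader" can be sketched with a little algebra. For a Chanda-type student (one miss, on an item that is kept), tossing out an unfit item drops the score from (n-1)/n to (n-2)/(n-1), a gap of 1/(n(n-1)), which shrinks rapidly as the test length n grows. A minimal illustration (the function name `toss_out_penalty` is hypothetical, for demonstration only):

```python
def toss_out_penalty(n):
    """Score drop for a student with one miss when a correct item is tossed.

    Preliminary score: (n-1)/n.  Revised score: (n-2)/(n-1).
    The difference simplifies algebraically to 1 / (n * (n - 1)).
    """
    return (n - 1) / n - (n - 2) / (n - 1)

for n in (10, 20, 50, 100):
    print(f"{n} items: penalty {toss_out_penalty(n):.2%}")
```

With 10 items the penalty is about 1.1 percentage points (Chanda's 90% to 88.9%); with 100 items it is about 0.01 points, small enough that a grade boundary is rarely at stake.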
Classical item analysis at the Instructional Assessment Resources (IAR).
Iteman is stand-alone software designed to provide detailed item and test analysis reports using classical test theory (CTT).
Lertap is an Excel-based CTT item, test, and survey analysis application.
Xcalibre is stand-alone software for item response theory (IRT) analysis of assessment data.