How can scaled scores be compared across time?

This article is intended only for customers and partners using the Assessment Series.

Item Response Theory (IRT) methods allow assessment developers to map all assessments within a grade onto the same scaled score metric. Scaled scores take into account the difficulty level of each assessment, so a student’s score on one assessment can be directly compared to their score on another assessment in the series. In other words, if a student’s score increases significantly from the Pre-Assessment to the Post-Assessment and the assessments were given under consistent conditions, that change can be more directly attributed to an increase in the student’s reading ability.

Student scores can also fluctuate slightly from one administration to the next because of a number of factors (e.g., the student’s level of concentration on a given day), so small changes in scores from the Pre- to the Post-Assessment do not indicate either growth or decline in student ability. A change in score is considered meaningful if it is larger than the standard error of measurement for the student’s grade level. Standard errors vary by grade level, and CommonLit’s data displays indicate whether or not a student’s change in score was smaller than the standard error for that grade level. For instance, if the standard error is 5 points, and a student scored 10 points higher on the Post-Assessment than on the Pre-Assessment, that represents meaningful growth. However, if the student scored only 3 points higher on the Post-Assessment, that change would not be meaningful.
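The comparison described above can be sketched in a few lines of Python. This is only an illustration, not CommonLit’s implementation: the function name `classify_change` and the score values are hypothetical, and treating a change exactly equal to the standard error as not meaningful is an assumption based on the wording "beyond the standard error."

```python
def classify_change(pre_score, post_score, standard_error):
    """Label a Pre-to-Post score change relative to the standard error.

    Assumption: a change must be strictly larger than the standard error
    to count as meaningful; a change equal to it does not.
    """
    change = post_score - pre_score
    if abs(change) <= standard_error:
        return "not meaningful"
    return "meaningful growth" if change > 0 else "meaningful decline"


# The two cases from the example above, with a standard error of 5 points.
# The starting score of 200 is made up for illustration.
print(classify_change(200, 210, 5))  # 10 points higher
print(classify_change(200, 203, 5))  # 3 points higher
```

With a standard error of 5 points, the first call reports meaningful growth and the second reports a change that is not meaningful, matching the example in the text.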

Students who score at the high end of the scale on the Pre-Assessment may not see large changes in their scores simply because they have already demonstrated a strong understanding of the material measured by the assessment. It is also important to take into account the time between test administrations. If students take two assessments very close together, they might not have learned enough between tests for their new skills to be reflected in their scores. To measure growth, CommonLit recommends that assessments be scheduled at least three months apart for full-year classes and at least one month apart for semester-long classes. Assessments cannot be scheduled less than one month apart.
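For anyone planning dates in a spreadsheet or script, the spacing guidance above can be checked with a short Python sketch. The helper names `add_months` and `check_gap` are hypothetical, and counting "one month" as the same day of the following calendar month (clamped to the end of shorter months) is an assumption; the platform may count the gap differently.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month
    (e.g. Jan 31 plus one month gives the last day of February).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def check_gap(first, second, full_year=True):
    """Compare the gap between two assessment dates with the guidance."""
    # Hard minimum stated in the article: at least one month apart.
    if second < add_months(first, 1):
        return "not allowed"
    # Recommended gap: three months (full-year) or one month (semester).
    recommended_months = 3 if full_year else 1
    if second < add_months(first, recommended_months):
        return "allowed, below recommended gap"
    return "meets recommendation"


print(check_gap(date(2024, 9, 3), date(2024, 11, 1)))
print(check_gap(date(2024, 9, 3), date(2024, 10, 3), full_year=False))
```

In this sketch, a full-year class testing on September 3 and November 1 is allowed but falls short of the recommended three months, while a semester-long class testing on September 3 and October 3 meets the recommendation.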