This article is intended only for customers and partners using the Assessment Series.
The CommonLit Assessment Series features three benchmark assessments for grades 3-12 that have been developed to align with key reading skills and standards. The assessments provide teachers, coaches, and administrators with checkpoints on student progress throughout the academic year.
What do the assessments cover?
Each CommonLit Assessment is composed of three reading passages; typically, two informational passages and a literary passage. Each passage is accompanied by a set of multiple-choice comprehension questions to measure students’ reading abilities. For each assessment, the development team evaluates how well the questions on the assessment cover key skills and standards to ensure content validity. In other words, the team developing the assessments ensures the questions cover the important skills defined by state ELA standards.
What are scaled scores?
CommonLit reports students’ scores on a scale that runs from 150-250, with higher scores indicating higher student performance. To create scaled scores, the difficulty of items and student ability are estimated using Item Response Theory (IRT) methods, and then these student ability estimates are translated to the range of scaled scores. This process ensures that students’ scores reflect their ability levels given the difficulty level of the assessment they took; it allows assessment scores from different assessments to be directly compared because each assessment has been mapped back to the same scale.
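As a rough illustration of the idea, the sketch below shows how an ability estimate could be translated onto a 150-250 reporting scale. It is not CommonLit's actual scoring procedure: the article does not say which IRT model is used, so the one-parameter (Rasch) formula, the function names, and the `center` and `slope` values are all assumptions chosen only for illustration.

```python
import math

SCALE_MIN, SCALE_MAX = 150, 250  # reporting range stated in the article


def p_correct(theta: float, difficulty: float) -> float:
    """Rasch (1PL) probability that a student with ability `theta`
    answers an item of the given difficulty correctly (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def to_scaled_score(theta: float, center: float = 200.0, slope: float = 10.0) -> int:
    """Linearly map an ability estimate onto the reporting scale.

    `center` and `slope` are hypothetical values, not CommonLit's.
    Results are clamped so they stay within 150-250.
    """
    raw = center + slope * theta
    return int(round(min(SCALE_MAX, max(SCALE_MIN, raw))))


# A student of average ability (theta = 0) lands mid-scale under these assumptions.
print(to_scaled_score(0.0))
```

Because the same mapping is applied to every assessment, two students with the same estimated ability receive the same scaled score even if they took assessments of different difficulty.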
Scaled scores help teachers and administrators to identify which students may have struggled with the assessment(s) and which students performed well. These assessments also provide a way to measure students’ growth once they have taken two assessments in the series. The difference in students’ scores is discussed further in the “How can scores be compared across time?” section.
How should student scores be used?
CommonLit’s Assessment Series is designed to give educators a measure of their students’ reading ability levels at the time of the test. Because higher scaled scores indicate higher ability levels, students’ scores can be used to identify which students are high performers and which students may need more support or intervention.
A number of factors outside of the classroom can impact how well a student performs on an assessment (e.g., how well they slept, what they ate, social factors; Bandalos, 2018; Crocker & Algina, 2008), so students’ scores should be viewed as an estimate of their performance rather than a single source of truth. Students’ assessment scores serve as a checkpoint for teachers and administrators and are best used in conjunction with other information (e.g., other reading assessment scores, data from CommonLit lessons assigned throughout the school year) to make informed decisions about student progress.
Students’ assessment scores should not be used for other purposes, including to evaluate teacher performance. Students' scores on CommonLit Assessments are simply snapshots of their performance, so they should be used within the scope of assessing students’ knowledge and skills.
CommonLit recommends that each assessment in the series be administered in a consistent proctoring format and that students be encouraged to put forth their best effort. Students’ effort levels can affect whether their scores accurately represent their ability levels.
How can scores be compared across time?
Item Response Theory (IRT) methods allow assessment developers to map assessments within a grade back to the same scaled score metric. Because scaled scores take into account the difficulty level of each assessment, students’ scores on one assessment can be directly compared to their scores on another assessment in the series. In other words, if a student’s score increases significantly from the Pre-Assessment to the Post-Assessment and the assessments were given under consistent conditions, that change can be more directly attributed to an increase in the student’s reading ability.
Student scores can also fluctuate slightly between each administration of an assessment because of a number of factors (e.g., their level of concentration on a given day), so small changes in student scores from the Pre- to the Post-Assessment are not indicative of either growth or decline in student ability. Changes in scores are considered meaningful if they are beyond the standard error of measurement for each grade level. Standard errors vary by grade level, and CommonLit’s data displays indicate whether or not a student’s change in score was smaller than the standard error for that grade level. For instance, if the standard error is 5 points, and a student scored 10 points higher on the Post-Assessment than the Pre-Assessment, that represents meaningful growth. However, if the student only scored 3 points higher on the Post-Assessment, that change would not be meaningful.
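The rule described above can be sketched in a few lines. This is a minimal example rather than the logic behind CommonLit's data displays; the function name and the returned labels are assumptions, and "beyond the standard error" is read here as strictly greater than it.

```python
def classify_change(pre_score: float, post_score: float, standard_error: float) -> str:
    """Label a Pre- to Post-Assessment change relative to the standard error.

    A change counts as meaningful only if its size exceeds the standard
    error for the grade level (an assumed reading of "beyond").
    """
    change = post_score - pre_score
    if abs(change) <= standard_error:
        return "no meaningful change"
    return "meaningful growth" if change > 0 else "meaningful decline"


# The article's example: a standard error of 5 points.
print(classify_change(200, 210, 5))  # meaningful growth
print(classify_change(200, 203, 5))  # no meaningful change
```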
Students who score at the high end of the scale on the Pre-Assessment may not see big changes in their scores simply because they already demonstrated a strong understanding of the material measured by the assessment. It is also important to take into account the time between test administrations. If students take two assessments very close together, they might not have learned enough between tests for their new skills to be reflected in their scores. In order to measure growth, CommonLit recommends that assessments be scheduled at least three months apart for full-year classes and at least one month apart for semester-long classes. Assessments cannot be scheduled less than one month apart.
What is a percentile ranking and how should it be interpreted?
A percentile ranking provides a measure of how a student performed on the assessment relative to other students who took the same assessment at the same grade level and is based on students’ scaled scores. If a student’s percentile ranking is listed as 80, it means that the student performed as well as or better than 80% of all CommonLit students who took that assessment at that grade level. Students with high percentile rankings can be said to have performed well relative to other students while students with low percentiles performed lower than other students.
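To make the definition concrete, here is a small sketch that computes a percentile ranking as the share of scores in a comparison group that are at or below a student's scaled score. The function name, the rounding, and the sample scores are assumptions for illustration; CommonLit's exact calculation may differ.

```python
def percentile_rank(score: float, reference_scores: list[float]) -> int:
    """Percentage of scores in the comparison group that are less than
    or equal to `score`, rounded to a whole number.

    `reference_scores` stands in for the scaled scores of students who
    took the same assessment at the same grade level.
    """
    if not reference_scores:
        raise ValueError("percentile rank needs at least one reference score")
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return round(100 * at_or_below / len(reference_scores))


# Made-up comparison group of five scaled scores.
group = [180, 190, 200, 210, 220]
print(percentile_rank(200, group))  # 60
```

With such a small group the ranking moves in large steps, which is one reason rankings are only reported once enough students have taken the assessment.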
Please note that percentile rankings are only calculated after a sufficient number of students have taken the assessment and therefore might not be available right away when a student receives a score. They also may fluctuate slightly as more students take that assessment.
In addition to providing insight into how a student performed on a specific assessment, percentile rankings can also be useful when looking at students’ growth. If a student grew significantly from the Pre-Assessment to the Post-Assessment, but their percentile ranking stayed the same, their growth was similar to that of other CommonLit students at that grade level.
What other information is included in the Assessment Series data displays?
CommonLit’s data displays include details about individual students’ performance on each assessment and their growth from assessment to assessment.
For each assessment, teachers can see details about how each of their students performed, including their scaled scores and percentile rankings. CommonLit uses performance thresholds to group students as Below or Approaching Grade Level, On Grade Level, and Above Grade Level on teachers’ data reports. These groupings can be used to identify students who may need additional support. For more information on these performance groups, see the section “How does CommonLit define Below or Approaching Grade Level, On Grade Level, and Above Grade Level?”
Teachers can also see data about how their students performed on each of the assessed standards. Because CommonLit’s assessments are intentionally short, each standard is only assessed a few times. This prevents the use of IRT methods for scoring standards performance on a consistent scale. Instead, student performance is reported as the percentage of questions associated with each standard that the student answered correctly. Because the difficulty of questions varies across assessments, a student’s percentage on a standard can change for reasons unrelated to growth in overall reading ability. Therefore, students’ performance on each standard should always be considered alongside other standards data.
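The percentage-correct calculation for standards can be sketched as follows. The function name and the standard codes in the example are hypothetical and are used only to show the arithmetic.

```python
from collections import defaultdict


def percent_correct_by_standard(responses: list[tuple[str, bool]]) -> dict[str, float]:
    """Return the percentage of questions answered correctly per standard.

    Each response is a (standard_code, answered_correctly) pair.
    """
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for standard, is_correct in responses:
        totals[standard] += 1
        if is_correct:
            correct[standard] += 1
    return {s: 100 * correct[s] / totals[s] for s in totals}


# Hypothetical standard codes; each standard appears only a few times.
sample = [("RI.1", True), ("RI.1", False),
          ("RL.2", True), ("RL.2", True), ("RL.2", False)]
print(percent_correct_by_standard(sample))
```

With only two or three questions per standard, a single answer shifts the percentage by a large amount, which is why these figures are best read together with other data.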
CommonLit also provides data displays to help teachers understand how their students’ performance changes from the Pre-Assessment to the Post-Assessment as well as how their classes’ performance changed overall.
What does an asterisk next to a student’s score mean?
CommonLit strives to provide accurate and reliable information about student performance to teachers and administrators. Consequently, the scores of students who do not complete at least 80% of an assessment’s questions are flagged with an asterisk. CommonLit’s scoring procedure treats an unanswered question the same as an incorrect answer, so a student who skips many questions will have a low score, even if they could have answered those questions correctly. When interpreting these students’ assessment scores, teachers and administrators should use other data points to determine whether the scores accurately reflect these students’ abilities.
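The flagging rule can be illustrated with a short sketch. It counts raw correct answers only (the reported scaled score is derived separately), and the function name, the use of `None` for an unanswered question, and the sample answer key are assumptions made for this example.

```python
def score_with_completion_flag(responses, answer_key, min_completion_pct=80):
    """Return (number_correct, flagged) for one student's responses.

    An unanswered question is represented by None and is counted the
    same as a wrong answer. The result is flagged when fewer than
    `min_completion_pct` percent of the questions were answered.
    """
    if not answer_key or len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must be the same non-zero length")
    answered = sum(1 for r in responses if r is not None)
    number_correct = sum(1 for r, k in zip(responses, answer_key) if r == k and r is not None)
    # Integer comparison avoids floating-point edge cases at exactly 80%.
    flagged = answered * 100 < min_completion_pct * len(answer_key)
    return number_correct, flagged


key = list("ABCDABCDAB")
student = ["A", "B", "C", None, None, None, "C", "D", "A", "B"]
print(score_with_completion_flag(student, key))  # (7, True): only 70% answered
```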
How do I know that scores from CommonLit’s assessments are valid?
CommonLit works hard to ensure that scores produced by the Assessment Series are both reliable and valid. Before any students take CommonLit’s assessments, the assessment development team carefully screens passages to ensure they meet CommonLit’s high standards and do not contain any content that could potentially be biased toward certain groups of students. The team then writes questions and tests them in the field to make sure they meet strict performance criteria. Previous research has shown that scores on assessments developed by the team were highly associated with scores on standardized state tests, like the Florida Standards Assessment (FSA). Additional research is being conducted to evaluate how well students’ scores on the Assessment Series predict performance on standardized state tests.
Internally, CommonLit’s data analysts run a series of statistical analyses of student assessment results to ensure that students’ scores on the assessment are consistent and can be trusted. A reliability estimate above 0.8 suggests that students are answering questions correctly or incorrectly in consistent and predictable patterns, based on the difficulty of the questions. CommonLit’s Assessments consistently surpass that benchmark.
To estimate the difficulty and reliability of assessments, a sample of over 115,000 students from 3rd to 12th grade completed the assessment questions. The students who completed the assessments were from all 50 states and representative of the general population of US students on factors tracked by CommonLit. Responses from at least 1,500 students per set of assessment questions were collected, and the data were then screened for quality prior to running analyses.
How does CommonLit define Below or Approaching Grade Level, On Grade Level, and Above Grade Level?
What do the three performance categories mean?
The three performance categories on CommonLit Assessments are: Below or Approaching Grade Level, On Grade Level, and Above Grade Level. To establish these categories, a panel of teachers for each grade discussed the literacy skills and abilities students would be expected to have at the end of the academic year based on the Common Core English Language Arts Standards. Then, the panels mapped these skills and abilities to ranges of scaled scores on CommonLit’s assessments. More information about this process can be found in the “How did CommonLit come up with the definitions for the performance categories?” section.
Given that student scores demonstrate their skills and abilities, students will fall into a performance category based on their score on each assessment. Students with very high scores will fall into the “Above Grade Level” category because their scores suggest that they have surpassed what they should be able to do at the end of the year for that particular grade level. Students who are marked as “On Grade Level” have demonstrated that they currently know what they should know at the end of the year. Students who are marked as “Below or Approaching Grade Level” may need additional support to develop the skills they should have by the end of the year.
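The grouping described above amounts to comparing a scaled score against two cut scores. The sketch below shows that comparison only; the cut scores in the example (200 and 225) are invented for illustration, since the article does not publish the actual thresholds, and treating a score equal to a cut as belonging to the higher category is also an assumption.

```python
def performance_category(scaled_score: int, on_level_cut: int, above_level_cut: int) -> str:
    """Assign a performance category from a scaled score and two cut scores.

    The cut scores are supplied by the caller; the values used in the
    example below are hypothetical and vary by grade in practice.
    """
    if not 150 <= scaled_score <= 250:
        raise ValueError("scaled scores run from 150 to 250")
    if scaled_score >= above_level_cut:
        return "Above Grade Level"
    if scaled_score >= on_level_cut:
        return "On Grade Level"
    return "Below or Approaching Grade Level"


# Hypothetical cut scores for one grade.
print(performance_category(212, on_level_cut=200, above_level_cut=225))  # On Grade Level
```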
How did CommonLit come up with the definitions for the performance categories?
For grades 3-12, individual student performance standard settings were conducted during the summer of 2022. Panels of teachers with diverse expertise, experience, and demographics were recruited for each grade to participate in a Modified Angoff standard setting workshop series. During the standard setting process, the panelists determined what proficiencies and skills students would need to have to be considered Below or Approaching Grade Level, On Grade Level, and Above Grade Level, and created student profiles for each level of performance. Panelists then used these profiles to complete a rating exercise to map student performance expectations for each performance group to an operational assessment. The results of this exercise were then used to determine how scaled score performance on CommonLit assessments aligns with the student performance categories.