The CommonLit Assessment Series features three benchmark assessments for grades 3-12 that have been developed to align with key reading skills and standards. The assessments provide teachers, coaches, and administrators with checkpoints on student progress throughout the academic year.
What do the assessments cover?
Each CommonLit Assessment is composed of three reading passages: typically, two informational passages and one literary passage. Each passage is accompanied by a set of multiple-choice comprehension questions to measure students’ reading abilities. To ensure content validity, the development team evaluates how well the questions on each assessment cover key skills and standards. In other words, the team confirms that the questions cover the important skills defined by state ELA standards.
What are scaled scores?
CommonLit reports students’ scores on a scale that runs from 150-250, with higher scores indicating higher student performance. To create scaled scores, the difficulty of items and student ability are estimated using Item Response Theory (IRT) methods, and then these student ability estimates are translated to the range of scaled scores. This process ensures that students’ scores reflect their ability levels given the difficulty level of the assessment they took; it allows assessment scores from different assessments to be directly compared because each assessment has been mapped back to the same scale.
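As a rough illustration of the idea described above, the sketch below shows how an IRT ability estimate might be translated onto a 150-250 reporting scale. It is a minimal sketch, not CommonLit's actual scoring procedure: the source does not say which IRT model or scaling constants are used, so the one-parameter (Rasch) model and the `center` and `points_per_logit` values are assumptions chosen only for demonstration.

```python
import math

SCALE_MIN, SCALE_MAX = 150, 250  # reporting range stated in the text


def rasch_probability(ability, difficulty):
    """Chance of a correct answer under a one-parameter (Rasch) IRT model.

    The choice of the Rasch model is an assumption made for illustration.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def to_scaled_score(ability, center=200.0, points_per_logit=10.0):
    """Linearly map an ability estimate onto the 150-250 scale.

    `center` and `points_per_logit` are hypothetical constants, not
    CommonLit's published values. Results are clamped to the scale.
    """
    raw = center + points_per_logit * ability
    return int(min(SCALE_MAX, max(SCALE_MIN, round(raw))))
```

Under these assumed constants, a student whose ability estimate equals the average (0.0) would receive a scaled score of 200, and the same ability estimate maps to the same scaled score regardless of which assessment produced it.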
Scaled scores help teachers and administrators to identify which students may have struggled with the assessment(s) and which students performed well. These assessments also provide a way to measure students’ growth once they have taken both a Pre-Assessment and a Post-Assessment. The difference in students’ scores between the Pre-Assessment and Post-Assessment is discussed further in the “How can scaled scores be compared across time?” section.
How should students' scaled scores be used?
CommonLit’s Assessment Series is designed to give educators a measure of their students’ reading ability levels at the time of the test. Because higher scaled scores indicate higher ability levels, students’ scores can be used to identify which students are high performers and which students may need more support or intervention.
A number of factors outside of the classroom can impact how well a student performs on an assessment (e.g., how well they slept, what they ate, social factors; Bandalos, 2018; Crocker & Algina, 2008), so students’ scores should be viewed as an estimate of their performance rather than a single source of truth. Students’ assessment scores serve as a checkpoint for teachers and administrators and are best used in conjunction with other information (e.g., other reading assessment scores, data from CommonLit lessons assigned throughout the school year) to make informed decisions about student progress.
Students’ assessment scores should not be used for other purposes, including to evaluate teacher performance. Students' scores on CommonLit Assessments are simply snapshots of their performance, so they should be used within the scope of assessing students’ knowledge and skills.
CommonLit recommends that each assessment in the series be administered in a consistent proctoring format and that students be encouraged to put forth their best effort. Students’ effort levels can affect whether their scores accurately represent their ability levels.
How can scaled scores be compared across time?
Item Response Theory (IRT) methods allow assessment developers to map assessments within a grade back to the same scaled score metric. Because scaled scores take into account the difficulty level of each assessment, students’ scores on the Pre-Assessments and Post-Assessments can be directly compared. In other words, if a student’s score increases significantly from the Pre-Assessment to the Post-Assessment and the assessments were given under consistent conditions, that change can be more directly attributed to an increase in the student’s reading ability.
Student scores can also fluctuate slightly between administrations of an assessment because of a number of factors (e.g., their level of concentration on a given day), so small changes in student scores from the Pre- to the Post-Assessment are not indicative of either growth or decline in student ability. Changes in scores are considered meaningful if they are beyond the standard error of measurement for each grade level. Standard errors vary by grade level, and CommonLit’s data displays indicate whether or not a student’s change in score was smaller than the standard error for that grade level. For instance, if the standard error is 5 points, and a student scored 10 points higher on the Post-Assessment than the Pre-Assessment, then that represents meaningful growth. However, if the student only scored 3 points higher on the Post-Assessment, then that change falls within the standard error and would not be considered meaningful.
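The comparison against the standard error can be sketched in a few lines. This is an illustrative helper only: the function name and labels are hypothetical, and the standard error value must come from CommonLit's reporting for the relevant grade level.

```python
def classify_change(pre_score, post_score, standard_error):
    """Label a Pre-to-Post score change relative to the standard error.

    A change counts as meaningful only if it is larger than the standard
    error; the labels returned here are illustrative, not CommonLit's.
    """
    change = post_score - pre_score
    if abs(change) <= standard_error:
        return "within measurement error"
    return "meaningful growth" if change > 0 else "meaningful decline"


# The two cases from the text, with a standard error of 5 points:
print(classify_change(200, 210, 5))  # 10-point gain
print(classify_change(200, 203, 5))  # 3-point gain
```

With a standard error of 5, the 10-point gain is labeled as meaningful growth, while the 3-point gain is labeled as within measurement error.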
Students who score at the high end of the scale on the Pre-Assessment may not see big changes in their scores simply because they already demonstrated a strong understanding of the material measured by the assessment. It is also important to take into account the time between test administrations. If students take two assessments very close together, they might not have learned enough between tests for their new skills to be reflected in their scores. CommonLit recommends Post-Assessments be scheduled at least three months after Pre-Assessments for full-year classes and 1.5 months after Pre-Assessments for semester-long classes. Post-Assessments cannot be scheduled less than one month after the Pre-Assessment.
What is a percentile ranking and how should it be interpreted?
A percentile ranking provides a measure of how a student performed on the assessment relative to other students who took the same assessment at the same grade level and is based on students’ scaled scores. If a student’s percentile ranking is listed as 80, it means that the student performed as well as or better than 80% of all CommonLit students who took that assessment at that grade level. Students with high percentile rankings can be said to have performed well relative to other students while students with low percentiles performed lower than other students.
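The definition above ("performed as well as or better than") can be expressed as the share of scores in the comparison group that are at or below the student's score. The sketch below assumes the comparison group is simply a list of scaled scores from students who took the same assessment at the same grade level; how CommonLit assembles that group and rounds the result is not stated in the source.

```python
def percentile_rank(score, reference_scores):
    """Percent of reference scores that are at or below `score`.

    `reference_scores` is a hypothetical list of scaled scores from the
    same assessment and grade level. Rounding to a whole number is an
    assumption made for display purposes.
    """
    if not reference_scores:
        raise ValueError("percentile rank needs at least one reference score")
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return round(100 * at_or_below / len(reference_scores))


reference = [160, 170, 180, 190, 200, 210, 220, 230, 240, 250]
print(percentile_rank(230, reference))  # 8 of 10 scores are at or below 230
```

Because the result depends on the comparison group, it can shift slightly as more students' scores are added to the list.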
Please note that percentile rankings are only calculated after a sufficient number of students have taken the assessment and therefore might not be available right away when a student receives a score. They also may fluctuate slightly as more students take that assessment.
In addition to providing insight into how a student performed on a specific assessment, percentile rankings can also be useful when looking at students’ growth from the Pre-Assessment to the Post-Assessment. If a student grew significantly from the Pre-Assessment to the Post-Assessment, but their percentile ranking stayed the same, their growth was similar to that of other CommonLit students at that grade level.
What other information is included in the Assessment Series data displays?
CommonLit’s data displays include details about individual students’ performance on each assessment and their growth from the Pre-Assessment to the Post-Assessment.
For each assessment, teachers can see details about how each of their students performed, including their scaled scores and percentile rankings. CommonLit uses percentiles to group students as high, medium, low, and very low performers on teachers’ data reports. These groupings can be used to identify students who may need additional support. It is important to note that these groupings are based on each student’s performance relative to other students and do not indicate how they perform relative to their grade level expectations.
Teachers can also see data about how their students performed on each of the assessed standards. Because CommonLit’s assessments are intentionally short, each standard is only assessed a few times. This prevents the use of IRT methods for scoring standards performance on a consistent scale. Instead, student performance is reported as the percentage of questions associated with each standard answered correctly. Because the difficulty of questions varies across assessments, a student’s percentage on a standard can change for reasons unrelated to growth in their overall reading ability. Therefore, students’ performance on each standard should always be considered alongside other standards data.
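Percent-correct reporting by standard can be sketched as follows. The input format (pairs of a standard label and whether the answer was correct) and the standard labels themselves are hypothetical; they are not taken from CommonLit's data model.

```python
from collections import defaultdict


def percent_correct_by_standard(responses):
    """Return {standard: percent of its questions answered correctly}.

    `responses` is an iterable of (standard, is_correct) pairs, one pair
    per question the student was given. This layout is an assumption.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for standard, is_correct in responses:
        totals[standard] += 1
        if is_correct:
            correct[standard] += 1
    return {s: 100.0 * correct[s] / totals[s] for s in totals}


# Hypothetical standard labels, for illustration only.
responses = [
    ("RI.1", True), ("RI.1", False),
    ("RL.2", True), ("RL.2", True), ("RL.2", True), ("RL.2", False),
]
print(percent_correct_by_standard(responses))
```

With only two to four questions per standard, a single answer moves the percentage by 25 to 50 points, which is why these figures are best read alongside other standards data.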
CommonLit also provides data displays to help teachers understand how their students’ performance changes from the Pre-Assessment to the Post-Assessment as well as how their classes’ performance changed overall.
What does an asterisk next to a student’s scaled score mean?
CommonLit strives to provide accurate and reliable information about student performance to teachers and administrators. Consequently, students who do not complete at least 80% of an assessment’s questions will be flagged using an asterisk. CommonLit’s scoring procedure treats an unanswered question the same as an incorrect answer, so a student who skips many questions will have a low score, even if they might have answered those questions correctly had they attempted them. When interpreting these students’ assessment scores, teachers and administrators should use other data points to determine whether or not the scores accurately reflect these students’ abilities.
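The flagging rule can be illustrated with a short sketch. The function and field names are hypothetical, and the count of correct answers shown here is a simple tally rather than CommonLit's IRT-based scaled score; only the 80% threshold and the treatment of unanswered questions as incorrect come from the text.

```python
def score_summary(answers, answer_key, min_completion_percent=80):
    """Tally correct answers and flag low completion.

    `answers` uses None for an unanswered question. An unanswered
    question never matches the key, so it counts as incorrect.
    """
    total = len(answer_key)
    answered = sum(1 for a in answers if a is not None)
    num_correct = sum(1 for a, k in zip(answers, answer_key) if a == k)
    # Integer comparison avoids floating-point rounding at exactly 80%.
    flagged = answered * 100 < min_completion_percent * total
    return {"num_correct": num_correct, "answered": answered, "flagged": flagged}


key = list("ABCDABCDAB")
partial = ["A", "B", "C", "D", "A", "B", "C", None, None, None]
print(score_summary(partial, key))  # 7 of 10 answered, so the score is flagged
```

In this example the student answered every attempted question correctly but still receives a low tally, which is the situation the asterisk is meant to draw attention to.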
How should the Mid-Year Assessment data be interpreted?
The Mid-Year Assessment is designed to give teachers a snapshot of students’ reading performance in the middle of the academic year or term. It is not intended to measure students’ growth from the Pre-Assessment. Instead, Mid-Year Assessment results are designed to give detailed information about how students are performing at that point in the year.
The Mid-Year Assessment data display includes the same data about student performance as the Pre-Assessment, including percentiles to show how students performed compared to other students taking the same CommonLit Mid-Year Assessment. Mid-Year Assessment scores and percentile rankings can be used in conjunction with data from the Pre-Assessment and other reading assessments to identify which students are doing well and who might need additional support.
Why is a student’s scaled score listed as a range?
In some cases, CommonLit’s IRT scoring methods require presenting a range of potential scores for students who achieve the highest possible score on an assessment. This means that if a student earned the highest possible score on that assessment, their ability level lies somewhere between the highest score on that assessment and the highest possible scaled score (e.g., a score between 245 and 250).
How do I know CommonLit’s assessments are valid?
CommonLit works hard to ensure that scores produced by the Assessment Series assessments are both reliable and valid. Before any students take CommonLit’s assessments, the assessment development team carefully screens passages to ensure they meet CommonLit’s high standards and do not contain any content that could potentially be biased toward certain groups of students. The team then writes and tests questions in the field to make sure they meet strict performance criteria. Previous research has shown that scores on assessments developed by the team were highly associated with scores on standardized state tests, such as the Florida Standards Assessment (FSA). Additional research is being conducted to evaluate how well students’ scores on the Assessment Series predict performance on standardized state tests.
Internally, CommonLit’s data analysts run a series of statistical analyses of student assessment results to ensure that students’ scores on the assessment are consistent and can be trusted. A reliability estimate above 0.8 suggests that students are answering questions correctly or incorrectly in consistent and predictable patterns, based on the difficulty of the questions. CommonLit’s Pre- and Post-Assessments consistently surpass that benchmark.
To estimate the difficulty and reliability of assessments, a sample of over 90,000 students from 3rd-12th grade completed the assessment questions. The students who completed the assessments were from all 50 states and representative of the general population of U.S. students on factors tracked by CommonLit. Responses from at least 1,000 students per set of assessment questions were collected, and the data was then screened for quality prior to running analyses.