Please note that CRESST reports were called "CSE Reports" or "CSE Technical Reports" prior to CRESST report 723.

#848 – --
#847 – Measuring the Causal Effect of the National Math + Science Initiative’s College Readiness Program
Richard S. Brown and Kilchan Choi


This study employs a potential outcomes modeling approach to estimate the causal effect of the National Math + Science Initiative’s College Readiness Program on Advanced Placement test taking and qualifying score earning for three recent cohorts of schools. Results indicate substantial and significant increases in both AP test taking and qualifying score earning for all students. In addition, significant effects for AP test taking and qualifying score earning over baseline were found for female students and minority students when analyzed separately. This study provides evidence of the effectiveness of a College Readiness Program that is having a significant and important impact on preparing more students to succeed in math and science careers and improve the future of math and science education in this country.

#846 – The Implementation and Effects of the Literacy Design Collaborative (LDC): Early Findings in Sixth-Grade Advanced Reading Courses
Joan L. Herman, Scott Epstein, Seth Leon, Yunyun Dai, Deborah La Torre Matrundola, Sarah Reber, and Kilchan Choi


The Bill and Melinda Gates Foundation invested in the Literacy Design Collaborative (LDC) as one strategy to support teachers’ and students’ transition to the Common Core State Standards (CCSS) in English language arts. This report provides an early look at the implementation of LDC in sixth-grade Advanced Reading classes in a large Florida district, and the effectiveness of the intervention in this setting. The study found that teachers understood LDC and implemented it with fidelity and that curriculum modules were well crafted. Teachers also generally reported positive attitudes about the effectiveness of LDC and its usefulness as a tool for teaching CCSS skills. Although implementation results were highly positive, quasi-experimental analyses employing matched control group and regression discontinuity designs found no evidence of an impact of LDC on student performance on state reading or district writing assessments. Furthermore, students generally performed at basic levels on assessments designed to align with the intervention, suggesting the challenge of meeting CCSS expectations. Exploratory analyses suggest that LDC may have been most effective for higher achieving students. However understandable, the findings thus suggest that, in the absence of additional scaffolding and supports for low-achieving students, LDC may be gap enhancing.

#845 – The Implementation and Effects of the Mathematics Design Collaborative (MDC): Early Findings From Kentucky Ninth-Grade Algebra 1 Courses
Joan L. Herman, Deborah La Torre Matrundola, Scott Epstein, Seth Leon, Yunyun Dai, Sarah Reber, and Kilchan Choi


With support from the Bill and Melinda Gates Foundation, researchers and experts in mathematics education developed the Mathematics Design Collaborative (MDC) as a strategy to support the transition to Common Core State Standards in math. MDC provides short formative assessment lessons known as Classroom Challenges for use in middle and high school math classrooms. UCLA CRESST’s study of ninth-grade Algebra 1 classrooms in Kentucky implementing MDC showed strong support from teachers for the intervention and a statistically significant positive impact on student scores on the PLAN Algebra assessment, as compared to similar students statewide in Kentucky.

#844 – Semi-Parametric Item Response Functions in the Context of Guessing
Carl F. Falk and Li Cai


We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally efficient strategy for choosing the order of the polynomial is demonstrated and tested.
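
As a rough illustration of the idea in this abstract, the sketch below compares the standard three-parameter logistic (3PL) response function with a logistic of a monotonic cubic that keeps the same lower asymptote. The cubic form and the names `irf_monotonic_cubic`, `b0`, `b1`, and `b3` are assumptions made for this example; the report's own polynomial parameterization and its estimation procedure are not reproduced here.

```python
import math


def _logistic(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def irf_3pl(theta, a, b, c):
    """Standard 3PL: slope a, intercept b, lower (guessing) asymptote c."""
    return c + (1.0 - c) * _logistic(a * theta + b)


def irf_monotonic_cubic(theta, b0, b1, b3, c):
    """Logistic of a monotonic cubic in theta with lower asymptote c.

    Hypothetical parameterization for illustration only: requiring
    b1 > 0 and b3 >= 0 keeps the derivative b1 + 3*b3*theta**2 positive,
    so the response function is increasing in theta.
    """
    if b1 <= 0 or b3 < 0:
        raise ValueError("need b1 > 0 and b3 >= 0 for monotonicity")
    m = b0 + b1 * theta + b3 * theta ** 3
    return c + (1.0 - c) * _logistic(m)


if __name__ == "__main__":
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        p3 = irf_3pl(theta, a=1.2, b=0.3, c=0.2)
        pm = irf_monotonic_cubic(theta, b0=0.3, b1=1.2, b3=0.4, c=0.2)
        print(f"theta={theta:+.1f}  3PL={p3:.3f}  cubic={pm:.3f}")
```

With `b3 = 0` the cubic version reduces to the 3PL, which is one way to see the polynomial as adding flexibility on top of the familiar model.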

#843 – Is There A Magnet School Effect? Using Meta-Analysis To Explore Variation In Magnet School Success
Jia Wang, Jonathan D. Schweig, and Joan L. Herman


Magnet schools are one of the largest sectors of choice schools in the United States. In this study, we explored whether there is heterogeneity in magnet school effects on student achievement by examining the effectiveness of 24 recently funded magnet schools in 5 school districts across 4 states. We used a two-step analysis: First, separate magnet school effects were estimated using a propensity score matched regression approach to address selection bias. Second, the magnet effects were synthesized across schools using a multi-level random-effects meta-analytic framework. Results indicated that there is significant variation in magnet school effects on student outcomes, with some magnet schools showing positive effects, and others showing negative effects. This variation can be explained by program implementation and magnet support.
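
To make the second step more concrete, here is a minimal sketch of random-effects pooling using the DerSimonian-Laird moment estimator. This is a simpler, single-level stand-in for the multi-level framework the abstract mentions, not the authors' model. The function name `dersimonian_laird` and the example numbers are hypothetical and are not taken from the report.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-school effect estimates under a random-effects model.

    effects   -- estimated effect for each school
    variances -- sampling variance of each estimate
    Returns (pooled_effect, standard_error, tau2, q), where tau2 is the
    estimated between-school variance and q is Cochran's Q statistic.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effects with matching variances")
    k = len(effects)

    # Fixed-effect (inverse-variance) weights and weighted mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))

    # Method-of-moments estimate of between-school variance, floored at 0.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau2 to each sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return pooled, se, tau2, q


if __name__ == "__main__":
    # Made-up effects and variances, purely for illustration.
    print(dersimonian_laird([0.10, -0.05, 0.30, 0.00], [0.01, 0.02, 0.015, 0.01]))
```

A `tau2` well above zero corresponds to the kind of between-school variation in effects that the abstract reports.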

#842 – Student Growth Percentiles Based on MIRT: Implications of Calibrated Projection
Scott Monroe, Li Cai, and Kilchan Choi


This research concerns a new proposal for calculating student growth percentiles (SGP; Betebenner, 2009). In Betebenner (2009), quantile regression (QR) is used to estimate the SGPs. However, measurement error in the score estimates, which always exists in practice, leads to bias in the QR-based estimates (Shang, 2012). One way to address this issue is to estimate the SGPs using a modeling framework that can directly account for the measurement error. Multidimensional IRT (MIRT) is one such framework, and the one utilized here. To maximize the generality of the approach, the SNP-MIRT model (Monroe, 2014), which estimates the shape of the latent variable density, is used to obtain model parameter estimates. These estimates are then used with the calibrated projection linking methodology (Thissen, Varni, et al., 2011; Thissen, Liu, Magnus, & Quinn, 2014; Cai, in press-a; Cai, in press-b) to produce SGP estimates. The methods are compared using simulated and empirical data.

#841 – The Effects of Math Video Games on Learning: A Randomized Evaluation Study with Innovative Impact Estimation Techniques
Gregory K. W. K. Chung, Kilchan Choi, Eva L. Baker, and Li Cai


A large-scale randomized controlled trial tested the effects of researcher-developed learning games on a transfer measure of fractions knowledge. The measure contained items similar to those on standardized assessments. Thirty treatment and 29 control classrooms (approximately 1,500 students, 9 districts, 26 schools) participated in the study. Students in treatment classrooms played fractions games and students in the control classrooms played equation-solving games. Multilevel multidimensional item response theory modeling of the outcome measure produced scaled scores that were more sensitive to the instructional treatment than standard measurement approaches. Hierarchical linear modeling of the scaled scores showed that the treatment condition scored significantly higher on the outcome measure than the control condition. The effect (d = 0.58) was medium to large (Cohen, 1992).
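
For readers unfamiliar with the effect size metric, the sketch below computes Cohen's d for two independent groups as the mean difference divided by the pooled standard deviation. It is a textbook two-group calculation on made-up scores. The study's d = 0.58 came from hierarchical linear modeling of scaled scores, which this example does not attempt to reproduce, and the function name `cohens_d` is chosen only for illustration.

```python
import math
import statistics


def cohens_d(treatment, control):
    """Cohen's d for two independent samples using the pooled SD."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance returns the sample variance (n - 1 denominator).
    v1 = statistics.variance(treatment)
    v2 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical scores, not data from the study.
    print(cohens_d([2, 4, 6], [1, 3, 5]))
```

Under Cohen's (1992) conventions, values near 0.5 are usually read as medium and values near 0.8 as large, which is why 0.58 is described as medium to large.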

#840 – Limited-Information Goodness-of-Fit Testing of Diagnostic Classification Item Response Theory Models
Mark Hansen, Li Cai, Scott Monroe, and Zhen Li


Although diagnostic classification models have become increasingly popular in educational and psychological measurement, one noted limitation has been a lack of established methods for evaluating their goodness-of-fit. This is a significant problem, since inferences made on the basis of these models (including classification of examinees) may be invalid when the models are badly misspecified. This study examines the potential utility of two indices for testing the fit of these models: Maydeu-Olivares and Joe's (2006) M2 and Chen and Thissen's (1997) local dependence (LD) chi-square. These two indices have been previously applied to standard item response theory models. Here we evaluate their performance when applied to diagnostic classification models. Both were found to have good calibration under a wide range of data generating conditions and were sensitive to some, but not all, types of misspecification examined. The study suggests that M2 and the LD index may be quite useful for detecting and characterizing diagnostic classification model misfit.

#839 – A New Statistic for Evaluating Item Response Theory Models for Ordinal Data
Li Cai and Scott Monroe


We propose a new limited-information goodness-of-fit test statistic C2 for ordinal IRT models. The construction of the new statistic lies formally between the M2 statistic of Maydeu-Olivares and Joe (2006), which utilizes first and second order marginal probabilities, and the M2* statistic of Cai and Hansen (2013), which collapses the marginal probabilities into means and product moments. Unlike M2*, C2 may be computed even when the number of items is small and the number of categories is large. It is as well calibrated as the alternatives and can be more powerful than M2. When all items are dichotomous, C2 becomes equivalent to M2*, which is also equivalent to M2. We analyze empirical data from a patient-reported outcomes measurement development project to illustrate the potential differences in substantive conclusions that one may draw from the use of different statistics for model fit assessment.