
Please note that CRESST reports were called "CSE Reports" or "CSE Technical Reports" prior to CRESST report 723.

#840 – Limited-Information Goodness-of-Fit Testing of Diagnostic Classification Item Response Theory Models
Mark Hansen, Li Cai, Scott Monroe, and Zhen Li
#839 – A New Statistic for Evaluating Item Response Theory Models for Ordinal Data
Li Cai and Scott Monroe


We propose a new limited-information goodness-of-fit test statistic, C2, for ordinal IRT models. The construction of the new statistic lies formally between the M2 statistic of Maydeu-Olivares and Joe (2006), which utilizes first- and second-order marginal probabilities, and the M2* statistic of Cai and Hansen (2013), which collapses the marginal probabilities into means and product moments. Unlike M2*, C2 may be computed even when the number of items is small and the number of categories is large. It is as well calibrated as the alternatives and can be more powerful than M2. When all items are dichotomous, C2 becomes equivalent to M2*, which is also equivalent to M2. We analyze empirical data from a patient-reported outcomes measurement development project to illustrate the potential differences in substantive conclusions that one may draw from the use of different statistics for model fit assessment.
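
The first- and second-order marginal probabilities that the M2 family of statistics is built on can be tabulated directly from response data. The sketch below (illustrative only, not the authors' code) shows those raw ingredients; the statistics themselves additionally require model-implied probabilities and their derivatives, which are omitted here.

```python
import numpy as np

def marginal_proportions(resp, n_cat):
    """Observed first- and second-order marginal proportions from an
    n_persons x n_items matrix of ordinal responses coded 0..n_cat-1.
    These margins are the data side of limited-information fit
    statistics such as M2; statistics like M2* and C2 differ in how
    far they collapse them (hypothetical helper, for illustration)."""
    n, p = resp.shape
    # univariate margins: uni[i, k] = P(X_i = k)
    uni = np.stack([(resp == k).mean(axis=0) for k in range(n_cat)], axis=1)
    # bivariate margins: biv[(i, j)][k, l] = P(X_i = k, X_j = l), i < j
    biv = {}
    for i in range(p):
        for j in range(i + 1, p):
            table = np.zeros((n_cat, n_cat))
            for k in range(n_cat):
                for l in range(n_cat):
                    table[k, l] = np.mean((resp[:, i] == k) & (resp[:, j] == l))
            biv[(i, j)] = table
    return uni, biv
```

Collapsing each bivariate table into a product moment (as M2* does) loses information when items have many categories; retaining more of the table, as C2 does, is what allows it to remain computable and powerful in the small-items, many-categories setting.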

#838 – Identifying Common Mathematical Misconceptions from Actions in Educational Video Games
Deirdre Kerr


Educational video games provide an opportunity for students to interact with and explore complex representations of academic content and allow for the examination of problem-solving strategies and mistakes that can be difficult to capture in more traditional environments. However, data from such games are notoriously difficult to analyze. This study used a three-step process to examine mistakes students make while playing an educational video game about the identification of fractions. First, cluster analysis was used to identify common misconceptions in the game. Second, a survey was given to determine if the identified in-game misconceptions represented real-world misconceptions. Third, a second educational video game was analyzed to determine whether the same misconceptions would be identified in both games. Results indicate that the in-game misconceptions identified in this study represented real-world misconceptions and demonstrate that similar misconceptions can be found in different representations.
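
The report does not specify which clustering algorithm or feature coding was used, but the first step can be pictured with a generic sketch: represent each student as a vector of in-game error indicators and group students whose mistakes pattern together. This is plain Lloyd's k-means over assumed features, not the study's actual pipeline.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means over per-student feature vectors
    (e.g., one row per student, one column per in-game error type).
    Clusters of students who make the same errors are candidate
    misconceptions. Illustrative only; the report's clustering
    choices are not documented here."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # assign each student to the nearest cluster center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned students
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers
```

Steps two and three of the study (the survey and the second game) then check whether such clusters correspond to real-world misconceptions rather than artifacts of one game's interface.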

#837 – Dynamic Bayesian Network Modeling of Game-Based Diagnostic Assessments
Roy Levy


Digital games offer an appealing environment for assessing student proficiencies, including skills and misconceptions in a diagnostic setting. This paper proposes a dynamic Bayesian network modeling approach for observations of student performance from an educational video game. A Bayesian approach to model construction, calibration, and use in facilitating inferences about students on the fly is described.
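
The "on the fly" inference in a dynamic Bayesian network amounts to forward filtering: after each observed game action, update the belief about the student's latent state, then apply a transition model before the next action. A minimal two-state version (mastered vs. not, in the spirit of Bayesian knowledge tracing) is sketched below; the paper's networks are richer, and all parameter values here are invented for illustration.

```python
def dbn_filter(obs, p_init=0.3, p_learn=0.2, p_guess=0.2, p_slip=0.1):
    """Forward filtering in a minimal two-state dynamic Bayesian
    network over a sequence of correct/incorrect observations.
    p_init:  prior probability the skill is mastered
    p_learn: chance an unmastered student learns between tasks
    p_guess/p_slip: observation noise. All values are illustrative,
    not taken from the report."""
    p = p_init
    history = []
    for correct in obs:
        # observation update (Bayes rule)
        like_mastered = 1 - p_slip if correct else p_slip
        like_unmastered = p_guess if correct else 1 - p_guess
        post = (p * like_mastered) / (p * like_mastered + (1 - p) * like_unmastered)
        # transition: unmastered students may learn before the next task
        p = post + (1 - post) * p_learn
        history.append(p)
    return history
```

The same filtering recursion extends to multiple interdependent skills and misconception nodes, which is where the diagnostic payoff of the full network lies.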

#836 – Automatically Scoring Short Essays for Content
Deirdre Kerr, Hamid Mousavi, and Markus R. Iseli


The Common Core assessments emphasize short essay constructed-response items over multiple-choice items because they are more precise measures of understanding. However, such items are too costly and time-consuming to be used in national assessments unless a way is found to score them automatically. Current automatic essay scoring techniques are inappropriate for scoring the content of an essay because they rely on either grammatical measures of quality or machine learning techniques, neither of which identifies statements of meaning (propositions) in the text. In this report, we explain our process of (1) extracting meaning from student essays in the form of propositions using our text mining framework called SemScape, (2) using the propositions to score the essays, and (3) testing our system’s performance on two separate sets of essays. Results demonstrate the potential of this purely semantic process and indicate that the system can accurately extract propositions from student short essays, approaching or exceeding standard benchmarks for scoring performance.
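
Once propositions have been extracted, the scoring step (step 2 above) can be as simple as matching them against the propositions an expert expects in a correct response. The extraction itself, done by the SemScape framework in the report, is far harder and is assumed away here; the triples below are hypothetical.

```python
def proposition_score(extracted, expected):
    """Content score as the fraction of expected propositions,
    represented as (subject, relation, object) triples, that were
    recovered from a student essay. This illustrates only the
    matching/scoring step; real systems also need fuzzy matching
    for paraphrases, which is omitted here."""
    found = set(extracted)
    hits = sum(1 for prop in expected if prop in found)
    return hits / len(expected)

# hypothetical rubric for a water-cycle prompt
expected = [("water", "evaporates_into", "vapor"),
            ("vapor", "condenses_into", "clouds")]
# propositions extracted from one student's essay
extracted = [("water", "evaporates_into", "vapor"),
             ("sun", "heats", "water")]
score = proposition_score(extracted, expected)  # 1 of 2 matched -> 0.5
```

Because the score is defined over statements of meaning rather than surface features, it measures content directly, which is the contrast the abstract draws with grammar-based and machine-learned scoring.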

#835 – The Effect of In-Game Errors on Learning Outcomes
Deirdre Kerr and Gregory K.W.K. Chung


Student mathematical errors are rarely random and often occur because students are applying procedures that they believe to be accurate. Traditional approaches often view such errors as indicators of students’ failure to understand the construct in question, but some theorists view errors as opportunities for students to expand their mental models and create a deeper understanding of the construct. This study examines errors in an educational video game that are indicative of two specific misunderstandings students have about fractions (unitizing and partitioning) in order to determine whether the occurrence of those errors makes students more likely to learn from the game or more likely to be confused by the game. Analysis indicates that students who made unitizing errors were more likely to be confused by the game, while students who made partitioning errors were more likely to learn from the game.

#834 – Estimation of a Ramsay-Curve Item Response Theory Model by the Metropolis-Hastings Robbins-Monro Algorithm
Scott Monroe and Li Cai


In Ramsay curve item response theory (RC-IRT, Woods & Thissen, 2006) modeling, the shape of the latent trait distribution is estimated simultaneously with the item parameters. In its original implementation, RC-IRT is estimated via Bock and Aitkin’s (1981) EM algorithm, which yields maximum marginal likelihood estimates. This method, however, does not produce the parameter covariance matrix as an automatic byproduct upon convergence. In turn, researchers are limited in when they can employ RC-IRT, as the covariance matrix is needed for many statistical inference procedures. The present research remedies this problem by estimating the RC-IRT model parameters by the Metropolis-Hastings Robbins-Monro (MH-RM, Cai, 2010) algorithm. An attractive feature of MH-RM is that the structure of the algorithm makes estimation of the covariance matrix convenient. Additionally, MH-RM is ideally suited for multidimensional IRT, whereas EM is limited by the “curse of dimensionality.” Based on the current research, when RC-IRT or similar semi-nonparametric IRT models are eventually generalized to include multiple latent dimensions, MH-RM would appear to be the logical choice for estimation.
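
The Robbins-Monro half of MH-RM is a stochastic approximation recursion: parameters are nudged by a noisy update direction with gains that shrink over iterations, so the noise averages out. The toy below applies the recursion to mean estimation rather than the RC-IRT likelihood; in the real algorithm the update direction is a complete-data gradient computed from Metropolis-Hastings draws of the latent traits.

```python
import random

def robbins_monro(sample, theta0=0.0, n_steps=5000, seed=1):
    """Robbins-Monro stochastic approximation:
        theta_{k+1} = theta_k + gamma_k * H_k,  gamma_k = 1/k.
    Here H_k = sample() - theta_k, a noisy gradient whose root is
    the mean of the sampling distribution. In MH-RM (Cai, 2010),
    H_k is instead a complete-data score evaluated at Metropolis-
    Hastings imputations of the latent variables. Toy sketch only."""
    random.seed(seed)
    theta = theta0
    for k in range(1, n_steps + 1):
        gamma = 1.0 / k  # decaying gain: later noisy updates move theta less
        theta += gamma * (sample() - theta)
    return theta

# toy target: recover the mean of noisy draws around 2.5
est = robbins_monro(lambda: random.gauss(2.5, 1.0))
```

The same shrinking-gain structure is why the algorithm also yields the parameter covariance matrix conveniently: the accumulated update information approximates the observed information as a byproduct of iteration.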

#833 – Estimation of Contextual Effects through Nonlinear Multilevel Latent Variable Modeling with a Metropolis-Hastings Robbins-Monro Algorithm
Ji Seung Yang and Li Cai


The main purpose of this study is to improve estimation efficiency in obtaining full-information maximum likelihood (FIML) estimates of contextual effects in the framework of a nonlinear multilevel latent variable model by adopting the Metropolis-Hastings Robbins-Monro algorithm (MH-RM; Cai, 2008, 2010a, 2010b). Results indicate that the MH-RM algorithm can produce FIML estimates and their standard errors efficiently, and the efficiency of MH-RM was more prominent for a cross-level interaction model, which requires five-dimensional integration. Simulations, with various sampling and measurement structure conditions, were conducted to obtain information about the performance of nonlinear multilevel latent variable modeling compared to traditional hierarchical linear modeling. Results suggest that nonlinear multilevel latent variable modeling can more properly estimate and detect a contextual effect and a cross-level interaction than the traditional approach. As empirical illustrations, two subsets of data extracted from the Programme for International Student Assessment (PISA, 2000; OECD, 2000) were analyzed.
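
A contextual effect is the coefficient on the group mean of a student-level predictor, over and above the student's own value. The sketch below shows the simple observed-mean version of that regression for orientation; the report's point is that treating the group mean as a latent variable (estimated via MH-RM) avoids the bias that observed means introduce when groups are small or the predictor is measured with error.

```python
import numpy as np

def contextual_effect(y, x, group):
    """OLS for a basic contextual-effects model: regress y on the
    student-level predictor x and its observed group mean xbar_j.
    The coefficient on xbar_j is the contextual effect. Illustrative
    observed-mean version only; the report instead uses a nonlinear
    multilevel latent variable model."""
    groups, idx = np.unique(group, return_inverse=True)
    gmean = np.array([x[idx == j].mean() for j in range(len(groups))])
    xbar = gmean[idx]  # expand group means back to student level
    X = np.column_stack([np.ones_like(x), x, xbar])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta  # [intercept, within-group effect, contextual effect]
```

In school-effects research the classic example is the "big-fish-little-pond" pattern: a student's own achievement and the school's mean achievement can have effects of opposite sign, which is exactly what the contextual coefficient isolates.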

#832 – Treatment Confounded Missingness: A Comparison of Methods for Addressing Censored or Truncated Data in School Reform Evaluations
Jordan H. Rickles, Mark Hansen, and Jia Wang


In this paper we examine ways to conceptualize and address potential bias that can arise when the mechanism for missing outcome data is at least partially associated with treatment assignment, an issue we refer to as treatment confounded missingness (TCM). In discussing TCM, we bring together concepts from the methodological literature on missing data, mediation, and principal stratification. We use a pair of simulation studies to demonstrate the main biasing properties of TCM and test different analytic approaches for estimating treatment effects given this missing data problem. We also demonstrate TCM and the different analytic approaches with empirical data from a study of a traditional high school that was converted to a charter school. The empirical illustration highlights the need to investigate possible TCM bias in high school intervention evaluations, where there is often an interest in studying the effects of an intervention or reform on both school persistence and academic achievement.
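
The core biasing mechanism of TCM can be demonstrated in a few lines: if the treatment changes who stays in school, the outcome is observed for a different mix of students in each arm, and a complete-case comparison is biased even when the true effect is zero. The simulation below is a toy with invented numbers, not one of the paper's simulation studies.

```python
import random

def simulate_tcm(n=20000, seed=0):
    """Toy demonstration of treatment-confounded missingness (TCM).
    The true treatment effect on achievement is zero, but low-ability
    control students tend to leave school (so their outcomes are
    missing), inflating the observed control mean. The naive
    complete-case estimate is therefore biased downward. All
    parameter values are illustrative."""
    random.seed(seed)
    treated_obs, control_obs = [], []
    for _ in range(n):
        ability = random.gauss(0, 1)
        treated = random.random() < 0.5
        y = ability  # outcome; true treatment effect is zero
        if treated:
            observed = True  # the reform retains (nearly) all students
        else:
            # low-ability control students often leave; outcome missing
            observed = ability > -0.5 or random.random() < 0.2
        if observed:
            (treated_obs if treated else control_obs).append(y)
    mean = lambda v: sum(v) / len(v)
    # naive complete-case effect estimate (negatively biased here)
    return mean(treated_obs) - mean(control_obs)
```

This is why the paper stresses examining persistence and achievement jointly: an intervention that improves persistence can look harmful for achievement under complete-case analysis precisely because it kept weaker students in the observed sample.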

#831 – Automatic Short Essay Scoring Using Natural Language Processing to Extract Semantic Information in the Form of Propositions
Deirdre Kerr, Hamid Mousavi, and Markus R. Iseli


The Common Core assessments emphasize short essay constructed-response items over multiple-choice items because they are more precise measures of understanding. However, such items are too costly and time-consuming to be used in national assessments unless a way to score them automatically can be found. Current automatic essay-scoring techniques are inappropriate for scoring the content of an essay because they rely on either grammatical measures of quality or machine learning techniques, neither of which identifies statements of meaning (propositions) in the text. In this report, we introduce a novel technique for using domain-independent, deep natural language processing techniques to automatically extract meaning from student essays in the form of propositions and match the extracted propositions to the expected response. The empirical results indicate that our technique is able to accurately extract propositions from student short essays, reaching moderate agreement with human rater scores.