Reports
Please note that CRESST reports were called "CSE Reports" or "CSE Technical Reports" prior to CRESST report 723.
#759 – Evaluation of the WebPlay Arts Education Program: Findings from the 2006–07 School Year
Noelle Griffin, Jinok Kim, Youngsoon So, Vivian Hsu
CRESST Report 759, 2009
Summary
This report presents results from the second year of CRESST’s three-year evaluation of the WebPlay program, an online-enhanced arts education program for K–12 students. The evaluation occurred during the three-year implementation of the program in Grades 3 and 5 in California schools; this report focuses on results from the second year of implementation, 2006–07. Results show that WebPlay participation was significantly related to positive educational engagement and attitudes. On California Standards Test (CST) English Language Arts (ELA) scores, no overall WebPlay effect was found, but a significant difference emerged for limited English proficiency (LEP) students. The results support the conclusion that a well-designed, theater-based education program can improve student engagement, and that it may have academic benefits in language arts, particularly for students who are struggling with English proficiency.
#603 – Impact of Student Language Background on Content-Based Performance: Analyses of Extant Data
Jamal Abedi
CSE Report 603, 2003
Summary
We analyzed existing test data and student background data from four different school sites nationwide to examine whether standardized test results may be confounded by the limited language proficiency of English language learners. Several analyses comparing the performance of limited English proficient (LEP) students and their non-LEP classmates revealed major differences. A Disparity Index was created to measure the performance gap between LEP and non-LEP students on tests with varying levels of language demand: the more linguistically complex the test, the greater the Disparity Index favoring non-LEP students over LEP students. This suggests that high-language-load test items in assessments of content such as math and science may act as a source of measurement error. LEP students’ scores on standardized assessments also tended to show lower internal consistency, again suggesting that item language load may interfere with measuring the intended constructs. Using multiple regression, multivariate analysis of variance, and canonical correlation, we found that the greater the language load of a test, the stronger the confounding between LEP status and content-based performance on that test. Structural models for LEP student results showed poorer statistical fit among test items, as well as between items and total test scores. Factor loadings were generally lower for LEP students, and the correlations between the latent content-based variables were weaker as well. Results of our analyses indicate that:
1. English language proficiency level is associated with performance on content-based assessments.
2. There is a performance gap in content assessment between LEP and non-LEP students.
3. The performance gap between LEP and non-LEP students increases as the language load of the assessment tools increases.
4. Test items high in language complexity may be sources of measurement error.
5. Performance on content-based assessments may be confounded with English language proficiency level.
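A gap measure like the Disparity Index lends itself to a simple computation. Below is a minimal sketch in Python, assuming the index expresses the non-LEP/LEP gap in mean scores as a percentage of the LEP mean; the formula details, scores, and subtest labels are illustrative assumptions, not values taken from the report.

```python
# Minimal sketch of a Disparity Index computation.
# Assumption: the index is the non-LEP/LEP gap in mean scores,
# expressed as a percentage of the LEP mean; the report summary
# above does not reproduce the exact formula.

def disparity_index(non_lep_scores, lep_scores):
    """Percentage by which the non-LEP mean exceeds the LEP mean."""
    mean_non_lep = sum(non_lep_scores) / len(non_lep_scores)
    mean_lep = sum(lep_scores) / len(lep_scores)
    return 100.0 * (mean_non_lep - mean_lep) / mean_lep

# Hypothetical subtest means: reading carries a heavier language load
# than math computation, so its index should come out larger.
reading_di = disparity_index([52, 58, 61, 55], [40, 44, 47, 43])
math_di = disparity_index([50, 54, 57, 53], [47, 50, 52, 49])
print(f"Reading DI: {reading_di:.1f}%  Math DI: {math_di:.1f}%")
```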
#519 – Student Assessment and Student Achievement in the California Public School System
Joan Herman, Richard S. Brown, and Eva Baker
CSE Report 519, 2000
Summary
More than fifteen years ago, a prominent national commission declared us a nation at educational risk, noting "a rising tide of mediocrity that threatens our very future as a nation" (National Commission on Excellence in Education, 1983). A decade later, California received its own special wake-up call when results from the 1990 and 1992 National Assessment of Educational Progress (NAEP) state-by-state comparisons revealed that California students were scoring near the bottom nationally in eighth-grade mathematics and fourth-grade reading. What of the situation today? How are California's students faring? Are our students making progress toward the rigorous standards that have been established for their performance? Are our schools improving? Are they better preparing our students for future success? As we strive toward excellence, who is being helped most, and who least, by California's educational system?
Answers to these seemingly simple, bottom-line questions are complex to formulate, made more so by the history and current status of the state's assessment system, the nature of other available indicators of educational quality, and the imprecision of all assessments. In this report, the authors provide a context for examining the progress of students and schools by reviewing California's recent testing history and the state's progress in creating a sound, standards-based assessment system. Then, they review available data about student performance, examining how schools are doing and the factors that most influence assessment results, and close with a discussion of the goals of accountability and standards by which such systems should be judged.
#455 – On the Validity of Concept Map-Based Assessment Interpretations: An Experiment Testing the Assumption of Hierarchical Concept Maps in Science
Maria Araceli Ruiz-Primo, Richard Shavelson, and Susan Elise Shultz
CSE Report 455, 1997
Summary
In recent years, concept maps have been increasingly considered as a supplement to traditional multiple-choice tests for classroom and large-scale assessment use. But little research exists to guide important decisions about using concept maps as assessments. In this study, CRESST/Stanford University researchers investigated (1) the impact of imposing a hierarchical structure on students' representations of their knowledge, (2) the consistency of raters in scoring concept maps, and (3) whether concept maps and multiple-choice tests provide similar information about students' declarative knowledge.
Using random assignment of high school classes to two chemistry topics, the researchers found high levels of rater agreement (above .90) in scoring the concept map tasks. Correlations between multiple-choice test and concept map scores across types of scores were all positive and moderate (r = .31 on average), suggesting that the two types of assessments measure overlapping but somewhat different aspects of student knowledge. Results were inconclusive on whether different concept mapping techniques would produce the same information about student knowledge.
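The two statistics reported here, interrater agreement and the map/multiple-choice correlation, are straightforward to compute. A minimal sketch follows, using invented scores for ten students rather than the study's data.

```python
import numpy as np

# Sketch of the two reliability/validity checks described above,
# with hypothetical scores (not the study's data).
rater1 = np.array([12, 15, 9, 18, 14, 11, 16, 13, 10, 17])
rater2 = np.array([13, 15, 8, 18, 13, 12, 16, 14, 10, 16])
mc_test = np.array([22, 25, 18, 28, 21, 20, 27, 24, 17, 26])

# Interrater agreement: correlation between two raters' map scores
# (the study reports agreement above .90).
rater_agreement = np.corrcoef(rater1, rater2)[0, 1]

# Convergence with the multiple-choice test: a moderate positive
# correlation (the study averages r = .31) suggests overlapping but
# not identical constructs.
map_scores = (rater1 + rater2) / 2  # average across raters
mc_correlation = np.corrcoef(map_scores, mc_test)[0, 1]

print(f"rater agreement r = {rater_agreement:.2f}")
print(f"map vs. multiple-choice r = {mc_correlation:.2f}")
```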
#781 – Evaluation of Seeds of Science/Roots of Reading: Effective Tools For Developing Literacy Through Science in the Early Grades – Light Energy Unit
Pete Goldschmidt, Hyekyung Jung
CRESST Report 781, January 2011
Summary
This evaluation focuses on the Seeds of Science/Roots of Reading: Effective Tools for Developing Literacy through Science in the Early Grades (Seeds/Roots) model of science-literacy integration. Quantitative results indicate that the Seeds/Roots intervention produced statistically significant and substantively meaningful gains in student performance in science content, vocabulary, and writing. Qualitative results indicate that teachers overwhelmingly found the Seeds/Roots unit usable, effective, and engaging.
#400 – Portfolio-Driven Reform: Vermont Teachers' Understanding of Mathematical Problem Solving and Related Changes in Classroom Practice
Brian M. Stecher and Karen J. Mitchell
CSE Report 400, 1995
Summary
The Vermont portfolio assessment program, conclude the authors of the report, has had substantial positive effects on fourth-grade teachers' perceptions and practices in mathematics.
"Through the Vermont state training materials and network meetings," say CRESST/RAND researchers Brian Stecher and Karen Mitchell, "teachers have incorporated problem solving into their curriculum and have gained a greater insight into teaching problem-solving skills."
However, the researchers add that teachers do not yet share a common understanding of mathematical problem solving and have not reached agreement on the most essential skills to be taught. The emphasis on the scoring rubrics, for example, has helped teachers focus on some of the important and observable aspects of students' problem solving, but may have caused teachers to neglect other important problem-solving skills not addressed in the rubrics. Additionally, significant variability exists in teaching methods: some teachers "preteach" portfolio tasks by assigning similar, simpler problems before students work on portfolio pieces, so that assessment problems are not overly novel or difficult for students. Such differential help may threaten the validity of portfolio scores for comparisons across students, classrooms, or schools.
The Vermont Department of Education, conclude the authors, should orient existing professional teacher development programs towards increasing teachers' basic understanding of mathematical problem solving and related instructional practices.
Effects of Introducing Classroom Performance Assessments on Student Learning is one of the first empirical examinations of the link between student achievement and performance assessment. Achievement results were compared for treatment and control schools matched on demographic and socioeconomic factors. Assessments from the Maryland State Department of Education were selected as independent measures of student performance in both sets of classrooms because they are closer to standardized tests than many performance assessments are, yet still markedly different from traditional standardized tests.
"Our concluding advice," write the researchers, "is that reformers take seriously the current rhetoric about `delivery standards' and the need for sustained professional development to implement a thinking curriculum. The changes that did occur...confirm our beliefs that many more students can develop conceptual understandings presently exhibited by only the most able students-if only they are exposed to relevant problems and given the opportunity-to-learn."
#529 – Validating Standards-Referenced Science Assessments
Bokhee Yoon and Michael J. Young
CSE Report 529, 2000
Summary
As standards with accompanying assessments are proposed and developed in various states and large districts as instruments for raising academic achievement, the validity of these standards-referenced assessments in shaping educational reform demands attention. In this paper, we examine the construct validity of the New Standards middle school Science Reference Examination, focusing on evidence related to the internal and external structure of the assessment, the reliability of the assessment scores, and the generalizability of the assessment results. The data were taken from the field test of spring 1998. Results related to the internal structure of the assessment suggest that although the assessment tasks measured a single common factor, this did not detract from the usefulness of scientific thinking or science concept subscores for instructional purposes. With respect to the external structure of the assessment, moderate correlations between the New Standards total scores and the Stanford Achievement Test (9th edition) and Otis-Lennon School Aptitude Test (7th edition) scores provided evidence that the scores from these assessments rank student performance in similar ways; however, these correlations do not indicate that the assessments measure the same construct. As evidence for the reliability of the assessment scores and decisions based on them, the results of the generalizability studies imply that reader variance could be made negligible by training readers with well-defined scoring rubrics. The high rates of decision consistency and accuracy at different total score cutpoints provide evidence that the New Standards Science Reference Examination can be used reliably to classify student performance on the basis of a total test score. For subscores, providing a single cutpoint as a reference point for meeting the standards would be instructionally informative.
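The generalizability analysis mentioned above separates student (true-score) variance from reader variance. A minimal sketch of a fully crossed persons-by-readers G-study follows, using invented ratings rather than the field-test data; the variance-component formulas are the standard ones for a one-facet crossed design.

```python
import numpy as np

# Sketch of a persons-x-readers generalizability study with
# hypothetical ratings (not the New Standards field-test data).
# scores[p, r] = reader r's score for student p
scores = np.array([
    [4.0, 4.0, 5.0],
    [2.0, 3.0, 2.0],
    [5.0, 5.0, 5.0],
    [3.0, 2.0, 3.0],
    [4.0, 3.0, 4.0],
])
n_p, n_r = scores.shape
grand = scores.mean()

# Two-way ANOVA (no replication) sums of squares and mean squares
ss_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum()
ss_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_r
ms_p = ss_p / (n_p - 1)
ms_r = ss_r / (n_r - 1)
ms_res = ss_res / ((n_p - 1) * (n_r - 1))

# Variance components: student, reader, and residual
var_res = ms_res
var_p = max((ms_p - ms_res) / n_r, 0.0)
var_r = max((ms_r - ms_res) / n_p, 0.0)

# Generalizability coefficient for an average over n_r readers;
# a small reader component is what "negligible reader variance" means.
g_coef = var_p / (var_p + var_res / n_r)
print(f"student var {var_p:.3f}, reader var {var_r:.3f}, "
      f"residual {var_res:.3f}, g = {g_coef:.3f}")
```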
#810 – Relationships Among and Between ELL Status, Demographic Characteristics, Enrollment History, and School Persistence
Jinok Kim
CRESST Report 810, December 2011
Summary
This report examines enrollment history, achievement gaps, and persistence in school for ELL students and reclassified ELL students as compared to non-ELL students. The study uses statewide individual-level data sets, merged from students’ entry into a state public school system to their exit, for the graduating cohorts of 2006, 2007, and 2008. Analytic methods include multilevel logistic regression, in which students are nested within districts, to study correlates of dropping out. The results reconfirm other literature showing large achievement and socioeconomic gaps between ELL and non-ELL students. Results also show that, after accounting for academic achievement, behavioral issues, background, and district contexts, the longer a student is designated as an ELL, the more likely he or she is to drop out of school.
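As a rough illustration of the analytic setup, the sketch below fits a two-level logistic regression with a random intercept for district, in the spirit of the analysis described above. The data are simulated and the variable names are hypothetical; this is not the statewide data set or the report's model specification.

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# Simulated students nested in districts (hypothetical variables).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "district": rng.integers(0, 40, n),      # 40 districts
    "achievement": rng.normal(0, 1, n),      # standardized test score
    "years_ell": rng.integers(0, 10, n),     # years designated as ELL
})
district_effect = rng.normal(0, 0.5, 40)[df["district"]]
logit = -2.0 - 0.6 * df["achievement"] + 0.15 * df["years_ell"] + district_effect
df["dropout"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Random intercept for district; fixed effects for achievement and
# time spent designated as an ELL.
model = BinomialBayesMixedGLM.from_formula(
    "dropout ~ achievement + years_ell",
    vc_formulas={"district": "0 + C(district)"},
    data=df,
)
result = model.fit_vb()  # variational Bayes fit
print(result.summary())
```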
#788 – IES Integrated Learning Assessment Final Report
David Silver, Mark Hansen, Joan Herman, Yael Silk, and Cynthia L. Greenleaf
CRESST Report 788, March 2011
Summary
The main purpose of this study was to examine the effects of the Reading Apprenticeship professional development program on several teacher and student outcomes, including effects on student learning. A key part of the study was the use of an enhanced performance assessment program, the Integrated Learning Assessment (ILA), to measure student content understanding. The ILA instruments included multiple components that assessed student content knowledge, reading comprehension, metacognition, use of reading strategies, and writing skills in applying knowledge. An analysis of student scores using the ILA found little or no significant effect of the Reading Apprenticeship program on class-level student outcomes. However, the researchers found a significant positive effect on teachers' literacy instruction.
#608 – Effectiveness and Validity of Accommodations for English Language Learners in Large-Scale Assessments
Jamal Abedi, Mary Courtney, and Seth Leon
CSE Report 608, 2003
Summary
As the population of English language learners (ELLs) in U.S. public schools continues to grow, issues concerning their instruction and assessment are steadily among the top national priorities in education. The goal of this study was to examine the effectiveness, validity, and feasibility of selected language accommodations for ELL students on large-scale science assessments. In addition, student background variables were studied to judge the impact of such variables on student test performance.
Both ELL and non-ELL students in Grades 4 and 8 were tested in science under accommodation or under a standard testing condition. Language accommodation strategies (Customized English Dictionary, Bilingual/English Glossary, and Linguistic Modification of test items) were selected based on frequency of usage, nationwide recognition, feasibility, and first-language literacy factors. Students were sampled from different language and cultural backgrounds. We also included a measure of English reading proficiency to control for any initial differences in reading ability.
The effectiveness of accommodation for Grade 8 students differed from the findings for Grade 4 students. In Grade 8, the Linguistic Modification accommodation helped ELL students increase their performance, while the accommodated performance of non-ELL students was unchanged. The non-significant impact of the linguistically modified test on the non-ELL group supports the validity of this accommodation. As for feasibility, this accommodation requires up-front preparation but is easy to implement in the field; it is therefore feasible for large-scale assessments.
In general, accommodations did not have a significant impact on students’ performance in Grade 4. We believe this may be because of the lower language demand in the lower grades. As grade level increases, more complex language may interfere with content-based assessment. Though language factors still have an impact on the assessment of ELL students in lower grades, other factors such as poverty and parent education may be more powerful predictors of students’ performance there. Another consideration is that Grade 4 students may be less familiar with glossary and dictionary use, as well as less exposed to science.
The lack of significant impact on Grade 4 non-ELL students is an encouraging result because it suggests that the accommodation did not alter the construct under measurement.
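The effectiveness/validity logic described here maps naturally onto an interaction test: an accommodation is effective if it raises ELL scores and valid if it leaves non-ELL scores unchanged, which shows up as a significant ELL-by-accommodation interaction. The sketch below illustrates this with simulated data and hypothetical variable names, not the study's data or analysis code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated scores under a 2x2 design: ELL status x accommodation.
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "ell": rng.integers(0, 2, n),           # 1 = ELL student
    "accommodated": rng.integers(0, 2, n),  # 1 = linguistically modified test
})
df["score"] = (
    50
    - 8 * df["ell"]                          # baseline ELL gap
    + 5 * df["ell"] * df["accommodated"]     # boost for ELL students only
    + rng.normal(0, 6, n)
)

# The C(ell):C(accommodated) interaction term captures whether the
# accommodation helps ELL students while leaving non-ELL scores flat.
fit = smf.ols("score ~ C(ell) * C(accommodated)", data=df).fit()
print(fit.summary().tables[1])
```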