Reports
Please note that CRESST reports were called "CSE Reports" or "CSE Technical Reports" prior to CRESST report 723.
#723 – Recommendations for Building a Valid Benchmark Assessment System: Interim Report to the Jackson Public Schools
David Niemi, Julia Vallone, Jia Wang, Noelle Griffin
CRESST Report 723, 2007
Summary
Many districts and schools across the U.S. have begun to develop and administer assessments that complement state testing systems and provide additional information for monitoring curriculum, instruction, and schools. Ahead of this trend, the Jackson Public Schools (JPS) district has had a benchmark testing system in place for many years. To complement and enhance the capabilities of district and school staff, the Stupski Foundation and CRESST (the National Center for Research on Evaluation, Standards, and Student Testing at UCLA) reached an agreement for CRESST to provide expert review and recommendations to improve the technical quality of the district's benchmark tests. This report, the first of two deliverables on the project, focuses on assessment development and is consistent with the district's primary goal of increasing the assessments' ability to predict students' state test performance, as well as its secondary goals.
#396 – How "Messing About" with Performance Assessment in Mathematics Affects What Happens in Classrooms
Roberta J. Flexer, Kate Cumbo, Hilda Borko, Vicky Mayfield, Scott F. Marion
CSE Report 396, 1995
Summary
When provided adequate staff development and administrative support, teachers will adopt performance assessment and new instructional methods in their classrooms, conclude the authors of How "Messing About" with Performance Assessment in Mathematics Affects What Happens in Classrooms. The researchers conducted an in-depth qualitative study in three urban Denver schools, tape recording, transcribing, and coding 15 mathematics workshops and interviewing all project teachers. "In short, the introduction of performance assessment," says Roberta Flexer, "provides teachers with richer instructional goals than mere computation and raises their expectations of what their students can accomplish in mathematics and what [teachers] can learn about their students." The researchers also found that teachers significantly shifted their instructional practices when exposed to performance assessment. Even some of the most text-dependent teachers began to change the way they taught mathematics: "We found holes in the [mathematics] textbook," said one teacher, "so we used a variety of resources in order to build a unit around probability and statistics." Many teachers felt that students were learning more, even if this learning was not necessarily borne out by performance on the Maryland performance tasks. "...I just think they understand it [mathematics] more," said one teacher. "It is not just rote memorization; they really know what it means when you say 20 times 80 even if they don't know the answer...There is a much deeper understanding." Teachers felt, at times, overwhelmed in trying to implement new assessments and instruction in both reading and mathematics, and it remains unknown whether the changes will be long term. But the study provides further evidence that performance assessment can lead to an integration of instruction with assessment, to more hands-on and problem-based activities aligned with the NCTM standards, and to greater academic challenges for both teachers and students.
#766 – Examining the Effectiveness and Validity of Glossary and Read-Aloud Accommodations for English Language Learners in a Math Assessment
Mikyung Kim Wolf, Jinok Kim, Jenny C. Kao, Nichole M. Rivera
CRESST Report 766, November 2009
Summary
Glossaries and the reading aloud of test items are allowed for ELL students under many states' accommodation policies for large-scale mathematics assessments. However, little empirical research has been conducted on the effects of these two accommodations on ELL students' test performance, and no research has examined how students actually use the accommodations they are given. The present study employed a randomized experimental design and a think-aloud procedure to investigate the effects of the two accommodations. A total of 605 ELL and non-ELL students from two states participated in the experimental component, and a subset of 68 ELL students participated in the think-aloud component of the study. Results showed no significant effect of the glossary accommodation and mixed effects of the read-aloud accommodation on ELL students' performance: read aloud had a significant effect for the ELL sample in one state but not the other. Significant interaction effects between students' prior content knowledge and the accommodations were also found, suggesting that an accommodation was effective for students who had acquired the relevant content knowledge. In the think-aloud analysis, students did not actively use the provided glossary, indicating a lack of familiarity with the accommodation. Implications for the effective use of accommodations and future research agendas are discussed.
To cite from this report, please use the following as your APA reference:
Wolf, M. K., Kim, J., Kao, J. C., & Rivera, N. M. (2009). Examining the effectiveness and validity of glossary and read-aloud accommodations for English language learners in a math assessment (CRESST Report 766). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).
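As an illustrative aside on the interaction finding above: an accommodation-by-prior-knowledge interaction is typically tested with a regression term crossing the two predictors. The sketch below uses hypothetical data and a simple OLS model; the column names and specification are assumptions for illustration, not the authors' actual analysis (which may have used multilevel methods).

```python
# Minimal sketch of testing an accommodation-by-prior-knowledge
# interaction. Data and model are illustrative assumptions, not the
# report's analysis.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per student.
df = pd.DataFrame({
    "score":           [12, 15, 9, 18, 11, 20, 8, 17],  # math test score
    "accommodation":   [0, 1, 0, 1, 0, 1, 0, 1],        # 1 = read aloud
    "prior_knowledge": [10, 11, 6, 16, 7, 18, 5, 15],   # pretest score
})

# The accommodation:prior_knowledge term tests whether the
# accommodation's effect depends on prior content knowledge.
model = smf.ols("score ~ accommodation * prior_knowledge", data=df).fit()
print(model.params)  # includes the interaction coefficient
print(model.pvalues["accommodation:prior_knowledge"])
```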
#385 – The Evolution of a Portfolio Program: The Impact and Quality of the Vermont Program in Its Second Year (1992-1993)
Daniel Koretz, Brian Stecher, Stephen Klein, and Daniel McCaffrey
CSE Report 385, 1994
Summary
Part of an ongoing evaluation of the Vermont portfolio assessment program by RAND/CRESST researchers, this report presents recent analyses of the reliability of Vermont portfolio scores and the results of school principal interviews and teacher questionnaires. The message, especially from Vermont teachers, say the researchers, remains mixed. Math teachers, for example, have modified their curricula and teaching practices to emphasize problem solving and mathematical communication skills, but many feel they are doing so at the expense of other areas of the curriculum. About one half of the teachers report that student learning has improved, but an equal number feel that there has been no change. Teachers also reported great variation in how portfolios were implemented in their classrooms, including in the amount of assistance provided to students. "One in four teachers," found the authors, "does not assist his or her own students in revisions, and a similar proportion does not permit students to help each other. Seventy percent of fourth-grade teachers and thirty-nine percent of eighth-grade teachers forbid parental or other outside assistance." Consequently, students who receive more support from teachers, parents, and other students may have a significant advantage over students who receive little or no outside help. Reliability problems continue. "The degree of agreement," wrote the authors, "among Vermont's portfolio raters was much lower than among raters in studies with other types of constructed response measures." The authors suggest that one cause of the low reliability was the diversity of tasks within each portfolio: because teachers and students are free to select their own pieces, performance on the tasks is much more difficult to assess than if the work were standardized. Despite these problem areas, support for the portfolio program remains high. Teachers, for example, expressed strong support for expanding portfolios to all grade levels, and seventy percent of principals said that their schools had extended portfolio use beyond the original Vermont state mandate.
#703 – The Nature and Impact of Teachers’ Formative Assessment Practices
Joan L. Herman, Ellen Osmundson, Carlos Ayala, Stephen Schneider, Mike Timms
CSE Report 703, 2006
Summary
Theory and research suggest the critical role that formative assessment can play in student learning. The use of assessment to guide instruction has long been advocated: through the assessment of students' needs and the monitoring of student progress, learning sequences can be appropriately designed, instruction adjusted during the course of learning, and programs refined to better promote student learning goals. In more modern pedagogical conceptions, assessment shifts from being an information source on which to base action to being part and parcel of the teaching and learning process. This study provides food for thought about the research methods needed to study teachers' assessment practices and about the complexity of assessing their effects on student learning. On the one hand, the study suggests that effective formative assessment is a highly interactive endeavor, involving the orchestration of multiple dimensions of practice, and demands sophisticated qualitative methods of study. On the other, detecting and understanding learning effects in small samples, even with comparison groups available, poses difficulties, to say the least.
#823 – On the Road to Assessing Deeper Learning: The Status of Smarter Balanced and PARCC Assessment Consortia
Joan Herman and Robert Linn
CRESST Report 823, January 2013
Summary
Two consortia, the Smarter Balanced Assessment Consortium (Smarter Balanced) and the Partnership for Assessment of Readiness for College and Careers (PARCC), are currently developing comprehensive, technology-based assessment systems to measure students’ attainment of the Common Core State Standards (CCSS). The consequences of the consortia assessments, slated for full operation in the 2014/15 school year, will be significant. The assessments themselves and their results will send powerful signals to schools about the meaning of the CCSS and what students know and are able to do. If history is a guide, educators will align curriculum and teaching to what is tested, and what is not assessed largely will be ignored. Those interested in promoting students’ deeper learning and development of 21st century skills thus have a large stake in trying to ensure that consortium assessments represent these goals.
Funded by the William and Flora Hewlett Foundation, UCLA’s National Center for Research on Evaluation, Standards, and Student Testing (CRESST) is monitoring the extent to which the two consortia’s assessment development efforts are likely to produce tests that measure and support goals for deeper learning. This report summarizes CRESST findings thus far, describing the evidence-centered design framework guiding assessment development for both Smarter Balanced and PARCC as well as each consortium’s plans for system development and validation. The report also provides an initial evaluation of the status of deeper learning represented in both consortia’s plans.
Study results indicate that PARCC and Smarter Balanced summative assessments are likely to represent important goals for deeper learning, particularly those related to mastering and being able to apply core academic content and cognitive strategies related to complex thinking, communication, and problem solving. At the same time, the report points to the technical, fiscal, and political challenges that the consortia face in bringing their plans to fruition.
#820 – Validating Measures of Algebra Teacher Subject Matter Knowledge and Pedagogical Content Knowledge
Rebecca E. Buschang, Gregory K.W.K. Chung, Girlie C. Delacruz, and Eva L. Baker
CRESST Report 820, September 2012
Summary
The purpose of this study was to validate inferences about scores from one task designed to measure subject matter knowledge and three tasks designed to measure aspects of pedagogical content knowledge. Evidence for the validity of those inferences was based on two expectations: first, if the tasks were sensitive to expertise, we would find group differences; second, tasks that measured similar types of knowledge would correlate strongly, while tasks that measured different types of knowledge would correlate weakly. We recruited and assessed four groups of participants: 46 experienced algebra teachers (2+ years' experience), 17 novice algebra teachers (0-2 years' experience), 10 teaching experts, and 13 subject matter experts. Results indicate that one task differentiated among levels of expertise and measured several aspects of the knowledge needed to teach algebra. Results also highlight the need for future studies to use a combination of tasks to accurately measure the different aspects of teacher knowledge.
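As an illustrative aside, the study's second expectation is a convergent/discriminant pattern: correlations among task scores should be high for tasks tapping similar knowledge and low otherwise. A minimal sketch of checking that pattern follows; the task names and scores are hypothetical assumptions, not the study's data.

```python
# Illustrative sketch of the convergent/discriminant logic: tasks
# measuring similar knowledge should correlate strongly, dissimilar
# tasks weakly. Scores are hypothetical, not the study's data.
import pandas as pd

scores = pd.DataFrame({
    "subject_matter": [24, 30, 18, 27, 21],  # subject matter knowledge task
    "pck_errors":     [14, 18, 10, 16, 12],  # diagnosing student errors (PCK)
    "pck_explain":    [13, 17, 11, 15, 12],  # explaining a concept (PCK)
})

# Pairwise Pearson correlations; under this logic, the two PCK tasks
# should correlate more strongly with each other than with the
# subject matter task.
print(scores.corr().round(2))
```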
#391 – A First Look: Are Claims for Alternative Assessment Holding Up?
Joan Herman, Davina Klein, Tamela Heath, and Sara Wakai
CSE Report 391, 1994
Summary
Drawing on data from student surveys, demographic data, interviews with students and teachers, and structured classroom observations, CRESST researchers studied teachers and students who participated in the 1993 California Learning Assessment System (CLAS) test in mathematics. Among the key findings: alternative assessments stimulate student thinking and problem solving, and students understand that something different and more rigorous is required by open-ended versus multiple-choice questions. "This is not to say," write CRESST researchers Joan Herman, Davina Klein, Tamela Heath, and Sara Wakai, "that students 'like' open-ended items more than multiple choice ones." In fact, the authors add, students prefer multiple-choice problems, perhaps because they are familiar with these types of problems and because they think they perform better on standardized tests. But the results tend to support the idea that students learn from performance assessments. One of the major research questions was whether students in different types of schools have an equal opportunity to learn (OTL) the material being assessed. Researchers surveyed students on a variety of OTL indicators, including whether they had adequate access to calculators, opportunities to work on problems that can be solved in more than one way, and problems that required them to explain their thinking. Surprisingly, the researchers found that urban school students had equal access to many OTL resources, such as calculators and a curriculum that went beyond standard "drill and kill" instruction. More problematic was the finding that urban students tended to have more questions about key concepts in mathematical thinking and less access to current textbooks than their suburban counterparts. Finally, interviews and surveys indicated that suburban students clearly felt better prepared than either urban or rural students for the CLAS assessments. The authors note that their results are preliminary and that the next part of the study will include actual student performance on the CLAS tests. "We will be looking more closely," say the researchers, "at the interrelationships among and between student demographics, instructional practices, attitudes, and performance."
#352 – Collaborative Group Versus Individual Assessment in Mathematics: Group Processes and Outcomes
Noreen Webb
CSE Report 352, 1993
Summary
Several states, such as Connecticut and California, are attempting to incorporate group assessment into their large-scale testing programs. One intention of such efforts is to use scores from group assessments as indicators of individual performance. A key technical question for such assessments, however, is to what extent scores on a group assessment actually represent individual performance or knowledge. This study by UCLA professor and CRESST researcher Noreen Webb sheds some light on that question. Webb gave two seventh-grade classes an initial mathematics test as a group assessment, in which exchange of information and assistance was common. Several weeks later, she administered a nearly identical test to the same students as individuals, with no assistance permitted. The results showed that some students' performance dropped significantly from the group assessment to the individual test. These students apparently depended on the resources of the group to get correct answers, and when the same resources were not available during the individual test, many of them could not solve the problems. Webb concluded: "Scores from a group assessment may not be valid indicators of some students' individual competence. Furthermore, achievement scores from group assessment contexts provide little information about group functioning." Webb's study suggests that states and school districts that intend to assign individual scores based on group assessments may want to seriously rethink their plans.
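As an illustrative aside, the core comparison Webb describes (each student's group-assessment score against that same student's later individual score) is a paired comparison. A minimal sketch follows, on hypothetical scores rather than the study's data; the paired t-test is one conventional way to test such a drop, not necessarily Webb's analysis.

```python
# Illustrative sketch of comparing each student's group score with
# his or her later individual score. Scores are hypothetical.
from scipy import stats

group_scores      = [8, 9, 7, 10, 9, 8, 10, 7]  # with group assistance
individual_scores = [8, 6, 7, 7, 9, 5, 10, 4]   # same students, alone

# A paired test asks whether the drop from group to individual
# administration is systematic rather than chance.
t, p = stats.ttest_rel(group_scores, individual_scores)
print(f"t = {t:.2f}, p = {p:.3f}")
```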
#672 – Assessing Academic Rigor in Mathematics Instruction: The Development of the Instructional Quality Assessment Toolkit
Melissa Boston and Mikyung Kim Wolf
CSE Report 672, 2006
Summary
An assessment tool that measures the quality of instruction is necessary for an informative accountability system in education. Such a tool should be capable of characterizing the quality of teaching and learning that occurs in actual classrooms, schools, and districts. The purpose of this paper is to describe the development of the Academic Rigor in Mathematics (AR-Math) rubrics of the Instructional Quality Assessment Toolkit and to share findings from a small pilot study conducted in the spring of 2003. The study examined the instructional quality of mathematics programs in elementary classrooms in two urban school districts, assessing the reliability of the AR-Math rubrics, the rubrics' ability to distinguish important differences between districts, the relationships among rubric dimensions, and the generalizability of the assignment collection. Overall, exact reliability ranged from poor to fair, though 1-point reliability was excellent. Even with the small sample size, the rubrics were capable of detecting differences in students' opportunities to learn mathematics in the two districts. The paper concludes by suggesting how the AR-Math rubrics might serve as professional development tools for mathematics teachers.
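The "exact" and "1-point" reliability figures refer to how often raters agree on the rubric scale. As an illustrative aside (not part of the report), the sketch below computes both statistics on hypothetical ratings; the numbers are assumptions, not the pilot-study data.

```python
# Minimal sketch of the two agreement statistics mentioned above:
# "exact" agreement (raters assign the same rubric score) versus
# "1-point" agreement (scores differ by at most one point).
# Ratings are hypothetical, not the pilot-study data.

rater_a = [3, 2, 4, 1, 3, 2, 4, 3]
rater_b = [3, 3, 4, 2, 2, 2, 3, 3]

pairs = list(zip(rater_a, rater_b))
exact  = sum(a == b for a, b in pairs) / len(pairs)
one_pt = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)

print(f"exact agreement: {exact:.0%}")    # same score
print(f"1-point agreement: {one_pt:.0%}")  # within one rubric point
```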