Reports
Please note that CRESST reports were called "CSE Reports" or "CSE Technical Reports" prior to CRESST report 723.
#517 – The Role of Classroom Assessment in Teaching and Learning
Lorrie Shepard
CSE Report 517, 2000
Summary
Historically, because of their technical requirements, educational tests of any importance were seen as the province of statisticians and not that of teachers or subject matter specialists. Researchers conceptualizing effective teaching did not assign a significant role to assessment as part of the learning process. The past three volumes of the Handbook of Research on Teaching, for example, did not include a chapter on classroom assessment or even its traditional counterpart, tests and measurement. Achievement tests were addressed in previous handbooks, but only as outcome measures in studies of teaching behaviors. In traditional educational measurement courses, preservice teachers learned about domain specifications, item formats, and methods for estimating reliability and validity. Few connections were made in subject matter methods courses to suggest ways that testing might be used instructionally. Subsequent surveys of teaching practice showed that teachers had little use for statistical procedures and mostly devised end-of-unit tests aimed at measuring declarative knowledge of terms, facts, rules, and principles (Fleming & Chambers, 1983).
The purpose of this chapter is to develop a framework for understanding a reformed view of assessment, where assessment plays an integral role in teaching and learning. If assessment is to be used in classrooms to help students learn, it must be transformed in two fundamental ways. First, the content and character of assessments must be significantly improved. Second, the gathering and use of assessment information and insights must become a part of the ongoing learning process. The model I propose is consistent with current assessment reforms being advanced across many disciplines (e.g., International Reading Association/National Council of Teachers of English Joint Task Force on Assessment, 1994; National Council for the Social Studies, 1991; National Council of Teachers of Mathematics, 1995; National Research Council, 1996). It is also consistent with the general argument that assessment content and formats should more directly embody thinking and reasoning abilities that are the ultimate goals of learning (Frederiksen & Collins, 1989; Resnick & Resnick, 1992). Unlike much of the discussion, however, my emphasis is not on external accountability assessments as indirect mechanisms for reforming instructional practice; instead, I consider directly how classroom assessment practices should be transformed to illuminate and enhance the learning process. I acknowledge, though, that for changes to occur at the classroom level, they must be supported and not impeded by external assessments.
#782 – Year 3 ASK/FOSS Efficacy Study
Ellen Osmundson, Yunyun Dai, and Joan Herman
CRESST Report 782, January 2011
Summary
In this interim report, CRESST researchers examine the effects of several different science curricula on teaching and student learning. Drawing on randomly assigned treatment and control groups of 3rd and 4th grade teachers, the study offers important lessons for the upcoming 4th year study.
#484 – Instructional Validity, Opportunity to Learn and Equity: New Standards Examinations for the California Mathematics Renaissance
Bokhee Yoon and Lauren Resnick
CSE Report 484, 1998
Summary
In this report, CRESST researchers examined the relationship between professional development opportunities for teachers, the kinds of instruction offered to students, and student performance on the New Standards Mathematics Reference Examination. By comparing teachers (and their students) who had participated in the California Mathematics Renaissance professional development program with teachers and students elsewhere, the researchers were able to evaluate both the effectiveness of the Renaissance program and the instructional validity of the Reference Examination.
#823 – On the Road to Assessing Deeper Learning: The Status of Smarter Balanced and PARCC Assessment Consortia
Joan Herman and Robert Linn
CRESST Report 823, January 2013
Summary
Two consortia, the Smarter Balanced Assessment Consortium (Smarter Balanced) and the Partnership for Assessment of Readiness for College and Careers (PARCC), are currently developing comprehensive, technology-based assessment systems to measure students’ attainment of the Common Core State Standards (CCSS). The consequences of the consortia assessments, slated for full operation in the 2014/15 school year, will be significant. The assessments themselves and their results will send powerful signals to schools about the meaning of the CCSS and what students know and are able to do. If history is a guide, educators will align curriculum and teaching to what is tested, and what is not assessed largely will be ignored. Those interested in promoting students’ deeper learning and development of 21st century skills thus have a large stake in trying to assure that consortium assessments represent these goals.
Funded by the William and Flora Hewlett Foundation, UCLA’s National Center for Research on Evaluation, Standards, and Student Testing (CRESST) is monitoring the extent to which the two consortia’s assessment development efforts are likely to produce tests that measure and support goals for deeper learning. This report summarizes CRESST findings thus far, describing the evidence-centered design framework guiding assessment development for both Smarter Balanced and PARCC as well as each consortium’s plans for system development and validation. This report also provides an initial evaluation of the status of deeper learning represented in both consortia’s plans.
Study results indicate that PARCC and Smarter Balanced summative assessments are likely to represent important goals for deeper learning, particularly those related to mastering and being able to apply core academic content and cognitive strategies related to complex thinking, communication, and problem solving. At the same time, the report points to the technical, fiscal, and political challenges that the consortia face in bringing their plans to fruition.
#392 – Generalizability of New Standards Project 1993 Pilot Study Tasks in Mathematics
Robert Linn, Elizabeth Burton, Lizanne DeStefano, and Matthew Hanson
CSE Report 392, 1995
Summary
Students may have to take as many as 9 to 17 "long" performance assessment tasks if educators are to be confident that student performance matches true ability in a given domain, according to this important CRESST study. Because a long task typically requires students to give complex, multifaceted responses and takes one to three hours to administer, the time and cost implications are significant. The performance tasks analyzed are from the New Standards Project, a joint project of the National Center on Education and the Economy and the Learning Research and Development Center. Robert Linn, Elizabeth Burton, Lizanne DeStefano, and Matthew Hanson conducted the CRESST study. Using a generalizability analysis of New Standards tasks, the CRESST researchers analyzed two primary sources of measurement error that typically lead to unreliability in measurement of student performance: performance tasks and raters, and the interactions of pupils with tasks or raters. Because the New Standards raters were carefully trained and monitored, consistency in rating was generally very high. The greatest error, therefore, was due to tasks. Essentially, student performance varied greatly from one performance task to another, suggesting that the tasks may be measuring different skills or that the skills were not measured well by the different tasks. The results confirm findings from several other studies. States or school districts that administer just a few performance tasks and then report individual student scores may face unacceptably large measurement error. The authors make recommendations that may help resolve some problems. "Since each task," write the authors, "requires an hour or more to administer, a strategy needs to be developed either for combining some shorter tasks with long tasks or for collecting information about student performance over more extended periods of time." The authors add that researchers in the New Standards Project are pursuing both strategies.
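The logic of the generalizability analysis behind this finding can be made concrete with a small sketch. The Python snippet below is illustrative only, not the authors' code or data: it estimates variance components for a simplified persons-by-tasks design from simulated scores, then projects, decision-study style, how the generalizability coefficient grows as tasks are added.

```python
# A minimal generalizability-theory sketch for a crossed persons x tasks
# design (raters omitted for simplicity). All data are simulated.
import numpy as np

rng = np.random.default_rng(0)

# Illustrative score matrix: rows = students (persons), columns = tasks.
n_p, n_t = 200, 6
person_effect = rng.normal(0, 1.0, size=(n_p, 1))   # spread in true ability
task_effect = rng.normal(0, 0.5, size=(1, n_t))     # spread in task difficulty
residual = rng.normal(0, 1.5, size=(n_p, n_t))      # person x task interaction + error
scores = 10 + person_effect + task_effect + residual

grand = scores.mean()
person_means = scores.mean(axis=1)
task_means = scores.mean(axis=0)

# Mean squares for the fully crossed p x t design.
ms_p = n_t * np.sum((person_means - grand) ** 2) / (n_p - 1)
ms_pt = np.sum(
    (scores - person_means[:, None] - task_means[None, :] + grand) ** 2
) / ((n_p - 1) * (n_t - 1))

# Expected-mean-square solutions for the variance components.
var_pt = ms_pt                   # person x task interaction (confounded with error)
var_p = (ms_p - ms_pt) / n_t     # universe-score (person) variance

def g_coef(k):
    """Generalizability coefficient for a k-task test (relative decisions)."""
    return var_p / (var_p + var_pt / k)

for k in (1, 5, 9, 17):
    print(f"{k:2d} tasks: G = {g_coef(k):.2f}")
```

With these made-up variance components, a single task yields a coefficient near .3, while something on the order of 9 to 17 tasks is needed to reach the .80-.90 range, mirroring the study's central point that task-sampling variability, rather than rater inconsistency, dominated the measurement error.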
#359 – Issues in Innovative Assessment for Classroom Practice: Barriers and Facilitators
Pamela Aschbacher
CSE Report 359, 1993
Summary
As the British experience suggests, we cannot assume that new, innovative assessments will be immediately understood and embraced by American teachers. Implementing performance assessments may demand new roles for teachers and students and require a radical paradigm shift among educators: from a focus on content coverage to a focus on outcomes achieved. This paper, utilizing an action research approach, describes the findings of CRESST researchers who observed, interviewed, and surveyed teachers implementing alternative assessments in their classrooms. Probably the most fundamental barrier to developing and implementing sound performance assessments was the pervasive tendency of teachers to think about classroom activities rather than student outcomes. Teachers who used portfolios, for example, focused on what interesting activities might be documented in the portfolios rather than what goals would be achieved as a result of these instructional activities. The study revealed other basic barriers in the development and implementation of alternative assessments, including teacher assessment anxiety, lack of teacher time and training, and teachers' reluctance to change.
#703 – The Nature and Impact of Teachers’ Formative Assessment Practices
Joan L. Herman, Ellen Osmundson, Carlos Ayala, Stephen Schneider, and Mike Timms
CSE Report 703, 2006
Summary
Theory and research suggest the critical role that formative assessment can play in student learning. The use of assessment in guiding instruction has long been advocated: Through the assessment of students’ needs and the monitoring of student progress, learning sequences can be appropriately designed, instruction adjusted during the course of learning, and programs refined to be more effective in promoting student learning goals. In more modern pedagogical conceptions, assessment moves from being an information source on which to base action to being part and parcel of the teaching and learning process. This study provides food for thought about the research methods needed to study teachers’ assessment practices and the complexity of assessing their effects on student learning. On the one hand, our study suggests that effective formative assessment is a highly interactive endeavor, involving the orchestration of multiple dimensions of practice, and demands sophisticated qualitative methods of study. On the other hand, detecting and understanding learning effects in small samples, even with the availability of comparison groups, poses difficulties, to say the least.
#368 – Cross-Scorer and Cross-Method Comparability and Distribution of Judgments of Student Math, Reading, and Writing Performance: Results From the New Standards Project Big Sky Scoring Conference
Lauren Resnick, Daniel Resnick, and Lizanne DeStefano
CSE Report 368, 1993
Summary
Partially funded by CRESST, the New Standards Project is an effort to create a state- and district-based assessment and professional development system that will serve as a catalyst for major educational reform. In 1992, as part of a professional development strategy tied to assessment, 114 teachers, curriculum supervisors, and assessment directors met to score student responses from a field test of mathematics and English language arts assessments. The results of that meeting, the Big Sky Scoring Conference, were used to analyze comparability across scorers and comparability across holistic and anaholistic scoring methods. "Interscorer reliability estimates," wrote the researchers, "for reading and writing were in the moderate range, below levels achieved with the use of large-scale writing assessment or standardized tasks. Low reliability limits the use of [the] 1992 reading and writing scores for making judgments about student performance or educational programs," concluded the researchers. However, interscorer reliability estimates for math tasks were somewhat higher than for literacy. For six out of seven math tasks, reliability coefficients approached or exceeded acceptable levels. Use of anaholistic and holistic scoring methods resulted in different scores for the same student response. The findings suggest that the large number and varied nature of participants may have jeopardized the production of valid and reliable data. "Scorers reported feeling overwhelmed and overworked after four days of training and scoring," wrote the researchers. Despite these difficulties, the study provides evidence that scoring of large-scale performance assessments can succeed when ample time is provided for training, evaluation, feedback, and discussion; clear definitions are given of performance levels and the distinctions between them; and well-chosen exemplars are used.
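For readers unfamiliar with how interscorer agreement is quantified, the short Python sketch below shows one common chance-corrected index, Cohen's kappa, for two scorers rating the same responses on a rubric. It is a generic illustration with made-up scores, not the coefficient or the data used in this report.

```python
# A minimal sketch of chance-corrected interscorer agreement (Cohen's kappa)
# for two scorers assigning categorical rubric scores. Data are illustrative.
import numpy as np

def cohens_kappa(scores_a, scores_b):
    """Chance-corrected agreement between two scorers on the same responses."""
    a = np.asarray(scores_a)
    b = np.asarray(scores_b)
    categories = np.union1d(a, b)

    # Observed proportion of exact agreement.
    p_obs = np.mean(a == b)

    # Agreement expected by chance, from each scorer's marginal distribution.
    p_exp = sum(np.mean(a == c) * np.mean(b == c) for c in categories)

    return (p_obs - p_exp) / (1 - p_exp)

# Two scorers rating the same 12 student responses on a 1-4 rubric.
scorer_1 = [1, 2, 2, 3, 3, 4, 2, 1, 3, 4, 2, 3]
scorer_2 = [1, 2, 3, 3, 2, 4, 2, 1, 3, 4, 1, 3]
print(f"kappa = {cohens_kappa(scorer_1, scorer_2):.2f}")
```

Analogous indices generalize to many raters (for example, intraclass correlations), which is closer to the multi-scorer situation the Big Sky conference faced.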
#661 – Upgrading America’s Use of Information to Improve Student Performance
Margaret Heritage, John Lee, Eva Chen, and Debbie LaTorre
CSE Report 661, 2005
Summary
This report presents a description of the Quality School Portfolio (QSP), a web-based decision support tool developed at the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) at UCLA; a discussion of the professional development provided to support the implementation of QSP; findings from an evaluation research study of the implementation; and recommendations for a next generation of QSP.
#418 – Assessment and Instruction in the Science Classroom
Gail P. Baxter, Anastasia D. Elder, and Robert Glaser
CSE Report 418, 1996
Summary
Findings from this study of fifth-grade students provided further evidence that critical differences exist between students who think and reason well with their knowledge and those who do not. In the study, students received six mystery boxes and were asked to identify their contents by connecting the components into circuits. The research team found that students who displayed consistently high levels of learning and understanding were able to describe a comprehensive plan for an experiment. Further, these same students demonstrated an efficient approach to problem solving that included the use of scientific principles. In contrast, lower-performing students invoked a trial-and-error strategy of "hook something up and see what happens" to guide their experiments.
Only 20% of the students performed at high levels, suggesting that even lower-ability students could complete the problem without understanding the processes or principles involved. The researchers concluded that "strategies for how to represent problems must be taught as well as strategies for how to solve problems." They suggest that teachers use performance assessments, such as this science experiment, to integrate instruction, assessment, and high levels of student learning.