Reports

Please note that CRESST reports were called "CSE Reports" or "CSE Technical Reports" prior to CRESST report 723.

#716 – Should Grade Retention Decisions Be Indicators-based or Model-driven?
Jia Wang, Haiwen Wang

Summary
This study evaluates a large urban district's standards-based promotion policy decisions against a model-driven classification. Hierarchical logistic regression was used to explore factors related to grade retention at both the student and school levels. Statistical results indicate that, using students' next-year achievement test scores as the criterion, the model-driven retention classification outperforms the district's standards-based promotion policy decisions. Students whom the model predicted to be retained, but who were not retained, had lower test scores than students who were actually retained. School districts may incorporate this model-based approach into their current promotion and retention decision making as an additional measure to improve the precision of their decisions.
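
For readers who want a concrete picture of a model-driven retention classification, the following is a minimal, hypothetical sketch in Python (the file name, column names, and cutoff are invented for illustration; the study's actual model was hierarchical, with students nested in schools, which this flat logistic regression omits):

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical student-level data: prior-year scores, attendance, and the
    # district's actual 0/1 retention decision.
    df = pd.read_csv("students.csv")  # hypothetical file

    # Flat logistic regression; the study itself used hierarchical logistic
    # regression with school-level effects.
    model = smf.logit("retained ~ reading_score + math_score + attendance", data=df).fit()

    # Model-driven classification: flag students whose predicted probability of
    # retention exceeds a chosen cutoff, then compare with the district's
    # standards-based decisions to find disagreements. Next-year test scores
    # could then be compared across the agreement and disagreement groups.
    df["model_retain"] = (model.predict(df) > 0.5).astype(int)
    disagreements = df[df["model_retain"] != df["retained"]]
    print(disagreements[["student_id", "retained", "model_retain"]].head())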

#739 – Improving Formative Assessment Practice with Educational Information Technology
Terry P. Vendlinski, David Niemi, Jia Wang, Sara Monempour

Summary
This report describes a web-based assessment design tool, the Assessment Design and Delivery System (ADDS), that provides teachers with both a structure and the resources required to develop and use quality assessments. The tool is applicable across subject domains. The heart of the ADDS is an assessment design workspace that allows teachers to decide the attributes of an assessment, as well as the context and type of responses the students will generate, as part of their assessment design process. Although the tool is very flexible and allows these steps to be done in any order (or skipped entirely), our goal was to streamline and scaffold the process for teachers by organizing all the materials in one place and by providing resources they could use or reuse to create assessments for their students. The tool allows teachers to deliver the assessments to their students either online or on paper. Initial results from our first teacher study suggest that teachers who used the tool developed assessments that were more cognitively demanding of students and that addressed the "big ideas" of a domain rather than isolated facts.

#720 – Developing Academic English Language Proficiency Prototypes for 5th Grade Reading: Psychometric and Linguistic Profiles of Tasks
An Extended Executive Summary

Alison L. Bailey, Becky H. Huang, Hye Won Shin, Tim Farnsworth, Frances A. Butler

Summary
Within an evidentiary framework for operationally defining academic English language proficiency (AELP), linguistic analyses of standards, classroom discourse, and textbooks have led to specifications for assessment of AELP. The test development process described here is novel due to its emphasis on using linguistic profiles to inform the creation of test specifications and guide the writing of draft tasks. In this report, we outline the test development process we have adopted and provide the results of studies designed to turn the drafted tasks into illustrative prototypes (i.e., tried-out tasks) of AELP for the 5th grade. The tasks use the reading modality; however, they were drafted to measure the academic language construct and not reading comprehension per se. That is, the tasks isolate specific language features (e.g., vocabulary, grammar, language functions) occurring in different content areas (e.g., mathematics, science, and social studies texts). Taken together, these features are necessary for reading comprehension in the content areas; indeed, students need to control all of them in order to comprehend information presented in their textbooks. By focusing on the individual language features, rather than the subject matter or overall meaning of a text, the AELP tasks are designed to help determine whether a student has sufficient antecedent knowledge of English language features to be able to comprehend the content of a text.

The work reported here is the third and final stage of an iterative test development process. In previous National Center for Research on Evaluation, Standards, and Student Testing (CRESST) work, we conducted a series of studies to develop specifications and create tasks of AELP. Specifically, we first specified the construct by synthesizing evidence from linguistic analyses of ELD and content standards, textbooks (mathematics, science, and social studies), and teacher talk in classrooms, resulting in language demand profiles for the 5th grade. After determining task format by frequency of assessment types in textbooks, we then created draft tasks aligned with the language profiles.

The goals of the current effort were to take these previously drafted tasks and create prototypes by trying out the tasks for the first time with 224 students from native English and English language learner (ELL) backgrounds. Students across the 4th-6th grades, as well as native English speakers, were included in the studies because native speakers and adjacent grades provide critical information about the targeted language abilities of mainstream students at the 5th-grade level. Phase 1 (n = 96) involved tryouts of 101 draft tasks to estimate the duration of administration, gauge the clarity of directions, test whole-class administration procedures, and administer verbal protocols that provided further information about task accessibility and characteristics. Phase 2, the pilot stage, involved administration of 40 retained tasks (35 of which were modified as a result of Phase 1) to students in whole-class settings (n = 128). Analyses included item difficulty and item discrimination. The rationale for retaining or rejecting tasks is presented along with psychometric/linguistic profiles documenting the evolution of example effective and ineffective prototype tasks. The final chapter of the report reflects on the lessons learned from the test development process we adopted and makes suggestions for further advances in this area.
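
As background for the Phase 2 analyses, classical item difficulty is typically the proportion of examinees answering an item correctly, and item discrimination is often computed as a corrected item-total (point-biserial) correlation. A minimal sketch with hypothetical scored responses (not the authors' data or code):

    import numpy as np

    # Hypothetical scored responses: rows = students, columns = tasks (1 = correct).
    responses = np.array([
        [1, 0, 1, 1],
        [0, 0, 1, 0],
        [1, 1, 1, 1],
        [1, 0, 0, 1],
        [0, 1, 0, 0],
    ])

    # Item difficulty: proportion of students answering each task correctly.
    difficulty = responses.mean(axis=0)

    # Item discrimination: correlation between each task score and the total
    # score on the remaining tasks (corrected item-total correlation).
    n_items = responses.shape[1]
    rest_scores = responses.sum(axis=1, keepdims=True) - responses
    discrimination = np.array([
        np.corrcoef(responses[:, j], rest_scores[:, j])[0, 1] for j in range(n_items)
    ])

    print("difficulty:    ", difficulty.round(2))
    print("discrimination:", discrimination.round(2))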

#730 – Creating Accurate Science Benchmark Assessments to Inform Instruction
Terry P. Vendlinski, Sam Nagashima, Joan L. Herman

Summary
Current educational policy highlights the important role that assessment can play in improving education. State standards and the assessments that are aligned with them establish targets for learning and promote school accountability for helping all students succeed; at the same time, feedback from assessment results is expected to provide districts, schools, and teachers with important information for guiding instructional planning and decision making. Yet even as No Child Left Behind (NCLB) and its requirements for adequate yearly progress put unprecedented emphasis on state tests, educators have discovered that annual state tests are too little and too late to guide teaching and learning. Recognizing the need for more frequent assessments to support student learning, many districts and schools have turned to benchmark testing—periodic assessments through which districts can monitor students’ progress, and schools and teachers can refine curriculum and teaching—to help students succeed. We report in this document a collaborative effort of teachers, district administrators, professional developers, and assessment researchers to develop benchmark assessments for elementary school science. In the sections which follow we provide the rationale for our work and its research question, describe our collaborative assessment development process and its results, and present conclusions.

#763 – The Effects of POWERSOURCE Intervention on Student Understanding of Basic Mathematical Principles
Julia Phelan, Kilchan Choi, Terry Vendlinski, Eva L. Baker, Joan L. Herman

Summary
This report describes results from field-testing of POWERSOURCE formative assessment alongside professional development and instructional resources. Researchers at the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) employed a randomized, controlled design to address the following question: Does the use of POWERSOURCE strategies improve 6th-grade student performance on assessments of key mathematical ideas relative to the performance of a comparison group? Sixth-grade teachers were recruited from 7 districts and 25 middle schools. A total of 49 POWERSOURCE and 36 comparison group teachers and their students (2,338 POWERSOURCE and 1,753 comparison group students) were included in the study analyses. All students took a pretest of prerequisite knowledge and, at the end of the study year, a transfer measure of tasks drawn from international tests. Students in the POWERSOURCE group used sets of formative assessment tasks, and POWERSOURCE teachers received professional development and instructional resources. Results indicated that students with higher pretest scores tended to benefit more from the treatment than students with lower pretest scores. In addition, students in the POWERSOURCE group significantly outperformed comparison group students on distributive property items, and the effect grew larger as pretest scores increased. Results, limitations, and future directions are discussed.
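
The treatment-by-pretest pattern reported above (larger benefits for students with higher pretest scores) is the kind of result obtained by including an interaction term in the outcome model. A minimal, hypothetical sketch (file and column names are invented; CRESST's actual analysis was multilevel, accounting for students nested in classrooms and schools):

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical columns: posttest (transfer measure), pretest, and a 0/1
    # indicator for assignment to the POWERSOURCE condition.
    df = pd.read_csv("powersource_students.csv")  # hypothetical file

    # OLS with a treatment-by-pretest interaction; a positive interaction
    # coefficient indicates a larger treatment effect at higher pretest scores.
    model = smf.ols("posttest ~ treatment * pretest", data=df).fit()
    print(model.summary())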


To cite from this report, please use the following as your APA reference:

Phelan, J., Choi, K., Vendlinski, T., Baker, E. L., & Herman, J. L. (2009). The effects of POWERSOURCE intervention on student understanding of basic mathematical principles (CRESST Report 763). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

#795 – Progress Report Year 4: National Center for Research on Evaluation, Standards, and Student Testing (CRESST): The Development and Impact of POWERSOURCE©
Eva L. Baker

Summary
The POWERSOURCE© intervention is intended to be a powerful formative assessment strategy that can be integrated with any ongoing mathematics curriculum. The key goal is to improve teachers' knowledge and practice and, in turn, student learning in middle school mathematics. In this 4th-year study, the authors provide updated results from the 2006-07 experimental (randomized) field test of POWERSOURCE© and present findings from the 2007-08 school year on student and teacher outcomes. The authors found that a short, targeted POWERSOURCE© intervention on key mathematical principles had a positive impact on student performance on a transfer measure of related content. POWERSOURCE© had more impact on higher-performing students than on lower-performing students, and the effect was stronger on more difficult mathematics items.

#760 – Third Year Report: Evaluation of the Artful Learning Program
Noelle C. Griffin, Judy N. Miyoshi

Summary
The National Center for Research on Evaluation, Standards, and Student Testing (CRESST) at UCLA was contracted to undertake a three-year external evaluation of the Artful Learning program, an arts-based school improvement model developed from the work and philosophy of the late composer Leonard Bernstein. This is the third-year report of evaluation findings, with a primary focus on Artful Learning participants in the 2003–2004 school year. The purpose of this report is to provide information about the implementation and impact of the program at currently participating school sites, as well as to place these findings within the context of the three-year evaluation as a whole. Multiple quantitative and qualitative data collection methods were employed throughout the evaluation. Overall, the findings suggest that the Artful Learning program was a useful tool for teachers across a range of previous teaching experience, district and state contextual demands, grade levels and content areas taught, and student populations. Teacher satisfaction with the professional development components of the program was high, although assessment was singled out as an area needing additional support. Recommendations drawing on all three years of the evaluation are also discussed.

#717 – School Improvement Under Test-Driven Accountability: A Comparison of High- and Low-Performing Middle Schools in California
Heinrich Mintrop, Tina Trujillo

Summary
Based on in-depth data from nine demographically similar schools, the study asks five questions about key aspects of the improvement process that also speak to the consequential validity of accountability indicators: Do schools that differ widely according to system performance criteria also differ in the quality of the educational experience they provide to students? Are schools that have posted high growth on the state’s performance index more effective organizationally? Do high-performing schools respond more productively to the messages of their state accountability system? Do high- and low-performing schools exhibit different approaches to organizational learning and teacher professionalism? Is district instructional management in an aligned state accountability system related to performance?

We report our findings in three results papers (Mintrop & Trujillo, 2007a, 2007b; Trujillo & Mintrop, 2007) and in this technical report. In a nutshell, the results papers show that, across the nine case study schools, the one positive performance outlier did indeed differ in the quality of teaching, organizational effectiveness, response to accountability, and patterns of organizational learning. Across the other eight schools, however, the patterns blurred. We conclude that, save for performance differences at the extreme positive and negative margins, relationships between system-designated performance levels and improvement processes on the ground are uncertain and far from solid. The papers try to elucidate why this may be so.

This final technical report summarizes the major components of the study design and methodology, including case selection, instrumentation, data collection, and data analysis techniques. We describe the context of the study and present descriptive data on our cases and procedures.

#742 – Exploring Data Use and School Performance in an Urban Public School District
Joan L. Herman, Kyo Yamashiro, Sloane Lefkowitz, Lee Ann Trusela

Summary
This study examined the relationship between data use and achievement at 13 urban Title I schools. Using multiple methods, including test scores, district surveys, school transformation plans, and four case study site visits, the researchers found wide variation in the use of data to inform instruction and planning. In some cases, schools were overwhelmed by the amount of data or were not convinced that test score data alternating between two different tests provided dependable information. The researchers did not find a substantial link between data use and achievement, which may have been a result of the small sample size or of differences in implementation across schools. Teachers and principals identified important needs: more timely data delivery, individual as well as group data reports, and better training in assessment and data analysis.

#392 – Generalizability of New Standards Project 1993 Pilot Study Tasks in Mathematics
Robert Linn, Elizabeth Burton, Lizanne DeStefano, and Matthew Hanson

Summary
Students may have to take as many as 9-17 "long" performance assessment tasks if educators are to be confident that student performance matches true ability in a given domain, according to this important new CRESST study. Because a long task typically requires students to give complex, multifaceted responses and takes one to three hours to administer, the time and cost implications are significant. The performance tasks analyzed are from the New Standards Project, a joint project of the National Center on Education and the Economy and the Learning Research and Development Center. Robert Linn, Elizabeth Burton, Lizanne DeStefano, and Matthew Hanson conducted the CRESST study. Using a generalizability analysis of New Standards tasks, the CRESST researchers analyzed two primary sources of measurement error that typically lead to unreliability in measurement of student performance: performance tasks and raters, and the interactions of pupils with tasks or raters. Because the New Standards raters were carefully trained and monitored, consistency in rating was generally very high. The greatest error, therefore, was due to tasks. Essentially, student performance varied greatly from one performance task to another, suggesting that the tasks may be measuring different skills or that the skills were not measured well by the different tasks. The results confirm findings from several other studies. States or school districts that administer just a few performance tasks and then report individual student scores may face unacceptably large measurement error. The authors make recommendations that may help resolve some of these problems. "Since each task," write the authors, "requires an hour or more to administer, a strategy needs to be developed either for combining some shorter tasks with long tasks or for collecting information about student performance over more extended periods of time." The authors add that researchers in the New Standards Project are pursuing both strategies.
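
To see why so many tasks are needed, note that for a simple person-by-task design the generalizability (reliability-like) coefficient is the person variance divided by the person variance plus the person-by-task error variance averaged over tasks. A small sketch with hypothetical variance components (the report's actual New Standards estimates differ), showing how the coefficient rises as tasks are added:

    # Hypothetical variance components for a person-by-task design; most of the
    # error variance is placed in the person-by-task interaction, mirroring the
    # study's finding that tasks were the dominant source of error.
    var_person = 0.30        # true-score (person) variance
    var_person_task = 0.70   # person-by-task interaction + residual variance

    def g_coefficient(n_tasks):
        """Relative generalizability coefficient for an average over n_tasks tasks."""
        return var_person / (var_person + var_person_task / n_tasks)

    for n_tasks in (1, 3, 6, 9, 12, 17):
        print(f"{n_tasks:2d} tasks: G = {g_coefficient(n_tasks):.2f}")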