
Reports

Please note that CRESST reports were called "CSE Reports" or "CSE Technical Reports" prior to CRESST report 723.

#722 – On the Roles of External Knowledge Representations in Assessment Design
Robert J. Mislevy, John T. Behrens, Randy E. Bennett, Sarah F. Demark, Dennis C. Frezzo, Roy Levy, Daniel H. Robinson, Daisy Wise Rutstein, Valerie J. Shute, Ken Stanley, Fielding I. Winters

Summary
People use external knowledge representations (EKRs) to identify, depict, transform, store, share, and archive information. Learning how to work with EKRs is central to becoming proficient in virtually every discipline. As such, EKRs play central roles in curriculum, instruction, and assessment. Five key roles of EKRs in educational assessment are described:

  1. An assessment is itself an EKR, which makes explicit the knowledge that is valued, ways it is used, and standards of good work.

  2. The analysis of any domain in which learning is to be assessed must include the identification and analysis of the EKRs in that domain.

  3. Assessment tasks can be structured around the knowledge, relationships, and uses of domain EKRs.

  4. "Design EKRs" can be created to organize knowledge about a domain in forms that support the design of assessment.

  5. EKRs from the discipline of assessment design can guide and structure the domain analyses noted in (2), task construction (3), and the creation and use of design EKRs noted in (4).

The third and fourth roles are discussed and illustrated in greater detail, from the perspective of an "evidence-centered" assessment design framework that reflects the fifth role. Connections with automated task construction and scoring are highlighted. Ideas are illustrated with two examples: "generate examples" tasks and simulation-based tasks for assessing computer network design and troubleshooting skills.

#721 – Closing the Gap? A Comparison of Changes Over Time in White-Black and White-Hispanic Achievement Gaps on State Assessments Versus State NAEP
Varick Erickson, Andrew Ho, Deborah Holtzman, Andrew Jaciw, Brian Lukoff, Xuejun Shen, Xin Wei, Edward Haertel

Summary
When a state test and National Assessment of Educational Progress (NAEP) are both measuring the same construct, the achievement gaps between subgroups on both tests should be the same. However, if a teacher or school engages in “teaching to the test” then student performance may improve on one test but not on another. We hypothesized that teaching to the test could have consequences for changes in achievement gaps over time because, for a variety of reasons, students in low-achieving schools or classrooms may be more likely to receive instruction narrowly focused on increasing their test scores. Our analysis proceeded by examining (at the state level) gaps between White students (the “reference” group) and either Black or Hispanic students (a “focal” group). The clearest conclusion from our state-by-state analyses of state and NAEP test data is that the pattern of gap changes varies widely both between and within states. Further, gap changes came in a variety of forms, and not all types of gap reduction are equally desirable.

#720 – Developing Academic English Language Proficiency Prototypes for 5th Grade Reading: Psychometric and Linguistic Profiles of Tasks
An Extended Executive Summary

Alison L. Bailey, Becky H. Huang, Hye Won Shin, Tim Farnsworth, Frances A. Butler

Summary
Within an evidentiary framework for operationally defining academic English language proficiency (AELP), linguistic analyses of standards, classroom discourse, and textbooks have led to specifications for assessment of AELP. The test development process described here is novel due to the emphasis on using linguistic profiles to inform the creation of test specifications and guide the writing of draft tasks. In this report, we outline the test development process we have adopted and provide the results of studies designed to turn the drafted tasks into illustrative prototypes (i.e., tried out tasks) of AELP for the 5th grade. The tasks use the reading modality; however, they were drafted to measure the academic language construct and not reading comprehension per se. That is, the tasks isolate specific language features (e.g., vocabulary, grammar, language functions) occurring in different content areas (e.g., mathematics, science, and social studies texts). Taken together these features are necessary for reading comprehension in the content areas. Indeed, students will need to control all these features in order to comprehend information presented in their textbooks. By focusing on the individual language features, rather than the subject matter or overall meaning of a text, the AELP tasks are designed to help determine whether a student has sufficient antecedent knowledge of English language features to be able to comprehend the content of a text.

The work reported here is the third and final stage of an iterative test development process. In previous National Center for Research on Evaluation, Standards, and Student Testing (CRESST) work, we conducted a series of studies to develop specifications and create tasks of AELP. Specifically, we first specified the construct by synthesizing evidence from linguistic analyses of ELD and content standards, textbooks (mathematics, science, and social studies), and teacher talk in classrooms, resulting in language demand profiles for the 5th grade. After determining task format by frequency of assessment types in textbooks, we then created draft tasks aligned with the language profiles.

The goals of the current effort were to take these previously drafted tasks and create prototypes by trying out the tasks for the first time with 224 students from native English and English language learner (ELL) backgrounds. Students across the 4th-6th grades, as well as native-English students, are included in the studies because native speakers and adjacent grades provide critical information about the targeted language abilities of mainstream students at the 5th grade level. Phase 1 (n = 96) involved various tryouts of 101 draft tasks to estimate duration of administration, clarity of directions, whole-class administration procedures, and an opportunity to administer verbal protocols to provide further information about task accessibility and characteristics. Phase 2, the pilot stage, involved administration of 40 retained tasks (35 of which were modified as a result of Phase 1) to students in whole-class settings (n = 128). Analyses included item difficulty and item discrimination. The rationale for retaining or rejecting tasks is presented along with psychometric/linguistic profiles documenting the evolution of example effective and ineffective prototype tasks. The final chapter of the report reflects on the lessons learned from the test development process we adopted and makes suggestions for further advances in this area.

#719 – Impact of Different Performance Assessment Cut Scores on Student Promotion
Jia Wang, David Niemi, and Haiwen Wang

Summary

#718 – Examining the Generalizability of Direct Writing Assessment Tasks
Eva Chen, David Niemi, Jia Wang, Haiwen Wang, Jim Mirocha

Summary
This study investigated the level of generalizability across a few high-quality assessment tasks and the validity of measuring student writing ability using a limited number of essay tasks. More specifically, the research team explored how well writing prompts could measure general student writing ability and whether student performance on one writing task could be generalized to other similar writing tasks. A total of four writing prompts were used in the study: three tasks were literature-based and one was based on a short story. A total of 397 students participated, and each student was randomly assigned to complete two of the four tasks. The research team found that three to five essays were required to evaluate and make a reliable judgment of student writing performance.
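The report's conclusion that "three to five essays" are needed is the kind of estimate conventionally derived from the Spearman-Brown prophecy formula, which projects the reliability of an average over k parallel tasks from the reliability of a single task. A minimal sketch of that calculation, with the single-task reliability of 0.5 chosen purely for illustration (it is not a figure from the study):

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Projected reliability of the average of k parallel tasks,
    given the reliability r_single of a single task."""
    return k * r_single / (1 + (k - 1) * r_single)

def tasks_needed(r_single: float, target: float) -> int:
    """Smallest number of tasks whose averaged score reaches
    the target reliability."""
    k = 1
    while spearman_brown(r_single, k) < target:
        k += 1
    return k

# Illustrative values: if one essay has reliability 0.5, reaching a
# composite reliability of 0.8 requires 4 essays.
print(tasks_needed(0.5, 0.8))
```

Under plausible single-task reliabilities in the 0.4-0.6 range, the formula lands in the three-to-five-task window the study reports, which is how such generalizability findings are usually read.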

#717 – School Improvement Under Test-Driven Accountability: A Comparison of High- and Low-Performing Middle Schools in California
Heinrich Mintrop, Tina Trujillo

Summary
Based on in-depth data from nine demographically similar schools, the study asks five questions that address key aspects of the improvement process and speak to the consequential validity of accountability indicators: Do schools that differ widely according to system performance criteria also differ in the quality of the educational experience they provide to students? Are schools that have posted high growth on the state’s performance index more effective organizationally? Do high-performing schools respond more productively to the messages of their state accountability system? Do high- and low-performing schools exhibit different approaches to organizational learning and teacher professionalism? Is district instructional management in an aligned state accountability system related to performance?

We report our findings in three results papers (Mintrop & Trujillo, 2007a, 2007b; Trujillo & Mintrop, 2007) and this technical report. The results papers, in a nutshell, show that, across the nine case study schools, one positive performance outlier differed indeed in the quality of teaching, organizational effectiveness, response to accountability, and patterns of organizational learning. Across the other eight schools, however, the patterns blurred. We conclude that, save for performance differences on the extreme positive and negative margins, relationships between system-designated performance levels and improvement processes on the ground are uncertain and far from solid. The papers try to elucidate why this may be so.

This final technical report summarizes the major components of the study design and methodology, including case selection, instrumentation, data collection, and data analysis techniques. We describe the context of the study as well as descriptive data on our cases and procedures.

#716 – Should Grade Retention Decisions Be Indicators-based or Model-driven?
Jia Wang, Haiwen Wang

Summary
This study evaluates a large urban district's standards-based promotion policy decisions against a model-driven classification. Hierarchical logistic regression was used to explore factors related to grade retention at both the student and school levels. Statistical results indicate that, using students' next-year achievement test scores as criteria, the model-driven classification of retention is better than the district's standards-based promotion policy decisions. Students who were predicted by the model to be retained, but were not retained, had lower test scores than students who were actually retained. School districts may incorporate this model-based approach into their current promotion and retention decision making as an additional measure to improve the precision of their decisions.
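The model-driven side of this comparison can be sketched with a plain logistic regression fit by gradient descent; the report's actual hierarchical model additionally includes school-level effects, and the data and variable names below are synthetic illustrations, not the district's:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Plain logistic regression via gradient descent.
    (A simplification: the report's hierarchical model also
    nests students within schools.)"""
    X = np.column_stack([np.ones(len(X)), X])  # add intercept column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted P(retained)
        w -= lr * X.T @ (p - y) / len(y)       # gradient step
    return w

def predict_retention(X, w, threshold=0.5):
    """Classify each student as model-retained (True) or not."""
    X = np.column_stack([np.ones(len(X)), X])
    return (1.0 / (1.0 + np.exp(-X @ w))) >= threshold

# Synthetic example: one predictor (standardized achievement score);
# low-scoring students were retained.
scores = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
retained = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
w = fit_logistic(scores, retained)
model_retain = predict_retention(scores, w)
```

Comparing `model_retain` against the district's actual decisions would then surface the disagreement cases the report highlights: students the model flags for retention whom the policy promoted.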

#715 – Changes in the Black-White Test Score Gap in the Elementary School Grades
Daniel Koretz, Young-Suk Kim

Summary
In a pair of recent studies, Fryer and Levitt (2004a, 2004b) analyzed the Early Childhood Longitudinal Study – Kindergarten Cohort (ECLS-K) to explore the characteristics of the Black-White test score gap in young children. They found that the gap grew markedly between kindergarten and the third grade and that they could predict the gap from measured characteristics in kindergarten but not in the third grade. In addition, they found that the widening of the gap was differential across areas of knowledge and skill, with Blacks falling behind in all areas other than the most basic. They raised the possibility that Blacks and Whites may not be on “parallel trajectories” and that Blacks, as they go through school, may never master some skills mastered by Whites.

This study re-analyzes the ECLS-K data to address this last question. We find that the scores used by Fryer and Levitt (proficiency probability scores, or PPS) do not support the hypothesis of differential growth of the gap. The patterns they found reflect the nonlinear relationships between overall proficiency, ϑ, and the PPS variables, as well as ceiling effects in the PPS distributions. Moreover, ϑ is a sufficient statistic for the PPS variables, and therefore, PPS variables merely re-express the overall mean difference between groups and contain no information about qualitative differences in performance between Black and White students at similar levels of ϑ. We therefore carried out differential item functioning (DIF) analyses of all items in all rounds of the ECLS-K through grade 5 (Round 6), excluding only the fall of grade 1 (which was a very small sample) and subsamples in which there were too few Black students for reasonable analysis. We found no relevant patterns in the distribution of the DIF statistics or in the characteristics of the items showing DIF that support the notion of differential divergence, other than in kindergarten and the first grade, where DIF favoring Blacks tended to be on items tapping simple skills taught outside of school (e.g., number recognition), while DIF disfavoring Blacks tended to be on material taught more in school (e.g., arithmetic). However, there were exceptions to this. Moreover, because of its construction and reporting, the ECLS-K data were not ideal for addressing this question, and data better suited to the purpose might show differential divergence across areas of knowledge and skill. The paper concludes by advising secondary analysts examining this question to be wary of aspects of test design that may influence the results and to be sensitive to likely variations in findings across databases.
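The report does not say which DIF statistic it computed; a standard choice for item-level DIF of this kind is the Mantel-Haenszel procedure reported on the ETS delta scale, which gives the flavor of such an analysis. A minimal sketch on made-up 2x2 tables (all counts below are invented for illustration):

```python
import math

def mh_dif(strata):
    """Mantel-Haenszel DIF for one item, on the ETS delta scale.

    strata: list of (a, b, c, d) 2x2 tables, one per ability level:
      a = reference-group correct, b = reference-group incorrect,
      c = focal-group correct,     d = focal-group incorrect.
    Negative values indicate DIF disfavoring the focal group.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    alpha = num / den                  # MH common odds ratio
    return -2.35 * math.log(alpha)    # ETS delta transform

# Invented example: at one ability level the reference group answers
# correctly far more often (80/20 vs 50/50), yielding negative DIF
# (disfavoring the focal group).
print(mh_dif([(80, 20, 50, 50)]))
```

Stratifying on total score (a proxy for ϑ) is what lets the statistic separate group differences in overall proficiency from item-specific differences, which is exactly the distinction the report draws between PPS mean gaps and qualitative DIF.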

#714 – Exploring the Intellectual, Social and Organizational Capitals at LA's BEST
Denise Huang, Judy Miyoshi, Deborah La Torre, Anne Marshall, Patricia Perez, Cynthia Peterson

Summary
This exploratory study sets out to investigate how LA’s BEST, a non-profit after school organization providing services for at-risk students, leverages its organizational, social, and intellectual capitals to enhance student engagement. Six LA’s BEST sites were selected to participate in this qualitative study. A grounded theory approach was employed, and both interviews and focus groups were conducted with key LA’s BEST program personnel and participants, as well as day school personnel, parents, and community members. To place our findings into context with our study population, Maslow’s Theory on the Hierarchy of Needs (1954) was introduced. The findings revealed that in leveraging their intellectual, social, and organizational capitals, LA’s BEST has provided an important level of support for the students. In addition, LA’s BEST has realized that fostering and maintaining social capital is a continuous task calling for the efforts of “communities of practice.” As a learning organization, LA’s BEST has accepted this challenge and has expanded their efforts to continue learning and growing.

#713 – The Practical Relevance of Accountability Systems for School Improvement: A Descriptive Analysis of California Schools
Heinrich Mintrop, Tina Trujillo

Summary
In search of the practical relevance of accountability systems for school improvement, we ask whether practitioners traveling between the worlds of system-designated high- and low-performing schools would detect tangible differences by observing concrete behaviors, looking at student work, or inquiring about teacher, administrator, or student perceptions. Would they see real differences in educational quality? Would they find schools that are truly more effective? In this study, we compare nine exceptionally high and low performing urban middle schools within the California accountability system. Traversing the nine schools, our travelers would learn that schools that grew on the state performance indicator tended to generate internal commitment for the accountability system. They eschewed the coercive aspects of accountability, maintained a climate of open communication, and considered the system an impetus for raising expectations and work standards. On the instructional side, this commitment translated into the forceful implementation of structured language arts and literacy programs that were aligned with the accountability system. If our travelers expected to encounter visible signs of an overall higher quality of students’ educational experience in the high-performing schools, they would be disappointed. Rather, they would have to settle on a much narrower definition of quality that homes in on attitudes and behaviors that are quite proximate to the effective acquisition of standards-aligned and test-relevant knowledge.