ISBN: 9781945349171

Making Classroom Assessments Reliable and Valid: How to Assess Student Learning

Making Classroom Assessments Reliable and Valid by Robert J. Marzano will convince you that classroom assessments should become the primary method for formally measuring student learning over other types of assessment in education. Read about the key advantages of classroom assessments over interim, end-of-course, and state assessments in how to assess student learning and measure growth over time.

Marzano also addresses the validity and reliability of classroom assessments and how to improve those metrics before bringing them to their rightful place in K-12 assessments. This book outlines how to revamp validity and reliability to match technical advances made in classroom assessment, instead of matching large-scale assessment’s traditional standards.

Using this book, teachers, schools, and districts can design classroom assessments that are as reliable and valid as traditional large-scale assessments, if not more so.

How this book will convince you to use classroom assessments:

  • Consider the history of large-scale assessments in US education and the purpose of standardized testing.
  • Examine the importance and future role of classroom assessment.
  • Explore the three mathematical models of reliability, as well as the three major types of validity.
  • Understand the principles of assessment for learning and the importance of measuring students’ individual and comparative growth.
  • Use the provided formulas to create classroom assessments that match traditional interim or end-of-year assessments in reliability and validity.

Contents:
Introduction: The Role of Classroom Assessment
Chapter 1: Discussing the Classroom Assessment Paradigm for Validity
Chapter 2: Designing and Scoring Parallel Assessments
Chapter 3: Discussing the Classroom Assessment Paradigm for Reliability
Chapter 4: Measuring Growth for Groups of Students
Chapter 5: Transforming the System Using the New Classroom Assessment Paradigms
Appendix

List price: $28.99

Your Price: $23.19

You Save: 20%

6 Chapters

Chapter 1: Discussing the Classroom Assessment Paradigm for Validity

Validity is certainly the first order of business when researchers or educators design classroom assessments (CAs). The concept of validity has evolved over the years into a multifaceted construct. As mentioned previously, the initial conception of a test’s validity was that it measures what it purports to measure. As Henry E. Garrett (1937) notes, “the fidelity with which [a test] measures what it purports to measure” (p. 324) is the hallmark of its validity. By the 1950s, though, important distinctions emerged about the nature and function of validity. Samuel Messick (1993) explains that since the early 1950s, validity has been thought of as involving three major types: (1) criterion-related validity, (2) construct validity, and (3) content validity.

While the three types of validity have unique qualities, these distinctions are complicated by the fact that one can examine validity from two perspectives. John D. Hathcoat (2013) explains that these perspectives are (1) the instrumental perspective and (2) the argument-based perspective. Validity in general, and the three different types in particular, looks quite different depending on the perspective. This is a central theme of this chapter, and I make a case for the argument-based perspective as superior, particularly as it relates to CAs. The chapter also covers the following topics.

 

Chapter 2: Designing and Scoring Parallel Assessments

For the classroom teacher, following the new validity paradigm for CAs is a continual process that sometimes involves all students in class and sometimes involves individual students. There is a variety of CAs that a teacher might use in the measurement process. Designing and scoring parallel assessments includes ten important aspects.

  1.  Traditional tests
  2.  Essays
  3.  Performance tasks, demonstrations, and presentations
  4.  Portfolios
  5.  Probing discussions
  6.  Student self-assessments
  7.  Assessments that cover one level of a proficiency scale
  8.  Complete measurement process
  9.  Assessment planning
 10.  Differentiated assessments

Traditional Tests

The term test is probably the most common when referring to CAs. Unfortunately, people use it in vastly different ways. Here, I restrict the meaning to assessments that are written and involve selected-response items, short constructed-response items, or both. The process of designing a traditional test, then, involves generating selected-response items and short constructed-response items that correspond to the various levels of content in a proficiency scale.

 

Chapter 3: Discussing the CA Paradigm for Reliability

As mentioned in the introduction, the view of reliability from the CA perspective represents a dramatic shift in measurement theory, because it is based on determining the precision of scores for individual students as measured over a set of parallel assessments, as opposed to the differences between students’ scores on a single test. What is perhaps most distinctive and powerful about the new paradigm for reliability is that it involves explicit estimation of each student’s true score on each assessment. These estimations account for the possible error in the observed scores. To understand this shift in perspective, it is useful to discuss the traditional view of reliability in some depth. Later in this chapter, I’ll also discuss estimating true scores using mathematical models, using technology, the implications for formative and summative scores, using instructional feedback, employing the method of mounting evidence, and the issue of scales.

 

Chapter 4: Measuring Growth for Groups of Students

In chapter 3, I addressed the precision (reliability) of the scores for individual students on a specific topic. This is the primary role of CAs—to generate scores for individual students on specific measurement topics that are as close to each student’s true scores as possible. The method of mathematical models provides teachers with tools for doing this. In this scenario, reliability or precision is defined as determining the model that best fits the observed score data. One might say that this type of reliability is designed to answer the question, On this particular topic, what is the most precise score that you can assign to each student on each assessment over time? While not as rigorous a process mathematically, the method of mounting evidence also provides reasonably precise estimates of students’ true scores, particularly at the end of a series (and particularly for their summative scores).
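
As a rough illustration of what "determining the model that best fits the observed score data" can look like, the sketch below fits two candidate trends to one student's scores on a series of parallel assessments and keeps the one with the smaller squared error. The two candidates (a straight line and a logarithmic curve that rises quickly and then levels off), the function names, and the use of squared error as the fit criterion are all assumptions made for this example; they are not taken from the book, which may use different models and criteria.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b).

    Needs at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def best_fit_estimates(observed):
    """Return (model_name, estimated_scores) for the better-fitting trend.

    observed: one student's observed scores, in the order the parallel
    assessments were given. The fitted value at each occasion serves as
    a hypothetical estimate of the true score on that assessment.
    """
    occasions = range(1, len(observed) + 1)
    # Both candidates have two parameters, so comparing squared error is fair.
    candidates = {
        "linear": [float(t) for t in occasions],
        "logarithmic": [math.log(t) for t in occasions],
    }
    best = None
    for name, xs in candidates.items():
        a, b = fit_line(xs, observed)
        fitted = [a + b * x for x in xs]
        sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))
        if best is None or sse < best[0]:
            best = (sse, name, fitted)
    return best[1], best[2]


# Example: five observed scores on a proficiency scale for one topic.
model, estimates = best_fit_estimates([1.5, 2.0, 2.0, 3.0, 3.0])
```

The last value in `estimates` is the fitted score on the final assessment, which is one plausible candidate for a summative score under these assumptions.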

Estimating the true scores on assessments for individual students on a specific topic is not the only concern of a classroom teacher, though. Rather, comparative student growth is also of interest. A teacher might want to analyze how much one student has grown in a particular topic in comparison to other students in the class or compared to other students in other classes. This might also be of interest to school administrators or district administrators. As we will see in chapter 5, depicting student growth on report cards provides useful information to students, teachers, and parents. Therefore, measuring the growth of groups of students in a reliable manner is beneficial to virtually all constituents in a district, school, or classroom.

 

Chapter 5: Transforming the System Using the New Classroom Assessment Paradigms

The new paradigms for CAs have the potential to transform the K–12 system far beyond measurement in the classroom. Taken to their logical endpoints, these paradigms can influence many aspects of schooling that might seem unrelated on the surface. Here we consider two areas of transformation: (1) report cards and (2) teacher evaluations.

Transforming Report Cards

Ultimately, most teachers must translate students’ summative scores for measurement topics addressed during a grading period into an overall or omnibus grade. This can be done using the new paradigms for CA, but the report card will contain more information than traditional report cards. Consider figure 5.1.

Figure 5.1: Report card.

The bar graphs in figure 5.1 represent a student’s proficiency scale scores on specific measurement topics. The dark part of each bar graph indicates a student’s first score at the beginning of the year. The light part of each bar graph represents the student’s score on a proficiency scale at the end of the grading period. As discussed previously, students’ final or summative scores should be computed using mathematical models that best fit the data. If this is not possible, summative scores can be compiled using the method of mounting evidence. For reporting purposes, these scores are rounded up or down to the nearest half-point or quarter-point score.
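
The rounding step for reporting can be sketched in a few lines. The function name and the choice to round exact ties upward are assumptions made for this example; the text above says only that scores are rounded to the nearest half-point or quarter-point.

```python
import math


def round_to_increment(score, increment=0.5):
    """Round a score to the nearest multiple of `increment`.

    Exact ties are rounded up (an assumption for this sketch).
    """
    return math.floor(score / increment + 0.5) * increment


print(round_to_increment(2.74, 0.5))   # 2.5
print(round_to_increment(2.74, 0.25))  # 2.75
```

The same estimated score of 2.74 is reported as 2.5 on a half-point scale and as 2.75 on a quarter-point scale, so the choice of increment affects what appears on the report card.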

 

Appendix: Technical Notes

Making Classroom Assessments Reliable and Valid is intended to raise the status of CAs in classrooms, schools, and districts. To do so requires the articulation of psychometric paradigms for validity and reliability different from those employed in classical test theory. The new paradigm for validity includes CAs designed from highly focused proficiency scales that allow for the construction of parallel assessments scored using the measurement process.

Measurement, by definition, involves translating information for CAs into scores on proficiency scales. The new paradigm for reliability involves analyzing scores from parallel assessments for individual students. Mathematical analyses can provide estimates of each student’s true score on each assessment. These types of analyses require programming using standard spreadsheet software. If such software and programming are not available, teachers can estimate the true score at the conclusion of a set of parallel assessments by examining mounting evidence. Finally, teachers can compare the growth of groups of students by identifying a common way of measuring growth across groups of students on a specific topic.

 



Details

Format name: ePub
Encrypted: No
SKU: BPE0000240259
ISBN: 9781945349188
File size: 4.38 MB
Printing: Allowed
Copying: Allowed
Read aloud: Allowed