The essay item, in our opinion, is one of the most misunderstood, misused, and abused items within the entire paper-and-pencil domain. It has definite strengths, but it is also prone to pronounced weaknesses, some of which are attributable to the construction and scoring of the item. In this chapter, we show you how to avoid construction and scoring weaknesses.
A primary strength of the essay item is its provision for an in-depth, detailed analysis of a small area of material (Analysis level). This item further permits assessment of students’ organizational, creative, and writing skills; their ability to build a case and make a point; and their capacity to evaluate phenomena, all of which occur at the Synthesis and Evaluation levels. Also, it is comparatively easy to construct. However, it is of limited use for assessing a broad range of content.
Scoring essay items requires expertise, especially since some students are adept at talking around a point. However, clearly defined items and adherence to corresponding rubrics, as we discuss later, will neutralize any diversionary tactics devised by an ill-informed student. Also problematic to the neophyte or inattentive teacher are secondary factors, such as spelling, handwriting, and neatness, but well-constructed rubrics can minimize or even negate such distractions: They can illuminate responses that do not meet specified criteria, regardless of how neat and verbally correct they may be.
As with the short-answer item, student skills in writing essay responses have become increasingly important with the implementation of open-ended questions on statewide examinations. When you know how to construct and score essay items, are aware of their strengths and weaknesses, and know which cognitive levels are most conducive to their assessment, the result will be higher student scores on both high-stakes state examinations and your own tests.
Following an essay exam, a common answer to the question, “How did you do?” is, “I don’t know…it depends on what the teacher wants.” If this is the response, the items are probably vague, and if students do not understand the question, how can you determine whether they know the answer? Of course, the item should not contain clues to its answer, but it should specify exactly what is expected of the students. It is therefore important to structure the item so that it names the specific points that should appear in the student’s response. For example, the item may begin with a task-directed statement:
Within two pages, compare and contrast how George Washington and Francis Marion contributed to the Revolutionary War. In your narrative, cite one similarity and one difference, including two supporting arguments for each. Your essay should be well organized as well as grammatically and linguistically sound.
Beginning with a task-directed statement, this item leaves no doubt as to what is expected of the student.
Rubrics should be used to score essay items. Rubrics in this context are scoring guides that delineate a point spread for each item and the bases for awarding the points, including partial credit for partially correct responses. In some instances, they are best developed as tests are constructed, although generic rubrics can also be helpful. In either case, they help ensure consistency, objectivity, and fairness in scoring, and we strongly advocate their use.
For instance, using the previous example, the cited similarity and difference could each have a zero- to three-point value, depending on the selection and presentation. Then the four supporting arguments could have a zero- to three-point range, based on their relevance and significance. Although the item stresses organizational, grammatical, and linguistic soundness, these skills should previously have been taught if they are to have rubric point value. If they have not been taught, they should not be factored into the rubric.
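As a rough illustration of the arithmetic behind such a rubric, the sketch below totals the points for the Washington and Marion item. It assumes that each of the four supporting arguments is scored separately on the zero- to three-point range, which gives a maximum of 18 points; the names RUBRIC, max_score, and score_response, along with the sample ratings, are hypothetical and serve only this example.

```python
# Hypothetical rubric for the Washington/Marion item. Every criterion is
# worth 0 to 3 points. Scoring each supporting argument separately is an
# assumption made for this illustration.
MAX_POINTS_PER_CRITERION = 3

RUBRIC = [
    "similarity",
    "difference",
    "similarity argument 1",
    "similarity argument 2",
    "difference argument 1",
    "difference argument 2",
]


def max_score():
    """Return the highest total a response can earn under this rubric."""
    return MAX_POINTS_PER_CRITERION * len(RUBRIC)


def score_response(ratings):
    """Total the points awarded, checking that each rating is 0 to 3."""
    total = 0
    for criterion in RUBRIC:
        points = ratings[criterion]  # raises KeyError if a criterion is unrated
        if not 0 <= points <= MAX_POINTS_PER_CRITERION:
            raise ValueError(f"{criterion}: {points} is outside 0 to 3")
        total += points
    return total


# Made-up ratings for one response, including partial credit.
sample_ratings = {
    "similarity": 3,
    "difference": 2,
    "similarity argument 1": 3,
    "similarity argument 2": 1,
    "difference argument 1": 2,
    "difference argument 2": 0,
}

print(f"{score_response(sample_ratings)} of {max_score()} points")
# prints "11 of 18 points"
```

Organization and grammar are left out of this sketch; in keeping with the point above, they would be added as criteria only if those skills had been taught.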
Generic rubrics are appropriate for multiple classroom situations. As Arter and McTighe (2001, p. 27) point out, they are “useful to help students understand the nature of quality – the ‘big picture’ details that contribute to the quality of a type of performance or product.” They go on to explain that “task-specific scoring could happen in mathematics, social studies, science, and any class that has a particular content to be learned” (p. 28). In describing task-specific rubrics, they acknowledge that these rubrics “allow students to see what quality looks like in a simple problem – the one at hand” (p. 27). Such rubrics also provide for analytic scoring.
Although it is cumbersome, we recommend scoring essay tests item by item as opposed to paper by paper. This strategy helps you focus on a specific area, allowing you to detect patterns in student responses (e.g., a number of students omitting the same point), which may indicate the need to adjust your instruction or the item itself. The item-by-item method also allows you to score the papers more anonymously; you are assessing responses rather than students, thus negating the halo effect. For example, if Fred has previously performed well, you may assume that he is going to perform well on this particular assignment and thus award him undeserved points. Of course, when scoring any test, and especially an essay test, stop immediately if you become tired, stressed, or hungry, and do not resume until you have regained homeostasis.