Formal assessment also includes constructed response items in which students are asked to recall information and create an answer, not just recognize whether an answer is correct, so guessing is reduced. Constructed response items can be used to assess a wide variety of kinds of knowledge, and two major kinds are discussed: completion or short answer (also called short response) and extended response.
Completion and short answer
Completion and short answer items can be answered in a word, phrase, number, or symbol. These types of items are essentially the same only varying in whether the problem is presented as a statement or a question (Linn & Miller 2005). For example:
Completion: The first traffic light in the US was invented by __________.
Short Answer: Who invented the first traffic light in the US?
These items are often used in mathematics tests, e.g.
3 + 10 = ?
If x = 6, what does x(x-1) = ?
Draw the line of symmetry on the following shape
A major advantage of these items is that they are easy to construct. However, apart from their use in mathematics, they are unsuitable for measuring complex learning outcomes and are often difficult to score. Completion and short answer tests are sometimes called objective tests because the intent is that there is only one correct answer and so no variability in scoring. Unless the question is phrased very carefully, however, there are frequently a variety of correct answers. For example, consider the item: Where was President Lincoln born?
The teacher may expect the answer "in a log cabin", but other correct answers include "on Sinking Spring Farm", "in Hardin County", or "in Kentucky". Common errors in these items are summarized in Table 38.
Extended response items are used in many content areas and answers may vary in length from a paragraph to several pages. Questions that require longer responses are often called essay questions. Extended response items have several advantages and the most important is their adaptability for measuring complex learning outcomes— particularly integration and application. These items also require that students write and therefore provide teachers a way to assess writing skills. A commonly cited advantage to these items is their ease in construction; however, carefully worded items that are related to learning outcomes and assess complex learning are hard to devise (Linn & Miller, 2005). Well-constructed items phrase the question so the task of the student is clear. Often this involves providing hints or planning notes. In the first example below the actual question is clear not only because of the wording but because of the format (i.e. it is placed in a box). In the second and third examples planning notes are provided:
Example 1: Third grade mathematics:
The owner of a bookstore gave 14 books to the school. The principal will give an equal number of books to each of three classrooms and the remaining books to the school library. How many books could the principal give to each student and the school?
Show all your work on the space below and on the next page. Explain in words how you found the answer. Tell why you took the steps you did to solve the problem.
(From Illinois Standards Achievement Test, 2006; http://www.isbe.state.il.us/assessment/isat.htm)
Example 2: Fifth grade science: The grass is always greener
Jose and Maria noticed three different types of soil, black soil, sand, and clay, were found in their neighborhood. They decided to investigate the question, "How does the type of soil (black soil, sand, and clay) under grass sod affect the height of grass?"
Plan an investigation that could answer their new question.
In your plan, be sure to include:
• Prediction of the outcome of the investigation
• Materials needed to do the investigation
• Procedure that includes:
    • logical steps to do the investigation
    • one variable kept the same (controlled)
    • one variable changed (manipulated)
    • any variables being measured and recorded
    • how often measurements are taken and recorded
(From Washington State 2004 assessment of student learning)
Example 3: Grades 9-11 English:
Some people think that schools should teach students how to cook. Other people think that cooking is something that ought to be taught in the home. What do you think? Explain why you think as you do.
□ I think schools should teach students how to cook
□ I think cooking should be taught in the home
I think cooking should be taught in __________ because __________
(school) or (the home)
(From Illinois Measure of Annual Growth in English http://www.isbe.state.il.us/assessment/image.htm)
A major disadvantage of extended response items is the difficulty in reliable scoring. Not only do various teachers score the same response differently but also the same teacher may score the identical response differently on various occasions (Linn & Miller 2005). A variety of steps can be taken to improve the reliability and validity of scoring. First, teachers should begin by writing an outline of a model answer. This helps make it clear what students are expected to include. Second, a sample of the answers should be read. This assists in determining what the students can do and if there are any common misconceptions arising from the question. Third, teachers have to decide what to do about irrelevant information that is included (e.g. is it ignored or are students penalized) and how to evaluate mechanical errors such as grammar and spelling. Then, a point scoring or a scoring rubric should be used.
In point scoring, components of the answer are assigned points. For example, if students were asked:
What are the nature, symptoms, and risk factors of hyperthermia?
Point Scoring Guide:
Definition (nature): 2 pts
Symptoms (1 pt for each): 5 pts
Risk factors (1 pt for each): 5 pts
Writing: 3 pts
This provides some guidance for evaluation and helps consistency, but point scoring systems often lead the teacher to focus on facts (e.g. naming risk factors) rather than higher level thinking, which may undermine the validity of the assessment if the teacher's purposes include higher level thinking. A better approach is to use a scoring rubric that describes the quality of the answer or performance at each level.
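The arithmetic behind the point scoring guide above can be sketched in a few lines of Python. This is only an illustration: the component names, the `point_score` helper, and the sample response are hypothetical, and the sketch assumes that points awarded for a component are capped at that component's maximum.

```python
# Hypothetical encoding of the point scoring guide: component -> maximum points.
SCORING_GUIDE = {
    "definition": 2,
    "symptoms": 5,      # 1 pt for each symptom named, up to 5
    "risk_factors": 5,  # 1 pt for each risk factor named, up to 5
    "writing": 3,
}

MAX_SCORE = sum(SCORING_GUIDE.values())


def point_score(awarded):
    """Sum the awarded points, capping each component at its maximum (an assumption)."""
    total = 0
    for component, maximum in SCORING_GUIDE.items():
        points = awarded.get(component, 0)
        total += max(0, min(points, maximum))
    return total


# Hypothetical response: full definition, 3 symptoms, 4 risk factors, writing rated 2.
example = {"definition": 2, "symptoms": 3, "risk_factors": 4, "writing": 2}
print(point_score(example), "out of", MAX_SCORE)
```

Note that 10 of the 15 available points reward listing facts, which illustrates how such a guide can pull attention away from higher level thinking.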
Scoring rubrics can be holistic or analytical. In holistic scoring rubrics, general descriptions of performance are made and a single overall score is obtained. An example from grade 2 language arts in Los Angeles Unified School District, which classifies responses into four levels (not proficient, partially proficient, proficient, and advanced), is in Table 39.
Analytical rubrics provide descriptions of levels of student performance on a variety of characteristics. For example, six characteristics used for assessing writing developed by the Northwest Regional Education Laboratory (NWREL) are:
• ideas and content
• organization
• voice
• word choice
• sentence fluency
• conventions
Descriptions of high, medium, and low responses for each characteristic are available from:
Holistic rubrics have the advantages that they can be developed more quickly than analytical rubrics. They are also faster to use as there is only one dimension to examine. However, they do not provide students feedback about which aspects of the response are strong and which aspects need improvement (Linn & Miller, 2005). This means they are less useful for assessment for learning. An important use of rubrics is to use them as teaching tools and provide them to students before the assessment so they know what knowledge and skills are expected.
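The difference between a single overall score and per-characteristic feedback can be sketched as follows. This is a hypothetical illustration, not a published rubric: the 1 (low) to 3 (high) scale, the two sets of ratings, and the helper functions are all assumptions, and the simple sum is only a stand-in for a one-number report (a real holistic score is a single judgment, not a total).

```python
# Hypothetical ratings on a 1 (low) to 3 (high) scale; the scale is an assumption.
student_a = {"ideas and content": 3, "word choice": 1, "sentence fluency": 2}
student_b = {"ideas and content": 2, "word choice": 2, "sentence fluency": 2}


def overall_score(ratings):
    """Stand-in for a single overall score: just the sum of the ratings."""
    return sum(ratings.values())


def needs_improvement(ratings, threshold=2):
    """List the characteristics rated below the threshold, in alphabetical order."""
    return sorted(name for name, level in ratings.items() if level < threshold)


for label, ratings in (("A", student_a), ("B", student_b)):
    print(label, overall_score(ratings), needs_improvement(ratings))
```

Both hypothetical students receive the same overall number, but only the per-characteristic view shows that student A needs to work on word choice.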
Teachers can use scoring rubrics as part of instruction by giving students the rubric during instruction, providing several responses, and analyzing these responses in terms of the rubric. For example, use of accurate terminology is one dimension of the science rubric in Table 40. An elementary science teacher could discuss why it is important for scientists to use accurate terminology, give examples of inaccurate and accurate terminology, provide that component of the scoring rubric to students, distribute some examples of student responses (maybe from former students), and then discuss how these responses would be classified according to the rubric. This strategy of assessment for learning should be more effective if the teacher (a) emphasizes to students why using accurate terminology is important when learning science rather than how to get a good grade on the test (we provide more details about this in the section on motivation later in this chapter); (b) provides an exemplary response so students can see a model; and (c) emphasizes that the goal is student improvement on this skill not ranking students.
Typically in performance assessments students complete a specific task while teachers observe the process or procedure (e.g. data collection in an experiment) as well as the product (e.g. completed report) (Popham, 2005; Stiggins, 2005). The tasks that students complete in performance assessments are not simple, in contrast to selected response items, and include the following:
• playing a musical instrument
• athletic skills
• artistic creation
• conversing in a foreign language
• engaging in a debate about political issues
• conducting an experiment in science
• repairing a machine
• writing a term paper
• using interaction skills to play together
These examples all involve complex skills but illustrate that the term performance assessment is used in a variety of ways. For example, the teacher may not observe all of the process (e.g. she sees a draft paper but the final product is written during out-of-school hours), and essay tests are typically classified as performance assessments (Airasian, 2000). In addition, in some performance assessments there may be no clear product (e.g. the performance may be group interaction skills).
Two related terms, alternative assessment and authentic assessment, are sometimes used instead of performance assessment, but they have different meanings (Linn & Miller, 2005). Alternative assessment refers to tasks that are not pencil-and-paper, and while many performance assessments are not pencil-and-paper tasks, some are (e.g. writing a term paper, essay tests). Authentic assessment is used to describe tasks that students do that are similar to those in the "real world". Classroom tasks vary in level of authenticity (Popham, 2005). For example, for a Japanese language class taught in a high school in Chicago, conversing in Japanese in Tokyo is highly authentic, but only possible in a study abroad program or trip to Japan. Conversing in Japanese with native Japanese speakers in Chicago is also highly authentic, and conversing with the teacher in Japanese during class is moderately authentic. Much less authentic is a matching test on English and Japanese words. In a language arts class, writing a letter (to an editor) or a memo to the principal is highly authentic, as letters and memos are common work products. Writing a five-paragraph paper is not as authentic, as such papers are not used in the world of work. Nevertheless, a five-paragraph paper is a complex task and would typically be classified as a performance assessment.
Advantages and disadvantages
There are several advantages of performance assessments (Linn & Miller 2005). First, the focus is on complex learning outcomes that often cannot be measured by other methods. Second, performance assessments typically assess process or procedure as well as the product. For example, the teacher can observe if the students are repairing the machine using the appropriate tools and procedures as well as whether the machine functions properly after the repairs. Third, well designed performance assessments communicate the instructional goals and meaningful learning clearly to students. For example, if the topic in a fifth grade art class is one-point perspective, the performance assessment could be drawing a city scene that illustrates one-point perspective (http://www.sanford-artedventures.com). This assessment is meaningful and clearly communicates the learning goal. It is also a good instructional activity and has good content validity, which is common with well designed performance assessments (Linn & Miller 2005).
One major disadvantage with performance assessments is that they are typically very time consuming for students and teachers. This means that fewer assessments can be gathered, so if they are not carefully devised, fewer learning goals will be assessed, which can reduce content validity. State curriculum guidelines can be helpful in determining what should be included in a performance assessment. For example, Eric, a dance teacher in a high school in Tennessee, learns that the state standards indicate that dance students at the highest level should be able to demonstrate consistency and clarity in performing technical skills by:
• performing complex movement combinations to music in a variety of meters and styles
• performing combinations and variations in a broad dynamic range
• demonstrating improvement in performing movement combinations through self-evaluation
• critiquing a live or taped dance production based on given criteria
(http://www.tennessee.gov/education/c...danceQi2.shtml)
Eric devises the following performance task for his eleventh grade modern dance class.
In groups of 4-6, students will perform a dance at least 5 minutes in length. The dance selected should be multifaceted so that all the dancers can demonstrate technical skills, complex movements, and a dynamic range (Items 1-2). Students will videotape their rehearsals and document how they improved through self evaluation (Item 3). Each group will view and critique the final performance of one other group in class (Item 4).
Eric would need to scaffold most steps in this performance assessment. The groups probably would need guidance in selecting a dance that allowed all the dancers to demonstrate the appropriate skills, critiquing their own performances constructively, working effectively as a team, and applying criteria to evaluate a dance.
Another disadvantage of performance assessments is that they are hard to assess reliably, which can lead to inaccurate and unfair evaluation. As with any constructed response assessment, scoring rubrics are very important. Examples of holistic and analytic scoring rubrics designed to assess a completed product are in Table 39 and Table 40. A rubric designed to assess the process of group interactions is in Table 41.
This rubric was devised for middle grade science but could be used in other subject areas when assessing group process. In some performance assessments several scoring rubrics should be used. In the dance performance example above Eric should have scoring rubrics for the performance skills, the improvement based on self evaluation, the team work, and the critique of the other group. Obviously, devising a good performance assessment is complex and Linn and Miller (2005) recommend that teachers should:
• Create performance assessments that require students to use complex cognitive skills. Sometimes teachers devise assessments that are interesting and that the students enjoy but do not require students to use higher level cognitive skills that lead to significant learning. Focusing on high level skills and learning outcomes is particularly important because performance assessments are typically so time consuming.
• Ensure that the task is clear to the students. Performance assessments typically require multiple steps so students need to have the necessary prerequisite skills and knowledge as well as clear directions. Careful scaffolding is important for successful performance assessments.
• Specify expectations of the performance clearly by providing students scoring rubrics during the instruction. This not only helps students understand what is expected but also ensures that teachers are clear about what they expect. Thinking this through while planning the performance assessment can be difficult for teachers but is crucial, as it typically leads to revisions of the actual assessment and directions provided to students.
• Reduce the importance of unessential skills in completing the task. What skills are essential depends on the purpose of the task. For example, for a science report, is the use of publishing software essential? If the purpose of the assessment is for students to demonstrate the process of the scientific method including writing a report, then the format of the report may not be significant. However, if the purpose includes integrating two subject areas, science and technology, then the use of publishing software is important.
Because performance assessments take time it is tempting to include multiple skills without carefully considering if all the skills are essential to the learning goals.