11.9: Constructed Response Items

Formal assessment also includes constructed response items, in which students are asked to recall information and create an answer rather than just recognize whether an answer is correct, so guessing is reduced. Constructed response items can be used to assess a wide variety of kinds of knowledge. Two major kinds are discussed here: completion or short answer (also called short response) and extended response.

Completion and short answer

Completion and short answer items can be answered in a word, phrase, number, or symbol. These types of items are essentially the same, varying only in whether the problem is presented as a statement or a question (Linn & Miller, 2005). For example:

Exercise \(\PageIndex{1}\)

Completion: The first traffic light in the US was invented by ________.

Short Answer: Who invented the first traffic light in the US?

These items are often used in mathematics tests, e.g.

Exercise \(\PageIndex{2}\)

3 + 10 = ?

If x = 6, what does x(x-1) = ?

Draw the line of symmetry on the following shape

A major advantage of these items is that they are easy to construct. However, apart from their use in mathematics, they are unsuitable for measuring complex learning outcomes and are often difficult to score. Completion and short answer tests are sometimes called objective tests because the intent is that there is only one correct answer and so no variability in scoring. Unless the question is phrased very carefully, however, there are frequently a variety of correct answers. For example, consider the item Where was President Lincoln born?

The teacher may expect the answer "in a log cabin" but other correct answers include "on Sinking Spring Farm", "in Hardin County", and "in Kentucky". Common errors in these items are summarized in Table \(\PageIndex{1}\).

Table \(\PageIndex{1}\) : Common errors in constructed response items

Type of item	Common errors	Example
Completion and short answer	There is more than one possible answer.	e.g. Where was US President Lincoln born? The answer could be in a log cabin, in Kentucky etc.
	Too many blanks are in the completion item so it is too difficult or doesn’t make sense.	e.g. In ….. theory, the first stage, ……., is when infants process through their ……. and ….. ………
	Clues are given by length of blanks in completion items.	e.g. Three states are contiguous to New Hampshire: ….. is to the West, …… is to the East, and ………… is to the South.
Extended Response	Ambiguous questions	e.g. Was the US Civil War avoidable? Students could interpret this question in a wide variety of ways, perhaps even answering only “yes” or “no”. One student may discuss only political causes; another may discuss moral, political, and economic causes. There is no guidance in the question for students.
	Poor reliability in grading	The teacher does not use a scoring rubric and so is inconsistent in how he scores answers, especially unexpected responses, irrelevant information, and grammatical errors.
	Perception of student influences grading	By spring semester the teacher has developed expectations of each student’s performance and this influences the grading (numbers can be used instead of names). The test consists of three constructed responses and the teacher grades the three answers on each student’s paper before moving to the next paper. This means that the grading of questions 2 and 3 is influenced by the answers to question 1 (teachers should grade all the answers to the 1st question, then the 2nd, etc.).
	Choices are given on the test and some answers are easier than others.	Testing experts recommend not giving choices in tests because then students are not really taking the same test, creating equity problems.
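
Two of the safeguards mentioned in the table (using numbers instead of names, and grading every answer to one question before moving on to the next question) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `blind_grading_order`, the student names, and the use of a random seed are all hypothetical and not part of any published grading procedure.

```python
import random

def blind_grading_order(student_names, num_questions, seed=None):
    """Assign each student an anonymous paper number and list the grading
    order question by question (all answers to Q1, then all to Q2, ...)."""
    rng = random.Random(seed)
    paper_ids = list(range(1, len(student_names) + 1))
    rng.shuffle(paper_ids)  # hide any link between roster order and paper number
    id_by_student = dict(zip(student_names, paper_ids))

    order = []
    for question in range(1, num_questions + 1):
        # Grade this question on every paper before starting the next question.
        for paper_id in sorted(id_by_student.values()):
            order.append((question, paper_id))
    return id_by_student, order

# Hypothetical class of three students taking a three-question test.
key, order = blind_grading_order(["Ana", "Ben", "Chi"], num_questions=3, seed=0)
print(order)
```

The teacher would keep `key` aside until grading is finished, so that expectations about individual students cannot influence the scores.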

Extended response

Extended response items are used in many content areas and answers may vary in length from a paragraph to several pages. Questions that require longer responses are often called essay questions. Extended response items have several advantages and the most important is their adaptability for measuring complex learning outcomes— particularly integration and application. These items also require that students write and therefore provide teachers a way to assess writing skills. A commonly cited advantage to these items is their ease in construction; however, carefully worded items that are related to learning outcomes and assess complex learning are hard to devise (Linn & Miller, 2005). Well-constructed items phrase the question so the task of the student is clear. Often this involves providing hints or planning notes. In the first example below the actual question is clear not only because of the wording but because of the format (i.e. it is placed in a box). In the second and third examples planning notes are provided:

Example \(\PageIndex{1}\): Third grade mathematics

The owner of a bookstore gave 14 books to the school. The principal will give an equal number of books to each of three classrooms and the remaining books to the school library. How many books could the principal give to each classroom and the school?

Show all your work on the space below and on the next page. Explain in words how you found the answer. Tell why you took the steps you did to solve the problem.

(From Illinois Standards Achievement Test, 2006; www.isbe.state.il.us/assessment/isat.htm)

Example \(\PageIndex{2}\): Fifth grade science: The grass is always greener

Jose and Maria noticed three different types of soil, black soil, sand, and clay, were found in their neighborhood. They decided to investigate the question, "How does the type of soil (black soil, sand, and clay) under grass sod affect the height of grass?"

Plan an investigation that could answer their new question.

In your plan, be sure to include:

• Prediction of the outcome of the investigation
• Materials needed to do the investigation
• Procedure that includes:
- logical steps to do the investigation
- one variable kept the same (controlled)
- one variable changed (manipulated)
- any variables being measured and recorded
- how often measurements are taken and recorded

(From Washington State 2004 assessment of student learning)

Example \(\PageIndex{3}\): Grades 9-11 English

Writing prompt

Some people think that schools should teach students how to cook. Other people think that cooking is something that ought to be taught in the home. What do you think? Explain why you think as you do.

Planning notes

Choose One:

I think schools should teach students how to cook
I think cooking should be taught in the home

I think cooking should be taught in _________ because________
(school) or (the home)

(From Illinois Measure of Annual Growth in English, www.isbe.state.il.us/assessment/image.htm; dead link, found in the Internet Archive)

A major disadvantage of extended response items is the difficulty of scoring them reliably. Not only do different teachers score the same response differently, but the same teacher may also score the identical response differently on different occasions (Linn & Miller, 2005). A variety of steps can be taken to improve the reliability and validity of scoring. First, teachers should begin by writing an outline of a model answer. This helps make it clear what students are expected to include. Second, a sample of the answers should be read. This assists in determining what the students can do and whether there are any common misconceptions arising from the question. Third, teachers have to decide what to do about irrelevant information that is included (e.g. is it ignored or are students penalized) and how to evaluate mechanical errors such as grammar and spelling. Fourth, a point scoring guide or a scoring rubric should be used.

In point scoring, components of the answer are assigned points. For example, suppose students were asked:

Exercise \(\PageIndex{4}\)

What are the nature, symptoms, and risk factors of hyperthermia?

Point Scoring Guide:

Definition (nature) 2 pts
Symptoms (1 pt for each) 5 pts
Risk Factors (1 pt for each) 5 pts
Writing 3 pts
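
The arithmetic behind such a guide can be sketched as follows. The maximum points come from the guide above; the function name `point_score`, the capping behavior, and the sample answer are assumptions made for illustration only.

```python
# Maximum points per component, taken from the point scoring guide above.
MAX_POINTS = {"definition": 2, "symptoms": 5, "risk_factors": 5, "writing": 3}

def point_score(definition_pts, symptoms_named, risk_factors_named, writing_pts):
    """Total an answer's score, capping each component at its maximum."""
    earned = {
        "definition": definition_pts,
        "symptoms": symptoms_named,          # 1 pt for each symptom named
        "risk_factors": risk_factors_named,  # 1 pt for each risk factor named
        "writing": writing_pts,
    }
    capped = {part: max(0, min(pts, MAX_POINTS[part])) for part, pts in earned.items()}
    return capped, sum(capped.values())

# Hypothetical answer: full definition, six symptoms, three risk factors, 2 pts for writing.
breakdown, total = point_score(2, 6, 3, 2)
print(breakdown, total, "out of", sum(MAX_POINTS.values()))
```

In this sketch a student who names six symptoms still earns only the 5 points available for that component, so the hypothetical answer scores 12 out of 15.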

A point scoring guide provides some guidance for evaluation and helps consistency, but point scoring systems often lead the teacher to focus on facts (e.g. naming risk factors) rather than higher level thinking. This may undermine the validity of the assessment if the teacher's purposes include higher level thinking. A better approach is to use a scoring rubric that describes the quality of the answer or performance at each level.

Scoring rubrics

Scoring rubrics can be holistic or analytical. In holistic scoring rubrics, general descriptions of performance are made and a single overall score is obtained. An example from grade 2 language arts in the Los Angeles Unified School District, which classifies responses into four levels (not proficient, partially proficient, proficient, and advanced), is in Table \(\PageIndex{2}\).

Table \(\PageIndex{2}\) : Example of holistic scoring rubric: English language arts grade 2

Assignment. Write about an interesting, fun, or exciting story you have read in class this year. Some of the things you could write about are:

What happened in the story (the plot or events)
Where the events took place (the setting)
People, animals, or things in the story ( the characters)

In your writing make sure you use facts and details from the story to describe everything clearly.

After you write about the story, explain what makes the story interesting, fun or exciting.

Scoring rubric
Advanced Score 4	The response demonstrates well-developed reading comprehension skills. Major story elements (plot, setting, or characters) are clearly and accurately described. Statements about the plot, setting, or characters are arranged in a manner that makes sense. Ideas or judgments (why the story is interesting, fun, or exciting) are clearly supported or explained with facts and details from the story.
Proficient Score 3	The response demonstrates solid reading comprehension skills. Most statements about the plot, setting, or characters are clearly described. Most statements about the plot, setting, or characters are arranged in a manner that makes sense. Ideas or judgments are supported with facts and details from the story.
Partially Proficient Score 2	The response demonstrates some reading comprehension skills. There is an attempt to describe the plot, setting, or characters. Some statements about the plot, setting, or characters are arranged in a manner that makes sense. Ideas or judgments may be supported with some facts and details from the story.
Not Proficient Score 1	The response demonstrates little or no skill in reading comprehension. The plot, setting, or characters are not described, or the description is unclear. Statements about the plot, setting, or characters are not arranged in a manner that makes sense. Ideas or judgments are not stated, and facts and details from the text are not used.
Source: Adapted from English Language Arts Grade 2 Los Angeles Unified School District, 2001 (http://www.cse.ucla.edu/resources/justforteachers_set.htm from the Internet Archive)

Analytical rubrics provide descriptions of levels of student performance on a variety of characteristics. For example, six characteristics used for assessing writing, developed by Education Northwest, are:

• ideas and content
• organization
• voice
• word choice
• sentence fluency
• conventions

Descriptions of high, medium, and low responses for each characteristic are available from Education Northwest.

Holistic rubrics have the advantage that they can be developed more quickly than analytical rubrics. They are also faster to use as there is only one dimension to examine. However, they do not give students feedback about which aspects of the response are strong and which aspects need improvement (Linn & Miller, 2005). This means they are less useful for assessment for learning. An important use of rubrics is as teaching tools: they can be provided to students before the assessment so students know what knowledge and skills are expected.

Teachers can use scoring rubrics as part of instruction by giving students the rubric during instruction, providing several responses, and analyzing these responses in terms of the rubric. For example, use of accurate terminology is one dimension of the science rubric in Table \(\PageIndex{3}\). An elementary science teacher could discuss why it is important for scientists to use accurate terminology, give examples of inaccurate and accurate terminology, provide that component of the scoring rubric to students, distribute some examples of student responses (maybe from former students), and then discuss how these responses would be classified according to the rubric. This strategy of assessment for learning should be more effective if the teacher (a) emphasizes to students why using accurate terminology is important when learning science rather than how to get a good grade on the test (we provide more details about this in the section on motivation later in this chapter); (b) provides an exemplary response so students can see a model; and (c) emphasizes that the goal is student improvement on this skill, not ranking students.

Table \(\PageIndex{3}\) Example of a scoring rubric, Science

*On the High School Assessment, the application of a concept to a practical problem or real-world situation will be scored when it is required in the response and requested in the item stem.

	Level of understanding	Use of accurate scientific terminology	Use of supporting details	Synthesis of information	Application of information*
4	There is evidence in the response that the student has a full and complete understanding.	The use of accurate scientific terminology enhances the response.	Pertinent and complete supporting details demonstrate an integration of ideas.	The response reflects a complete synthesis of information.	An effective application of the concept to a practical problem or real-world situation reveals an insight into scientific principles.
3	There is evidence in the response that the student has a good understanding.	The use of accurate scientific terminology strengthens the response.	The supporting details are generally complete.	The response reflects some synthesis of information.	The concept has been applied to a practical problem or real-world situation.
2	There is evidence in the response that the student has a basic understanding.	The use of accurate scientific terminology may be present in the response.	The supporting details are adequate.	The response provides little or no synthesis of information.	The application of the concept to a practical problem or real-world situation is inadequate.
1	There is evidence in the response that the student has some understanding.	The use of accurate scientific terminology is not present in the response.	The supporting details are only minimally effective.	The response addresses the question.	The application, if attempted, is irrelevant.
0	The student has NO UNDERSTANDING of the question or problem. The response is completely incorrect or irrelevant.
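
The practical difference between an analytic rubric such as this one and a holistic rubric is that each dimension receives its own level, so a response yields a profile rather than a single number. A minimal sketch of that idea follows, assuming shortened dimension names and a hypothetical helper `analytic_feedback`. It simply skips any dimension that was not scored (for instance, application when the item does not request it) and does not model the single overall description used for level 0.

```python
# Shortened names for the five columns of the science rubric (an assumption of this sketch).
DIMENSIONS = [
    "level of understanding",
    "scientific terminology",
    "supporting details",
    "synthesis of information",
    "application of information",
]

def analytic_feedback(scores):
    """Summarize per-dimension rubric levels (0-4) for one response."""
    unknown = set(scores) - set(DIMENSIONS)
    if unknown or not scores:
        raise ValueError("scores must use at least one known rubric dimension")
    if any(not 0 <= level <= 4 for level in scores.values()):
        raise ValueError("rubric levels run from 0 to 4")
    scored = [d for d in DIMENSIONS if d in scores]
    highest = max(scores[d] for d in scored)
    lowest = min(scores[d] for d in scored)
    return {
        "profile": {d: scores[d] for d in scored},
        "strongest": [d for d in scored if scores[d] == highest],
        "needs_work": [d for d in scored if scores[d] == lowest],
    }

# Hypothetical response on an item that did not request an application.
feedback = analytic_feedback({
    "level of understanding": 3,
    "scientific terminology": 2,
    "supporting details": 3,
    "synthesis of information": 4,
})
print(feedback["strongest"], feedback["needs_work"])
```

A holistic rubric would collapse this profile into one overall level, which is quicker to assign but loses the information about which aspect needs improvement.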

Performance assessments

Typically in performance assessments students complete a specific task while teachers observe the process or procedure (e.g. data collection in an experiment) as well as the product (e.g. completed report) (Popham, 2005; Stiggins, 2005). The tasks that students complete in performance assessments are not simple, in contrast to selected response items, and include the following:

playing a musical instrument
athletic skills
artistic creation
conversing in a foreign language
engaging in a debate about political issues
conducting an experiment in science
repairing a machine
writing a term paper
using interaction skills to play together

These examples all involve complex skills but illustrate that the term performance assessment is used in a variety of ways. For example, the teacher may not observe all of the process (e.g. she sees a draft paper but the final product is written during out-of-school hours) and essay tests are typically classified as performance assessments (Airasian, 2000). In addition, in some performance assessments there may be no clear product (e.g. the performance may be group interaction skills).

Two related terms, alternative assessment and authentic assessment, are sometimes used instead of performance assessment but they have different meanings (Linn & Miller, 2005). Alternative assessment refers to tasks that are not pencil-and-paper, and while many performance assessments are not pencil-and-paper tasks, some are (e.g. writing a term paper, essay tests). Authentic assessment is used to describe tasks that students do that are similar to those in the "real world". Classroom tasks vary in level of authenticity (Popham, 2005). For example, for a Japanese language class taught in a high school in Chicago, conversing in Japanese in Tokyo is highly authentic, but only possible in a study abroad program or trip to Japan. Conversing in Japanese with native Japanese speakers in Chicago is also highly authentic, and conversing with the teacher in Japanese during class is moderately authentic. Much less authentic is a matching test on English and Japanese words. In a language arts class, writing a letter (to an editor) or a memo to the principal is highly authentic as letters and memos are common work products. Writing a five-paragraph paper is less authentic, as such papers are not used in the world of work. Nevertheless, a five-paragraph paper is a complex task and would typically be classified as a performance assessment.

Advantages and disadvantages

There are several advantages of performance assessments (Linn & Miller, 2005). First, the focus is on complex learning outcomes that often cannot be measured by other methods. Second, performance assessments typically assess process or procedure as well as the product. For example, the teacher can observe if the students are repairing the machine using the appropriate tools and procedures as well as whether the machine functions properly after the repairs. Third, well designed performance assessments communicate the instructional goals and meaningful learning clearly to students. For example, if the topic in a fifth grade art class is one-point perspective, the performance assessment could be drawing a city scene that illustrates one-point perspective (www.sanford-artedventures.com). This assessment is meaningful and clearly communicates the learning goal. It is also a good instructional activity and has good content validity, which is common with well designed performance assessments (Linn & Miller, 2005).

One major disadvantage of performance assessments is that they are typically very time consuming for students and teachers. This means that fewer assessments can be gathered, so if they are not carefully devised, fewer learning goals will be assessed, which can reduce content validity. State curriculum guidelines can be helpful in determining what should be included in a performance assessment. For example, Eric, a dance teacher in a high school in Tennessee, learns that the state standards indicate that dance students at the highest level should be able to demonstrate consistency and clarity in performing technical skills by:

1. performing complex movement combinations to music in a variety of meters and styles
2. performing combinations and variations in a broad dynamic range
3. demonstrating improvement in performing movement combinations through self-evaluation
4. critiquing a live or taped dance production based on given criteria

(https://www.tn.gov/education/instruction/academic-standards/arts-education.html)

Eric devises the following performance task for his eleventh grade modern dance class.

In groups of 4-6, students will perform a dance at least 5 minutes in length. The dance selected should be multifaceted so that all the dancers can demonstrate technical skills, complex movements, and a dynamic range (Items 1-2). Students will videotape their rehearsals and document how they improved through self evaluation (Item 3). Each group will view and critique the final performance of one other group in class (Item 4).

Eric would need to scaffold most steps in this performance assessment. The groups probably would need guidance in selecting a dance that allowed all the dancers to demonstrate the appropriate skills, critiquing their own performances constructively, working effectively as a team, and applying criteria to evaluate a dance.

Another disadvantage of performance assessments is that they are hard to assess reliably, which can lead to inaccurate and unfair evaluation. As with any constructed response assessment, scoring rubrics are very important. Examples of holistic and analytic scoring rubrics designed to assess a completed product are in Table \(\PageIndex{2}\) and Table \(\PageIndex{3}\). A rubric designed to assess the process of group interactions is in Table \(\PageIndex{4}\).

Table \(\PageIndex{4}\) : Example of group interaction rubric

Score	Time management	Participation and performance in roles	Shared involvement
0	Group did not stay on task and so task was not completed.	Group did not assign or share roles.	Single individual did the task.
1	Group was off-task the majority of the time but task was completed.	Groups assigned roles but members did not use these roles.	Group totally disregarded comments and ideas from some members.
2	Group stayed on task most of the time.	Groups accepted and used some but not all roles.	Group accepted some ideas but did not give others adequate consideration.
3	Group stayed on task throughout the activity and managed time well.	Group accepted and used roles and actively participated.	Groups gave equal consideration to all ideas.
4	Group defined their own approach in a way that more effectively managed the activity.	Group defined and used roles not mentioned to them. Role changes took place that maximized individuals’ expertise.	Groups made specific efforts to involve all group members including the reticent members.
Source: Adapted from Group Interaction (GI), SEPUP (2003). Issues, Evidence and You. Ronkonkoma, NY: Lab-Aids. (http://cse.edc.org/products/assessment/middleschool/scorerub.asp, link through the Internet Archive)

This rubric was devised for middle grade science but could be used in other subject areas when assessing group process. In some performance assessments several scoring rubrics should be used. In the dance performance example above Eric should have scoring rubrics for the performance skills, the improvement based on self evaluation, the team work, and the critique of the other group. Obviously, devising a good performance assessment is complex and Linn and Miller (2005) recommend that teachers should:

• Create performance assessments that require students to use complex cognitive skills. Sometimes teachers devise assessments that are interesting and that the students enjoy but do not require students to use higher level cognitive skills that lead to significant learning. Focusing on high level skills and learning outcomes is particularly important because performance assessments are typically so time consuming.

• Ensure that the task is clear to the students. Performance assessments typically require multiple steps so students need to have the necessary prerequisite skills and knowledge as well as clear directions. Careful scaffolding is important for successful performance assessments.

• Specify expectations of the performance clearly by providing students scoring rubrics during the instruction. This not only helps students understand what is expected but also ensures that teachers are clear about what they expect. Thinking this through while planning the performance assessment can be difficult for teachers but is crucial, as it typically leads to revisions of the actual assessment and directions provided to students.

• Reduce the importance of unessential skills in completing the task. What skills are essential depends on the purpose of the task. For example, for a science report, is the use of publishing software essential? If the purpose of the assessment is for students to demonstrate the process of the scientific method including writing a report, then the format of the report may not be significant. However, if the purpose includes integrating two subject areas, science and technology, then the use of publishing software is important.
Because performance assessments take time it is tempting to include multiple skills without carefully considering if all the skills are essential to the learning goals.