Test Writing Strategies

Opportunities to Improve Our Educational Approach

What We Are About

Welcome!

We are teachers and we want to do our best for our students.  Sometimes we need a chance to see what others are doing to help us “improve our game.”

The goal of this blog is to explore the strategies, philosophies, and various options of test writing.  We’ll take a systematic approach, starting with general tips about tests and test construction and then proceeding through different test item types.

We will look at articles and advice on the Internet and discuss how the ideas may or may not apply to our discipline.  This is not a “one-size-fits-all” topic!  Neither should it be considered a best practices list.  We are the topic experts and the best judges for the information we are assessing.

Everyone is invited to read and comment and offer examples of what worked and what didn’t.

I look forward to your responses.

Tracy Johnston
STEM 1 Curriculum and Program Improvement (CPI) Coordinator
Palomar College

This is a sticky post; newer posts appear below.

For a list of the resources on this blog, click here.

Make Me a Match!


One of my favorite (though not used all that often) test item types is the “matching exercise.”  One class I teach has quite a bit of vocabulary that my students just flat-out need to memorize.  Matching seems like a good, concise way of testing them with a minimum amount of pain on their part (writing the answers) and mine (creating the test).

The sources all agree on the definition:

A matching exercise consists of a list of questions or problems to be answered along with a list of responses.  The examinee is required to make an association between each question and a response.

(Source: http://www.iub.edu/~best/pdf_docs/better_tests.pdf)

I was pleased to see this same source describing the types of material that can be used:

The most common is to use verbal statements…  The problems might be locations on a map, geographic features on a contour map, parts of a diagram of the body, biological specimens, or math problems.

Similarly, the responses don’t have to be terms or labels, they might be functions of various parts of the body, or methods, principles, or solutions.

This other source, http://teaching.uncc.edu/learning-resources/articles-books/best-practice/assessment-grading/designing-test-questions, lists these pairings:

  • terms with definitions
  • phrases with other phrases
  • causes with effects
  • parts with larger units
  • problems with solutions

As you can see, this test item format is well-suited for testing the Knowledge Level of Bloom’s Taxonomy; however, several sources hint that it can also apply to the Comprehension Level “if appropriately constructed.”

Only one source discusses in detail how to “aim for higher order thinking skills” by describing variations that address, for example, Analysis and Synthesis.  (http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf)

One variation is to give a keylist or masterlist, that is, information about several objects, and have the student interpret the meaning of the information, make comparisons (least/greatest, highest/lowest, etc.), and translate symbols.  The example gives three elements from the periodic table with their properties listed below them but no labels on the properties.  The questions ask “Which of the above elements has the largest atomic weight?”, “Which has the lowest melting point?”, and other similar inquiries.

Another variation is a ranking example:

Directions:  Number (1 – 8) the following events in the history of ancient Egypt in the order in which they occurred, using 1 for the earliest event.

These directions are followed by a list of events.

While I see these variations as closer to the “fill-in-the-blank” types, their connection to matching properties with objects, or events with a timeline, makes it reasonable to treat them as matching types.

What are the advantages and disadvantages of matching exercises?

(Source: http://cft.vanderbilt.edu/guides-sub-pages/writing-good-multiple-choice-test-questions/)

These questions help students see the relationships among a set of items and integrate knowledge.

They are less suited than multiple-choice items for measuring higher levels of performance.

 

(Source: http://www.iub.edu/~best/pdf_docs/better_tests.pdf)

Because matching items permit one to cover a lot of content in one exercise, they are an efficient way to measure.

It is difficult, however, to write matching items that require more than simple recall of factual knowledge.

 

(Source:  http://teaching.uncc.edu/learning-resources/articles-books/best-practice/assessment-grading/designing-test-questions)

Maximum coverage at knowledge level in a minimum amount of space/prep time.

Valuable in content areas that have a lot of facts.

But

Time consuming for students.

There are design strategies that can reduce the amount of time it takes for students to work through the exercise, and others that don’t put so much emphasis on reading skills.  We’ll look at those in the next post.

Alternative-Response Design: Structure and Advice

In the previous post we talked about the pros and cons of the alternative-response (e.g., true-false) types of questions as well as their application to Bloom’s Taxonomy.  Next we discuss aspects to consider when writing the questions.

I found this “Simple Guidelines” list helpful and informative.

(Source: http://webs.rtc.edu/ii/Teaching%20Resources/GuidelinesforWritingTest.htm)

  1. Base the item on a single idea.
  2. Write items that test an important idea.
  3. Avoid lifting statements right from the textbook.
  4. Make the questions as brief as possible.
  5. Write clearly true or clearly false statements.  Write them in pairs, one “true” version and one “false” version, and choose one so the test stays balanced.
  6. Eliminate giveaways:
    • Keep true and false statements approximately equal in length
    • Make half the statements true and half false.
    • Try to avoid such words as “all,” “always,” “never,” “only,” “nothing,” and “alone.” Students know these words usually signify false statements.
  7. Beware of words denoting indefinite degree.  The use of words like “more,” “less,” “important,” “unimportant,” “large,” “small,” “recent,” “old,” “tall,” “great,” and so on, can easily lead to ambiguity.
  8. State items positively.  Negative statements may be difficult to interpret.  This is especially true of statements using the double negative.  If a negative word, such as “not” or “never,” is used, be sure to underline or capitalize it.
  9. Beware of detectable answer patterns.  Students can pick out patterns such as (TTTTFFFF) which might be designed to make scoring easier.
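Guideline 9 (beware of detectable answer patterns) is easy to automate.  Below is a minimal sketch, assuming the statements are kept in plain Python lists; the function name and sample items are my own illustration, not from any of the sources.  A seeded shuffle removes any grading-friendly ordering while keeping the answer key recoverable:

```python
import random

def arrange_true_false(true_items, false_items, seed=None):
    """Shuffle a pool of true and false statements so that no
    grading-friendly pattern (e.g., TTTTFFFF) survives into the
    printed test.  Returns (statement, is_true) pairs in random
    order; keep the returned answers as your key."""
    rng = random.Random(seed)
    items = [(s, True) for s in true_items] + [(s, False) for s in false_items]
    rng.shuffle(items)
    return items

quiz = arrange_true_false(
    ["Water boils at 100 degrees C at sea level."],
    ["The Sun orbits the Earth."],
    seed=42,
)
for statement, is_true in quiz:
    print("T   F   " + statement)
```

Printing only the statements keeps the key off the student copy, and re-running with the same seed reproduces the same ordering for a make-up exam.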

All of this makes sense to me.  At first I objected to “Make half the statements true and half false,” but on reflection I wouldn’t insist on exactly half, just something close to it.  In fact this source, http://teaching.uncc.edu/learning-resources/articles-books/best-practice/assessment-grading/designing-test-questions, suggests making the ratio more like 60% false to 40% true, since students are more likely to guess that a statement is true.

I found other points to add to the guidelines list.  (Source:  http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf)

Two ideas can be included in a true-false statement if the purpose is to show cause and effect.

  • If a proposition expresses a relationship, such as cause and effect or premise and conclusion, present the correct part of the statement first and vary the truth or falsity of the second part.
  • When a true-false statement is an opinion, it should be attributed to someone in the statement.
  • Underlining or circling answers is preferable to having the student write them.
  • Make use of popular misconceptions/beliefs as false statements.
  • Write items so that the incorrect response is more plausible or attractive to those without the specialized knowledge being tested.
  • Avoid the use of unfamiliar vocabulary.
  • Determine that the questions are appropriately answered by “True” or “False” rather than by some other type of response, such as “Yes” or “No.”
  • Avoid the tendency to add details in true statements to make them more precise.  The answers should not be obvious to students who do not know the material.
  • Be sure to include directions that tell students how and where to mark their responses.

This same source gives you a nice tip for writing true-false items:

Write a set of true statements that cover the content, then convert approximately half of them to false statements.  State the false items positively, avoiding negatives or double negatives.

Most of this discussion has been about True-False questions but the category is really Alternative-Response.  Let’s look at the variations available to us.

(Source:  http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf)

  • The True-False-Correction Question
    In this variation, true-false statements are presented with a key word or brief phrase that is underlined.  It is not enough that a student correctly identify a statement as being false.  … the student must also supply the correct word or phrase which, when used to replace the underlined part of the statement, makes the statement a true one.  This type of item is more thorough in determining whether students actually know the information that is presented in the false statements.

    The teacher decides what word/phrase can be changed in the sentence; if students were instructed only to make the statement a true statement, they would have the liberty of completely rewriting the statement so that the teacher might not be able to determine whether or not the student understood what was wrong with the original statement.

    If, however, the underlined word/phrase is one that can be changed to its opposite, it loses the advantage over the simpler true-false question, because all the student has to know is that the statement is false and change “is” to “is not.”

  • The Yes-No Variation
    The student responds to each item by writing, circling or indicating yes-no rather than true-false.  An example follows:

What reasons are given by students for taking evening classes?  In the list below, circle Yes if that is one of the reasons given by students for enrolling in evening classes; circle No if that is not a reason given by students.

Yes   No   They are employed during the day.
Yes   No   They are working toward a degree.
Yes   No   They like going to school.
Yes   No   There are no good television shows to watch.
Yes   No   Parking is more plentiful at night.

  • The A-B Variation
    The example below shows a question for which the same two answers apply.  The answers are categories of content rather than true-false or yes-no.

Indicate whether each type of question below is a selection type or a supply type by circling A if it is a selection type, B if it is a supply type.

Select     Supply
A      B            Multiple Choice
A      B            True-False
A      B            Essay
A      B            Matching
A      B            Short Answer

In summary, the sources all tend to agree that the best Alternative-Response items are unambiguous (“true or false with respect to what?”), concisely written, cover one idea per question, and aim at more than rote memorization.  We should avoid trick questions and questions that test trivia.  And the best tests with A-R items have a lot of questions, with a True-to-False ratio of about 40:60.

Next test item type:  Matching!

Alternative-Response: True/False and Similar Items


My original intent for the title here was just “True or False Test Items,” but one source pointed out that the better name is Alternative-Response.  That source, https://www.msu.edu/dept/soweb/writitem.html, defines alternative-response as

… a special case of the multiple-choice item format.  There are many situations which call for either-or decisions, such as deciding whether a specific solution is right or wrong, whether to continue or to stop, whether to use a singular or plural construction, and so on.  For such situations, the alternative-response item is an ideal measuring device.  

It goes on to point out the advantages of this item type:

Since only two options are possible, alternative-response items are generally shorter, and, therefore, require less reading time. Students may respond to more alternative-response items than other types of items in a given length of time.

But there is a major disadvantage:  “students have fifty-fifty probability of answering the item correctly by chance alone.”

It suggests making up for this by offering “a larger number of alternative-choice items than of other  types of items in order to achieve a given level of test reliability.”
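The arithmetic behind that suggestion is easy to check: a guessing student’s score on independent true-false items follows a binomial distribution, so the chance of reaching any passing score can be computed directly.  This is a minimal sketch; the function name and the pass marks are my own, not from the source:

```python
import math

def p_pass_by_guessing(n_items, pass_score, p=0.5):
    """Probability that blind guessing earns at least pass_score
    correct answers out of n_items alternative-response items.
    Each guess is an independent Bernoulli trial with success
    probability p (0.5 for true-false)."""
    return sum(math.comb(n_items, k) * p**k * (1 - p)**(n_items - k)
               for k in range(pass_score, n_items + 1))

# The chance of reaching 70% correct by guessing alone shrinks
# quickly as the number of items grows:
for n, need in [(10, 7), (20, 14), (50, 35)]:
    print(n, "items:", round(p_pass_by_guessing(n, need), 4))
```

On 10 items a pure guesser reaches 70% correct about 17% of the time; on 50 items the chance falls below 1%, which is exactly why the source recommends a larger number of alternative-response items for a given level of test reliability.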

There are other advantages and disadvantages, such as these offered by http://www.iub.edu/~best/pdf_docs/better_tests.pdf, which only addresses true-false items.

Strength:  They are relatively easy to write.

Limitation:  Items are often ambiguous because of the difficulty of writing statements that are unequivocally true or false.

To me this seems like a contradiction:  How can they be easy to write but difficult to make unambiguous?  This same source offers tips on writing good true-false items which we will address in the next post.

This source:  http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf  brings up some other points on true-false items.

Good for:  

  • Knowledge level content
  • Evaluating student understanding of popular misconceptions
  • Concepts with two logical responses

Advantages:  

  • Can test large amounts of content
  • True-false are adaptable to the measurement of a wide variety of learning outcomes
  • Quick and easy to grade
  • If constructed well, can be highly reliable in assessing student knowledge

Disadvantages:  

  • It is difficult to discriminate between students that know the material and students who do not
  • Individual true-false items are less discriminating than individual multiple choice items
  • There is a tendency to write trivial true-false items, which lead students to verbatim memorization
  • True-false items are not amenable to concepts that cannot be formulated as propositions
  • The extent of students’ command of a particular area of knowledge is indicated by their success in judging the truth or falsity of propositions related to it

Bloom’s Taxonomy is of concern for us here, too.  Most sources I read indicate that these question types address only the first level of Bloom’s, Knowledge.  We saw that above in the source’s “Good for” bullet points.  Other sources say it bluntly:

True and false questions are best used when you are looking to test a student’s recall ability of specific facts or knowledge.

(Source:  http://www.helpteaching.com/about/how_to_write_good_test_questions)

However this source, http://www.iub.edu/~best/pdf_docs/better_tests.pdf, makes this statement:

Instructors generally use true-false items to measure the recall of factual knowledge such as names, events, dates, definitions, etc.  But this format has the potential to measure higher levels of cognitive ability, such as comprehension of significant ideas and their application in solving problems.

It goes on to give four examples, the first of which is just recall of facts but the others require the student to recall the important information and apply it correctly in order to answer without guessing.

T   F  1.  Jupiter is the largest planet in the solar system.

T   F  2. If Triangle ABC is isosceles and angle A measures 100 degrees, then angle B is 100 degrees.

T   F  3.  If a distribution of scores has a few extremely low scores, then the median will be numerically larger than the mean.

T   F  4.  The larger the number of scores in a distribution, the larger the standard deviation of the scores must be.

(In case you were wondering, the answers are T, F, T, F.)

The first example above measures recall of a specific fact.  The other examples, however, show how a true-false item can be written to measure comprehension and application.
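To make that concrete, item 2 cannot be answered by recall alone; the student must apply the angle-sum fact (this worked line is my addition, not from the source):

```latex
\angle A + \angle B + \angle C = 180^\circ
\quad\Longrightarrow\quad
\angle B + \angle C = 180^\circ - 100^\circ = 80^\circ
```

so neither remaining angle can measure 100 degrees, and the statement is false.  Likewise, item 3 is true because a few extremely low scores drag the mean below the median.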

We can see that true-false, alternative-response type questions have the potential to address higher levels of Bloom’s as well as cover a large portion of course material with minimal reading on the student’s part.  This question type can benefit our exams if its strengths and weaknesses are considered carefully when being used.

In the next post we will look at the recommendations for writing quality alternative-response questions.

Aug 2015 Plenary Talk Documents

The documents used at the Palomar College Plenary breakout sessions for August, 2015 are below.

The slideshow in .PPTX format.  You can click “View” and then “Notes View” to see each slide presented on a single page with the accompanying text:

Test Writing Plenary Talk Aug 2015

The worksheet in .PDF format:

Test Writing Plenary Talk Worksheet

 

The verb lists and question frames documents are found here as well as in context within the blog.

Bloom’s Verbs, one page

Bloom’s Verbs for Math

Bloom’s Question Frames

More Bloom’s Question Frames

Bloom’s Verbs for Science

 

Multiple Choice and Bloom’s Taxonomy


*Graphic from http://tips.uark.edu/using-blooms-taxonomy/

It is often thought that multiple choice questions will only test on the first two levels of Bloom’s Taxonomy: remembering and understanding.

However, the resources point out that multiple choice questions can be written for the higher levels:  applying, analyzing, evaluating, and creating.

First, we can recognize the different types of multiple choice questions.  While I have used all of these myself, it never occurred to me to classify them.

Source: http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf

Types:

Question/Right answer

Incomplete statement

Best answer

In fact, this source states:

…almost any well-defined cognitive objective can be tested fairly in a multiple choice format.

Advantages:

  • Very effective
  • Versatile at all levels
  • Minimum of writing for student
  • Guessing reduced
  • Can cover broad range of content

Can provide an excellent basis for post-test discussion, especially if the discussion addresses why the incorrect responses were wrong as well as why the correct responses were right.

Disadvantages:

  • Difficult to construct good test items
  • Difficult to come up with plausible distractors/alternative responses

They may appear too discriminating to students, especially when the alternatives are well constructed and are open to misinterpretation by students who read more into questions than is there.

So what can we do to make multiple choice questions work for higher levels of Bloom’s?

Source: http://www.uleth.ca/edu/runte/tests/

To Access Higher Levels in Bloom’s Taxonomy

Don’t confuse “higher thinking skills” with “difficulty” or “complicated”

      • use data or pictures to go beyond recall
      • use multiple choice to get at skill questions

Ideas:

  • Read and interpret a chart
  • Create a chart
  • “Cause and effect”  (e.g., read a map and draw a conclusion)

Another part of this source brings up the idea of using the “inquiry process” to present a family of problems that ask the student to analyze a quote or situation.

    • No more than 5 or 6 questions to a family
    • Simulates going through inquiry process, step-by-step
      • Identify the issue
      • Address advanced skill of organizing a good research question
      • Ask an opinion question (but not the student’s opinion)
      • Analyze implicit assumptions
      • Provide for a condition contrary to the facts, “hypothesize”

This source gives some good ideas, too.

Source: http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf

Develop questions that resemble miniature “cases” or situations.  Provide a small collection of data, such as a description of a situation, a series of graphs, quotes, a paragraph, or any cluster of the kinds of raw information that might be appropriate material.

Then develop a series of questions based on that material.  These questions might require students to apply learned concepts to the case, to combine data, to make a prediction on the outcome of a process, to analyze a relationship between pieces of the information, or to synthesize pieces of information into a new concept.

In short, multiple choice questions, when designed with good structure and strategies, can provide an in-depth evaluation of a student’s knowledge and understanding.  It can be challenging to write those good questions but the benefits are worthwhile.

I thought about writing a summary of what we have learned about multiple choice questions but found this funny little quiz to be better than anything I could come up with:

Can you answer these 6 questions about multiple-choice questions?

Multiple Choice Structure


One type of objective question is multiple choice.  We all know what it is but let’s look in detail at its description anyway.

Source: https://www.msu.edu/dept/soweb/writitem.html

Description of a multiple choice item:

Presents a problem or question in the stem of the item and requires the selection of the best answer or option. The options consist of a most-correct answer and one or more distractors or foils.

The major purpose of a multiple choice item is to identify examinees who do not have complete command of the concept or principle involved.

Properties:
• State the problem in the stem
• Include one correct or most defensible answer
• Select diagnostic foils or distractors such as:

o Clichés
o Common misinformation
o Logical interpretations
o Partial answers
o Technical terms or textbook jargon

The distractors must appear reasonable as the correct answer to the students who have not mastered the material.

So the structure of a multiple choice question is a stem followed by options.  The options contain one correct answer and a set of distractors.

The Stem

Some advice for constructing a good stem is

Source: http://www.iub.edu/~best/pdf_docs/better_tests.pdf

  • Write questions that test a significant concept, that are unambiguous, and that don’t give test-wise students an advantage
  • The stem should fully state the problem and all qualifications. Always include a verb in the statement
  • Items should measure students’ ability to comprehend, apply, analyze, and evaluate as well as recall
  • Include words in the stem that would otherwise be repeated in each option
  • Eliminate excessive wording and irrelevant information in the stem

Here is some advice on good and bad stem design:

Source:  http://cft.vanderbilt.edu/guides-sub-pages/writing-good-multiple-choice-test-questions/#stem

     

A stem that does not present a clear problem, however, may test students’ ability to draw inferences from vague descriptions rather than serving as a more direct test of students’ achievement of the learning outcome.

  

If a significant learning outcome requires negative phrasing, such as identification of dangerous laboratory or clinical practices, the negative element should be emphasized with italics or capitalization.

  

A question stem is preferable because it allows the student to focus on answering the question rather than holding the partial sentence in working memory and sequentially completing it with each alternative.

The best thought about the stem I have seen on the Internet:

Source: http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf

Way to judge a good stem: students who know the content should be able to answer before reading the alternatives.

The Options

Sometimes known as “the alternatives”, they are composed of one right answer and a group of “foils” or distractors.

One point that is emphasized regularly in the resources is that the distractors should all be plausible and attractive answers.

Source:  http://cft.vanderbilt.edu/guides-sub-pages/writing-good-multiple-choice-test-questions/

Common student errors provide the best source of distractors.

Alternatives should be stated clearly and concisely. Items that are excessively wordy assess students’ reading ability rather than their attainment of the learning objective.

Alternatives should be mutually exclusive. Alternatives with overlapping content may be considered “trick” items by test-takers, excessive use of which can erode trust and respect for the testing process.

[Ed. Note:  I have some issues with this particular example but I get the point of their suggestion.]

Alternatives should be homogeneous in content. Alternatives that are heterogeneous in content can provide cues to students about the correct answer.

  

The alternatives should be presented in a logical order (e.g., alphabetical or numerical) to avoid a bias toward certain positions.

Avoid complex multiple choice items, in which some or all of the alternatives consist of different combinations of options. As with “all of the above” answers, a sophisticated test-taker can use partial knowledge to achieve a correct answer.

Other suggestions from this source:

Alternatives should be free from clues about which response is correct. Sophisticated test-takers are alert to inadvertent clues to the correct answer, such as differences in grammar, length, formatting, and language choice in the alternatives. It’s therefore important that alternatives

  • have grammar consistent with the stem.
  • are parallel in form.
  • are similar in length.
  • use similar language (e.g., all unlike textbook language or all like textbook language).

The alternatives “all of the above” and “none of the above” should not be used. When “all of the above” is used as an answer, test-takers who can identify more than one alternative as correct can select the correct answer even if unsure about other alternative(s). When “none of the above” is used as an alternative, test-takers who can eliminate a single option can thereby eliminate a second option. In either case, students can use partial knowledge to arrive at a correct answer.

The number of alternatives can vary among items as long as all alternatives are plausible. Plausible alternatives serve as functional distractors, which are those chosen by students that have not achieved the objective but ignored by students that have achieved the objective. There is little difference in difficulty, discrimination, and test score reliability among items containing two, three, and four distractors.
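Whether a distractor is “functional” (chosen by students who haven’t achieved the objective, ignored by those who have) can be checked after the fact with a simple tally.  Here is a hedged sketch of the classic upper-group/lower-group comparison; the function name, the 27% default, and the data in the example are my own illustration:

```python
def distractor_analysis(responses, totals, correct, top_frac=0.27):
    """Tally, for one multiple-choice item, how often each option was
    chosen by the lowest- and highest-scoring students on the test.
    A 'functional' distractor attracts the low group but not the
    high group.

    responses -- option chosen by each student (e.g., 'A'..'D')
    totals    -- each student's total test score, same order
    correct   -- the keyed option
    """
    ranked = sorted(range(len(totals)), key=lambda i: totals[i])
    n = max(1, int(top_frac * len(totals)))
    low, high = ranked[:n], ranked[-n:]
    return {opt: (sum(responses[i] == opt for i in low),
                  sum(responses[i] == opt for i in high))
            for opt in sorted(set(responses) | {correct})}
```

A distractor that neither group picks is dead weight and worth rewriting; one that the high group prefers over the keyed answer usually signals a flawed or ambiguous item.  The 27% group size is a conventional choice in classical item analysis.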

Keep the specific content of items independent of one another. Savvy test-takers can use information in one question to answer another question, reducing the validity of the test.

 

There is more to think about for multiple choice questions, which we will examine in the next post.

Objective or Subjective? Those are the Questions


Now that we have studied general test writing strategies, ideas, and tips, it is time to pull our focus inward to the details of the questions themselves.

In general, question types fall into two categories:

  1. Objective
  2. Subjective

I needed specific definitions for these, which I found here.

Source: http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf

1. Objective, which require students to select the correct response from several alternatives or to supply a word or short phrase to answer a question or complete a statement.

Examples: multiple choice, true-false, matching, completion

2. Subjective or essay, which permit the student to organize and present an original answer

Examples: short-answer essay, extended-response essay, problem solving, performance test items

This source also suggests guidelines for choosing between them:

Essay tests are appropriate when:

  • The group to be tested is small and the test is not to be reused
  • You wish to encourage and reward the development of student skill in writing
  • You are more interested in exploring student attitudes than in measuring his/her achievement

Objective tests are appropriate when:

  • The group to be tested is large and the test may be reused.
  • Highly reliable scores must be obtained as efficiently as possible.
  • Impartiality of evaluation, fairness, and freedom from possible test-scoring influences are essential.

Either essay or objective tests can be used to:

  • Measure almost any important educational achievement a written test can measure
  • Test understanding and ability to apply principles.
  • Test ability to think critically.
  • Test ability to solve problems.

And it continues with this bit of advice:

 The matching of learning objective expectations with certain item types provides a high degree of test validity:  testing what is supposed to be tested.

  • Demonstrate or show: performance test items
  • Explain or describe: essay test items

I wanted to see what different sources would say, so I also found this one.

Source: http://www.helpteaching.com/about/how_to_write_good_test_questions/

If you want the student to compare and contrast an issue taught during a history lesson, open ended questions may be the best option to evaluate the student’s understanding of the subject matter.

If you are seeking to measure the student’s reasoning skills, analysis skills, or general comprehension of a subject matter, consider selecting primarily multiple choice questions.

Or, for a varied approach, utilize a combination of all available test question types so that you can appeal to the learning strengths of any student on the exam.

Take into consideration both the objectives of the test and the overall time available for taking and scoring your tests when selecting the best format.

I am not sure that “multiple choice” should be the primary choice, but I understand they are suggesting that we avoid open-ended questions if we want to measure reasoning skills, analysis skills, or general comprehension.

This bothers me a little.  It seems to me, from reviewing the previous posts in this blog, that an open-ended question could measure those skills.  The example that comes to mind is the question I had in botany about describing the cell types a pin might encounter when passing through a plant stem.  That was an essay question measuring general comprehension of plant tissues.

The following source brings up good points about analyzing the results.  It also notes that objective tests, when “constructed imaginatively,” can test at higher levels of Bloom’s Taxonomy.

Source: http://www.calm.hw.ac.uk/GeneralAuthoring/031112-goodpracticeguide-hw.pdf

Objective tests are especially well suited to certain types of tasks. Because questions can be designed to be answered quickly, they allow lecturers to test students on a wide range of material. … Additionally, statistical analysis on the performance of individual students, cohorts and questions is possible.

The capacity of objective tests to assess a wide range of learning is often underestimated. Objective tests are very good at examining recall of facts, knowledge and application of terms, and questions that require short text or numerical responses. But a common worry is that objective tests cannot assess learning beyond basic comprehension.

However, questions that are constructed imaginatively can challenge students and test higher learning levels. For example, students can be presented with case studies or a collection of data (such as a set of medical symptoms) and be asked to provide an analysis by answering a series of questions…

Problem solving can also be assessed with the right type of questions. …

A further worry is that objective tests result in inflated scores due to guessing. However, the effects of guessing can be eliminated through a combination of question design and scoring techniques. With the right number of questions and distracters, distortion through guessing becomes largely irrelevant. Alternatively, guessing can be encouraged and measured if this is thought to be a desirable skill.
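One standard scoring technique of the kind this passage alludes to is formula scoring; the source does not name it, so this is my example.  With k options per item, a blind guesser gets one item right for every k − 1 wrong, so subtracting wrong / (k − 1) makes the expected gain from guessing zero:

```python
def corrected_score(right, wrong, n_options):
    """Formula scoring: subtract wrong / (n_options - 1) from the raw
    number right, so random guessing has an expected gain of zero.
    Omitted items neither add to nor subtract from the score."""
    return right - wrong / (n_options - 1)

# True-false (2 options): each wrong answer cancels one right answer.
print(corrected_score(30, 10, 2))   # 20.0
# Four-option multiple choice: three wrongs cancel one right.
print(corrected_score(30, 9, 4))    # 27.0
```

Announcing the rule in the test directions also discourages wild guessing in the first place, which is part of how scoring technique and question design work together.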

There are, however, limits to what objective tests can assess. They cannot, for example, test the competence to communicate, the skill of constructing arguments or the ability to offer original responses. Tests must be carefully constructed in order to avoid the decontextualisation of knowledge (Paxton 1998) and it is wise to use objective testing as only one of a variety of assessment methods within a module. However, in times of growing student numbers and decreasing resources, objective testing can offer a viable addition to the range of assessment types available to a teacher or lecturer.
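The quoted passage mentions two techniques in passing: statistical analysis of item performance and scoring that corrects for guessing.  Both are easy to make concrete.  Here is a minimal sketch in Python (the data and names are purely illustrative, not from the source): the item difficulty index is the proportion of students who answered an item correctly, and the classic correction for guessing subtracts W / (k − 1) from the number right, where k is the number of options per question.

```python
# Minimal item-analysis sketch for an objective test.
# Rows are students, columns are items; 1 = correct, 0 = incorrect.
# (Illustrative data only.)

responses = [
    [1, 1, 0, 1],  # student A
    [1, 0, 0, 1],  # student B
    [1, 1, 1, 1],  # student C
    [0, 0, 0, 1],  # student D
]

def difficulty(item):
    """Proportion of students answering an item correctly (the p-value).
    Values near 1.0 mean easy items; values near 0.0 mean hard ones."""
    col = [row[item] for row in responses]
    return sum(col) / len(col)

def corrected_score(right, wrong, options):
    """Classic correction for guessing: R - W / (k - 1),
    where k is the number of options per question."""
    return right - wrong / (options - 1)

for i in range(4):
    print(f"Item {i + 1} difficulty: {difficulty(i):.2f}")

# A student with 30 right and 10 wrong on a 4-option multiple-choice test:
print(corrected_score(30, 10, 4))  # 30 - 10/3, about 26.67
```

An item everyone answers correctly (difficulty near 1.0) tells you little about differences between students, and the corrected score discourages blind guessing because each wrong answer costs 1/(k − 1) of a point.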

I like their point that objective tests cannot test the competence to communicate, construct arguments, or offer original answers.  Training our students to take only multiple-choice tests (or simply answer “true” or “false”) does not help them learn how to explain their thoughts, or even ensure that they can write coherent sentences.

This is addressed by the second source and in previous posts.  The suggestion is to use a variety of test item types.  A mix can give you a better picture of what your students know, whereas relying on a single type can be biased against students who do not perform well on that format.

Strategy Summary


We are at the point of our investigation where we need to start looking in more detail at test construction.  Here is a brief summary that puts together the pieces of what we have learned so far.

Our challenges

Write an accurate measure of student achievement that

  • motivates students and reinforces learning,
  • enables us to assess student mastery of course objectives,
  • and allows us to recognize what material was or was not communicated clearly.

Some things we can do to accomplish this

In general, when designing a test we need to

  • consider the length of the test,
  • write clear, concise instructions,
  • use a variety of test item formats,
  • test early and/or frequently,
  • proofread and check for accuracy,
  • consider the needs of students with disabilities and non-native English speakers,
  • and use humor.

More specifically, our test goals are to

  • assess achievement of instructional objectives,
  • measure important aspects of the subject,
  • accurately reflect the emphasis placed on important aspects of instruction,
  • measure an appropriate level of student knowledge,
  • and have the questions vary in levels of difficulty.

We should consider the technical quality of our tests

Quality means “conformance to requirements” and “fitness for use.”  The criteria are

  • offering cognitive complexity,
  • reviewing content quality,
  • writing meaningful questions,
  • using appropriate language,
  • being able to generalize about student learning from their test performance,
  • and writing a fair test with answers that represent what students know.

A useful tool is Bloom’s Taxonomy

It lists learning levels in increasing order of complexity:

  1. Remembering
  2. Understanding
  3. Applying
  4. Analyzing
  5. Evaluating
  6. Creating

To apply Bloom’s directly, we looked at

  • lists of verbs associated with each level (some were discipline-specific),
  • question frames — nearly complete questions we can modify for our topics,
  • and knowledge domains — the kinds of knowledge that can be tested:
    • factual,
    • conceptual,
    • procedural,
    • and metacognitive.

Next up:  Learning what question types to use to achieve our goals.

Using Bloom’s in Test Writing


When I first started considering Bloom’s Taxonomy, I thought it was a good way to expand my ideas about how to test, but I struggled to apply it directly.  I appreciated the increasing cognitive levels but needed help writing test questions that utilized them.

What I found were lists of verbs associated with each level.  A good one to start with is:

Source: http://www.lshtm.ac.uk/edu/taughtcourses/writinggoodexamquestions.pdf

A table of suggested verbs mapped against the Anderson and Krathwohl adaptation of Bloom’s Taxonomy, by cognitive level:

  1. Remember: define, repeat, record, list, recall, name, relate, underline.
  2. Understand: translate, restate, discuss, describe, recognise, explain, express, identify, locate, report, review, tell.
  3. Apply: interpret, apply, employ, use, demonstrate, dramatise, practice, illustrate, operate, schedule, sketch.
  4. Analyse: distinguish, analyse, differentiate, appraise, calculate, experiment, test, compare, contrast, criticise, diagram, inspect, debate, question, relate, solve, examine, categorise.
  5. Evaluate: judge, appraise, evaluate, rate, compare, revise, assess, estimate
  6. Create: compose, plan, propose, design, formulate, arrange, assemble, collect, construct, create, set-up, organise, manage, prepare.

Here is an extensive list that is printable on one page, useful for reference while you are designing your test:

Bloom’s Verbs, one page.

Other useful lists:

Bloom’s Verbs for Math

Bloom’s Question Frames (looks very good for English, literature, history, etc.)  This gives you nearly complete questions which you can manipulate into test items appropriate to your discipline.

More Bloom’s Question Frames (2 pages).

Bloom’s Verbs for Science

What comes across to me again and again throughout the sources is that considering the hierarchy when designing exams creates a culture of learning that involves thinking deeply about the course material, taking it beyond simple rote memorization and recitation.

This culture would also benefit from considering Bloom’s while you are teaching.  Modeling higher-level thought processes, showing joy at cognitive challenges, and exploring topics in depth (if time permits) or mentioning that the depth exists (if time is short) can send a strong signal that thinking is valued and important to learning.

Another view on Bloom’s as applied to test writing is to consider the knowledge domains inherent in your course material.  They are:

Source: http://www.lshtm.ac.uk/edu/taughtcourses/writinggoodexamquestions.pdf

The kinds of knowledge that can be tested

  • Factual knowledge: terminology, facts, figures.
  • Conceptual knowledge: classifications, principles, theories, structures, frameworks.
  • Procedural knowledge: algorithms, techniques and methods, and knowing when and how to use them.
  • Metacognitive knowledge: strategy, overview, self-knowledge, knowing how you know.

When I combine this list with the verb lists, I get more ideas for test questions and directions for exploring how students acquire the course knowledge.

Defining Bloom’s Taxonomy

[Image: the revised Bloom’s Taxonomy triangle]

One recurring recommendation in the resources is that we should consider Bloom’s Taxonomy when designing tests. To do so, we should know what it is.

The triangle above is a version of the revised Bloom’s Taxonomy, using active verbs, with one level added and a slight reordering at the top.

According to http://www.learnnc.org/lp/pages/4719,

Bloom’s Taxonomy was created in 1948 by psychologist Benjamin Bloom and several colleagues. Originally developed as a method of classifying educational goals for student performance evaluation, Bloom’s Taxonomy has been revised over the years and is still utilized in education today.

The original intent in creating the taxonomy was to focus on three major domains of learning: cognitive, affective, and psychomotor. The cognitive domain covered “the recall or recognition of knowledge and the development of intellectual abilities and skills”; the affective domain covered “changes in interest, attitudes, and values, and the development of appreciations and adequate adjustment”; and the psychomotor domain encompassed “the manipulative or motor-skill area.” Despite the creators’ intent to address all three domains, Bloom’s Taxonomy applies only to acquiring knowledge in the cognitive domain, which involves intellectual skill development.

The site goes on to say:

Bloom’s Taxonomy can be used across grade levels and content areas. By using Bloom’s Taxonomy in the classroom, teachers can assess students on multiple learning outcomes that are aligned to local, state, and national standards and objectives. Within each level of the taxonomy, there are various tasks that move students through the thought process. This interactive activity demonstrates how all levels of Bloom’s Taxonomy can be achieved with one image.

Further, http://www.edpsycinteractive.org/topics/cognition/bloom.html tells us,

The major idea of the taxonomy is that what educators want students to know (encompassed in statements of educational objectives) can be arranged in a hierarchy from less to more complex.  The levels are understood to be successive, so that one level must be mastered before the next level can be reached.

And also,

In any case it is clear that students can “know” about a topic or subject in different ways and at different levels.  While most teacher-made tests still test at the lower levels of the taxonomy, research has shown that students remember more when they have learned to handle the topic at the higher levels of the taxonomy (Garavalia, Hummel, Wiley, & Huitt, 1999).

Let’s see what each level represents.  The following list is based on the original Bloom’s categories, but it is still enlightening.

Source: http://www.calm.hw.ac.uk/GeneralAuthoring/031112-goodpracticeguide-hw.pdf

Table 2.2 Bloom’s taxonomy and question categories (each competence is listed with the skills demonstrated)

  • Knowledge: recall of information; knowledge of facts, dates, events, places.
  • Comprehension: interpretation of information in one’s own words; grasping meaning.
  • Application: application of methods, theories, and concepts to new situations.
  • Analysis: identification of patterns; recognition of components and their relationships.
  • Synthesis: generalize from given knowledge; use old ideas to create new ones; organize and relate knowledge from several areas; draw conclusions and predict.
  • Evaluation: make judgments; assess the value of ideas and theories; compare and discriminate between ideas; evaluate data.

Based on the work of Benjamin S. Bloom et al., Evaluation to Improve Learning (New York: McGraw-Hill, 1981)

We will look at these in more detail in the next post.