Test production process - Approaches to language testing - Techniques of language testing - Bloom's taxonomy

1. Phạm Phúc Khánh Minh

2. Nguyễn Trần Hoài Phương

3. Nguyễn Ngọc Phương Thành

4. Võ Thị Thanh Thư

5. Đỗ Thị Bạch Vân

6. Ngô Thảo Vy

TESOL 2014B

1. The test production process

2. Approaches to language testing

3. Techniques of language testing: Item types

4. Bloom’s taxonomy and testing

Item analysis
- Classical Test Theory
- Item-Response Theory
  - One-Parameter (Rasch Model)
  - Two-Parameter
  - Three-Parameter

1. The test production process

1.1. Classical Test Theory (CTT) vs. Item-Response Theory (IRT)

CTT
• Measured at the test level
• Results apply only to the students who took that test

IRT
• Measured at the item level
• Provides sample-free measurement
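The CTT column can be made concrete with classical item analysis. The minimal sketch below (made-up responses, hypothetical function names) computes item facility, the proportion of students answering an item correctly, and a simple upper-minus-lower discrimination index. The 27% upper/lower split is a common convention assumed here, not something the slides specify. Both statistics come from one group of students, which is why CTT results apply only to that group.

```python
def item_facility(responses, item):
    """Proportion of examinees answering `item` correctly (the CTT 'p-value')."""
    return sum(row[item] for row in responses) / len(responses)

def discrimination_index(responses, item, fraction=0.27):
    """Upper-group minus lower-group proportion correct on `item`.
    Groups are formed by total test score; the 27% split is an assumption."""
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(row[item] for row in upper) / n
    p_lower = sum(row[item] for row in lower) / n
    return p_upper - p_lower

# Hypothetical 0/1 response matrix: rows = students, columns = items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]

for i in range(4):
    print(f"item {i}: facility {item_facility(responses, i):.2f}, "
          f"discrimination {discrimination_index(responses, i):.2f}")
```

Giving the same items to a stronger or weaker class would change both numbers, which is the sample-bound behaviour the latent trait approach sets out to remove.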

1.2. Advantages offered by Latent Trait Theory

Sample-Free Item Calibration

CTT
• The estimated item difficulty varies with the average ability of the particular sample of examinees observed
• -> Item analysis is sample-bound

IRT
• An item difficulty scale is independent of the ability differences of any particular sample of examinees
• -> Item analysis is sample-free
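A minimal sketch of why latent trait calibration can be sample-free, assuming the Rasch (one-parameter) model and two hypothetical item difficulties. The proportion correct on each item changes with the examinee's ability, but the difference in log-odds between the two items stays equal to the difference in their difficulties, whoever answers them.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta and difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_odds(p):
    return math.log(p / (1.0 - p))

# Two hypothetical items with assumed difficulties (logits).
b_easy, b_hard = -0.5, 1.0

for theta in (-2.0, 0.0, 2.0):  # weak, average and strong examinees
    p_easy, p_hard = rasch_p(theta, b_easy), rasch_p(theta, b_hard)
    # Proportions correct shift with ability (sample-bound) ...
    # ... but the log-odds gap stays at b_hard - b_easy = 1.5 logits.
    gap = log_odds(p_easy) - log_odds(p_hard)
    print(f"theta={theta:+.1f}  p_easy={p_easy:.2f}  p_hard={p_hard:.2f}  gap={gap:.2f}")
```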


Test-Free Person Measurement

CTT
• Ability measurement is dependent on the unique clustering of items

IRT
• Possible to compare abilities of persons using different tests


Multiple Reliability Estimation

CTT
• Ability estimation varies in reliability, so one global estimate of reliability should not be applied in evaluating the accuracy of scores for every individual examined

IRT
• Reliability estimation goes beyond a global estimate for a given test, to a confidence estimate associated with every possible person and item score on that test
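Under the Rasch model (assumed here), each ability estimate carries its own standard error, 1 / sqrt(test information), where test information is the sum of P(1 - P) over the items. The sketch uses a hypothetical 11-item test: the standard error is smallest for abilities near the middle of the item difficulties and grows towards the extremes, which is why one global reliability figure misdescribes individual scores.

```python
import math

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def standard_error(theta, difficulties):
    """Rasch standard error of an ability estimate at theta: 1 / sqrt(test information)."""
    information = sum(rasch_p(theta, b) * (1.0 - rasch_p(theta, b)) for b in difficulties)
    return 1.0 / math.sqrt(information)

# Hypothetical 11-item test, difficulties evenly spaced from -2 to +2 logits.
difficulties = [-2.0 + 0.4 * i for i in range(11)]

for theta in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"ability {theta:+.1f} logits -> SE {standard_error(theta, difficulties):.2f}")
```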


Identification of Guessers and Other Deviant Respondents

CTT
• Impossible to identify person misfit

IRT
• Possible to identify person misfit
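One common way latent trait analysis flags misfitting persons is an outfit mean-square statistic: the average squared standardized residual, (x - P)^2 / (P(1 - P)), across the items a person took. The sketch assumes the Rasch model, hypothetical item difficulties and a simple Newton-Raphson ability estimate. The two response strings share a raw score, but the one that fails easy items while passing hard ones (as a lucky guesser might) gets a much larger value. Cut-offs for "misfit" vary between sources and are not given here.

```python
import math

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, difficulties, iterations=30):
    """Newton-Raphson maximum-likelihood ability; needs a non-extreme raw score."""
    theta = 0.0
    for _ in range(iterations):
        ps = [rasch_p(theta, b) for b in difficulties]
        info = sum(p * (1.0 - p) for p in ps)
        theta += (sum(responses) - sum(ps)) / info
    return theta

def outfit_mnsq(responses, difficulties):
    """Mean of squared standardized residuals; about 1 when responses fit the model."""
    theta = estimate_ability(responses, difficulties)
    total = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)

difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical items, easy -> hard
consistent = [1, 1, 1, 0, 0]                # passes the easy items, fails the hard ones
erratic = [0, 0, 1, 1, 1]                   # same raw score, but fails the easy items

print(round(outfit_mnsq(consistent, difficulties), 2))
print(round(outfit_mnsq(erratic, difficulties), 2))
```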


Reconciliation of Norm-Referenced and Criterion-Referenced Testing

CTT
• Unable to reconcile norm-referenced and criterion-referenced testing in measurement

IRT
• Able to reconcile norm-referenced and criterion-referenced testing in measurement


Test Equating Facility

CTT
• Equating requires all test forms to be administered to the same large sample of examinees
• -> Time-consuming

IRT
• No need to administer all forms of a test to the same large sample of examinees
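With latent trait calibration, forms can be linked through a set of common (anchor) items instead of giving every form to the same large sample. A minimal sketch of one simple Rasch approach, a mean shift: the linking constant is the average difference between the anchor items' difficulties on the two separately calibrated forms, and adding it places Form B's items (and, in the same way, its persons) on Form A's scale. All difficulty values are hypothetical.

```python
def linking_constant(anchor_on_a, anchor_on_b):
    """Mean difference between anchor-item difficulties on the two forms (logits)."""
    diffs = [a - b for a, b in zip(anchor_on_a, anchor_on_b)]
    return sum(diffs) / len(diffs)

def to_form_a_scale(values_on_b, shift):
    """Move difficulties (or person measures) from Form B's scale to Form A's."""
    return [v + shift for v in values_on_b]

# Hypothetical separate calibrations: the same three anchor items on each form.
anchor_on_a = [-0.8, 0.1, 0.9]
anchor_on_b = [-1.3, -0.4, 0.4]
shift = linking_constant(anchor_on_a, anchor_on_b)  # about 0.5 logits

form_b_unique = [-0.2, 0.7, 1.5]  # items that appear only on Form B
print(to_form_a_scale(form_b_unique, shift))
```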


Test Tailoring Facility

A tailored test will provide much greater decision accuracy than a standardized test: fewer students will be wrongly admitted to, or wrongly rejected from, university or intensive English study.
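A sketch of the item-selection idea behind tailored testing, under assumptions that are not in the slides: a Rasch-calibrated item bank with hypothetical difficulties, a rule that always picks the unused item whose difficulty is closest to the current ability estimate (the most informative item under the Rasch model), fixed 1-logit steps until the examinee has both a right and a wrong answer, and maximum-likelihood estimates after that. Operational adaptive tests add stopping rules, exposure control and content balancing.

```python
import math

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def mle_theta(responses, difficulties, theta=0.0, iterations=30):
    """Newton-Raphson ability estimate; needs at least one right and one wrong answer."""
    for _ in range(iterations):
        ps = [rasch_p(theta, b) for b in difficulties]
        info = sum(p * (1.0 - p) for p in ps)
        theta += (sum(responses) - sum(ps)) / info
    return theta

def tailored_test(bank, answer, length=6, step=1.0):
    """Give `length` items from `bank` (item id -> difficulty), each time choosing the
    unused item closest in difficulty to the current ability estimate."""
    theta, used, responses = 0.0, [], []
    for _ in range(length):
        item = min((i for i in bank if i not in used), key=lambda i: abs(bank[i] - theta))
        used.append(item)
        responses.append(answer(item))
        if 0 < sum(responses) < len(responses):
            theta = mle_theta(responses, [bank[i] for i in used], theta)
        else:
            # All right or all wrong so far: no finite estimate, so move by a fixed step.
            theta += step if responses[-1] else -step
    return theta, used, responses

# Hypothetical bank of 13 items from -3 to +3 logits.
bank = {f"item{k}": -3.0 + 0.5 * k for k in range(13)}
true_ability = 1.2  # simulated examinee passes every item easier than this
theta, used, responses = tailored_test(bank, lambda item: int(bank[item] < true_ability))
print(round(theta, 2), used, responses)
```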


Item Banking Facility

• Calibrated items -> stored in an item bank according to a common metric of difficulty
• Permits the construction of tests of known reliability and validity, based on appropriate selection of item subsets from the bank, without further need for trial in the field


The Study of Item and Test Bias

CTT
• Uncommon to quantify the amount and direction of bias for any given item or person

IRT
• Able to quantify the amount and direction of bias for any given item or person
• => Test bias can be neutralized by removing biased items or by including items biased in the opposite direction
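A sketch of how the amount and direction of item bias can be quantified, assuming each group's responses have already been calibrated separately with a Rasch model (the difficulties below are hypothetical): centre each calibration at zero, subtract, and flag items whose contrast reaches an assumed 0.5-logit threshold. The sign gives the direction. Here q3 favours group 1 and q4 favours group 2, so keeping both roughly balances them, in the spirit of including biased items in the opposite direction.

```python
def dif_contrasts(group1, group2):
    """Per-item difficulty difference (group 2 minus group 1), after centring each
    separate calibration at zero. Positive = harder for group 2."""
    mean1 = sum(group1.values()) / len(group1)
    mean2 = sum(group2.values()) / len(group2)
    return {item: (group2[item] - mean2) - (group1[item] - mean1) for item in group1}

def flag_items(contrasts, threshold=0.5):
    """Items whose contrast is at least `threshold` logits in either direction."""
    return {item: c for item, c in contrasts.items() if abs(c) >= threshold}

# Hypothetical separate Rasch calibrations (logits) of the same four items.
group1 = {"q1": -1.0, "q2": 0.0, "q3": 0.5, "q4": 0.5}
group2 = {"q1": -1.0, "q2": 0.0, "q3": 1.3, "q4": -0.3}

contrasts = dif_contrasts(group1, group2)
flagged = flag_items(contrasts)
for item, c in flagged.items():
    print(item, f"{c:+.2f}", "harder for group 2" if c > 0 else "harder for group 1")
```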


Elimination of Boundary Effects in Program Evaluation

CTT
• The problem of boundary effects

IRT
• A person gets all items correct or all items incorrect => that person’s ability is not estimated => search for items of greater or lesser difficulty => ability estimation occurs
• An item is missed by all persons or answered correctly by all persons => that item’s difficulty is not estimated => search for persons of greater or lesser ability until at least one person passes and one person fails each item => calibration of item difficulty
• Sample size, dispersion and central tendency are transformed to articulate with the same interval scale
• => Boundary effects are removed

1.3 Competing Latent Trait Models

The Rasch One-Parameter Model is preferred by teachers and language testers


Sample size constraints:
- The Rasch Model: 100 – 200 persons
- Two-Parameter Model: 200 – 400 persons
- Three-Parameter Model: 1,000 – 2,000 persons


Introduction to the Rasch, One-Parameter Model

The Rasch Model is probabilistic in nature: persons and items are not only graded for ability and difficulty, but are also judged according to the probability of their response patterns given the observed person ability and item difficulty.
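The standard dichotomous Rasch model gives the probability of a correct response as exp(θ - b) / (1 + exp(θ - b)), where θ is person ability and b is item difficulty, both in logits. A minimal sketch of the probabilistic judgement described above, with hypothetical values: assuming local independence, the probability of a whole response pattern is the product of the item probabilities, so a pattern with the same raw score but right answers on hard items and wrong answers on easy ones is far less probable.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct answer: exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pattern_probability(pattern, theta, difficulties):
    """Probability of a whole 0/1 response pattern, assuming local independence."""
    prob = 1.0
    for x, b in zip(pattern, difficulties):
        p = rasch_p(theta, b)
        prob *= p if x == 1 else (1.0 - p)
    return prob

difficulties = [-1.5, -0.5, 0.5, 1.5]  # hypothetical item difficulties (logits)
theta = 0.0                            # hypothetical person ability
expected = [1, 1, 0, 0]                # right on easy items, wrong on hard ones
reversed_ = [0, 0, 1, 1]               # same raw score, unexpected pattern

print(pattern_probability(expected, theta, difficulties))
print(pattern_probability(reversed_, theta, difficulties))
```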


Computation of Item Difficulty and Person Ability

By computer: BICAL (Mead, Wright, and Bell, 1979); BILOG II (Mislevy and Bock, 1984)

By hand: PROX (Wright and Stone, 1979) – 5 steps

Step 1: Edit the Binary Response Matrix
Every person or item for which all responses are correct or all responses are incorrect is eliminated.

Step 2: Calculate Initial Item Difficulty Calibrations
Find the logit incorrect value for each possible number correct and set the mean of the vector of logit difficulty values at zero.

Step 3: Calculate the Initial Person Measures
Use logit correct values instead of logit incorrect values.

Step 4: Calculate the Expansion Factors



Step 5: Calculate the Standard Errors Associated with These Estimates
• The standard error for each of the final item difficulty calibrations
• The standard error for each of the final person measures
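The five PROX steps can be sketched in Python as follows. The formulas are the usual textbook statement of PROX (expansion factors built from the item and person logit variances U and V, with 2.89 = 1.7^2 and 8.35 roughly 2.89^2); they should be checked against Wright and Stone (1979) before the numbers are relied on. PROX assumes roughly normal spreads of ability and difficulty, and the response matrix below is hypothetical.

```python
import math

def edit_matrix(matrix):
    """Step 1: drop persons (rows) and items (columns) whose responses are all
    correct or all incorrect; repeat, because one removal can create another."""
    rows, cols = list(range(len(matrix))), list(range(len(matrix[0])))
    while True:
        keep_rows = [r for r in rows if 0 < sum(matrix[r][c] for c in cols) < len(cols)]
        keep_cols = [c for c in cols
                     if 0 < sum(matrix[r][c] for r in keep_rows) < len(keep_rows)]
        if keep_rows == rows and keep_cols == cols:
            return [[matrix[r][c] for c in cols] for r in rows]
        rows, cols = keep_rows, keep_cols

def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def prox(matrix):
    data = edit_matrix(matrix)
    n_persons, n_items = len(data), len(data[0])
    item_right = [sum(row[i] for row in data) for i in range(n_items)]
    person_right = [sum(row) for row in data]

    # Step 2: initial item calibrations = logit incorrect, centred at zero.
    x = [math.log((n_persons - s) / s) for s in item_right]
    x_mean = sum(x) / n_items
    x = [xi - x_mean for xi in x]

    # Step 3: initial person measures = logit correct.
    y = [math.log(r / (n_items - r)) for r in person_right]

    # Step 4: expansion factors (fails if U * V >= 8.35).
    U, V = sample_variance(x), sample_variance(y)
    item_factor = math.sqrt((1 + V / 2.89) / (1 - U * V / 8.35))
    person_factor = math.sqrt((1 + U / 2.89) / (1 - U * V / 8.35))
    difficulties = [item_factor * xi for xi in x]
    abilities = [person_factor * yr for yr in y]

    # Step 5: standard errors of the final estimates.
    item_se = [item_factor * math.sqrt(n_persons / (s * (n_persons - s))) for s in item_right]
    person_se = [person_factor * math.sqrt(n_items / (r * (n_items - r))) for r in person_right]
    return difficulties, abilities, item_se, person_se

# Hypothetical responses: 8 students x 5 items (1 = correct).
matrix = [
    [1, 1, 1, 1, 1],  # all correct: removed in Step 1
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
]  # the last item becomes all-incorrect once the first student is removed

difficulties, abilities, item_se, person_se = prox(matrix)
print([round(d, 2) for d in difficulties])
print([round(a, 2) for a in abilities])
```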

2. Approaches to language testing
- The essay-translation approach
- The structuralist approach
- The integrative approach
- The communicative approach

2.1 The essay-translation approach

The pre-scientific stage of language testing

Requires no special skill or expertise in testing

Tests:
+ Essay writing, translation & grammatical analysis
+ A heavy literary and cultural bias

2.2 The structuralist approach

The systematic acquisition of a set of habits:
+ Structural linguistics
+ Separate elements of the target language (phonology, vocabulary & grammar)

TESTS

Words and sentences are completely divorced from any context

Listening, speaking, reading and writing skills are separated from one another

2.3 The Integrative approach

o Concerned with meaning and the total communicative effect of discourse
o Assesses learners’ ability to use two or more skills simultaneously
o Types of integrative tests:
+ Cloze testing and dictation
+ Oral interview and composition writing
+ Translation (unreliable)

2.3.1 CLOZE TESTING

Based on the Gestalt theory of “closure”

Measures the reader’s ability to decode “interrupted” messages by making the most acceptable substitutions

The more blanks contained in the text, the more reliable the cloze test will prove

Scoring
- Acceptable answer
- Correct answer

In a cloze test:
- Misspellings should not be penalised
- Grammatical errors should be penalised
- The subject should be neutral in content and in the language variety used
- Provide a lead-in

Cloze testing:
- Is a good indicator of general linguistic ability
- Requires linguistic knowledge, textual knowledge, and knowledge of the world
- Is used in achievement, proficiency, classroom placement and diagnostic tests
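A minimal sketch of building and scoring a cloze test, assuming fixed-ratio deletion (every n-th word after an intact lead-in; n = 7 and a 10-word lead-in are illustrative choices, not values from the slides). Scoring supports both exact-word marking and acceptable-word marking through a hypothetical `acceptable` table of extra answers per gap. It compares words exactly apart from case, so the "don't penalise misspellings" rule above still needs a human marker.

```python
def make_cloze(text, n=7, lead_in_words=10):
    """Fixed-ratio deletion: keep the first `lead_in_words` words as an intact lead-in,
    then replace every n-th word with a numbered gap. Returns (gapped_text, key)."""
    words = text.split()
    key = []
    for i in range(lead_in_words + n - 1, len(words), n):
        core = words[i].rstrip(".,;:!?")
        key.append(core)
        words[i] = f"({len(key)}) ______" + words[i][len(core):]  # keep trailing punctuation
    return " ".join(words), key

def score_cloze(answers, key, acceptable=None):
    """Exact-word scoring by default; `acceptable` maps a gap number to extra words
    accepted for that gap (acceptable-word scoring). Case is ignored."""
    acceptable = acceptable or {}
    score = 0
    for gap, (given, correct) in enumerate(zip(answers, key), start=1):
        allowed = {correct.lower()} | {w.lower() for w in acceptable.get(gap, [])}
        if given.strip().lower() in allowed:
            score += 1
    return score

passage = ("The weather was fine so we decided to walk to the park. "
           "We took sandwiches, some fruit and a flask of tea. "
           "When we arrived, the grass was still wet from the morning rain.")
gapped, key = make_cloze(passage)
print(gapped)
print(key)
```

With acceptable-word scoring, a response such as "They" in gap 2 can be credited by listing it for that gap; exact-word scoring would count it wrong.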

2.3.2 DICTATION

Previously
• Seen as measuring solely Ss’ listening comprehension skills

Recently
• Seen as including auditory discrimination, auditory memory span, spelling, the recognition of sound segments, and overall textual comprehension

CHARACTERISTICS
o No reliable way of assessing the relative importance of the different abilities required
o Tend to measure low-order language skills rather than high-order skills
o Focus too much on individual sounds rather than on the meaning of the text
o Depend on memory span: Ss do not retain everything they hear

TIPS:
• Read through the whole dictation passage first
• Dictate (once or twice) in meaningful units of sufficient length rather than reading out word by word
• Read the whole passage once more at slightly slower than normal speed

2.4 The communicative approach

• Primarily focuses on how language is used in communication
• Tasks are as close as possible to those facing the Ss in real life
• Judges the effectiveness of the communication rather than formal linguistic accuracy
• Emphasis on language “use” rather than language “usage”

- Use: how people use language for different purposes
- Usage: the formal patterns of language

Tests of a communicative nature

Divisibility hypothesis
• Measure different language skills
• Obtain different profiles of a learner’s performance

Test scores
• Native speakers (NS) can score lower than non-native speakers (NNS)

• The assessment of language skills in isolation may have only a very limited relevance to real life
• Communicative tests must of necessity reflect the culture of a particular country
• Communicative tests should be based on precise and detailed specifications of the needs of learners
• Qualitative judgements are superior to quantitative assessments

3. ITEM TYPES

Selection items involve the candidate in making a choice of response between various options offered.

Candidate-supplied items demand that the candidate supply the response, e.g. short answer items, open cloze items.

3.1 SELECTION ITEMS

Advantages of selection items:
• familiar to nearly all candidates in all places
• independent of writing ability
• easy and quick to mark
• capable of being objectively scored
• economical of the candidate's time, so that many can be attempted in a short period and a range of objectives covered, adding to the reliability of the test.

Disadvantages of selection items:
• tests of recognition rather than production
• limited in the range of what they can test
• incapable of letting a candidate express a wide range of abilities
• dependent, in many cases, on reading ability
• affected by guesswork
• very difficult and time-consuming to write successfully
• capable of leading to poor classroom practice, if teaching focuses too intensively on preparation for tackling this sort of test item.
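Because selection items are affected by guesswork, some testers apply the classical correction for guessing (formula scoring). It is not part of the slides and its use is debated. The corrected score is R - W / (k - 1) for R right answers, W wrong answers and k options per item, with omitted items counted as neither right nor wrong. A minimal sketch with a hypothetical candidate:

```python
def corrected_score(right, wrong, options):
    """Classical correction for guessing: R - W / (k - 1), k = options per item.
    Omitted items are counted as neither right nor wrong."""
    return right - wrong / (options - 1)

def expected_chance_score(n_items, options):
    """Score expected from blind guessing on every item."""
    return n_items / options

# Hypothetical 40-item, 4-option test: 28 right, 9 wrong, 3 omitted.
print(corrected_score(28, 9, 4))     # 28 - 9/3 = 25.0
print(expected_chance_score(40, 4))  # 10.0
```

The same formula shows why true/false items (k = 2) are especially exposed to guessing: blind guessing earns half the items, and the correction subtracts one mark for every wrong answer.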


3.1.1. Discrete point multiple choice item

3.1.2. Text-based multiple choice item


3.1.3. True / false item

Test takers have to decide whether a statement is true or false, normally in relation to a reading or listening text.


3.1.4. Gap-filling (cloze passage) with multiple choice options

Words are deleted from a text, creating gaps which the candidate has to fill, normally with one or two words.


3.1.5. Gap-filling with selection from bank

Consists of a text with gaps accompanied by a 'bank' containing all the correct words to insert in the text, with the addition of several which will not be used.


3.1.6. Gap-filling at paragraph level

Consists of a text with six paragraph-length gaps. A choice of seven paragraphs is given from which to fill the gaps.

3.1.7. Matching

Elements from two separate lists or sets of options have to be brought together.


3.1.8. Multiple matching

A number of questions or sentence completion items are set, which are generally based on a reading text. The responses are provided in the form of a bank of words or phrases, each of which can be used an unlimited number of times.


3.1.9. Extra word error detection

In this type of task there is one extra, incorrect word in most of the lines of a text.

3.2 Candidate-supplied items

Advantages of candidate-supplied items:
• are easier to write
• allow for a wider sample of content
• minimize the effect of guessing
• allow for creativity in language use
• measure higher as well as lower order skills
• have a more positive effect on classroom practice
• can provide a similar degree of marking objectivity to selection items

Disadvantages of candidate-supplied items:
• There are often acceptable alternative responses rather than only one unambiguously correct response.
• time-consuming and difficult to mark, often calling for examiner marking rather than clerical or computerized marking.


3.2.1. Short answer item: consists of a question which can be answered in one word or a short phrase. The exact limits on the length of the answer should be specified.


3.2.2. Sentence completion: In this kind of item, part of a sentence is provided, and the candidate has to use information derived from a text to complete it.


3.2.3. Open gap-filling (cloze): In an open cloze, the gaps are selected by the item writer, who focuses on the particular structures to be tested. The candidate's task is to supply the word which fills each gap in the text.


3.2.4. Transformation: In this type of item, the candidate is given a sentence, followed by the opening words of another sentence which gives the same information, but expressed through a different grammatical structure.

3.2.5. Word formation: In this type of item, one word is deleted from a sentence, and a related form of the word is given to the candidate as a prompt.


3.2.6. Transformation cloze: consists of a text with a word missing in each line; the required word is supplied, but in a different grammatical form. The candidate has both to find the location of the missing word and to supply it in its correct form.


3.2.7. Note expansion

In this item type the lexical components of each sentence are supplied in a reduced form which resembles notes. The candidate's task is to supply the correct grammatical form, including changes in word order and the addition of such elements as prepositions, articles and auxiliary verbs.




3.2.8. Error correction / proofreading: consists of a text in which a word appears in an incorrect form in each numbered line. The candidate has first to identify the incorrect word, and then write it in its correct form at the end of the line.




3.2.9. Information transfer: Tasks described in this way always involve taking information given in a certain form and presenting it in a different form.


3.3. NON-ITEM-BASED TASK TYPES

3.3.1. Writing: extended writing questions

Extended writing can be tested in a number of ways which vary in the degree of control exercised by the tester over the candidate's response:
- Writing tasks with detailed input
- Writing tasks with titles only

3.3.2. Speaking:
- Presentation
- Use of picture prompts
- Written prompts
- Information gap tasks


4. Bloom’s taxonomy and testing

Bloom’s taxonomy
- Definition
- Old version vs. New version
- 6 levels of thinking

4.1. Definition

BLOOM’S TAXONOMY
- Bloom’s: name of the creator
- Taxonomy: an arrangement of ideas or a way to group things together

Bloom’s Taxonomy is a classification of the different objectives that educators might set for students.

The development of Bloom’s taxonomy
- 1948: Benjamin Bloom’s study on classroom activities and goals
- 1956: The publication of the original Bloom’s Taxonomy
- 1995: The revision of the original Bloom’s Taxonomy
- 2001: The final revision of Bloom’s Taxonomy

Original (old) Bloom’s Taxonomy

4.2. Old vs. New Bloom’s Taxonomy

What’s the Difference?

Original Bloom’s Taxonomy
• Terminology: Used nouns to describe the levels of thinking.
• Structure: One-dimensional, using the Cognitive Process.
• Emphasis was originally for educators and psychologists, but Bloom’s taxonomy came to be used by many other audiences.

Revised Bloom’s Taxonomy
• Terminology: Uses verbs to describe the levels of thinking.
• Structure: Two-dimensional, using the Knowledge Dimension and how it interacts with the Cognitive Process.
• Emphasis is placed upon its use as a more authentic tool for curriculum planning, instructional delivery and assessment.

4.3. The levels of thinking

There are six levels of learning according to Dr. Bloom:

1. Knowledge

2. Comprehension

3. Application

4. Analysis

5. Synthesis

6. Evaluation


Knowledge or Remembering

• Observation and recall of information

• Knowledge of dates, events, places, major ideas, etc.

• Mastery of subject matter

• Key words: list, define, tell, describe, identify, show, label, collect, examine, tabulate, quote, name, who, when, where, etc.

Knowledge/Remembering – Practice

• Write a list of vegetables.


Comprehension or Understanding

• Understanding information

• Grasp the meaning

• Translate knowledge into new context

• Interpret facts, compare, contrast

• Order, group, infer causes

• Predict consequences

• Key words: summarize, describe, interpret, contrast, predict, associate, distinguish, estimate, differentiate, discuss, extend

Comprehension/Understanding – Practice

• Retell the story of “Sleeping Beauty” in your own words.


Application or Applying

• Use information
• Use methods, concepts, theories in new situations
• Solve problems using required skills or knowledge
• Key words: apply, demonstrate, calculate, complete, illustrate, show, solve, examine, modify, relate, change, classify, experiment, discover

Application/Applying – Practice

• Make up an imaginary story and tell it.


Analysis or Analyzing

• Seeing patterns
• Organization of parts
• Recognition of hidden meanings
• Identification of components
• Key words: analyze, separate, order, explain, connect, classify, arrange, divide, compare, select, infer

Analysis/Analyzing – Practice

• Make a family tree to show relationships.


Synthesis or Creating

• Use old ideas to create new ones

• Generalize from given facts

• Relate knowledge from several areas

• Predict, draw conclusions

• Key words: combine, integrate, modify, rearrange, substitute, plan, create, design, invent, what if?, compose, formulate, prepare, generalize, rewrite

Synthesis/Creating – Practice

• Design a magazine cover that would appeal to the students in your class.


Evaluation or Evaluating

• Compare and discriminate between ideas

• Assess value of theories, presentations

• Make choices based on reasoned argument

• Verify value of evidence

• Recognize subjectivity

• Key words: assess, decide, rank, grade, test, measure, recommend, convince, select, judge, explain, discriminate, support, conclude, compare, summarize

Evaluation/Evaluating – Practice

• Make a booklet about 5 rules for the country that you see as important. Convince others.

THANK YOU FOR YOUR ATTENTION!
