1. Phạm Phúc Khánh Minh
2. Nguyễn Trần Hoài Phương
3. Nguyễn Ngọc Phương Thành
4. Võ Thị Thanh Thư
5. Đỗ Thị Bạch Vân
6. Ngô Thảo Vy
TESOL 2014B
1. The test production process
2. Approaches to language testing
3. Techniques of language testing: Item types
4. Bloom’s taxonomy and testing
Item analysis
Classical Test Theory
Item-Response Theory
One-Parameter (Rasch Model)
Two-Parameter
Three-Parameter
1. The test production process
1.1. Classical Test Theory (CTT) vs Item-Response Theory (IRT)
CTT
• Measured at test level
• Only applies to those students taking that test
IRT
• Measured at item level
• Provides sample-free measurement
1.2. Advantages offered by Latent Trait Theory
Sample-Free Item Calibration
Classical Test Theory
• The estimated item difficulty varies with the average ability of the particular sample of examinees observed
• -> Item analysis is sample-bound
Item-Response Theory
• An item difficulty scale is independent of the ability differences of any particular sample of examinees
• -> Item analysis is sample-free
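The sample-bound nature of the classical difficulty index can be illustrated with a minimal Python sketch. The function name `item_facility` and both response lists are made up for illustration; they are not data from the slides. The same item looks easy in a stronger sample and hard in a weaker one.

```python
def item_facility(responses):
    """Classical difficulty index: proportion of examinees answering correctly."""
    return sum(responses) / len(responses)

# Invented responses to the SAME item from two hypothetical samples (1 = correct).
stronger_sample = [1, 1, 1, 0, 1, 1, 1, 1]
weaker_sample = [0, 1, 0, 0, 1, 0, 0, 1]

print(item_facility(stronger_sample))  # 0.875
print(item_facility(weaker_sample))    # 0.375
```

Under CTT the two values would be reported as two different difficulties for one item, which is what "sample-bound" means here.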
Test-Free Person Measurement
Classical Test Theory
• Ability measurement is dependent on the unique clustering of items
Item-Response Theory
• Possible to compare abilities of persons using different tests
Multiple Reliability Estimation
Classical Test Theory
• Ability estimation varies in reliability. One global estimate of reliability should not be applied in evaluating the accuracy of scores for every individual examined
Item-Response Theory
• Reliability estimation goes beyond a global estimate for a given test, to a confidence estimate associated with every possible person and item score on that test
Identification of Guessers and Other Deviant Respondents
Classical Test Theory
• Impossible to identify persons’ misfit
Item-Response Theory
• Possible to identify persons’ misfit
Reconciliation of Norm-Referenced and Criterion-Referenced Testing
Classical Test Theory
• Unable to reconcile Norm-Referenced and Criterion-Referenced Testing in measurement
Item-Response Theory
• Able to reconcile Norm-Referenced and Criterion-Referenced Testing in measurement
Test Equating Facility
Classical Test Theory
• Test equating requires all test forms to be equated to be administered to the same large sample of examinees
• -> time-consuming
Item-Response Theory
• No need to administer all forms of tests to the same large sample of examinees
Test Tailoring Facility
The tailored test will provide much greater decision accuracy than the standardized test. Fewer students will be wrongly admitted to or wrongly rejected from university or intensive English study.
Item Banking Facility
Items are calibrated -> stored in an item bank according to a common metric of difficulty
This permits the construction of tests of known reliability and validity based on appropriate selection of item subsets from the bank, without further need for trial in the field
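Selecting an item subset from a calibrated bank can be sketched as below. This is only an illustration under stated assumptions: the bank layout (a list of dictionaries with hypothetical `id` and `difficulty` keys), the invented difficulty values, and the "closest to the target level" rule are not taken from the slides, which only say that items are selected appropriately from the bank.

```python
def select_items(bank, target, n_items):
    """Pick the n_items whose calibrated difficulty lies closest to the target level."""
    ranked = sorted(bank, key=lambda item: abs(item["difficulty"] - target))
    return ranked[:n_items]

# Hypothetical bank: difficulties are invented values on a common logit scale.
item_bank = [
    {"id": "A", "difficulty": -1.5},
    {"id": "B", "difficulty": -0.4},
    {"id": "C", "difficulty": 0.1},
    {"id": "D", "difficulty": 0.9},
    {"id": "E", "difficulty": 2.0},
]

chosen = select_items(item_bank, target=0.0, n_items=3)
print([item["id"] for item in chosen])  # ['C', 'B', 'D']
```

Because every item sits on the same difficulty metric, a test aimed at a different level only needs a different `target`; no new field trial of the items is implied.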
The Study of Item and Test Bias
Classical Test Theory
• Uncommon to quantify the amount and direction of bias for any given item or person
Item-Response Theory
• Able to quantify the amount and direction of bias for any given item or person
• => Test bias is neutralized by removing biased items or by including items biased in the opposite direction
Elimination of Boundary Effects in Program Evaluation
Classical Test Theory
• The problem of boundary effects
Item-Response Theory
• The person gets all items correct or all items incorrect => that person’s ability is not estimated => search for items of greater or lesser difficulty => ability estimation occurs
• The item is missed by all persons or is gotten correctly by all persons => that item’s difficulty is not estimated => search for persons of greater or lesser ability until at least one person passes and one person fails each item => calibration of item difficulty
• Sample size, dispersion and central tendency are transformed to articulate to the same interval scale
• => Boundary effects are removed
1.3 Competing Latent Trait Models
The Rasch One-Parameter Model is preferred by teachers and language testers
Sample size constraints:
- The Rasch Model: 100 – 200 persons
- Two-Parameter Model: 200 – 400 persons
- Three-Parameter Model: 1,000 – 2,000 persons
Introduction to the Rasch One-Parameter Model
The Rasch Model is probabilistic in nature: the persons
and items are not only graded for ability and difficulty,
but are judged according to the probability of their
response patterns given the observed person ability and
item difficulty.
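The probabilistic idea can be made concrete with the standard Rasch formula, in which the chance of a correct response depends only on the difference between person ability and item difficulty, both expressed in logits. The slides do not print the formula, so the sketch below and its function name `rasch_probability` are an illustration rather than a quotation.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model (values in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A person whose ability equals the item difficulty has an even chance of success.
print(rasch_probability(0.0, 0.0))            # 0.5
# One logit above the item difficulty raises the chance to about 0.73.
print(round(rasch_probability(1.0, 0.0), 2))  # 0.73
```

A response pattern is then judged by how probable it is: a correct answer to an item far above a person's ability has a low probability and counts as evidence of misfit.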
Computation of Item Difficulty and Person Ability
By computer: BICAL (Mead, Wright, and Bell, 1979); BILOG II (Mislevy and Bock, 1984)
By hand: PROX (Wright and Stone, 1979) – 5 steps
Step 1: Edit the Binary Response Matrix
Every person or item for which all responses are correct or all responses are incorrect is eliminated
Step 2: Calculate Initial Item Difficulty Calibrations
Find the logit incorrect value for each possible number correct and set the mean of the vector of logit difficulty values at zero
Step 3: Calculate the Initial Person Measures
Use logit correct values instead of logit incorrect values
Step 4: Calculate the Expansion Factors
Step 5: Calculate the Standard Errors Associated with These Estimates
The standard error for each of the final item difficulty calibrations
The standard error for each of the final person measures
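Steps 1 to 3 above can be sketched in Python as follows. This is a minimal illustration, not the published PROX program: the function name `prox_initial_estimates` and the response matrix are made up, logits are computed per item and per person rather than per score group, and Steps 4 and 5 are left out because the slides give no formulas for the expansion factors or standard errors.

```python
import math

def prox_initial_estimates(matrix):
    """Steps 1-3 of PROX on a persons-by-items matrix of 0/1 responses."""
    rows = [list(r) for r in matrix]

    # Step 1: drop persons and items with all-correct or all-incorrect
    # responses; repeat because each removal can expose new extreme cases.
    while True:
        n_items = len(rows[0]) if rows else 0
        persons = [r for r in rows if 0 < sum(r) < n_items]
        cols = [j for j in range(n_items)
                if 0 < sum(r[j] for r in persons) < len(persons)]
        edited = [[r[j] for j in cols] for r in persons]
        if edited == rows:
            break
        rows = edited
    if not rows or not rows[0]:
        raise ValueError("no persons or items left after editing")

    n_persons, n_items = len(rows), len(rows[0])

    # Step 2: logit incorrect for each item, centred on a mean of zero.
    item_scores = [sum(r[j] for r in rows) for j in range(n_items)]
    item_logits = [math.log((n_persons - s) / s) for s in item_scores]
    mean_logit = sum(item_logits) / n_items
    item_difficulties = [x - mean_logit for x in item_logits]

    # Step 3: logit correct for each person.
    person_measures = [math.log(sum(r) / (n_items - sum(r))) for r in rows]
    return item_difficulties, person_measures

# Invented responses: rows are persons, columns are items (1 = correct).
responses = [
    [1, 1, 1],  # all correct: removed in Step 1
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 0],  # all incorrect: removed in Step 1
]
items, persons = prox_initial_estimates(responses)
print([round(d, 2) for d in items])
print([round(b, 2) for b in persons])
```

Harder items receive higher logit values and more able persons receive higher measures, and both sets of values sit on the same logit scale.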
2. Approaches to language testing
The essay-translation approach
The structuralist approach
The integrative approach
The communicative approach
2.1 The essay-translation approach
The pre-scientific stage of language testing
Requires no special skill or expertise in testing
Tests:
+ Essay writing, translation & grammatical analysis
+ A heavy literary and cultural bias
2.2 The structuralist approach
The systematic acquisition of a set of habits:
+ Structural linguistics
+ Separate elements of the target language (phonology, vocabulary & grammar)
TESTS
Words and sentences are completely divorced from any context
Listening, speaking, reading and writing skills are separated from one another
2.3 The Integrative approach
o Concerned with meaning and the total communicative effect of discourse
o Assess learners’ ability to use two or more skills simultaneously
o Types of integrative tests:
+ Cloze testing and dictation
+ Oral interview and composition writing
+ Translation (unreliable)
2.3.1 CLOZE TESTING
The Gestalt theory of “closure”
Measures the reader’s ability to decode “interrupted” messages by making the most acceptable substitutions
The more blanks contained in the text, the more reliable the cloze test will prove
Scoring
• Acceptable answer
• Correct answer
In a cloze test:
• Misspellings should not be penalised
• Grammatical errors should be penalised
• The subject should be neutral in content and language variety used
• Provide a lead-in
Cloze testing:
• Good indicator of general linguistic ability
• Requires linguistic knowledge, textual knowledge, and knowledge of the world
• Used in achievement, proficiency, classroom placement and diagnostic tests
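The two scoring methods named above can be contrasted in a short Python sketch. The function name `score_cloze`, the answer key and the list of acceptable alternatives are all hypothetical, and the sketch does not handle misspellings, which under the guidance above should not be penalised.

```python
def score_cloze(responses, exact_key, acceptable=None):
    """Count correct gaps; `acceptable` maps a gap index to extra allowed words."""
    acceptable = acceptable or {}
    score = 0
    for gap, (response, exact) in enumerate(zip(responses, exact_key)):
        answer = response.strip().lower()
        allowed = {exact.lower()} | {w.lower() for w in acceptable.get(gap, [])}
        if answer in allowed:
            score += 1
    return score

# Invented key (the deleted words) and one candidate's answers.
key = ["went", "the", "quickly"]
answers = ["walked", "the", "fast"]
alternatives = {0: ["walked"], 2: ["fast", "rapidly"]}

print(score_cloze(answers, key))                # 1  (correct answer only)
print(score_cloze(answers, key, alternatives))  # 3  (acceptable answers allowed)
```

The same script yields a higher score under acceptable-answer scoring, so the method used has to be fixed before marking begins.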
2.3.2 DICTATION
Previously
• Solely measures Ss’ listening comprehension skills
Recently
• Includes auditory discrimination, the auditory memory span, spelling, the recognition of sound segments, and overall textual comprehension
CHARACTERISTICS
o No reliable way of assessing the relative importance of the different abilities required
o Tends to measure low-order language skills rather than high-order skills
o Focusing too much on individual sounds rather than on the meaning of the text impairs the memory span, so Ss do not retain everything they hear
TIPS:
o Read through the whole dictation passage first
o Dictate (once or twice) in meaningful units of sufficient length rather than reading out word by word
o Read the whole passage once more at slightly slower than normal speed
2.4 The communicative approach
Primarily focuses on how language is used in communication
Tasks are as close as possible to those facing the Ss in real life
Judges the effectiveness of the communication rather than formal linguistic accuracy
Emphasizes language “use” rather than language “usage”
Use: how people use language for different purposes
Usage: the formal patterns of language
Tests of a communicative nature
Divisibility hypothesis
Measure different language skills
Obtain different profiles of a learner’s performance
Test score: native speakers (NS) can score less than non-native speakers (NNS)
The assessment of language skills in isolation may have only a very limited relevance to real life
Communicative tests must of necessity reflect the culture of a particular country
Communicative tests should be based on precise and detailed specifications of the needs of learners
Qualitative judgements are superior to quantitative assessments
3. ITEM TYPES
Selection items
involve the candidate in making a choice of response between various options offered.
Candidate-supplied items
demand that the candidate supplies the response, e.g. short answer items, open cloze items.
3.1 SELECTION ITEMS
Advantages of selection items:
familiar to nearly all candidates in all places
independent of writing ability
easy and quick to mark
capable of being objectively scored
economical of the candidate's time, so that many can be
attempted in a short period and a range of objectives
covered, adding to the reliability of the test.
Disadvantages of selection items:
tests of recognition rather than production
limited in the range of what they can test
incapable of letting a candidate express a wide range of abilities
dependent, in many cases, on reading ability
affected by guesswork
very difficult and time consuming to write successfully
capable of leading to poor classroom practice, if teaching focuses too intensively on preparation for tackling this sort of test item.
3.1.1. Discrete point multiple choice item
3.1.2. Text-based multiple choice item
3.1.3. True / false item
test takers have to make a choice as to the
truth or otherwise of a statement, normally in
relation to a reading or listening text
3.1.4. Gap-filling (cloze passage) with
multiple choice options
words are deleted from a text, creating
gaps which the candidate has to fill, normally
with either one or two words.
3.1.5. Gap-filling with selection from bank
consists of a text with gaps accompanied by
a 'bank' containing all the correct words to
insert in the text, with the addition of
several which will not be used.
3.1.6. Gap-filling at paragraph level
consists of a text with six paragraph-length
gaps. A choice of seven paragraphs is given
from which to fill the gaps.
3.1.7. Matching
elements from two separate lists or sets of
options have to be brought together.
3.1.8. Multiple matching
a number of questions or sentence completion
items are set, which are generally based on a
reading text. The responses are provided in the
form of a bank of words or phrases, each of
which can be used an unlimited number of times.
3.1.9. Extra word error detection
In this type of task there is one extra,
incorrect, word in most of the lines of a text.
3.2 Candidate-supplied items
Advantages of candidate-supplied items:
are easier to write
allow for a wider sample of content
minimize the effect of guessing
allow for creativity in language use
measure higher as well as lower order skills
have a more positive effect on classroom practice
can provide a similar degree of marking objectivity as selection items
Disadvantages of candidate-supplied items:
There are often acceptable alternative responses
rather than only one unambiguously correct
response.
time consuming and difficult to mark, often
calling for examiner marking rather than clerical
or computerized marking.
3.2.1. Short answer item:
consists of a question which can be answered
in one word or a short phrase. The exact limits
on the length of the answer should be
specified
3.2.2. Sentence completion: In this kind of
item part of a sentence is provided, and the
candidate has to use information derived from
a text to complete it.
3.2.3. Open gap-filling (cloze): In an open
cloze, the gaps are selected by the item writer,
who focuses on the particular structures to be
tested. The candidate's task is to supply the
word which fills each gap in the text.
3.2.4. Transformation: In this type of item,
the candidate is given a sentence, followed by
the opening words of another sentence which
give the same information, but expressed
through a different grammatical structure.
3.2.5. Word formation: In this type of item
one word is deleted from a sentence, and a
related form of the word is given to the
candidate as a prompt.
3.2.6. Transformation cloze:
consists of a text with a word missing in each line; a different grammatical form of the required word is supplied.
The candidate has both to find the location of the missing word and to supply it in its correct form.
3.2.7. Note expansion
In this item type the lexical components of
each sentence are supplied in a reduced form
which resembles notes.
The candidate's task is to supply the correct
grammatical form, including changes in word
order and the addition of such elements as
prepositions, articles and auxiliary verbs.
3.2.8. Error correction / proofreading:
consists of a text in which a word appears in
an incorrect form in each numbered line. The
candidate has first to identify the incorrect
word, and then write it in its correct form at the
end of the line.
3.2.9. Information transfer: Tasks described in
this way always involve taking information
given in a certain form and presenting it in a
different form.
3.3. NON-ITEM-BASED TASK TYPES
3.3.1. Writing: extended writing questions
Extended writing can be tested in a number of
ways which vary in the degree of control
exercised by the tester over the candidate's
response.
4. Bloom’s taxonomy and testing
Bloom’s taxonomy
Definition
Old version vs. New version
6 levels of thinking
4.1. Definition
Bloom’s: name of the creator
Taxonomy: an arrangement of ideas or a way to group things together
Bloom’s Taxonomy is a type of classification of the different objectives that educators might set for students.
The development of Bloom’s taxonomy
1948: Benjamin Bloom’s study on classroom activities and goals
1956: The publication of original Bloom’s Taxonomy
1995: The revision of original Bloom’s Taxonomy
2001: The final revision of Bloom’s Taxonomy
What’s the Difference?
Original Bloom’s Taxonomy
• Terminology: Used nouns to describe the levels of thinking.
• Structure: One dimensional using the Cognitive Process.
• Emphasis: Originally intended for educators and psychologists, although Bloom’s taxonomy came to be used by many other audiences.
Revised Bloom’s Taxonomy
• Terminology: Uses verbs to describe the levels of thinking.
• Structure: Two dimensional using the Knowledge Dimension and how it interacts with the Cognitive Process.
• Emphasis is placed upon its use as a more authentic tool for curriculum planning, instructional delivery and assessment.
4.3. The levels of thinking
There are six levels of learning according to Dr. Bloom:
1. Knowledge
2. Comprehension
3. Application
4. Analysis
5. Synthesis
6. Evaluation
Knowledge or Remembering
• Observation and recall of information
• Knowledge of dates, events, places, major ideas, etc.
• Mastery of subject matter
• Key words: list, define, tell, describe, identify, show, label, collect, examine, tabulate, quote, name, who, when, where, etc.
Comprehension or Understanding
• Understanding information
• Grasp the meaning
• Translate knowledge into new context
• Interpret facts, compare, contrast
• Order, group, infer causes
• Predict consequences
• Key words: summarize, describe, interpret, contrast, predict, associate, distinguish, estimate, differentiate, discuss, extend
Comprehension/Understanding – Practice
• Retell the story of “Sleeping Beauty” in your own words.
Application or Applying
• Use information
• Use methods, concepts, theories in new situations
• Solve problems using required skills or knowledge
• Key words: apply, demonstrate, calculate, complete, illustrate, show, solve, examine, modify, relate, change, classify, experiment, discover
Analysis or Analyzing
• Seeing patterns
• Organization of parts
• Recognition of hidden meanings
• Identification of components
• Key words: analyze, separate, order, explain, connect, classify, arrange, divide, compare, select, infer
Synthesis or Creating
• Use old ideas to create new ones
• Generalize from given facts
• Relate knowledge from several areas
• Predict, draw conclusions
• Key words: combine, integrate, modify, rearrange, substitute, plan, create, design, invent, what if?, compose, formulate, prepare, generalize, rewrite
Synthesis/Creating – Practice
• Design a magazine cover that would appeal to the students in your class.
Evaluation or Evaluating
• Compare and discriminate between ideas
• Assess value of theories, presentations
• Make choices based on reasoned argument
• Verify value of evidence
• Recognize subjectivity
• Key words: assess, decide, rank, grade, test, measure, recommend, convince, select, judge, explain, discriminate, support, conclude, compare, summarize
Evaluation/Evaluating – Practice
• Make a booklet about 5 rules for the country that you see as important. Convince others.