PISA 2012 Mathematical Literacy: writing assessment tasks
Symposium, Comillas, Spain, September 2013
Dave Tout, David.Tout@acer.edu.au

HOW TO WRITE MATHEMATICS ITEMS, BY DAVID TOUT (ACER). COMILLAS SEMINAR (CANTABRIA)


International test development processes

• THE BEGINNING: Framework development: what is being assessed and why? Description and elaboration.
• Training of item writing team
• Initial item writing by teams
• Cognitive laboratories / pilots
• Review process: panels
• Use and refinement of fundamental mathematical competencies to predict item difficulty
• Items revised
• Country reviews / Experts review
• Translation & review
• Items finalised and fully trialled in national Field Trials
• Psychometric analysis & review for performance / difficulty / fairness / reliability / validity
• Items deleted / fine-tuned / Expert review. Scale development.
• THE END: Final assessment tool created to meet Framework specifications

Write to the PISA framework

• The definition of the domain
• Description of variables (contexts, processes, content)
• A blueprint for test development (how many items of each type)

Write to the PISA framework

Mathematical contexts: Personal 25%, Occupational 25%, Societal 25%, Scientific 25%

Mathematical content areas: Quantity 25%, Uncertainty & data 25%, Space & shape 25%, Change & relationships 25%

Mathematical processes: Formulate 25%, Employ 50%, Interpret 25%
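A blueprint of this kind can be turned into target item counts once a pool size is fixed. The sketch below is only an illustration: the percentages come from the framework table, while the pool size of 40 items and the names `BLUEPRINT` and `target_counts` are assumptions, not part of PISA.

```python
# Target shares taken from the framework table (percent of items).
BLUEPRINT = {
    "context": {"Personal": 25, "Occupational": 25, "Societal": 25, "Scientific": 25},
    "content": {"Quantity": 25, "Uncertainty & data": 25,
                "Space & shape": 25, "Change & relationships": 25},
    "process": {"Formulate": 25, "Employ": 50, "Interpret": 25},
}


def target_counts(total_items: int) -> dict:
    """Turn percentage targets into item counts for a pool of `total_items`.

    Counts are rounded down, so a pool size that is not a multiple of 4
    leaves a few items unallocated.
    """
    counts = {}
    for variable, shares in BLUEPRINT.items():
        if sum(shares.values()) != 100:
            raise ValueError(f"{variable} shares must total 100%")
        counts[variable] = {name: total_items * pct // 100
                            for name, pct in shares.items()}
    return counts


# Hypothetical pool size, chosen so every share is a whole number of items.
POOL_SIZE = 40
for variable, counts in target_counts(POOL_SIZE).items():
    print(variable, counts)
```

With 40 items, each context and content area gets 10 items, and the processes split 10 / 20 / 10.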

Structure of PISA assessment material

Example of a PISA Unit

Stimulus
CLIMBING MOUNT FUJI
Mount Fuji is a famous dormant volcano in Japan.

Three questions (items)

Q01 Mount Fuji is only open to the public for climbing from 1 July to 27 August each year. About 200,000 people climb Mount Fuji during this time.
On average, about how many people climb Mount Fuji each day?
A. 340
B. 710
C. 3,400
D. 7,100
E. 7,400

Q02 The Gotemba walking trail up Mount Fuji is about 9 kilometres (km) long. Walkers need to return from the 18 km walk by 20:00.
Toshi estimates that he can walk up the mountain at 1.5 kilometres per hour on average, and down at twice that speed. These speeds take into account meal breaks and rest times.
Using Toshi's estimated speeds, what is the latest time he can begin his walk so that he can return by 20:00?
_______________________________________________________________________

Q03 Toshi wore a pedometer to count his steps on his walk along the Gotemba trail.
His pedometer showed that he walked 22,500 steps on the way up.
Calculate Toshi's average step length for his walk up the 9 km Gotemba trail. Give your answer in centimetres (cm).

Answer: ______________ cm
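The expected answers to the three Mount Fuji questions can be checked with a few lines of arithmetic. This is a sketch for item writers, not part of the unit; the variable names are illustrative and the year used in the dates is arbitrary.

```python
from datetime import date, datetime, timedelta

# Q01: the climbing season runs from 1 July to 27 August inclusive.
season_days = (date(2013, 8, 27) - date(2013, 7, 1)).days + 1  # year is arbitrary
climbers_per_day = 200_000 / season_days

# Q02: 9 km up at 1.5 km/h, 9 km down at twice that speed, back by 20:00.
hours_up = 9 / 1.5
hours_down = 9 / 3.0
latest_start = datetime(2013, 7, 1, 20, 0) - timedelta(hours=hours_up + hours_down)

# Q03: 9 km expressed in centimetres, divided by 22,500 steps.
step_length_cm = 9 * 1000 * 100 / 22_500

print(season_days, round(climbers_per_day))   # 58 3448
print(latest_start.strftime("%H:%M"))          # 11:00
print(step_length_cm)                          # 40.0
```

The Q01 result of about 3,448 climbers per day is closest to option C, and the Q03 result of 40 cm matches the full-credit code in the scoring guide.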

Why use a stimulus?

A stimulus …

• connects the subject area to the real world, gives the question a purpose and meets PISA Framework requirements;
• makes an item interesting and more visually attractive to students than just text or symbols;
• makes the item more engaging to students through its context and realism;
• allows for a number of questions to be asked about the same stimulus/context.

A good stimulus …

• is rich and interesting;
• is optimally challenging; not too hard or too easy;
• does not pose artificial challenges;
• offers opportunity to pose searching questions;
• is (more or less) equally accessible and equitable for different candidates.

Bad stimulus: what to avoid

• Giving offence or creating disturbance
  – Trauma (car accidents, violence)
  – Sex, religion, politics or other emotionally contentious issues
  – Nasty behaviour, violence, racism, immorality or irresponsibility
  – Undesirable models: drugs, alcohol, inducements to do anything potentially dangerous
  – Bad language
• Expecting too much or too little of students
  – Stimulus that relies on unfamiliar knowledge
  – Stimulus that 15-year-olds would think suitable for babies

Selecting Stimulus

• Authentic: from the real world
• Cultural appropriateness
• Language appropriateness
• Interest for target age group
• Long life?
• Difficulty
• Copyright clearance?

Where can you find a stimulus?

• News articles: in print or on the internet
• Shopping, including advertisements
• Real items and their packaging: food stuffs, drinks, ingredients, materials, household and garden chemicals, goods, medicines, etc.
• Cooking and food: recipes
• From the world of music and movies
• Sport: football, for example
• Pamphlets and fliers

Where can you find a stimulus?

• Maps and guides
• Art and craft, hobbies, etc.
• Financial information
• Information about buildings/gardens/parks/sport fields/etc.
• Holidays and travel information
• Work related materials, if appropriate
• Even a simple photo along with some text can act as a stimulus
• Etc.

The challenge in mathematics

• An authentic stimulus (often needs to be simplified)
• Assesses targeted components of the maths framework: context, content, process
• Accessible, including context, language and terminology; it is not a reading assessment

Level of reading:
• The level of reading required to successfully engage with an item should be considered very carefully.
• The wording of items should be as simple and direct as possible.
• This is an assessment of mathematical literacy, not of reading ability.

The challenge in mathematics

Where is our coffee coming from?
The top five coffee producing countries in 2011/2012 were Brazil, Vietnam, Indonesia, Colombia and Ethiopia. The table shows their total coffee bean production figures.
And did you know that coffee beans are packed and measured in thousands of 60-kilogram (kg) bags?

2011/12 Coffee Production

Country      60-kg bags (1000s)   Percentage (%) of world production
Brazil       49,200               35.8%
Vietnam      21,000               15.3%
Indonesia     8,300                6.0%
Colombia      7,500                5.5%
Ethiopia      6,300                4.6%

Source: United States Department of Agriculture, June 2012

CHOOSING STIMULUS

TOMORROW IT’S YOUR TURN!

PISA Item formats

Item formats used in PISA

Selected response
• Simple multiple choice
• Complex multiple choice

Constructed response
• Constructed response manual
• Constructed response expert

A simple multiple-choice item

CLIMBING MOUNT FUJI
Mount Fuji is a famous dormant volcano in Japan.

PISA classification:
• Societal
• Formulate
• Quantity

Question

Q01 Mount Fuji is only open to the public for climbing from 1 July to 27 August each year. About 200,000 people climb Mount Fuji during this time.
On average, about how many people climb Mount Fuji each day?

Multiple choice options - distractors

A. 340
B. 710
C. 3,400
D. 7,100
E. 7,400

Advantages of multiple-choice item format

• Coding or marking the responses to multiple-choice items is perfectly reliable.
• Coding is quick and easy. For large samples the coding can be done by machine, thus saving coding costs.

What makes a good multiple-choice item?

• A single correct answer
• Three (or four) plausible but clearly incorrect distractors
• When writing MC items, give a justification for each distractor; this makes the distractors plausible and can be used for diagnostic information
• Assessment of a single, well-defined skill
• Language that can be read and understood by most students
• A set of options (answers and distractors) that do not provide cues or hints to the key
• Does not include negative wording (not, no)
• If the options are numbers, put them in numerical order

MC items: mathematics

Where is our coffee coming from?
The top five coffee producing countries in 2011/2012 were Brazil, Vietnam, Indonesia, Colombia and Ethiopia. The table shows their total coffee bean production figures.
And did you know that coffee beans are packed and production is measured in thousands of 60-kilogram (kg) bags?

2011/12 Coffee Production

Country      60-kg bags (1000s)   Percentage (%) of world production
Brazil       49,200               35.8%
Vietnam      21,000               15.3%
Indonesia     8,300                6.0%
Colombia      7,500                5.5%
Ethiopia      6,300                4.6%

Source: United States Department of Agriculture, June 2012

MC items: mathematics

Possible question:
• Approximately how many thousands of 60-kilogram bags of coffee beans were produced worldwide in 2011/2012?
• Calculated answers are around 137,000, while estimates are around 140,000 to 145,000. (For example, if you say 35.8% is a bit more than a third, you get 147,600 as an upper bound, so an answer of 140,000 would be seen as probably correct.)

What are some good distractors?

• 35.8% of 49,200 is about 17,600, therefore 20,000 is (just) a plausible answer. It is not strong, as it is less than the production of Brazil, although it is a common way a student might try to solve the problem.
• 135.8% of 49,200 is about 66,813, therefore 70,000 is plausible.
• If you add all five values you get 92,300, so 90,000.
• Add the five percentages and you get 67.2%, so 67.2% × 92,300 = 62,025.6, giving 60,000.
• But these are all lower than the correct answer. Is there a plausible distractor that is higher than 140,000? 167% × 92,300 = 154,141, giving 150,000.
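The key and the candidate distractors for the coffee question can be reproduced from the table, which is a useful check when writing a justification for each option. The sketch below uses the table's figures; the names `bags`, `share`, `reasoning` and `to_option` are illustrative, and rounding to the nearest 10,000 is an assumption based on how the options are presented.

```python
# Figures copied from the 2011/12 coffee production table
# (thousands of 60-kg bags, and percentage of world production).
bags = {"Brazil": 49_200, "Vietnam": 21_000, "Indonesia": 8_300,
        "Colombia": 7_500, "Ethiopia": 6_300}
share = {"Brazil": 35.8, "Vietnam": 15.3, "Indonesia": 6.0,
         "Colombia": 5.5, "Ethiopia": 4.6}

top5_bags = sum(bags.values())               # total bags for the five countries
top5_share = round(sum(share.values()), 1)   # total share for the five countries

# Each candidate option is paired with the reasoning that produces it.
reasoning = {
    "key: Brazil's bags divided by Brazil's share": bags["Brazil"] / (share["Brazil"] / 100),
    "takes 35.8% of Brazil's bags": bags["Brazil"] * share["Brazil"] / 100,
    "takes 135.8% of Brazil's bags": bags["Brazil"] * (100 + share["Brazil"]) / 100,
    "adds the five bag counts": top5_bags,
    "takes 67.2% of the five-country total": top5_bags * top5_share / 100,
    "takes 167% of the five-country total": top5_bags * 1.67,
}


def to_option(value: float) -> int:
    """Round a raw result to the nearest 10,000, as the answer options are."""
    return int(round(value, -4))


options = {label: to_option(value) for label, value in reasoning.items()}
for label, option in options.items():
    print(f"{option:>7,}  {label}")
```

The key rounds to 140,000 and the five error patterns round to 20,000, 70,000, 90,000, 60,000 and 150,000.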

Distractors

Where is our coffee coming from?
The top five coffee producing countries in 2011/2012 were Brazil, Vietnam, Indonesia, Colombia and Ethiopia. The table shows their total coffee bean production figures.
And did you know that coffee beans are packed and production is measured in thousands of 60-kilogram (kg) bags?

2011/12 Coffee Production

Country      60-kg bags (1000s)   Percentage (%) of world production
Brazil       49,200               35.8%
Vietnam      21,000               15.3%
Indonesia     8,300                6.0%
Colombia      7,500                5.5%
Ethiopia      6,300                4.6%

Source: United States Department of Agriculture, June 2012

Approximately how many thousands of 60-kilogram bags of coffee beans were produced worldwide in 2011/2012?

A. 60,000   B. 70,000   C. 90,000   D. 140,000   E. 150,000

A complex multiple-choice item

• Set of 2 or 3 statements or options
• Responses: Yes/No or True/False

Complex multiple-choice items

• Allows assessment of deeper/more comprehensive understanding of a concept or process
• Can help reduce the need to write explanations in mathematics
• All parts of the question must relate to the same concept or process
• Challenge is to make the wording concise
• Complex multiple-choice items reduce the possibility of guessing the correct answer
• Therefore it is generally harder for a student to gain full credit for items of this type.
• They are as easy and reliable to code as simple multiple-choice items
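The effect on guessing can be quantified with a rough sketch. It assumes that a student guesses each part independently and uniformly at random, and that full credit requires every statement to be answered correctly; the function names are illustrative.

```python
from fractions import Fraction


def p_guess_simple(n_options: int) -> Fraction:
    """Chance of picking the key by blind guessing on a simple multiple-choice item."""
    return Fraction(1, n_options)


def p_guess_complex(n_statements: int, choices_per_statement: int = 2) -> Fraction:
    """Chance of full credit when every statement must be answered correctly."""
    return Fraction(1, choices_per_statement) ** n_statements


print(p_guess_simple(5))    # 1/5
print(p_guess_complex(3))   # 1/8
print(p_guess_complex(2))   # 1/4
```

Under these assumptions, three Yes/No statements are harder to guess than a five-option item (1/8 against 1/5), while only two statements are not (1/4 against 1/5).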

A constructed-response manual item

CLIMBING MOUNT FUJI
Mount Fuji is a famous dormant volcano in Japan.

Question

Q03 Toshi wore a pedometer to count his steps on his walk along the Gotemba trail.
His pedometer showed that he walked 22,500 steps on the way up.
Calculate Toshi's average step length for his walk up the 9 km Gotemba trail. Give your answer in centimetres (cm).

Answer: ______________ cm

Rules for coding

CLIMBING MOUNT FUJI SCORING 3

QUESTION INTENT:
Description: Divide a length given in km by a specific number and express the quotient in cm
Mathematical content: Quantity
Context: Societal
Process: Employ

Full Credit
Code 2: 40

Partial Credit
Code 1: Responses with the digit 4 based on incorrect conversion to centimetres.
• 0.4 [answer given in metres]
• 4000 [incorrect conversion]

No Credit
Code 0: Other responses.
Code 9: Missing.
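To show how such a coding guide maps responses to codes, here is a minimal sketch for purely numeric responses. It is an illustration and not the PISA coding procedure: the function name `code_step_length` is hypothetical, and Code 1 is interpreted here as "a single digit 4 with the wrong power of ten", which is an assumption about the rubric. Responses that include units or working (for example "40 cm") would still need a human coder.

```python
from decimal import Decimal, InvalidOperation


def code_step_length(response: str) -> int:
    """Return the code (2, 1, 0 or 9) for a raw numeric response in cm."""
    text = response.strip()
    if not text:
        return 9                      # Code 9: missing
    try:
        value = Decimal(text)
    except InvalidOperation:
        return 0                      # not a plain number: other response
    if not value.is_finite():
        return 0                      # NaN or infinity: other response
    if value == 40:
        return 2                      # Code 2: full credit
    sign, digits, _ = value.normalize().as_tuple()
    if sign == 0 and digits == (4,):
        return 1                      # Code 1: digit 4, wrong power of ten
    return 0                          # Code 0: other responses


for answer in ["40", "0.4", "4000", "25", ""]:
    print(repr(answer), code_step_length(answer))
```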

A constructed-response expert item

DRIP RATE
Infusions (or intravenous drips) are used to deliver fluids and drugs to patients.

Nurses need to calculate the drip rate, D, in drops per minute for infusions.

They use the formula D = dv / (60n), where

d is the drop factor measured in drops per millilitre (mL)
v is the volume in mL of the infusion
n is the number of hours the infusion is required to run.

A nurse wants to double the time an infusion runs for.
Describe precisely how D changes if n is doubled but d and v do not change.

A constructed-response expert item

Rules for coding

Full Credit
Code 2: Explanation describes both the direction of the effect and its size.
• It halves
• It is half
• D will be 50% smaller
• D will be half as big

Partial Credit
Code 1: A response which correctly states EITHER the direction OR the size of the effect, but not BOTH.
• D gets smaller [no size]
• There's a 50% change [no direction]
• D gets bigger by 50%. [incorrect direction but correct size]

No Credit
Code 0: Other responses.
• D will also double [Both the size and direction are incorrect.]
Code 9: Missing.
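The full-credit answer ("it halves") can be checked numerically. The sketch assumes the drip-rate formula D = dv / (60n), with d in drops per mL, v in mL and n in hours; the formula itself is not legible in this copy of the slides, and the numeric values below are hypothetical.

```python
from fractions import Fraction


def drip_rate(d, v, n):
    """Drip rate D in drops per minute, assuming D = d*v / (60*n)."""
    return Fraction(d) * Fraction(v) / (60 * Fraction(n))


# Hypothetical values: drop factor 20 drops/mL, 500 mL, over 4 hours.
before = drip_rate(20, 500, 4)
after = drip_rate(20, 500, 8)   # n doubled, d and v unchanged
print(after / before)           # 1/2
```

Because n appears only in the denominator, the ratio is 1/2 whatever values are chosen for d and v.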

Writing constructed-response items

Closed constructed response items are often:
• simpler, straightforward items
• typically there is a single best answer: a numeric answer; a name; selecting a value/place on a graph or table; etc.
• for a numeric answer, these can be used when there are too many options for a simple Multiple Choice item; sometimes after a trial a CR can be changed to a MC based on the most common answers

Writing constructed-response items

• Question stem needs to be well structured, with clear instructions, and unambiguous, e.g. "How long will it take to walk …" compared with "How many minutes will it take to walk …"
• Make sure there is not a 50% chance of guessing the correct answer, e.g. the student cannot simply answer "yes" or "no"
• If it is a numerical answer, try to make it easy to code, e.g. make the result a whole number or finite decimal.
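Whether a quotient comes out as a whole number or finite decimal can be checked before an item is trialled: a fraction in lowest terms has a terminating decimal exactly when its denominator has no prime factors other than 2 and 5. The helper name `is_easy_to_code` below is hypothetical, and the two examples reuse figures from the Mount Fuji unit.

```python
from fractions import Fraction


def is_easy_to_code(numerator: int, denominator: int) -> bool:
    """True if numerator/denominator is a whole number or a terminating decimal."""
    d = Fraction(numerator, denominator).denominator  # denominator in lowest terms
    for p in (2, 5):
        while d % p == 0:
            d //= p
    return d == 1


print(is_easy_to_code(900_000, 22_500))  # True: 9 km in cm over 22,500 steps is 40
print(is_easy_to_code(200_000, 58))      # False: 200,000 climbers over 58 days
```

The second case does not terminate, which fits its use as an "about how many" multiple-choice question instead of an exact numeric response.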

Writing constructed-response items

• If an extended response, must be written to avoid superficial responses
• Coding is critical: try to predict all possible answers and cover them.
• Use dot point examples; use actual student responses from pilots and trials
• In PISA 2012, we used examples where you needed to argue against a given statement:

A journalist reports on the new flu test as follows.
"Doctors have a test for the new type of flu. For people who do have the new flu, the test is correct in 95% of cases; for people who do not have the new flu, the test is correct in 10% of cases."
Is the journalist's report correct? Give a reason for your answer.

QUESTIONS?