1. Apply formative assessment to identify the achievement of learning outcomes
2. Identify suitable teaching and learning techniques
Analysis of teaching methods 4 Assessment • Remediation/correction and reinforcement/enrichment
…but have you answered the questions all learners need to know?
Where do I need to go?
Why should I go there?
How will I get there?
How will I know when I’ve arrived?
Common Test Types And Characteristics
Type: True-False / Yes-No
Advantages: • Easy to construct
Disadvantages: • Can be ambiguous • Can reinforce incorrect information • Enables guessing
Best utilized: • To measure recall and comprehension of facts
Type: Multiple Choice
Advantages: • Easy to score and statistically analyse • Can be constructed to measure analysis and synthesis of information
Disadvantages: • Difficult to construct • Enables students to answer by process of unintentionally hidden clues
Best utilized: • To measure comprehension • To measure higher cognitive skills
Type: Matching
Advantages: • Popular with students • Can be constructed to include a broad range of information
Disadvantages: • Difficult to construct • Enables students to answer by process of
Best utilized: • To measure comprehension by comparing information
Type: Short Answer / Open-Ended
Advantages: • Easy to construct • Adaptable to specific subject content
Disadvantages: • Difficult to score as more than one answer may be correct
Best utilized: • To measure recall of facts and specific knowledge
Type: Fill in the Blank
Advantages: • Can be more focused and easily scored
Disadvantages: • Difficult to score when more than one answer may be correct
Best utilized: • To measure recall of facts and specific knowledge
Type: Essay
Advantages: • Easy to construct • Enables students to demonstrate a broad knowledge base
Disadvantages: • Scoring is quite time consuming
Best utilized: • To measure application and higher cognitive skills
Characteristics
- Selected-response test: Objective; choose among alternatives; assess foundational knowledge
- Short-answer test: Objective; asked to supply information from memory; assess foundational knowledge
- Essay test: Asked to discuss one or more related ideas according to certain criteria
Advantages
- Selected-response test: Efficiency
- Short-answer test: Relatively easy to write; allow for breadth
- Essay test: Assess higher-level abilities
Disadvantages
- Selected-response test: Focus on verbatim memorization
- Short-answer test: Focus on verbatim memorization
- Essay test: Lack of consistency of grading
Written test
- Selected-response tests
- Short- answer tests
- Essay tests
Performance tests
- Direct writing assessments
- Portfolios
- Exhibitions
- Demonstrations
1. Learning as a quantitative increase in knowledge. Learning is acquiring information or “ knowing a lot”
2. Learning as memorising . Learning is storing information that can be reproduced.
3. Learning as acquiring facts, skills and method that can be retained and used as necessary.
4. Learning as making sense or abstracting meaning. Learning involves relating parts of the subject matter to each other and to the real world.
5. Learning as interpreting and understanding reality in a different way. Learning involves comprehending the world by re-interpreting knowledge.
Type Category Description
I Teacher-Focused
A Teaching as transmitting concepts of syllabus
B Teaching as transmitting teacher’s knowledge
II Student-Focused
C Teaching as helping students acquire concepts of syllabus
D Teaching as helping students acquire teacher’s knowledge
“ [ Teaching ] is a transfer of knowledge from somebody who accumulates certain amount of knowledge to people who are recipient[s] of the knowledge” (Professor of Medicine)
* Focus on transfer of information
* Students’ prior knowledge not considered
* Students are passive recipients
Teaching viewed as a facilitative process, not a matter of transmission
Focus of teaching on helping students discover knowledge that he as an expert already holds
Student’s ability to understand the relationship among provided concepts is a valued component of learning & professional development
“I don’t give [students] recipes. I expect them to understand the concepts behind what they’re doing and I expect them to learn so they are able to do the experiments on their own. And you know, problem solve and design experiments later. So I guess I’m always teaching so they can function on their own as scientist. But they are expected to know, the concepts and not just how you do it, but why you do it, I guess.” ( Preventative Medicine)
Teaching helps students develop & change their conceptions of subject matter
Does not expect students to adopt his own worldview, but to create their own perspective on the subject material
“The goal of teaching is to bring about changes in the student so that they can better understand the given area, so it’s about changing the conceptual representations of students such that they can go out and not only be able to receive, go out on their own and recover more information outside of the classroom but also generate their own questions.” (Teacher of Linguistics)
Figure 1. Kirkpatrick’s Four Levels of Evaluation (levels shown, top to bottom: Result, Behavior, Learning, Reactions)
1- Reactions : Measures how students have reacted to the training – Program evaluation sheets
2- Learning : Measures what students have learned from the training – Individual pre- and post-tests for comparison
3- Behavior : Measures whether what was learned is being applied in life – Observations and feedback from others
4- Result : Measures whether the application of learning in class is achieving results – difficult to measure
* Each successive level of evaluation builds upon the evaluations of the previous level. Each successive level of evaluation adds precision to the measure of effectiveness but requires more time-consuming analysis and increased costs.
Level 4 : Does it matter? Does it advance strategy?
Level 3 : Are they doing it (objectives) consistently and appropriately?
Level 2 : Can they do it (objectives)? Do they show the skills and abilities?
Level 1 : Did they like the experience? Satisfaction? Use? Repeat use?
• Student Records
- portfolio
- report cards
- info cards
- anecdotal notes
Interest Inventory
Observations
KWL (focus on K and W)
Class discussions
Observational Checklist
Anecdotal Notes
Class work
Conference Notes
(writing/reading)
Questioning
Traditional
* paper/pencil test
Alternative
* Projects
* Portfolios
* Presentations
Alternative Response
Matching
Multiple Choice
• Stem contains a declarative statement
• Response choices (true/false, yes/no, right/wrong, correct/incorrect, fact/opinion, agree/disagree)
• Measures the ability to correctly identify the correctness of statements of fact, definitions of terms, or statements of principles (simple learning outcomes)
T F 1. The green coloring material in a plant leaf is called chlorophyll
Y N 2. Is 50% of 38 more than 18?
T F 3. The Constitution of the United States is the highest law of the country
T F 4. The earth is a planet
T F O 5. There are intelligent life forms on other planets
• If you are simply doing T/F, stay away from opinion statements
• Keep the stem clear and concise (avoid complex sentences)
• Do not use subjective words such as frequently, most, some, few, usually, often, etc.
• Do not use absolute terms such as always, never, all, none, or only
• Avoid the use of negative terms: no, not
• Keep a balanced response set
Efficient
Easy to construct
Provides for a wide-sampling of material
Limited to measuring at the knowledge level
Susceptibility to guessing
Measures simple associations/knowledge level
Student is given a stem to match with the correct response
Write the letter for the term in Column B that matches the description in Column A. Each term is used only once
Column A Column B
___ A number divisible by itself and one A. Integer
___ A symbolic representation of a whole number B. Irrational number
___ Number that can be represented by a ratio of whole numbers C. Numeral
___ A positive or negative whole number D. Integer
E. Rational number
Read each example. Then write the name of the literary technique beside the example. A literary technique may be used more than once
Personification Simile Metaphor Alliteration
____ The flame of Nadia’s newfound knowledge burned inside her
____ The kitten studied the ball of clay carefully, taking stock of its shape and size, trying to decide whether it was going to attack him
____ He helped himself to a hefty helping of hash brown potatoes
____ Her apartment was like the city dump
____ Juan felt as light as air, filled with joy at the news
____ The stump sat upright, looking down over the clear-cut valley with disdain
Advantages : compact easy form, easy to construct, easy to score
Limitations : only measures basic recall, sometimes difficult to have enough items to develop a homogeneous set
Measures simple to complex learning goals/objectives
Typical item includes a stem and a list of distracters
• Stem must be clear and concise, the reader should know what the answer should be without looking at the responses
• Do not leave a dangling stem – use a blank or convert to a question
• Blanks should be left near the end of the stem
• Convert stems to questions when possible
• Avoid negatives (not, no, except, least, etc.)
• Only one correct answer! Avoid “A and B”, “B and C”, “B and A but not C”, “All of the above,” “None of the above”
• Responses should be grammatically correct and approximately the same length
• All distracters should be plausible
• Avoid clues in the stem
Who was the main character of the story?
a. Grandpa Jones
b. Grandma Jones
c. Cousin Ralph
d. Anabel Jones
Mary had tickets to the movies. Each ticket cost 6 dollars.
What was the total cost of tickets?
A. 12 dollars
B. 18 dollars
C. 24 dollars
D. 30 dollars
• Advantages : Measures a variety of learning outcomes (knowledge – application), specific item – eliminates vagueness, forces student to know what is correct, greater reliability compared to alternative response items, can identify misconceptions
• Limitations : Does not move beyond application phase, writing appropriate distracters can be difficult
Completion
Short – answer
Essay
Explanations
Writing Prompt
• Short Answer items use a direct question or task
• Completion items consist of an incomplete statement
• Used for measuring a wide variety of simple learning outcomes: knowledge of terminology, facts, principles, methods or procedures, interpretation of data, solving numerical problems
• What is the name of the man who invented the steam boat?
__________ __________ (first name) (last name)
What device is used to detect whether an electric charge is positive or negative? _______
Name three organs in the digestive system. __________ __________ _________
The name of the man who invented the steamboat is _____________
A member of the United States Senate is elected to a term of ___________ years
Easy to construct
Student must supply the answer, limit guessing
Not suitable for measuring complex learning outcomes (knowledge/comprehension)
Difficulty of scoring (partial answer)
Do not start the item with a blank
Provide enough clues to lead to the correct answer
Do not include too many blanks (at most two blanks)
Blanks for answers should be equal in length
If the answer is expressed in numerical units, indicate the type of answer ___lb. ___oz
Response elicits one or more paragraphs from a student
Measures student’s ability to synthesize, evaluate, and compose
Two types – Restricted response
- Extended response
Restricted Response
Limits the form and content of response
Example
In a paragraph, describe two functions of the digestive system. (6 points)
Extended Response
• Few boundaries
• More extensive response
• May want to limit length (“use no more than two pages”)
Example
Nurture concise responses, convey to students clear expectations for the response
- structure questions “name two” “list three” “in a paragraph…”
Provide a value to each item
- “2 points” versus “5 points”
Proof read your items carefully
Is the language grade level appropriate?
Is the layout grade level appropriate?
Did you provide a space for name and date?
Check for alignment with instructional objectives!
Make sure each item measures an instructional objective
The assessment should consist of more than one item per objective
Evaluates the quality of each item
Rationale : the quality of items determines the quality of test (i.e., reliability & validity)
May suggest ways of improving the measurement of a test
Can help with understanding why certain tests predict some criteria but not others
• When analyzing the test item, we have several questions about the performance of each item. Some of these questions include :
• Are the items congruent with the test objectives?
• Are the items valid? Do they measure what they are supposed to measure?
• Are the items reliable? Do they measure consistently?
• How long does it take an examinee to complete each item?
• What items are most difficult to answer correctly?
• What items are easy?
• Are there any poorly performing items that need to be discarded?
Three major types :
1. Assess quality of the distracters
2. Assess difficulty of the items
3. Assess how well an item differentiates between high and low performers
To select the best available items for the final form of the test.
To identify structural or content defects in the items.
To detect learning difficulties of the class as a whole
To identify the areas of weakness of students in need of remediation
1. Examination of the difficulty level of the items.
2. Determination of the discriminating power of each item, and
3. Examination of the effectiveness of distractors in a multiple choice or matching items.
Index of difficulty is the percentage of students answering correctly each item in the test
Index of discrimination refers to the number of high-scoring individuals responding correctly versus the number of low-scoring individuals responding correctly to an item.
This numeric index indicates how effectively an item differentiates between the students who did well and those who did poorly on the test.
1. Arrange test scores from highest to lowest.
2. Get one-third of the papers from the highest scores and the other third from the lowest scores.
3. Record separately the number of times each alternative was chosen by the students in both groups.
4. Add the number of correct answers to each item made by the combined upper and lower groups.
5. Compute the index of difficulty for each item, following the formula :
IDF = (NRC / TS) × 100
Where IDF = index of difficulty
NRC = number of students responding correctly to an item
TS = total number of students in the upper and lower groups.
6. Compute the index of discrimination, based on the formula :
IDN = (CU - CL) / NSG
Where IDN = index of discrimination
CU = number of correct responses of the upper group
CL = number of correct responses of the lower group
NSG = number of students per group
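The two formulas in steps 5 and 6 can be sketched in a few lines of Python. This is only an illustration: the function names and the sample counts (18 correct in the upper group, 9 in the lower group, 20 students per group) are assumptions, not data from the slides.

```python
def difficulty_index(correct_upper, correct_lower, group_size):
    """IDF = (NRC / TS) * 100, where NRC = CU + CL and TS = 2 * NSG."""
    nrc = correct_upper + correct_lower      # students answering correctly
    ts = 2 * group_size                      # upper group plus lower group
    return nrc * 100 / ts


def discrimination_index(correct_upper, correct_lower, group_size):
    """IDN = (CU - CL) / NSG."""
    return (correct_upper - correct_lower) / group_size


# Hypothetical item: 18 of 20 upper-group and 9 of 20 lower-group students correct.
idf = difficulty_index(18, 9, 20)
idn = discrimination_index(18, 9, 20)
print(f"IDF = {idf}%, IDN = {idn}")
```

A positive IDN means the upper group outperformed the lower group on the item; a negative value means the reverse.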
The difficulty index of a test item tells a teacher about the comprehension of or performance on the material or task contained in an item.
For an item to be considered a good item, its difficulty index should be 50%. An item with 50% difficulty index is neither easy nor difficult.
If an item has a difficulty index of 67.5%, this means that it is 67.5% easy and 32.5% difficult.
Information on the index of difficulty of an item can help a teacher decide whether a test should be revised, retained or modified.
Range          Difficulty Level
20 & below     Very difficult
21-40          Difficult
41-60          Average
61-80          Easy
81 & above     Very easy
The Index of Discrimination tells a teacher the degree to which a test item differentiates the high achievers from the low achievers in his class. A test item may have positive or negative discriminating power.
An item has a positive discriminating power when more students from the upper group got the right answer than those from the lower group.
When more students from the lower group got the correct answer on an item than those from the upper group, the item has a negative discriminating power.
There are instances when an item has zero discriminating power – when an equal number of students from the upper and lower groups got the right answer to a test item.
In the given example, item 5 has the highest discriminating power. This means that it can differentiate high and low achievers.
Range          Verbal Description
.40 & above    Very Good Item
.30 - .39      Good Item
.20 - .29      Fair Item
.09 - .19      Poor Item
A test item can be retained when its level of difficulty is average and discriminating power is positive.
It has to be rejected when it is either easy / very easy or difficult / very difficult and its discriminating power is negative or zero.
An item can be modified when its difficulty level is average and its discrimination index is negative.
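The retain / reject / modify rules can be written as a small decision function. This is a sketch under stated assumptions: the function names are hypothetical, values falling between the printed ranges (for example 20.5) are assigned to the higher band, and combinations the rules above do not cover are reported as "unspecified" rather than guessed.

```python
def difficulty_level(idf):
    """Map a difficulty index (percent correct) to the verbal level in the range table."""
    if idf <= 20:
        return "very difficult"
    if idf <= 40:
        return "difficult"
    if idf <= 60:
        return "average"
    if idf <= 80:
        return "easy"
    return "very easy"


def item_decision(idf, idn):
    """Apply the three stated rules; anything else is left undecided."""
    level = difficulty_level(idf)
    if level == "average" and idn > 0:
        return "retain"
    if level == "average" and idn < 0:
        return "modify"
    if level != "average" and idn <= 0:
        return "reject"
    return "unspecified"   # e.g. an easy item with positive discrimination


print(item_decision(50, 0.4))    # average difficulty, positive discrimination
print(item_decision(90, 0.0))    # very easy, zero discrimination
```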
An ideal item is one that all students in the upper group answer correctly and all students in the lower group answer wrongly. The responses of the lower group also have to be evenly distributed among the incorrect alternatives.
Encourage teachers to undertake an item analysis as often as practical
Allowing for accumulated data to be used to make item analysis more reliable
Providing for a wider choice of item format and objectives
Facilitating the revision of items
Accumulating a large pool of items as to allow for some items to be shared with the students for study purposes.
It cannot be used for essay items.
Teacher must be cautious about what damage may be due to the table of specifications when items not meeting the criteria are deleted from the test. These items are to be rewritten or replaced.
Generally, students who did well on the exam should select the correct answer to any given item on the exam.
The Discrimination Index distinguishes for each item between the performance of students who did well on the exam and students who did poorly.
For each item, subtract the number of students in the lower group who answered correctly from the number of students in the upper group who answered correctly.
Divide the result by the number of students in one group.
The discrimination Index is listed in decimal format and ranges between -1 and 1.
Item no.   Correct answers, upper 1/4   Correct answers, lower 1/4   Item Discrimination Index
1          90                           20                           0.7
2          80                           70                           0.1
3          100                          0                            1
4          100                          100                          0
5          50                           50                           0
6          20                           60                           -0.4
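The index column of this table can be recomputed from the two count columns. The sketch below assumes 100 students per group, which is inferred from the arithmetic of the table (for item 1, (90 - 20) / 100 = 0.7) and is not stated in the source.

```python
GROUP_SIZE = 100  # assumption inferred from the table, not given in the slides

# (item number, correct in upper 1/4, correct in lower 1/4)
items = [(1, 90, 20), (2, 80, 70), (3, 100, 0),
         (4, 100, 100), (5, 50, 50), (6, 20, 60)]


def discrimination(upper, lower, group_size=GROUP_SIZE):
    """Subtract lower-group correct from upper-group correct, divide by group size."""
    return (upper - lower) / group_size


indices = {item: discrimination(upper, lower) for item, upper, lower in items}
for item, d in indices.items():
    print(item, d)
```

Item 6 comes out negative because more lower-group than upper-group students answered it correctly.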
Use the following table as a guideline to determine whether an item (or its corresponding instruction) should be considered for revision.
Item Discrimination (D)    Item Difficulty: High    Medium    Low
D <= 0%                    review                   review    review
0% < D < 30%               ok                       review    ok
D >= 30%                   ok                       ok        ok
First question of item analysis : how many people choose each response?
If there is only one best response, then all other response options are distracters.
Example from in-class assignment (N=35):
Which method has the best internal consistency?
a) Projective test 1
b) Peer ratings 1
c) Forced choice 21
d) Differences n.s. 12
A perfect test item would have 2 characteristics :
1. Everyone who knows the item gets it right
2. People who do not know the item will have responses equally distributed across the wrong answers.
It is not desirable to have one of the distracters chosen more often than the correct answer.
This result indicates a potential problem with the question. The distracter may be too similar to the correct answer and/or there may be something in either the stem or the alternatives that is misleading.
Calculate the number of people expected to choose each of the distracters. If choices are random, the expected number is the same for each wrong response (Figure 10-1).
Expected number choosing each distracter = (number of persons answering incorrectly) / (number of distracters) = 14 / 3 = 4.7
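The expected-count calculation can be checked against the in-class example (N = 35). In this sketch the keyed answer is taken to be option c, which is inferred from the 14 incorrect responses (1 + 1 + 12) and is not stated outright in the slides; the variable names are illustrative.

```python
# Response counts from the in-class example (N = 35).
counts = {"a": 1, "b": 1, "c": 21, "d": 12}
correct = "c"  # assumption: inferred from 35 - 21 = 14 incorrect responses

distracters = {opt: n for opt, n in counts.items() if opt != correct}
n_incorrect = sum(distracters.values())
expected = n_incorrect / len(distracters)   # same expectation for every distracter

print(f"expected per distracter: {expected:.1f}")
for opt, n in distracters.items():
    print(opt, n, "above expected" if n > expected else "at or below expected")
```

Option d draws 12 responses against an expectation of about 4.7, which is the pattern the text attributes to partial knowledge or a poorly worded item.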
When the number of persons choosing a distracter significantly exceeds the number expected, there are 2 possibilities:
1. It is possible that the choice reflects partial knowledge
2. The item is a poorly worded trick question
Unpopular distracters may lower item and test difficulty because they are easily eliminated
Extremely popular distracters are likely to lower the reliability and validity of the test
Compare the performance of the highest and lowest scoring 25% of the students on the distracter options (i.e. the incorrect answers presented on the exam)
Fewer of the top performers should choose each of the distracters as their answer compared to the bottom performers.
Item 1                        A    B    C    D    E    Omit
% of students in upper 1/4    20   5    0    0    0    0
% of students in middle       15   10   10   10   5    0
% of students in lower 1/4    5    5    5    10   0    0

Item 2                        A    B    C    D    E    Omit
% of students in upper 1/4    0    5    5    15   0    0
% of students in middle       0    10   15   5    20   0
% of students in lower 1/4    0    5    10   0    10   0
What is the purpose of a good distracter?
Which distracters should you consider throwing out?
Review the sample report.
Identify any exam items that may require revision.
For each identified item, list your observations and your hypothesis about the nature of the problem.
Multiple Choice Exam Strategies
-improve odds by eliminating 1 or more infeasible or unlikely answer options
Descriptive Exam Strategies
-brain dumping
-part marks
-consideration for perfect answers to questions that were not asked
Depends on the number of answer options per question and the number of questions!
Percent Pass (≥ 50%) by Chance
Number of Questions    2 choice    3 choice    4 choice    5 choice
1                      50          33          25          20
2                      75          56          44          36
4                      69          41          26          18
6                      66          32          17          10
10                     62          21          8           3
20                     59          9.2         1.4         0.3
50                     56          1           0.01        0.0004
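The pass-by-chance percentages follow from the binomial distribution: each blind guess is correct with probability 1/(number of choices), and a pass means getting at least half of the questions right. The sketch below is one way to reproduce such figures; the function name and the 50% pass mark as a parameter are illustrative choices.

```python
from math import ceil, comb


def pass_by_chance(n_questions, n_choices, pass_fraction=0.5):
    """Probability of reaching the pass mark by guessing every question blindly."""
    p = 1 / n_choices                          # chance that one guess is right
    need = ceil(n_questions * pass_fraction)   # minimum number of correct answers
    return sum(
        comb(n_questions, k) * p**k * (1 - p) ** (n_questions - k)
        for k in range(need, n_questions + 1)
    )


for n in (1, 2, 4, 6, 10, 20, 50):
    row = [round(100 * pass_by_chance(n, c), 4) for c in (2, 3, 4, 5)]
    print(n, row)
```

The pattern shows why longer tests with more answer options leave little room for passing by luck.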
Negative Marking…
- Elimination strategy reduces odds of wrong answer penalty
- subtracting a percentage of the number of wrong answer obtained from the final grade
- give a grade of 4 for a correct answer and a score of -1 for a wrong answer on a 4-choice question
- A score of less than zero is possible
-students hate negative marking
-negative marking is not practised in descriptive examinations
- A poor substitute for a test that is too short with too few answer options
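The effect of negative marking on guessing can be illustrated with an expected-value calculation. This sketch uses the +4 / -1 scheme on a 4-choice question mentioned above; the function name and the scenario of eliminating two options are illustrative assumptions.

```python
def expected_guess_score(options_left, reward=4, penalty=1):
    """Expected marks from guessing at random among the remaining options."""
    p_right = 1 / options_left
    return p_right * reward - (1 - p_right) * penalty


blind = expected_guess_score(4)       # guess among all four options
narrowed = expected_guess_score(2)    # after eliminating two unlikely options
print(blind, narrowed)
```

Under this scheme the expected score rises as options are eliminated, which is the sense in which the elimination strategy reduces the odds of a wrong-answer penalty.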
Educational Measurement and Evaluation
Myrna E. Lahoylahoy, Ph.D.
Process of quantifying an individual’s achievement, personality, attitudes, habits and skills
Quantitative appraisal of observable phenomena
Process of assigning symbols to dimensions of phenomena
An operation performed on the physical world by an observer
Process by which information about the attributes or characteristics of things is determined and differentiated
Qualitative aspect of determining the outcomes of learning.
Process of ranking with respect to attributes or traits
Appraising the extent of learning
Judging the effectiveness of educational experience
Interpreting and analyzing changes in behavior
Describing accurately the quantity and quality of things
Summing up results of measurement or tests, giving meaning based on value judgments
Systematic process of determining the extent to which instructional objectives are achieved
Considering evidence in the light of value standards and in terms of particular situations and goals which the group or individuals are striving to attain
TESTING- a technique of obtaining information needed for evaluation purposes
◦ Tests, quizzes and measuring instruments are devices used to obtain such information
1. INSTRUCTIONAL
a)principal (basic purpose)
-to determine what knowledge, skills, abilities, habits and attitudes have been acquired
-to determine what progress or extent of learning attained
-to determine strengths, weaknesses, difficulties and needs of students
1. Evaluation assesses or makes an appraisal of -Educational objectives, programs, curricula, instructional materials, facilities
- teacher
- Learner
-Public relations of the school
- achievement scores of the learner
2. Evaluation conducts research
Evaluation should be
1. Based on clearly stated objectives
2. Comprehensive
3. Cooperative
4. Used Judiciously
5. Continuous and integral part of the teaching-learning process
1. Diagnostic Evaluation-detects pupil’s learning difficulties which somehow are not revealed by formative tests. It is more comprehensive and specific
2. Formative Evaluation- it provides feedback regarding the student’s performance in attaining instructional objectives. It identifies learning errors that need to be corrected and it provides information to make instruction more effective
3. Placement Evaluation- it defines the student’s entry behaviors. It determines the knowledge and skills he possesses which are necessary at the beginning of instruction
4. Summative Evaluation- it determines the extent to which objectives of instruction have been attained and is used for assigning grades/marks and to provide feedback to students.
1. VALIDITY
content, concurrent, predictive, construct
2. RELIABILITY
adequacy, objectivity, testing condition,
test administration procedures
3. USABILITY
(practicality) ease in administration, scoring, interpretation and application, low cost, proper mechanical make-up
Content validity- face validity or logical validity, used in evaluating achievement tests
Concurrent validity- test agrees with or correlates with a criterion (ex. entrance examination)
Predictive validity- degree of accuracy with which a test predicts the activity it intends to foretell
Construct validity- agreement of the test with a theoretical construct or trait (ex. IQ)
Methods of estimating reliability
1. Test-retest method (uses Spearman rank correlation coefficient)
2. Parallel forms/alternate forms (paired observations are correlated)
3. Split-half method (odd-even halves, computed using the Spearman-Brown formula)
4. Internal-consistency method (Kuder-Richardson formula 20)
5. Scorer reliability method(two examiners independently score a set of test papers then correlate their scores)
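Two of the named formulas can be sketched briefly. The Spearman-Brown step-up for a split-half correlation r is 2r / (1 + r), and KR-20 is (k / (k - 1)) × (1 - Σpq / variance of total scores). The sketch below uses the population variance of total scores, which is one common convention and an assumption here; the tiny response matrix is made up for illustration.

```python
from statistics import pvariance


def spearman_brown(r_half):
    """Estimate full-test reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)


def kr20(responses):
    """KR-20 for rows of 0/1 item scores (one row per student)."""
    k = len(responses[0])                            # number of items
    n = len(responses)                               # number of students
    totals = [sum(row) for row in responses]
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n  # proportion correct
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))


# Hypothetical data: 4 students, 3 dichotomously scored items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(spearman_brown(0.6), kr20(data))
```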
1. Standard Tests
a) Psychological test- intelligence test, aptitude test, personality (rating scale) test, vocational and professional interest inventory
b) Educational Test
2. Teacher-made test
planning, Preparing, Reproducing, Administering , Scoring, Evaluating, Interpreting
Norm-Referenced Tests
It compares a student’s performance with that of other students in the class.
It uses the normal curve in distributing grades of students by placing them either above or below the mean.
The teacher’s main concern is the variability of the scores. The more variable the scores are, the better, because they can show how one individual differs from another.
Uses percentiles and standard scores.
It tends to be of average difficulty.
Measures of Central Tendency
Mean, Median, Mode
Measures of Variability
Range, Quartile Deviation, Standard Deviation
MODE-the crude or inspectional average measure. It is the most frequently occurring score. It is the poorest measure of central tendency.
Advantage: Mode is always a real value since it does not fall on zero. It is simple to approximate by observation for small cases. It does not necessitate arrangement of values.
Disadvantage: It is not rigidly defined and is inapplicable to irregular distributions
What is the mode of these scores?
75, 60, 78, 75, 76, 75, 88, 75, 81, 75
MEDIAN-the score that divides the distribution into halves. It is sometimes called the counting average.
Advantage: It is the best measure when the distribution is irregular or skewed. It can be located in an open-ended distribution or when the data are incomplete (ex. 80% of the cases are reported)
Disadvantage: It necessitates arranging the items according to size before it can be computed
What is the median?
75, 60, 78, 75, 76, 75, 88, 75, 81, 75
MEAN-The most widely used and familiar average. The most reliable and the most stable of all measures of central tendency
Advantage: It is the best measure for regular distributions.
Disadvantage: It is affected by extreme values
What is the mean?
75, 60, 78, 75, 76, 75, 88, 75, 81, 75
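The three questions above can be answered with Python's standard `statistics` module. This is just one convenient way to check the arithmetic by machine; the variable names are illustrative.

```python
from statistics import mean, median, mode

scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]

score_mode = mode(scores)      # most frequently occurring score
score_median = median(scores)  # middle of the sorted scores (average of the two middle values)
score_mean = mean(scores)      # sum of scores divided by the number of scores

print(score_mode, score_median, score_mean)
```

The single low score of 60 pulls the mean around less than it would in a smaller sample, but it illustrates why the mean is described as sensitive to extreme values.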
It is the most important and the best measure of variability of test scores.
A small standard deviation means that the group has small variability or is relatively homogeneous.
It is used with mean.
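As an illustration, the standard deviation of the ten scores used in the central-tendency examples can be computed with the standard library. Whether to divide by N (population) or N - 1 (sample) is not specified in the slides, so both are shown here as an assumption-free comparison.

```python
from statistics import mean, pstdev, stdev

scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]

center = mean(scores)
population_sd = pstdev(scores)  # divides the squared deviations by N
sample_sd = stdev(scores)       # divides the squared deviations by N - 1

print(f"mean = {center}, population SD = {population_sd:.2f}, sample SD = {sample_sd:.2f}")
```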
Letter grade B
- Criterion-referenced: Very Good or Proficient; complete knowledge of most content, skills; mastery of most objectives
- Norm-referenced: Very Good; performs above the average of the class
- Self-referenced: Very Good; some improvement on most or all the objectives
Letter grade C
- Criterion-referenced: Acceptable or basic; command of only the basic content, skills; mastery of some objectives
- Norm-referenced: Average; performs at the class average
- Self-referenced: Acceptable; some improvement on some of the objectives
Letter grade D
- Criterion-referenced: Lacking; little knowledge of most content; mastery of only a few objectives
- Norm-referenced: Poor; below the class average
- Self-referenced: Lacking; minimal progress on most objectives
Letter grade F
- Criterion-referenced: Unsatisfactory; lacks knowledge of content; no mastery of objectives
- Norm-referenced: Unsatisfactory; far below the class average; among the worst in the class
- Self-referenced: Unsatisfactory; no improvement on any objectives.
- What meaning should each grade symbol carry?
- What should “failure” mean?
- What elements or performances should be incorporated?
- How should the grades in a class be distributed?
- What should the components be like that go into a final grade?
- What method should be used to assign grades?
- Should borderline cases be reviewed?
- What other factors can influence the philosophy of grading?
Grade: A symbol that represents the degree to which students have met a set of well-defined instructional objectives.
Absolute Grading: Absolute grading, or criterion-referenced grading, consists of comparisons between a student’s performance and some previously defined criteria. Thus, students are not compared to other students. When using absolute grading, one must be careful in designing the criteria that will be used to determine the students’ grades.
Relative Grading:
-relative grading, or norm-referenced grading
-consists of comparisons between a student and others in the same class, the norm group.
-those who perform better than most other students are assigned certain grades.
-if using the normal curve in relative grading, then 3.6% of the students should be assigned As, 23.8% Bs, 45.2% Cs, 23.8% Ds, and 3.6% Fs.
-emphasizes competition among group members and does not accurately reflect any objective level of achievement.
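The normal-curve percentages quoted for relative grading can be approximately reproduced by cutting a standard normal distribution at 0.6 and 1.8 standard deviations on either side of the mean. Those cut points are an assumption inferred from the percentages; the slides do not state them.

```python
from statistics import NormalDist


def normal_curve_shares(inner_cut=0.6, outer_cut=1.8):
    """Share of students per grade when cutting a normal curve at +/- the given z values."""
    z = NormalDist()                                   # mean 0, standard deviation 1
    tail = z.cdf(-outer_cut)                           # beyond 1.8 SD on one side
    shoulder = z.cdf(-inner_cut) - tail                # between 0.6 and 1.8 SD
    middle = z.cdf(inner_cut) - z.cdf(-inner_cut)      # within 0.6 SD of the mean
    return {"A": tail, "B": shoulder, "C": middle, "D": shoulder, "F": tail}


shares = normal_curve_shares()
for grade, share in shares.items():
    print(grade, f"{100 * share:.1f}%")
```

Because the curve is symmetric, A and F receive the same share, as do B and D.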
Growth Grading : (self-referenced grading)
-consists of comparisons between a student’s performance and their perceived ability/capability.
-overachievers would be assigned higher grades, while underachievers would be assigned lower grades.
-growth grading, while de-emphasizing competition, tends to produce invalid grades relative to achievement levels.
Advantages
- Easy to use
- Easy to interpret (theoretically)
- Concise
Disadvantages
- Meaning of a grade may vary widely
- Does not address strengths & weaknesses
- K-2 students may feel threatened by them
Advantages
- Easy to use.
- Easy to interpret (theoretically)
- Concise
- More continuous than Letter Grades
- May be combined with Letter Grades
Disadvantage
- Meaning of grade may vary widely
- Does not address strengths & weaknesses.
- K-2 students may feel threatened by them
- Meaning may need to be explained/interpreted.
Advantages
- less emotional for younger students.
-can encourage risk taking for students that may not want to take the course for a grade
Disadvantages - Less reliable than a continuous measure
- Does not contain much information relative to a student’s achievement.
Advantages
- results in a detailed list of student achievements.
- may be combined with other measures.
Disadvantages - may become too detailed to easily comprehend.
-Difficult for record keeping
Advantages
- Involves a personal discussion of achievement.
- May be used as a formative, ongoing measure
Disadvantages
- Teacher needs to be skilled in discussion and in offering positive and negative feedback.
- Time consuming.
- Some students may feel threatened.
- Difficult for record keeping.
Advantages
- Involves personal discussion of achievement and may alleviate misunderstanding.
- Teacher can show samples of work and rationale for assessment.
- May improve relations with parents.
Disadvantages
- Teachers need to be skilled in discussion and in offering positive and negative feedback
- Time consuming
- May provoke parent-teacher anxiety
- May be inconvenient for parents
- Difficult for record keeping
Advantages
- most useful as an additional form of communication
Disadvantages - short letters may not adequately communicate a student’s achievement.
- require good writing skills
-time consuming.
Discuss with students (and parents when appropriate) the basis of all grading, and all grading procedures, at the beginning of the course/school year
Grades should reflect, and be based on, students’ level of achievement, using only those assessments that validly measure achievement
Grades should reflect, and be based on, a composite of several valid assessments.
When combining several valid assessments, each assessment should be appropriately weighted
An appropriate type of grading framework should be adopted, given the ultimate use of the grade.
All borderline grades should be re-evaluated based on a careful examination of all achievement evidence
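The guideline on weighting several assessments can be sketched as a weighted average. This is only an illustration: the assessment names, weights, and scores below are hypothetical, and all scores are assumed to be on the same 0-100 scale.

```python
def composite_score(scores, weights):
    """Weighted average of assessment scores (all assumed to be on a 0-100 scale)."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same assessments")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical weights and scores, for illustration only.
weights = {"quizzes": 20, "lab_reports": 30, "final_exam": 50}
scores = {"quizzes": 80.0, "lab_reports": 90.0, "final_exam": 70.0}
composite = composite_score(scores, weights)
print(f"Composite score: {composite:.1f}")
```

Dividing by the total weight means the weights do not have to sum to 100, so an assessment can be added or dropped without rescaling the others.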
Emphasize fair grading and scoring.
Grade relative to specific learning objectives.
Base grades primarily on current performance.
Provide accurate, timely and helpful feedback.
Use a sufficient number of assessments.
Don’t lower grades due to misbehavior or attendance.
Use professional judgment.
Harmful to a student’s psyche.
Do not motivate but may provide disincentive
Mastery may not be the purpose of the activity, or 100% performance may be necessary
Performance may be necessary to determine acquisition of skill (e.g., piano, computer)
Written activities do not emphasize oral communication which may be a more functional skill
There are vast differences in grading practices between teachers and schools.
Most schools lack a standardized and codified grading policy.
A grade, a simple symbol, is incapable of conveying the complexity of a student’s achievement.
Grading is not always valued by teachers and thus often suffers from carelessness.
Teachers often use grading as a form of discipline and motivation, rather than as an assessment report
Select content
Develop an instructional strategy
Develop and select instructional materials
Construct tests and other instruments for assessing and evaluating
Improve you as a teacher, and our overall program
Learning outcomes Formula
Bloom’s Taxonomy
Characteristics of Good Learning Outcomes
Learning Outcomes Exercise
Write your Learning Outcomes
5 Questions for Instructional Design
1. What do you want the student to be able to do? (outcome)
2. What does the student need to know in order to do this well? (curriculum)
3. What activity will facilitate the learning? (pedagogy)
4. How will the student demonstrate the learning? (assessment)
5. How will I know the student has done this well? (criteria)
This question asks you to develop the outcome.
For example:
The student identifies, consults, and evaluates reference books appropriate to the topic in order to locate background information and statistics.
Bad Outcome
- Use ILLiad and TexShare in order to access materials not available at UT Arlington Library.
Good Outcome
- Utilize retrieval services in order to obtain materials not owned by UT Arlington library.
Bad Outcome
- Students will construct bibliographies and in-text references using discipline-appropriate styles in order to contribute to academic discourse in their discipline.
Good Outcome
- Construct bibliographies and in-text references using discipline-appropriate styles in order to correctly attribute others’ work and ideas.
We’re taking a friend camping for the first time (not roughing it too much).
What do they need to know?
We’ll concentrate on how to build a fire
Why do we want our friend to be able to properly build a fire?
Now let’s write the learning outcome
What is our verb? (use Bloom’s)
Why?
A test is reliable when it yields consistent results. To establish reliability, researchers use different procedures:
1. Split-half Reliability: Dividing the test into two equal halves and assessing how consistent the scores are.
2. Reliability using different tests: Using different forms of the test to measure consistency between them.
3. Test-Retest Reliability: Using the same test on two occasions to measure consistency.
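As a rough sketch of the split-half procedure, the example below splits each student's item scores into odd- and even-numbered items and correlates the two half-test totals. The odd/even split, the Spearman-Brown step-up to full test length, and the response data are assumptions made for illustration; they are not taken from the slides. The same `pearson` helper could be applied to scores from two administrations of a test to estimate test-retest reliability.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(item_scores):
    """item_scores: one list per student, 1 = correct, 0 = incorrect."""
    # Assumption: split by odd- and even-numbered items.
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    r_half = pearson(odd_totals, even_totals)
    # Spearman-Brown correction: estimates reliability of the full-length test.
    return 2 * r_half / (1 + r_half)


# Hypothetical responses: 5 students, 6 items.
item_scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
reliability = split_half_reliability(item_scores)
print(f"Split-half reliability (Spearman-Brown): {reliability:.2f}")
```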
Reliability of a test does not ensure validity. Validity refers to whether a test measures or predicts what it is supposed to.
1. Content Validity: The extent to which a test measures your definition of the construct.
2. Criterion-related Validity: The relationship between scores on a test and an independent measure of what the test is supposed to measure.
   1. Predictive Validity: The function of a test in predicting a particular behavior or trait. For instance, we might theorize that a measure of math ability should be able to predict how well a person will do in an engineering-based profession.
   2. Convergent Validity: The degree to which the operationalization is similar to (converges on) other operationalizations. For instance, we might correlate the scores on our test with scores on other tests that purport to measure basic math ability, where high correlations would be evidence of convergent validity.
Test score distribution
Test score distribution (average group)
Test score distribution (poor group)
Test score distribution (good group)
Assessment – The process of measuring something with the purpose of assigning a numerical value.
Scoring – The procedure of assigning a numerical value to an assessment task.
Evaluation – The process of determining the worth of something in relation to established benchmarks using assessment information.
Formative – for performance enhancement
Summative – for performance assessment
Formal – quizzes, tests, essays, lab reports, etc.
Informal – active questioning during and at end of class
Traditional – tests, quizzes, homework, lab reports, teacher
Alternative – PBL’s, presentations, essays, book reviews, peers
Alternative to what? Paper & pencil exams
Alternatives:
- lab work / research projects
- portfolios
- presentations
- research paper
- essays
- self-assessment / peer assessment
- lab practical
- classroom “clickers” or responder pads
- Rube Goldberg projects
- Bridge building / rocketry / mousetrap cars
- Writing a computer program
- Research project
- Term paper
- Create web page
- Create movie
- Role playing
- Building models
- Academic competitions
Quick-fire questions
Minute paper
1) what did you learn today?
2) what questions do you have?
Directed paraphrasing (explain a concept to a particular audience)
The “muddiest” point (what is it about the topic that remains unclear to you?)
The National Science Education Standards draft (1994) states, “Authentic assessment exercises require students to apply scientific information and reasoning to situations like those they will encounter in the world outside the classroom as well as situations that approximate how scientists do their work.”
Validity – is the test assessing what’s intended?
- Are test items based on stated objectives?
- Are test items properly constructed?
Difficulty – are questions too hard? (e.g., 30% to 70% of students should answer a given item correctly)
Discriminability – is performance on individual test items positively correlated with overall student performance? (e.g., only the best students do well on the most difficult questions)
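The difficulty and discriminability checks can be sketched as a simple item analysis. In this sketch, difficulty is the proportion of students answering an item correctly, and discriminability is the correlation between an item and each student's total on the remaining items. That "rest score" variant is one common choice and an assumption here, as are the function names and the response data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns None when either list has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def item_analysis(responses, low=0.30, high=0.70):
    """responses: one list per student, 1 = correct, 0 = incorrect."""
    totals = [sum(row) for row in responses]
    report = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        # Difficulty: proportion of students who answered this item correctly.
        difficulty = sum(item) / len(item)
        # Rest score: total score excluding the item being analysed.
        rest = [total - score for total, score in zip(totals, item)]
        report.append({
            "item": j + 1,
            "difficulty": difficulty,
            "discrimination": pearson(item, rest),
            "in_target_range": low <= difficulty <= high,
        })
    return report


# Hypothetical responses: 6 students, 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
report = item_analysis(responses)
for row in report:
    print(row)
```

An item with a discrimination near zero or below zero would be a candidate for review, since stronger students are not doing better on it than weaker ones.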
Based on a predetermined set of criteria.
For instance,
- 90% and up = A
- 80% to 89.99% = B
- 70% to 79.99% = C
- 60% to 69.99% = D
- 59.99% and below = F
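A minimal sketch of this criterion-referenced scale, assuming scores are percentages and that each boundary value belongs to the higher grade (so exactly 90% is an A):

```python
def criterion_grade(percent):
    """Map a percentage score to a letter grade using fixed cutoffs."""
    cutoffs = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]
    for minimum, letter in cutoffs:
        if percent >= minimum:
            return letter
    return "F"


# Hypothetical scores, for illustration only.
for score in (95, 89.99, 72.5, 60, 41):
    print(score, criterion_grade(score))
```

Because the cutoffs are fixed in advance, a student's grade does not depend on how anyone else in the class performed.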
Pros:
Sets minimum performance expectations.
Demonstrates what students can and cannot do in relation to important content-area standards (e.g., ILS)
Cons:
Sometimes it’s hard to know just where to set boundary conditions.
Lack of comparison data with other students and/or schools.
Based upon the assumption of a standard normal (Gaussian) distribution with n>30.
Employs the z score:
- A = top 10% (z>+1.28)
- B = next 20% (+0.53<z<+1.28)
- C = central 40% (-0.53<z<+0.53)
- D = next 20% (-1.28<z<-0.53)
- F = bottom 10% (z<-1.28)
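The z-score bands above can be sketched as follows. This example assumes the class mean and sample standard deviation are used to standardize scores, and that a score falling exactly on a cutoff goes to the band shown in the code (the slide leaves boundary cases open). The five-score class is hypothetical and far below the n>30 the method assumes; it is used only to keep the example short.

```python
from statistics import mean, stdev


def norm_referenced_grades(scores):
    """Assign letter grades from each score's z score within the class."""
    mu, sigma = mean(scores), stdev(scores)
    if sigma == 0:
        raise ValueError("all scores are identical; z scores are undefined")
    grades = []
    for score in scores:
        z = (score - mu) / sigma
        if z > 1.28:
            grades.append("A")   # top 10%
        elif z > 0.53:
            grades.append("B")   # next 20%
        elif z >= -0.53:
            grades.append("C")   # central 40%
        elif z >= -1.28:
            grades.append("D")   # next 20%
        else:
            grades.append("F")   # bottom 10%
    return grades


# Hypothetical class of five scores, for illustration only.
scores = [50, 60, 70, 80, 90]
grades = norm_referenced_grades(scores)
print(list(zip(scores, grades)))
```

With so few scores, nobody reaches the A band even though the top score is 90, which illustrates why small groups are a problem for this approach.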
Pros:
- Ensures a “spread” between top and bottom of the class for clear grade setting.
- Shows student performance relative to group.
Cons:
- In a group with great performance, some will be ensured an “F”.
- Top and bottom performances can sometimes be very close.
- Dispenses with absolute criteria for performance.
- Being above average does not necessarily imply “A” performance.
Norm-Referenced:
-Ensures a competitive classroom atmosphere
-Assumes a standard normal distribution
-Small-group statistics a problem
-Assumes “this” class like all others
Criterion-Referenced:
-Allows for a cooperative classroom atmosphere
-No assumptions about form of distribution
-Small group statistics not a problem
-Difficult to know just where to set criteria