ETE511_C3_BI


  • 7/27/2019 ETE511_C3_BI


    CHAPTER 3

    Limitations of Tests

    LEARNING OUTCOMES

    Upon completion of this chapter, you should be able to:

    1. Identify and describe the major limitations of tests and measurements in TESL;

    2. Compare traditional assessment with alternative assessments; and

    3. Describe two forms of alternative assessment and their benefits and shortcomings.


    CHAPTER 3 LIMITATIONS OF TESTS .........................


    INTRODUCTION

    Tests are not perfect measures and certainly have limitations. These limitations should remind us to be careful when interpreting test scores and not to place complete trust in them. Five major limitations of tests and measurements will be discussed in this chapter. In addition, alternative forms of assessment that have been suggested to overcome these limitations will also be discussed.

    3.1 LIMITATIONS OF TEST AND MEASUREMENT IN TESL

    The five major limitations identified and discussed by Bachman (1990) are as follows:

    (a) Subjectivity

    (b) Under-specification of domain

    (c) Incompleteness

    (d) Indirectness

    (e) Imprecision

    Each of these limitations is described and discussed in the following paragraphs.

    3.1.1 SUBJECTIVITY

    The most obvious form of subjectivity in tests is seen in the grading of tests in the supply-type or subjective format, such as essays and even interviews. This issue has been addressed at some length in previous chapters. However, subjectivity does not refer only to grading; it affects other elements of the test as well. Even when the test is an objective, selected-response or multiple-choice test, there is still some amount of subjectivity. This subjectivity is found in the selection of passages and item formats as well as in the content to be tested. In a test that contains a reading comprehension passage, for example, why was one passage selected over another? Was it because of the content? If so, there must surely be many passages on the same content. The question then is why one passage was used and not another on the same content. The answer is that the decisions that affect the test and its ability to measure precisely are made by individuals, and there is some degree of subjectivity involved in these decisions.

    3.1.2 UNDER-SPECIFICATION OF DOMAIN

    A test is a measurement of some content. This content can be referred to as the domain or the construct of the test. However, while it may be quite easy to specify the domain as listening comprehension, for example, it is not as easy to test or measure that domain. When any theoretical domain or construct is operationalised, there is bound to be some aspect of the domain that cannot be translated into a test. The test therefore under-specifies the domain. It is this under-specification of the domain that limits a test as a measure of ability or knowledge, or as a sample of behaviour.


    3.1.3 INCOMPLETENESS

    Incompleteness refers to the student's inability to demonstrate the entire repertoire of the construct being measured. Because a test is constrained by time and physical setting, a student will never be able to show all of what he or she is able to do. Since only a few questions can be asked in a test due to time constraints, these questions may not elicit the student's true or complete ability. Similarly, the constraints of the physical setting of the test may also prevent the student from demonstrating specific kinds of abilities. We should therefore note that even when a student scores zero points on a test, this does not mean that he or she is completely ignorant of the subject or ability being tested. It simply means that the test has not elicited the knowledge or abilities that the student is able to convey or perform.

    3.1.4 INDIRECTNESS

    While we are aware of the importance of having direct tests, it is unlikely that a test will ever be completely free of being an indirect measure of ability. This limitation is inherent in the testing situation itself. Many of us have experienced test anxiety. Once the word "test" or "assessment" is mentioned, the entire situation changes. While some students are able to speak well in situations outside the classroom, they lose this ability once they become aware that they are being tested. In addition, every test situation has elements that are not related to the construct being tested. Messick (1989) refers to this as construct-irrelevant variance; examples include the test rubrics or instructions, time constraints, and other rules and regulations of the test. None of these is present in the actual real-world situation, and they must be considered an aspect of indirectness. We can therefore only conclude that the test situation is indirect because it is inauthentic, and that by being indirect, it fails to capture the true ability students would show if they were to perform in the real world.

    3.1.5 IMPRECISION

    Finally, we need to acknowledge that there is a degree of imprecision in all tests. While we may be able to justify some of the weightage in marks or points given to some items, we will never be able to be completely accurate and just. Even in a test of twenty multiple-choice items, each assigned one point, it is almost impossible to claim that all twenty items are of equal difficulty. As such, we cannot fully justify giving each item an equal weightage of one point. This imprecision must be acknowledged as another constraint of tests.

    In addition to the above, Herman et al. (1992) point out other limitations, such as the mismatch between test content and curriculum and instruction; the overemphasis on routine and discrete skills to the neglect of complex thinking and problem-solving skills; and the limited relevance of major test formats, such as the multiple-choice format, to either classroom or real-world learning (pp. 5-6).

    As a teacher, is it important that you take these limitations into consideration when conducting tests and measurements? Why, and what are the consequences if you fail to do so?
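    The imprecision point in 3.1.5 can be made concrete with a small worked example. The sketch below computes a facility value for each item, that is, the proportion of test takers who answered it correctly, a common way of expressing item difficulty in language testing. The response data are invented purely for illustration, and the function name facility_values is hypothetical; this is not a procedure prescribed in this chapter.

    ```python
    # A minimal sketch with hypothetical data: each row is one student's answers
    # to four one-point items (1 = correct, 0 = incorrect).
    responses = [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 0, 0],
        [1, 0, 1, 1],
        [1, 1, 0, 1],
    ]


    def facility_values(matrix):
        """Return, for each item, the proportion of students who answered it correctly."""
        n_students = len(matrix)
        n_items = len(matrix[0])
        return [sum(row[item] for row in matrix) / n_students for item in range(n_items)]


    for number, p in enumerate(facility_values(responses), start=1):
        # Every item carries the same one-point weight, whatever its facility value.
        print(f"Item {number}: facility value {p:.2f}, weight 1 point")
    ```

    With these made-up responses, item 1 is answered correctly by every student and item 3 by only one, yet both are worth one point. This is the sense in which an equal weightage of one point per item is hard to justify.
    
    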

    3.2 ALTERNATIVE ASSESSMENT

    Alternative assessments are assessment procedures that differ from the traditional notions and practice of tests with respect to format, performance or implementation. Alternative assessment probably has its roots in writing assessment because of the need to provide continuous assessment rather than a single impromptu evaluation (Alderson & Banerjee, 2001). Hamayan (1995) considers alternative assessments to be "procedures and techniques which can be used within the context of instruction and can be easily incorporated into the daily activities of the school or classroom" (p. 213).

    As the term indicates, alternative assessments are assessment proposals that present alternatives to the more traditional examination formats. They have become more popular of late because of doubts raised about the ability of traditional assessment to provide a fair and accurate measure of a student's performance. Alternative assessment brings with it a complete set of perspectives that contrast with those of traditional tests and assessments. Table 3.1 illustrates some of the major differences between traditional and alternative assessments.

    Table 3.1: Contrasting Traditional and Alternative Assessment
    Source: Adapted from Bailey (1998: 207) and Puhl (1997: 5)

    Traditional Assessment             Alternative Assessment
    One-shot tests                     Continuous, longitudinal assessment
    Indirect tests                     Direct tests
    Inauthentic tests                  Authentic assessment
    Individual projects                Group projects
    No feedback to learners            Feedback provided to learners
    Speeded exams                      Power exams
    Decontextualised test tasks        Contextualised test tasks
    Norm-referenced score reporting    Criterion-referenced score reporting
    Standardised tests                 Classroom-based tests
    Summative                          Formative
    Product of instruction             Process of instruction
    Intrusive                          Integrated
    Judgmental                         Developmental
    Teacher-proof                      Teacher-mediated

    You have previously read about the five major limitations of tests and measurements. What other alternative methods of assessment can you think of?

    In discussing alternative assessments, Herman et al. (1992: 6) list several of their common characteristics. According to them, alternative assessments:

    (a) Ask students to perform, create, produce or do something.

    (b) Tap higher-level thinking and problem-solving skills.

    (c) Use tasks that represent meaningful instructional activities.

    (d) Invoke real-world applications.

    (e) Rely on people, not machines, to do the scoring, using human judgment.

    (f) Require new instructional and assessment roles for teachers.

    Alternative assessments have been suggested largely because of a growing concern that traditional assessments cannot accurately measure the ability we are interested in. They are also seen as more student-centred, as they cater for different learning styles, cultural and educational backgrounds, and language proficiencies.

    Nevertheless, although alternative assessments are compatible with the contemporary emphasis on the process as well as the product of learning (Croker, 1999), several shortcomings of alternative assessments have been noted.

    Perhaps one of the major limitations of alternative assessments is that "accounts of the benefits of alternative assessment tend to be descriptive and persuasive, rather than research-based" (Alderson & Banerjee, 2001: 229). Alternative assessments are also said to be limited to the classroom and have not become part of mainstream assessment. Brown and Hudson, in advocating alternative assessment, seem to have taken a safer approach by suggesting the term "alternatives in assessment". They believe that educators should be familiar with all possible formats of assessment and choose the format that best measures the ability or construct they are interested in. Hence, these alternatives would include all possible assessment formats, both traditional and informal.

    Despite these limitations, alternative assessments present a viable and exciting option for eliciting and assessing students' actual abilities. At present, a number of test formats are considered alternative assessment formats. Figure 3.1 provides a list of several of the more common formats in alternative assessment.

    Tannenbaum (1996) comments that alternative assessments focus on documenting individual strengths and development, which assists in the teaching and learning process.

    Figure 3.1: Sample formats in alternative assessment
    Source: Tannenbaum (1996); Short (1993)

    Physical demonstration
    Dialogue journals
    Pictorial products
    Checklists
    K-W-L charts (what I know / what I want to know / what I've learned)
    Reading response logs
    Teacher-pupil conferences
    Interviews
    Self-assessment
    Peer assessment
    Performance tasks
    Portfolios

    In this chapter, however, only two of these formats will be discussed further, to provide a glimpse of what alternative assessment can offer.

    3.2.1 PORTFOLIOS

    Perhaps the best known of the alternative assessments is portfolio assessment. The portfolio, although relatively new in language teaching and assessment, is actually quite a common form of assessment, as many professions place great importance on the development of personal portfolios. Architects and artists, for example, develop portfolios to show their work to potential customers or employers. The contents of the portfolio become evidence of their abilities, much as we would use a test to measure the abilities of our students.

    Paulson, Paulson & Meyer (1991) define a portfolio as "a purposeful collection of student work that exhibits the student's efforts, progress and achievements in one or more areas" (p. 60). They stress that the collection must include criteria for judging merit as well as evidence of student participation in selecting content and in self-reflection. A portfolio is therefore not simply a file folder or manila card folder containing a hotch-potch collection of student work but a careful selection of that work. We will see that the portfolio provides not only a source for assessment but also learning opportunities. Bailey (1998: 218) describes a portfolio as containing four primary elements.

    First, it should have an introduction that provides an overview of the content of the portfolio. Bailey even suggests that this section include a reflective essay in which the student expresses his or her thoughts and feelings about the portfolio, perhaps explaining its strengths and possible weaknesses as well as why certain pieces have been included.

    Secondly, she argues that portfolios should have what she refers to as an "academic works" section. This section is meant "to demonstrate the student's improvement or achievement in the major skill areas" (p. 218).

    The third section is described as a personal section, in which students may wish to include their journals, score reports of tests they have sat for, as well as photographs and other items that illustrate their experiences with, and achievements in, the English language.

    Finally, an assessment section may contain evaluations made by peers and teachers as well as self-evaluations.

    Table 3.2: Contents of a Portfolio
    Source: Adapted from Bailey (1998: 218)

    Section                    Contents
    Introductory Section       Overview; reflective essay
    Academic Works Section     Samples of best work; samples of work demonstrating development
    Personal Section           Journals; score reports; photographs; personal items
    Assessment Section         Evaluation by peers; self-evaluation

    The portfolio can be said to be a student's personal documentation that helps demonstrate his or her ability and successes in the language. It may even require students to consciously select items that document their own progress as learners. The actual compilation of the contents of the portfolio is in itself a learning experience. Some suggest that students attach a short reflection to each piece or item placed in the portfolio. Portfolio assessment, therefore, is both a learning and an assessment experience, and this dual function can be considered one of its benefits.

    Brown and Hudson (1998) summarise several other advantages of using portfolios in assessment. They discuss these advantages according to how the portfolio strengthens students' learning, enhances the teacher's role and improves the testing process. With respect to testing, the advantages of using the portfolio as an assessment instrument are that it (pp. 664-665):

    (a) enhances student and teacher involvement in assessment;

    (b) provides opportunities for teachers to observe students using meaningful language to accomplish various authentic tasks in a variety of contexts and situations;

    (c) permits the assessment of the multiple dimensions of language learning;

    (d) provides opportunities for both students and teachers to work together and reflect on what it means to assess students' language growth;

    (e) increases the variety of information collected on students; and

    (f) makes teachers' ways of assessing student work more systematic.

    However, portfolios are not without problems. In particular, portfolios can become rather problematic assessment devices when they are used on a large scale, especially with respect to grading. Brown and Hudson (1998) also point out a number of other concerns related to the design, logistics, interpretation, reliability and validity of portfolio assessment.

    The design of the portfolio is an issue because it is quite subjective. A portfolio must be considered a personal student product. If the teacher becomes completely involved in determining the content of the portfolio, especially with regard to which student work should be included, some of the benefits of the portfolio will be lost. Hence, questions such as who will determine the grading criteria, how these criteria will be established, who determines what the portfolio will contain, and how much of the daily authentic classroom activity should be included all become difficult design issues that portfolio assessment needs to contend with.

    Logistically, the portfolio also poses several real problems. These include the increased time and resources needed not only to develop the portfolio but also to assess it. Another concern is the need to train teachers to assess portfolios fairly and accurately.

    Portfolios are also problematic in terms of their interpretation. This includes setting standards and criteria for grading portfolios. Assessing a portfolio involves evaluating a student's personal interests, and it may not be appropriate for a teacher to consider an item as having no value when it is invaluable to the person involved. This is the dilemma teachers will face when they attempt to interpret and assess a student's personal portfolio. Similarly, reporting the result of a portfolio assessment will also be problematic, in that in most cases it can only take the form of suggestions to the student rather than clear indications of strengths and weaknesses.

    Finally, there is an obvious problem of reliability. It will be difficult to maintain high inter-rater reliability with portfolio assessment. Because of the many different pieces or items included in the portfolio, high inter-rater reliability is likely to be even more difficult to achieve than with written essays.
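    Inter-rater reliability is usually reported as a number. As an illustration only, the sketch below compares band scores that two teachers might give to the same eight portfolios, using two common indices: exact percentage agreement, and Cohen's kappa, which corrects that agreement for the agreement expected by chance. The scores are invented, the chapter does not prescribe either index, and the function names are hypothetical.

    ```python
    from collections import Counter

    # Hypothetical band scores (1-5) given by two teachers to the same eight portfolios.
    rater_a = [3, 4, 2, 5, 3, 4, 2, 3]
    rater_b = [3, 3, 2, 4, 3, 5, 1, 3]


    def percent_agreement(a, b):
        """Proportion of portfolios on which the two raters gave exactly the same band."""
        return sum(x == y for x, y in zip(a, b)) / len(a)


    def cohens_kappa(a, b):
        """Agreement corrected for the agreement expected by chance (Cohen's kappa)."""
        n = len(a)
        observed = percent_agreement(a, b)
        counts_a, counts_b = Counter(a), Counter(b)
        # Chance agreement: for each band, multiply the two raters' proportions of use.
        expected = sum(
            (counts_a[band] / n) * (counts_b[band] / n) for band in set(a) | set(b)
        )
        return (observed - expected) / (1 - expected)


    print(f"Exact agreement: {percent_agreement(rater_a, rater_b):.2f}")
    print(f"Cohen's kappa:   {cohens_kappa(rater_a, rater_b):.2f}")
    ```

    With these made-up scores the two raters agree exactly on half of the portfolios, and kappa (about 0.32) is lower still once chance agreement is removed. Plain kappa treats the bands as unordered categories, so a one-band disagreement counts the same as a four-band one; a weighted kappa is a common alternative for ordered rating scales.
    
    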

    While these problems hinder the increased use of portfolios, they should not deter us from using the portfolio, at least in a controlled manner, in our classrooms. We should remember that the portfolio is not only an assessment tool but also a learning experience. Even when the portfolio has some problems as a testing and assessment device, it may still benefit students in their learning. Furthermore, some of the problems related to the portfolio raised in this section may actually be addressed by self-assessment, an assessment technique discussed in the following section.

    (a) In your opinion, what are the advantages of using portfolios as a form of alternative assessment?

    (b) Look at the characteristics of alternative assessment as opposed to traditional assessment in Table 3.1. How many of these characteristics accurately describe a portfolio?

    3.2.2 SELF-ASSESSMENT AND PEER ASSESSMENT

    Two other common forms of alternative assessment are self-assessment and peer assessment. Both forms of assessment are strongly advocated by Puhl, who believes that they are essential to continuous assessment, a cornerstone of alternative assessment. The benefits of self- and peer assessment are found especially in the formative stages of assessment, in which the development of students' abilities is emphasised.

    Black and Wiliam (1998) point out that all students in their study said that work involving self- and peer assessment made them think more, and that a large proportion of the students (85%) said that it made them learn more (p. 29). Self-assessment can take several forms, including a yes-no checklist, a Likert-type scale or even an open-ended format. Self-assessment, however, should be distinguished from self-marking, in which students mechanically check their own answers (Freeman and Lewis, 1998).

    Self-appraisals are also thought to be quite accurate and are said to increase student motivation. Puhl (1997) describes a case study in which, she believes, self-assessment led students to reread their essays and make the necessary edits and corrections before handing them in. Nevertheless, for self-assessment to be useful and not a futile exercise, learners need to be trained and initially guided in performing it. This training involves explaining to students the rationale for self-assessment, how it is intended to work and how it can help them. Brooks (2002: 70-72) lists several other aspects of training, such as:

    (a) teacher modelling of the use of metacognitive processes and skills;

    (b) student practice of their assessment skills;

    (c) introduction to relevant assessment criteria;

    (d) clarification of abstract assessment criteria; and

    (e) the use of self-assessment during, rather than at the end of, an instructional unit.

    To conduct such training for his or her students, the teacher must be conversant with the concept of self-assessment. This is an important prerequisite, as some teachers may not be clear about what self-assessment entails and may therefore dismiss the importance of this assessment technique.

    In language teaching and learning, self-assessment is relevant to all the language skills. An example of self-assessment of the listening skill, particularly the comprehension of questions asked, is suggested by Cohen (1994), as follows:

    These questions are useful in the formative stages of assessment, as they help students identify their own strengths and weaknesses and respond accordingly. By asking these types of self-assessment questions, students are expected to become more sensitive to their own learning and ultimately to perform better in the final summative evaluation at the end of the instructional programme.

    Luoma and Tarnanen (2003) provide an interesting description of the use of benchmarks in self-assessment. Their project involved a self-rating instrument that was part of DIALANG, an internet-based diagnostic language assessment system for 14 languages. In the self-rating instrument, students write a text and compare it with benchmarks that represent six different levels or bands, and then determine their own level on the basis of these benchmarks. Luoma and Tarnanen (2003) further compared student self-ratings with teacher ratings and found that although the students tended to rate themselves fairly high on the scale, the self-ratings were fairly realistic and tended to match teacher ratings. They report that "most of the mismatches between teacher and self-ratings were overestimations" (p. 452), and they suggest that this tendency to overestimate may be due to the influence of student background variables.
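    A comparison of self-ratings with teacher ratings of the kind Luoma and Tarnanen describe can be pictured as a simple tally. The sketch below uses invented ratings on a six-level scale, not data from their study, and the function name compare_ratings is hypothetical. It only classifies each learner's self-rating as matching, above (an overestimation) or below (an underestimation) the teacher's rating of the same text.

    ```python
    # Hypothetical levels (1-6) for seven learners: each learner's self-rating
    # and a teacher's rating of the same text. Invented data for illustration.
    self_ratings = [4, 3, 5, 2, 4, 6, 3]
    teacher_ratings = [4, 3, 4, 2, 3, 5, 4]


    def compare_ratings(self_scores, teacher_scores):
        """Count matches, overestimations and underestimations relative to the teacher."""
        summary = {"match": 0, "over": 0, "under": 0}
        for own, teacher in zip(self_scores, teacher_scores):
            if own == teacher:
                summary["match"] += 1
            elif own > teacher:
                summary["over"] += 1  # learner placed themselves above the teacher's level
            else:
                summary["under"] += 1  # learner placed themselves below the teacher's level
        return summary


    print(compare_ratings(self_ratings, teacher_ratings))
    ```

    In this made-up class, three self-ratings match the teacher's, three are overestimations and one is an underestimation, so most mismatches are overestimations, the same direction of mismatch that Luoma and Tarnanen report.
    
    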

    The validity of self-assessment can be affected by various factors. Based on their review of sixteen studies on self-assessment, Blanche and Merino (1989) identify five such factors:

    (a) students' lack of training in how to perform self-assessment;

    (b) the lack of generally accepted criteria for learner self-ratings and for the subsequent teacher interpretation of those ratings;

    (c) conflict between the students' cultural background and the culture of self-assessment;

    (d) intervening related variables, such as students' professional aspirations or academic training; and

    (e) students' inability to perform certain aspects of self-assessment accurately, such as reporting on subconscious behaviour or reporting post hoc on their performance.

    These observations underscore the importance of proper student training in self-assessment. They also indicate that different types of self-assessment techniques should be used to accommodate the different students involved.

    Peer assessment differs from self-assessment in that it involves the social and emotional dimensions to a much greater extent. Peer assessment can be defined as a response in some form to other learners' work (Puhl, 1997). "It can be given by a group or an individual and it can take any of a variety of coding systems: the spoken word, the written word, checklists, questionnaires, nonverbal symbols, numbers along a scale, colours, etc." (p. 8). Peer assessment requires a student to take up the role of a critical friend to another student in order "to support, challenge, and extend each other's learning" (Brooks, 2002: 73). The reported benefits of peer assessment include the following:

    (a) remind learners that they are not working in isolation;

    (b) help create a community of learners;

    (c) improve the product ("Two heads are better than one");

    (d) improve the process; motivate, even inspire;

    (e) help learners be reflective; and

    (f) stimulate metacognition.

    Each of these benefits has real-world importance, as they all stress how awareness of the other person, the peer, can change perceptions of how the individual should work in a society or community.

    While the potential benefits of both self- and peer assessment are quite apparent, especially in the context of today's educational emphasis on self-directed, independent and autonomous learning, both forms of assessment need to be prepared and implemented with considerable care. In addition to the training mentioned earlier, correct attitudes towards these forms of assessment must also be formed. Students, for example, need to develop appropriate interpersonal skills, such as attentive listening and respectful questioning techniques, for peer assessment to proceed without obstacles.

    Negative views and perceptions of these assessment techniques, such as the views that self-assessment is a way of reducing teachers' marking burden and that it lowers standards, must also be addressed. It may not be sufficient to dismiss these negative views as misconceptions or as confusion with other activities such as self-marking, as suggested by Brooks (2002) and noted earlier in this section. Teachers must once again prove their mettle and provide the necessary conditions and accompanying training for these alternative types of assessment to succeed.


    SUMMARY

    This chapter has highlighted the limitations of tests with regard to subjectivity, under-specification of domain, incompleteness, indirectness and imprecision. It also discussed other ways of assessing our students' performance, such as portfolios and self- and peer assessment.

    http://www.2dix.com/pdf-2011/testing-and-evaluation-in-esl-pdf.php

    Understanding Language Tests and Testing Practices: http://www.tesol.org/s_tesol/docs/3500/3467.pdf

    Truth about Testing: An Educator's Call to Action: http://site.ebrary.com/lib/aeu/Doc?id=10044769&ppg=4

    TOEFL Structure & Skills for iBT success!: http://www.youtube.com/watch?v=Em0woQvskjY


    GLOSSARY

    Alternative assessment: Non-traditional assessment that often involves formats such as the portfolio, simulation and other generally subjective types of test.

    Peer evaluation: Evaluation that involves providing a response in some form to the work of a peer or fellow student. It may be performed individually or in groups and may use any of a variety of coding systems.

    Portfolio: "A purposeful collection of student work that exhibits the student's efforts, progress, and achievements in one or more areas" (Paulson, Paulson & Meyer, 1991, p. 60), most often compiled by the student himself or herself.

    Self-assessment: Assessment in which students assess themselves with respect to how well they have performed or progressed. Also often referred to as self-appraisal, self-evaluation and self-rating.