ETE511_C1_BI

7/27/2019 ETE511_C1_BI


ETE 511 LANGUAGE TESTING AND EVALUATION

CHAPTER 1: The Role and Purpose of Testing and Evaluation in TESL

LEARNING OUTCOME

Upon completion of this chapter, you should be able to:

1. Distinguish between tests, assessments and measurements;

2. Describe the basic parts of a test or evaluation;

3. Describe the role of tests in the instructional process.


INTRODUCTION

It is important to fully understand the role and purpose of testing and evaluation before we can discuss different ways of testing in TESL. This chapter discusses the roles and purposes of testing and evaluation in TESL. This includes a discussion of the differences between various terms related to basic concepts in testing, the basic constituent parts of a test, and the role of tests in the instructional and educational process, including the decisions that are made on the basis of test scores.

1.1 WHAT ARE TESTS, ASSESSMENTS AND MEASUREMENTS?

A course on testing may be called Testing and Measurement at one institution, Testing and Evaluation at another, or simply Assessment at a third. These terms are obviously related. However, what do they mean, and how are they interconnected? Before we proceed further into the subject of testing, it is appropriate that we first understand several basic yet important terms, the most important of which are tests, assessment and measurement. Let us first look at the definitions of these three terms.

1.1.1 TEST

A test can be defined as a systematic procedure for measuring a sample of behaviour by posing a set of questions in a unified manner (Linn & Gronlund, 1995: 6). The key phrases in this definition are systematic procedure, measuring a sample of behaviour, and a set of questions in a unified manner.

A test is a systematic procedure because a test has a planned format. A test cannot be haphazard; a haphazard test would lose much of its credibility as a test.

A test also measures a sample of behaviour. In the case of language tests, the sample of behaviour would be language proficiency or any other language-related construct we are interested in.

Finally, the questions or items in a test are seen to be unified. A traditional view of test items is that they work in the same way by measuring the same construct. If the items in a test are not unified and measure different constructs, what then does the test measure?

1.1.2 ASSESSMENTS

Assessment is any of a variety of procedures used to obtain information on students' performance. Unlike a test, an assessment is seldom exclusively quantitative. A teacher may assess student learning simply by looking at how students respond to instruction. Students' facial expressions, for example, can provide valuable information for assessment.

A test is an assessment, although, as noted here, not all assessments need to be tests.


It should also be noted that the term evaluation can be considered synonymous with assessment, although some would limit its use to programme evaluation rather than the evaluation of student performance. For the sake of brevity, I will treat the two terms as synonymous.

1.1.3 MEASUREMENT

Measurement is a numerical description of a particular characteristic. We measure physical objects in terms of their height, weight and depth. We can measure distance as well as length. However, tests tend to measure behavioural and cognitive aspects, which are a lot more abstract than physical objects. Nevertheless, all tests are measurements. As Figure 1.1 shows, however, not all measurements are tests.

The relationship between tests, assessments and measurement can be illustrated in the following diagram:

Figure 1.1: The relationship between tests, measurement and assessment.

(Source: Bachman, 1990)

From Figure 1.1, we can conclude that all tests are measurements, and that tests can also be assessments.

Bachman (1990) considers qualitative assessment of students as an example of what may fall in area A, and teacher ranking of students' performance as falling in area B. Area C is represented by tests that are also assessments and measurements. A clear example of this would be an achievement test given to students at the end of an instructional programme. Area D represents tests that are measurements but not assessments. One such example would be research in which tests are given. Finally, we will find many measurements that are neither tests nor assessments. The age of students is a measurement, but it is neither a test nor an assessment. These types of measurements are represented by area E.

In the next few chapters, we will come across many different types of tests and assessments. We will also examine measurements commonly used in tests.

What do you think are the differences between tests, assessments and measurements?

1.2 WHAT CONSTITUTES A TEST?

There are a number of ways to look at a test. We may want to examine the characteristics of a good test and the issues of validity and reliability. These issues, however, will be discussed in Chapter 7 of this module. Here, it may be more important to look at the basic structure of a test. If we were to dissect a test and examine its anatomy, what would it look like? Wesche (1983) suggested four major parts to a test. These four parts form a useful framework for examining any kind of test:

1. Stimulus material.
2. Task posed to the learner.
3. Learner's response.
4. Scoring criteria.

1.2.1 STIMULUS MATERIALS

The stimulus material refers to the text or material presented to the learner or test taker. This can take many forms. In a reading comprehension test, for example, the stimulus material could be the reading passage. The passage may or may not be accompanied by pictures or drawings. It could be on any of many possible topics and written in a particular style.

The stimulus material in this example would also include the questions themselves. After reading the text, will students answer multiple-choice objective questions, or are they required to write a summary?

Share with your friends some tests that you find good or bad. What are the features of a good test?


1.2.2 TASK POSED TO THE LEARNER

The second component of the framework, the task posed to the learner, is a somewhat abstract concept. It involves determining the mental or cognitive response required of the learner toward the stimulus. If the reading comprehension passage is again used as an example, the cognitive response required is for the learner to understand what is read. This may require comprehension at various levels: the word level, the sentence level and the discourse level.

The process of reading for understanding also involves other abilities and knowledge, such as inferencing and cultural knowledge. This component also addresses how learners are expected to react mentally and cognitively to the format of the question, and which skills, sub-skills and abilities they must draw on in order to complete the task.

1.2.3 LEARNER'S RESPONSE

The learner's response is the actual demonstration of the student's ability within the limitations of the stimulus provided by the test. If the stimulus is a reading comprehension passage and the test items are multiple-choice questions, then the learner's actual response is to select the correct answer based on the question and the options that follow. If a summary is required, then students demonstrate their reading comprehension ability by writing a short summary of the stimulus passage. This component of Wesche's framework is closely related to the second component, except that it is the actual physical performance of the task that is taken to demonstrate the ability or behaviour being examined.

1.2.4 SCORING CRITERIA

Finally, the fourth component of the framework is the scoring criteria. As a test is a measurement, the scoring criteria are an important aspect of the overall structure of a test: they determine how a learner's response is turned into a score, for example one point for each correctly answered multiple-choice item, or a marking scheme for a written summary.

So, once again, the four components of a test according to Wesche's framework are the stimulus material, the task posed to the learner, the learner's response and the scoring criteria. All four components are highly interrelated and important in testing.

Almost every test can be analysed according to these four components. More importantly, when we construct a test, we can make it easier or more difficult by varying each component of this framework.
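As a rough illustration (not part of Wesche's own work), the four components can be recorded as a simple Python data structure. The example below describes the reading comprehension item discussed in this section in two versions; the class name, field names and scoring rules are assumptions made for illustration. Keeping the task fixed while changing the stimulus, response and scoring mirrors the point that varying a component produces a different test.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WescheItem:
    """A test item described by Wesche's (1983) four components (illustrative)."""
    stimulus_material: str  # text or material presented to the test taker
    task: str               # mental/cognitive response the item requires
    learner_response: str   # observable performance the learner produces
    scoring_criteria: str   # how that performance is turned into a score

    def describe(self) -> str:
        """Return the four components as labelled lines, in framework order."""
        rows = [
            ("Stimulus material", self.stimulus_material),
            ("Task posed to the learner", self.task),
            ("Learner's response", self.learner_response),
            ("Scoring criteria", self.scoring_criteria),
        ]
        return "\n".join(f"{label}: {text}" for label, text in rows)


# Hypothetical description of the chapter's reading comprehension example.
mcq_version = WescheItem(
    stimulus_material="Reading passage with multiple-choice questions",
    task="Understand the passage at word, sentence and discourse level",
    learner_response="Select the correct option for each question",
    scoring_criteria="1 point per correct option (assumed)",
)

# Changing some components, while keeping the task, gives a different test.
summary_version = replace(
    mcq_version,
    stimulus_material="Reading passage with an instruction to summarise it",
    learner_response="Write a short summary of the passage",
    scoring_criteria="Marking scheme for the summary (assumed)",
)

print(summary_version.describe())
```

The same structure could be filled in for any item, which is one way to make the framework's "almost every test can be analysed" claim concrete.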


1.3 A PRELIMINARY UNDERSTANDING OF TESTS AND THE INSTRUCTIONAL PROCESS

What role do tests have in instruction? It is important for an educator to understand how tests and instruction are related. Do we test what we teach? Or should we teach what we test? In an ideal world, these questions might be moot, as the relationship between testing and teaching would be seamless, with both serving the purpose of helping students learn. In the real world, however, it is possible to justify an affirmative answer to each of the two questions. Yes, we should test what we teach in order to assess the extent to which our students have understood, or perhaps even mastered, what has been presented to them. However, in a world where examinations can play an important role in determining our future, who can blame teachers who teach what is being tested, i.e. who prepare students only for the test?

So what exactly is the relationship between testing and teaching? Perhaps we can get an initial idea with the help of the simple diagram in Figure 1.2.

Figure 1.2: Relationship between planning, instruction and testing
(Curriculum Specification → Instruction → Testing, with washback from Testing to the earlier stages)

In this model, we are reminded that instruction itself is guided by curriculum planning. Testing represents the final stage of a three-stage process that begins with curriculum planning or instructional objectives, is followed by the actual instruction, and culminates in testing. The model also suggests washback from testing to both the curriculum specification and the instruction stages. The concept of washback will be discussed in greater detail later.

The model suggested by Figure 1.2, however, is clearly a simplified and idealised one. Such a model may work well if all three components are under the purview of a single person or a small group of people. However, when it is applied to a national scenario, the linear process is no longer so easy or likely. Some of the objectives of the curriculum specifications may be lost in instruction, especially as those who carry out the teaching may not be directly involved in curriculum planning. Similarly, national standardised tests or examinations may fail to capture the emphases placed during instruction, as the test constructors of these examinations are not the people who actually carried out the teaching. Nevertheless, for want of a better conceptual picture of the position of testing in instruction, the simple model in Figure 1.2 will suffice for the moment. We will revisit the model in later chapters, when we hopefully have a clearer and more comprehensive understanding of tests and instruction.

Why do you think it is important for teachers to understand the tests that they administer?

It should be noted here that the nature of tests is affected by the nature of, or approach to, instruction. We need only look at the history of language testing to see the truth of this statement. It was once described to me that language testing had undergone three major historical shifts or phases.

(a) The first phase, the pre-scientific phase, coincided with a time when teachers were thought to be competent in constructing tests simply by virtue of being teachers. It was felt that if they could teach, then they could test.

(b) A more scientific era, heralded by behaviourism and audiolingualism, saw the rise of psychometric structuralism, in which the measurement of structural knowledge of language was given top priority.

(c) Finally, language tests were influenced by the communicative approach, and a sociolinguistic, integrative perspective on testing was adopted.

Each of the three phases, of course, coincided with the theories of and approaches to language learning and teaching of the time. This further reinforces the notion that there is a close relationship between teaching and testing.

1.3.1 TAXONOMIES OF INSTRUCTIONAL OBJECTIVES

Perhaps an even more important factor in examining the role of tests in the instructional process is the instructional objectives. Test items should be based on instructional objectives, especially if we wish to know whether instruction has been effective. In this respect, taxonomies such as those suggested by Bloom and Barrett are useful tools for ensuring that the most appropriate questions are asked in tests.

Table 1.1: Bloom's Taxonomy and Representative Test Questions (adapted from Nitko, 2001: 27)

Level          Test question
Knowledge      Who are the main characters of the story?
Comprehension  What is the main theme of the story?
Application    Can the solutions found to the problems in this story be used in solving problems that many of our youths face?
Analysis       What literary devices are being used to convey the characters' feelings to the reader?
Synthesis      Based on this story as well as other stories you have read, describe general strategies that main characters in stories have taken to overcome the problems that they face.
Evaluation     Develop a set of three or four criteria for assessing the quality of a story and use these criteria to assess any story that we have read.

Bloom's taxonomy consists of six levels, which are generally considered to be hierarchical. This means that not only are the higher-level skills more cognitively demanding, they also assume that the skills lower in the taxonomy have been mastered. The levels of knowledge, comprehension and application are often referred to as the lower-order skills, with knowledge at the lowest end. Analysis, synthesis and evaluation are considered the higher-order skills, with evaluation occupying the highest end of the taxonomy. In Table 1.1, each level of Bloom's taxonomy is accompanied by a matching question that reflects the cognitive demands it places on students.

Bloom's taxonomy focuses on cognitive abilities and may have limitations when used in language teaching and learning. Other taxonomies, such as Barrett's taxonomy, have been developed for more language-related skills. This taxonomy consists of four levels: literal recognition or recall; inference; evaluation; and appreciation. Each level consists of several sub-levels. Barrett's taxonomy focuses on reading and is especially relevant for language teaching and learning. However, taxonomies of the productive language skills of writing and speaking are also needed. In second language situations, such taxonomies would be useful in charting progress in learning as well as in specifying a comprehensive teaching plan.

1.4 HOW DO STUDENTS BENEFIT FROM TESTS?

While the decisions made on the basis of tests (see Section 1.5) are largely teacher- and educator-centred, tests also provide students with several benefits. First, there is the benefit of motivation. Whenever a teacher announces that there will be a test, most students tend to study and revise material in preparation for it. In other words, the test acts as an impetus for study. This form of motivation is useful when used sparingly, as teachers should not depend solely on tests to motivate students.

The better students also use tests as a source of information. Feedback from test scores informs students of their strengths and weaknesses, whether their study approach has been beneficial, and whether they have understood the material taught. In other words, information in the form of test results is as important for the student as it is for the teacher. As such, it should be general practice to return test papers as often and as quickly as possible. Seen another way, teachers are now presented with a new responsibility, i.e. to develop in their students the ability and self-directedness to use information from sources such as test results to learn and to plan their own learning.

1.5 DECISIONS MADE BASED ON TESTS

Why do we test? Do teachers and instructors have a sadistic streak, setting tests simply to see their students slog and burn the midnight oil preparing for them? Certainly not! There are more noble intentions in testing. We can say that the main purpose of tests is to obtain information concerning a particular behaviour or characteristic. Based on the information obtained from tests, several different types of decisions can be made.

Kubiszyn & Borich (2000) mention eight different types of decisions made on the basis of information obtained from tests. These educational decisions are shown in Figure 1.3.

Figure 1.3: Eight different types of decisions made on the basis of tests (instructional, grading, diagnostic, selection, placement, counselling and guidance, programme or curriculum, and administrative policy)

The first three decisions are often within the domain of the classroom teacher. He or she can make decisions with respect to instruction, grading and diagnostic activities.

What do you think are students' reactions towards tests? Do they enjoy or fear tests?


Instructional decisions are made based on test results when, for example, teachers decide to change or maintain their instructional approach. If a teacher finds that most of his class has failed his test, there are many possible reactions he can have. He could be very disappointed, blame the students for not studying and punish them in some way. Of course, this would not be a wise decision. Instead, the teacher could evaluate the effectiveness of his own teaching or instructional approach. An instructional decision is made when the teacher decides whether to keep the approach currently used. Perhaps the teacher may decide that the approach is not suitable and that a different approach should be used.

Tests yield scores, and teachers have to decide what grades to give students. As grades are indicators of student performance, teachers need to decide whether a student deserves a high grade, perhaps an A, on the basis of some form of assessment. Traditionally, and perhaps for a long time to come, this assessment will be in the form of tests.

Sometimes, we give tests to find out the strengths and weaknesses of our students. Can they correctly construct a passive sentence? Do they use the different pronoun forms correctly? These kinds of questions can be answered by observing student performance on tests. When a teacher decides to spend more time teaching passive sentences because student performance on such sentences in a test was unsatisfactory, he has made a diagnostic decision.
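A diagnostic decision of this kind can be sketched as a small calculation. The Python sketch below assumes a hypothetical answer sheet in which each item is tagged with the skill it targets (passive sentences or pronoun forms), computes the proportion of correct answers per skill, and flags any skill that falls below a cut-off. The item tagging, the student data and the 0.6 threshold are all illustrative assumptions, not figures from the text.

```python
from collections import defaultdict

# Hypothetical item-to-skill tagging for a short grammar test.
ITEM_SKILL = {
    1: "passive sentences",
    2: "passive sentences",
    3: "pronoun forms",
    4: "pronoun forms",
}

# Hypothetical results: for each student, the set of items answered correctly.
RESULTS = {
    "Student 1": {3, 4},
    "Student 2": {1, 3, 4},
    "Student 3": {3},
}


def proportion_correct_by_skill(item_skill, results):
    """Return {skill: proportion of correct responses across all students}."""
    correct = defaultdict(int)
    attempted = defaultdict(int)
    for answered_correctly in results.values():
        for item, skill in item_skill.items():
            attempted[skill] += 1
            if item in answered_correctly:
                correct[skill] += 1
    return {skill: correct[skill] / attempted[skill] for skill in attempted}


def skills_needing_reteaching(proportions, threshold=0.6):
    """Flag skills whose proportion correct is below the (assumed) threshold."""
    return sorted(skill for skill, p in proportions.items() if p < threshold)


props = proportion_correct_by_skill(ITEM_SKILL, RESULTS)
print(props)
print(skills_needing_reteaching(props))
```

With these invented data, passive-sentence items are answered correctly 1 time in 6 and pronoun items 5 times in 6, so only passive sentences are flagged. In practice, a teacher might also inspect the flagged items themselves before deciding.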

Decisions related to selection, placement, counselling and guidance, programme or curriculum, and administrative policy are all made at levels higher than the classroom. Administrators, educational agencies and institutions may be involved in these decisions.

Selection and placement decisions are somewhat similar. However, a selection decision concerns whether or not a student is selected for a programme or for admission into an institution based on a test score. Tests such as TOEFL and IELTS are often used by universities to decide whether a candidate is suitable, and hence selected for admission. A placement decision, on the other hand, concerns where a candidate should be placed based on his or her performance on the test. A clear example is the language placement examination for newly admitted students commonly administered by many local and foreign universities. Based on their performance on such a test, students are placed into different language classes arranged according to proficiency level.
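The contrast between the two kinds of decision can be sketched in a few lines of Python. Selection is modelled as a single yes/no cut score, and placement as a set of score bands mapped to class levels. The cut score of 60, the band boundaries and the level names are invented for illustration; they are not TOEFL or IELTS requirements or any university's actual placement rules.

```python
# Hypothetical cut score; real programmes set their own.
ADMISSION_CUT_SCORE = 60

# Hypothetical placement bands, ordered from highest minimum score downward.
PLACEMENT_BANDS = [
    (80, "Advanced"),
    (60, "Intermediate"),
    (0, "Foundation"),
]


def select(score: int, cut_score: int = ADMISSION_CUT_SCORE) -> bool:
    """Selection: a yes/no decision against a single cut score."""
    return score >= cut_score


def place(score: int, bands=PLACEMENT_BANDS) -> str:
    """Placement: choose the class whose minimum score the candidate reaches."""
    for minimum, level in bands:
        if score >= minimum:
            return level
    raise ValueError(f"score {score} is below every band")


for score in (45, 72, 91):
    print(score, select(score), place(score))
```

A score of 72, for instance, passes the selection cut score and falls in the Intermediate band, while 45 fails the cut score and falls in the Foundation band.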

Counselling and guidance decisions are also made by relevant parties, such as counsellors and administrators, on the basis of exam results. Counsellors often give advice on appropriate vocations for some of their students. This advice is likely to be based on the students' own test scores. Programme or curriculum decisions reflect the kinds of changes made to the educational programme or curriculum based on examination results. Finally, there are also administrative policy decisions, which are also greatly influenced by test scores.

(a) What do the terms tests, assessments and measurements mean, and how are they interconnected?

(b) What constitutes a good test? What are the four major parts of a test as suggested by Wesche (1983)?


1.6 HOW DO WE CONSTRUCT A TEST?

The framework of a test is reflected in the way the test is constructed. The first stage in constructing a test is to determine what is to be tested. This is not as easy as it seems, because it requires determining the theoretical construct of what is to be tested. For example, let's assume that we are interested in testing communicative competence. This requires that a theoretical construct of communicative competence first be determined. Various theories of communicative competence have been suggested (cf. Bachman, 1990; Canale & Swain, 1980). We need to examine these theories and determine what communicative competence means to us for the purpose of our test.

The second step in test construction is to operationalise the theoretical construct. A theoretical construct is necessarily an idealised and abstract notion. When it is operationalised, it is reduced to fit the constraints of a test. The many different test formats (multiple choice, dictation, essay-type, matching, etc.) represent the different kinds of operationalisation available in tests.

Finally, the third step in constructing a test is quantification. As a test is a measure, numbers and quantities will be a necessary element. Once again, just as with the previous stages, we may tend to take this stage of test construction for granted. There is more to quantification than simply assigning numbers or points to items in a test. If a test consists of two sections using different formats, such as multiple-choice questions and short-answer questions, what weightage of points would you assign to the items in each section? Even the assignment of these points must be justified.

The steps described above provide a general description of the test construction process. In actual practice, there may be some additional steps that need to be taken. Some time back, I was asked to construct a test of English language proficiency for a private company. When I set out to do the task, I listed the steps that I would probably have to take. One of the first steps I felt necessary was some form of needs analysis, in order to determine what kind of language should be tested. I wanted to find out from the management what sort of test they wanted and whether what I had in mind fitted their requirements. My intention was to draft the test, show the draft to the management for approval, pilot it and later validate the test in some way.
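To make the weighting question in the quantification step concrete, here is a minimal Python sketch with invented numbers: a 30-item multiple-choice section worth 1 point per item and a 5-item short-answer section worth 2 points per item. It contrasts simply adding up raw points, which lets the larger section dominate, with rescaling each section and applying chosen weights. The section sizes, the 50/50 weighting and the student's scores are all assumptions for illustration; the chapter itself does not prescribe any weighting.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    items: int
    points_per_item: int
    weight: float  # share of the final score this section should carry

    @property
    def max_points(self) -> int:
        return self.items * self.points_per_item


def raw_percentage(sections, scores):
    """Simply add up points: sections count in proportion to their maximum points."""
    earned = sum(scores[s.name] for s in sections)
    possible = sum(s.max_points for s in sections)
    return 100 * earned / possible


def weighted_percentage(sections, scores):
    """Rescale each section to 0-1, then combine using the chosen weights."""
    total_weight = sum(s.weight for s in sections)
    return 100 * sum(
        s.weight * scores[s.name] / s.max_points for s in sections
    ) / total_weight


# Hypothetical two-section test; the 50/50 weighting is an illustrative choice.
SECTIONS = [
    Section("multiple choice", items=30, points_per_item=1, weight=0.5),
    Section("short answer", items=5, points_per_item=2, weight=0.5),
]
student = {"multiple choice": 24, "short answer": 4}

print(f"raw: {raw_percentage(SECTIONS, student):.1f}%")
print(f"weighted: {weighted_percentage(SECTIONS, student):.1f}%")
```

With these invented figures the raw total is 70% (28 of 40 points), whereas the 50/50 weighting gives 60%, because the weaker short-answer performance now counts for half of the final score. Whichever scheme is chosen, the weighting still has to be justified.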

I would also imagine that if I were teaching in the public schools, I would probably not spend so much time on the three steps described earlier (theoretical construct, operationalisation and quantification), because the test construction process has largely been determined by the Ministry of Education. The national standardised Sijil Pelajaran Malaysia examination is already an embodiment of the three stages, and teachers merely need to follow the model examination paper with respect to these three elements. However, it may be helpful to construct a test blueprint in order to ensure that my test spans the necessary content and that a variety of skills or abilities are tested.

Table 1.2: Example of Test Blueprint According to Bloom's Taxonomy

Section           Knowledge       Comprehension  Application  Analysis  Synthesis  Evaluation  Total
A. Comprehension  1, 3            2, 4, 5        8            6         7, 10      9           10
B. Grammar        12, 16, 18, 19  11, 14, 20     17           13        15         -           10
C. Functions      21, 23          22, 29         24, 25, 26   27        28         30          10
Total             8               8              5            3         4          2           30

There are numerous ways of forming test blueprints, some more comprehensive than others (see Nitko, 2001 for several examples), but an important point to remember is that the test blueprint "should be used only as a tool rather than to promote exact or rigorous classification" (Nitko, 2001: 113).

Nevertheless, the most common form of test blueprint in Malaysian schools has incorporated Bloom's taxonomy as its primary method of classification. In the example in Table 1.2, the 30 items in the test are categorised according to Bloom's taxonomy. The numbers 1 to 30 in the blueprint refer to the test item numbers. Items 1 and 3, for example, are reading comprehension items that test knowledge, while item 8 tests application. A blueprint such as this is useful in ensuring that different kinds of questions are asked. In this particular example, most questions are knowledge and comprehension questions (8 each), which is quite common. However, all six question types are quite well represented, and as such, the test itself can be considered acceptable.
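The totals in Table 1.2 can be checked mechanically. The sketch below transcribes the blueprint into a Python dictionary, recomputes the Total row, confirms that each of the 30 items is classified exactly once, and reports the share of lower-order items (knowledge, comprehension and application, following the grouping of Bloom's levels described earlier in this chapter). The function names are illustrative; this is one possible way of tallying a blueprint, not a standard tool.

```python
from collections import Counter

LEVELS = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]
LOWER_ORDER = {"Knowledge", "Comprehension", "Application"}

# Item numbers per section and Bloom level, transcribed from Table 1.2.
BLUEPRINT = {
    "A. Comprehension": {
        "Knowledge": [1, 3], "Comprehension": [2, 4, 5], "Application": [8],
        "Analysis": [6], "Synthesis": [7, 10], "Evaluation": [9],
    },
    "B. Grammar": {
        "Knowledge": [12, 16, 18, 19], "Comprehension": [11, 14, 20],
        "Application": [17], "Analysis": [13], "Synthesis": [15], "Evaluation": [],
    },
    "C. Functions": {
        "Knowledge": [21, 23], "Comprehension": [22, 29],
        "Application": [24, 25, 26], "Analysis": [27], "Synthesis": [28],
        "Evaluation": [30],
    },
}


def level_totals(blueprint):
    """Count items per Bloom level across all sections (the 'Total' row)."""
    totals = Counter()
    for cells in blueprint.values():
        for level, items in cells.items():
            totals[level] += len(items)
    return [totals[level] for level in LEVELS]


def check_coverage(blueprint, n_items):
    """Every item 1..n_items should be classified exactly once."""
    seen = [i for cells in blueprint.values() for items in cells.values() for i in items]
    return sorted(seen) == list(range(1, n_items + 1))


def lower_order_share(blueprint):
    """Fraction of items at the knowledge, comprehension and application levels."""
    counts = dict(zip(LEVELS, level_totals(blueprint)))
    lower = sum(counts[level] for level in LOWER_ORDER)
    return lower / sum(counts.values())


print(level_totals(BLUEPRINT))
print(check_coverage(BLUEPRINT, 30))
print(f"{lower_order_share(BLUEPRINT):.0%} lower-order items")
```

For this blueprint the recomputed Total row matches the table (8, 8, 5, 3, 4, 2), and 21 of the 30 items (70%) are lower-order, which is consistent with the observation that knowledge and comprehension questions dominate.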

What should you take into consideration when constructing a test?


SUMMARY

This chapter has presented a discussion of various basic issues dealing with tests and measurements. It has looked at terminology related to tests and measurements and attempted to distinguish between terms that are similar. It has also attempted to situate testing within the instructional process, taking into consideration instructional objectives as well as the decisions made on the basis of tests.

GLOSSARY

Assessment: An assessment can be any procedure that is used to obtain information regarding a student's performance or ability.

Evaluation: For some, evaluation is synonymous with assessment, while for others, evaluation is a more formal form of assessment and may even be specific only to the evaluation of programmes.

Measurement: A measurement is a quantitative description of a particular characteristic.

Test: Linn and Gronlund (1995) define a test as a systematic procedure for measuring a sample of behaviour by posing a set of questions in a unified manner.

The following is a test question taken from an English textbook.

Identify the noun phrases in the following sentences (1 point for every correctly identified noun phrase):

(a) Do you know where he is?

(b) She didn't know if the teacher was coming.

(c) The policeman stopped me as I was parking the car.

Use the Wesche (1983) framework to describe this test item.
