
Kobe University Repository : Kernel

Title

Designing and Compiling an EFL Spoken Corpus for Assessment and Pedagogical Purposes

Author(s) Poonpon, Kornwipa

Citation Learner Corpus Studies in Asia and the World, 2: 91-101

Issue date 2014-05-31

Resource Type Departmental Bulletin Paper

Resource Version publisher

Rights

DOI

JaLCDOI 10.24546/81006692

URL http://www.lib.kobe-u.ac.jp/handle_kernel/81006692

PDF issue: 2020-11-01


Designing and Compiling an EFL Spoken Corpus for

Assessment and Pedagogical Purposes

Kornwipa POONPON

Khon Kaen University

Abstract There is an increasing number of corpus-based studies in English language teaching and learning contexts in Thailand. However, Thai learner corpora, especially spoken learner corpora, are rare. This paper presents an initial project to design and compile a spoken corpus of Thai English learners (SCoTEL) for assessment and pedagogical purposes. The paper starts with a rationale for building the spoken learner corpus, followed by the design of the corpus based on Tono's (2003) framework and an account of how the spoken data were collected, transcribed and coded. The spoken corpus includes 264 spoken responses to six speaking test tasks (three independent tasks and three integrated tasks) administered to 44 undergraduate students at a Thai university as preparation for the TOEFL iBT. The use of the spoken corpus as a pedagogical and research resource in assessment and pedagogical environments is also discussed. For assessment purposes, the spoken learner corpus can be analyzed to identify salient lexical and grammatical features that differentiate learners' proficiency levels. Moreover, samples can be drawn from the corpus to serve as benchmark responses in a rater training procedure. Pedagogically, the spoken learner corpus can be used to investigate productive vocabulary or grammatical features to inform syllabus design, material development, and classroom activities. Finally, limitations and subsequent steps of the current corpus project are presented.

Keywords Spoken corpus, Learner corpus, Thai learners, EFL learners, Speaking assessment

I Introduction

Corpus studies have gained popularity in Thailand over the past 10 years. Corpus studies in Thai contexts focus on different aspects to support language teaching, learning and research. Most corpus studies have emphasized the analysis of language occurring in published materials or textbooks (e.g., Poonpon, Honsa, & Cowan, 2001; Kaewphanngam, Broughton, & Soranasataporn, 2002). Phoocharoensil (2010) investigated five synonyms in English (i.e., ask, beg, plead, request, and appeal) in three learners' dictionaries in comparison to data from concordance lines from the Time corpus. Chanchanglek and Sriussadaporn (2011) analyzed vocabulary input in English for Engineering course materials at three universities in Thailand. Wan-a-rom (2012) examined words occurring in three teacher-made ELT course books used in foundation English courses in a Thai university.

Many corpus studies used corpus techniques as a pedagogical tool. Liangpanit (2010) designed a corpus-based business vocabulary learning program, based on the most frequent words found in the British National Corpus, for business English students at a Thai university and determined the program's effectiveness. Patasorn (2011) examined the effects of concordancing instruction on the receptive and productive vocabulary knowledge of Thai undergraduate students. Some studies focused more on analyses of Thai English learners' language. For example, Patanasorn (2010) investigated linking adverbials used in argumentative essays of Thai undergraduate students. Jirapanakorn (2012) compared the use of reporting verbs in research article introductions published in international and Thai medical journals. Chanyoo (2013) examined how Thai graduate students develop and express oppositional ideas in arguments and compared their use of oppositional connectors in arguments with that of published scholars in the field of health science. All of these corpus-based studies have contributed to English teaching and learning contexts in Thailand.

Although these studies show how corpora as a pedagogical and research resource can considerably enrich the learning and teaching environment, Thai teachers and researchers still encounter problems in particular areas of corpus use. Most corpus-based studies in Thai contexts have used 'written' English corpora of 'published or commercial' texts in which standard or professional English is used. Spoken English corpora and learner corpora have rarely been the focus of these studies. (This is probably because more written corpora than spoken corpora are available and accessible.) Therefore, the findings of such studies can inform only the learning and teaching of written English from the perspective of standard English. We also need insightful information about spoken English from the perspective of both standard spoken language and the learners of English themselves.

Given the rarity of spoken English corpora of Thai learners and the benefits of learner corpora, which can be used in a variety of ways to support language learning and teaching, this paper attempts to design and compile a spoken corpus of Thai English learners and illustrates the use of the spoken corpus in English learning, teaching and testing.


II Corpus Design

Designing a spoken corpus is challenging because recording and transcribing spoken texts is difficult, time-consuming, and expensive. To facilitate the corpus design process in this project, Tono's (2003) design considerations for building a learner corpus were employed. The framework suggests that researchers consider three sets of criteria: (a) language-related (e.g., mode, genre, topic), (b) task-related (e.g., spontaneous vs. prepared; cross-sectional vs. longitudinal), and (c) learner-related (e.g., EFL or ESL, age, sex, mother tongue) (see Table 1).

Table 1 Considerations for designing a learner corpus (Tono, 2003)

Criteria               Features
(a) language-related   Mode (written/spoken)
                       Genre (letter/diary/fiction/essay)
                       Style (narration/argumentation)
                       Topic (general/leisure)
(b) task-related       Data collection (cross-sectional/longitudinal)
                       Elicitation (spontaneous/prepared)
                       Use of references (dictionary/source text)
                       Time limitation (fixed/free/homework)
(c) learner-related    L1 background
                       L2 environment (ESL/EFL, level of school)
                       L2 proficiency (standard test score)
                       Internal-cognitive (age/cognitive style)
                       Internal-affective (motivation/attitude)

In this project, a spoken corpus of Thai English learners (SCoTEL) was designed, based on Tono's framework, to include spoken language produced by Thai undergraduate students in response to speaking test tasks (see Table 2). In particular, 44 undergraduate students at a university in Thailand volunteered to take a proficiency speaking test, administered by the researcher at the university. Nine of them were male and 33 were female. Their ages ranged from 18 to 22 years. Eight were non-English majors, and 36 were English majors.
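
The design decisions described here can also be recorded as structured metadata attached to each response. The following is only a minimal sketch in Python: the class name `ResponseMetadata` and its field names are hypothetical and not part of the original project, and the default values simply restate the SCoTEL design attributes reported in Table 2.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ResponseMetadata:
    """Design attributes of one spoken response (illustrative field names)."""

    speaker_id: int
    task_number: int
    task_type: str                      # "independent" or "integrated"
    mode: str = "spoken"                # language-related criteria
    genre: str = "monologue"
    elicitation: str = "spontaneous"    # task-related criteria
    time_limitation: str = "fixed"
    l1_background: str = "Thai"         # learner-related criteria
    l2_environment: str = "EFL university"


# Hypothetical record: speaker 1 answering an integrated task numbered 3.
record = ResponseMetadata(speaker_id=1, task_number=3, task_type="integrated")
print(asdict(record))
```

Keeping the design attributes alongside each response makes it straightforward to filter the corpus later, for example by task type.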

The speaking test consisted of six speaking tasks: three independent tasks and three integrated tasks. The independent tasks required the students to express and support their opinions about familiar topics using personal knowledge and experience. Each independent task allowed the students to prepare their responses for 15 seconds before recording their voice for 45 seconds. In the integrated tasks, the students were required to synthesize information from two sources (i.e., reading and listening material) and then speak a response. Each integrated task included a short reading passage (approximately 100 words long) and a short talk (approximately one to two minutes long). The participants had 45 seconds to read a short passage. Then they listened to part of a lecture related to the passage they had read. They were allowed to take notes while listening to the lecture. Then the participants responded to a question using the information from the passage and the lecture. They could use the notes to answer the question. They had 30 seconds to prepare and 60 seconds to speak into the microphone and give their response. The tasks were based on campus-related situations and academic classroom material. The test took approximately 30 minutes in total for the six speaking tasks (about 3 minutes for the three independent tasks, 12 minutes for the three integrated tasks, plus time before and after the actual test).
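
The reported task times can be checked with simple arithmetic. The sketch below, written in Python for illustration, adds up the per-task timings stated above; the only assumption is that "approximately one to two minutes" for the lecture is treated as a range of 60 to 120 seconds.

```python
# Per-task timings in seconds, as stated in the test description.
INDEPENDENT = {"prepare": 15, "speak": 45}
INTEGRATED_FIXED = {"read": 45, "prepare": 30, "speak": 60}
LECTURE_RANGE = (60, 120)   # assumption: "one to two minutes" as a range
TASKS_PER_TYPE = 3


def minutes(seconds: float) -> float:
    """Convert seconds to minutes."""
    return seconds / 60


independent_total = TASKS_PER_TYPE * sum(INDEPENDENT.values())
integrated_low = TASKS_PER_TYPE * (sum(INTEGRATED_FIXED.values()) + LECTURE_RANGE[0])
integrated_high = TASKS_PER_TYPE * (sum(INTEGRATED_FIXED.values()) + LECTURE_RANGE[1])

print(minutes(independent_total))                         # 3.0
print(minutes(integrated_low), minutes(integrated_high))  # 9.75 12.75
```

Under this assumption the three independent tasks take exactly 3 minutes and the three integrated tasks take roughly 10 to 13 minutes, which is consistent with the figure of about 12 minutes given above.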

Table 2 Considerations for designing SCoTEL (based on Tono, 2003)

Criteria               Features
(a) language-related   Mode: spoken
                       Genre: monologue
                       Style: descriptive & narrative
                       Topic: general & academic
(b) task-related       Task: independent & integrated
                       Elicitation: spontaneous
                       Use of references: source text
                       Time limitation: fixed
(c) learner-related    L1 background: Thai
                       L2 environment: EFL university
                       L2 proficiency: low-intermediate & intermediate

2.1 Data Elicitation

All spoken data were elicited from the 44 participants, who volunteered to take the proficiency speaking test in order to prepare themselves for the TOEFL iBT test. The proficiency test was administered to the students in a language lab equipped with cassette booths with headsets and microphones, and a teacher's control computer with a headset and a microphone. Each student responded to all six test tasks, and their responses were recorded. The test lasted for 30 minutes. A total of 264 responses were obtained (44 students x 6 test tasks). Each of these responses was scored by two trained raters, using two scoring rubrics (i.e., independent and integrated scoring rubrics).

The collection of 264 speaking responses contained 132 responses from the independent tasks and 132 from the integrated tasks. The total number of words in this corpus was 21,897, with 10,517 words in independent responses and 11,380 in integrated ones (see Table 3). All audio files of the spoken responses were saved individually as .wav files. Each file was labeled using the following protocol: speakerID-tasknumber (e.g., S1-T3, referring to speaker number 1 producing a spoken response on reading/listening/speaking task number 3).
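
The labeling protocol and the corpus totals lend themselves to a short consistency check. The following Python sketch is illustrative only: it assumes that speakers are numbered 1 to 44 and tasks 1 to 6 and that labels follow the pattern S<speaker>-T<task>; the helper names `make_label` and `parse_label` are hypothetical. The word totals are the ones reported for the corpus.

```python
import re
from typing import Tuple

N_SPEAKERS = 44   # participants reported in the study
N_TASKS = 6       # speaking tasks per participant
LABEL_PATTERN = re.compile(r"^S(\d+)-T(\d+)$")   # assumed label shape


def make_label(speaker: int, task: int) -> str:
    """Build a label following the speakerID-tasknumber protocol."""
    return f"S{speaker}-T{task}"


def parse_label(label: str) -> Tuple[int, int]:
    """Return (speaker number, task number) for a label such as 'S1-T3'."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unexpected label: {label!r}")
    return int(match.group(1)), int(match.group(2))


labels = [
    make_label(speaker, task)
    for speaker in range(1, N_SPEAKERS + 1)
    for task in range(1, N_TASKS + 1)
]

# Reported totals: task type -> (number of responses, number of words).
REPORTED = {"independent": (132, 10_517), "integrated": (132, 11_380)}
averages = {kind: round(words / n, 2) for kind, (n, words) in REPORTED.items()}
total_responses = sum(n for n, _ in REPORTED.values())
total_words = sum(words for _, words in REPORTED.values())
overall_average = round(total_words / total_responses, 2)

print(len(labels), total_responses, total_words)
print(averages, overall_average)
```

The number of generated labels matches the 264 responses, and dividing the word totals by the response counts gives the average response lengths in words.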

Table 3 The spoken corpus of Thai English learners (SCoTEL)

Speaking task          Number of responses   Average length   Number of words
Independent speaking   132                   79.67            10,517
Integrated speaking    132                   86.21            11,380
Total                  264                   82.94            21,897

2.2 Transcription and Coding

The collection of 264 speaking responses was transcribed by the researcher using a transcription scheme that includes all spoken features (e.g., discourse markers, filled and unfilled pauses). Each of the 264 audio files was transcribed and saved as a .txt file for practicality in tagging parts of speech. All transcripts were double-checked by the researcher.
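
Once spoken features are marked in the transcripts, they can be counted automatically. The Python sketch below is a hedged illustration, not the project's transcription scheme: the pause symbol `(.)`, the inventories of filled pauses and discourse markers, and the sample sentence are all invented for the example.

```python
from collections import Counter

# All three inventories are hypothetical; the actual scheme is not specified.
FILLED_PAUSES = {"uh", "um", "er"}
UNFILLED_PAUSE = "(.)"
DISCOURSE_MARKERS = {"well", "so", "okay"}


def spoken_feature_counts(transcript: str) -> Counter:
    """Count words, pauses and discourse markers in a whitespace-tokenised transcript."""
    counts: Counter = Counter()
    for token in transcript.lower().split():
        if token == UNFILLED_PAUSE:
            counts["unfilled_pause"] += 1
            continue
        word = token.strip(".,?!")
        if word in FILLED_PAUSES:
            counts["filled_pause"] += 1
        elif word in DISCOURSE_MARKERS:
            counts["discourse_marker"] += 1
        elif word:
            counts["word"] += 1
    return counts


# Invented sample, not taken from the corpus.
sample = "well um I think (.) studying alone is uh better"
print(spoken_feature_counts(sample))
```

Whether pauses and markers count toward response length is a design decision; here they are kept separate from ordinary words so that either convention can be applied later.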

After transcription, all spoken responses were coded for lexico-grammatical features. The transcripts were automatically tagged in order to identify the parts of speech in the spoken texts. The tags were then rechecked against the reference grammar Longman Grammar of Spoken and Written English (Biber et al., 1999).
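
Tagged transcripts can then be summarised by tag frequency. The tagger and tag set used in the project are not named, so the Python sketch below assumes, purely for illustration, a `word_TAG` output format with Penn Treebank-style tags; the sample line is invented.

```python
from collections import Counter


def tag_frequencies(tagged_text: str) -> Counter:
    """Count part-of-speech tags in text formatted as word_TAG tokens (assumed format)."""
    tags: Counter = Counter()
    for item in tagged_text.split():
        word, separator, tag = item.rpartition("_")
        if separator and word and tag:   # skip tokens without a tag
            tags[tag] += 1
    return tags


# Invented example in the assumed format, not actual tagger output.
sample = "I_PRP think_VBP that_IN students_NNS learn_VBP quickly_RB"
print(tag_frequencies(sample).most_common())
```

Counts of this kind are the raw material for the feature-based comparisons of proficiency levels discussed in the next section.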

III Use of the Spoken Corpus of Thai English Learners for Assessment and

Pedagogical Purposes

3.1 Application in Language Assessment

A number of studies have used corpus linguistic methodology for assessment purposes. Many studies used word frequency data from corpus analysis to develop vocabulary cloze tests (e.g., Coniam, 1997; Smith et al., 2009). Some studies used a corpus to design test tasks. Fuentes (2007), for example, used a corpus of advertising texts to design test tasks for reading comprehension in an English for Tourism course. The corpora used in these studies were compiled from either professional English or native speakers of English.

For learner corpora, many studies show that a learner corpus analysis can be used to describe learner language and, sometimes, to use such a description as a basis for studying the validity of language tests or assessment systems. In addition, a corpus analysis followed up with scoring procedures and the elicitation of raters' verbal data may plausibly contribute to the assessment purposes of research studies. This notion is supported by the reviewed studies related to TOEFL iBT rating (Brown et al., 2005; Xi & Mollaun, 2006) and other previous research related to scoring rubrics and raters (e.g., Cumming, Grant, Mulcahy-Ernt, & Powers, 2005; Cumming, Kantor, & Powers, 2001; Fulcher, 1997; Turner & Upshur, 2002; Upshur & Turner, 1995). In these studies, a corpus-based analysis is viewed as advantageous in empirically supporting raters' scoring and verbal data.

As the primary purpose of compiling SCoTEL was to inform scoring rubric development, a study (Poonpon, 2011) investigated salient language features in the 264 responses that can differentiate good and poor performance of Thai undergraduate students. The results showed that among 100 grammatical features found in the spoken corpus, the salient sets of grammatical features included (a) sentence complexity (i.e., mental verbs, private verbs, verbs not including auxiliary verbs, verbs with uninflected present, imperative and third person, 'that' deletion, and 'that' complement clauses controlled by verbs of likelihood), (b) a supplement of information about actions or events (i.e., general adverbs), and (c) reference to things or persons (i.e., concrete nouns). The grammatical set of sentence complexity, represented by 'that' complement clauses controlled by communication and mental verbs, was a significant predictor of students' proficiency levels. The study confirms the use of the spoken corpus to analyze complex grammatical features that can differentiate students at different proficiency levels.
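
Because responses differ in length, feature counts are usually normalised before proficiency groups are compared. The Python sketch below illustrates that descriptive step only; it is not the predictive analysis reported in Poonpon (2011), and the records, level labels and function names are invented for the example.

```python
from statistics import mean


def per_100_words(feature_count: int, word_count: int) -> float:
    """Normalise a raw feature count to a rate per 100 words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 100 * feature_count / word_count


def mean_rate_by_level(records):
    """Average the normalised rate of a feature within each score level."""
    rates_by_level = {}
    for level, feature_count, word_count in records:
        rates_by_level.setdefault(level, []).append(
            per_100_words(feature_count, word_count)
        )
    return {level: mean(rates) for level, rates in rates_by_level.items()}


# Invented toy records: (score level, feature count, response length in words).
toy_records = [
    ("low", 0, 70),
    ("low", 1, 80),
    ("high", 2, 80),
    ("high", 3, 100),
]
print(mean_rate_by_level(toy_records))
```

A rate per 100 words suits this corpus because the average response is shorter than 100 words; larger corpora often use rates per 1,000 or per million words instead.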

SCoTEL can also be used in a rater training procedure (see Figure 1), which aims to prepare raters by helping them understand the criteria and descriptors used in a scoring rubric and training them to apply these criteria and descriptors consistently in their judgment (Alderson, Clapham, & Wall, 1995). As seen in Figure 1, finding benchmark responses for each score level is the first step, to be completed before the rater training procedure. Responses that were scored by the two raters at different proficiency levels can be selected as benchmark responses for each score level. These responses should have perfect or high score reliability (r = 0.90-1.00). In other words, the responses that were given the same or similar scores by both raters should be selected to represent responses at each score level.
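
The selection of benchmark responses can be sketched as a filter on rater agreement. The Python code below is one possible reading under stated assumptions: agreement is defined as a score difference within a chosen tolerance, responses are grouped by the mean of the two scores, and reliability is estimated with a Pearson correlation. The scores, labels and function names are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def select_benchmarks(scores, tolerance=0):
    """Group responses whose two ratings differ by at most `tolerance` by mean score."""
    benchmarks = {}
    for label, (rater1, rater2) in sorted(scores.items()):
        if abs(rater1 - rater2) <= tolerance:
            level = (rater1 + rater2) / 2
            benchmarks.setdefault(level, []).append(label)
    return benchmarks


# Invented ratings: label -> (rater 1 score, rater 2 score).
toy_scores = {
    "S1-T1": (3, 3),
    "S2-T1": (2, 3),
    "S3-T1": (4, 4),
    "S4-T1": (1, 1),
    "S5-T1": (2, 2),
}
rater1 = [pair[0] for pair in toy_scores.values()]
rater2 = [pair[1] for pair in toy_scores.values()]
print(select_benchmarks(toy_scores))
print(round(pearson_r(rater1, rater2), 2))
```

With a tolerance of 0 only identically scored responses are kept; raising the tolerance admits "similar" scores at the cost of less clear-cut benchmarks.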

During the training, the benchmark spoken responses are played for raters, following the introduction to the test and the explanation of the criteria and descriptors in a scoring rubric. Raters are then asked to discuss salient features in the benchmark responses in relation to the rubric. At this point, scripts of the responses can also be used to supplement the audio files and facilitate the raters' understanding of the rubric.


[Figure 1: flowchart of the rater training procedure]

Before training: Finding benchmark sample responses for each score level
Start training: Introducing raters to the test
  -> Familiarizing raters with the rubric (discussing salient features in benchmark responses)
  -> Raters scoring a set of sample responses
  -> Discussing problems or questions relating to the rubric and the set of samples
  -> Finding satisfactory agreement

Fig. 1 A rater training procedure (adapted from Poonpon, 2009)

3.2 Pedagogical Application

Corpus techniques have been increasingly used in the L2 classroom. Together with these techniques, data-driven learning (DDL), an approach to language learning that emphasizes learning processes controlled by learners themselves, has been highlighted (Johns, 1997). The theory behind DDL is that language learners act as language detectives, discovering language facts or patterns (e.g., inducing meanings and identifying form-function relationships and patterns of semantic prosody) by themselves from the sorted concordances of words and phrases presented to them. A number of DDL studies reveal positive effects of this learning approach over traditional grammar and lexical instruction (e.g., Basanta & Martin, 2007; Bernardini, 2004; Cobb & Horst, 2001; Cresswell, 2007; Jafarpour, Hashemian, & Alipour, 2013; Lee, 2011).
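
The sorted concordances that DDL relies on can be produced with a few lines of code. The Python sketch below builds a simple keyword-in-context (KWIC) display sorted by right-hand context; the function name `kwic`, the context width and the sample text are illustrative assumptions rather than a tool used in the studies cited.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) tuples sorted by right context."""
    lines = []
    for index, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, index - width):index])
            right = " ".join(tokens[index + 1:index + 1 + width])
            lines.append((left, token, right))
    return sorted(lines, key=lambda line: line[2].lower())


# Invented text, not taken from any corpus.
text = (
    "I think that students should study. Many students think about exams. "
    "They think that English is useful."
)
tokens = text.replace(".", "").split()

for left, node, right in kwic(tokens, "think", width=2):
    print(f"{left:>15}  {node}  {right}")
```

Sorting by the words to the right of the node word groups recurring patterns such as "think that" together, which is what allows learners to notice them.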

Apart from the DDL approach, a learner corpus can be analyzed to investigate productive vocabulary or grammatical features produced by learners to inform syllabus design, material development, and classroom activities. The current spoken corpus has been analyzed to examine frequent words and collocations used by the Thai learners. Discourse markers and pauses have also been investigated to see if there is a relationship between the markers and students' proficiency. An examination of discourse markers can also show how often particular markers are used in speaking by Thai learners versus native speakers. Findings from such investigations can help the teacher to determine what language features are linked to the target register. Such information can then be used to help revise a syllabus, particularly one relevant to speaking skills.
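
Frequent words, adjacent word pairs and marker rates of this kind can be obtained with standard counting. The Python sketch below is illustrative: the two token lists are invented stand-ins for a learner sample and a native-speaker sample, adjacent bigrams are used as a rough proxy for collocations, and the function names are hypothetical.

```python
from collections import Counter


def frequency_profile(tokens):
    """Return word and adjacent-bigram frequency counters for a token list."""
    words = [token.lower() for token in tokens]
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    return unigrams, bigrams


def per_thousand(count: int, total: int) -> float:
    """Normalise a count to a rate per 1,000 tokens."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 1000 * count / total


# Invented samples, not corpus data.
learner_tokens = "so I think it is good so I like it".split()
native_tokens = "well I think it is good you know".split()

learner_unigrams, learner_bigrams = frequency_profile(learner_tokens)
native_unigrams, _ = frequency_profile(native_tokens)

print(learner_bigrams.most_common(1))
print(per_thousand(learner_unigrams["so"], len(learner_tokens)))
print(per_thousand(native_unigrams["so"], len(native_tokens)))
```

Normalised rates allow a marker such as "so" to be compared across samples of different sizes, which matters when a small learner corpus is set against a much larger reference corpus.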

With the help of SCoTEL, exercises that raised learners' awareness were created, based on real examples, in order to provide students with an opportunity to detect spoken errors in sample responses. In this scenario, real examples from a native-speaker corpus (e.g., MICASE) were also drawn on as a reference against which the students could compare the target language features. This kind of material encourages consciousness-raising activities (Schmidt, 1990). The learners used the language facts and patterns to confirm their understanding of vocabulary or grammar or to correct lexical or grammatical errors in their speech. This learning practice can expose the learners to authentic language instead of the lexical or grammatical exercises for speaking in their textbooks and can also motivate language learners to learn to speak English and improve their speaking skills.

IV Conclusion

To increase the availability of Thai learner corpora, this paper described an initial attempt to build a spoken English corpus of Thai English learners. It highlights the design and compilation of a spoken corpus produced by Thai undergraduate students taking a speaking proficiency test. The corpus, SCoTEL, includes 264 spoken responses to six speaking test tasks administered to 44 students. The illustrations of the use of the spoken corpus show that this Thai learner corpus can be a useful resource for investigating language patterns (lexical, structural, lexico-grammatical, discourse, phonological) with very specific purposes to inform English language teaching and learning as well as testing.

However, SCoTEL is still limited in its size and accessibility. Thus, in order to make SCoTEL a database for further corpus-based studies, this corpus project still needs to expand the number of spoken responses within the same genre (i.e., monologues produced in a test situation) and to consider accessible channels for SCoTEL. At present, the project has included more responses, but they are still being scored and transcribed. To maximize the use of the corpus, the next step of the project is to include Thai students' spoken language in a dialogical form produced in test situations, followed by monological (e.g., oral presentation) and dialogical language (e.g., authentic conversation during class discussion) occurring in classroom (non-test) contexts.


References

Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge: Cambridge University Press.

Basanta, C. P., & Martin, M. E. R. (2007). The application of data-driven learning to a small-scale corpus: Using film transcripts for teaching conversational skills. In E. Hidalgo, L. Quereda, & J. Santana (Eds.), Corpora in the foreign language classroom (pp. 141-160). Amsterdam: Rodopi.

Bernardini, S. (2004). Corpora in the classroom: An overview and some reflections on future developments. In J. Sinclair (Ed.), How to use corpora in language teaching (pp. 15-36). Amsterdam: Benjamins.

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. Essex: Pearson Education Limited.

Brown, A., Iwashita, N., & McNamara, T. (2005). An examination of rater orientations and test-taker performance on English-for-academic-purposes speaking tasks. (TOEFL Monograph Series MS-29). Princeton, NJ: Educational Testing Service.

Chanchanglek, S., & Sriussadaporn, N. (2011). A corpus analysis of English vocabulary input in course materials used for engineering students. Humanities Journal, 18, 141-149.

Chanyoo, N. (2013). A corpus-based study of connectors and thematic progression in the academic writing of Thai EFL students. Unpublished PhD dissertation, University of Pittsburgh.

Cobb, T., & Horst, M. (2001). Reading academic English: Carrying learners across the lexical threshold. In J. Flowerdew & M. Peacock (Eds.), Research perspectives on English for academic purposes (pp. 315-329). Cambridge: Cambridge University Press.

Coniam, D. (1997). A preliminary inquiry into using corpus word frequency data in the automatic generation of English language cloze tests. CALICO Journal, 16, 15-33.

Cresswell, A. (2007). Getting to know connectors? Evaluating data-driven learning in a writing skills course. In E. Hidalgo, L. Quereda, & J. Santana (Eds.), Corpora in the foreign language classroom (pp. 267-289). Amsterdam: Rodopi.

Cumming, A., Grant, L., Mulcahy-Ernt, P., & Powers, D. E. (2005). A teacher-verification study of speaking and writing prototype tasks for a new TOEFL. (TOEFL Monograph Series MS-26). Princeton, NJ: Educational Testing Service.

Cumming, A., Kantor, R., & Powers, D. E. (2001). Scoring TOEFL essays and TOEFL 2000 prototype writing tasks: An investigation into raters' decision making and development of a preliminary analytic framework. (TOEFL Monograph Series MS-22). Princeton, NJ: Educational Testing Service.

Fuentes, A. C. (2007). A corpus-based assessment of reading comprehension in English for Tourism studies. In E. Hidalgo, L. Quereda, & J. Santana (Eds.), Corpora in the foreign language classroom (pp. 309-326). Amsterdam/New York, NY: Rodopi.

Fulcher, G. (1997). The testing of speaking in a second language. In C. Clapham & D. Corson (Eds.), Encyclopedia of language and education (pp. 76-85). Norwell, MA: Kluwer Academic Publishers.

Jafarpour, A., Hashemian, M., & Alipour, S. (2013). A corpus-based approach toward teaching collocation of synonyms. Theory and Practice in Language Studies, 3(1), 51-60.

Jirapanakorn, N. (2012). How doctors report: A corpus-based contrastive analysis of reporting verbs in research article introductions published in international and Thai medical journals. The Bangkok Medical Journal, 4, 39-46.

Johns, T. (1997). Contexts: The background, development, and trialling of a concordance-based CALL program. In A. Wichmann, S. Fligelstone, T. McEnery, & G. Knowles (Eds.), Teaching and language corpora (pp. 100-115). London: Longman.

Kaewphanngam, C., Broughton, M. M., & Soranasataporn, S. (2002). Corpus-based analysis: Guidelines for getting practical language input in materials development. SLLT Journal, 2, 16-32.

Lee, H. (2011). In defense of concordancing: An application of data-driven learning in Taiwan. Procedia: Social and Behavioral Sciences, 12, 399-408. Retrieved January 2012 from http://www.sciencedirect.com/science/article/pii/S187704281100139X

Liangpanit, C. (2010). The development of a corpus-based vocabulary package for business English majors. Unpublished PhD dissertation, Suranaree University of Technology.

Patanasorn, A. T. (2010). The use of linking adverbials in the argumentative essays of Thai EFL learners. KKU Research Journal, 15(8), 751-767.

Phoocharoensil, S. (2010). A corpus-based study of English synonyms. International Journal of Arts and Science, 3(10), 227-245.

Poonpon, K. (2009). Expanding a second language speaking rating scale for instructional and assessment purposes. Unpublished dissertation, Northern Arizona University.

Poonpon, K. (2011). Investigating salient grammatical features in a test-based spoken corpus. In G. Weir, S. Ishikawa, & K. Poonpon (Eds.), Corpora and language technologies in teaching, learning and research (pp. 13-19). Glasgow: University of Strathclyde Publishing.

Poonpon, K., Honsa, S., & Cowan, R. (2001). The teaching of academic vocabulary to Science students at Thai universities. SLLT Journal, 51-63.

Schmidt, R. (1990). Input, interaction, attention, and awareness: The case for consciousness-raising in second language teaching. Paper prepared for presentation at Enpuli Encontro Nacional Professores Universitarios de Lengua Inglesa, Rio de Janeiro.

Smith, S. et al. (2009). Automatic cloze generation for English proficiency testing. The Proceeding of The 2009 LTTC (The Language Training & Testing Center) English Learner Corpus, International Conference on English Language Teaching and Testing (pp. 1-12). National Taiwan University, Taipei.

Tono, Y. (2003). Learner corpora: Design, development and applications. In D. Archer, P. Rayson, A. Wilson, & T. McEnery (Eds.), Proceedings of the Corpus Linguistics 2003 Conference (pp. 800-809). Lancaster: University Centre for Computer Corpus Research on Language, Lancaster University.

Turner, C. E., & Upshur, J. A. (2002). Rating scales derived from student samples: Effects of the scale maker and the student sample on scale content and student scores. TESOL Quarterly, 36, 49-70.

Upshur, J. A., & Turner, C. E. (1995). Constructing rating scales for second language tests. ELT Journal, 49, 3-12.

Wan-a-rom, U. (2012). Lexical evaluation of teacher-made coursebooks: Thai case studies of foundation English courses at tertiary level. English Language Teaching, 5(8), 146-166.

Xi, X., & Mollaun, P. (2006). Investigating the utility of analytic scoring for the TOEFL academic speaking test (TAST). (TOEFL iBT Research Report TOEFLiBT-01). Princeton, NJ: Educational Testing Service.
