Kobe University Repository : Kernel
Title: Designing and Compiling an EFL Spoken Corpus for Assessment and Pedagogical Purposes
Author(s): Poonpon, Kornwipa
Citation: Learner Corpus Studies in Asia and the World, 2: 91-101
Issue date: 2014-05-31
Resource Type: Departmental Bulletin Paper
Resource Version: publisher
JaLCDOI 10.24546/81006692
URL http://www.lib.kobe-u.ac.jp/handle_kernel/81006692
PDF issue: 2020-11-01
Designing and Compiling an EFL Spoken Corpus for
Assessment and Pedagogical Purposes
Kornwipa POONPON
Khon Kaen University
Abstract The number of corpus-based studies in English language teaching and learning
contexts in Thailand is increasing. However, Thai learner corpora, especially spoken
learner corpora, remain rare. This paper presents an initial project to design and
compile a spoken corpus of Thai English learners (SCoTEL) for assessment and
pedagogical purposes. The paper starts with a rationale for building the spoken learner
corpus, followed by the design of the corpus based on Tono's (2003) framework and the
procedures for collecting, transcribing and coding the spoken data. The spoken corpus
includes 264 spoken responses to six speaking test tasks (three independent tasks and
three integrated tasks) administered to 44 undergraduate students at a Thai
university as preparation for the TOEFL iBT. The use of the spoken corpus as a
pedagogical and research resource in assessment and pedagogical environments is also
discussed. For assessment purposes, the spoken learner corpus can be analyzed to
identify salient lexical and grammatical features that differentiate learners'
proficiency levels. Moreover, samples can be drawn from the corpus to serve as
benchmark responses in a rater training procedure. Pedagogically, the spoken learner
corpus can be used to investigate productive vocabulary or grammatical features to
inform syllabus design, material development, and classroom activities. Finally,
limitations and subsequent steps of the current corpus project are presented.
Keywords Spoken corpus, Learner corpus, Thai learners, EFL learners, Speaking assessment
I Introduction
Corpus studies have gained popularity in Thailand over the past 10 years. Corpus
studies in Thai contexts focus on different aspects to support language teaching,
learning and research. Most corpus studies have emphasized the analysis of language
occurring in published materials or textbooks (e.g., Poonpon, Honsa, & Cowan, 2001;
Kaewphanngam, Broughton, & Soranasataporn, 2002). Phoocharoensil (2010)
investigated five English synonyms (i.e., ask, beg, plead, request, and appeal) in
three learners' dictionaries in comparison with concordance lines from the Time
corpus. Chanchanglek and Sriussadaporn (2011) analyzed vocabulary input in English
for Engineering course materials at three universities in Thailand. Wan-a-rom (2012)
examined words occurring in three teacher-made ELT course books used in foundation
English courses in a Thai university.
Many corpus studies have used corpus techniques as a pedagogical tool. Liangpanit (2010)
designed a corpus-based business vocabulary learning program, based on the most frequent
words found in the British National Corpus, for business English students at a Thai
university and determined the program's effectiveness. Patasorn (2011) examined the
effects of concordancing instruction on the receptive and productive vocabulary knowledge
of Thai undergraduate students. Some studies have focused more on analyses of Thai English
learners' language. For example, Patanasorn (2010) investigated linking adverbials
used in argumentative essays of Thai undergraduate students. Jirapanakorn (2012)
compared the use of reporting verbs in research article introductions published in
international and Thai medical journals. Chanyoo (2013) examined how Thai graduate
students develop and express oppositional ideas in arguments and compared their
use of oppositional connectors with that of published scholars in the field of
health science. All of these corpus-based studies have contributed to English
teaching and learning contexts in Thailand.
Although these studies show how corpora as a pedagogical and research resource can
considerably enrich the learning and teaching environment, Thai teachers and
researchers still encounter problems in particular areas of corpus use. Most
corpus-based studies in Thai contexts have used written English corpora of published
or commercial texts in which standard or professional English is used. Spoken English
corpora and learner corpora have rarely been the focus of these studies, probably
because more written corpora than spoken corpora are available and accessible.
Therefore, findings of such studies can inform only the learning and teaching of
written English from the perspective of standard English. Insightful information is
also needed about spoken English, from the perspective of both standard spoken
language and the learners of English themselves.
Given the rarity of spoken English corpora of Thai learners and the benefits of learner
corpora in supporting language learning and teaching in a variety of ways, this
paper describes an attempt to design and compile a spoken corpus of Thai English
learners and illustrates the use of the spoken corpus in English learning, teaching
and testing.
II Corpus Design
Designing a spoken corpus is challenging because recording and transcribing spoken
texts is difficult, time-consuming, and expensive. To facilitate the corpus design
process in this project, Tono's (2003) design considerations for building a learner
corpus were employed. The framework suggests that researchers consider three sets of
criteria: (a) language-related (e.g., mode, genre, topic), (b) task-related (e.g.,
spontaneous vs. prepared; cross-sectional vs. longitudinal), and (c) learner-related
(e.g., EFL or ESL, age, sex, mother tongue) (see Table 1).
Table 1 Considerations for designing a learner corpus (Tono, 2003)

Criteria               Features
(a) language-related   Mode (written/spoken)
                       Genre (letter/diary/fiction/essay)
                       Style (narration/argumentation)
                       Topic (general/leisure)
(b) task-related       Data collection (cross-sectional/longitudinal)
                       Elicitation (spontaneous/prepared)
                       Use of references (dictionary/source text)
                       Time limitation (fixed/free/homework)
(c) learner-related    L1 background
                       L2 environment (ESL/EFL/level of school)
                       L2 proficiency (standard test score)
                       Internal-cognitive (age/cognitive style)
                       Internal-affective (motivation/attitude)

In this project, a spoken corpus of Thai English learners (SCoTEL) was designed,
based on Tono's framework, to include spoken language produced by Thai undergraduate
students in response to speaking test tasks (see Table 2). Specifically, 44
undergraduate students at a university in Thailand volunteered to take a
proficiency speaking test administered by the researcher. Nine of
them were male and 33 were female. Their ages ranged from 18 to 22 years. Eight
were non-English majors, and 36 were English majors.
The speaking test consisted of six speaking tasks: three independent tasks and three
integrated tasks. The independent tasks required the students to express and support
their opinions about familiar topics using personal knowledge and experience. Each
independent task allowed the students to prepare their responses for 15 seconds before
recording their voice for 45 seconds. In the integrated tasks, the students were required
to synthesize information from two sources (i.e., reading and listening material) and
then speak a response. Each integrated task included a short reading passage
(approximately 100 words long) and a short talk (approximately one to two minutes
long). The participants had 45 seconds to read the passage. Then they listened to
part of a lecture related to the passage they had read. They were allowed to take notes
while listening to the lecture. The participants then responded to a question using the
information from the passage and the lecture, and they could use their notes to answer
the question. They had 30 seconds to prepare and 60 seconds to speak into the microphone
and give their response. The tasks were based on campus-related situations and
academic classroom material. The test took approximately 30 minutes in total for the
six speaking tasks (about 3 minutes for the three independent tasks, 12 minutes for the
three integrated tasks, plus time before and after the actual test).
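The timing figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming the per-task limits stated in the text and a lecture lasting between one and two minutes; the variable and function names are illustrative and not part of the original study.

```python
# Per-task time limits in seconds, taken from the description of the test.
INDEPENDENT = {"prepare": 15, "speak": 45}
INTEGRATED_FIXED = {"read": 45, "prepare": 30, "speak": 60}
LECTURE_RANGE = (60, 120)  # assumption: "approximately one to two minutes"
TASKS_PER_TYPE = 3


def independent_total() -> int:
    """Total seconds for the three independent tasks."""
    return TASKS_PER_TYPE * sum(INDEPENDENT.values())


def integrated_total_range() -> tuple[int, int]:
    """Shortest and longest total seconds for the three integrated tasks."""
    fixed = sum(INTEGRATED_FIXED.values())
    shortest, longest = (TASKS_PER_TYPE * (fixed + lecture) for lecture in LECTURE_RANGE)
    return shortest, longest


if __name__ == "__main__":
    print(independent_total() / 60)  # minutes for the independent tasks
    low, high = integrated_total_range()
    print(low / 60, high / 60)  # minutes for the integrated tasks
```

Under these assumptions the independent tasks take 3 minutes and the integrated tasks roughly 10 to 13 minutes, which is consistent with the approximate figures reported for the test.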
Table 2 Considerations for designing SCoTEL (based on Tono, 2003)

Criteria               Features
(a) language-related   Mode: spoken
                       Genre: monologue
                       Style: descriptive & narrative
                       Topic: general & academic
(b) task-related       Task: independent & integrated
                       Elicitation: spontaneous
                       Use of references: source text
                       Time limitation: fixed
(c) learner-related    L1 background: Thai
                       L2 environment: EFL university
                       L2 proficiency: low-intermediate & intermediate

2.1 Data Elicitation
All spoken data were elicited from the 44 participants, who volunteered to take the
proficiency speaking test in order to prepare themselves for the TOEFL iBT. The
test was administered in a language lab equipped with
cassette booths with headsets and microphones, and a teacher's control computer
with a headset and a microphone. Each student responded to all six test tasks,
and their responses were recorded. The test lasted 30 minutes. A total of 264
responses was obtained (44 students x 6 test tasks). Each response was scored
by two trained raters using two scoring rubrics (i.e., independent and integrated
scoring rubrics).
The collection of 264 speaking responses contained 132 responses to the independent
tasks and 132 to the integrated tasks. The total number of words in the corpus was
21,897, with 10,517 words in independent responses and 11,380 in integrated
ones (see Table 3). All audio files of the spoken responses were saved individually
as .wav files. Each file was labeled using the following protocol:
speakerID-tasknumber (e.g., S1-T3 refers to speaker number 1 producing a spoken
response on reading/listening/speaking task number 3).
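The labeling protocol lends itself to simple automation. The sketch below assumes labels of the form S<speaker>-T<task> (for instance, S1-T3 for speaker 1 and task 3); the helper names are hypothetical and were not part of the project's tooling.

```python
import re
from typing import NamedTuple

# Assumed label shape: "S<speaker>-T<task>", e.g. "S1-T3".
LABEL_PATTERN = re.compile(r"^S(\d+)-T(\d+)$")


class ResponseLabel(NamedTuple):
    speaker: int
    task: int


def make_label(speaker: int, task: int) -> str:
    """Build a file label from a speaker number and a task number."""
    return f"S{speaker}-T{task}"


def parse_label(label: str) -> ResponseLabel:
    """Split a file label back into its speaker and task numbers."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Unexpected label format: {label!r}")
    return ResponseLabel(int(match.group(1)), int(match.group(2)))


def all_labels(n_speakers: int = 44, n_tasks: int = 6) -> list[str]:
    """Enumerate one label per speaker and task."""
    return [make_label(s, t)
            for s in range(1, n_speakers + 1)
            for t in range(1, n_tasks + 1)]
```

With 44 speakers and six tasks, `all_labels()` yields one label for each of the 264 responses.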
Table 3 The spoken corpus of Thai English learners (SCoTEL)

Speaking task          Number of responses   Average length (words)   Number of words
Independent speaking   132                   79.67                    10,517
Integrated speaking    132                   86.21                    11,380
Total                  264                   82.94                    21,897

2.2 Transcription and Coding
The 264 speaking responses were transcribed by the researcher using a
transcription scheme that includes all spoken features (e.g., discourse markers, filled
and unfilled pauses). Each of the 264 audio files was transcribed and saved as a .txt file
for practicality in tagging parts of speech. All transcripts were double-checked by the
researcher.
After transcription, all spoken responses were coded for
lexico-grammatical features. The transcripts were automatically tagged
to identify the parts of speech in the spoken texts. The
tags were rechecked against the reference grammar Longman Grammar of Spoken and
Written English (Biber et al., 1999).
III Use of the Spoken Corpus of Thai English Learners for Assessment and
Pedagogical Purposes
3.1 Application in Language Assessment
A number of studies used corpus linguistic methodology for assessment purposes. Many
studies used word frequency data from corpus analysis to develop vocabulary cloze tests
(e.g., Coniam, 1997; Smith et al., 2009). Some studies used a corpus to design test tasks.
Fuentes (2007), for example, used a corpus of advertising texts to design test tasks for
reading comprehension in the English for Tourism course. Corpora used in these studies
were compiled from either professional English or native speakers of English.
For learner corpora, many studies show that a learner corpus analysis can be used to
describe learner language and, sometimes, that such a description can serve as a basis
for studying the validity of language tests or assessment systems. In addition, a corpus
analysis followed up with scoring procedures and the elicitation of raters' verbal data
may plausibly contribute to assessment-oriented research. This
notion is supported by studies of TOEFL iBT rating (Brown et
al., 2005; Xi & Mollaun, 2006) and other previous research on scoring rubrics
and raters (e.g., Cumming, Grant, Mulcahy-Ernt, & Powers, 2005; Cumming, Kantor, &
Powers, 2001; Fulcher, 1997; Turner & Upshur, 2002; Upshur & Turner, 1995). In these
studies, a corpus-based analysis is viewed as advantageous in empirically supporting
raters' scoring and verbal data.
As the primary purpose of compiling SCoTEL was to inform scoring rubric
development, a study (Poonpon, 2011) investigated salient language features in the 264
responses that can differentiate good and poor performance of Thai undergraduate
students. The results showed that among 100 grammatical features found in the spoken
corpus, the salient sets of grammatical features included (a) sentence complexity (i.e.,
mental verbs, private verbs, verbs not including auxiliary verbs, verbs with uninflected
present, imperative and third person, 'that' deletion, and 'that' complement clauses
controlled by verbs of likelihood), (b) a supplement of information about actions or events
(i.e., general adverbs), and (c) reference to things or persons (i.e., concrete nouns).
Moreover, the sentence complexity set, representing 'that' complement clauses
controlled by communication and mental verbs, was a significant predictor of students'
proficiency levels. The study confirms that the spoken corpus can be used to analyze
complex grammatical features that differentiate students at different proficiency levels.
SCoTEL can also be used in a rater training procedure (see Figure 1), which aims to
prepare raters by helping them understand the criteria and descriptors used
in a scoring rubric and training them to apply these criteria and descriptors consistently
in their judgment (Alderson, Clapham, & Wall, 1995). As seen in Figure 1, finding
benchmark responses for each score level is the first step to be completed before the rater
training procedure. Responses that were scored by two raters as being at different
proficiency levels can be selected as benchmark responses for each score
level. These responses should have perfect or high reliability of scores (r = 0.90-1.00). In
other words, the responses that were given the same or similar score by both raters
should be selected to represent responses in each score level.
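The selection step described above can be sketched in a few lines. This is only an illustration, assuming each response has exactly two rater scores and that "similar" is expressed as a maximum allowed difference; the function name, the tolerance parameter, and the sample ratings are hypothetical and do not come from the study.

```python
from collections import defaultdict


def select_benchmarks(scores: dict[str, tuple[float, float]],
                      tolerance: float = 0.0) -> dict[float, list[str]]:
    """Group responses by score level, keeping only those on which raters agree.

    `scores` maps a response label to the two raters' scores. A response is
    kept when the scores differ by at most `tolerance`; its level is the mean
    of the two scores.
    """
    benchmarks = defaultdict(list)
    for label, (first, second) in scores.items():
        if abs(first - second) <= tolerance:
            benchmarks[(first + second) / 2].append(label)
    return dict(benchmarks)


# Hypothetical ratings for four responses.
ratings = {"S1-T1": (3, 3), "S2-T1": (2, 4), "S3-T1": (1, 1), "S4-T1": (3, 3)}
benchmarks = select_benchmarks(ratings)
```

With the default tolerance of zero, only responses that received identical scores from both raters are kept as candidates, so S2-T1 is excluded in this example.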
During the training, the benchmark spoken responses are played for raters, following
the introduction to the test and the explanation of the criteria and descriptors in the
scoring rubric. Raters are then asked to discuss salient features in the benchmark
responses in relation to the rubric. At this point, transcripts of the responses can also
be used to supplement the audio files and facilitate the raters' understanding of the rubric.
[Figure 1: flowchart of the rater training procedure]
Before training: Finding benchmark sample responses for each score level
Start training: Introducing raters to the test
-> Familiarizing raters with the rubric (discussing salient features in benchmark responses)
-> Raters scoring a set of sample responses
-> Discussing problems or questions relating to the rubric and the set of samples
-> Finding satisfactory agreement
Fig. 1 A rater training procedure (adapted from Poonpon, 2009)
3.2 Pedagogical Application
Corpus techniques have been increasingly used in the L2 classroom. Together with
these techniques, data-driven learning (DDL), an approach to language learning
that emphasizes learning processes controlled by learners themselves, has been
highlighted (Johns, 1997). The theory behind DDL is that language learners act as
language detectives, discovering language facts or patterns (e.g., inducing meanings and
identifying form-function relationships and patterns of semantic prosody) by themselves
from sorted concordances of words and phrases. A number of DDL studies
report positive effects of this learning approach over traditional grammar and
lexical instruction (e.g., Basanta & Martin, 2007; Bernardini, 2004; Cobb & Horst, 2001;
Cresswell, 2007; Jafarpour, Hashemian, & Alipour, 2013; Lee, 2011).
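To make the idea of a sorted concordance concrete, the sketch below builds keyword-in-context (KWIC) lines from a tokenized transcript and sorts them by the context to the right of the keyword. It is a minimal illustration rather than the tool used in any of the cited studies; the function name, the window width, and the sample sentence are assumptions.

```python
def concordance(tokens: list[str], keyword: str, width: int = 4) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    # Sort by the right-hand context so that recurring patterns line up.
    return sorted(lines, key=lambda line: line[2].lower())


# Hypothetical learner utterance, already split into tokens.
sample = "i think that studying alone is better because i think i can focus".split()
kwic = concordance(sample, "think", width=2)
for left, key, right in kwic:
    print(f"{left:>12} [{key}] {right}")
```

Sorting on the right-hand context groups similar continuations of the keyword together, which is what allows learners to notice patterns for themselves.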
Apart from the DDL approach, a learner corpus can be analyzed to
investigate productive vocabulary or grammatical features produced by learners in order to
inform syllabus design, material development, and classroom activities. The current
spoken corpus has been analyzed to examine frequent words and collocations used by
the Thai learners. Discourse markers and pauses have also been investigated to see whether
there is a relationship between the markers and students' proficiency. An examination
of discourse markers can also show how often particular markers are used in speaking
by Thai learners compared with native speakers. Findings from such investigations can
help the teacher determine which language features are linked to the target register.
Such information can then be used to help revise a syllabus, particularly one relevant to
speaking skills.
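A comparison of marker use between learners and native speakers usually relies on normalized rates, because the two corpora differ in size. The following sketch computes occurrences per 100 tokens for single-word markers; the marker list and both token sequences are hypothetical, and multi-word markers are deliberately left out to keep the example short.

```python
def rate_per_100(tokens: list[str], marker: str) -> float:
    """Occurrences of a single-word marker per 100 tokens."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() == marker)
    return 100 * hits / len(tokens)


def compare_markers(learner: list[str], reference: list[str],
                    markers: list[str]) -> dict[str, tuple[float, float]]:
    """Map each marker to its (learner rate, reference rate)."""
    return {m: (rate_per_100(learner, m), rate_per_100(reference, m)) for m in markers}


# Hypothetical token lists standing in for learner and reference transcripts.
learner_tokens = "well i think so it is good so okay".split()
reference_tokens = "so i would say it is well suited".split()
comparison = compare_markers(learner_tokens, reference_tokens, ["well", "so"])
```

Rates like these only describe frequency; whether a difference matters pedagogically still requires inspecting how each marker functions in context.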
With the help of SCoTEL, exercises that raised learners' awareness were created, based
on real examples, to give students an opportunity to detect spoken errors in
sample responses. In this scenario, real examples from a native-speaker corpus (e.g.,
MICASE) were also used as a reference against which the students could compare the
target language features. Materials of this kind encourage consciousness-raising
activities (Schmidt, 1990). The learners used the language facts and patterns to confirm
their understanding of vocabulary or grammar or to correct lexical or grammatical errors
in their speech. This learning practice can expose the learners to authentic language,
rather than only the lexical or grammatical exercises for speaking in their textbooks,
and can also motivate them to learn to speak English and improve their speaking
skills.
IV Conclusion
To increase the availability of Thai learner corpora, this paper described an initial
attempt to build a spoken English corpus of Thai English learners. It highlights the
design and compilation of a spoken corpus produced by Thai undergraduate students
taking a speaking proficiency test. The corpus, SCoTEL, includes 264 spoken responses
to six speaking test tasks administered to 44 students. The illustrations of the use of
the spoken corpus show that this Thai learner corpus can be a useful resource for
investigating language patterns (lexical, structural, lexico-grammatical, discourse,
and phonological) with very specific purposes to inform English language teaching and
learning as well as testing.
However, SCoTEL is still limited in its size and accessibility. Thus, in order to make
SCoTEL a database for further corpus-based studies, this corpus project still needs to
be expanded in terms of the number of spoken responses within the same genre (i.e.,
monologues produced in a test situation), and accessible channels for SCoTEL need to
be considered. At present, the project has collected more responses, but they are
still being scored and transcribed. To maximize the use of the corpus, the next
step of the project is to include Thai students' spoken language in a dialogical form
produced in test situations, followed by monological language (e.g., oral presentations)
and dialogical language (e.g., authentic conversation during class discussion) occurring
in classroom (non-test) contexts.
References
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and
evaluation. Cambridge: Cambridge University Press.
Basanta, C. P., & Martin, M. E. R. (2007). The application of data-driven learning to a
small-scale corpus: Using film transcripts for teaching conversational skills. In E.
Hidalgo, L. Quereda, & J. Santana (Eds.), Corpora in the foreign language classroom
(pp. 141-160). Amsterdam: Rodopi.
Bernardini, S. (2004). Corpora in the classroom: An overview and some reflections on
future developments. In J. Sinclair (Ed.), How to use corpora in language teaching
(pp. 15-36). Amsterdam: Benjamins.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman
grammar of spoken and written English. Essex: Pearson Education Limited.
Brown, A., Iwashita, N., & McNamara, T. (2005). An examination of rater orientations
and test-taker performance on English-for-academic-purposes speaking tasks.
(TOEFL Monograph Series MS-29). Princeton, NJ: Educational Testing Service.
Chanchanglek, S., & Sriussadaporn, N. (2011). A corpus analysis of English vocabulary
input in course materials used for engineering students. Humanities Journal, 18,
141-149.
Chanyoo, N. (2013). A corpus-based study of connectors and thematic progression in the
academic writing of Thai EFL students. Unpublished PhD dissertation, University of
Pittsburgh.
Cobb, T., & Horst, M. (2001). Reading academic English: Carrying learners across
the lexical threshold. In J. Flowerdew & M. Peacock (Eds.), Research
perspectives on English for academic purposes (pp. 315-329). Cambridge:
Cambridge University Press.
Coniam, D. (1997). A preliminary inquiry into using corpus word frequency data in the
automatic generation of English language cloze tests. CALICO Journal, 16, 15-33.
Cresswell, A. (2007). Getting to know connectors? Evaluating data-driven learning in a
writing skills course. In E. Hidalgo, L. Quereda, & J. Santana (Eds.), Corpora in the
foreign language classroom (pp. 267-289). Amsterdam: Rodopi.
Cumming, A., Grant, L., Mulcahy-Ernt, P., & Powers, D. E. (2005). A
teacher-verification study of speaking and writing prototype tasks for a new TOEFL.
(TOEFL Monograph Series MS-26). Princeton, NJ: Educational Testing Service.
Cumming, A., Kantor, R., & Powers, D. E. (2001). Scoring TOEFL essays and TOEFL
2000 prototype writing tasks: An investigation into raters' decision making and
development of a preliminary analytic framework. (TOEFL Monograph Series
MS-22). Princeton, NJ: Educational Testing Service.
Fuentes, A. C. (2007). A corpus-based assessment of reading comprehension in English
for Tourism studies. In E. Hidalgo, L. Quereda, & J. Santana (Eds.), Corpora in the
foreign language classroom (pp. 309-326). Amsterdam/New York, NY: Rodopi.
Fulcher, G. (1997). The testing of speaking in a second language. In C. Clapham & D.
Corson (Eds.), Encyclopedia of language and education (pp. 76-85). Norwell, MA:
Kluwer Academic Publishers.
Jafarpour, A., Hashemian, M., & Alipour, S. (2013). A corpus-based approach toward
teaching collocation of synonyms. Theory and Practice in Language Studies,
3(1), 51-60.
Jirapanakorn, N. (2012). How doctors report: A corpus-based contrastive analysis of
reporting verbs in research article introductions published in international and Thai
medical journals. The Bangkok Medical Journal, 4, 39-46.
Johns, T. (1997). Contexts: The background, development, and trialling of a
concordance-based CALL program. In A. Wichmann, S. Fligelstone, T. McEnery, & G.
Knowles (Eds.), Teaching and language corpora (pp. 100-115). London: Longman.
Kaewphanngam, C., Broughton, M. M., & Soranasataporn, S. (2002). Corpus-based
analysis: Guidelines for getting practical language input in materials development.
SLLT Journal, 2, 16-32.
Lee, H. (2011). In defense of concordancing: An application of data-driven learning in
Taiwan. Procedia: Social and Behavioral Sciences, 12, 399-408. Retrieved on
January 2012: http://www.sciencedirect.com/science/article/pii/S187704281100139X
Liangpanit, C. (2010). The development of a corpus-based vocabulary package for
business English majors. Unpublished PhD dissertation, Suranaree University of
Technology.
Patanasorn, A. T. (2010). The use of linking adverbials in the argumentative essays of
Thai EFL learners. KKU Research Journal, 751-767.
Phoocharoensil, S. (2010). A corpus-based study of English synonyms. International
Journal of Arts and Sciences, 3(10), 227-245.
Poonpon, K. (2009). Expanding a second language speaking rating scale for
instructional and assessment purposes. Unpublished dissertation, Northern
Arizona University.
Poonpon, K. (2011). Investigating salient grammatical features in a test-based spoken
corpus. In G. Weir, S. Ishikawa, & K. Poonpon (Eds.), Corpora and language
technologies in teaching, learning and research (pp. 13-19). Glasgow: University of
Strathclyde Publishing.
Poonpon, K., Honsa, S., & Cowan, R. (2001). The teaching of academic vocabulary to
Science students at Thai universities. SLLT Journal, 51-63.
Schmidt, R. (1990). Input, interaction, attention, and awareness: the case for
consciousness-raising in second language teaching. Paper prepared for presentation
at Enpuli Encontro Nacional Professores Universitarios de Lengua Inglesa, Rio de
Janeiro.
Smith, S. et al. (2009). Automatic cloze generation for English proficiency testing. The
Proceeding of The 2009 LTTC (The Language Training & Testing Center) English
Learner Corpus, International Conference on English Language Teaching and
Testing (pp. 1-12). National Taiwan University, Taipei.
Tono, Y. (2003). Learner corpora: Design, development and applications. In D. Archer,
P. Rayson, A. Wilson, & T. McEnery (Eds.), Proceedings of the Corpus Linguistics 2003
Conference (pp. 800-809). Lancaster: University Centre for Computer Corpus
Research on Language, Lancaster University.
Turner, C. E., & Upshur, J. A. (2002). Rating scales derived from student samples:
Effects of the scale maker and the student sample on scale content and student
scores. TESOL Quarterly, 36, 49-70.
Upshur, J. A., & Turner, C. E. (1995). Constructing rating scales for second language
tests. ELT Journal, 49, 3-12.
Wan-a-rom, U. (2012). Lexical evaluation of teacher-made coursebooks: Thai case
studies of foundation English courses at tertiary level. English Language Teaching,
5(8), 146-166.
Xi, X., & Mollaun, P. (2006). Investigating the utility of analytic scoring for the TOEFL
academic speaking test (TAST). (TOEFL iBT Research Report TOEFLiBT-01).
Princeton, NJ: Educational Testing Service.