The Development of an Error-tagged Learner Corpus: TELC (Turkish-English Learner Corpus) and its Web-interface

Hakan Cangır; Kutay Uzun; Taner Can; Enis Oğuz; Ömer Faruk Kaya

doi:10.18492/dad.1489654

Hata Etiketli Öğrenen Derlemi Geliştirilmesi: TELC (Türkçe-İngilizce Öğrenen Derlemi) ve Web-Arayüzü

Abstract

Although they are quite rare and are not favoured by corpus linguists because of the difficulties involved in developing them, learner corpora consisting of spoken and written texts by students from different L1 backgrounds can benefit both researchers in second language acquisition and language teachers. Starting from this need and considering the potential importance of corpora for language teachers and learners in the Turkish context, our L2 English learner corpus is an attempt to build an error-tagged learner corpus that examines lexical errors in particular, which play a key role in the language production of second language learners. Based on Hemchua and Schmitt's lexical error taxonomy and developed following the strict methodological considerations in the literature (e.g., error naming and correction through several rounds of tagging), the corpus consists of 369 written texts by 231 university students (104,864 words, more than 3,000 tagged and corrected errors). The corpus database, which has a user-friendly interface, lets users access statistical output, view lexical errors alongside their corrected versions, and search the corpus for different error types. The interface also includes an error-tagging add-in that allows the database to be developed further. Besides being a website that serves as a guiding resource for language teachers and second language learners, TELC can also be regarded as a digital platform that applied linguists working in this area can develop further. Finally, thanks to its easy-to-use interface and versatile features, it has the potential to become a reference learner corpus for teaching and learning English as a foreign/second language, with the contribution of other universities in Türkiye.

Supporting Institution

TÜBİTAK ARDEB

Project Number

220K289

Ethical Statement

At its meeting of 29/03/2021, the Ankara University Ethics Committee ruled, by decision no. 3/9, that the study was ethically appropriate.

The Development of an Error-tagged Learner Corpus: TELC (Turkish-English Learner Corpus) and its Web-interface

Abstract

Learner corpora, which consist of spoken and written texts by students from different L1 backgrounds, are rather rare and not favoured by corpus linguists because they pose computationally hard-to-handle problems; nevertheless, they can benefit both researchers in the field of second language acquisition and language teachers. Growing from this need and considering the potential importance of corpora for language teachers and learners in the Turkish context, our L2 English learner corpus is a further attempt to build an error-tagged learner corpus, with a particular focus on lexical errors, which play a key role in the language production of second language learners. Building on Hemchua and Schmitt’s lexical error taxonomy and developed following the strict methodological considerations in the literature (e.g., error naming and fixing through several rounds of tagging), the corpus consists of 369 written texts by 231 university students (104,864 words, with more than 3,000 tagged and fixed errors). The corpus database comes with a user-friendly web interface, which offers statistical output, modules highlighting lexical errors and their correct versions, search options including search by error type, and an error-tagging add-in for further development. In addition to being a resource website that aims to guide language practitioners and second language learners, it can be considered a platform that applied linguists conducting studies in this line of research can develop further. Finally, thanks to its easy-to-use interface and versatile features, it has the potential to become a reference learner corpus for English as a foreign/second language with the contribution of other universities in Türkiye.
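To put the corpus figures in perspective, the short sketch below derives a few descriptive ratios from the counts reported in the abstract. It is an illustration only: the function name `corpus_summary` and the dictionary keys are hypothetical, and because the abstract gives the error count only as a lower bound (more than 3,000), the resulting error density is likewise a lower bound rather than a figure reported by the authors.

```python
# Counts as reported in the abstract. The error count is a lower bound,
# since the abstract states only that there are more than 3,000 tagged errors.
TEXTS = 369
STUDENTS = 231
WORDS = 104_864
MIN_TAGGED_ERRORS = 3_000


def corpus_summary(texts: int, students: int, words: int, errors: int) -> dict[str, float]:
    """Derive simple descriptive ratios from raw corpus counts (hypothetical helper)."""
    return {
        "words_per_text": words / texts,
        "texts_per_student": texts / students,
        # Normalised per 1,000 words, a common convention in corpus work.
        "errors_per_1000_words": errors / words * 1000,
    }


summary = corpus_summary(TEXTS, STUDENTS, WORDS, MIN_TAGGED_ERRORS)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

On these counts, the average text is a little under 300 words long, each student contributed between one and two texts on average, and the corpus contains at least 28 tagged lexical errors per 1,000 words.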

References

Anthony, L. (2023). AntConc (Version 4.2.4) [Computer Software]. Waseda University. Available from https://www.laurenceanthony.net/software
Berberich, K., & Kleiber, I. (2023). Tools for corpus linguistics. https://corpus-analysis.com/
Biber, D., Gray, B., & Poonpon, K. (2011). Should we use characteristics of conversation to measure grammatical complexity in L2 writing development? TESOL Quarterly, 45(1), 5–35. https://doi.org/10.5054/tq.2011.244483
Bley-Vroman, R. (1989). What is the logical problem of foreign language learning? In S. M. Gass & J. Schachter (Eds.), Linguistic perspectives on second language acquisition (pp. 41–68). Cambridge University Press. https://doi.org/10.1017/CBO9781139524544.005
Cangır, H., Uzun, K., Can, T., Küllü, K., Oğuz, E., & Kaya, Ö. M. (2025). Linguistic features and L2 English writing quality: A multidimensional analysis [Manuscript submitted for publication]. AILA Review.
Cortes, V. (2018). Corpus tools for Writing Teachers. In The TESOL Encyclopedia of English Language Teaching (pp. 1–6). John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118784235.eelt0553
Crosthwaite, P. (Ed.). (2024). Corpora for language learning: Bridging the research-practice divide (1st ed.). Routledge. https://doi.org/10.4324/9781003413301
Ellis, N. C., & Laporte, N. (2014). Contexts of acquisition: Effects of formal instruction and naturalistic exposure on second language acquisition. In Tutorials in bilingualism (pp. 53-83). Psychology Press.

Francis, W., & Kučera, H. (1964). Manual of information to accompany a standard corpus of present-day edited American English, for use with digital computers. Brown University.
Friginal, E. (2013). Developing research report writing skills using corpora. English for Specific Purposes, 32(4), 208–220. https://doi.org/10.1016/j.esp.2013.06.001
Gablasova, D., Brezina, V., & McEnery, T. (2017). Exploring learner language through corpora: Comparing and interpreting corpus frequency information. Language Learning, 67(S1), 130–154. https://doi.org/10.1111/lang.12226
Gilquin, G. (2015). From design to collection of learner corpora. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 9–34). Cambridge University Press. https://doi.org/10.1017/CBO9781139649414.002
Gilquin, G., & Granger, S. (2015). Learner language. In D. Biber & R. Reppen (Eds.), The Cambridge handbook of English corpus linguistics (pp. 418–436). Cambridge University Press. https://doi.org/10.1017/CBO9781139764377.024
Gilquin, G. (2023). Written learner corpora to inform teaching. In R. R. Jablonkai & E. Csomay (Eds.), The Routledge handbook of corpora and English language teaching and learning (pp. 281–295). Routledge.
Granger, S. (1993). International Corpus of Learner English. In J. Aarts, P. de Haan, & N. Oostdijk (Eds.), English language corpora: Design, analysis and exploitation (pp. 57–71). Rodopi. https://doi.org/10.1163/9789004653559_007
Granger, S. (2002). A bird’s-eye view of learner corpus research. In S. Granger, J. Hung, & S. Petch-Tyson (Eds.), Computer learner corpora, second language acquisition and foreign language teaching (pp. 3–33). John Benjamins. https://doi.org/10.1075/lllt.6.04gra
Granger, S. (2003). The International Corpus of Learner English: A new resource for foreign language learning and teaching and second language acquisition research. TESOL Quarterly, 37(3), 538–546. https://doi.org/10.2307/3588404
Granger, S. (2015). The contribution of learner corpora to reference and instructional materials design. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 485-510). Cambridge University Press. https://doi.org/10.1017/CBO9781139649414.022
Granger, S. (2021). Commentary: Have Learner Corpus Research and Second Language Acquisition Finally Met? In B. Le Bruyn & M. Paquot (Eds.), Learner corpus research meets second language acquisition (pp. 243–257). Cambridge University Press. https://doi.org/10.1017/9781108674577.012
Granger, S., Dupont, M., Meunier, F., Naets, H., & Paquot, M. (2020). The International Corpus of Learner English. Version 3. Presses universitaires de Louvain.
Granger, S., Gilquin, G., & Meunier, F. (2015). Introduction: learner corpus research – past, present and future. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 1–6). Cambridge University Press. https://doi.org/10.1017/cbo9781139649414.001
Hemchua, S., & Schmitt, N. (2006). An analysis of lexical errors in the English compositions of Thai learners. Prospect, 21(3), 3–25.
Hunston, S. (2002). Corpora in applied linguistics. Cambridge University Press.
Kaya, F. Ö., Uzun, K., & Cangır, H. (2022). Using corpora for language teaching and assessment in L2: A narrative review. Focus on ELT Journal, 4(3), 46-62. https://doi.org/10.14744/felt.2022.4.3.4
Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P., & Suchomel, V. (2014). The Sketch Engine: ten years on. Lexicography, 1(1), 7–36. https://doi.org/10.1007/s40607-014-0009-9
Kučera, H., & Francis, W. (1967). Computational analysis of present day American English. Brown University Press. https://doi.org/10.1002/asi.5090190414
Kyle, K. (2016). Measuring syntactic development in L2 writing: Fine grained indices of syntactic complexity and usage-based indices of syntactic sophistication [Doctoral dissertation, Georgia State University]. ScholarWorks @Georgia State University. http://scholarworks.gsu.edu/alesl_diss/35
Kyle, K., Crossley, S. A., & Berger, C. (2018). The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behavior Research Methods, 50, 1030–1046. https://doi.org/10.3758/s13428-017-0924-4
Lee, J. J., Bychkovska, T., & Maxwell, J. D. (2019). Breaking the rules? A corpus-based comparison of informal features in L1 and L2 undergraduate student writing. System, 80, 143-153. https://doi.org/10.1016/j.system.2018.11.010
Lee, S. (2011). Challenges of using corpora in language teaching and learning: Implications for secondary education. Linguistic Research, 28(1), 159–178. https://doi.org/10.17250/khisli.28.1.201104.009
Leech, G. (1981). Semantics: The study of meaning (2nd ed.). Penguin.
Liao, Y., & Fukuya, Y. J. (2004). Avoidance of phrasal verbs: The case of Chinese learners of English. Language Learning, 54(2), 193–226. https://doi.org/10.1111/j.1467-9922.2004.00254.x
Meunier, F. (2020). Introduction to learner Corpus research. In The Routledge handbook of second language acquisition and corpora (pp. 23-36). Routledge. https://doi.org/10.4324/9781351137904-4
Moore, J. (2005). Common mistakes at Proficiency ... and how to avoid them. Cambridge University Press.
Murakami, A., & Alexopoulou, T. (2016). L1 influence on the acquisition order of English grammatical morphemes: A learner corpus study. Studies in Second Language Acquisition, 38(3), 365-401. https://doi.org/10.1017/S0272263115000352
Myles, F. (2005). Interlanguage corpora and second language acquisition research. Second Language Research, 21(4), 373-391. https://doi.org/10.1191/0267658305sr252oa
Myles, F. (2021). Commentary: An SLA perspective on learner corpus research. In B. Le Bruyn & M. Paquot (Eds.), Learner Corpus Research Meets Second Language Acquisition (pp. 258–273). Cambridge University Press. https://doi.org/10.1017/9781108674577.013
Nesselhauf, N. (2003). The Use of Collocations by Advanced Learners of English and Some Implications for Teaching. Applied Linguistics, 24(2), 223–242. https://doi.org/10.1093/applin/24.2.223
O’Keeffe, A., McCarthy, M., & Carter, R. (2007). From Corpus to Classroom: Language Use and Language Teaching. Cambridge University Press.
Paquot, M., & Granger, S. (2012). Formulaic language in learner corpora. Annual Review of Applied Linguistics, 32, 130–149. https://doi.org/10.1017/S0267190512000098
Paquot, M., Larsson, T., Hasselgård, H., Ebeling, S. O., De Meyere, D., Valentin, L., Laso, N. J., Verdaguer, I., & van Vuuren, S. (2022). The varieties of English for specific purposes database (VESPA): Towards a multi-L1 and multi-register learner corpus of disciplinary writing. Research in Corpus Linguistics, 10(2), 1–15. https://doi.org/10.32714/ricl.10.02.02
Pérez-Paredes, P. (2022). A systematic review of the uses and spread of corpora and data-driven learning in CALL research during 2011–2015. Computer Assisted Language Learning, 35(1-2), 36–61. https://doi.org/10.1080/09588221.2019.1667832
Schneider, G. (2023). Detecting and analysing learner difficulties using a learner corpus without error tagging. In K. Harrington & P. Ronan (Eds.), Demystifying corpus linguistics for English language teaching (pp. 229–257). Springer International Publishing. https://doi.org/10.1007/978-3-031-11220-1_12
Selivan, L. (2023). Corpus linguistics and vocabulary teaching. In K. Harrington & P. Ronan (Eds.), Demystifying corpus linguistics for English language teaching (pp. 139–161). Springer International Publishing. https://doi.org/10.1007/978-3-031-11220-1_8
Sinclair, J. M. (1990). Collins COBUILD English grammar. Collins.
Thewissen, J. (2013). Capturing L2 accuracy developmental patterns: Insights from an error‐tagged EFL learner corpus. The Modern Language Journal, 97(S1), 77-101.
Thewissen, J. (2015). Accuracy across proficiency levels: A learner corpus approach. Presses universitaires de Louvain.
Xiao, R. (2009). How can corpora help in language pedagogy. In Postgraduate Conference in Applied Linguistics, Ningbo, China.
Xu, Q. (2016). Application of learner corpora to second language learning and teaching: An overview. English Language Teaching, 9(8), 46–52. https://eric.ed.gov/?id=EJ1104573

Details

Primary Language

English

Subjects

Corpus Linguistics, Applied Linguistics and Educational Linguistics, Linguistics (Other)

Journal Section

Research Article

Authors

Hakan Cangır*
0000-0003-2589-2466
Türkiye

Kutay Uzun
0000-0002-8434-0832
Türkiye

Taner Can
0000-0001-8869-4817
Türkiye

Enis Oğuz
0000-0001-5819-4926
Türkiye

Ömer Faruk Kaya
0000-0001-7329-5557
Türkiye

Publication Date

December 24, 2024

Submission Date

May 24, 2024

Acceptance Date

October 3, 2024

Published in Issue

Year 2024 Volume: 35 Number: 2

DOI

https://doi.org/10.18492/dad.1489654

IZ

https://izlik.org/JA47XC27PG

Cite

APA

Cangır, H., Uzun, K., Can, T., Oğuz, E., & Kaya, Ö. F. (2024). The Development of an Error-tagged Learner Corpus: TELC (Turkish-English Learner Corpus) and its Web-interface. Dilbilim Araştırmaları Dergisi, 35(2), 279-307. https://doi.org/10.18492/dad.1489654
