Research Article

Rater Severity Drift in Peer Assessment (Akran Değerlendirmesinde Puanlayıcı Katılığı Kayması)

Year 2017, Volume: 8 Issue: 4, 469 - 489, 29.12.2017
https://doi.org/10.21031/epod.328119

Abstract

There are not enough studies with a sound psychometric basis on the validity and reliability of scores obtained in peer assessment, and in particular on rater effects. This study investigated the extent to which rater severity drift, one of the rater effects, occurs in peer assessment. Oral presentations given by students within a course at a faculty of education were scored with a rubric by 29 peers taking the same course. A total of nine presentations were delivered on four separate days: two presentations on each of the first three days and three on the fourth day. Rater drift was examined with two different many-facet Rasch measurement approaches (separate models for each day and a dummy-time model). A standardized differences index was computed from the rater estimates obtained for each day, and interaction terms were computed from the dummy-time model. In the drift analysis, Day 1 was taken as the baseline, and changes from Day 1 to the other days (Days 2, 3, and 4) were examined. Overall, the analyses showed that the peer raters scored their classmates quite leniently. When the raters were compared with one another, their severity/leniency levels differed. The presentations were rank-ordered consistently by the raters according to their quality. The two methods used to examine rater drift gave similar results. Between Day 1 and Day 2, no difference was observed in the rater estimates; although the raters scored more leniently on average, the drifts were not statistically significant. Between Day 1 and Day 3, 38.10% of the raters showed significant drift in their estimates; according to both methods, the raters drifted by about 0.14 logits on average toward more severe scoring. Between Day 1 and Day 4, the number of raters whose estimates showed significant drift was three by the standardized differences method and one by the interaction term method; with both methods, the raters became more severe on average. The average drift was largest on Day 4.
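
For orientation, the sketch below shows the general form the two drift indices mentioned above take in many-facet Rasch (MFRM) analyses of rater drift (cf. Wolfe, Moulder, & Myford, 2001; Myford & Wolfe, 2009). The notation, the rating-scale form of the model, and the 1.96 flagging criterion are illustrative assumptions, not details reported by the article.

In the MFRM, the log-odds of presenter n receiving category k rather than k-1 from rater r are modeled with a presenter facet, a rater-severity facet, and category thresholds:

\[
\ln\!\left(\frac{P_{nrk}}{P_{nr(k-1)}}\right) = \theta_n - \lambda_r - \tau_k .
\]

Separate calibrations: \(\lambda_r\) is estimated separately for each day, and the drift of rater r from the baseline day (Day 1) to a later day t is tested with a standardized difference of the two estimates,

\[
z_{r,t} = \frac{\hat{\lambda}_{r,t} - \hat{\lambda}_{r,1}}
               {\sqrt{SE^{2}\!\left(\hat{\lambda}_{r,t}\right) + SE^{2}\!\left(\hat{\lambda}_{r,1}\right)}},
\qquad |z_{r,t}| \geq 1.96 \;\Rightarrow\; \text{rater } r \text{ flagged as drifting on day } t .
\]

Dummy-time model: a single model is fitted to all days with a rater-by-day interaction term, so drift appears directly as the estimated interaction \(\delta_{r,t}\):

\[
\ln\!\left(\frac{P_{nrtk}}{P_{nrt(k-1)}}\right) = \theta_n - \lambda_r - \delta_{r,t} - \tau_k .
\]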

References

  • Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum.
  • Braun, H. I. (1988). Understanding scoring reliability: Experiments in calibrating essay readers. Journal of Educational Statistics, 13, 1–18.
  • Braun, H. I., & Wainer, H. (1989). Making essay test scores fairer with statistics. In J. Tanur, F. Mosteller, W. H. Kruskal, E. L. Lehmann, R. F. Link, R. S. Pieters, & G. S. Rising (Eds.), Statistics: A guide to the unknown (3rd ed., pp. 178–188). Pacific Grove, CA: Wadsworth.
  • Casabianca, J. M., Lockwood, J. R., & McCaffrey, D. F. (2015). Trends in classroom observation scores. Educational and Psychological Measurement, 75(2), 311–337.
  • Congdon, P. J., & McQueen, J. (2000). The stability of rater severity in large-scale assessment programs. Journal of Educational Measurement, 37, 163-178.
  • Demirbilek, M. (2015). Social media and peer feedback: What do students really think about using Wiki and Facebook as platforms for peer feedback? Active Learning in Higher Education, 16(3), 211–224.
  • Dochy, F., Segers, M., & Sluijsmans, D. (1999). The use of self-, peer and co-assessment in higher education: A review. Studies in Higher Education, 24(3), 331-350.
  • Eckes, T. (2011). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments. Frankfurt am Main: Peter Lang.
  • Engelhard, G. (1994). Examining rater errors in the assessment of written composition with a many-faceted Rasch model. Journal of Educational Measurement, 31(2), 93-112.
  • Engelhard, G. (1996). Evaluating rater accuracy in performance assessments. Journal of Educational Measurement, 33, 56–70.
  • Engelhard, G., & Myford, C. M. (2003). Monitoring faculty consultant performance in the Advanced Placement English Literature and Composition Program with a many-faceted Rasch model (Research Rep. 03-01). Princeton, NJ: Educational Testing Service.
  • Falchikov, N. (1995). Peer feedback marking: Developing peer assessment. Innovations in Education and Training International, 32, 175-187.
  • Falchikov, N. (2001). Learning together: Peer tutoring in higher education. London: Routledge Falmer.
  • Falchikov, N., & Goldfinch, J. (2000). Student peer assessment in higher education: A meta-analysis comparing peer and teacher marks. Review of Educational Research, 70, 287–322.
  • Gabrielson, S., Gordon, B., & Engelhard, G. (1995). The effects of task choice on the quality of writing obtained in a statewide assessment. Applied Measurement in Education, 8(4), 273-290.
  • Hafner, J. C., & Hafner, P. M. (2003). Quantitative analysis of the rubric as an assessment tool: An empirical study of student peer-group rating. International Journal of Science Education, 25(12), 1509–1528.
  • Heyman, J. E., & Sailors, J. J. (2011). Peer assessment of class participation: Applying peer nomination to overcome rating inflation. Assessment & Evaluation in Higher Education, 36(5), 605-618.
  • Harik, P., Clauser, B. E., Grabovsky, I., Nungester, R. J., Swanson, D., & Nandakumar, R. (2009). An examination of rater drift within a generalizability theory framework. Journal of Educational Measurement, 46, 43-58.
  • Hoskens, M., & Wilson, M. (2001). Real-time feedback on rater drift in constructed response items: An example from the Golden State Examination. Journal of Educational Measurement, 38, 121–146.
  • Kane, J. S., & Lawler, E. E. (1978). Methods of peer assessment. Psychological Bulletin, 85, 555-586.
  • Linacre, J. M. (2002). What do infit and outfit mean? Rasch Measurement Transactions, 16(2), 878.
  • Love, K. G. (1981). Comparison of peer assessment methods: Reliability, validity, friendship bias, and user reaction. Journal of Applied Psychology, 66(4), 451-457.
  • Lumley, T., & McNamara, T. F. (1995). Rater characteristics and rater bias: Implications for training. Language Testing, 12(1), 54-71.
  • Lunz, M. E., & Stahl, J. A. (1990). Judge consistency and severity across grading periods. Evaluation and the Health Professions, 13, 425-444.
  • McKinley, D., & Boulet, J. R. (2004). Detecting score drift in a high-stakes performance-based assessment. Advances in Health Sciences Education, 9, 29–38.
  • McLaughlin, K., Ainslie, M., Coderre, S., Wright, B., & Violato, C. (2009). The effect of differential rater function over time (DRIFT) on objective structured clinical examination ratings. Medical Education, 43, 989–992.
  • McNamara, T. F. (1996). Measuring second language performance. Harlow, UK: Addison Wesley Longman Limited.
  • McQueen, J., & Congdon, P. J. (1997). Rater severity in large-scale assessment: Is it invariant? Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
  • Messick, S. (1994). Alternative modes of assessment, uniform standards of validity. ETS Research Report Series, 2, 1-22.
  • Myford, C. M. (1991). Judging acting ability: The transition from novice to expert. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
  • Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386-422.
  • Myford, C. M., & Wolfe, E. W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. Journal of Applied Measurement, 5(2), 189-227.
  • Myford, C. M., & Wolfe, E. W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46(4), 371-389.
  • Park, Y. S. (2011). Rater drift in constructed response scoring via latent class signal detection theory and item response theory (Unpublished doctoral dissertation).
  • Rowan, B., Harrison, D. M., & Hayes, A. (2004). Using instructional logs to study elementary school mathematics: A close look at curriculum and teaching in the early grades. Elementary School Journal, 105, 103-127.
  • Sadler, P. M., & Good, E. (2006). The impact of self and peer-grading on student learning. Educational Assessment, 11, 1–31.
  • Scruggs, T. E., & Mastropieri, M. A. (1998). Tutoring and students with special needs. In K. J. Topping & S. Ehly (Eds.), Peer-assisted learning (pp. 165–182). Mahwah, NJ: Lawrence Erlbaum Associates.
  • Scullen, S. E., Mount, M. K., & Goff, M. (2000). Understanding the latent structure of job performance ratings. Journal of Applied Psychology, 85, 956-970.
  • Somervell, H. (1993). Issues in assessment, enterprise and higher education: The case for self-, peer and collaborative assessment. Assessment & Evaluation in Higher Education, 18(3), 221-233.
  • Topping, K. J. (1998). Peer assessment between students in college and university. Review of Educational Research, 68, 249–276.
  • Topping, K. J. (2003). Self and peer assessment in school and university: Reliability, validity and utility. In M. S. R. Segers, F. J. R. C. Dochy, & E. C. Cascallar (Eds.), Optimizing new modes of assessment: In search of qualities and standards (pp. 55–87). Dordrecht, The Netherlands: Kluwer Academic.
  • Topping, K. J. (2005). Trends in peer learning. Educational Psychology, 25, 631–645.
  • Topping, K. J. (2009). Peer assessment. Theory Into Practice, 48(1), 20-27.
  • Topping, K. J., & Ehly, S. (Eds.). (1998). Peer-assisted learning. Mahwah, NJ: Lawrence Erlbaum Associates.
  • Tseng, S. C., & Tsai, C.-C. (2007). On-line peer assessment and the role of the peer feedback: A study of high school computer course. Computers & Education, 49, 1161-1174.
  • Weaver, R., II, & Cotrell, H. W. (1986). Peer evaluation: A case study. Innovative Higher Education, 11, 25-39.
  • Wilson, M., & Case, H. (2000). An examination of variation in rater severity over time: A study of rater drift. Objective measurement: Theory into practice, 5, 113-134.
  • Wolfe, E. W., Moulder, B. C., & Myford, C. M. (2001). Detecting differential rater functioning over time (DRIFT) using a Rasch multi-faceted rating scale model. Journal of Applied Measurement, 2, 256–280.
  • Wolfe, E. W., Myford, C. M., Engelhard, G., Jr., & Manalo, J. R. (2007). Monitoring reader performance and DRIFT in the AP® English Literature and Composition Examination using benchmark essays (Research Report No. 2007-2).
  • Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press.
  • Yang, R. (2010). A many-facet Rasch analysis of rater effects on an oral English proficiency test (Unpublished doctoral dissertation).
  • Yang, Y., & Tsai, C.-C. (2010). Conceptions of and approaches to learning through online peer assessment. Learning and Instruction, 20, 72-83.

Details

Journal Section: Articles
Authors

Bengu Börkan

Publication Date: December 29, 2017
Acceptance Date: November 21, 2017
Published in Issue: Year 2017, Volume: 8, Issue: 4

Cite

APA Börkan, B. (2017). Akran Değerlendirmesinde Puanlayıcı Katılığı Kayması. Journal of Measurement and Evaluation in Education and Psychology, 8(4), 469-489. https://doi.org/10.21031/epod.328119