
Inter- and Intra-Rater Consistency: Armies of Graduate TAs Grading in First Year


In the introductory chemistry course for first-year students at the University of Guelph, written-answer questions are included on midterm and final exams, despite the logistical hurdles involved in grading the solutions of 1800-2400 students. The process was improved with the development of a customized Scantron® form on which students wrote their answers. Teaching assistants (TAs) graded the students' work and bubbled in the grades on the sheet, which was then read into a computer. Besides streamlining grade entry, the form allowed students to have their results (and grader comments) conveniently emailed to them. For this study, we used one of the Scantron® fields as a "TA identifier" to link each graded answer to its specific grader. This allows comparison of TAs' grading rates, grade averages, variances, and distributions. Group and individual trends were also observed over time. Averages for four questions increased by 0.1-2.4 percentage points between the first and second hour of grading; based on χ² tests, the probability of findings at least this extreme occurring without a correlational link was 53-92 %. In the same manner, the probabilities that differences in distributions between the population and a particular TA's sample occurred by chance ranged from ~0 to 99 %. We discuss these results and how they may affect our confidence in the final grade assigned to a particular student. We also aim to use these results to develop new statistical treatments of inter-rater consistency for large sample sizes that require few or no exams to be graded more than once, saving time when studying a large number of graders.
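
As an illustration of the kind of chi-squared (χ²) comparison described above, the following minimal Python sketch tests whether one TA's distribution of awarded marks differs from the distribution across all graders. It is not the authors' analysis: the abstract does not state how the tests were constructed, so the goodness-of-fit form used here is an assumption, the function names `chi2_sf` and `ta_vs_population` are hypothetical, and the counts are invented purely for demonstration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with df
    degrees of freedom, via the series for the regularized lower incomplete
    gamma function. Adequate for a sketch; not tuned for extreme tails."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    log_p = a * math.log(z) - z - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_p))


def ta_vs_population(ta_counts, population_counts):
    """Goodness-of-fit test of one TA's mark counts against the pooled counts.

    Both arguments list how many answers received each possible mark
    (same mark categories, same order). Returns (statistic, p_value)."""
    if len(ta_counts) != len(population_counts):
        raise ValueError("count lists must cover the same mark categories")
    if any(c <= 0 for c in population_counts):
        raise ValueError("every mark category needs a positive pooled count")
    ta_total = sum(ta_counts)
    pop_total = sum(population_counts)
    # Expected counts if this TA graded exactly like the pooled population.
    expected = [ta_total * c / pop_total for c in population_counts]
    stat = sum((o - e) ** 2 / e for o, e in zip(ta_counts, expected))
    return stat, chi2_sf(stat, len(ta_counts) - 1)


# Invented example data: marks 0..4 on one question.
population = [120, 300, 600, 540, 240]   # all graders pooled (1800 answers)
typical_ta = [8, 20, 40, 36, 16]         # mirrors the pooled proportions
lenient_ta = [2, 10, 30, 48, 30]         # shifted toward higher marks

for label, counts in [("typical", typical_ta), ("lenient", lenient_ta)]:
    stat, p = ta_vs_population(counts, population)
    print(f"{label}: chi2 = {stat:.2f}, p = {p:.3g}")
```

A small p-value suggests that a TA's marks are unlikely to be a random sample from the pooled distribution, while a p-value near 1 indicates close agreement. Because each TA's answers are also part of the pooled counts in this sketch, the comparison is slightly conservative.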

This document is now available on OJS