Last month I posted an item about two studies demonstrating that student evaluations may not be the best way to measure either student learning or instructor effectiveness. One of those studies was co-authored by Philip Stark, chair of the statistics department at the University of California, Berkeley. He has now co-authored, with Anne Boring and Kellie Ottoboni, another study, which adds gender bias to the list of such evaluations’ documented flaws. Colleen Flaherty reports on the study at Inside Higher Ed this morning. Here are her opening paragraphs:
There’s mounting evidence suggesting that student evaluations of teaching are unreliable. But are these evaluations, commonly referred to as SET, so bad that they’re actually better at gauging students’ gender bias and grade expectations than they are at measuring teaching effectiveness? A new paper argues that’s the case, and that evaluations are biased against female instructors in particular in so many ways that adjusting them for that bias is impossible.
Moreover, the paper says, gender biases about instructors — which vary by discipline, student gender and other factors — affect how students rate even supposedly objective practices, such as how quickly assignments are graded. And these biases can be large enough to cause more effective instructors to get lower teaching ratings than instructors who prove less effective by other measures, according to the study based on analyses of data sets from one French and one U.S. institution.
The full study is available online and includes the following abstract:
Student evaluations of teaching (SET) are widely used in academic personnel decisions as a measure of teaching effectiveness. We show:
- SET are biased against female instructors by an amount that is large and statistically significant
- the bias affects how students rate even putatively objective aspects of teaching, such as how promptly assignments are graded
- the bias varies by discipline and by student gender, among other things
- it is not possible to adjust for the bias, because it depends on so many factors
- SET are more sensitive to students’ gender bias and grade expectations than they are to teaching effectiveness
- gender biases can be large enough to cause more effective instructors to get lower SET than less effective instructors.
These findings are based on nonparametric statistical tests applied to two datasets: 23,001 SET of 379 instructors by 4,423 students in six mandatory first-year courses in a five-year natural experiment at a French university, and 43 SET for four sections of an online course in a randomized, controlled, blind experiment at a US university.
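For readers unfamiliar with the phrase "nonparametric statistical tests," here is a rough sketch of one common example, a two-sample permutation test on the difference in mean ratings. Whether this matches the paper's exact procedure is an assumption on my part; the function name `permutation_test`, its parameters, and the ratings below are all hypothetical and made up purely for illustration. They are not data from the study.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean ratings.

    Illustrative sketch only: it assumes ratings are exchangeable between
    the two groups under the null hypothesis of no bias.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)

    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the group labels by shuffling the pooled ratings.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # Count shuffles at least as extreme as the observed difference
        # (small tolerance guards against floating-point ties).
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical ratings on a 1-5 scale (invented for this example).
ratings_perceived_male = [4, 5, 4, 3, 5, 4, 4, 5]
ratings_perceived_female = [3, 4, 3, 4, 3, 4, 2, 4]

observed, p_value = permutation_test(ratings_perceived_male, ratings_perceived_female)
print(f"observed difference in means: {observed:.3f}")
print(f"estimated two-sided p-value: {p_value:.4f}")
```

The appeal of this kind of test is that it makes no assumption about the shape of the rating distribution: it simply asks how often a random relabeling of the instructors would produce a gap as large as the one actually observed.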
The study concludes:
In two very different universities and in a broad range of course topics, SET measure students’ gender biases better than they measure the instructor’s teaching effectiveness. Overall, SET disadvantage female instructors. There is no evidence that this is the exception rather than the rule. Hence, the onus should be on universities that rely on SET for employment decisions to provide convincing affirmative evidence that such reliance does not have disparate impact on women, under-represented minorities, or other protected groups. Because the bias varies by course and institution, affirmative evidence needs to be specific to a given course in a given department in a given university. Absent such specific evidence, SET should not be used for personnel decisions.
Indeed, given the by-now enormous literature showing that student evaluations of teaching are, at a minimum, problematic as a measure of either instructor effectiveness or student learning, the only remaining justification for their continued use is the belief that students are nothing but “customers” who need to be kept content, rather than learners who need to be inspired, guided, and, well, taught.