BY JOHN W. LAWRENCE
In an article entitled "Student Evaluations of Teaching Are Not Valid," published in the May/June 2018 issue of Academe, I briefly reviewed the literature on whether student evaluations of teaching (SET) are good measures of teaching effectiveness. They are not. First, SET scores reflect student biases related to instructors' race, gender, age, and other characteristics. As Ronald Cordero details in another article in the same issue, "Surveys of Student Opinion: Portal for Prejudice," using SET scores in personnel decisions can therefore result in prejudicial treatment of faculty members. Second, in the few randomized experiments that have tested whether SET scores predict future student academic performance, SET scores were negatively correlated with that performance: students taught by professors who received relatively lower SET scores at Time 1 (e.g., French 101) performed better at Time 2 (e.g., French 102). Thus, there is no evidence that professors who receive high SET scores are better teachers.
I have observed the adverse consequences of using SET scores in personnel decisions in two contexts. First, as a departmental chair, I witnessed the personnel and budget committee (P&B) discuss the meaning of SET scores while deliberating about reappointing faculty members; it appeared to me that as a group we used arbitrary cutoff scores to define "good" teaching. Second, as a grievance counselor, I have seen cases in which SET scores were cited as evidence that adjunct or untenured professors were poor teachers and that their contracts should therefore be cancelled. This practice can have devastating consequences for people's careers and livelihoods.
Dropping SET scores as a metric of teaching does pose a conundrum for universities. To my knowledge, there is no easy-to-administer, reliable, objective method for evaluating teaching quality. Peer evaluations are often mentioned as a possible solution, but peers have biases too. And in practice, there is often an unstated power dynamic underlying "peer evaluation," in which tenured professors evaluate adjunct and untenured professors. Moreover, tenured professors are not necessarily expert teachers.
I am interested in hearing others' thoughts on and experiences with SET scores. What is your experience with them? Have you seen information gathered through SET scores used constructively? Do you have opinions on how to create valid measures of teaching quality? Do you think there are more effective ways to promote better student learning outcomes, such as fostering learning communities or reducing class sizes?
Below I have listed some of the papers I referenced in the Academe article.
Boring, A., Ottoboni, K., & Stark, P. B. (2016). Student evaluations of teaching (mostly) do not measure teaching effectiveness. ScienceOpen Research, 1–11. doi:10.14293/S2199-1006.1.SOR-EDU.AETBZC.v1
Braga, M., Paccagnella, M., & Pellizzari, M. (2014). Evaluating students' evaluations of professors. Economics of Education Review, 41, 71–88. doi:10.1016/j.econedurev.2014.04.002
Carrell, S. E., & West, J. E. (2010). Does professor quality matter? Evidence from random assignment of students to professors. Journal of Political Economy, 118, 409–432. doi:10.1086/653808
Kornell, N., & Hausman, H. (2016). Do the best teachers get the best ratings? Frontiers in Psychology, 7, 570. doi:10.3389/fpsyg.2016.00570
Stark, P. B., & Freishtat, R. (2014). An evaluation of course evaluations. ScienceOpen Research, 1–7. doi:10.14293/S2199-1006.1.SOR-EDU.AOFRQA.v1
Stroebe, W. (2016). Why good teaching evaluations may reward bad teaching: On grade inflation and other unintended consequences of student evaluations. Perspectives on Psychological Science, 11, 800–816. doi:10.1177/1745691616650284
Youmans, R. J., & Jee, B. D. (2007). Fudging the numbers: Distributing chocolate influences student evaluations of an undergraduate course. Teaching of Psychology, 34, 245–247. doi:10.1080/00986280701700318
Guest blogger John W. Lawrence teaches psychology at the City University of New York College of Staten Island. He is also a grievance counselor for the union representing CUNY faculty and staff, the Professional Staff Congress.
A problem with using SET scores is that every questionnaire I've seen in 30+ years of teaching asks the wrong questions, especially ones students are not competent to answer. I've advocated for questionnaires that ask students what they do know, and what only students know. They should be questions like: (1) How often is the instructor late for class? (2) How often does the instructor deviate from the syllabus? (3) Do the assignments cover the required course material? And so on. I've seen some very good teachers who regularly get low scores, and I've seen some very bad teachers pass through, because their colleagues don't know how often they cancel class, show up late, or fail to talk about relevant course material. By the way, my efforts to reform the questionnaires have never succeeded.
I agree, and I use truly helpful questions on the SETs that I control. These SETs ask how often class gets cancelled and whether the teacher helped motivate you to learn. They also ask whether the student did a library research assignment, and so on. I'd be curious why your efforts to reform the SET questions have been unsuccessful. Who blocks your efforts, and on what grounds? I did not face that problem, fortunately!
My efforts never got past the department level. I’m not sure whether it was just inertia, or something more particular.
It is difficult to see good alternatives to the status quo (and asking questions about deviating from the syllabus strikes me as a terrible standard to impose on teachers; it's often good to deviate). One question I have: is most of the problem with SET simply the incentive to inflate grades in exchange for better ratings? If we adjusted SET scores to account for the expected average grade, would that solve many of these problems? Why don't universities do that?
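(A rough sketch of the kind of adjustment I have in mind, just to make the question concrete. The Python below and its numbers are purely illustrative, not an existing institutional procedure: regress section-level SET averages on the expected average grade and look at the residual rather than the raw score.)

import numpy as np

# Hypothetical section-level data, made up for illustration:
# mean SET score (1-5 scale) and mean expected grade (4.0 scale) for each section.
set_scores = np.array([4.2, 3.8, 4.6, 3.1, 4.0, 3.5])
expected_grades = np.array([3.6, 3.2, 3.9, 2.8, 3.4, 3.0])

# Fit a least-squares line: SET ~ intercept + slope * expected_grade.
slope, intercept = np.polyfit(expected_grades, set_scores, 1)

# The residual is the part of each SET score not explained by the expected grade.
adjusted = set_scores - (intercept + slope * expected_grades)

for i, (raw, resid) in enumerate(zip(set_scores, adjusted), start=1):
    print(f"Section {i}: raw SET {raw:.1f}, grade-adjusted residual {resid:+.2f}")

Of course, a residual like this would still carry whatever student biases are baked into the raw scores; at best it addresses the grade-inflation incentive.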
Who said deviating from the syllabus is bad? Also, I have been at places where expected grades were taken into account when evaluating SET scores, and it seemed to make little difference. The point of changing the questionnaire is to get objective information that students can supply.
There are many issues with Student Evaluations of Teaching (SET): their lack of validity and reliability, the treatment of ordinal data as if it were continuous, how they are managed, how they are used, their inherent bias, their assumption that there is a single view of the instructor and course, their tendency to make instructors more lenient graders, and the way they reduce critical thinking within courses. One of my favorite articles about SET is from 1959.* Yes, we have known for six decades that SET are not valid measures of teaching quality! We could argue that the alternatives are not viable, but the truth is that SET are "easy" to implement and use, and administrators (and even faculty) seem willing to disregard their inherent issues. I have several theories about why institutions of higher education continue to use SET: partly it may be a matter of "this is how we've always done it," and partly it may be that most administrators in higher education are white males who feel the biases don't affect them. In reality, there are probably several reasons combined. I can offer an alternative that at least shows that, for a single course, there are multiple, divergent viewpoints: https://www.academia.edu/36718383/Improving_Student_Evaluation_of_Teaching_Determining_Multiple_Perspectives_within_a_Course_for_Future_Math_Educators.
*Vanderpol, J. (1959). Student Opinion—Sacred cow or booby trap? Journal of Teacher Education, 10(4), 401–412.