Course evaluations are flawed and cause widespread dissatisfaction, an enquiry by Mare among lecturers and programme committees reveals. The response rate is low and answers are often biased or sexist. It is also questionable how reliable the assessments are, as online evaluation forms can be filled out by anyone, and more than once.
‘Here, courses are evaluated digitally and the response rate is dramatically low,’ says Cynthia van Vonno, chair of the Political Science Programme Committee. It is a common complaint. During the Covid-19 period, the university switched to digital course evaluation. Although it has since returned to in-person teaching, some programmes and faculties have not gone back to paper evaluations.
And programme committees notice this in the number of completed evaluations. ‘You can easily get a response from only five students,’ says Van Vonno. ‘That’s why we, as a programme committee, insisted that if there is a joint concluding moment for a course, there should be paper evaluations. Otherwise, they’re of no use to us.’
Philosopher Frank Chouraqui is also troubled by the low response rate. As chair of the Basic Teaching Qualification (BKO) committee of the Faculty of Humanities, it is his responsibility to assess whether academics in training are qualified to give lectures, partly on the basis of evaluations. ‘The number of respondents has gone down, and that increases the bias in the answers. If you have a group of two hundred students with only twenty of them filling out an evaluation, statistically, the results are useless. Based on that, we as the BKO committee can’t reach a reasoned judgement.
‘On top of that, it’s only the most enthusiastic and the most critical students who do fill out the survey. Statistically speaking, it’s to be expected that within those two extremes, there are more students who also have a bias, for example a gender bias. So as a committee, we receive fewer evaluations that we can use to avoid our own bias, and the evaluations we do receive are more likely to be biased themselves.’
Even when the evaluations are completed, Van Vonno is dissatisfied. ‘The first few questions on the evaluation form were laid down at the highest level and designed to allow for comparison between programmes. As an institute or faculty, we’re not allowed to change those. However, we feel that many of those questions are not at all relevant: they’re so open that you could assess someone based on their clothing or personality. They’re not really of much use to a programme committee. We also wonder what exactly it is that they’re comparing.’
DISCRIMINATION
Last year, the council of the Faculty of Governance and Global Affairs (FGGA) already raised questions about the quality of evaluations. Students left offensive comments, such as ‘The lecturer sucked ass!’ or ‘Liked that tight little dress she had on during the last lecture’.
The lecturers Mare is speaking to now also say they sometimes receive discriminatory comments. ‘I don’t want to go into too much detail about all the things I’ve seen, but I’ve definitely read comments that were hurtful or specifically directed at me,’ says Aris Politopoulos, chair of the Archaeology Programme Committee. ‘And I’m a white man, so it’s quite difficult to say anything really discriminatory. However, those comments are a very small minority, maybe only a handful over the last nine years. But I hear from colleagues that they receive discriminatory comments in evaluations quite often.’
‘Sometimes students write in their evaluations that a lecturer is bad because they received a bad grade or they don’t feel motivated,’ says philosopher Chouraqui. ‘In the negative evaluations, I also see a lot of students who are used to dominating a conversation. If a lecturer – especially a female lecturer – doesn’t allow that, it seems they complain about it.’
‘We definitely observe a bias against women,’ agrees Marcel van Daalen, chair of the Astronomy Programme Committee. ‘Research has shown this for fifty years. And still, we keep using the same system.’
Astronomy does not use online assessments, so the response rate is high there. But that does not solve the problems, says Van Daalen. ‘It makes it somewhat easier to pick out the outliers. If you have a low response rate with only those extremes, that’s obviously a big problem. But even with a high response rate, there is still a clear bias against women. It’s something we don’t really know how to solve.’
ANONYMOUS
The fact that the evaluation forms are anonymous does not just mean that students feel free to make hurtful comments about lecturers; online forms can also undermine the reliability of evaluations altogether. Several lecturers distribute online evaluations via a link or QR code that is the same for all students and does not require them to log in. This guarantees the anonymity of the contributors, but it also means that vindictive students can leave very negative comments multiple times, or even that a lecturer can give themselves a good score.
Van Vonno is aware of this, but responds nonchalantly nonetheless. ‘It’s true that you could fill out the form multiple times. So if you dislike a lecturer or you’re a huge fan, that might affect the results. But since the response rate for the digital forms is so dramatically low at Political Science, I’m not too worried about it.’
Mario de Jonge is a researcher at the Leiden University Graduate School of Teaching (ICLON), where, among other things, he examines the effectiveness of course evaluations, also in collaboration with LLInC, the university body responsible for evaluations.
‘There is a large number of study programmes where student engagement is quite low. Because those evaluations are so frequent, there is a certain level of evaluation fatigue among students. As a result, you see that they don’t really pay attention when filling them out. Suppose a scheduled guest lecturer doesn’t show up for a lecture; a majority of the students will still fill out an assessment on the lecturer’s teaching.
‘A meta-analysis shows that there is in fact little evidence of a link between how students assess the quality of teaching and the actual performance. Some studies even suggest that there may be an inverse correlation. If a lecturer raises the bar, evaluations may drop despite these higher demands proving beneficial in the long run.
‘There are also various forms of bias. In general, men are rated higher than women. And if you’re a bit more lenient when it comes to grading, word quickly gets around and your evaluation can go up. Once, an experiment was conducted where chocolate biscuits were handed out. That also resulted in higher evaluations.
‘One might wonder whether a numerical rating is of much use to a lecturer, or whether concrete guidance on how to adjust your teaching would be more useful. As it stands, the only thing you know is that something was rated poorly.
‘If a course has been taught by the same lecturers for four consecutive years, does it need to be re-evaluated every year? You might also choose to evaluate it less frequently. And when you do, you might choose a broader and more intensive approach.’
Despite their criticisms, all lecturers agree that it is important to ask students for their opinions. As a member of project group Academia in Motion, Van Daalen is actively looking for an alternative. ‘We need to find a way to evaluate teaching that has fewer biases and provides more information. It’s important that we receive feedback from students, but we want to consider more ways of assessing teaching. Think of lecturers evaluating each other’s lectures, for example. That way, you get a good sense of whether the course and teaching methods align with the rest of the curriculum. That is something that’s currently lacking.’
ALTERNATIVES
Politopoulos is already experimenting with other forms of evaluation. ‘In my lectures, I try to create an open discussion where everyone is free to express their thoughts on the course. This comes with certain drawbacks, of course, such as the fact that it’s not anonymous, but I believe those discussions are very useful. Students openly express their criticisms and share useful tips for improvement. I’ve never encountered any kind of discrimination in those discussions; people are much more respectful in person. Plus, students feel they are taken more seriously in an adult discussion.’
‘I don’t think we as the BKO committee want to get rid of evaluation forms altogether,’ says Chouraqui. ‘But to be honest, I’ve never met anyone at the faculty who wasn’t sceptical about the evaluations. Even in the best possible case, you can’t escape useless comments, because you still want students to have their say. That being the case, we as readers need to be wise.’
Mare asked the Leiden Learning & Innovation Centre (LLInC), the university body responsible for evaluations, to comment on the lecturers’ concerns.
Data and AI manager Michiel Musterd explains that ‘the usefulness of results in the case of a very low response rate depends on a number of factors. Not only the absolute number of completed evaluations but also the uniformity of the ratings plays a role. In general, quantitative evaluations are not necessarily of scientific value, but mainly serve a signalling function to identify possible lower-scoring aspects of a course.’
LLInC is aware that evaluations are sometimes completed multiple times, according to Musterd. ‘We offer the faculties three options for evaluations: on paper, via a generic link and via a personal link. A personal anonymous link can only be used once. Paper evaluations are slightly less foolproof, but filling out multiple forms would still be difficult because students are handed only one evaluation form each. With a generic link, it becomes a little easier, but it still requires a considerable effort from students to fill out the full form multiple times, and to do so in a way that does not stand out statistically during processing. Ultimately, it’s up to the faculties or study programmes themselves to choose how they want to evaluate and to weigh up, among other things, the risk of students filling out the form multiple times against the ease of distributing the evaluation forms.’
Regarding hurtful or discriminatory comments that lecturers sometimes receive, Musterd says the following: ‘We understand that this can be highly unpleasant for lecturers. However, the students themselves are responsible for what they write, even if it’s anonymous. We provide the programmes with uncensored evaluation reports. They can choose to remove sexist, racist or offensive comments themselves before sending the report to other concerned parties, such as lecturers. It’s not up to us to change or remove answers.’
In collaboration with Mario de Jonge, LLInC is experimenting with other forms of assessment, for example an evaluation during the course instead of at the end, in which students themselves choose which topics they want to comment on in an open-ended question. Those answers can then alert the lecturer to possible areas of concern.