Automated decision-making and unintended consequences

Amongst the many dramatic events of 2020, the controversy surrounding the release of A-level results in England, Wales and Northern Ireland stands out as particularly memorable. When the examinations planned for the summer were cancelled due to COVID-19, it became necessary to find an alternative way to produce student grades. The exams regulator Ofqual developed a standardisation algorithm to do this, but when the resulting grades were released, they proved so unpopular that they were abandoned less than a week later. The episode illustrates how unintended negative consequences can arise from automated decision-making processes.

The Ofqual algorithm was designed to moderate teacher-estimated grades whilst also limiting grade inflation. It was set to produce results that were similar, at both school and national level, to those of previous years. When the results were released, there was immediate outcry because large numbers of students – around 40% – received lower grades than their teachers had predicted. This left many fearful of losing university places and job opportunities. In addition, whilst the results did limit grade inflation at a national level, private schools saw a larger increase in the number of students getting top grades than state schools did, and many felt that state school students were more affected by downgraded marks than private school students.

It is possible to see how these problems arose. The algorithm was designed to moderate a ranked list of teacher-estimated grades and to produce results for each school that were similar to those of previous years. This meant that, once a school had met its ‘quota’ for a certain grade, students with the same estimated grade who were ranked below others in the school were at risk of being downgraded. This had severe impacts on large numbers of individual students. It also failed to acknowledge schools whose performance had improved on previous years. These rapidly improving schools were more likely to be in the state sector, which is one reason why the algorithm appeared to disadvantage state school students. A further reason is that the algorithm gave greater weight to teacher-estimated grades where class sizes were very small, to account for the fluctuations that are more pronounced when only a small number of students take an exam. Small class sizes tend to be more common in the private sector, and teacher-estimated grades tended to be more generous than the algorithm-moderated marks, so this was an additional way in which the algorithm had different consequences for different students.
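To make the quota mechanism concrete, the sketch below shows how capping each grade at a school's historical share can push lower-ranked students down a grade even when their teacher-estimated grade matches that of higher-ranked classmates. It is a minimal, hypothetical illustration with made-up data and function names, not a reconstruction of Ofqual's actual standardisation model.

```python
# Hypothetical sketch of quota-based grade moderation, NOT Ofqual's actual model.
# Each grade's 'quota' is derived from the school's historical share of that grade;
# quotas are filled from the top of the teacher-provided ranking, and overflow is
# pushed down to the next lower grade.

GRADES = ["A*", "A", "B", "C", "D", "E", "U"]  # highest to lowest

def moderate(ranked_estimates, historical_share, cohort_size):
    """ranked_estimates: teacher-estimated grades, best-ranked student first.
    historical_share: fraction of past cohorts awarded each grade at this school.
    Returns moderated grades for the ranked students."""
    quotas = {g: round(historical_share.get(g, 0) * cohort_size) for g in GRADES}
    awarded = []
    for est in ranked_estimates:
        g_idx = GRADES.index(est)
        # Award the estimated grade if quota remains; otherwise drop to the
        # next lower grade with space (i.e. downgrade the student).
        while g_idx < len(GRADES) - 1 and quotas[GRADES[g_idx]] <= 0:
            g_idx += 1
        quotas[GRADES[g_idx]] -= 1
        awarded.append(GRADES[g_idx])
    return awarded

# Example: five students are estimated at A, but the school's history suggests
# only ~30% As, so the two lowest-ranked of those five are pushed down to B.
estimates = ["A", "A", "A", "A", "A", "B", "C", "C", "C", "C"]
history = {"A": 0.3, "B": 0.3, "C": 0.4}
print(moderate(estimates, history, len(estimates)))
# -> ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']
```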

Algorithmically driven processes for automated decision-making are becoming increasingly prevalent in modern society and offer various efficiency benefits. However, as the A-level case vividly illustrates, they frequently have unintended consequences that can be to the detriment of individuals and communities. These consequences can be systematic and disadvantage particular demographic groups. Ofqual stated that the algorithm could not be biased because information about whether a school was private or state was not included; however, other factors such as class size and previous performance can serve as proxies for this information and be consequential for the outcomes delivered.

Alongside these concerns, a more positive implication that can be drawn from the A-level case is the extent to which the general public engaged with it. There was massive media coverage at the time and a great deal of public interest and discussion over the process through which the results had been generated. The word ‘algorithm’ itself became more of a household term. This can be seen as a very positive step, since wide-scale public engagement is necessary to foster a societal-level discussion about the role we want automated decision-making to play in our lives.

This article was originally written for Inspired Research, the newsletter of the Department of Computer Science at the University of Oxford.