Metrics: An Addendum on RAE / REF

Not everything that counts can be counted, and not everything that can be counted counts...

We have had overwhelming support from a wide range of academics for our paper on why metrics are inappropriate for assessing research quality (200+ as of June 22nd). However, some have also posed interesting follow-up questions on the blog and by email which are worth addressing in more depth. These are on the whole more REF-specific, and relate to the relationship between the flaws in the current system and the flaws in the proposed system. In my view the latter still greatly outweigh the former, but it is useful to reflect on both.

Current REF assessment processes are unaccountable and subjective; aren’t metrics a more transparent, public and objective way of assessing research?

The current REF involves, as the poser of the question pointed out, small groups of people deliberating behind closed doors and destroying all evidence of their deliberations. The point about the non-transparency and unaccountability of this process is an important one to keep in mind.

The question is then posed: are metrics more transparent, public and objective? On a surface level, metrics are more ‘transparent’ because they are literally visible (public) and expressed as a number, making them easily rankable. But what they represent, as we argued in our paper, is fundamentally non-transparent, given the wide variety of reasons there might be for citing work, and more besides those we listed. In fact, it is the very simulation of transparency in the use of a numerical marker that threatens the act of actually reading work for assessment purposes. As to whether they are ‘objective’ – i.e. independent of the judgement of individual subjects – I would argue that they are not; rather, they are the aggregation of potentially contradictory and opaque individual uses and judgements which are themselves shaped by disciplinary norms, prejudices and so on. Not only are they ultimately subjective, but the judgements on which they are based are not even all made towards the same end. In this sense, it is not like aggregating individual preferences – because citations are not a mark of preference.

To repeat a little of the paper: absolutely fundamental to this question is what you want to measure when you assess research. If you want to measure ‘originality, significance and rigour’, as the current definition of quality implies, then you cannot use methods that are ultimately based on counting citations.

Rather than simply raw citation counts, could composite figures, drawing on journal impact factor and h-index, be better indicators?

A lot has been written about whether journal impact factors are a mark of journal quality, particularly by our esteemed colleagues in the ‘hard’ sciences. I defer to their experience for the most part; my experience of journals in International Relations is that high impact factors do not always correlate with a high-quality, open and honest peer review process, nor do they indicate the most intellectually productive place to publish a lot of cutting-edge work. In my institution and others, there has been talk of creating a list of ‘top ten target journals’ that we must publish in, a list which, as I understand it, is based solely on impact factors. Apparently this practice is more systematic and widespread in other disciplines.
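To be concrete about what is being counted here: the standard two-year impact factor is simply a ratio of citation counts. A minimal sketch, with an invented journal and invented figures purely for illustration:

```python
# Illustrative sketch of the standard two-year journal impact factor.
# All figures below are invented; the point is only that the measure is a
# ratio of citation counts, and says nothing about any particular piece of work.

def impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    """Citations received this year to items published in the previous two
    years, divided by the number of citable items published in those years."""
    return citations_to_prev_two_years / citable_items_prev_two_years

# A hypothetical journal: 450 citations in 2014 to items published in 2012-13,
# out of 180 citable items published across those two years.
print(impact_factor(450, 180))  # 2.5
```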

For my own part, a target-journal strategy of this kind would likely shape my intellectual output immediately and in particular ways, depending on the journal – pushing me to engage with debates that I consider settled or moribund, to adopt particular methodological strategies or philosophical underpinnings, and to forget about interdisciplinary avenues of research that I have been pursuing. NB this problem also exists with the current REF, because of the casual practice of prospectively associating a likely * rating with the venue of publication, but it would be strongly reinforced by any move to impact factors and citation counts, with no wiggle room for the judgement of the reviewer.

Moreover, journal impact factors are strongly mediated through the corporate influence of the publishing houses in the market, who use search engines to position their journals prominently and widely in the ‘scholarly market’. The important role of money and corporate strategy in the production of journal impact factors is a further warning against using them as a mark of quality. Criticism of the h-index is also widespread, and I will not repeat it here.

However, it is important to reiterate that although these are different kinds of counting, even when refined, what they are actually counting has, again, fundamentally nothing to do with quality.

REF is assessing departments, rather than individuals – might the ‘noise of measurement’ be evened out when aggregated? On average, are works with more citations ‘better’ / higher quality than those without? What might we make of the apparent relationship between RAE scores and h-index scores at an aggregated level?

This is an interesting question, and I note Dorothy Bishop’s work on it, which demonstrates a correlation between h-indices at a departmental level and the amount of income awarded by RAE 2008. She and other studies also note that having a panel member is a predictor of receiving more income. This raises the question of whether the problems with citations at an individual level are appropriately averaged out at a collective level.
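For clarity about what is being aggregated in such studies: an h-index is itself nothing more than a rearrangement of citation counts. A minimal sketch, using a hypothetical researcher’s invented citation counts:

```python
# Minimal sketch of the h-index: the largest h such that an author has h
# outputs with at least h citations each. Nothing in the calculation refers
# to originality, significance or rigour -- only to how often each output
# happens to have been cited.

def h_index(citation_counts):
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# A hypothetical researcher with six outputs (invented counts):
print(h_index([25, 10, 6, 4, 2, 1]))  # 4
```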

The issue that we raised in our paper was that a non-blinded peer review for the RAE/REF was likely to be subject to some of the same preconceptions and prejudices as metrics – i.e. it would judge the names, institutions and journal homes rather than the work itself. So, in our field, the professor at the University of Cambridge writing in the mainstream journal on what drives US foreign policy is automatically judged more favourably than the PhD student from the University of Coimbra writing on the media representation of the Diego Garcia case in an area studies journal, absolutely regardless of the content of the work. In this sense it is unsurprising to see some correlation between h-index and RAE income, if both are strongly shaped by preconceptions about the work rather than the work itself.

But this doesn’t get rid of the issue, assuming that at least some of the ‘peer review’ element of previous RAEs was meaningfully dissociated from individual preconceptions (can we assume this?). A key question, as Stephen points out, is the level of variance around this correlation, and whether it is meaningful. The paper written for Universities UK last time by Jonathan Davies argued that although there was a correlation, the variance could be so wide that metrics were unable to reliably distinguish merely ‘average’ research from ‘world-leading’ research when applied to individual outputs in various subjects. The spread here might precisely indicate that all of the contradictory things that are going on when we cite work are in fact going on. The upward trend might be related to a common sense of research quality and to good environments helping researchers do more, but it might also be related to all of the peripheral factors, including prejudices and preconceptions of various kinds, and the skills of various researchers at self-promotion. There is no obvious reason to reach one conclusion over another here.
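The point about variance can be illustrated with a toy simulation – the parameters are entirely invented and offered only to show the shape of the problem: where departments genuinely differ and per-output citation noise is large, a respectable correlation can appear at the departmental level even while citations remain a poor guide to the quality of any individual output.

```python
# Toy simulation (invented parameters): citations = quality + noise.
# Averaging over a department shrinks the per-output noise but not the
# genuine differences between departments, so the aggregate correlation
# looks much stronger than the output-level one.

import numpy as np

rng = np.random.default_rng(0)
n_departments, outputs_per_department = 50, 40

dept_quality = rng.normal(0, 1, size=(n_departments, 1))   # departments differ
output_quality = dept_quality + rng.normal(0, 1, size=(n_departments, outputs_per_department))
citations = output_quality + rng.normal(0, 3, size=output_quality.shape)  # noisy proxy

r_output = np.corrcoef(output_quality.ravel(), citations.ravel())[0, 1]
r_department = np.corrcoef(output_quality.mean(axis=1), citations.mean(axis=1))[0, 1]

print(f"output-level correlation:     {r_output:.2f}")      # modest
print(f"department-level correlation: {r_department:.2f}")  # much stronger
```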

It has been noted that we do not attempt to quantify the variance of the peripheral factors in our paper; that is because it is actually impossible to do so without going to each case, finding out why particular works were cited in a particular instance, and then seeing how this worked out against our indicators of quality (originality, significance, rigour, as assessed by experts in the relevant sub-field). On the exact question of whether and how there is a correlation between other measures of quality and metrics, the only honest position to take is an agnostic one.

At this point, I would also reiterate some of the arguments about the consequences of adopting metrics in research assessment for researchers and institutions. As noted before, a formal move to metrics would immediately narrow the kinds of research I might pursue and where I published it – our institutions would have no qualms about formalising the informal practice of pressuring people to publish in particular outlets, and this would also put intense pressure on particular journals in terms of volume of submissions. We would also see widespread investment in the technologies and energies of self-promotion. All of this is likely to be to the detriment of thoughtful, innovative and independent research.

What are the alternatives that we might pursue, given the flaws with the present REF, and with metrics?

This is the big question, embedded as it is in a wide range of political questions about the relationship of government to universities, of both to ‘society’, and of all of these to the circulation of money, entitlement, obligation and accountability. The debate is also sometimes performed through an unhelpful polarisation between nostalgic dinosaurs of the past and bureaucratic techno-fantasists of the future. For my part, I am interested in pursuing a higher education agenda which is democratic in orientation; to this extent it should serve communities and the wider public for the purposes of education, discussion, critique and consciousness-raising; it should cultivate knowledges that help us make sense of the worlds we live in; and it should be a safe and autonomous space for all of the above. To this extent, the engagement of universities with their various constituencies should be real, and to some extent a pressure in this direction appears to emerge from the broader neoliberalisation discourse. However, and unsurprisingly, the modalities of governance precisely inhibit universities from accomplishing many of these things in a meaningful way.

Putting all of this aside for the moment, and assuming that we all agree on the need for research assessment, and assuming we are only talking about journal articles for now (another big assumption, especially given the political economy of journal publishing), there could be some ways of making qualitative assessments more transparent and honest.

The first step is of course to remove author and publisher information from submissions. Whilst work may of course still be recognisable, I think this makes an important statement, and most panel members will take the appropriate cue. A further simple but radical thing to do for journal articles might be to publish the anonymised peer reviews and an account of the publishing process (rounds of review, editors’ letters) alongside the actual article, or at the very least to submit them to an assessment panel. I would have assessment panel members write a brief note on why they awarded the grade they did (we demand no less of ourselves when marking undergraduate exams). This would have the benefit of opening up both research assessment and journals’ editorial processes to the academic communities which they serve. It might of course create all sorts of secondary problems which I haven’t yet fully thought through – mostly for the people administering judgements – but my instinct is that these secondary problems already exist behind closed doors in the REF. By putting academic judgements out in public, we would be forced to reckon with our disciplinary communities and with public onlookers, and I think it would keep us honest and dedicated. This doesn’t solve the onerousness of the task – but then, if we think there are good reasons for carrying out detailed research assessment in order to allocate public research funding based on quality, the task should be appropriately resourced (expand the panels; give them fewer pieces and more time).

I also think that it would be a good idea to reduce the number of pieces submitted per cycle, in order to allow bigger and more interesting works to be developed (especially in my discipline), particularly in a context where auditing and teaching demands are increasing. It is interesting that members of HEFCE themselves have openly argued that academics are publishing too much. I find myself simultaneously excited and annoyed by what is churned out of an ever-expanding publishing machine, which in the UK is clearly driven by the research assessment culture.

To summarise, then: the current system is flawed, and a move to metrics would be even more so. We can do better, I think, even within the dubious parameters of the current system. But the real prize is a better and more open political discussion about the place of universities in our society. The mainstream political parties have found themselves profoundly unable to disagree with each other on this front, all accepting a bureaucratic mantra of accounting and evaluating as the primary task of government, and treating universities as only a productive extension of a ‘knowledge economy’ in which all outputs are quantifiable, eventually in terms of GDP. The task for us now is to articulate and practise alternative visions of the university which remain true to its transformative and democratic potential as a space of collective inquiry and learning.

6 thoughts on “Metrics: An Addendum on RAE / REF”


  1. Thank you for this (and the previous paper as well).
    Might it be that the underlying issue here is that, as far as research evaluation is concerned, a judgement is called for somewhere? And when a judgement is called for, there are decisions to privilege some dimensions over others. That is why a judgement is grounded in values, and there are no absolute values except a few (i.e. universal values such as respect for life and human dignity).
    We may be reassured by the flow of numbers in metrics (and altmetrics as well) and think that they guarantee objectivity; yet, as this paper shows, metrics simply shift the judgement from the evaluation process to the design of the metrics themselves.
    In other words, we need to decide where we want to go with research (and this is a political decision) and then openly debate the values at stake (I am thinking of Flyvbjerg’s considerations on phronesis). We may then come to the point that there is no such thing as objectivity, but rather a continuous negotiation over the values we privilege. And we may find that calls for objectivity are often a way to conceal judgement within the process – and thus to create an image of impartiality while privileging some values over others.


