Why Metrics Cannot Measure Research Quality: A Response to the HEFCE Consultation

[Header image: Pacioli, Euclid, measurement]

Update 24th June: 7,500+ views, 100s of shares, 200+ signatories! And a new post with some responses to further issues raised.

The Higher Education Funding Council for England are reviewing the idea of using metrics (i.e. citation counts) in research assessment. We think using metrics to measure research quality is a terrible idea, and we will be sending HEFCE the response below explaining why. The deadline for responses is 12pm on Monday 30th June (to metrics@hefce.ac.uk). If you would like to add an endorsement to this paper, to be included in what we send to HEFCE, please write your name, role and institutional affiliation in the comments below, or email either ms140[at]soas.ac.uk or p.c.kirby[at]sussex.ac.uk before Saturday 28th June. If you want to write your own response, please feel free to borrow as you like from the ideas below, or to append the PDF version of our paper available here.


Response to the Independent Review of the Role of Metrics in Research Assessment
June 2014

Authored by:
Dr Meera Sabaratnam, Lecturer in International Relations, SOAS, University of London
Dr Paul Kirby, Lecturer in International Security, University of Sussex

Summary

Whilst metrics may capture some partial dimensions of research ‘impact’, they cannot be used as any kind of proxy for measuring research ‘quality’. Not only is there no logical connection between citation counts and the quality of academic research, but the adoption of such a system could systematically discriminate against less established scholars and against work by women and ethnic minorities. Moreover, as we know, citation counts are highly vulnerable to gaming and manipulation. The overall effects of using citations as a substantive proxy for either ‘impact’ or ‘quality’ could be extremely deleterious to the standing and quality of UK academic research as a whole.

Why metrics? Why now?

1. The rationale for looking at metrics as a “potential method of measuring research quality and impact” (Consultation letter, section 1) is somewhat opaque. The letter notes that some people may use metrics to assess research, and that the Secretary of State wishes to look at the issue again. The previous review of the matter, in 2008/9, concluded that the ‘data was insufficiently robust’ to justify their adoption.

2. To speak more precisely, we might consider the following underlying rationales as driving this general interest:

  • The research assessment exercises conducted at a national level (RAE 2008; REF 2014) and at institutional levels are difficult, time-consuming, expensive and laborious because they consume large quantities of academic energy. Universities and academics themselves have complained about this.
  • Ministers, civil servants, research administrators and managers might prefer modes of assessment that do not require human academic input and judgement. These would be cheaper, would not require academic expertise and would be easier to administer. They would also facilitate greater administrative control over the distribution of research resources and inputs.
  • Moreover, in an age of often-digitised scholarship, numerical values associated with citations are already being produced – mostly from data held by large corporate journal publishers – and amongst some scholarly communities, at some times, they are treated as a mark of prestige.

3. The present consultation proposes to take views on the use of metrics – for the most part meaning citation counts – with a view to incorporating them into mechanisms of research assessment once more. In particular, it identifies ‘research quality and impact’ as the areas in which research should be assessed.

4. We suggest that it is imperative to disaggregate ‘research quality’ from ‘research impact’ – not only do they not belong together logically, but running them together itself creates fundamental problems which change the purposes of academic research.

5. We also want to note a contradiction between the different rationales offered for using metrics. On the one hand, one position seems to be that we should use metrics as a source of ‘big data’ that we do not currently have, in order to produce different judgements about what good academic research is. On the other hand, the argument is that metrics actually replicate the outcomes of peer review processes, and so approximate a cheaper and quicker way of doing the same thing. There is an important tension here: the former reasoning implies that we want to change what we think good academic research is, and a downgrading of peer review processes; the latter implies that peer review is still the key standard for assessing research, but that we want to do it (or something like it) more quickly. The Review team needs to make a clear determination on which of these objectives it is pursuing.

Using metrics for measuring impact: what are we actually measuring? 

6. Why do academics cite each other’s work? This is a core question to answer if we want to know what citation count metrics actually tell us, and what they can be used for. Possible answers to this question include:

  • It exists in the field or sub-field we are writing about
  • It is already well-known/notorious in our field or sub-field, so serves as a useful shorthand for readers
  • It came up in the journal we are trying to publish in, so we can link our work to it
  • It says something we agree with/that was correct
  • It says something we disagree with/that was incorrect
  • It says something outrageous or provocative
  • It offered a specifically useful case or insight
  • It offered a really unhelpful/misleading case or insight

7. As an example, an extremely widely cited piece in the field of International Relations is Samuel Huntington’s book ‘The Clash of Civilizations’ (a short version of which was first published in the journal Foreign Affairs). It has been one of the most controversial pieces in the discipline, and has probably been cited for all of the reasons above. As of today, Google Scholar lists 22,353 citations to the book or article. Amongst these are an extremely large number of ‘negative’ citations critiquing the piece for its gross simplifications, inflammatory political claims, selective and problematic reading of the historical record, cultural essentialism, and neglect of multiple other issues such as the global economy. After 9/11, however, various non-academic readers seized on some of its broad arguments to suggest a perennial struggle between Christianity and Islam, as validated by a famous Harvard professor (with no academic background in either of these religions). This has no doubt contributed to a political climate which has facilitated military interventions in the Middle East and more aggressive attitudes towards religious diversity among members of different religions. On the other hand, much more detailed and nuanced work exists, based on solid historical evidence and knowledge of contemporary relations, which will have many fewer citations owing to its publishing outlet, the profile of its author, and its less outrageous, if much more rigorous, findings. The accumulated citations to Huntington clearly indicate that these texts have been central to networks of scholarly argument about world politics in recent decades, and we might learn much from that fact. But this is no measure of quality, nor even of ‘popularity’ (if we understand that to carry positive connotations).

Metrics and the measurement of impact

8. Based on the analysis in points 6 and 7 above, it is clear that citation counts can be one way of thinking about the generic ‘impact’ of an academic piece on a field. However, in their current form they cannot properly differentiate between ‘positive’ and ‘negative’ impact within a field or sub-discipline – i.e. between work that ‘advances’ a debate and work that makes it more simplistic and polarised. Even where there is some inclusion of ‘positive’ or ‘negative’ evaluation, such crude forms of voting miss the complexities of much scholarly work (such as where others might find the empirical discussion useful, but reject the theoretical framing or inferences drawn). Without such fine-grained information on the actual contribution of a piece to a debate, it would be very short-sighted to suggest that aggregate citations are any grounds for awarding further funding or prestige. Indeed, the overall pressure such a system creates is simply to get cited at all costs. This might well lead to work becoming more provocative and outrageous for the sake of citation, rather than making more disciplined and rigorous contributions to knowledge.

9. Moreover, we must be careful to differentiate between this kind of academic ‘impact’ and the public ‘impact’ sought by the present REF case studies. Citation counts can tell us about academic attention – itself a mixture of good and bad – but they can tell us very little about the public engagement and contribution made by particular pieces of work for non-academic communities in society. To the extent that the ‘impact case studies’ in the REF genuinely seek to open the door for academic work to engage better with the society in which it is embedded, citation counts cannot be used as a way of judging this at all. This is especially the case where academics are trying to work with small-scale and grassroots organisations rather than governments or international organisations. Wider forms of alternative metrics, such as the number of social media shares, extend the definition of impact, but are also likely to be driven by controversy, and are even less likely to reflect the underlying academic quality of pieces (since the audience is generally less expert than for scholarly citations).

Metrics and the measurement of research quality

10. It should be further evident that because of what citation counts actually measure, these are not an appropriate proxy for research quality. The current REF asks its panel members to apply criteria of ‘originality, significance and rigour’. These are broadly the same kind of criteria that expert peer reviewers apply when reviewing book manuscripts or journal articles.

11. On ‘originality’ – work may be cited because it is original, but it may also be cited because a more famous academic is making the same point. Textbooks and edited collections are widely cited because they are accessible – not because they are original. Moreover, highly original work may not be cited at all because it has been published in a lower-profile venue, or because it radically differs from the intellectual trajectories of its sub-field. There is absolutely no logical or necessary connection between originality and being cited.

12. On ‘significance’ – this criterion seems to imply the need for broad disciplinary recognition of the contribution. To this extent, we might expect ‘significant’ work to have a high citation count; however, having a high citation count does not mean that the work is ‘significant’. In addition, using citation counts will systematically under-count the ‘significance’ of work directed at more specialised sub-fields or technical debates, or work that adopts more dissident positions. Moreover, when understood in light of the problems discussed in point 8, it becomes clear that ‘significance’ can be a distinctly ambiguous category for evaluating research quality. If we understand ‘significance’ as ‘academic fame’, then there is some kind of link with citation counts. However, if we understand ‘significance’ as ‘the development of the intellectual agenda of the field’ (REF panel criteria), then citation counts are not an appropriate proxy. Finally, as is well known, in fields with long citation ‘half-lives’ – particularly the arts and humanities – present research assessment cycles are far too short for the ‘significance’ of the work to emerge within citation counts, if it was ever going to do so.

13. With regard to ‘rigour’, there is also no necessary connection between citation counts and this aspect of research quality. To the extent that citation counts partly depend on how widely read a journal is, that widely read journals may apply exacting peer review standards, and that these peer reviews focus on the ‘rigour’ of a piece, there is again a potential or hypothetical link between a citation count and ‘rigour’. However, there are many intervening variables here, not least those discussed in point 6, which would disrupt the relationship between the number of times a piece is cited and how rigorous it is. Indeed, to the extent that more ‘rigorous’ pieces may be more theoretically and methodologically sophisticated – and thus less accessible to ‘lay’ academic and non-academic audiences – there are reasons to believe that the rigour of a piece might well be inversely related to its citation count. To summarise, citation counts are not a reliable indicator of rigour.

14. Overall, then, upon close examination the relationship between citation counts and our historic and current definitions of academic research quality is extremely weak in logic and problematic in practice. Notwithstanding that in certain disciplines the practice of using citations as a proxy for quality has taken hold, the practice is itself fundamentally flawed and should not be encouraged, much less institutionalised within national, international or institutional research assessment contexts.

15. That REF panellists and other academics may informally use the reputation of a journal as a quick means of judging a piece on which they are unable or unwilling to provide detailed expert opinion does not mean that this is a good idea. One argument for the use of metrics has been that quantitative and qualitative measures sometimes mirror each other. However, this may often be explained by the fact that qualitative assessments themselves take place under flawed conditions which do not involve double-blind peer review; reviewing a piece when one already knows the author and the publishing outlet tends in practice to lead to shortcut decisions which confirm the prejudices – rather than the academic judgements – reflected in citation counts.

Potential consequences of using citation metrics as an indicator for research impact and/or quality in research assessment 

16. Whilst our concerns are with the basic logic of attempting to use citation counts as a proxy for research quality and impact, there are also a number of troubling potential consequences of research assessment moving in this direction as a widespread practice. We focus here on problems of inherent conservatism, structures of academic discrimination and emerging practices of gaming/manipulation, although this is a non-exhaustive list.

17. If we use metrics as a mode of assessing research quality or impact, we potentially introduce a further conservative bias into the field by favouring the work of already-famous scholars. Whilst they may be famous for an ongoing and productive research agenda, they may also be famous simply for work produced many years ago which has generated a lot of citations. This is an indication of general ‘reputation’, but for the purposes of choosing whom or what to fund, or of evaluating a particular professional contribution, it introduces further prejudices against less established scholars, who really do need to compete on a level playing field in terms of the quality of their ideas and findings. Over time this will lead to a greater concentration of research funding and prestige in a smaller circle of people – not the most innovative researchers.

18. This problem of conservatism is compounded when we look at the systematic under-citation of women and minority groups. Recently, research drawing on the large international TRIP survey found evidence of a massive bias against citing women in the field of International Relations.[1] We also know that academics more generally carry sexist and racist biases, as evidenced through experiments on hiring processes and the judgement of academic quality.[2] On the reasonable assumption that these prejudicial attitudes drive citation count differences as well, the move to metrics and away from peer review processes would compound (or at the very least mirror) the effects of these prejudices and embed them into research assessment.

19. The last issue to consider is the gaming of citation counts. It has already been demonstrated that Google Scholar can be gamed with ease and to dramatic effect.[3] One counter-argument is that other metrics are harder to game, and that companies like Thomson Reuters police issues such as self-citation in their journal rankings. We do not argue that existing methods for measuring research quality are pure, or desirable. However, once systematic gaming sets in, it is increasingly difficult for any ranking system to keep itself ‘clean’. As long as Google Scholar remains gameable – and Google, following its commercial model, has shown no interest in trying to change that – it will also contaminate any ‘clean’ rankings, since people using Google to look for references will be presented with Google’s top-ranked articles and works first. In turn, this is likely to generate more ‘real’ citations for a piece on the basis of its gaming of the Google Scholar rankings. The closer the link between citations or altmetrics and assessments of quality, the greater the incentive for academics and their managers to game those metrics. In and of itself, this should be a decisive reason against using citation counts as a means of assessing research in any meaningful and serious way.

Conclusions

20. Overall, the academic community as a whole should resist the adoption of citation metrics as a means by which to draw conclusions about either research impact or research quality. Citation counts are not logically connected to either, contain systematic biases against particular groups of researchers, and are all too easily manipulated, particularly through corporate rankings providers. They should certainly not become institutionalised in national, international or institutional practices.

21. It is, of course, difficult and time-consuming to assess academic research by having experts read it and carefully evaluate it against complex and demanding criteria, ideally under conditions of anonymity. That is as it should be. That is the whole point of good academic work, and it cannot be automated or captured by present, or even future, citation counts. Simply because the market produces such products, and because some people use them, does not mean that they are the things we actually want or need for the purposes we have in mind. If we really are committed to using research assessment practices to fund the best-quality, most innovative and most publicly engaged work, then citation counts are not the way to do it. Rather, we will end up funding not just those whose work is genuinely transformative, original and field-defining (assuming these qualities earn them high citations), but those who are best at self-promotion and rankings manipulation, and who are privileged by existing structures of prejudice.


[1] Daniel Maliniak, Ryan Powers and Barbara F. Walter (2013), ‘The Gender Citation Gap in International Relations’, International Organization, 67, pp. 889–922. doi:10.1017/S0020818313000209. Open-access version available at: http://politicalviolenceataglance.files.wordpress.com/2013/03/the-gender-citation-gap-in-ir.pdf

[2] Ilana Yurkiewicz (2012), ‘Study Shows Gender Bias in Science is Real. Here’s Why It Matters’, Scientific American, available at http://blogs.scientificamerican.com/unofficial-prognosis/2012/09/23/study-shows-gender-bias-in-science-is-real-heres-why-it-matters/; April Corrice (2009), ‘Unconscious Bias in Faculty and Leadership Recruitment: A Literature Review’, Association of American Medical Colleges, available at http://www.hopkinsmedicine.org/diversity_cultural_competence/pdf/Unconscious%20Bias%20in%20Faculty%20and%20Leadership%20Recruitment%20A%20Literature%20Review.pdf

[3] Phil Davis (2012), ‘Gaming Google Scholar Citations: Made Simple and Easy’, Scholarly Kitchen blog, available at: http://scholarlykitchen.sspnet.org/2012/12/12/gaming-google-scholar-citations-made-simple-and-easy/

178 thoughts on “Why Metrics Cannot Measure Research Quality: A Response to the HEFCE Consultation”

  1. Wonderful letter; please add my name in support: Nicola Smith, Senior Lecturer in Political Science, University of Birmingham

    • Thanks for the insightful analysis, please add my name to the list,
      Henrike Donner, Senior Lecturer in Anthropology, Oxford Brookes University

  2. A ludicrously Gradgrindian approach which is utterly unsuited to work in the humanities/social sciences and, I suspect, to much pure/applied science too.

  3. Don’t necessarily agree with all you say, but more than happy to be associated with the overall argument. I thought this idea was (rightly) dead and buried in the Arts, Humanities and Social Sciences.

  4. Happy to support this initiative. Charles Devellennes, Lecturer in Political and Social Thought, University of Kent

  5. Reblogged this on Simone Tulumello and commented:
    The blog The Disorder of Things is sharing a response to the consultation of the UK Higher Education Funding Council about the use of metrics in research assessment.
    Are metrics (i.e. citation counts) useful for assessing “quality” (which is different from impact on the academic environment)? According to Meera Sabaratnam and Paul Kirby, they are not (and there are a lot of reasons for agreeing with them). The debate is still open, and people from the UK especially may be interested in adding an endorsement to the response or joining the debate.

  6. Pingback: Why Metrics Cannot Measure Research Quality: A Response to the HEFCE Consultation | AESOP Young Academics

  7. Excellent demonstration of the problems of citations as metric of quality in general and in the social sciences/humanities in particular. Please add me to the list. Lauren Wilcox, University Lecturer in Gender Studies, University of Cambridge.

    • I totally agree. The observations you make are by no means limited to social sciences. Please add my name.

      We probably all have our favourite examples of ignored academic work that has been rediscovered by later generations of scholars. Many important papers are not recognised for their importance at the time, and are thus not cited heavily until much later. My current favourite is from statistics.

      Almost all stats books that mention binomial confidence intervals and statistical tests employ a “Wald” model that E.B. Wilson proved was wrong in a paper in 1927. His paper was published in an exceedingly reputable journal (the Journal of the American Statistical Association), but it was effectively ignored at the time. As a result, a generation of researchers were informed by text books that never referred to Wilson’s score interval. Even now, Wilson’s interval is not widely-known.
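
      To make the difference concrete, here is a minimal Python sketch of the two intervals (these are the standard Wald and Wilson score formulas; the function names and the worked numbers are purely illustrative):

        from math import sqrt

        def wald_interval(successes, trials, z=1.96):
            # Traditional Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / trials)
            p = successes / trials
            half = z * sqrt(p * (1 - p) / trials)
            return p - half, p + half

        def wilson_interval(successes, trials, z=1.96):
            # Wilson (1927) score interval: recentred, and bounded within [0, 1]
            p = successes / trials
            denom = 1 + z**2 / trials
            centre = (p + z**2 / (2 * trials)) / denom
            half = (z / denom) * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
            return centre - half, centre + half

        # With 1 success in 20 trials the Wald interval dips below zero -- an
        # impossible value for a proportion -- while the Wilson interval does not.
        print(wald_interval(1, 20))    # approx (-0.046, 0.146)
        print(wilson_interval(1, 20))  # approx (0.009, 0.236)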

      The paper was rediscovered around 1998 and popularised retrospectively by medical statisticians, notably R.G. Newcombe.* These new papers have become foundational in what is sometimes referred to as “the new statistics”.

      However, the number of papers that have made incorrect statistical generalisations as a result of this oversight is likely to be very large!

      Sean Wallis, Senior Research Fellow, Survey of English Usage, London; UCL UCU President and UCU NEC member.

      *See also http://corplingstats.wordpress.com/2012/03/31/z-squared/

  8. Completely agree we should refuse citation metrics. Actually we should refuse the entire concept of the REF. Our colleagues abroad have learned to recognise and dismiss “REF stuffer” articles produced by academics in this country desperate to fulfil their quota. In other words, the REF actually reduces esteem for British academic work outside the UK. It doesn’t make sense to try to fix the atrocity we already live with by adding on a further atrocity.

  9. I agree with the issues raised in this letter and do not believe that metrics are a suitable way of assessing the quality and usefulness of social science research.

    Anne Roemer-Mahler, Lecturer in International Relations, University of Sussex

  10. Please add my name in support of this letter: Monica Greco, Senior Lecturer, Department of Sociology, Goldsmiths, University of London.

  11. Thank you for writing this, I agree, particularly your points around structural prejudices/biases in academia. Sarah Bulmer, Lecturer in Politics, University of Exeter.

  12. Happy to lend my name to this, thanks for such a considered response.

    Victoria Basham, Senior Lecturer in Politics, University of Exeter

  13. Great letter. Please add my name: Caroline Holmqvist, Research Fellow, Swedish Institute of International Affairs and Senior Lecturer, Swedish Defence College.

  14. Thanks for your hard work on this and on OA. More than happy to support, so please add my name to the list. Alex Prichard, Politics, University of Exeter

  15. Please add my name, Sally Connolly, Assistant Professor of Poetry, University of Houston. I was educated in the UK and one of the reasons I teach in the US is that I think the Research Assessment exercise and the metrics used to calculate these data are a nonsense.

  16. Agree with the thrust of this. Metrics will never capture quality in social sciences and humanities. Professor Emma Murphy, School of Government and International Affairs, Durham University.

  17. Great work – please add my name in support: Paul Harrison, Lecturer in Human Geography, Durham University

  18. Please add my name to the signatures, Bettina Schmidt, Director of Graduate Studies in Theology, Religious studies and Islamic studies, University of Wales Trinity Saint David and member of a REF sub-panel

  19. Thanks for putting this together and please add my name in support: Berit Bliesemann de Guevara, Senior Lecturer, Department of International Politics, Aberystwyth University

  20. When I was studying Roman Law, the most cited scholars were the ones whose arguments were brilliantly wrong. Citation is no measure of quality, sometimes it’s actually the opposite, as your paper incisively demonstrates. Bravo!

  21. This is absolutely spot on and I fully endorse it. marta iniguez de heredia, Teaching Associate, University of Cambridge.

  22. Pingback: What is the measure of scientific ‘success’? | Climate Etc.

  23. Many thanks for preparing and circulating this excellent response to an ill-conceived and dangerous plan (that seems to just keep coming back); I’d be delighted to be able to attach my name to this submission. Malcolm MacLean, Reader in the Culture & History of Sport/Associate Dean, Quality & Standards, University of Gloucestershire

  24. Please add my name. Rossella Ferrari, Senior Lecturer in Modern Chinese Culture and Language, SOAS, University of London

  25. exemplary reply to a very crude, if predictable, proposal.
    thank you, Meera.
    barbara pizziconi, Senior Lecturer in Japanese Applied Linguistics, SOAS, University of London

  26. Thanks for your work. I agree that metrics are not a suitable way to assess the quality or usefulness of humanities research, and endorse your overall line of argument. Please add my name: Pekka Vayrynen, Professor of Moral Philosophy, University of Leeds.

    • Thank you for writing this thoughtful and rigorous response for the consultation. Please add my name in endorsement of the argument against the use of metrics in judging the quality of research. Dr Mike Hayler, School of Education, University of Brighton

  27. I completely agree. Please include my name. Dr Gergely Juhász, Lecturer in Theology and Biblical Studies, Liverpool Hope University

  28. I could not agree more, and thank you for writing such a clear and compelling response. We badly need to have this letter read! Jenna Ng, Lecturer in Film and Interactive Media, University of York

  29. Pingback: Use of metrics in research assessment – University of Bath UCU

  30. Depressing that this is rearing its head again when all these reasoned objections are well-rehearsed. Count me in: Antoine Bousquet, Senior Lecturer in International Relations, Birkbeck, University of London

  31. Pingback: Metrics and the Humanities | An und für sich

  32. Pingback: Why Metrics Cannot Measure Research Quality: A ...

  33. A clear and comprehensive rebuttal of the rather weakly substantiated proposal to use bibliometrics in research quality assessment.

    I would add that gaming the system has moved on to more sophisticated reciprocal citation cabals which would be impossible to control for even among the more sophisticated ranking systems — incentivising opportunism over a commitment to quality.

  34. I endorse the comments made by Sabaratnam and Kirby. A problematic and damaging proposal from HEFCE which may well be the death knell for Arts and Humanities research in Britain, Dr. Andrew Jones, Reader in Archaeology, University of Southampton

  35. Your points 7 and 10-15 indicate (correctly, in my view) that citation counts are problematic proxies for research quality. Also worth noting is that citation counts are not even very convincing proxies for research *impact*. I hope the self-citation is forgivable under the circumstances:

    “What sort of information do citation counts and indices provide, in the broadest terms? They may be understood as proxies for research impact in the first instance, and via research impact, as indirect proxies for research quality. Irrespective of discipline, their informativeness in these roles depends on empirical assumptions that can be quite fragile. Citation impact and research quality are of course very different things, the links between which can be obscure (Mryglod et al. 2013). But even the connection between impact and citation is profoundly complicated. Some of the factors discussed in the following section provide reason to doubt that real intellectual impact implies measurable literature citations, in many contexts, while other analyses suggest that the converse is equally dubious. That is, even extensive citation need not reliably signify a real intellectual impact, inasmuch as analyses indicate an alarming degree of irrelevant citation, and identical miscitations that propagate through some literatures; it is probable that citations are often being copied and pasted without the papers themselves being read (Todd and Ladle 2008; Simkin and Roychowdhury 2003). Jointly these reflections suggest that, even in the most favourable contexts of application, citation indices convey less information about impact than one might otherwise suppose” (Kenyon 2014, pp. 251-2; ‘Defining and Measuring Research Impact in the Humanities, Social Sciences and Creative Arts in the Digital Age.’ Knowledge Organization 41.3: 249-257).

    • A brilliant response to a badly thought out and damaging proposal. Please sign my name: Prof Geraldine Harris, Lancaster University

  36. Please add my name in support of this commendable letter: Dr Paul Anderson, Research Fellow, School of Law, University of Warwick

  37. Pingback: Why Metrics Cannot Measure Research Quality | Feminist Philosophers

  38. Whilst there is much to agree with in the response and nearly everything it says is true, it fails to deal with a fundamental issue. The problem is not really whether metrics are better than peer review but whether (at least for the social sciences) the bloated process of assessment is proportionate in terms of the amount of funding distributed. I would guess that in some departments the real cost of REF preparation outweighs the income produced. There is also an issue of whether an effectively non-transparent and non-anonymous process of peer review is really a good way of assessing research quality. Both will produce anomalies and there is a question of which produces the most. Even more fundamental is the issue of the assessment of impact and whether peer review of case studies really is a rigorous mechanism of assessment. So whilst metrics are imperfect, I am not sure that rejecting them without considering the process of assessment overall really makes much sense.

    • We can, I think, easily agree that there are major flaws with the Research Excellence Framework in both principle and practice. The question of how research is to be assessed, and how it is to be rewarded, is indeed fundamental, and deserves wide and serious discussion. But HEFCE are not consulting on radical changes to that underlying system (in terms of the link between quality assessments and funds), or asking for views on what would constitute a better overall system. They are seeking views on the addition of metrics to that system.

      As we indicate, this is somewhat paradoxical since the reason for a turn to metrics cannot be both that they provide important new information and also that they mirror the results you would get from a REF for less money. The question thus becomes just how imperfect metrics are. If they promise marked improvements on current quality measures (and the REF is at least supposed to be anonymous) then there may be reasons to implement them. We have tried to suggest reasons why metrics of various kinds will fall short of that promise. The sense or otherwise of that position will depend significantly on what kind of counter-arguments can be offered for the value of citation (or similar) as a measure of quality.

  39. Wonderful, thanks very much. Hopefully PhD signatures are worth including.

    Mr James Camien McGuiggan, PhD candidate, University of Southampton.

  40. Please add my name to your statement, to which I fully agree.

    Michael Dreher, Lecturer in Mathematics, Heriot-Watt University Edinburgh

    • Worrying developments once again. Please add me to the response. Christine Berberich, Senior Lecturer in English, Centre for Studies in Literature, University of Portsmouth.

    • Alexander Jacoby, Senior Lecturer in Japanese Studies, Oxford Brookes University. I concur with your analysis and ask you to add my name to your list of supporters.

  41. A big thank you to the authors of the paper. I am in full support.
    Susan Newman, senior lecturer in economics, University of the West of England

  42. Excellent criticisms. Please add my name to the letter: Thomas Lynch, Senior Lecturer in Philosophy and Ethics, Department of Theology and Religious Studies at the University of Chichester.

  43. Excellent analysis of the problem. Happy to add my support. Brian Robertson, Reader, Dept. of Medicine, Imperial College London.

  44. I fully support this important letter. Please add my name: Patrice Haynes, Lecturer in Philosophy, Liverpool Hope University

  45. Yet another ludicrous development…Please add my name: Mark Frost, Senior Lecturer in English Literature, University of Portsmouth

  46. I am happy to endorse the views of this blog post. So do add my name onto your list: Dr Mikko Kuisma, Senior Lecturer in International Relations, Oxford Brookes University.

  47. Pingback: Should metrics be used to assess research performance? A submission to HEFCE

  48. Disclaimer: I am a member of the HEFCE Steering Group that is conducting the metrics review but I comment here as an individual — I do not speak for the committee.

    This is a very interesting and closely argued submission that makes many valid points. But there is one thing about it that has been nagging at me and I wanted to raise a question that I hope might help amplify the discussion.

    You give many particular instances of where citation counts fail to capture research quality (or impact — the two are not the same even if they may have some interrelations). For example, Huntington’s book that has over 20,000 citations but appears, for a number of reasons, to be of dubious value. That seems fair enough and it is quite right to point out other potential pitfalls in the use of citations. It is easy to see how, in the case of individual authors or individual works, citation counting can be very problematic as a proxy for quality.

    But what seems to me to be missing here (and it is perhaps ironic!) is any attempt to quantify the magnitude of the problem. This strikes me as important because the REF is not assessing individuals but departments – groups of people. When considering populations, some of the noise of measurement is averaged out by looking at the whole. For example, of all the monographs in International Relations that have received 20,000+ citations, what proportion are considered by the community of scholars to be of high quality? Do they all suffer the problems of Huntington’s tome, or is there a percentage that would be thought to be valuable? And, on average, are publications with higher citation counts considered to be better than those that attract few citations?

    If one is looking in aggregate at the output from a given department, then is there any merit in totalling citations and comparing them with the count from a department of the same discipline (and size)? Could it be that many of the known issues with citations for individual papers are washed out by treating the numbers as representative of the likely distribution of the quality of output from a relatively large group of scholars?

    I am bound to say that I don’t know the answers to these questions, though there is an interesting analysis of departmental h-indices by the psychologist Prof Dorothy Bishop that I think is worth looking at (see also http://totheleftofcentre.wordpress.com/2011/09/28/ref-prediction/).
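
    For concreteness, a departmental h-index of the sort analysed there is simple to define: it is the largest number h such that the department has h outputs each cited at least h times. A minimal sketch (the function and the per-output citation counts below are purely illustrative):

      def h_index(citation_counts):
          # Largest h such that at least h outputs each have h or more citations
          counts = sorted(citation_counts, reverse=True)
          h = 0
          for rank, cites in enumerate(counts, start=1):
              if cites >= rank:
                  h = rank
              else:
                  break
          return h

      # Hypothetical citation counts for one department's submitted outputs
      department = [120, 45, 30, 12, 9, 8, 7, 1, 0]
      print(h_index(department))  # 7: seven outputs each have at least 7 citations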

    Finally, I don’t suppose for a moment that raising this question addresses all the concerns surrounding citation metrics (such as the risk of gaming, the prejudicial effects on women or minorities). But I still think it’s a question worth discussing.

  49. Please add my name and information to the list in support of this. Lars K Hallstrom, PhD. Dept of Social Science and Alberta Centre for Sustainable Rural Communities, University of Alberta. lars.hallstrom@ualberta.ca

    There are both methodological and ideological dimensions to this issue, and to the recent challenges presented to science, peer review and “impact assessment”. In particular, we must consider (as someone who has some background in impact assessment) how to measure the impact of IDEAS… over time, culture, language and communities of practice, theory and discipline. Beyond the biases noted above, we cite from a variety of positions and stances, all of which reflect various implicit and explicit influences and impacts upon theory, method, models and practices. Citation counts are “interesting” at best, but let us also try to imagine an academic world where ALL research must have “impact” – and what it would mean to get to that bizarre point.

  50. Perhaps we might try to understand the problem before we attempt to quantify its magnitude (four responses above).

    Please add my name in support. Richard Smith, Professor of Education, University of Durham.

    • I thought the problem — of whether or not one might use citations in any meaningful estimate of research quality — had been sufficiently well stated by the authors of the post.

      My query about how common some of the possible problems mentioned as examples actually are seems a reasonable one, and I would be interested to see responses.

  51. Well put. Please add my support.
    Dr. David Porreca
    President, Faculty Association of the University of Waterloo
    Department of Classical Studies

  52. Pingback: Metrics: An Addendum on RAE / REF | The Disorder Of Things

  53. Thank you – a brilliant analysis and critique. Please add my name: Lindiwe Dovey, Africa Department, SOAS (University of London).

  54. Thanks to the authors for this. Please add my name in support: Ty Solomon, Lecturer in International Relations, University of Glasgow.

  55. Pingback: Impact of Social Sciences – Is the fear of metrics symptomatic of a deeper malaise? Fiefdoms and scapegoats of the academic community.

  56. Pingback: Who’s Afraid of the Big Bad Metric?

  57. Your excellent document stimulated me to write my own response, from the perspective of a former Medical Research Council scientist and former pension fund trustee.
    What if metrics, and indeed research assessment, were a clinical trial testing a new drug, a vaccine for the whole population, or a test for a disease? Would the design ever be approved? I suspect it wouldn’t stand a chance. There is no adequate definition or baseline measurement of what the process is designed to detect. If excellence were defined, how well would metrics detect it? A 50% success rate? 55%? 85%?
    As a pension fund trustee, I gradually came to realise that a lot of authoritative wisdom (especially on risk) was just plain wrong. However, it is entirely acceptable for a good investment manager to have a success rate of no more than 55% in choosing investments. Most do worse than a tracker fund. Would getting it right 55% of the time be acceptable for metrics?
    A paradox of investment is that if you buy shares in large, highly regarded, “excellent” companies, you will more often lose. If you buy shares in smaller, unfashionable companies, you are more likely to win. What is the evidence that rewarding “excellence” gives the best return on investment?
    Anyway, good luck with your document. I hope it isn’t dismissed as just more moaning by Arts and Humanities.

    • Thank you Michael – much appreciated! Would also love to read your own response – would you be happy to email it to me? ms140[at]soas.ac.uk

  58. Pingback: Open Science & Altmetrics Monthly Roundup (June 2014) | Impactstory blog

  59. Please add my name in agreement with this post.
    Associate Research Fellow, Department of Politics, Exeter University.

  60. Pingback: Impact of Social Sciences – The rejection of metrics for the REF does not take account of existing problems of determining research quality.

  61. Pingback: A Modest Defense of Citation Metrics

  62. Pingback: Why Metrics Cannot Measure Research Quality: A Response to the HEFCE Consultation | Allegra

  63. Pingback: Graham K. Brown

  64. Pingback: What could altmetrics add to the REF exercise? | Altmetric.com

  65. Pingback: Can Metrics Be Used Responsibly? Why Structural Conditions Push Against This | The Disorder Of Things

  66. Pingback: Impact of Social Sciences – Can metrics be used responsibly? Structural conditions in Higher Ed push against expert-led, reflexive approach.

  67. Pingback: Fantoni alla Camera: l’ASN è meritocratica e contro le baronie, no a cambiamenti sostanziali – ROARS
