Wednesday 24 June 2009

What do bibliometrics actually add to research evaluation?

Firstly, the reason that I haven't posted in an age is that I've been in Norway, interpreting seismic data for the new project I'm working on. Hopefully I can now post a bit more regularly, as I should actually be in Manchester for a few consecutive weeks, for the first time this year.

Regular readers will know that I like to whinge about the increasing use of statistical indicators (bibliometrics) to evaluate research performance. Until now, research performance in England has been evaluated by the Research Assessment Exercise (RAE), a cumbersome and involved system based around expert peer review of research. HEFCE (the body that decides how scarce research funding is allocated to English universities) is currently looking into replacing this with a cumbersome and involved system based around bibliometrics and "light-touch" peer review. To this end, a pilot exercise using bibliometrics, involving 22 universities, has been under way. An interim report on the pilot is now available.

Essentially, three approaches have been evaluated:

i) Based on institutional addresses: here papers are assigned to a university based on the addresses of the authors, as stated in the paper. This would be cheap to do, as it would need no input from the universities.

ii) Based on all papers published by authors. In this approach, all papers written by staff selected for the 2008 RAE were identified. This requires a lot of data to be collected.

iii) Based on selected papers published by authors. Again, this approach covered all staff selected for the 2008 RAE, but considered only each author's most highly cited papers (a toy sketch of the three approaches follows this list).
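To make the three options concrete, here is a minimal sketch of how the paper-selection step might look in code. Everything here is my own illustration, not the pilot's actual implementation: the Paper record, the staff lists and the "top 6" cut-off are all assumptions.

```python
# Illustrative sketch only - not HEFCE's actual methodology.
from typing import List, NamedTuple

class Paper(NamedTuple):
    title: str
    authors: List[str]     # author names
    addresses: List[str]   # institutional addresses printed on the paper
    citations: int         # citation count from whichever database is used

def by_institutional_address(papers: List[Paper], institution: str) -> List[Paper]:
    """(i) Assign papers to a university via the addresses stated on the paper."""
    return [p for p in papers if institution in p.addresses]

def all_papers_by_staff(papers: List[Paper], submitted_staff: List[str]) -> List[Paper]:
    """(ii) Every paper written by staff selected for the 2008 RAE."""
    return [p for p in papers if any(a in submitted_staff for a in p.authors)]

def selected_papers_by_staff(papers: List[Paper], submitted_staff: List[str],
                             top_n: int = 6) -> List[Paper]:
    """(iii) Only each submitted author's most highly cited papers
    (the 'top 6' model referred to in the report)."""
    selected = {}
    for member in submitted_staff:
        theirs = sorted((p for p in papers if member in p.authors),
                        key=lambda p: p.citations, reverse=True)
        for p in theirs[:top_n]:
            selected[p.title] = p  # de-duplicate co-authored papers
    return list(selected.values())
```

Note the difference in data requirements: option (i) needs nothing beyond what is printed on the papers, whereas options (ii) and (iii) both depend on knowing who was submitted to the RAE.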

For each approach, the exercise was conducted twice: once using the Web of Science (WoS) database, and once using Scopus. The results were then compared with those from the 2008 RAE.
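As for how such a comparison might be summarised, a rank correlation between the bibliometric scores and the 2008 RAE results is one obvious option. The sketch below is purely illustrative: the institution-level scores are invented, and I have no idea whether the pilot actually used Spearman correlations.

```python
# Purely illustrative comparison - all scores below are invented.
from scipy.stats import spearmanr

# Hypothetical institution-level scores (e.g. citation impact vs RAE grade profile).
rae_2008 = {"Univ A": 2.85, "Univ B": 2.60, "Univ C": 2.40, "Univ D": 2.10}
wos      = {"Univ A": 1.9,  "Univ B": 2.4,  "Univ C": 1.1,  "Univ D": 1.5}
scopus   = {"Univ A": 2.3,  "Univ B": 2.8,  "Univ C": 1.4,  "Univ D": 1.9}

insts = sorted(rae_2008)
for label, scores in (("WoS", wos), ("Scopus", scopus)):
    rho, p = spearmanr([rae_2008[i] for i in insts], [scores[i] for i in insts])
    print(f"{label} vs RAE 2008: Spearman rho = {rho:.2f} (p = {p:.2f})")
```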

Well, the results are interesting, if you like this sort of thing. It is clear that the results can be very different from those provided by the RAE, whichever method is used, although the "selected papers" method tends to give the closest match. It is also notable that the two databases give different results, sometimes radically so; Scopus seems to consistently give higher values than WoS. Workers in some fields complained that they made more use of other databases, such as the arXiv or Google Scholar (it's worth noting that the two favoured databases are proprietary, while the arXiv and Google Scholar are publicly accessible).

In general, the institutions involved in the pilot preferred the "selected papers" method, but it seems that none of the methods produced particularly convincing results. According to the report (paras 66 and 67):

In many disciplines (particularly in medicine, biological and physical sciences and psychology), members reported that the ‘top 6’ model (which looked at the most highly cited papers only) generally produced reasonable results, but with a number of significant discrepancies. In other disciplines (particularly in the social sciences and mathematics) the results were less credible, and in some disciplines (such as health sciences, engineering and computer science) there was a more mixed picture. Members generally reported that the other two models (which looked at ‘all papers’) did not generally produce credible results or provide sufficient differentiation.

One of the questions here is what is meant by "reasonable" or "credible" results. The institutions involved in the pilot seem to assume that the best results are the ones that most closely match those of the RAE. I suspect this is because the large universities that currently receive the lion's share of research funding are not going to support any system that significantly changes the status quo.

The institutions involved in the pilot seem to think that bibliometrics would be most useful when used in conjunction with expert peer review. From the report:

Members discussed whether the benefits of using bibliometrics would outweigh the costs. Some found this difficult to answer given limited knowledge about the costs. Nevertheless there was broad agreement that overall the benefits would outweigh the costs – assuming a selective approach. For institutions this would involve a similar level of burden to the RAE and any additional cost of using bibliometrics would be largely absorbed by internal management within institutions. For panels, some members felt that bibliometrics might involve additional work (for example in resolving differences between panel judgements and citation scores); others felt that they could be used to increase sampling and reduce panels’ workloads.

According to the interim report, the "best" results (i.e. those most closely matching the results of the RAE) were obtained using a methodology that would carry a similar administrative burden to the RAE. Even then, the results had "significant discrepancies". So, if the aim of the pilot was to get similar results to the RAE with a lesser administrative burden, the exercise seems to have failed on both counts. And if bibliometrics don't seem to add much to the process, it's worth considering what they might take away. For which, see my previous post...