Saturday, 21 November 2009
Gimpy has now made my day by posting this story about homeopaths concerned that the British parliament's Science and Technology Committee will conclude that there is no evidence in favour of homeopathy. Instead of marshalling the evidence and trying to make a scientific case, the homeopaths are trying an intention experiment to influence the committee in favour of homeopathy.
There's just nothing to add to this: it's sheer barking mad lunacy, and you have to think that maybe these people are their own worst enemies.
Monday, 26 October 2009
Well, every day is rainy in Manchester. Perhaps predictably, if you look at the data this conclusion is based on, you can see that there isn't exactly an astounding difference in rainfall between different days of the week. Certainly, statistically significant differences have not been demonstrated. The research has some interesting things to say about how rainfall patterns seem to have changed over the last 30 years or so: Manchester is somehow managing to get wetter. This is consistent with warmer temperatures, as more water vapour can be moved around when temperatures are higher. This is interesting stuff, and excellent for illustrating local changes in climate for a science festival. So why emphasise that Tuesday is supposedly the wettest day of the week, when the data surely don't convincingly support that? I suppose the university press office thinks that wet Tuesdays are a more interesting story than local climate changes, but I don't think I would agree.
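To make "statistically significant" a bit more concrete, here is a minimal sketch of one standard check: a chi-squared goodness-of-fit test of whether wet days are spread evenly across the week. The counts and function names are hypothetical and made up for illustration, not the Manchester data. The test also treats each day as independent, which real weather (where wet days cluster together) is not, so it is only a rough guide.

```python
# Hypothetical sketch: are wet days spread evenly across the week?
# The counts used below are made up for illustration; they are not the Manchester data.

# Critical value of the chi-squared distribution with 6 degrees of freedom
# (7 days minus 1) at the 5% significance level.
CHI2_CRITICAL_DF6_5PC = 12.592

def chi_squared_uniform(counts):
    """Pearson chi-squared statistic against the hypothesis that every day is equally wet."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)

def weekday_difference_is_significant(counts):
    """True if the spread of wet days across the week is unlikely (p < 0.05) under pure chance."""
    if len(counts) != 7:
        raise ValueError("expected one count per day of the week")
    return chi_squared_uniform(counts) > CHI2_CRITICAL_DF6_5PC

# Made-up wet-day counts, Monday to Sunday: Tuesday comes "top", but only just.
wet_days = [150, 158, 149, 152, 147, 151, 148]
print(chi_squared_uniform(wet_days), weekday_difference_is_significant(wet_days))
```

With made-up counts like these, Tuesday "wins", but the statistic is nowhere near the critical value: a day can top the table without the difference meaning anything.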
A quick Google search shows that the story has been picked up by the Express (Why Tuesday is the day you'll most likely need your umbrella) and the Telegraph (Tuesday is the rainiest day), which are clearly based around the press release. I think a better press release might have been headlined "Manchester getting rainier", which is interesting and also has the benefit of being supported by the data.
Wednesday, 23 September 2009
Friday, 18 September 2009
The key point of the report is that there is some confusion among researchers about what exactly it is they're supposed to be doing. There are conflicting and unclear messages from different bodies about what sort of research contributions are valued. The perception is that the only thing that really counts in terms of research assessment is peer-reviewed journal articles. Other contributions, such as conference proceedings, books, book chapters, monographs, government reports and so on, are not valued. As a result, the proportion of journal articles compared to other outputs increased significantly between 2003 and 2008. A couple of comments by researchers quoted in the report (p.15):
[There is] much more emphasis on peer reviewed journals …Conferences, working papers and book chapters are pretty much a waste of time … Books and monographs are worth concentrating on if they help one demarcate a particular piece of intellectual territory.
There is a strong disincentive to publish edited works and chapters in edited works, even though these are actually widely used by researchers and educators in my field, and by our students.
This is certainly the impression I get from my own field. In fact, I have been advised by senior colleagues to target high-impact journals, rather than, for example, special publications. I have never received any formal guidance on what research outputs are expected of me, but the prevailing atmosphere gives the impression that it's all about journal articles. After publishing a couple of things from my PhD, it took another three years to publish anything from my first post-doc. I worried about that: it seemed that the numerous conference contributions, internal company reports and presentations I produced over that time counted for nothing career-wise.
The report makes it clear that, in the case of the RAE, it is perception rather than reality that is causing the problem: the RAE rules meant that most outputs were admissible, and all would be treated equally. But it's perceptions that drive the way researchers respond to research assessment. Clearer guidance is needed.
An interesting point brought up by the report is how, when there is more than one author for a journal article, the list of authors is arranged. In my field, authors are typically listed in order of contribution, so I was surprised to find that this is by no means always the case. In some fields, especially in the humanities and social sciences, authors are commonly listed alphabetically. In some cases, the leader of the research group is listed first, in other cases last. And there are various mixtures of listing by contribution, grant-holding and alphabetic order. There is even a significant minority where papers based on work done by students have the student's supervisor as first author! This means that there is no straightforward way of apportioning credit to multiple authors of a paper, something that David Colquhoun has already pointed out. This is a huge problem for any system of assessment based on bibliometrics.
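To see why author order matters so much for bibliometrics, here is a small sketch comparing two ways of splitting credit for one paper: an equal split, and "harmonic counting", one scheme that has been proposed in which the k-th listed author gets weight 1/k. The author names and function names are hypothetical; the point is only that a position-weighted scheme gives wildly different answers depending on which listing convention a field uses.

```python
from fractions import Fraction

def equal_credit(authors):
    """Split credit equally, ignoring author order."""
    share = Fraction(1, len(authors))
    return {name: share for name in authors}

def harmonic_credit(authors):
    """Harmonic counting: the k-th listed author gets weight 1/k, normalised to sum to 1."""
    weights = [Fraction(1, k) for k in range(1, len(authors) + 1)]
    total = sum(weights)
    return {name: w / total for name, w in zip(authors, weights)}

# The same hypothetical paper, listed by contribution and then alphabetically.
by_contribution = ["Young", "Baker", "Adams"]  # Young did most of the work
alphabetical = sorted(by_contribution)         # ["Adams", "Baker", "Young"]

for order in (by_contribution, alphabetical):
    print(order, "-> Young's harmonic share:", harmonic_credit(order)["Young"])
print("Equal split:", equal_credit(by_contribution)["Young"])
```

Under harmonic counting, Young's share falls from 6/11 to 2/11 just because the field lists authors alphabetically, while an equal split gives everyone 1/3 and ignores contribution altogether. Neither is obviously "right".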
The report also examines how researchers cite the work of other people. Other researchers' work should be cited because it forms part of the background of the new research, because it supports a statement made in the new paper, or as part of a discussion of how the new paper fits into the context of previous research. Crucially, this includes citing work with which the authors disagree, or that is refuted or cast into doubt in the light of the new work (p.30):
Citing somebody often indicates opposition / disagreement, rather than esteem and I am as likely to cite and critique work that I do not rate highly as work I value.
So any system that relies on bibliometric indicators is likely to reward controversial science as much as good science (not that those categories are mutually exclusive, but they don't completely overlap either).
Researchers are perfectly clear that a system based on bibliometrics will cause them to change their publication behaviour: 22% will try to produce more publications, 33% will submit more work to high-status journals, 38% will cite their collaborators' work more often, while 6% will cite their competitors' work less often. This will lead to more journal articles of poorer quality, the decline of perfectly good journals that have low "impact", and corruption in citation behaviour. In general, researchers aren't daft, and they've clearly identified the incentives that would be created by such a system.
The report presents a worrying picture of research, and scientific literature, distorted by the perverse incentives created by poorly thought-out and opaque forms of research assessment. It can be argued that scientists who allow their behaviour to be distorted by these incentives are acting unprofessionally: I wouldn't disagree. But for individuals playing the game, the stakes are high. Perhaps we ought to be thinking about whether research is the place for playing games. It surely can't lead to good science.
Wednesday, 16 September 2009
Please find attached NSS results by Faculty, School and JACS Level 3 subjects. Also included is a mapping document to accompany the JACS report to assist you in understanding which programmes of study are included under each heading. The Word document, 'APPENDIX 06-Surveys - NSS Table EPS.doc' shows the data that will be included in the OPR documentation.
Please note that the data is FOR INTERNAL USE ONLY.
I have no idea what NSS, JACS or OPR mean, so this e-mail makes no sense to me whatsoever. I seem to be getting an increasing number of these things, all with acronyms I've never heard of.
Firstly, Ben Goldacre and Respectful Insolence discuss the case of two papers, recently published in Medical Hypotheses, that were so bad they were withdrawn by publishers Elsevier. Given that Elsevier happily publishes Homeopathy, the fanzine of the Faculty of Homeopathy, this should give pause for thought. Medical Hypotheses is a bit of an oddity: it does not send papers out for peer review. Rather, they are approved solely by the editor of the journal, one Bruce Charlton. It appears that many papers are approved within days, sometimes hours, of being submitted, suggesting that there is very little scrutiny of the papers.
The two papers are one by Duesberg et al., and one by Ruggiero et al., both of which seek to deny the magnitude of the AIDS crisis. Seth Kalichman of the Denying Aids blog did an experiment by sending the manuscript out for blind peer review. All three "reviewers" rejected the manuscript on the basis that it was filled with logical flaws and mis-representations of the published literature.
This Article-in-Press has been withdrawn pending the results of an investigation. The editorial policy of Medical Hypotheses makes it clear that the journal considers "radical, speculative, and non-mainstream scientific ideas", and articles will only be acceptable if they are "coherent and clearly expressed." However, we have received serious expressions of concern about the quality of this article, which contains highly controversial opinions about the causes of AIDS, opinions that could potentially be damaging to global public health. Concern has also been expressed that the article contains potentially libelous material. Given these important signals of concern, we judge it correct to investigate the circumstances in which this article came to be published online. When the investigation and review have been completed we will issue a further statement. Until that time, the article has been removed from all Elsevier databases. The Publisher apologizes for any inconvenience this may cause. The full Elsevier Policy on Article Withdrawal can be found at http://www.elsevier.com/locate/withdrawalpolicy.
The second example is a paper published in Proceedings of the National Academy of Sciences, amusingly known as PNAS. This is a venerable and respected journal, but it has a little-known wrinkle: members of the National Academy of Sciences are allowed to bypass formal peer review by "communicating" papers for other researchers. This is how the PNAS "Information for Authors" page describes the process:
An Academy member may “communicate” for others up to 2 manuscripts per year that are within the member's area of expertise. Before submission to PNAS, the member obtains reviews of the paper from at least 2 qualified referees, each from a different institution and not from the authors' or member's institutions. Referees should be asked to evaluate revised manuscripts to ensure that their concerns have been adequately addressed. The names and contact information, including e-mails, of referees who reviewed the paper, along with the reviews and the authors' response, must be included. Reviews must be submitted on the PNAS review form, and the identity of the referees must not be revealed to the authors. The member must include a brief statement endorsing publication in PNAS along with all of the referee reports received for each round of review. Members should follow National Science Foundation (NSF) guidelines to avoid conflict of interest between referees and authors (see Section iii). Members must verify that referees are free of conflicts of interest, or must disclose any conflicts and explain their choice of referees. These papers are published as “Communicated by” the responsible editor.
The paper in question was submitted via this communication process. It was written by Donald Williamson, a retired academic from the University of Liverpool, and suggests that butterflies and caterpillars originated as different species:
I reject the Darwinian assumption that larvae and their adults evolved from a single common ancestor. Rather I posit that, in animals that metamorphose, the basic types of larvae originated as adults of different lineages, i.e., larvae were transferred when, through hybridization, their genomes were acquired by distantly related animals.
The paper has been criticised on the basis that it contains no supporting data for what is, after all, a fairly extraordinary hypothesis. Not only that, but it turns out that it had previously been rejected by seven different journals.
In both Medical Hypotheses and PNAS, the defence seems to be that there needs to be some mechanism by which speculative ideas that go against current mainstream opinion can be presented and discussed. This seems fair enough, but is anything gained by publishing hypotheses that are not supported by any data, or papers that are logically flawed and contain mis-representations? In both these cases, it seems that the papers would not have been published had they been reviewed properly.
Wednesday, 26 August 2009
It shows the roads - familiar to many Mancunians - which the Soviets felt were wide enough to carry tanks including Washway Road, the Mancunian Way, and Princess Road.
I biked down Washway Road last night on my way home from a training ride, and it's quite hard to imagine columns of Soviet tanks clanking along it towards Stretford. Though I suppose if there ever had been a Soviet invasion, Washway Road would be even more of a post-nuclear wasteland than it currently appears. Apparently, this map is only 35 years old, and it's very strange to think that so recently there were plans for a Soviet Manchester.
Friday, 21 August 2009
Society of Homeopaths defends ineffective treatment for condition that kills 2 million children a year...
I'm not going to write a huge amount about this, because it is ably covered elsewhere. But the World Health Organisation finally came out today, and clearly stated that homeopathy should not be used for life-threatening conditions such as AIDS, malaria, TB and childhood diarrhoea.
The Society of Homeopaths (SoH) respond by saying that treating AIDS, malaria and TB would contravene their ethical guidelines. These guidelines are a joke, as you can see by perusing the Quackometer and Gimpy's blog. But apart from that, they say:
The Society of Homeopaths, the UK’s largest body of registered homeopaths, is concerned to learn, in an online article by the BBC (“WHO warns against homeopathy use”), that the World Health Organisation (WHO) has issued caution against the use of homeopathy for childhood diarrhoea following a letter by the charity Sense About Science.
They go on to cherry-pick and distort the research that has actually been conducted on homeopathy for childhood diarrhoea.
Well, who's surprised by that? Regular readers will be aware of just a few of the ways that homeopaths try to distort the evidence that homeopathy doesn't work. The point here is that the SoH, a supposedly professional organisation, is trying to defend a totally ineffective treatment for a disease that kills 2 million children every year. Homeopathy can't do anything to help, and using it instead of proper medical treatment could cost lives. I don't really mind if homeopaths sell ineffective sugar pills to the worried well in Alderley Edge, but this press release is delusional and irresponsible, and reasonable people should have no truck with this sort of dangerous nonsense.
Tuesday, 18 August 2009
This year I joined Stockport Clarion CC, and I've been riding the odd weekday 10-mile time-trial with no great success. But the event I was really looking forward to was the club hill climb championships. Hill climb courses are typically between several hundred yards and a couple of miles long, and they're usually steep, with gradients of 10-20%. Just to be different, ours is on the Cat and Fiddle road, between Macclesfield and the eponymous pub that stands, hemmed in by peat bog, at the road summit. That's about 6.5 miles of climbing, but at an average gradient of less than 4%. There are steeper railway bridges around here, but there's still about 335 m of height gain involved. The course starts in the outskirts of Macclesfield, opposite a bus stop, and the first 3 miles or so is a fairly steady 4.8%. The next mile and a half is very bendy, contains some short downhill sections, and is nearly flat on average. The final mile and a half takes you up onto Axe Edge Moor proper, climbing at 3.5% or so.
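For anyone wondering how 335 m of climbing over 6.5 miles comes out at "less than 4%", here's the back-of-envelope arithmetic as a quick sketch. Treating the bendy middle section as exactly 0% is my assumption, and the three sections only add up to 6 of the 6.5 miles, so the section estimate should come out a bit under the quoted total.

```python
METRES_PER_MILE = 1609.344

def average_gradient(height_gain_m, distance_miles):
    """Average gradient as a fraction (0.032 means 3.2%)."""
    return height_gain_m / (distance_miles * METRES_PER_MILE)

# Whole-course figures quoted above.
overall = average_gradient(335, 6.5)

# Section-by-section estimate as (miles, gradient); the "nearly flat" middle is assumed to be 0%.
sections = [(3.0, 0.048), (1.5, 0.0), (1.5, 0.035)]
sectional_gain_m = sum(miles * METRES_PER_MILE * grad for miles, grad in sections)

print(f"Average gradient: {overall:.1%}")                                # 3.2%
print(f"Height gain from the section estimates: {sectional_gain_m:.0f} m")  # 316 m
```

So the average is about 3.2%, and the sections account for roughly 316 m of the 335 m, which is about as close as "or so" figures deserve.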
I had never ridden up the climb before, but I had a gameplan of sorts. The first section was the bit where my light weight would give me an advantage, so I would ride hard, but not flat out: the key is not to overdo it, what with there still being 3 miles to go. I would then use the fast section to take a bit of a breather, riding at slightly less than 10-mile pace, since you don't gain much time there anyway. Then it would be eyeballs out over the last mile and a half to the summit. What gears would I use? Hell if I knew. I would figure it out as I went along.
The race was last night, and I actually really enjoyed it. The conditions could scarcely have been better. It was clear and dry, about 18 °C, and there was a moderate tailwind. There were only 8 riders, and I was off Number 2, with a slower rider starting one minute before me. I started well, settling in very quickly, and once I had emerged from the trees at the bottom of the climb I could see my minute man almost all of the time. I was clearly gaining on him, which gave me a psychological boost, and I finally reeled him in on a sharp right-hand bend about a mile and a half in, just before Walker Barn. I was first on the road now, a nice feeling, and my legs were holding up just fine. After Walker Barn, into the fast part of the course, I switched into the big ring, and tried to maintain a slightly-slower-than-10-mile-time-trial pace. My biggest problem here was gear selection. The gradient changes so often and there are so many sharp curves that I felt like I was using nearly every gear on the bike, and I was changing front rings too often: I should have picked one and stuck to it. But I still felt good, and I knew I had something left for the last mile and a half. No strategy involved here; just eyeballs out until the summit. There was a guy out for a training ride in front, and that gave me something to chase. Round the last bend, and there was the welcome sight of the Cat and Fiddle Inn. One last leg-breaking effort, and I was past the timekeeper in 26 minutes dead, gasping like a freshly landed fish. It was about a minute faster than I'd hoped for, so I couldn't have been happier.
It was glorious at the summit, with views across the Cheshire Plain and the Peak District. Unfortunately, the pub is closed on Monday nights, so there was no chance of a swift half before the 25-mile ride back home. At least the first seven miles or so were downhill.
Friday, 7 August 2009
In the BMJ study, Santiago Moreno and colleagues look at anti-depressants. This is a good area to look at, because of the availability of data that was submitted to the Food and Drug Administration (FDA) in the United States. Legal requirements enforce submission of ALL data to the FDA, so the authors consider the FDA dataset to be unbiased (although not necessarily complete). This unbiased dataset can then be compared to the data that is available in published journal articles.
The comparison is done with our old friend the funnel plot. This plots the standard error for a trial against the size of the effect that the trial found. The authors of the BMJ study added a new twist by contouring the funnel plot for statistical significance: at a glance it can be seen where studies fall in terms of statistical significance.
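The contours themselves are simple: a trial with standard error SE reaches two-sided significance at level α when |effect| exceeds z(1 − α/2) × SE, so each contour is a straight line fanning out from zero as the standard error grows. Here's a minimal sketch of sorting trials into the shaded bands, assuming effect estimates are roughly normally distributed; the function names and example trials are hypothetical and are not taken from the BMJ paper.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # effect estimates assumed approximately normal

def two_sided_p(effect, se):
    """Two-sided p-value for an effect estimate with the given standard error."""
    z = abs(effect / se)
    return 2 * (1 - STANDARD_NORMAL.cdf(z))

def contour_edge(se, alpha):
    """Effect size at which a trial with this standard error reaches significance level alpha."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2) * se

def significance_band(effect, se):
    """Which shaded region of a contour-enhanced funnel plot a trial falls in."""
    p = two_sided_p(effect, se)
    if p < 0.01:
        return "p < 0.01"
    if p < 0.05:
        return "0.01 <= p < 0.05"
    if p < 0.10:
        return "0.05 <= p < 0.10"
    return "p >= 0.10"

# Hypothetical trials as (effect size, standard error), for illustration only.
for effect, se in [(0.50, 0.10), (0.20, 0.10), (0.17, 0.10), (0.10, 0.10)]:
    print(effect, se, significance_band(effect, se))
```

If the published trials pile up just beyond the 5% edge with almost nothing in the bands inside it, that is the tell-tale cut-off described below.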
The results are dramatic. Of the 74 trials registered with the FDA, 23 were not published. In the FDA data, there is a wide spread of results across the contours marking 1%, 5% and 10% levels of significance. When only the published data are considered, there is a clear cut-off at the contour for the 5% significance level, which is typically used in clinical trials to establish statistical significance. That strongly suggests that publication bias is a serious problem in the set of published trials: trials with statistically non-significant results have been systematically excluded.
What this means is that the published literature over-estimates the benefit of anti-depressants. It doesn't show that anti-depressants don't work: meta-analysis of the FDA data still shows a beneficial effect. The point is that the real benefit (as shown by the FDA data) is less than the benefit you would expect if you looked only at the published literature. Anti-depressants work less well than you might think, but they still work.
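Here's a toy illustration of why dropping the non-significant trials inflates the apparent benefit. It uses a simple fixed-effect (inverse-variance) meta-analysis, which is my choice for the sketch and not necessarily the method used in the BMJ paper, and the trial numbers are entirely made up.

```python
import math

def pooled_effect(trials):
    """Fixed-effect (inverse-variance) meta-analysis of (effect, standard_error) pairs."""
    weights = [1 / se ** 2 for _, se in trials]
    estimate = sum(w * eff for w, (eff, _) in zip(weights, trials)) / sum(weights)
    return estimate, math.sqrt(1 / sum(weights))

def is_significant(effect, se, z_crit=1.96):
    """Two-sided significance at roughly the 5% level."""
    return abs(effect / se) > z_crit

# Hypothetical trials as (standardised effect size, standard error). Not the FDA data.
all_trials = [(0.45, 0.15), (0.30, 0.12), (0.10, 0.14), (0.05, 0.20),
              (0.35, 0.10), (0.15, 0.18), (-0.05, 0.16), (0.25, 0.11)]

# Publication bias in its crudest form: only significant trials make it into print.
published = [t for t in all_trials if is_significant(*t)]

print("All trials:      %.2f (SE %.2f)" % pooled_effect(all_trials))
print("Published only:  %.2f (SE %.2f)" % pooled_effect(published))
```

Both pooled estimates are positive, so the "drug" still works, but the published-only estimate is noticeably larger: the same pattern as the anti-depressant data, in miniature.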
The problem is, of course, that most of the time we only have the published literature to work with. So the BMJ paper adds a useful visual technique for identifying publication bias as a likely problem, even if we don't have access to a bias-free dataset for comparison. There is no doubt that biases exist in published data; the response from medical science, as seen here, is to try to identify and account for these biases. Regular readers will know that the response of CAM research is to manipulate the data in order to pretend that the problem doesn't exist.
Wednesday, 5 August 2009
First things first. Is there any evidence that homeopathy is effective for treatment of eczema? A quick search on PubMed showed this to be the most recent (2003) relevant review. Incidentally, PubMed is a freely accessible service, unlike many of the journal databases used by academics, and anyone with an internet connection can do the same thing as me and come up with the presently available evidence in a couple of minutes. Here's what the summary says:
Alternative methods are commonly used in patients with dermatologic diseases, with homeopathy being one of the most common. Homeopathy was developed by Samuel Hahnemann (1755–1843) and is based on the law of similars and the law of infinitesimals. It is a regulatory therapy where high dilutions of particular compounds are thought to induce a counterreaction in the organism. In dermatology, homeopathy is often used in atopic dermatitis, other forms of eczema, psoriasis, and many other conditions. To date, however, there is no convincing evidence for a therapeutic effect. There are only a few controlled trials, most of them with negative results. The few studies with positive results have not been reproduced. Acceptance by the patient seems largely based on counseling and emotional care rather than on objective responses to the homeopathic drugs.
Not particularly convincing, is it?
Beech is at least unlikely to do himself any harm by trying homeopathy. However, real harm can result when people use homeopathy and avoid real medicine, as this tragic case in Australia shows. Nine-month-old Gloria Thomas died after her eczema allowed an infection to get out of control. Her parents had been treating her with homeopathy instead of real medicine: they were later convicted of manslaughter by gross criminal negligence.
But the main reason that I have a bad feeling about this is that even if Beech gets better, it won't prove anything about homeopathy (and the same goes if he doesn't see any improvement, to be fair). This is essentially an uncontrolled case report, and there would be no way of showing that any improvement resulted from the homeopathy, rather than, say, the natural cyclicity of eczema, or the placebo effect. Whatever happens to Beech, it isn't going to trump the evidence from the most recent systematic review of the evidence from clinical trials. So Beech's experiment cannot add anything to what we laughingly call the "debate" over the efficacy of homeopathy.
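The natural-cyclicity point is easy to demonstrate with a toy simulation. People tend to try a new remedy when their symptoms are at their worst, so if symptoms come and go, most of them will look "better" a few weeks later whatever they took. Everything in this sketch is hypothetical: the symptom score, the 60-day flare cycle, the noise, and the threshold for "bad enough to try something" are all invented for illustration.

```python
import math
import random
from statistics import mean

PERIOD_DAYS = 60  # hypothetical length of one flare-and-remission cycle

def severity(day, phase, rng):
    """Hypothetical symptom score: a slow flare cycle plus day-to-day noise. No treatment at all."""
    return 5 + 2 * math.sin(2 * math.pi * day / PERIOD_DAYS + phase) + rng.gauss(0, 1)

def untreated_improvements(n=10_000, threshold=7.0, follow_up_day=30, seed=1):
    """Simulate people who start a useless remedy only when their symptoms are bad."""
    rng = random.Random(seed)
    changes = []
    for _ in range(n):
        phase = rng.uniform(0, 2 * math.pi)
        baseline = severity(0, phase, rng)
        if baseline < threshold:
            continue  # not bad enough to go looking for a remedy
        follow_up = severity(follow_up_day, phase, rng)
        changes.append(baseline - follow_up)  # positive means "got better"
    return changes

changes = untreated_improvements()
print(f"{len(changes)} people tried the remedy; mean improvement {mean(changes):.2f} points")
```

Every simulated person gets precisely no treatment, yet on average they improve, which is exactly why a single uncontrolled case can't tell you anything about the remedy.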
Still, good luck to him. I hope he does get some relief from his symptoms, but if he does the chances of it having anything to do with homeopathy are vanishingly small.
Wednesday, 29 July 2009
Beware the spinal trap
Some practitioners claim it is a cure-all, but the research suggests chiropractic therapy has mixed results - and can even be lethal, says Simon Singh.
You might be surprised to know that the founder of chiropractic therapy, Daniel David Palmer, wrote that '99% of all diseases are caused by displaced vertebrae'. In the 1860s, Palmer began to develop his theory that the spine was involved in almost every illness because the spinal cord connects the brain to the rest of the body. Therefore any misalignment could cause a problem in distant parts of the body.
In fact, Palmer's first chiropractic intervention supposedly cured a man who had been profoundly deaf for 17 years. His second treatment was equally strange, because he claimed that he treated a patient with heart trouble by correcting a displaced vertebra.
You might think that modern chiropractors restrict themselves to treating back problems, but in fact some still possess quite wacky ideas. The fundamentalists argue that they can cure anything, including helping treat children with colic, sleeping and feeding problems, frequent ear infections, asthma and prolonged crying - even though there is not a jot of evidence.
I can confidently label these assertions as utter nonsense because I have co-authored a book about alternative medicine with the world's first professor of complementary medicine, Edzard Ernst. He learned chiropractic techniques himself and used them as a doctor. This is when he began to see the need for some critical evaluation. Among other projects, he examined the evidence from 70 trials exploring the benefits of chiropractic therapy in conditions unrelated to the back. He found no evidence to suggest that chiropractors could treat any such conditions.
But what about chiropractic in the context of treating back problems? Manipulating the spine can cure some problems, but results are mixed. To be fair, conventional approaches, such as physiotherapy, also struggle to treat back problems with any consistency. Nevertheless, conventional therapy is still preferable because of the serious dangers associated with chiropractic.
In 2001, a systematic review of five studies revealed that roughly half of all chiropractic patients experience temporary adverse effects, such as pain, numbness, stiffness, dizziness and headaches. These are relatively minor effects, but the frequency is very high, and this has to be weighed against the limited benefit offered by chiropractors.
More worryingly, the hallmark technique of the chiropractor, known as high-velocity, low-amplitude thrust, carries much more significant risks. This involves pushing joints beyond their natural range of motion by applying a short, sharp force. Although this is a safe procedure for most patients, others can suffer dislocations and fractures.
Worse still, manipulation of the neck can damage the vertebral arteries, which supply blood to the brain. So-called vertebral dissection can ultimately cut off the blood supply, which in turn can lead to a stroke and even death. Because there is usually a delay between the vertebral dissection and the blockage of blood to the brain, the link between chiropractic and strokes went unnoticed for many years. Recently, however, it has been possible to identify cases where spinal manipulation has certainly been the cause of vertebral dissection.
Laurie Mathiason was a 20-year-old Canadian waitress who visited a chiropractor 21 times between 1997 and 1998 to relieve her low-back pain. On her penultimate visit she complained of stiffness in her neck. That evening she began dropping plates at the restaurant, so she returned to the chiropractor. As the chiropractor manipulated her neck, Mathiason began to cry, her eyes started to roll, she foamed at the mouth and her body began to convulse. She was rushed to hospital, slipped into a coma and died three days later. At the inquest, the coroner declared: 'Laurie died of a ruptured vertebral artery, which occurred in association with a chiropractic manipulation of the neck.'
This case is not unique. In Canada alone there have been several other women who have died after receiving chiropractic therapy, and Edzard Ernst has identified about 700 cases of serious complications among the medical literature. This should be a major concern for health officials, particularly as under-reporting will mean that the actual number of cases is much higher. If spinal manipulation were a drug with such serious adverse effects and so little demonstrable benefit, then it would almost certainly have been taken off the market.
Simon Singh is a science writer in London and the co-author, with Edzard Ernst, of Trick or Treatment? Alternative Medicine on Trial. This is an edited version of an article published in The Guardian for which Singh is being personally sued for libel by the British Chiropractic Association.
Friday, 24 July 2009
First of all, what does it mean to say that an article is peer reviewed? If an article is peer reviewed, it has been checked over by scientists who work in a similar field to the submitted article. A submitted article will usually be sent to two or three reviewers, who will each read the paper and submit a report to the journal editor. The editor will then decide whether the article merits publication.
On the face of it, this would seem to imply that certain standards are being met. But there is some evidence that this isn't necessarily the case. For example, this article (found via Ben Goldacre's miniblog) suggests that in Chinese journals, only 5.9-7.7% of supposed randomised controlled trials reported in peer reviewed articles had adequate procedures for randomisation. A lack of adequate randomisation means that there is a good chance of introducing bias into your trial, and it ought to be one of the first things a reviewer would check. While the article specifically addresses trials published in Chinese journals, I don't think there's any compelling reason to think that things are dramatically different in what we laughingly call the west. Anecdotally, anyone who spends time wading through journals as part of their day job will be able to come up with several examples of utterly dreadful papers that should never have been published. This is without looking at pseudojournals, such as those that concentrate on complementary and alternative medicine, where articles on quackery are peer reviewed by other quacks.
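For a sense of what a reviewer should be looking for, adequate randomisation generally means an allocation sequence that nobody can predict or fiddle, typically generated by computer. Here's a minimal sketch of one common approach, permuted-block randomisation, which keeps the arms balanced as recruitment goes on. The function name and parameters are my own illustration, and a real trial would also need allocation concealment, so that whoever enrols a patient can't see the next assignment.

```python
import random

def permuted_block_allocation(n_participants, block_size=4,
                              arms=("treatment", "control"), seed=None):
    """Allocation sequence built from randomly shuffled blocks, each balanced across the arms."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # seeded here only so the sequence can be audited and reproduced
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within each block
        sequence.extend(block)
    return sequence[:n_participants]

print(permuted_block_allocation(12, seed=2009))
```

Something as simple as this, documented in the methods section, is the kind of thing a reviewer could check in a minute.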
So, if peer review can't tell you whether a trial described as randomised is in fact randomised, what can it tell you? Does it really act as any kind of guarantee of minimum quality? I would suggest not.
That is not to say that peer review is useless as it stands. In my fairly limited experience, papers that I have submitted have always been significantly improved by peer review. But surely there's a way of making peer review "fit for purpose", to use the current jargon?
This post was prompted by a discussion at the Bad Science forum, where the idea of applying industrial-style quality assurance to journal articles was raised. This would mean that there would be some sort of checklist that a reviewer would have to go through, and this would be checked to make sure it had been done. It would not be much use to do this informally; there would need to be some formal way of doing it.
In fact, this is not too far from what already happens, in many cases. I've just got a review back in which the reviewers have answered a series of yes/no questions (in addition to their more detailed written comments). For example, "Are the interpretations and conclusions sound and supported by the interpretation provided?", and "Is the paper logically organised?". For the example of trials published in Chinese journals above, there could be a question like "Is the methodology appropriate for testing the specified hypotheses?". Again, there would have to be some checks that this had been adequately done; this is really what the journal editor should be doing. At present, I think the role of the editor is often too weak. They do little more than send out and receive reviews. This is probably not surprising, given that editors are usually working more or less voluntarily and tend to have plenty of other things that they need to do. And it is not always the case: there are many excellent editors who make a lot of effort to engage with the papers they are handling, and the reviewers' comments on them. But if the role of editors were beefed up, such that they spent time formally checking that reviews had been carried out adequately, then peer review might actually provide the quality guarantee that we seem to think it should.
That might require actually paying editors and reviewers for their time. This would be a fairly radical step, but if it led to a professionalisation of the journal reviewing and editing process it would probably be a good thing. And if it led to a reduction in the number, and an increase in the quality, of papers published, that would not be a bad thing either.
Wednesday, 1 July 2009
Perhaps unsurprisingly, this is related to a terrible homeopathy study [Hill et al., The Veterinary Record 164:364-370], this time on the treatment of skin conditions in dogs. It's another example of homeopaths continuing to do small, badly designed studies, when plenty of large and properly conducted studies, and systematic reviews and meta-analyses of those studies, show that homeopathy doesn't work. The letter I am involved in is one of three letters that were published criticising the study: they can be found, with the author's reply, at The Veterinary Record 164: 634-636 [apologies for the lack of links: there's no DOI for these that I can find]. There is also an excellent discussion of the paper, and some of the responses to it, over at JREF.
The design in this study is truly extraordinary. Initially, 20 dogs with skin problems were recruited to the study. All were treated with individualised remedies by a homeopath. In 15 cases, the dog owners reported no improvement. In 5 cases, the owners reported a significant improvement. Not looking good for homeopathy so far. Still, the five improved dogs were said to have responded well to homeopathy, and went on to phase 2, which was a proper randomised and blinded placebo-controlled trial. Unfortunately, one dog had to be euthanased before the trial could happen, and another dog's skin problems had resolved completely after the first stage, leaving only three dogs in phase 2. Supposedly, those dogs did better with homeopathy than with placebo, thus justifying, as ever, "further research".
This is possibly the easiest study to criticise that I've ever seen. Put simply, the first phase lacks a control group, so improvements cannot be attributed to homeopathy. There is simply no evidence that the five dogs recruited to phase 2 actually responded to homeopathy, rather than just improved spontaneously. Then the second phase of the trial includes only three dogs. There is no way to interpret the results of such a tiny, underpowered study. Those are the main problems, but there are others. For example, all the dogs were on some kind of conventional medication, so that cannot be ruled out as contributing to any improvement.
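To put a number on "tiny", here is a minimal Python sketch of an exact sign test, under the assumption (mine, not necessarily the analysis used in the paper) that each phase-2 dog was compared with itself on remedy versus placebo and scored simply as better or worse. Even the best possible result for homeopathy, all three dogs doing better on the remedy, cannot be distinguished from coin-tossing.

```python
from math import comb

def sign_test_p(successes: int, n: int, two_sided: bool = True) -> float:
    """Exact sign test for n paired comparisons (e.g. remedy vs placebo in
    the same dog), `successes` of which favour the treatment.

    Under the null hypothesis each comparison is a fair coin toss.
    """
    def upper_tail(k: int) -> float:
        # P(X >= k) for X ~ Binomial(n, 1/2)
        return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

    if two_sided:
        # Double the tail on the larger side of the split, capped at 1.
        return min(1.0, 2 * upper_tail(max(successes, n - successes)))
    return upper_tail(successes)  # one-sided: treatment better than placebo

print(sign_test_p(3, 3))                   # 0.25
print(sign_test_p(3, 3, two_sided=False))  # 0.125
```

With this test, a two-sided p-value below 0.05 needs at least six dogs, all favouring the remedy (`sign_test_p(6, 6)` gives 0.03125), which is twice the number that actually reached phase 2.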
The only reasonable conclusion from the study is that there is no strong evidence that homeopathy did anything for the dogs in the trial. But the paper concludes that the improvement seen in the five dogs (which again cannot be attributed to homeopathy on the basis of this study) is enough to justify further research. No doubt the paper will also be spammed all over the internet by the likes of Dana Ullman, as proof positive that homeopathy works. Hopefully the letter I'm a co-author on, along with the two other letters critical of the study that were published, will go some way to addressing that. The signs are not good, though. The original Hill et al. paper included the statement that "Different homeopathic remedies and different dilutions of the same remedy have been distinguished from each other using Raman and infrared spectroscopy, even though all should contain nothing but water", with a reference to "Rao and others, 2007" [In fact, Rao et al. did not even claim that infrared spectroscopy showed any difference]. Regular readers will know that Rao and colleagues did nothing of the sort, and that to describe their paper as "discredited" would be something of an understatement. In the world of homeopathy, discredited papers never die. They are just recycled for use with audiences who don't know that they've been discredited. I suspect that this one will be no different.
As an aside, my favourite part of this study is that "constitutional signs" of each of the dogs, as used by the homeopath to pick a remedy, are listed [Table 2 of the paper]. For dog number 16, these are listed as:
Desires chicken; oranges aggravate
A clairvoyant dog! And this was published in a respected veterinary journal.
Wednesday, 24 June 2009
Regular readers will know that I like to whinge about the increasing use of statistical indicators (bibliometrics) to evaluate research performance. Previously in England, research performance was evaluated by the Research Assessment Exercise, a cumbersome and involved system based around expert peer review of research. Currently, HEFCE (the body that decides how scarce research funding is allocated to English universities) is looking into replacing this with a cumbersome and involved system based around bibliometrics and "light-touch" peer review. To this end, a pilot exercise using bibliometrics and including 22 universities has been under way. An interim report on the pilot is now available.
Essentially, three approaches have been evaluated:
i) Based on institutional addresses: here papers are assigned to a university based on the addresses of the authors, as stated in the paper. This would be cheap to do, as it would need no input from the universities.
ii) Based on all papers published by authors. In this approach, all papers written by staff selected for the 2008 RAE were identified. This requires a lot of data to be collected.
iii) Based on selected papers published by authors. Again, this approach used all staff selected for the 2008 RAE, but only used the most cited papers.
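The difference between the "all papers" and "selected papers" models can be sketched in a few lines of Python. This is a simplification under stated assumptions: raw citation counts, an invented two-person department, and a cut-off of six papers per author taken from the report's "top 6" label quoted below. The pilot's real indicators may well have been more elaborate (for example, normalised by field), and the address-based model is not shown.

```python
def all_papers_score(papers: dict[str, list[int]]) -> float:
    """Mean citations per paper over every paper by every selected author."""
    counts = [c for cites in papers.values() for c in cites]
    return sum(counts) / len(counts)

def selected_papers_score(papers: dict[str, list[int]], top_n: int = 6) -> float:
    """Mean citations per paper over each author's top_n most cited papers."""
    counts = [c for cites in papers.values()
              for c in sorted(cites, reverse=True)[:top_n]]
    return sum(counts) / len(counts)

# Hypothetical department: citation counts for each author's papers.
department = {
    "author_a": [40, 12, 3, 2, 1, 0, 0],  # one well-cited paper, a long tail
    "author_b": [5, 5, 4],
}
print(all_papers_score(department))       # 7.2
print(selected_papers_score(department))  # 8.0
```

Dropping the uncited tail raises the score, and by how much depends on how long each author's tail is, which is one way the different models can end up ranking departments differently.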
For each approach, the exercise was conducted twice: once using the Web of Science (WoS) database, and once using Scopus. The results were then compared with those from the 2008 RAE.
Well, the results are interesting, if you like this sort of thing. It is clear that the results can be very different from those provided by the RAE, whichever method was used, although the "selected papers" method tends to give the closest results. It is also notable that the two different databases give different results, sometimes radically so; Scopus seems to consistently give higher values than WoS. Workers in some fields complained that they made more use of other databases, such as the arXiv or Google Scholar (it's worth noting that the favoured databases are proprietary, while the arXiv and Google Scholar are publicly accessible).
In general, the institutions involved in the pilot preferred the "selected papers" method, but it seems that none of the methods produced particularly convincing results. According to the report (paras 66 and 67):
In many disciplines (particularly in medicine, biological and physical sciences and psychology), members reported that the ‘top 6’ model (which looked at the most highly cited papers only) generally produced reasonable results, but with a number of significant discrepancies. In other disciplines (particularly in the social sciences and mathematics) the results were less credible, and in some disciplines (such as health sciences, engineering and computer science) there was a more mixed picture. Members generally reported that the other two models (which looked at ‘all papers’) did not generally produce credible results or provide sufficient differentiation.
One question here is what is meant by "reasonable" or "credible" results. The institutions involved in the pilot seem to assume that the best results are the ones that most closely match those of the RAE. I suspect this is because the large universities that currently receive the lion's share of research funding are not going to support any system that significantly changes the status quo.
The institutions involved in the pilot seem to think that bibliometrics would be most useful when used in conjunction with expert peer review. From the report:
Members discussed whether the benefits of using bibliometrics would outweigh the costs. Some found this difficult to answer given limited knowledge about the costs. Nevertheless there was broad agreement that overall the benefits would outweigh the costs – assuming a selective approach. For institutions this would involve a similar level of burden to the RAE and any additional cost of using bibliometrics would be largely absorbed by internal management within institutions. For panels, some members felt that bibliometrics might involve additional work (for example in resolving differences between panel judgements and citation scores); others felt that they could be used to increase sampling and reduce panels’ workloads.
According to the interim report, the "best" results (i.e. those most closely matching the results of the RAE) were obtained using a methodology that will have a similar administrative burden to the RAE. Even then the results had "significant discrepancies". So, if the aim of the pilot was to get similar results to the RAE with a lesser administrative burden, it seems that the pilot exercise has failed on both counts. If bibliometrics don't seem to add much to the process, it's worth considering what they might take away. For which, see my previous post...
Tuesday, 5 May 2009
Downtown El Tor.
Fossilised burrows in Miocene syn-rift rocks. There's a lot of this in the study area, which usually means that structures that would help to understand the depositional environment are obscured.
Part of the field area. To the right are rocks of the Precambrian basement. In the foreground, a major normal fault separates those Precambrian rocks from Nubian sandstone, Eocene carbonate units, and Miocene syn-rift calc-arenites.
Wednesday, 8 April 2009
Meanwhile, the reply by original authors Rutten and Stolper is an exercise in evasion and obfuscation, and doesn’t really address most of the points that I made. This seems to be fairly typical (and to be fair isn’t restricted to non-science like homeopathy). In their original paper, Rutten and Stolper claimed that “Cut-off values for sample size [i.e. the number of subjects in a trial, above which the trial was defined as “large”] were not mentioned or explained in Shang el al's [sic] analysis”. This is simply not true. So what do Rutten and Stolper have to say about this embarrassing error?
“Wilson states that larger trials were defined by Shang as “Trials with SE [standard error] in the lowest quartile were defined as larger trials”. According to Wilson this was done to predefine 'larger trials'. We agree with Wilson that this is indeed a strange way of defining 'larger trials', but it is perfectly possible to simply define larger studies a priori according to sample size in terms like 'above median' as we suggested in our paper. Shang et al did not mention the sensitivity of the result to this choice of cut-off value: if median sample size (including 14 trials) is chosen homeopathy has the best (significantly positive) result, if 8 trials are selected homeopathy has the worst result. In the post-publication data they mentioned sample sizes but not Standard Errors. Isn't it odd that the authors did not mention the fact that homeopathy is effective based on a fully plausible definition of 'larger' trials, but stated that it is not effective based on a strange definition of 'larger', but that this was not apparent because of missing data?”
So, nothing there about how they failed to properly read the paper to check what Shang et al.’s definition of larger trials was, while essentially accusing them of research misconduct. Instead, they shift the goalposts and decide that they don’t like the definition that was provided. Now, it certainly would be possible to define larger studies as being “above median” sample size. By doing this you would be including studies of smaller size than would be included using Shang’s definition. As is well understood, and as Shang et al. clearly showed, including studies with smaller sample size will give you more positive but, crucially, less reliable results. So I don’t think it was particularly odd that Shang et al. failed to abandon their definition of larger trials in favour of someone else’s definition, published three years later, that would inevitably lead to less reliable results. Rutten and Stolper state that using 8 larger, high quality trials gives the worst results for homeopathy: but to get a positive result, you would have to include at least 14 trials, as Ludtke and Rutten show in another paper in the Journal of Clinical Epidemiology. And, again, it was perfectly apparent what definition Shang et al. used to define larger trials: it is clearly stated in their paper.
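The two competing definitions of "larger" trials are easy to compare directly. The following Python sketch uses eight hypothetical trials and an idealised standard error for the log odds ratio, 4/sqrt(n), which holds exactly only for two equal arms with a 50% event rate; real trials will scatter around it. Selecting by lowest-quartile SE (Shang et al.'s stated definition) keeps only the biggest trials, while an above-median cut on sample size (Rutten and Stolper's suggestion) lets in smaller ones.

```python
import math
from statistics import median, quantiles

# Hypothetical trial sizes (total patients), listed smallest first.
TRIALS = [{"n": n, "se": 4 / math.sqrt(n)}  # idealised SE of log(OR)
          for n in (20, 30, 40, 60, 80, 120, 200, 400)]

def larger_by_se_quartile(trials):
    """Trials whose standard error lies in the lowest quartile."""
    q1 = quantiles([t["se"] for t in trials], n=4)[0]  # first quartile of SE
    return [t for t in trials if t["se"] <= q1]

def larger_by_median_n(trials):
    """Trials whose sample size is above the median sample size."""
    m = median(t["n"] for t in trials)
    return [t for t in trials if t["n"] > m]

print([t["n"] for t in larger_by_se_quartile(TRIALS)])  # [200, 400]
print([t["n"] for t in larger_by_median_n(TRIALS)])     # [80, 120, 200, 400]
```

The median cut doubles the number of "larger" trials by admitting ones with noticeably bigger standard errors, which are exactly the less precise trials that the funnel-plot analysis shows tend to look more favourable.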
OK, so why use standard error rather than simply using sample size directly, as Rutten and Stolper want to do? In meta-analyses, a commonly used tool is a funnel plot. This plots, for each study included in the analysis, standard error against odds ratio. The odds ratio is a measure of the size of the effect of the intervention being studied. If the value is 1, there is no effect. If it is less than 1, there is a positive effect (the intervention outperformed placebo); if it is greater than 1, there is a negative effect (placebo outperformed the intervention). The plot is typically used to identify publication bias (and other biases) in the set of trials: to simplify, if the plot is asymmetric, then biases exist. Using their funnel plot of 110 trials of homeopathy (Figure 2 in the Lancet paper), Shang et al. were able to show, to a high degree of statistical significance (p<0.0001), that trials with higher standard error show more positive results. It then makes perfect sense to screen the trials by standard error rather than sample size, because it has been demonstrated that standard error correlates with odds ratio. Of course, you could plot sample size against odds ratio, but that is not the recommended approach.
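For readers who want to see where the numbers on a funnel plot come from, here is a minimal Python sketch computing a trial's odds ratio and the standard error of its logarithm (the usual Woolf formula) from a 2x2 table. It assumes an "event" is a bad outcome, so that an OR below 1 favours the treatment as described above, and it applies no correction for zero cells. The counts are hypothetical.

```python
import math

def odds_ratio_and_se(events_treat, n_treat, events_ctrl, n_ctrl):
    """Return (odds ratio, standard error of log odds ratio) for one trial.

    An 'event' is a bad outcome, so OR < 1 means the treatment did better.
    Cells must be non-zero: no continuity correction is applied.
    """
    a, b = events_treat, n_treat - events_treat  # treatment: events, non-events
    c, d = events_ctrl, n_ctrl - events_ctrl     # control: events, non-events
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, se_log_or

# Same proportions (50% vs 75% bad outcomes), one trial ten times larger.
small = odds_ratio_and_se(10, 20, 15, 20)
large = odds_ratio_and_se(100, 200, 150, 200)
print(small)  # OR of 1/3, SE about 0.68
print(large)  # OR of 1/3, SE about 0.22
```

The effect estimate is identical, but the standard error shrinks by a factor of sqrt(10). That is why standard error works as a measure of a trial's precision on the funnel plot's vertical axis, and it also reflects event rates, not just head counts.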
Rutten and Stolper also claim to be "surprised" that one apparently positive trial of homeopathy was excluded from Shang's analysis. Since it was excluded based on the clearly stated exclusion criteria, I didn't find that surprising myself. How do Rutten and Stolper respond?
"We were indeed amazed that no matching trial could be found for a homeopathic trial on chronic polyarthritis by Wiesenauer. Shang did not specify criteria for matching of trials. We would expect the authors to explain this exclusion because Wiesenauer's trial would have made a difference in meta-regression analysis and possibly also in the selection of the eight larger good quality trials".
This routine is now wearily familiar. Someone makes a claim that Shang et al. didn’t do something, in this case specify criteria for matching of trials; I check the Lancet paper, and find that claim to be false. What did Shang have to say about matching of trials? On page 727, they say “For each homoeopathy trial, we identified matching trials of conventional medicine that enrolled patients with similar disorders and assessed similar outcomes. We used computer-generated random numbers to select one from several eligible trials of conventional medicine”. And, of course, the authors did explain why the trial was excluded; it met one of the pre-defined exclusion criteria. To me, that seems clear enough. As it stands, Rutten and Stolper’s point is nothing more than an argument from incredulity. They are amazed! Amazed that no matching trial could be found. But they haven’t actually found one to prove their point. It’s possible that this Wiesenauer trial might have made a difference to the selection of 8 large, high quality trials. But I doubt it would have made any significant difference to the meta-regression analysis, which was based on 110 trials.
Having wrongly accused Shang et al. of doing a bad thing by defining sub-groups post-hoc, Rutten and Stolper applied all kinds of post-hoc rationalisations for excluding trials they don’t like. For example, they decided to throw out all the (resoundingly negative) trials of homeopathic arnica for muscle soreness in marathon runners, on the basis that homeopathy is not normally used to treat healthy people, and these trials therefore have low external validity. I argued that Shang et al. had to include those studies, since they met the inclusion criteria and did not meet the exclusion criteria. On what basis could they exclude them? From Rutten and Stolper, answer came there none:
"Wilson's remark about prominent homeopaths choosing muscle soreness as indication is not relevant. Using a marathon as starting point for a trial is understandable from a organisational point of view, although doubt is possible about external validity. Publishing negative trials in alternative medicine journals is correct behaviour. There is, however, strong evidence that homeopathic Arnica is not effective after long distance running and homeopathy as a method should not be judged by that outcome".
Yes, publish the negative trials. But why shouldn’t the negative trials be included in a meta-analysis? Because they’re negative, and that just can’t be right? I don’t see any rationale here for excluding these trials.
Rutten and Stolper also take the time-honoured approach of arguing about statistics:
“…the asymmetry of funnel-plots is not necessarily a result of bias. It can also occur when smaller studies show larger effect just because they were done in a condition with high treatment effects, and thus requiring smaller patient numbers”.
I think this is nonsense, but anyone with more statistical knowledge should feel free to correct me. If the high treatment effects are real, then the larger studies will show them as well, and there will be no asymmetry in the funnel plot. The smaller studies are always going to be less reliable than the larger ones.
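One widely used way of putting a number on funnel-plot asymmetry is Egger's regression test: regress each trial's standardised effect (log odds ratio divided by its standard error) on its precision (1/SE); an intercept well away from zero indicates asymmetry. The Python sketch below fits that line by ordinary least squares and omits the significance test on the intercept; the trial values are hypothetical, and Shang et al.'s own analysis may have used a different method.

```python
def egger_fit(log_ors, ses):
    """Least-squares fit of y = log_or / se against x = 1 / se.

    Returns (intercept, slope). An intercept near 0 suggests a symmetric
    funnel; the slope estimates the effect seen in the most precise trials.
    """
    xs = [1 / s for s in ses]
    ys = [lo / s for lo, s in zip(log_ors, ses)]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

SES = [0.1, 0.2, 0.4, 0.8]  # hypothetical trials, most to least precise

# Every trial estimates the same modest benefit: symmetric funnel.
print(egger_fit([-0.2] * 4, SES))                # intercept ~0, slope ~-0.2
# The "benefit" grows as precision falls: asymmetric funnel.
print(egger_fit([-0.1, -0.2, -0.4, -0.8], SES))  # intercept -1, slope 0
```

In the second case the most precise trial shows almost nothing, and the apparent benefit appears only as precision falls; the non-zero intercept flags this. The test on its own cannot say why the asymmetry exists, which is precisely the point at issue with Rutten and Stolper's alternative explanation.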
Finally, Rutten and Stolper conclude that:
"The conclusion that homeopathy is a placebo effect and that conventional medicine is not was not based on a comparative analysis of carefully matched trials, as stated by the authors".
Homeopaths do want this to be true, but no matter how many times they repeat it, it continues to be false. I think the problem is that they have become fixated on the analysis of the subgroup of larger, higher quality trials, which was only one part of the analysis. The meta-regression analysis for all 110 vs 110 trials gave the same results; the analysis of the “larger, higher quality” subgroup merely lends support to those results. So after all that palaver, there’s still no reason to think that there is anything particularly wrong with the Shang et al. Lancet paper, and there is certainly no excuse for accusing its authors of research misconduct.
Friday, 20 March 2009
After a long and generally fruitless attempt at corresponding with the Elsevier production department (which has been outsourced to India, incidentally), I finally received a PDF proof of the paper in which the geological map was reproduced at A3 size. All well and good. Until the final version of the paper was published [paywall: for God's sake, don't pay $31.50 for this...if you really want a copy, e-mail me and I'll send you a PDF], and the map was back to A4 size, with much of the fine detail lost as a result.
Now, surely it isn't on for Elsevier to unilaterally make changes to an article without consulting the authors about it. I know some people who have been involved in editing this journal, and it seems they are unhappy with how it is being run by Elsevier. As Dr Aust points out, companies like Elsevier charge large amounts of money for papers, in just about the only example of publishing in which the authors don't want to be paid for producing all the content. Elsevier makes massive profits out of journal publishing, gets to hide all of the content behind ridiculous paywalls, and doesn't even do a particularly good job of the journal production. There must be a better way.
Thursday, 19 March 2009
In the RAE, departments are ranked by the proportion of research they have in five different categories, as follows:
4*: Quality that is world-leading in terms of originality, significance and rigour.
3*: Quality that is internationally excellent in terms of originality, significance and rigour but which nonetheless falls short of the highest standards of excellence.
2*: Quality that is recognised internationally in terms of originality, significance and rigour.
1*: Quality that is recognised nationally in terms of originality, significance and rigour.
Unclassified: Quality that falls below the standard of nationally recognised work. Or work which does not meet the published definition of research for the purposes of this assessment.
The three departments faced with closure had no research ranked in category 4*. According to Times Higher Education, "The university has questioned whether this is “acceptable” for a member of the Russell Group of 20 research-led institutions".
So, how did the threatened departments do overall? Here's their breakdown from the 2008 RAE (source):
Statistics: 4*, 0%; 3*, 35%; 2*, 50%; 1*, 15%; UC, 0%.
Politics and Communication: 4*, 0%; 3*, 15%; 2*, 55%; 1*, 25%; UC, 5%.
Philosophy: 4*, 0%; 3*, 25%; 2*, 60%; 1*, 15%; UC, 0%.
These results are surely not disastrously bad. In all cases, most of the research (between 70% and 85%) is ranked at 3* or 2* levels: that is, it is considered to be internationally excellent or internationally recognised. Is this really such a poor performance that it requires the closure of the departments?
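The profiles can also be boiled down to a single grade point average (GPA), the percentage-weighted mean star rating, which is one common way RAE 2008 results were summarised in league tables. A minimal Python sketch using the figures above, with unclassified work counted as zero stars (an assumption of this sketch):

```python
# RAE 2008 quality profiles from the text: star level -> % of research.
# Level 0 stands for "unclassified".
PROFILES = {
    "Statistics": {4: 0, 3: 35, 2: 50, 1: 15, 0: 0},
    "Politics and Communication": {4: 0, 3: 15, 2: 55, 1: 25, 0: 5},
    "Philosophy": {4: 0, 3: 25, 2: 60, 1: 15, 0: 0},
}

def grade_point_average(profile):
    """Percentage-weighted mean star level (0 to 4)."""
    return sum(level * pct for level, pct in profile.items()) / 100

def share_at_or_above(profile, level):
    """Percentage of research rated at `level` stars or higher."""
    return sum(pct for lvl, pct in profile.items() if lvl >= level)

for name, profile in PROFILES.items():
    print(f"{name}: GPA {grade_point_average(profile):.2f}, "
          f"{share_at_or_above(profile, 2)}% at 2* or above")
```

This gives GPAs of 2.20 (Statistics), 1.80 (Politics and Communication) and 2.10 (Philosophy), with 70% to 85% of each department's research rated internationally recognised or better.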
The threat of closure of these departments raises the question of what a university is actually for. If it only exists to receive as much research funding as possible, then closure is a perfectly sensible action. But if you consider the university as a community of scholars, with everyone (from undergraduates to professors) learning from each other, then closing these departments is going to contribute to the narrowing of the university experience for everyone. Is that really what the University of Liverpool wants to achieve? And is that what the Russell Group is supposed to be about?
Friday, 13 March 2009
It's difficult to over-emphasise the seriousness of this. Recommendations about best practice for pain management have been made on the strength of these studies. It is now not clear that those recommendations are appropriate. Until further studies are done to sort this mess out, people are going to be denied the best possible standard of care. Bad evidence has consequences.
What is particularly galling about this case is that it was not uncovered through the scientific method. Peer review didn't uncover it, and neither did a failure to independently replicate Reuben's results. In fact, it was eventually uncovered because it was noticed that Reuben did not have approval to conduct research on human subjects for two abstracts he had submitted for presentation. The scientific community has nothing to be proud of here. Fair enough, it's largely impossible for peer review to spot fraud: there has to be a degree of trust that the data presented is not simply fabricated. But fraudulent research has entered the literature, and had recommendations based on it. Make no mistake about it, this is a massive failure. It's no good saying that the scientific method ensures that such fraud will eventually be discovered: it didn't ensure it in this case, and by now the damage is done. The science based medicine community needs to urgently consider how this sort of thing can be prevented in future.
Here's an example of bad science having consequences in the real world. Take-up of the MMR (measles, mumps, rubella) vaccine fell in the 1990s, following research by Andrew Wakefield that suggested a connection between autism, bowel problems and the MMR vaccine. To say that this research has now been discredited is really an understatement. Not only was the research incompetent, but there is strong evidence that it was fraudulent too. All the authors of the article, apart from Wakefield and another who could not be contacted, retracted the interpretation that MMR was linked to autism and bowel problems. This piece of bad science has led, indirectly but inexorably, to three children in Manchester having to be hospitalised with measles (which is not a harmless childhood illness). That is why this stuff matters.
There is no evidence that the MMR vaccine causes autism. If you don't get your child vaccinated, not only are you putting them at risk, you're putting other children at risk. MMR is safe: tell your friends.
Monday, 2 March 2009
Here I am with my new friend, the stuffed polar bear that has pride of place in the entrance lobby of the StatoilHydro office. Apparently the bear doesn't have a name yet. That can be a project for my next visit (which will probably be in June). This polar bear isn't that big, but I reckon it could still rip your face off pretty good.
Saturday, 21 February 2009
This is how I looked when I found out the good news. I confess that I poured myself a generous Scotch in celebration.
I'm going to be in Norway for the next week for work reasons, so I won't be posting anything over that time, but then posting has been pretty patchy this year. Thanks to all those who have continued to read anyway.
Tuesday, 17 February 2009
The Rutten and Stolper paper, and a companion paper in the Journal of Clinical Epidemiology by Ludtke and Rutten, were the subject of a press release titled "New Evidence for Homeopathy" claiming to cast doubt on the Shang meta-analysis. Perhaps I should issue a press release titled "New Evidence Against Homeopathy". Then again, maybe it would be better titled "New Evidence Against Homeopaths".
Now, despite appearances, I have to say that the subject of meta-analyses of homeopathy is not one that particularly fascinates me. It's just that a number of prominent homeopaths have made claims that the Shang study is flawed and/or fraudulent. In checking the claims that have been made, mainly by simply checking the Shang paper and its supplementary data, I have almost invariably found that they are false. Apgaylard has found the same. I find it amazing that these false accusations have propagated across the internet and been accepted as truth, without anyone apparently doing the most basic of fact checking.
Sunday, 18 January 2009
Jeremy Sherr is a prominent homeopath who claims to treat AIDS patients in Tanzania with homeopathy, based on the usual poor quality anecdotal evidence and wishful thinking. He recently caused a stir in the badscience blogosphere, with a blog entry in which he mused about conducting a trial of homeopathy in AIDS patients. The mooted trial is transparently unethical, as pointed out by a number of sceptical bloggers (notably in the comments on Sherr's blog, and at Gimpy's blog, here and here, and The Lay Scientist, here and here), and as we'll see below. Now, Sherr likes to edit his blogposts and delete comments, but let's have a look at what Sherr had to say about the trial.
I am happy to go for a simple trial initially, treating AIDS patients who are not taking ARVs. There is no shortage of patinets who, although they have been offered ARVs, have chosen not to take them, usually because of the serious and debilitating side effects. There are plenty of statistics on ARV treatment and patients with no treatment at all that we can compare to.
Why is this unethical? I would say for three reasons. Firstly, it is a general principle of medical ethics that patients in a clinical trial should not be denied proven treatments for their condition. Clearly, in this trial AIDS patients would be denied ARVs. Sherr seems to think that this is OK because his patients have decided not to take ARVs themselves, but this is, I think, irrelevant. You would still be running a trial in which the subjects are not receiving the best possible standard of care. The issue of informed consent is again critical here; patients would need to be informed that not taking ARVs could be severely damaging to their health.
Secondly, the trial as mooted will not provide any useful information, because there is no control group. Whatever happens in the trial, it is impossible to say that it happened because of homeopathy, rather than sources of bias in the trial design. Since the trial could not provide any usable information, it would be unethical.
Thirdly, I would say we have enough evidence and knowledge about homeopathy to say that it is not going to cure AIDS. Given that there is no likelihood of a true positive result, it is unethical to involve patients in a clinical trial. Informed consent comes into play again here: patients in the trial ought to be told that the current evidence shows that there's essentially no chance of homeopathy having any beneficial effects beyond placebo.
Sherr says, in the recent blog post in which he calls the waaaaambulance over the criticism he has received, that "Any research I may undertake will be subject to rigorous ethical review of the highest standard". Hopefully that will in fact happen, in which case the mooted trial will surely not go ahead. What is disturbing is that Sherr has stated in the past, referring to research protocols and ethics review, that "You have to find willing partners and get a protocol through an ethics committee, and you need to talk their language. I hope it will work but if not, I will just go and do it on a small scale myself - I am determined to do that". This is the most telling comment, I think: it makes it clear that Sherr is not really interested in medical ethics, except as a hoop he must reluctantly jump through in order to experiment on terminally ill patients. And if he can't get ethics approval, he'll just do it anyway.
So much for Sherr. Disturbingly, however, he seems not to be anomalous in CAM circles in his total lack of any sense of ethics. A review of a book "Complementary and Alternative Medicine: Ethics, the Patient, and the Physician” has just been posted on the Science Based Medicine blog. The reviewer comments that "We do not read a word about how to approach a patient who has suffered damage due to CAM, or how to approach those who have stopped their regular treatment" [emphasis mine]. One would have thought that this would be a key issue for any book purporting to address ethics in CAM.
As one of the commenters to the Science Based Medicine piece astutely points out, CAM is a "deprofessionalization phenomenon". Researchers in the field of CAM seem to have no idea about research ethics, and no idea about the linked issue of how to conduct good research. If a text on ethics in CAM is so careless of these important ethical questions, how can we expect CAM practitioners to be any more careful?
UPDATE: There is now an excellent and comprehensive post on the Sherr saga at Respectful Insolence...
Saturday, 10 January 2009
In CAM research this pattern is often not followed. Once systematic reviews and meta-analyses start to show that there is no evidence that the CAM treatment works, more small trials of poor methodology are conducted, many of which inevitably have (spurious) positive results. This allows CAM advocates to claim that there is lots of evidence in favour of their intervention, because they don't bother to account for study size and quality.
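The word "inevitably" can be made concrete with a little arithmetic. Assuming independent trials of a treatment with no effect at all, each free of bias and tested at the conventional 5% significance level, the chance that at least one of k trials comes out "significant" is 1 - 0.95^k. A minimal Python sketch:

```python
def prob_any_significant(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent, unbiased trials of a
    treatment with no effect reaches significance at level alpha."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20):
    print(k, round(prob_any_significant(k), 3))
```

With 20 such trials the chance is about 64%. If only results favouring the treatment count, a two-sided test contributes alpha/2 per trial (`prob_any_significant(20, 0.025)` is about 0.40), and any bias from poor methodology pushes the figure up further.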
There is a fine example of this CAM tradition in the latest issue of Homeopathy. A systematic review, published in 2006, of homeopathy for treatment of allergic rhinitis concluded that "Some positive results were described in rhinitis with homeopathy in good-quality trials, but an equal number of negative studies counterbalance the positive ones. Therefore it is not possible to provide evidence-based recommendations for the use of homeopathy to treat allergic rhinitis, and further randomized controlled trials are needed". Well, perhaps: some would argue that the prior probability (close to nil) and currently existing evidence are enough to conclude that homeopathy does not work for allergic rhinitis (or, indeed, anything else). Be that as it may, it should be clear that the only useful new evidence would come from large and well-conducted RCTs. So what do Maria Goossens and a football team of colleagues do in the latest issue of Homeopathy? Why, publish a "prospective, open, non-comparative study" on homeopathy and allergic rhinitis, of course.
The methodology of the study consists of "treating" some patients suffering symptoms of allergic rhinitis with homeopathy, and getting them to fill in a quality-of-life questionnaire at the start of the study and again after three and four weeks. The physicians involved also assessed the severity of symptoms at baseline, three weeks, and four weeks. Unsurprisingly, the study found that people felt better with homeopathic treatment. But the methodological problems with this study are straightforward to point out. There is no control group. As a result, there can be no randomisation or blinding. Don't take my word for it; here's what the authors say in the discussion of their paper:
We did not distinguish between intermittent and persistent allergic rhinitis. All patients with intermittent allergic rhinitis (symptoms present less than four consecutive weeks a year) will be better after four weeks without any treatment. Patients who consult a homeopathic physician for allergic rhinitis usually have been suffering for a long time and from severe symptoms as the high level of the RQLQ score at baseline indicates. This study cannot be conclusive because there is no control group. Neither the physician, nor the patient was blinded. We cannot conclude that the degree of certainty of the physician about the appropriateness of the homeopathic prescription of a homeopathic remedy and the physician’s impression whether he had sufficient information about the patient’s condition influenced the outcome...it is not possible to draw a conclusion on the effect of the homeopathic treatment. This would require an RCT. To evaluate the effect of homeopathic treatment for allergic rhinitis an RCT should be performed.
So there you have it. The study cannot come to any useful conclusions. And, in the introduction to the paper, the authors write "This study was originally considered as a preliminary to a Randomized Clinical Trial (RCT) comparing standard conventional therapy with homeopathy (non-inferiority study). The RCT was never performed because sponsorship was withdrawn".
OK, that's life. Sometimes planned research funding fails to come off. These things happen. But why then publish the pilot study? Methodologically, it is useless, and it could never have added anything to the previously existing evidence from RCTs and systematic reviews. This study would never have been published anywhere other than a CAM journal, where scientific usefulness can take a back seat to an ideological desire to publish any evidence that looks as though it is in favour of homeopathy, no matter how methodologically weak it is, and in defiance of the higher level evidence that already exists.
Well, no-one is going to die from allergic rhinitis, so how much does it matter? The problem is that homeopaths don't stop at self-limiting conditions like hayfever. Some insist that homeopathy is a complete system of medicine and it can cure anything, including AIDS and malaria. Ben Goldacre's miniblog points to Jeremy Sherr's blog, for example, where Sherr is preparing to begin an unethical experiment on AIDS sufferers. This is a long road of madness, to be sure, but it begins where people believe they can cure hayfever through the use of magic sugar pills.
Friday, 9 January 2009
In any case, that was the last ever RAE. It has been a fairly cumbersome process, based on expert peer review of the research contributions of institutions, and it has been a real burden on the academics who have had to administer it. I’m sure there are few who will mourn its passing. Now the world of English academia is waiting, like so many rats in an experimental maze, to find out what will replace the RAE. The replacement will be a thing called the Research Excellence Framework, or REF, and at this stage the details of what it will involve are fairly sketchy. However, it will be based on the use of bibliometrics (statistical indicators that are usually based on how much published work is cited in other publications) and “light-touch peer review”.
What kind of bibliometric indicators are we talking about? Last year HEFCE (the Higher Education Funding Council for England, the body that evaluates research and decides who gets scarce research funding) published a “Scoping study on the use of bibliometric analysis to measure the quality of research in UK higher education institutions” produced by the Centre for Science and Technology Studies at the University of Leiden, Netherlands. I’ve spent a fair amount of time reading through this, and in some ways I was encouraged. It’s clear that some thought has gone into creating bibliometric indicators that are as sensible as possible: I was dreading a crude approach based around impact factors, which have already done so much damage to the pursuit of good science. The authors of the “scoping study” came up with an “internationally standardised impact indicator”: I will abbreviate this as ISII for concision. The ISII takes the average number of citations for publications for the academic unit you are interested in (this might be a research group, an academic department or an entire university), and divides it by a weighted, field-specific international reference level. The reference level is calculated by taking the average number of citations for all publications in a specific field: if the publication falls under more than one field (as many will in practice), the reference level can be calculated as a weighted average of the number of citations generated by publications in all the fields in question. So, if the ISII for your research group comes out as 1, you’re average, if above 1, better than the average, and if below 1, worse than the average. The authors of the scoping study say that they regard the ISII as being “the most appropriate research performance indicator”, and suggest that a value of >1.5 indicates a scientifically strong institution. They also suggest a threshold of 3.0 to identify research excellence. 
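To make the arithmetic concrete, here is a minimal sketch of the ISII calculation as I understand it from the scoping study: the unit's mean citations per paper, divided by a world reference level that is a weighted average of citations per paper across the unit's fields. The function names, the choice of each field's share of the unit's papers as the weights, and every number below are my own hypothetical illustration, not real data and not the Leiden centre's actual software.

```python
# Hypothetical sketch of the ISII: mean citations per paper for a unit,
# divided by a weighted, field-specific world reference level.
# Weights and all numbers are invented for illustration.

def reference_level(field_shares, world_means):
    """Weighted average of world citations-per-paper over the unit's fields.

    field_shares: share of the unit's papers in each field (assumed weights)
    world_means: world average citations per paper in each field
    """
    total = sum(field_shares.values())
    return sum(share * world_means[field] for field, share in field_shares.items()) / total


def isii(unit_citations, field_shares, world_means):
    """Mean citations per paper for the unit, relative to the reference level."""
    unit_mean = sum(unit_citations) / len(unit_citations)
    return unit_mean / reference_level(field_shares, world_means)


def classify(score, strong=1.5, excellent=3.0):
    """Apply the thresholds suggested in the scoping study (boundary handling assumed)."""
    if score > excellent:
        return "excellent"
    if score > strong:
        return "strong"
    if score > 1.0:
        return "above world average"
    return "at or below world average"


# A made-up research group: five papers, 60% geology and 40% geochemistry.
citations = [4, 10, 7, 1, 8]                     # mean = 6.0
shares = {"geology": 0.6, "geochemistry": 0.4}
world = {"geology": 4.0, "geochemistry": 6.5}    # reference = 0.6*4.0 + 0.4*6.5 = 5.0

score = isii(citations, shares, world)
print(round(score, 2), classify(score))          # 1.2 above world average
```

So this imaginary group is cited 20% more than the world average for its mix of fields, which is "better than average" but still short of the 1.5 that the scoping study associates with a scientifically strong institution.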
It seems that HEFCE expects to adopt the ISII as the main research performance indicator, according to their FAQs, where they say “We propose to measure the number of citations received by each paper in a defined period, relative to worldwide norms. The number of citations received by a paper will be 'normalised' for the particular field in which it was published, for the year in which it was published, and for the type of output”. However, they are still deciding what thresholds they will use to decide which institutions are producing high-quality research.
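The per-paper normalisation in that FAQ can be sketched in the same way: each paper's citations are divided by a world norm for its field, publication year and output type. HEFCE does not say how the per-paper scores would then be combined, so averaging them is purely my assumption here, as are the `Paper` type, the function names and all of the numbers.

```python
from typing import NamedTuple


class Paper(NamedTuple):
    citations: int
    field: str
    year: int
    output_type: str


# Hypothetical world norms: average citations per paper for each
# (field, publication year, output type). Invented numbers.
WORLD_NORMS = {
    ("geology", 2004, "article"): 6.0,
    ("geology", 2006, "article"): 3.0,
    ("geochemistry", 2006, "review"): 8.0,
}


def normalised_citations(paper, norms):
    """Citations for one paper relative to the world norm for its field, year and type."""
    return paper.citations / norms[(paper.field, paper.year, paper.output_type)]


def unit_score(papers, norms):
    """Average of the per-paper normalised scores (aggregation method assumed)."""
    return sum(normalised_citations(p, norms) for p in papers) / len(papers)


papers = [
    Paper(12, "geology", 2004, "article"),      # 12 / 6.0 = 2.0
    Paper(3, "geology", 2006, "article"),       # 3 / 3.0 = 1.0
    Paper(2, "geochemistry", 2006, "review"),   # 2 / 8.0 = 0.25
]
print(round(unit_score(papers, WORLD_NORMS), 3))  # 1.083
```

Note that averaging per-paper ratios is not, in general, the same as taking a ratio of averages: for these three papers, total citations (17) divided by the sum of the norms (17) gives exactly 1.0, not 1.083. Seemingly technical choices like this can move an institution's score.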
All well and good. If you insist that bibliometric indicators are necessary, this is probably as good a way as any of generating those data. However, there are some problems here, as well as philosophical difficulties with the entire approach.
Firstly, what is it we are trying to measure? In theory, what HEFCE wants to do is evaluate research quality. But the ISII does not directly measure research quality. Like any indicator based on citation rates, it is measuring the “impact” of the research: how many other researchers published papers that cited the research. It ought to be clear that while this should reflect quality to some degree, there are significant confounding factors. For example, research that is done in a highly active topic is likely to be cited more than research in which fewer groups are working. This does not mean that work in less active topics is of intrinsically lower quality, or even that it is less useful.
Secondly, there is an assumption that the be-all and end-all of scientific research is publication in peer-reviewed journals that are indexed in the Web of Science citation database published by Thomson Scientific. This is a proprietary database that lists articles in the journals that it indexes, and also tracks citations. The criteria for including journals are not in the public domain (although the scoping report suggests these are picked based on their citation impact, p. 43). A number of journals that I would not consider to be scientifically reputable are included. For example, under the heading of Integrative and Complementary Medicine, the 2007 Journal Citation Reports (a database that compiles bibliometric statistics for journals in the citation database) lists 12 journals, among them Evidence Based Complementary and Alternative Medicine (impact factor 2.535!) and the Journal of Alternative and Complementary Medicine (impact factor 1.526). This reinforces the point made above: it would be possible to publish outright quackery in either of these journals, have it cited by other quacks in the quackery that they publish, and get a respectable rating on the ISII. The ISII can’t tell you that this is a vortex of nonsense: it only sees that other authors have cited the work. It is also true that not all journals are included in the citation index: for example, in my own field the Bulletin of Canadian Petroleum Geology fails to make the cut, although it has always published good-quality research. Although the authors of the scoping report make clear that it is possible to expand bibliometrics beyond the citation database, this will take much more effort and it seems that HEFCE will not take this route. So we will be relying on a proprietary and opaque database to make decisions on future research funding.
A further point is that it is not clear how open access publications will be incorporated in the citation index: in principle there is no reason that this can’t happen, but can we be sure it will?
Thirdly, there is the assumption that research output can only be evaluated in terms of published articles in peer-reviewed journals. I’m not sure that this accurately reflects the actual research output of many scientists. For example, most of us put a lot of effort into presentations at scientific conferences, chapters in books, or government reports that will never make it into a citation database. In my own field, this has become a problem for the special publications of the Geological Society of London. These are volumes that collect recent research on specific topics, and they generally contain excellent research. But they aren’t included in citation databases and they have no impact factor. This has led to a lack of interest in publishing results in these special publications, because they don’t tick the right boxes in terms of publication metrics. This is surely a bad thing. A similar problem affects government open-file reports. These are not, in general, pieces of world-class, cutting-edge research. But that does not mean that they are useless or that they have no value. For example, good regional geological work can allow mineral exploration to be better targeted, benefiting the local economy. Yet that kind of work is ignored in a framework that only considers journal articles: HEFCE says only that “We accept that citation impact provides only a limited reflection of the quality of applied research, or its value to users. We invite proposals for additional indicators that could capture this”. To me, research quality and value cannot be measured by bibliometric indicators. They can only be evaluated by reading the research, understanding its context within the totality of pre-existing research, and understanding how it contributes to new understanding. That is, they can only be evaluated through peer review.
Which brings me to my fourth point: there are some questions about the role of peer review within the REF. HEFCE says that “the scoping study recommends that experts with subject knowledge should be involved in interpreting the data. It does not recommend that primary peer review (reading papers) is needed in order to produce robust indicators that are suitable for the purposes of the REF”. However, I’m not convinced that this accurately summarises what is written in the scoping report, which says “In the application of indicators, no matter how advanced, it remains of the utmost importance to know the limitations of the method and to guard against misuse, exaggerated expectations of non-expert users, and undesired manipulations by scientists themselves…Therefore, as a general principle we state that optimal research evaluation is realised through a combination of metrics and peer review. Metrics, particularly advanced analysis, provides the tools to keep the peer review process objective and transparent. Metrics and peer review both have their strengths and limits. The challenge is to combine the two methodologies in such a way that the strengths of one compensates for the limitations of the other”.
Finally, there is a hint of a conflict of interest in the preparation of the scoping report by the Centre for Science and Technology Studies: according to its website, the centre is involved in selling "products" based on its research and development in the area of bibliometric indicators. A report in favour of bibliometric indicators might allow it to drum up significant business from HEFCE.
At present, the proposals for the REF are at a fairly early stage, but the use of bibliometric indicators seems to be entrenched, and there will be a pilot exercise on bibliometric indicators this year. However, this is based on “expert advice” that consists of a single report from an organisation that makes money by creating bibliometric indicators. While academia in general might welcome the proposals on the grounds that they will be less burdensome than the RAE and give everyone more time to do research, I don’t think many academics will be kidding themselves that the bibliometric indicators involved actually tell us much about research quality and usefulness.