Fake news, lies, fraud, errors and statistics. St.Emlyn’s

One of the most inspiring talks at SMACC Dub was delivered by John Carlisle on the subject of fraud in medical publishing. It was inspiring for many reasons: the topic is of real interest to anyone who thinks that we should base our practice on the scientific method, and it was also an absolutely fabulous delivery. He’s a really inspiring presenter and if you’re looking for someone to speak at one of your conferences, give him a call.

You can listen to the talk from the SMACC site by clicking on the link below. Slides and more are here.

Anyway, this week Mike Charlesworth pointed us in the direction of a much more ambitious analysis of papers in a range of anaesthetic journals and also in JAMA and the NEJM. This is BIG NEWS folks. If you are interested in science and EBM then you really need to have a read and then spend some time reflecting on what this means. The paper is Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals [1]. There is also an excellent editorial in the same journal here [2].

The abstract is shown below, but as always we urge you to read the full paper which is open access at the moment (so there is no excuse 😉 ).

Tell me about the paper

This is a huge analysis that specifically looks at randomised controlled trials. RCTs are the foundation of evidence based medicine and rely on the fact that the patients are fairly recruited, that the data is analysed well and that they are an honest representation of what happened. Sadly we know that this is not always the case. Research errors do occur of course, but sadly there is also lots of evidence of research fraud out there. The reasons are complex but perhaps the pressure to publish and to build academic careers based on the pursuit of impact factors and glory are too great a temptation.

John Carlisle is seemingly on a mission to identify those papers where there are statistical abnormalities that suggest that things are not quite as they should be. Simply put, this paper delivers a statistical analysis of RCTs to look for those areas where the numbers don’t really add up.

What’s the technique?

You can read the details for yourself of course [3,4], and in truth it is quite tricky to work through for anyone without a reasonable grasp of statistics, but in essence the Carlisle method looks at the baseline data in RCTs. Patients are allocated to groups at random of course, that’s the point of an RCT, and therefore the baseline data should show an expected level of variability and uncertainty between groups. If that variability and uncertainty is not there then that is a concern. It could be an error, a misprint or it could mean that the data has been made up or manipulated, presumably to achieve a desired result.
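To make the idea a little more concrete, here is a minimal sketch in Python. It is not the method used in the paper (that works from the summary statistics reported in published trials); it simply simulates honest randomisation and shows that the p values for a baseline difference then spread out evenly between 0 and 1. The trial sizes, the age variable and the function names are all made up for illustration, and a simple z-test with a known standard deviation is assumed.

```python
import random
from statistics import NormalDist, fmean


def baseline_p_value(group_a, group_b, sigma):
    """Two-sided z-test p value for a difference in baseline means.

    Assumes the population standard deviation (sigma) is known,
    which keeps the sketch simple; real trials would not know this.
    """
    n_a, n_b = len(group_a), len(group_b)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_trials(n_trials=2000, n_per_arm=50, mean=60.0, sigma=10.0, seed=1):
    """Simulate honestly randomised trials (hypothetical numbers throughout)."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_trials):
        # Both arms are drawn from the same population, as randomisation intends.
        arm_a = [rng.gauss(mean, sigma) for _ in range(n_per_arm)]
        arm_b = [rng.gauss(mean, sigma) for _ in range(n_per_arm)]
        p_values.append(baseline_p_value(arm_a, arm_b, sigma))
    return p_values


p_values = simulate_trials()
fraction_below_005 = sum(p < 0.05 for p in p_values) / len(p_values)
fraction_above_095 = sum(p > 0.95 for p in p_values) / len(p_values)
print(f"p < 0.05: {fraction_below_005:.3f}   p > 0.95: {fraction_above_095:.3f}")
```

With honest randomisation roughly 5% of trials should land in each of those bands. A collection of trials with far more than that at either extreme is the kind of pattern that would raise an eyebrow.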

Of course the thing about randomness is that it is random and therefore unexpected results are really to be expected and so it’s always possible that a really bizarre result may occur by chance. We need to be careful, then, to set the bar sufficiently high that it’s REALLY unlikely that the baseline data occurred by chance. Indeed the paper reveals some reasons why extreme examples might occur (as a result of mislabelling of SD as SE for example). However, the distribution of some of the baseline data is really unlikely. Incredibly unlikely in fact, and thus we really have to ask why.

The result is expressed as a p value. Carlisle has stratified the papers according to how unlikely the results are to have arisen by chance. At conventional levels of significance he found reasonable numbers of unexpected results (which is what you would expect at a fairly high level), but there is an excess of papers at the very extreme levels. For example he found that about 2% of papers had distributions that should have occurred on fewer than 1 in 10,000 occasions. That simply does not add up at all and is highly suspicious.
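A quick back-of-the-envelope check shows why that figure is so striking. The sketch below takes the numbers quoted in this post at face value (5087 trials, a 1 in 10,000 threshold and "about 2%"), so the observed count is an approximation rather than a figure lifted from the paper, and the Poisson calculation assumes the trials are independent of each other.

```python
import math


def expected_extreme_trials(n_trials, threshold):
    """Expected number of trials beyond the threshold if chance alone is at work."""
    return n_trials * threshold


def log10_poisson_pmf(k, lam):
    """log10 of the Poisson probability of exactly k events when lam are expected."""
    return (k * math.log(lam) - lam - math.lgamma(k + 1)) / math.log(10)


n_trials = 5087           # trials analysed, from the paper's title
threshold = 1 / 10_000    # "1 in 10,000 occasions"
observed_fraction = 0.02  # "about 2%", the rough figure quoted in this post

expected = expected_extreme_trials(n_trials, threshold)
observed = round(n_trials * observed_fraction)
excess_ratio = observed / expected
log10_chance = log10_poisson_pmf(observed, expected)

print(f"expected by chance: {expected:.2f} trials")
print(f"observed (approx.): {observed} trials")
print(f"excess: about {excess_ratio:.0f} times more than expected")
print(f"log10 of the chance of exactly that many: {log10_chance:.0f}")
```

Chance alone would give about half a trial at that level, against roughly a hundred observed. Whatever the explanation for each individual paper, it is not plausible that luck accounts for all of them.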

We know that research fraud exists. John’s original work, an incredible analysis of 168 RCTs in 2012 [5,6], uncovered fraud, and later studies identified other fraudulent publications. Have a look at the Retraction Watch website and in particular the leaderboard here. Of note is the infamous Yoshitaka Fujii who has 183 retractions to his name. The bottom line here is that fraud does take place and we should be worried about it, and it’s fantastic that people like John are working to hold fraudsters to account.

Which trials?

Whilst the trials are not listed in the paper, the publication details (year, volume, pages) are, and so you can go back and work out which are which. I’ve not done this myself but there is a great blog on the paper that has sourced some of the articles into a bit of a league table. Visit the excellent blog here for more. Note that some of these trials have been very influential indeed. The blog also has responses from some of the major journals. They are being cautious but it’s clear that they are worried.

We should also be a little cautious in pointing fingers and crying foul at anyone on the list. As Nick Brown discussed on this blog, there are some concerns amongst some commentators about the methods and conclusions, and I have to admit that a detailed critical appraisal of the method is beyond my stats abilities. I expect that we will hear more in the next few weeks and months. There is an excellent set of comments and explanations from John on Nick Brown’s blog and that’s worthy of a read too.

What happens now?

The journal has provided a list of studies ranked by how unlikely they think the results are, and logically we should look to that list first to identify papers that look especially worrisome. The question is how far do we go down the list? If we only look for the most extreme results, those reported as p<0.0001 for example, it may give us a specific test, but many may slip through. Similarly, if a less onerous level is chosen then inevitably some honest researchers may be subjected to an uncomfortable and potentially career changing investigation.
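The trade-off can be put into rough numbers. The sketch below assumes, purely for illustration, that every one of the 5087 trials is entirely honest and that baseline p values from honest trials are spread evenly between 0 and 1. It then counts how many honest trials each threshold would flag by chance alone. The thresholds are examples, not ones proposed by the journal.

```python
def expected_false_alarms(n_trials, threshold):
    """Honest trials expected to fall below the threshold by chance alone.

    Assumes p values from honest trials are uniformly distributed.
    """
    return n_trials * threshold


N_TRIALS = 5087  # from the paper's title

false_alarms = {
    threshold: expected_false_alarms(N_TRIALS, threshold)
    for threshold in (0.05, 0.01, 0.001, 0.0001)  # example thresholds only
}

for threshold, count in false_alarms.items():
    print(f"threshold p < {threshold}: about {count:.1f} honest trials flagged")
```

A lenient threshold would sweep up hundreds of honest trials, while the strictest one flags fewer than one by chance but will inevitably miss problems that sit further up the list.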

Should we put a moratorium on time, as some journals have who decline to re-examine papers more than 10 years old? Should we put the responsibility to check on the editors and journals, or is this a responsibility for research institutions, universities, hospitals or even the professional regulator? The editorial and paper appear to suggest that this is one for the editors and journals, with Anaesthesia declaring back in 2016 that it would screen all RCTs using this method [7], but that may just push submissions away from this journal and onto those without the will or the means to conduct this sort of analysis (Ed – there are so many journals out there of course). Personally I’m not so sure that this is an entirely editorial issue. I suspect and anticipate that if fraud is suspected (once obvious typographical and typesetting errors are resolved) then this is arguably more a regulator and employment issue than an editorial one. As suggested in the paper, research fraud appears to be addictive and there are numerous examples of repeat offenders and even fraud networks [8], although the scale of this may be hidden from the regular literature [9,10]. Perhaps we should pay special attention to any researcher or group that has more than one paper with concerning data. A single random result may well be random, but if they crop up on a regular basis in repeated publications then that would certainly be worthy of investigation.
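The point about repeat appearances can be illustrated with a simple binomial calculation. This is only a sketch: the 20 papers and the 1 in 100 threshold are hypothetical numbers chosen for illustration, and it assumes that each paper from a group is an independent, honest trial.

```python
from math import comb


def prob_at_least(k, n_papers, threshold):
    """Chance that at least k of n honest papers fall below the threshold.

    Binomial model: each paper independently has probability `threshold`
    of being flagged by chance.
    """
    fewer_than_k = sum(
        comb(n_papers, i) * threshold**i * (1 - threshold) ** (n_papers - i)
        for i in range(k)
    )
    return 1 - fewer_than_k


N_PAPERS = 20     # hypothetical output of one research group
THRESHOLD = 0.01  # hypothetical flagging threshold

one_or_more = prob_at_least(1, N_PAPERS, THRESHOLD)
two_or_more = prob_at_least(2, N_PAPERS, THRESHOLD)
three_or_more = prob_at_least(3, N_PAPERS, THRESHOLD)

print(f"at least 1 flagged: {one_or_more:.4f}")
print(f"at least 2 flagged: {two_or_more:.4f}")
print(f"at least 3 flagged: {three_or_more:.4f}")
```

Under these assumptions a single flagged paper from a productive group is not that surprising, but each additional one makes chance a much less comfortable explanation.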

In the UK research fraud is a serious probity issue and there are several examples of doctors being struck off or censured for research fraud. My feeling is that there may be many researchers on the list who might be nervously fingering their registration documents this week. On the other hand it’s entirely possible that researchers may find their name on one of the lists without any underlying fraud. Whatever happens this is going to cause angst for the guilty, the innocent and those that just don’t know.

Looking to the future then perhaps we need to consider how this kind of analysis might be a requirement prior to publication, and of course making source data available for review may be another strategy to combat errors (but that presents a whole other set of concerns around data ownership).

No doubt there will be other papers looking at different topic areas (emergency medicine and critical care anyone?), and we might see further comparisons, for example between trials with and without explicit data monitoring committees. Trials with data monitoring committees would presumably have less opportunity for fraud, but perhaps not if the committees were not sighted on, or aware of, a need to look for problems.

Does this paper open a Pandora’s box of research error and fraud, or are there genuine errors or statistical explanations? I don’t know for sure, but I’m sure it will generate a lot of debate and it will no doubt reveal many errors and almost certainly some fraud. It may also be a real problem for all of us who seek to practice EBM. We have all seen how politicians can dismiss ‘science’ as fake news for their own ends and it’s easy to see how a paper like this could be misused to denigrate all science and the hard work of research teams across the world. We all need to be careful not to extend the conclusions of this paper beyond the true scope of the analysis.

If you have an interest in evidence based medicine or science in general then you really must read this paper; then take a moment to reflect on what this means for us all. In John Carlisle’s comments on Nick Brown’s blog he suggests that many papers on the list will be a result of error rather than fraud, but that within some of the data signals there will be the opportunity to detect fraud. How much and how often is as yet uncertain. No doubt this topic will come up in a few weeks on stage at the Tempodrom during the Publishing and the future of critical care knowledge dissemination redux debate in Berlin (Ed – part 1 from Dublin is here). There will be lots to talk about then, including this paper, and also the question of how we spot fraud and error in non-RCTs where this sort of analysis cannot take place.

Finally, and just in case you are some kind of weird conspiracy theorist, I think it’s important to state that it’s really very, very, very unlikely for John Carlisle to have faked his own data!




Great blog from forbetterscience here

Another fab blog from Nick Brown here that questions some of the methodology

1. Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. June 2017. doi: 10.1111/anae.13938
2. Loadsman JA, McCulloch TJ. Widening the search for suspect data – is the flood of retractions about to become a tsunami? Anaesthesia. June 2017. doi: 10.1111/anae.13962
3. Carlisle JB, Dexter F, Pandit JJ, Shafer SL, Yentis SM. Calculating the probability of random sampling for continuous variables in submitted or published randomised controlled trials. Anaesthesia. 2015;70(7):848-858. doi: 10.1111/anae.13126
4. Pandit JJ. On statistical methods to test if sampling in trials is genuinely random. Anaesthesia. 2012;67(5):456-462. doi: 10.1111/j.1365-2044.2012.07114.x
5. Carlisle JB. The analysis of 168 randomised controlled trials to test data integrity. Anaesthesia. 2012;67(5):521-537. doi: 10.1111/j.1365-2044.2012.07128.x
6. Carlisle JB, Loadsman JA. Evidence for non-random sampling in randomised, controlled trials by Yuhji Saitoh. Anaesthesia. 2016;72(1):17-27. doi: 10.1111/anae.13650
7. Klein A. What Anaesthesia is doing to combat scientific misconduct and investigate data fabrication and falsification. Anaesthesia. 2017;72(1):3-4.
8. Leistedt SJ, Linkowski P. Fraud, individuals, and networks: A biopsychosocial model of scientific frauds. Science & Justice. 2016;56(2):109-112. doi: 10.1016/j.scijus.2016.01.002
9. Seife C. Research Misconduct Identified by the US Food and Drug Administration. JAMA Intern Med. 2015;175(4):567. doi: 10.1001/jamainternmed.2014.7774
10. Fanelli D. How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data. Tregenza T, ed. PLoS ONE. 2009;4(5):e5738. doi: 10.1371/journal.pone.0005738
11. Longo DL, Drazen JM. Data Sharing. N Engl J Med. 2016;374(3):276-277. doi: 10.1056/nejme1516564


  1. David Hartin

    Thanks Simon.

    For once I have read the paper & the accompanying blog posts. I’m afraid my statistical powers are inferior to most so I made heavy weather of most of the paper (I’m not even going to Google what the Kolmogorov-Smirnov test is, though it sounds like it should be a brand of premium vodka…) but the implications at the bedside level are stark. What are we to believe? If research fraud (or more generously, ‘error’) is as endemic as this study indicates it is, this calls into question every pill, every infusion, every treatment that we recommend. Our guidance, national & local, may be based on data that is worthless. I’d hope that NICE are already planning on urgently reviewing therapeutic guidance & the raw data it’s based on. These are lives and public money we’re talking about here. The ramifications, if even only a small percentage of this can be extrapolated to the entire medical publishing world, are, well, gigantic.

    I have been becoming somewhat of a #therapeuticnihilist over the past few years, mainly as a result of superficial reading of the ‘evidence’ for what we do and the downright sketchy nature of most of it. I’m a persuader for doing less, as little as we can get away with because mostly, our patients will get better (or worse) without us – or despite us. Is Carlisle’s paper a first step on a path to a revolution (or revelation), to enlightenment about the nature of contemporary clinical medicine? Possibly, I guess we’ll have to wait for more retrospective reviews to decide but I’m strapping myself in.

    1. Simon Carley (Post author)

      Thanks Dave, I’m not quite so down about it. Some of the errors found may not actually affect the results of the trial or the validity of the findings (if they are simple typos for example). So I think we don’t yet know how concerned we should be. I think we’ve always known that fraud is out there, it’s just that we now have a way of semi-automatically looking for it.

      As for Kolmogorov-Smirnov it’s one of my favourite statistics 😉 Basically it can be used as a test of whether data is likely to be normally distributed or not. Useful to do when you are considering tests that assume a normal distribution (e.g. t-tests). Other tests for this are available.

      A long time ago I did an MPhil in Biostatistics and Clinical Epidemiology….. I’ve forgotten a fair bit, but the Vodka test seems to have hung in there for some reason 😉


  2. Youri Y

    “emergency medicine and critical care anyone?” Done in EM and under review in an EM journal!

    1. Simon Carley (Post author)

      Excellent 🙂


  3. pedrinhadeazucar

    There is at least one naysayer :
    But I fail to elaborate a personal and versed opinion on the issue.

    Eager for comments.


Thanks so much for following. Viva la #FOAMed
