Uri Simonsohn’s “secret” paper describing the analyses he used to detect fraud in the Dirk Smeesters and Larry Sanna cases has now been submitted for publication and is available on SSRN. It’s titled “Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone.” Simonsohn explains the analyses he used to detect and confirm the fraud and calls on journals to make the publication of raw data their default policy. Here’s the abstract:

I argue that journals should require authors to post the raw data supporting their published results. I illustrate some of the benefits of doing so by describing two cases of fraud I identified exclusively through statistical analysis of reported means and standard deviations. Analyses of the raw data provided important confirmation of the initial suspicions, ruling out benign explanations (e.g., reporting errors; unusual distributions), identifying additional signs of fabrication, and also ruling out one of the suspected fraudster’s explanations for his anomalous results. If we want to reduce fraud, we need to require authors to post their raw data.

Simonsohn also explains the measures he took to avoid making false accusations: replicating his analyses across multiple papers by a single author, analyzing the raw data, contacting the authors directly about his concerns, and only then expressing those concerns, discreetly, to those entrusted with these sorts of investigations. The paper makes it very clear that Simonsohn took a thoughtful and conservative approach at every turn and went to great pains to rule out the possibility of a perfectly innocent explanation for the statistical anomalies he discovered. Here are the measures he took, in point form:

- Replicate analyses across multiple papers before suspecting foul play by a given author,
- Compare suspected studies to similar ones by other authors,
- Extend analyses to raw data,
- Contact authors privately, transparently, and give them ample time to consider your concerns,
- Offer to discuss matters with a trusted statistically savvy advisor,
- Give the authors more time,
- If after all this suspicions remain, convey them only to entities tasked with investigating such matters, and do so as discreetly as possible.

I’m sure there will be much discussion of the techniques Simonsohn employed, and of whether more fraud will be uncovered by Simonsohn or others in the weeks and months to come. As I argued here, fraud is a problem that psychologists need to take very seriously, but we need to take just as seriously the much less egregious misdemeanors that are committed, often unintentionally, on a regular basis. Luckily, we have someone like Uri Simonsohn leading the charge on both fronts. Hopefully journals will act on Simonsohn’s recommendation to publish raw data, as well as on some of the solutions he put forward with his co-authors in *Psychological Science* last year aimed at reducing false-positive findings. For more discussion of other important reforms to consider, see this list compiled by Chris Chambers.

## 14 comments

## 1 ping


## Sanjay Srivastava

July 20, 2012 at 5:44 pm (UTC -6)

Many interesting things in that paper; there’s going to be a lot to chew on. One thing that stood out is that the analysis contradicts Smeesters’ claim (made in the Erasmus report) that he had real data and only dropped subjects. It looks much more likely that Smeesters fabricated the entire dataset manually.

## Dave Nussbaum

July 20, 2012 at 7:41 pm (UTC -6)

I think that’s exactly right. When I get a chance, I’ll post an update with a little more on the stats. In particular, I think one really easy-to-understand analysis of Smeesters’ WTP studies makes it obvious that this was not just a matter of dropping a few uncooperative data points.

## Vick

July 20, 2012 at 6:53 pm (UTC -6)

Strange: when the paper was first announced, Simonsohn said it would cover four cases, namely Stapel, Smeesters, and two others, one of which we now know is Sanna. Where’s the analysis of Stapel and of the fourth (so far unnamed) case?

## Joanna Anderson

July 21, 2012 at 1:02 pm (UTC -6)

Thank you for posting that, Dave. Simonsohn’s paper was an interesting, and actually reassuring, read. It definitely does come across that Smeesters must have done more than simply fudge or remove a few data points, which assuages some of my witch-hunt concerns. I’ll be interested to read your followup on this.

Jo

## Dave Nussbaum

July 22, 2012 at 10:04 am (UTC -6)

Thanks, Joanna. I’ve posted a brief follow-up; I’d love to hear your thoughts. Dave

## Dorothy Bishop (@deevybee)

July 22, 2012 at 3:28 am (UTC -6)

I’ve written a little R program that illustrates the first part of Simonsohn’s method, which can be found here:

http://tinyurl.com/bobasdw

## Dave Nussbaum

July 22, 2012 at 10:06 am (UTC -6)

Thanks for sharing the link Dorothy — I would love a quick explainer of what the program does.

## Stephane

July 23, 2012 at 9:50 pm (UTC -6)

I am surprised the paper does not mention Benford’s law. It has been used extensively to detect manipulated data.

## Dave Nussbaum

July 24, 2012 at 8:45 pm (UTC -6)

Here’s a link to the Wikipedia page for Benford’s Law if anyone’s curious. Also, here’s a paper on using it to detect scientific fraud.
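For anyone curious how a Benford check works in practice: Benford’s law predicts that in many naturally occurring datasets spanning several orders of magnitude, the leading digit d appears with probability log10(1 + 1/d), so about 30% of values start with 1 and under 5% start with 9. Below is a minimal R sketch of a first-digit check using a chi-square goodness-of-fit test. The function names are hypothetical, and the sketch assumes nonzero, unbounded values; the law is not expected to hold for bounded measures such as rating scales, so this is a different tool from the SD-based analyses in Simonsohn’s paper, not a substitute for them.

```r
# Hypothetical helper functions illustrating a Benford first-digit check.
# Assumes `x` holds nonzero numbers spanning several orders of magnitude.

# Benford probabilities for leading digits 1..9: log10(1 + 1/d)
benford_expected <- function() log10(1 + 1 / (1:9))

# Leading digit of each value, read off its scientific-notation string
first_digit <- function(x) {
  x <- abs(x[x != 0])
  as.integer(substr(sprintf("%e", x), 1, 1))
}

# Chi-square goodness-of-fit test of observed leading digits against Benford
benford_test <- function(x) {
  observed <- tabulate(first_digit(x), nbins = 9)
  chisq.test(observed, p = benford_expected())
}

# Usage: log-uniform data follow Benford's law; uniform data on (1, 10) do not
set.seed(1)
benford_like <- 10^runif(5000, min = 0, max = 4)
flat_digits  <- runif(5000, min = 1, max = 10)
benford_test(benford_like)$p.value  # p-value for Benford-conforming data
benford_test(flat_digits)$p.value   # essentially zero
```

Reading the leading digit from the `%e` string avoids the floating-point edge cases of computing `floor(x / 10^floor(log10(x)))` directly.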

## Stephen

August 1, 2012 at 4:56 am (UTC -6)

Hi all,

This is a very interesting blog and an important topic. Using Dorothy’s R code, I ran some simulations testing the fraud-detection procedure suggested in the paper. Specifically, I tested whether the procedure can detect the most basic form of fraud (see the page of Richard Gill at Leiden University), namely deleting the lowest values from one sample (e.g., the experimental group) and the highest values from another sample (e.g., the control group). Across different sample sizes, I found that the performance is not as good as I thought (only about 4% of the fraud cases are detected). I was wondering whether any of you have also run some simulations and want to share the results?

I will attach the R-code below.

Best, Stephen

```r
## R-code

## Dorothy's code as a function
# Returns the proportion of simulated datasets whose group SDs are more
# similar (lower SD of SDs) than the reported ones; a low value is suspicious.
SimonsohnDorothy <- function(mymean, mySD, myN, myiter = 100000) {
  # determine number of groups and critical SD (the SD of the reported SDs)
  ngroup <- length(mymean)
  critSD <- sd(mySD)

  # basics for the simulation: every group is simulated with the average SD
  meanSD <- mean(mySD)
  countcrit <- 0
  allsd <- numeric(ngroup)  # one slot per group (was rep(0, length(ngroup)))

  for (i in 1:myiter) {
    # create a random sample for each group and compute its SD
    # (rnorm replaces MASS::mvrnorm; equivalent for a single variable)
    for (j in 1:ngroup) {
      mysim <- rnorm(myN[j], mean = mymean[j], sd = meanSD)
      allsd[j] <- sd(mysim)
    }
    # compute the SD of the simulated SDs and compare it with critSD
    mysdsd <- sd(allsd)
    if (mysdsd < critSD) countcrit <- countcrit + 1
  }

  p <- countcrit / myiter
  return(p)
}

## A simulation of fraudulent differences (N = 20 observations in both groups)
results <- matrix(NA, ncol = 2, nrow = 1000)

for (i in 1:1000) {
  # generate two random samples of 25 observations each
  S1 <- rnorm(25, 0, 1)
  S2 <- rnorm(25, 0, 1)

  # delete the five lowest values from the first sample
  # and the five highest values from the second sample
  S1fake <- sort(S1)[6:25]
  S2fake <- sort(S2)[1:20]

  # test whether there is a significant difference
  test <- t.test(S1fake, S2fake)
  results[i, 1] <- test$p.value

  # apply the SD-similarity procedure
  mymean <- c(mean(S1fake), mean(S2fake))
  mySD <- c(sd(S1fake), sd(S2fake))
  myN <- c(length(S1fake), length(S2fake))
  results[i, 2] <- SimonsohnDorothy(mymean, mySD, myN, myiter = 1000)
}

sum(results[, 1] <= 0.05)                         # number of significant differences
sum(results[, 2] <= 0.05)                         # differences detected as fraud
sum(results[, 1] <= 0.05 & results[, 2] <= 0.05)  # significant differences detected as fraud
```

## Dorothy Bishop (@deevybee)

August 1, 2012 at 5:09 am (UTC -6)

I don’t think Simonsohn would be surprised that the method doesn’t catch all fraud: it detects a very specific kind of problem, and he has bent over backwards to avoid identifying cases except where the data are exceedingly improbable.

The net result is that if the method does detect a problem, I think you can be pretty sure it’s real, especially if it replicates across a person’s experiments and publications. If it doesn’t detect the problem, there’s still no guarantee the data are solid.
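To see why replication across a person’s studies carries so much weight, consider the arithmetic: if each of three independent studies has only a 5% chance of looking this suspicious by accident, the chance that all three do is 0.05^3, about 1 in 8,000. The minimal R sketch below pools per-study p-values with Fisher’s method, a standard textbook way of combining independent evidence. It is a swapped-in illustration, not necessarily how Simonsohn aggregated his results; it assumes the studies are independent, and the p-values are hypothetical.

```r
# Fisher's method: under the null, -2 * sum(log(p)) follows a chi-square
# distribution with 2k degrees of freedom for k independent p-values.
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Hypothetical p-values from three separate papers by the same author
p_studies <- c(0.04, 0.03, 0.05)
fisher_combine(p_studies)  # about 0.0035, far smaller than any single p-value

# Probability that all three studies fall below .05 by chance alone
0.05^3
```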

## Stephen

August 1, 2012 at 6:18 am (UTC -6)

Hi Dorothy,

Thanks for your reply. I was just playing around with the code, given that some blogs and articles (e.g., in Science and Nature) introduce it as THE new fraud-detection technique (although they may also mention the other techniques in the paper, which use the raw data). Furthermore, given that each test has a false-positive rate of 4-5% (a simulation shows that this applies to the suggested procedure too), I was (and am) surprised by its behavior. Hence I asked whether any of you has run some simulations too.

Best, Stephen
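Stephen’s point about the false-positive rate can be checked by running the procedure on honest data, where nothing has been deleted or invented. The minimal R sketch below re-implements the same logic as the SimonsohnDorothy function in a more compact form and records how often honest datasets come out at p <= .05. The name `sd_similarity_p` and all settings (three groups of N = 20, 200 datasets, 500 simulations each) are illustrative assumptions. If the procedure is roughly calibrated, the flagged share should land in the vicinity of 5%, though the exact figure depends on the settings and the seed.

```r
# Hypothetical re-implementation of the SD-similarity check: simulate every group
# with the average reported SD and return the share of simulations whose SDs are
# at least as similar (SD of SDs as low or lower) as the reported ones.
sd_similarity_p <- function(means, sds, ns, iter = 1000) {
  common_sd <- mean(sds)
  observed  <- sd(sds)
  simulated <- replicate(iter, {
    sim_sds <- mapply(function(m, n) sd(rnorm(n, mean = m, sd = common_sd)),
                      means, ns)
    sd(sim_sds)
  })
  mean(simulated <= observed)
}

# Honest data: three groups of N = 20 drawn from the same normal distribution
set.seed(2012)
p_null <- replicate(200, {
  groups <- replicate(3, rnorm(20), simplify = FALSE)
  sd_similarity_p(means = sapply(groups, mean),
                  sds   = sapply(groups, sd),
                  ns    = lengths(groups),
                  iter  = 500)
})

# Share of honest datasets that the procedure would flag at the .05 level
mean(p_null <= 0.05)
```

Running the same loop on data trimmed the way Stephen describes, instead of honest data, gives the detection rate to compare against this baseline.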


## Albert Donnay

January 4, 2014 at 2:01 pm (UTC -6)

In my experience investigating cases in the biomedical literature, step 5 should be a directive, not an offer, to “Consult with a trusted statistically savvy advisor”.

And at least for non-statisticians, step 5 should precede, not follow, step 4 [contacting the authors with your concerns].

It is always good for the investigators to get an independent second opinion on questions of statistics before they contact the authors as this may spare them from making allegations without merit, which is in everyone’s best interest.

I also think item #1 (checking any similar studies by the authors for evidence of similar misconduct) is useful to determine the extent of the rot, but a more appropriate first step is simply looking up all the co-authors on PubMed to see whether they already have any corrections or retractions.

Since all fraudulent careers start with one fraudulent paper, investigators of any particular case should keep in mind that they may be looking at the authors’ first example. Given that most cheats rack up quite a few fraudulent papers before they are caught, however, the odds of catching someone’s first fraud seem to be lower than the odds of catching their later frauds.

## Just Post It (update) » Random Assignment

July 22, 2012 at 9:32 am (UTC -6)

[…] « Simonsohn’s Fraud Detection Technique Revealed […]