The recent news of Dirk Smeesters’ resignation is certainly not good news for social psychology, particularly so soon after the Diederik Stapel case, but I believe it can serve as an opportunity for the field to take important steps towards reform. The reforms that are needed the most, however, are not restricted to preventing or detecting the few instances of fraud by unscrupulous researchers who are intentionally falsifying data. What we should be more concerned about are the far less egregious, but much more common offenses that many of us commit, often unknowingly or unintentionally, and almost never with fraudulent intent.
In an excerpt from his new book that appeared in the Wall Street Journal last month, Dan Ariely (@danariely) relates an anecdote about a locksmith who explains that, “locks are on doors to keep honest people honest.” Ariely goes on to say that:
One percent of people will always be honest and never steal. Another 1% will always be dishonest and always try to pick your lock and steal your television; locks won’t do much to protect you from the hardened thieves, who can get into your house if they really want to. The purpose of locks, the locksmith said, is to protect you from the 98% of mostly honest people who might be tempted to try your door if it had no lock.
The same is true of psychologists. A very small fraction of us are “hardened thieves” who are intent on falsifying their data. This is a problem and we should do our best to identify these individuals and expunge their publications from the field. But the vast majority of psychologists are honest people who – while motivated to produce publications and to get jobs, promotions, and prestige – are genuinely committed to the pursuit of the truth. It has become increasingly clear, however, that it is easy for the pursuit of the truth to be derailed by seemingly minor methodological mistakes that yield significant effects, even where none exist. These mistakes – or doors, to stick with the locksmith’s metaphor – are usually entered into without fraudulent or malicious intent. That’s why we need locks: to keep honest people from wandering through these doors.
So why would someone honest walk through a door with no lock on it in the context of psychology research? I believe there are two reasons: rationalization and ignorance – they are able to convince themselves that it’s not a major problem, or they simply don’t know any better. We could take a very significant step towards addressing both of these issues by adopting the recommendations put forward by Joe Simmons, Leif Nelson, and Uri Simonsohn in their recent False Positive Psychology (PDF) paper.
The paper explains how we increase the likelihood of false positive results through seemingly minor methodological problems, including flexibility in sample size based on interim data analysis (data peeking), failure to report all the variables and conditions in a study, and lack of transparency in reporting the use of covariates and exclusion of outliers. I’ve reproduced their table of proposed solutions to these problems below, and I strongly encourage everybody to read the entire paper.
For me, “data peeking” is a perfect example of how easy it can be to rationalize practices that lead to an increased rate of false positive findings. Most people don’t realize that looking at the data before collecting all of it is much of a problem. Neither did I until recently. In actuality, peeking at your data substantially raises the probability of a false positive finding (for more discussion read Why Data Peeking is Evil by @talyarkoni).
Imagine you’ve meticulously researched and designed a study and planned to collect data from 80 participants. After running 80 participants you analyze your data and you find that the effect you were looking for comes out at p = .08. What to do? You could stop there and report your findings, but you know that when you submit the paper for publication reviewers are likely to reject it because the findings aren’t quite significant at the conventional p < .05 level. Alternatively, you could collect just a little more data. It’s not hard to see how this could be rationalized: the effect is probably real, so what’s the harm in adding a few more data points? The real crime would be to waste all the money, time, and effort that you’ve invested in this study. So you collect data from 20 more subjects and your p value drops just below .05 and, voila! Significant results!
The problem is that your finding isn’t really significant at the .05 level. Because your decision to terminate data collection was conditional on the significance level of the effect, you’re much more likely to get significant results than if you were to stop at a pre-determined point. When these results are published, however, there is no way for a reader to know that you didn’t plan to run 100 subjects all along.
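To make the inflation concrete, here is a minimal simulation sketch of the scenario above. It is my own illustration, not code from Simmons, Nelson, and Simonsohn, and it rests on simplifying assumptions: there is no true effect, each observation is drawn from a normal distribution with a known standard deviation of 1 (so a simple z-test applies), and the researcher adds 20 subjects whenever the first test on 80 subjects is not significant. The function names and parameters are hypothetical.

```python
import math
import random


def two_sided_p(values):
    """Two-sided p-value for a z-test of mean 0, assuming a known SD of 1."""
    n = len(values)
    z = sum(values) / math.sqrt(n)  # equals mean * sqrt(n) when SD = 1
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(n_sims=10000, n_planned=80, n_extra=20,
                         alpha=0.05, seed=1):
    """Simulate studies with no true effect.

    Returns (fixed_rate, peeking_rate): the share of studies declared
    significant when stopping at n_planned no matter what, versus when
    n_extra subjects are added after a non-significant first look.
    """
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_sims):
        data = [rng.gauss(0.0, 1.0) for _ in range(n_planned)]
        if two_sided_p(data) < alpha:
            # Significant at the planned sample size: both rules stop here.
            fixed_hits += 1
            peeking_hits += 1
            continue
        # Not significant: the peeking researcher collects more data
        # and tests again on the combined sample.
        data.extend(rng.gauss(0.0, 1.0) for _ in range(n_extra))
        if two_sided_p(data) < alpha:
            peeking_hits += 1
    return fixed_hits / n_sims, peeking_hits / n_sims


if __name__ == "__main__":
    fixed, peeking = false_positive_rates()
    print(f"False positive rate, fixed n of 80:        {fixed:.3f}")
    print(f"False positive rate, one extra look at 100: {peeking:.3f}")
```

Because both rules are applied to the same simulated studies, every false positive under the fixed plan is also a false positive under the peeking plan, and the second look can only add more. The fixed-sample rate should land close to the nominal .05, while the peeking rate comes out noticeably higher, even though the researcher looked only once more.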
It’s not hard to see how many perfectly honest researchers might peek at their data once in a while. Many researchers don’t realize the consequences of peeking at their data. Others may understand that you’re not supposed to peek in after every few subjects and stop collecting data the moment you get significant results, but fail to appreciate that even peeking a few times can make a big difference. Returning to Ariely’s locksmith analogy, they’re walking through a door with no lock.
The solutions suggested by Simmons, Nelson, and Simonsohn are a big step in the right direction. Simply making it clear to people that some of their practices are problematic removes ignorance as an excuse. Knowing what’s wrong also makes rationalization that much harder. As Dan Ariely explained in a recent interview, ambiguous rules present more opportunities for rationalization. But no solution is perfect, so I thought I’d share a few things that have been on my mind as I wrestle with these problems.
Unilateral disarmament. It’s easier to get significant results by ignoring the rules rather than following them. Therefore, without an enforcement mechanism, such as the adoption of these (or similar) solutions by institutions such as journals, there is a risk that we could create a situation in which only the most honest researchers would improve their behavior while many others would not. The perverse result would be that our journals would be disproportionately filled with publications by poorly behaved researchers who find it easier to produce significant results.
Having said that, there are reasons for optimism. Even if journals do not adopt the proposed solutions as requirements, reviewers can still bring them up. This possibility should discourage researchers from trying to sneak questionable practices past reviewers. In addition, as Joe Simmons noted in a recent presentation at APS, authors who adopt the solutions can explicitly note that they have done so in their papers, which not only serves to signal the integrity of the paper but, over time, may help to establish a positive social norm.
Make reforms less threatening. Ultimately, the success of reforms like those that Simmons, Nelson, and Simonsohn propose depends on the researchers themselves. Will they take the problems facing the field seriously, or will they try to ignore them and sweep questionable research practices under the rug? I have great faith in the integrity of most researchers, but it’s very important to acknowledge that reform poses a serious threat and that the more we are able to mitigate that threat, the more likely people will be to adopt reforms. If we are too aggressive in our reforms, or if we target individuals who have behaved no differently than their colleagues, then we risk triggering a defensive reaction that will make successful reform slower and more difficult.
I want to be clear that I think Simmons and his colleagues have gone to considerable lengths to make the adoption of the reforms they propose less threatening, but they have very little control over how others seek to enforce the solutions they suggest. For example, one commendable guideline they suggest is for reviewers to be “more tolerant of imperfections in results.” For the long term success of reform, I believe that it’s just as important that this recommendation be put into practice as any of the proposals that would minimize the rate of false positive findings.
Start now, but go slow. Another source of potential threat to researchers is the question of when we start. The obvious answer is “now!” We have a problem that needs fixing and the problem is not going to go away on its own. But researchers have a pipeline of completed but as-yet-unpublished studies that can reach back several years. If we can’t somehow mitigate the threat that existing research will be considered compromised, then we will make it much more likely that researchers will rationalize away the need for reform.
Take, for example, the graduate student who has been working on her thesis for 4 years only to realize that she peeked at her data in one or two of her 7 studies, though she doesn’t remember which ones or how many times, since she didn’t realize it was a problem at the time. Do we expect her to discard that data? Or what about the young professor who realizes that, after years of collecting data and submitting revisions upon revisions of a paper, his methods are not completely above reproach? It doesn’t seem reasonable to reject his paper on that basis if we’re not also going to go back and retract hundreds of other papers in the literature.
Perhaps we can take a page from Dick Thaler (@R_Thaler) and Shlomo Benartzi’s “Save More Tomorrow” plan (PDF) to nudge people in the right direction. The key to Save More Tomorrow’s effectiveness is that it doesn’t require people to make the hard choices today, when the perceived costs are the highest, but allows them to commit to making changes in the future. For methodological reform, this would entail a “starting now” approach. Any new research would follow the reformed standards, while existing research would be treated more leniently for a certain period of time, let’s say 2 or 3 or 5 years, with an effort to make it as transparent as possible. The solution is not a perfect one, but it aims at maximizing the number of researchers who would commit to the reform process rather than avoiding reform or putting it off.
One final thing to note: Simmons’ and his colleagues’ proposals are a great starting point, but there are other reforms that have been proposed that also deserve attention and I hope to return to them in the future. Among these reforms is the need to be more open with our data and the possibility of pre-registering studies before running them to help keep researchers honest (for more discussion see this post by @Neuro_Skeptic). The most important of these, in my opinion, is the need for replication. Many of the problems of false positive psychology would be reduced considerably if there were more replication in the field of psychology. Although an extensive discussion is beyond the scope of the current article, I recommend an upcoming paper (PDF) on the topic by Brian Nosek and his colleagues and this piece by Ed Yong.
The discussion about reform is an important one for us to be having. I hope that perhaps a silver lining to the Smeesters resignation is that it will underscore the urgency of this discussion. According to Martin Enserink at Science, Smeesters is convinced that his offenses were no different than those of many of his colleagues in marketing and social psychology, who, he claims, “consciously leave out data to reach significance without saying so.” Maybe I’m being naïve, but I don’t think that’s true. While I believe that many researchers make methodological mistakes that they don’t recognize as problematic, very few cross the line and knowingly produce falsified results. If you share my assumption that most researchers are honest, then what we need to do is to make the rules clear and unambiguous as soon as we can. There may be some hardened thieves among us, but what we really need are more locks on our doors to help keep us honest.