Apr 29

The Stapel Continuum

Diederik Stapel (Photo Credit: Jack Tummers)

Along with many other psychologists, I’ve been closely following (and participating in) the ongoing discussion about finding ways to effectively address the shortcomings in our field’s research methods. Given that the Stapel fraud case was an important spark to these discussions, I read Yudhijit Bhattacharjee’s article, The Mind of a Con Man, in this week’s New York Times Magazine with great interest.

Bhattacharjee paints a very humanizing portrait of Stapel and the struggles that those around him have had coming to terms with his fraud. He certainly doesn’t let Stapel off the hook, but I was disappointed that the story essentially conflates Stapel’s fraud with the sins of many other psychologists, with the difference being only a matter of degree. In doing so, it unfairly portrays many honest psychologists and presents a simplistic – and, in my opinion, deeply misguided – understanding of the real problems in psychology.

Although he calls it a “cynical point of view,” he suggests that actions like Stapel’s are on the same “continuum of dishonest behaviors that extend from the cherry-picking of data to fit a chosen hypothesis — which many researchers admit is commonplace — to outright fabrication.” I could not disagree more strongly. I will be among the first to acknowledge that current practices in psychology are in need of improvement. Some psychologists make mistakes when they should know better; others do know better but find ways to cut corners through excuse making or self-deception. But fabricating data crosses into an entirely different territory.

In Bhattacharjee’s story, Stapel explains that it is, “The extent to which I did it, the longevity of it, [that] makes it extreme. […] Because it is not one paper or 10 but many more.” Sorry, Diederik. That’s not what makes it extreme. Adding a post-hoc covariate or dropping an inconvenient outlier is a questionable research practice – one that should be eliminated – but making up fake data? That’s fraud.

Ultimately, I agree with Bhattacharjee that fraud “might represent a lesser threat to the integrity of science than the massaging of data and selective reporting of experiments.” I’m grateful that there are a growing number of scientists and scientific organizations who are making methodological reform a priority. As I’ve written previously, we need to make sure that at the very least psychologists recognize the effects that some of these practices can have. But that still doesn’t place most psychologists on the same continuum as Stapel. It’s a very big step from reporting unplanned analyses of real data to fabricating fake data from studies that were never conducted. Pretending that these are more or less the same thing is not only misleading, but it impedes efforts at reform by putting on the defensive those people whose support is critical if the reforms are to succeed.

In an article full of self-incriminating quotations, when it comes time to accuse psychologists of deliberately falsifying their data, suddenly Bhattacharjee’s sources go silent, and no evidence of this damning indictment is presented besides what “several psychologists” told him. I find it ironic that immediately after accusing an entire field of acting in bad faith in the “pursuit of a compelling story no matter how scientifically unsupported it may be,” he is guilty of doing precisely that himself.

  1. Chris Chambers

    I certainly wouldn’t place all psychologists on the same continuum as the most extreme fraudsters. By and large I think most psychologists operate with integrity and honesty. However life isn’t that simple…

    The key question seems to be this: Does extreme fraud lie at the (far) end of the same spectrum of behaviours as QRPs? Or is fraud categorically different? The answer, I feel, depends on the mind of the experimenter. It’s completely possible that a researcher can engage in QRPs with the best and most honorable of intentions, being ignorant that such behaviours are “wrong”. I suspect many of us fall into this category, which is why QRPs are so common (e.g. as shown by John et al 2012). In that case the PRACTICE is questionable but the integrity of the scientist is not in question, and I wouldn’t regard the behaviour as on the same continuum as fraud. But this is key: Once you learn that what you are doing is bad practice, will you change your behaviour? If not then you have made a categorical shift into the realm of dishonesty. Repeating the same QRP then puts you on the same continuum of behaviours that leads to Stapel. (Note that this doesn’t mean that every psychologist who knowingly engages in QRPs will inevitably end up making up the data — we all have our own moral compass and internal limits).

    This dynamic makes it difficult to know whether QRPs are on the same continuum as fraud, because to be sure we need to get inside the mind of the experimenter, and each of us is our own best judge. For my part, now that I know about all the possible QRPs and how to address them, I will know that if I engage in them in the future I will be behaving dishonestly.

    I realise my view might well be the minority in psychology – many researchers who I respect (like yourself) argue that fraud is always categorically different from QRPs. Here’s a little post I wrote about this last year: http://www.scilogs.com/sifting_the_evidence/tackling-the-f-word/

    Actually, if you’ll permit me, I’ll quote from that piece here:

    Which of the following do you consider to be scientific fraud?

    1) A scientist collects 100 observations in an experiment and discards the 80 that run counter to his desired outcome

    2) A scientist runs ten experiments then selectively writes up the two that produced statistically significant effects

    Everyone would agree that the first scenario is fraudulent. Many (perhaps most) of us would also view the second scenario as fraud, or at least misconduct.

    Mathematically, the two scenarios are similar: whether I discard 80% of my data or 80% of my experiments, I end up in much the same place – with biased evidence. Morally, they are also arguably on par. After all, how is selectively reporting an experiment based on a desirable outcome any less dishonest than selecting data within an experiment for the same reason?

    But now consider a third scenario.

    3) A journal reviews ten papers on the same topic and selectively publishes the two that reported statistically significant effects

    Things just got complicated. In a breath, we’ve ascended from the quagmire of individual dishonesty to the safer terrain of groupthink and the “cultural” problem of publication bias. We can now relax in the comfort of diminished responsibility, forgetting that the incentive structure in scenario 3 drives scenario 2, which in turn encourages the more extreme scenario 1.
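    Chambers’ point that the first two scenarios leave you in much the same place can be sketched with a small simulation (a hypothetical illustration of mine, not something from the linked post). Here the true effect is exactly zero, so any apparent effect is pure selection bias:

```python
import random
import statistics

random.seed(1)

def experiment(n=100):
    # True effect is exactly zero: every observation is noise around 0
    return [random.gauss(0, 1) for _ in range(n)]

# Scenario 1: collect 100 observations, discard the 80 that run
# counter to the hoped-for (positive) effect
obs = experiment()
kept = sorted(obs)[-20:]          # keep only the 20 largest values
print(statistics.mean(obs))       # hovers near 0 -- honest estimate
print(statistics.mean(kept))      # strongly positive -- biased estimate

# Scenario 2: run ten experiments, write up only the two with the
# "best" (largest) mean effects
means = [statistics.mean(experiment()) for _ in range(10)]
reported = sorted(means)[-2:]
print(statistics.mean(means))     # hovers near 0 -- honest estimate
print(statistics.mean(reported))  # positive -- biased in the same direction
```

    In both cases the selective step turns noise into an apparently positive effect; only the unit of selection (observations versus whole experiments) differs.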

    1. Dave Nussbaum

      I agree with almost everything you say here; let me take on the points separately.

      First, on the question of intentionality — perhaps I am being naive, but while I recognize that there are bound to be a not-insignificant number of psychologists who intentionally set out to produce false results, I believe that my faith in the good intentions of most psychologists is not misplaced. Many common questionable practices are engaged in out of ignorance. I know that as a grad student I was taught, in perfectly good faith, to do things that I have since learned are problematic, such as testing post-hoc covariates to soak up unexplained variance. I certainly didn’t recognize that there was a problem with checking your data after running the first 20-30 subjects to make sure things were on the right track. Once I learned how problematic these practices were I changed my behavior. Maybe I’m wrong, but I’d like to believe that most psychologists would behave similarly. When the behavior is unintentional, I think it’s pretty clear it belongs on a different continuum than Stapel’s.

      Once you start doing the same things intentionally, I agree that you may plausibly be on the same continuum as fraud. Even here, though, I think it’s important not to conflate things too hastily. As you point out, you can quickly find yourself in grey areas with powerful forces pushing you towards self-deception — that is, believing that what you’re doing isn’t perfect but it “isn’t a big deal” or “everyone is doing it” or “I know this is right, but it won’t get published looking like this”. In addition to education (to address the first point above), these forces should be the primary target of reform: journals publishing only positive results, tenure committees pushing for ever more publications irrespective of quality, and a general culture of cutting corners.

      Still, I think that self-deception, while not an excuse, is a very large step removed from fraud. As I said in the piece, addressing this is probably much more important than eliminating fraud — yet self-deceivers aren’t making a clear choice to falsify their data. But yes, here you could plausibly make a continuum argument. Looking for outliers when your experiment doesn’t work, but failing to do so when it does is not good practice (you should set exclusion criteria in advance), but it’s not nearly as bad as publishing the one in twenty studies you ran that worked. Still, the NYT magazine article makes a strong claim that these are common and deliberate practices in psychology and provides no substantiation of these claims.

      Finally, I think that for much needed reform efforts to ultimately be successful, psychologists (and other scientists) must collectively recognize the problem and resolve to address it. That doesn’t mean everyone has to be on board, but it’s critical that most people are supportive, and certainly not in opposition. Otherwise we’ll end up with a lot of smoke and not a lot of progress. These concerns have been raised many times before without much changing as a result. To that end, I think that — without letting people off the hook entirely — it’s important not to unnecessarily put people on the defensive. Calling the integrity of the entire field into question merely leads people to rally around institutions they should be trying to reform — like reasonable people who rallied around W. after 9/11. There is a better way forward — that’s why I wrote this post.

      Thanks for the comment!

  2. Rolf Zwaan (@RolfZwaan)

    I agree with Chris Chambers. Many researchers appear to have used questionable research practices (QRPs) unknowingly. This is not fraud. Using QRPs knowingly is fraud. You may not be inventing observations but you are selecting them to produce the desired inferential statistics. The problem is indeed that we may not know what was in people’s heads.

    At some point, however, it is reasonable to expect that people are informed. At that point we can stipulate that use of QRPs is fraudulent. I’m reminded (how can I not be?) of the speeding ticket I received last week near Cologne, Germany. That is, I received it last week but the speeding occurred in February. Apparently, I drove 130 km/h in a 100 km/h zone. I must have missed the sign because I did not deliberately ignore the speed limit. Nevertheless, I will have to pay the fine. It is my fault that I was not aware of the speed limit.

    So what is needed at this point is to create awareness of QRPs. To this end I like the 21-word solution proposed by Simmons, Nelson, & Simonsohn of having authors include in their method section: “We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.” It’s a good first step.

    Everyone knows inventing data is wrong. Pretty soon everyone ought to know QRPs are equally wrong.

    1. Dave Nussbaum

      Agreed. One of the strengths of Simmons, Nelson, & Simonsohn’s approach is that it makes the line in the sand much clearer. By stating that you have not engaged in the QRPs in question you can still lie, but self-deception is made impossible. It’s much harder to fudge around the edges — either you did it or you didn’t. In addition, it leaves you the leeway to do things that might be considered in ethically “grey” areas — but only if you are transparent about what you’ve done. That way, you’re not hiding anything and others are free to decide on the ultimate merits of your decisions and your research.

      Thanks for the comment!

  3. Sanjay Srivastava

    Which continuum are you talking about? The continuum of consequences or the continuum of causes?

    If you are talking about the consequences of QRPs vs fraud, then I do think there’s a case that they are on a continuum. Both lead to conclusions that have weaker evidential support than they appear to. (That support is not zero for fraud – note that Stapel was careful to fake results that were plausible. It’s just that his evidential support was in prior research and in the collective assumptions of social psychologists, not in the data at hand.)

    If you are talking about causes, i.e. the researcher’s intentions and the likelihood of the researcher engaging in future problematic behavior under various circumstances, then I think it’s murkier (and as a psychologist I want data before I generalize about mental states), but my own hunch is that they are not on a continuum. I suspect that most researchers who use QRPs have persuaded themselves that the costs are small and are outweighed by the benefits (getting “good enough” research done and not letting the perfect be the enemy of the good). I don’t think most would slide down any slope into outright fraud, which they clearly recognize as wrong. I also think there is potential for change, if people can be shown the true costs in a way that does not threaten them and produce a self-protective response. But ultimately that’s an empirical question.

    1. Dave Nussbaum

      Again, I couldn’t agree more — my reading of the NYT magazine article was that the author was making a strong case that these practices were common and deliberate, and I don’t think he has any evidence to support that claim. Here’s one such quote:

      “Several psychologists I spoke to admitted that each of these more common practices was as deliberate as any of Stapel’s wholesale fabrications. Each was a choice made by the scientist every time he or she came to a fork in the road of experimental research — one way pointing to the truth, however dull and unsatisfying, and the other beckoning the researcher toward a rosier and more notable result that could be patently false or only partly true.”

      Thanks for the comment!

  4. Neuroconscience

    Thanks for this thought, Dave. I think it is an important message to emphasize in what has become an extremely cynical period of (much needed) self-reflection for the psychological sciences. While I definitely agree that both outright fraud and unknowing scientific misconduct may be on the rise, I think it is important to remember that empiricism is not built on singular observations. Further, we need to be careful – particularly in the current political climate – to send the message that this is a solvable problem mostly being perpetuated by a few bad apples. I do not believe that the majority of researchers are unaware of our biggest gripes – multiple comparisons, post hoc analyses, data dropping. Heck, even XKCD has well known comics on most of these – and most of us have them on our department coffee machines. These things are common knowledge and I think we have a responsibility to remind journalists like Bhattacharjee that most scientists work hard to preserve the highest possible standard of integrity. This doesn’t seem particularly naive to me – like anyone else I am annoyed by bad papers, particularly ones that get good press. But the tide is rapidly changing thanks largely to post-publication peer review. The system is to some degree self-correcting, ever more so in the world of twitter and professional research blogs.

    1. Dave Nussbaum

      You raise an interesting point with coverage in the media vs. post-publication peer review — I’ve been mulling it over the past few days. In the last week or two there were two papers that got a lot of media attention and a harsh response online — the Tylenol reduces existential anxiety paper, and the fist-clenching and memory paper.

      It seems to me that while scientists took issue with the papers themselves, the wrath was fueled largely by the glib, unthinking, media fawning over them. As a result, I think the responses may have come off harsher than they would have otherwise. Still, it’s a very good sign that the field is enforcing its own principles — figuring out exactly how best to do so is clearly a work in progress.

      Thanks for the comment!

      1. Neuroconscience

        Yes, well, sadly it seems that the media will never change its ways — but gradually we can teach our peers that writing up your papers solely with the resulting media coverage in mind WILL backfire when you are torn to pieces in a public forum. Come to think of it we should probably start teaching our students to imagine what the media would like to hear, and then write the exact opposite…

        1. Dave Nussbaum

          We certainly need to do a better job communicating with the media and getting our stories into the hands of some of the many excellent science writers out there. I think there’s a very positive role for the media to play, but you’re definitely right that we need to avoid the headline-seekers like the plague. On the Tylenol paper, I think the lead author was a grad student who made some ill-advised comments in the press release — I’m inclined to be sympathetic, but yes, more guidance would have been very helpful.

  5. Brady Butterfield

    These are problems and I don’t disagree with that, but the more fundamental problem, in my opinion, is that very few researchers are doing work deemed important or fundamental enough for external parties to replicate, or even build on at all. I know it’s a tougher field in that sense, and that there are nicer ways to put that, but this is why it is my strong feeling (I know that is not evidence) that undiscovered fraud is more rife in social sciences than in “harder” sciences. Even now, if you wanted to fake data on a topic of “average” impact/interest, and didn’t do it in a way that would raise a statistician’s eyebrow, a collaborator’s whistle-blowing is the only real risk you run. I tend to believe the “several psychologists” the author cited.

    1. Dave Nussbaum

      You may want to check out http://retractionwatch.wordpress.com/

      1. Brady Butterfield

        That would be DISCOVERED fraud. I think the better response would be to suggest some reporting structure / regulation wherein there is some internal validation of the work being done within a given institution. That solves most of this and, if the fraudster body count mounts, is where this is going in any case.

        1. burntd0g

          “the more fundamental problem, in my opinion, is that very few researchers are doing work deemed important or fundamental enough for external parties to replicate,”

          I’m an outsider looking in. As such I rely on the scientific method working as advertised. As long as it is, I don’t worry about the charlatans, because I know (or thought I knew) that the scientific method was designed to root them out.

          Brady’s point is the one I care about. I’m less concerned that a phony is operating out there than I am with the lack of peer review by those who are responsible for doing it.

          Why publish papers nobody cares to scrutinize? If you care at all about public perception, consider the possibility that every untested paper is, and ought to be, the butt of a joke. Maybe the media and the public will take you more seriously only after you do.

          1. burntd0g

            Good Science doesn’t care what the motives are that produce Bad Science, and neither do I.

  6. Neuroskeptic (@Neuro_Skeptic)

    I agree with the general consensus about harm vs. intent.

    One idea I just had, though, and this is a disturbing thought – imagine a fraudster who is as ruthless as Stapel, but more risk averse. He is quite prepared to make up studies, but he realizes this is risky, so instead, he ruthlessly exploits QRPs to achieve the same end of publishing flashy results. His results are no more true than Stapel’s, but he never does anything ‘wrong’, and the only difference between him and everyone else is a matter of degree (and skill – he is a wizard at QRPs).

    Now is that guy really any better than Stapel, morally? If not, where does that leave us? If so, why…?

    1. Dave Nussbaum

      The short answer is that the person would be in the same ballpark as Stapel — see my response to Chris in the first comment above. Once you posit that someone is intentionally producing false results, then a bright line is being crossed. My issue with the article is that the author was claiming that this line was being crossed routinely, but provided no evidence to support his claim.

      As the false positive psychology paper showed (http://opim.wharton.upenn.edu/DPlab/papers/publishedPapers/Simmons_2011_False-Positive%20Psychology.pdf), you can get even obviously false results to appear statistically significant with enough trying. That’s why I’m in favor of enacting reforms, like these: http://www.davenussbaum.com/reform-from-the-bottom-up/, and I’m behind moving towards more pre-registration as you’ve suggested. The p-curve.com effort may also cause would-be cheaters to reconsider.
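      To make the “enough trying” concrete, here’s a toy simulation in the spirit of that paper (the specific QRPs and numbers are my own illustration, not theirs). Both groups are drawn from the same distribution, so every “significant” result is a false positive — yet a little analytic flexibility noticeably inflates the hit rate:

```python
import random
import statistics
from math import sqrt

random.seed(2)

def p_value(a, b):
    # Two-sample z-test: a normal approximation to the t-test,
    # good enough for illustration
    z = (statistics.mean(a) - statistics.mean(b)) / sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))

def one_study(hack):
    # Both groups come from the same distribution: the true effect is
    # zero, so any "significant" result is a false positive
    a = [random.gauss(0, 1) for _ in range(20)]
    b = [random.gauss(0, 1) for _ in range(20)]
    if p_value(a, b) < 0.05:
        return True
    if hack:
        # QRP #1: "the study just needs more power" -- add 10 subjects
        # per group and test again
        a += [random.gauss(0, 1) for _ in range(10)]
        b += [random.gauss(0, 1) for _ in range(10)]
        if p_value(a, b) < 0.05:
            return True
        # QRP #2: drop the most extreme "outlier" from each group
        # (a rule invented after seeing the data) and test once more
        a.remove(max(a, key=abs))
        b.remove(max(b, key=abs))
        return p_value(a, b) < 0.05
    return False

n = 2000
honest = sum(one_study(hack=False) for _ in range(n)) / n
hacked = sum(one_study(hack=True) for _ in range(n)) / n
print(honest)  # close to the nominal 5% false-positive rate
print(hacked)  # noticeably higher, with no data ever fabricated
```

      No single step here looks like fraud, which is exactly why disclosure requirements and pre-registration matter: they take the extra bites at the apple off the table.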

      But in the end, just because it’s possible to get the same result as Stapel by using lesser means, doesn’t mean anyone using those lesser means is as guilty as Stapel — although perhaps the person in your hypothetical scenario rises to his level. The rough analogy that comes to mind is that punching someone is wrong, and should be punished, but it’s not murder. However, it’s possible to kill someone by punching them enough times. That would be murder too — but it doesn’t mean anyone punching somebody should be treated like a murderer, or placed in the same category as one.

  7. Ray

    I may be oversimplifying things, but if I read about the problems of underpowered studies, QRPs, and the file drawer, I conclude that what has been done so far in “psychological science” can be considered as “cute” and a “nice try”.

    Could it be that the only way for journals, institutions, and individual scientists to make themselves optimally believable/useful IS to collectively adopt new regulations solving the above issues?

    Apparently, collectively adopting the general format (introduction, methods, conclusion), or writing-style rules/regulations for scientific articles is no problem for scientists/journals/institutions, but adopting rules/regulations which would probably greatly enhance the usefulness and reliability of the information IN these articles is somehow “revolutionary”. I find that strange, and in a way hilarious. The priority of things seems totally reversed: it makes no sense to me whatsoever :)

    If I were working in a scientific field, I would try and optimally make clear that my research has a maximum chance of containing useful information. The same would go for journals, and institutions. In 5-10 years time, I think nobody will even seriously look at a lot of stuff that has been done over the past decades. Might as well start doing things properly from now on.

    1. Dave Nussbaum

      I think you’re throwing out the baby with the bathwater. There is a ton of very good, useful, replicated psychology out there, and to dismiss it would be foolish. These same problems plague medical research — in some cases, the problem is much more dramatic because of the amount of money associated with the findings — but we would never dismiss medicine as cute and press reset.

      Having said that, the need for reform is clear. I think it’s a positive sign of a healthy field that it is undertaking an improvement in its methods. There are a lot of sources of information out there that we rely on regularly that are on much flimsier foundations and completely uninterested in improving things.

      However, one of the costs of recognizing the field’s shortcomings and discussing them publicly is that people will inevitably jump from seeing that there are problems to dismissing the whole field, without pausing in between. I think that’s a mistake. I think there is a lot of good and important research going on all the time. But we should still be striving to make things better as soon as possible.

  8. Greg Francis

    I agree with the idea that there are different continua regarding intent and effect. Ignorantly using QRPs is not even slightly like Stapel’s behavior. On the other hand, the produced result is similar: untrustworthy data. I also agree that a scientist who knowingly practices QRPs is behaving somewhat like Stapel.

    A more challenging issue is what to do with data that has already been published. Suppose a scientist ignorantly (but honestly) used QRPs in the past but has since learned of the problems with those approaches and decided to avoid them in the future. The decision to reform is good, but what about the past data that is still in the academic journals? Doesn’t such a scientist have a responsibility to correct the past? To not at least write an erratum seems like a tacit continuation of the QRPs.

    1. Dave Nussbaum

      I think we ignore past mistakes, unintentional or not, at our own peril. But the solution may be tricky — it seems like a collective action problem. What’s the incentive for a researcher who thinks she remembers adding a post-hoc covariate, or leaving a study in her file drawer 20 years ago, to go back and post a disclosure? It seems like a lot to ask of an individual actor, especially when the costs are individual and the benefits collective. We can hope for integrity to be a motivator, or the integrity of the field, but that’s unlikely to solve the problem perfectly — although it would be a good start.

      In fact, it seems like a start in this direction has already been made by Psych Disclosure. Check out Hal Pashler’s recent tweet:

  9. PMN

    Good post! I also follow this closely, and submitted this post after reading the NYT article:


    1. Dave Nussbaum

      Thanks for passing this along, a very interesting perspective. I passed it along on twitter, I think lots of people would be interested in reading it.

  1. A good post from Dave Nussbaum on how to think about Fraud and QRP | Åse Fixes Science

    […] you shouldn’t do, but, then again, not too many people pay attention to my blog. But, I thought Dave Nussbaum’s post, in light of the recent NYT expose of said ex-professor, very interesting, and it has a good […]

  2. Toward a More Perfect Ethical Community of Scientists | Scientific News

    […] My article on Stapel: http://www.davenussbaum.com/th… […]

  3. I’ve got your missing links right here (4 May 2013) – Phenomena: Not Exactly Rocket Science

    […] places, but also arguably allows him to seek fame through infamy. Dave Nussbaum responds that the Stapel Continuum is a myth. And fraudbuster Uri Simonsohn has a new paper on how to evaluate replication attempts. NB: needs […]

  4. It’s not a failure when you fail to replicate › Counterbalanced

    […] questionable research practices (QRP) are leading to over-inflated and erroneous results. Again, others have gone into excellent details on these matters recently, but it’s worth remembering that this isn’t a two-way street. Replication is part of […]
