Science Policy For All

Because science policy affects everyone.

Psychology’s Reproducibility Crisis

By: David Pagliaccio, Ph.D.

Findings from the collaborative Reproducibility Project, coordinated by the Center for Open Science, were recently published in the journal Science. The report caused a stir in both the scientific and lay communities. It summarized an effort to replicate 100 studies previously published in three major psychology journals. Only around one third of the replication studies yielded significant results, and on average the effects observed in the replications were about half the size of those originally reported. The study was quickly met with statistical and methodological critiques, which in turn drew a rebuttal. Given the concerns raised by the Reproducibility Project and the intense debate in the field, various organizations and media outlets have begun to spotlight this issue, noting that psychology may be experiencing a “reproducibility crisis.”

The authors of this study importantly pointed to systemic forces that incentivize “scientists to prioritize novelty over replication” and to journals that dismiss replication as unoriginal. Typically, a scientist’s career is most easily advanced through high-profile, highly cited publications in well-respected journals. Replication attempts rarely qualify, as they are generally seen as unoriginal or as not advancing the field. Further, statistically significant results are prized, while null results are often chalked up to insufficient power and are not given a forum in which to help shape the literature. These factors produce a strong publication bias toward significant but often underpowered findings; a rampant ‘file drawer problem,’ in which non-significant results cannot be published; and what is known as “p-hacking,” in which authors analyze and reanalyze a dataset in different ways to push a desired result toward significance.
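To make the file drawer problem and p-hacking more concrete, here is a small, purely hypothetical simulation. It is a sketch under simplifying assumptions, not an analysis of the Reproducibility Project data: each study compares two groups whose scores are normally distributed with a known standard deviation of 1, so a two-sided z-test stands in for the t-tests researchers typically use. The true effect (0.3), sample size (20 per group), and number of outcome measures (5, assumed independent) are arbitrary illustrative choices, and the function names are made up for this example. The first function keeps only significant, positive results, as if the rest stayed in the file drawer. The second tests several outcomes when there is no true effect and counts a study as a “success” if any outcome reaches p < 0.05.

```python
import math
import random
from statistics import NormalDist, mean

Z_CRIT = NormalDist().inv_cdf(0.975)  # about 1.96: two-sided cutoff for p < 0.05


def observed_effect(true_d, n, rng):
    """Simulate one two-group study (n per group, SD = 1) and return the
    observed difference in group means (a standardized effect, since SD = 1)."""
    group_a = [rng.gauss(true_d, 1.0) for _ in range(n)]
    group_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return mean(group_a) - mean(group_b)


def is_significant(d_obs, n):
    """Two-sided z-test assuming a known SD of 1 (a simplification of a t-test)."""
    se = math.sqrt(2.0 / n)
    return abs(d_obs / se) > Z_CRIT


def publication_bias(true_d=0.3, n=20, n_studies=5000, seed=1):
    """Return (share of studies 'published', mean effect among published studies).
    Only significant, positive results make it out of the file drawer."""
    rng = random.Random(seed)
    effects = [observed_effect(true_d, n, rng) for _ in range(n_studies)]
    published = [d for d in effects if d > 0 and is_significant(d, n)]
    return len(published) / n_studies, mean(published)


def p_hacking_false_positive_rate(n_outcomes=5, n=20, n_studies=5000, seed=2):
    """The true effect is zero on every outcome; a study counts as a
    'finding' if ANY of its outcomes reaches significance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        if any(is_significant(observed_effect(0.0, n, rng), n)
               for _ in range(n_outcomes)):
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    share, published_mean = publication_bias()
    print(f"share published: {share:.2f}, "
          f"mean published effect: {published_mean:.2f} (true effect 0.30)")
    print(f"false-positive rate with 5 outcomes: "
          f"{p_hacking_false_positive_rate():.2f}")
```

Under these assumptions, only a minority of the simulated studies reach significance, yet the ones that are “published” overstate the true effect by more than a factor of two. This is one mechanism consistent with effects shrinking when studies are replicated. With five independent outcomes and no true effect, the chance that at least one reaches p < 0.05 is 1 − 0.95⁵, or roughly 23%, far above the nominal 5%; correlated outcomes in real datasets would inflate the rate less, but still above 5%.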

Several initiatives have been put forth to try to alleviate this “reproducibility crisis.” For example, the American Statistical Association released a statement on the use of p-values in scientific research. The statement served both as a general clarification of how p-values should be used and interpreted in null hypothesis significance testing and as an impetus to consider other measures, such as effect sizes, alongside them. This kind of reframing helps promote good statistical practice and counter the tendency to treat the arbitrary threshold of p < 0.05 as a marker of truth, a tendency that often biases scientific findings and reporting. Additionally, the National Institutes of Health (NIH) has adopted new guidelines for grant submissions intended to enhance rigor and reproducibility in science, for example by increasing transparency and setting basic expectations for biological covariates. Yet the NIH’s main investigator-initiated research grant, the R01, still includes novelty as a main scoring criterion and has no specific provisions for replication studies.
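One reason p-values alone make a poor marker of truth is that they do not measure how large an effect is; they depend heavily on sample size. The hypothetical sketch below illustrates this under the same simplifying assumption as a two-sided z-test with a known standard deviation of 1 in each group. The two scenarios and the helper name `two_sided_p` are invented for illustration: a trivially small effect in a very large sample and a much larger effect in a small sample.

```python
import math
from statistics import NormalDist


def two_sided_p(effect, n_per_group):
    """Two-sided p-value for a difference in means, assuming a known
    SD of 1 in both groups (a z-test simplification)."""
    se = math.sqrt(2.0 / n_per_group)
    z = effect / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical scenarios: a trivial effect measured in a huge sample versus
# a moderate effect measured in a small sample.
tiny_effect_big_sample = two_sided_p(effect=0.05, n_per_group=10_000)
moderate_effect_small_sample = two_sided_p(effect=0.5, n_per_group=10)

print(f"d = 0.05, n = 10,000 per group: p = {tiny_effect_big_sample:.4f}")
print(f"d = 0.50, n = 10 per group:     p = {moderate_effect_small_sample:.4f}")
```

The tiny effect clears the p < 0.05 threshold comfortably while the effect ten times larger does not, which is why reporting effect sizes (and their uncertainty) alongside p-values gives a fuller picture than the threshold alone.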

Publication-related initiatives have also begun to emerge to help incentivize replication. For example, the Association for Psychological Science has created Registered Replication Reports, in which scientists can pre-register a replication study for publication in Perspectives on Psychological Science before data collection even begins. This spares scientists the struggle to publish a direct replication and shifts the focus of replication away from whether a prior study was ‘true’ or ‘false’ and toward the cumulative effect size across studies. While this is a step forward, few researchers have made use of this opportunity so far. Importantly, while a few journals, such as PLOS ONE, explicitly state that they accept replication submissions, top-tier journals have generally not offered registered replications or created specific article formats for replications that would otherwise not be considered ‘novel.’ Other interesting avenues for addressing this issue have also begun to spring up; for example, one website was created as an archive of attempts to replicate psychology studies. While this provides a way to publicize failures to replicate that might otherwise go unpublished, these reports do not currently appear to be indexed by standard databases such as PubMed or PsycNET. Thus, although more failures to replicate can be made available to the benefit of the field, the unofficial nature of the website does little to help or incentivize investigators in terms of publication counts, citations, or other metrics often considered for hiring, tenure, and similar decisions.
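Focusing on cumulative effect size means asking how large an effect looks when all the evidence is combined, rather than whether any single study “worked.” One standard way to do this is a fixed-effect, inverse-variance-weighted meta-analysis, in which each study is weighted by its precision. The sketch below uses made-up numbers for a small original study and a larger replication; it is a generic illustration, not necessarily the procedure any particular journal or replication project uses (many multi-lab efforts use random-effects models instead).

```python
import math


def fixed_effect_pool(effects, standard_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its SE."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical numbers: a small original study with a large effect and a
# larger, more precise replication with a smaller effect.
original = (0.80, 0.30)      # (effect size d, standard error)
replication = (0.20, 0.15)

pooled, pooled_se = fixed_effect_pool(
    [original[0], replication[0]], [original[1], replication[1]]
)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled d = {pooled:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Because the replication is four times as precise (its variance is a quarter of the original’s), the pooled estimate of 0.32 lands much closer to the replication than to the original, and the conversation becomes one about how large the effect is rather than which study was “right.”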

Importantly, issues of reproducibility and publication bias can have vast consequences for society and policy, and can potentially erode public trust in science and the scientific process. Although it is an extreme case involving falsification, the long-lasting consequences of Andrew Wakefield’s erroneous, retracted, and unreplicated paper linking the MMR vaccine to autism spectrum disorders underscore the potential impact of unchecked scientific findings. A much more benign example, profiled in The Atlantic, concerns findings that bilingual individuals show better cognitive performance than monolingual individuals. While many published studies confirmed this effect in various ways, several large studies found no significant difference, and many negative studies went unpublished. Similarly, as detailed in Slate, doubt has recently been cast on a long line of research on ego depletion, and failed replications of the ego depletion effect are now coming to a head. This comes after the research had already been turned into a book about willpower and how individuals can use this science to strengthen their self-control.

While these findings have not shaped major policy, it is not a far leap to imagine how difficult it could be to undo the influence of novel but unreplicated findings, amplified by publication bias, once they have shaped policy in a variety of areas. Education research, for example, also suffers from replication problems. One study found that replications made up less than 1% of the studies published in the top 100 education research journals. While many of these replications did reproduce the original work, most were conceptual rather than direct replications, and replication success was somewhat lower when the replication was performed by a third party rather than with authors of the original work. Although the relatively high rate of successful replication is encouraging, the study makes a strong case for increasing the number of replication studies performed.

Despite debates over the extent of the reproducibility problem, it is clear that psychology, and science more broadly, would benefit from more attempts to replicate published findings. This will require large-scale policy shifts, ranging from journal practices to tenure decisions and government funding, to alleviate these issues and to support and motivate high-quality science and the replication of published studies. These efforts will in turn have long-term benefits for the development of research-based policies in education, social psychology, mental health, and many other domains.


Written by sciencepolicyforall

March 30, 2016 at 9:00 am

Posted in Essays
