The author is a professor of cognitive neuroscience at Cardiff University.

"If we ignore the warning signs now, then in a hundred years or less, psychology may be regarded as one in a long line of quaint scholarly indulgences, much as we now regard alchemy or phrenology … we found ourselves trapped within a culture where the appearance of science was seen as an appropriate replacement for the practice of science."


Confirmation bias.

NHST (null hypothesis significance testing) emphasizes predicting positive results; a null result is seen as a failed experiment.

Publication bias. Especially towards positive and novel results. (PLOS ONE claims to publish any methodologically sound research, regardless of novelty or importance.)

Reliance on conceptual replication. Whether one experiment conceptually replicates another is a subjective judgment. A conceptual replication attempt is never seen as falsifying the original: if null results are obtained, it must be a different phenomenon. This fuels confirmation bias.

HARKing (hypothesizing after results are known) presents post hoc hypotheses as a priori. Whatever the results, HARKing makes experiments look like they confirm the broader theory. Estimates of its prevalence vary, but self-reported rates are already as high as 35%.


P-hacking can be observed in the distribution of published p-values: a genuine effect piles p-values up near zero, whereas p-hacking produces a tell-tale bump just below .05.
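This p-curve logic can be sketched with a toy simulation (numpy/scipy assumed; the study sizes and effect size are illustrative, not from the book):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def p_values(effect, n_studies=2000, n=30):
    """p-values from two-sample t-tests with a given true effect size."""
    ps = []
    for _ in range(n_studies):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(effect, 1.0, n)
        ps.append(stats.ttest_ind(a, b).pvalue)
    return np.array(ps)

null_ps = p_values(effect=0.0)  # no true effect: p roughly uniform on [0, 1]
real_ps = p_values(effect=0.5)  # true effect: p piles up near zero

def low_band_share(ps):
    """Among significant p-values, the fraction below .01.
    A genuine effect makes this large; a p-hacked literature instead
    shows significant p-values clustered just under .05."""
    sig = ps[ps < 0.05]
    return (sig < 0.01).mean()
```

Under the null about 5% of studies come out "significant" anyway, and those p-values are evenly spread over [0, .05]; with a real effect the significant p-values are heavily right-skewed toward zero.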

Researcher degrees of freedom. Subgroup analysis, excluding measures, optional stopping. Even small choices - outlier criteria, rounding, numerical methods - provide freedom.
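Optional stopping alone is enough to inflate the false-positive rate well past the nominal 5%. A minimal sketch (the peek schedule and simulation sizes are my own illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def peeking_study(peek_points):
    """One study with NO true effect: run a t-test after each batch of
    data and stop as soon as p < .05 - the optional-stopping freedom."""
    data = rng.normal(0.0, 1.0, max(peek_points))
    return any(
        stats.ttest_1samp(data[:n], 0.0).pvalue < 0.05 for n in peek_points
    )

peeks = [20, 40, 60, 80, 100]
rate = np.mean([peeking_study(peeks) for _ in range(2000)])
# Nominal alpha is 5%, but peeking five times roughly doubles or
# triples the realized false-positive rate.
```

Each individual test is valid; it is the stop-when-significant rule that corrupts the error rate, which is why this degree of freedom is so easy to exercise without feeling dishonest.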

Phantom replication - ‘successful’ replication via subgroup analysis and HARKing.

Biased debugging - look for bugs only when results are unexpected.


Negative attitudes towards direct replication. “Replication police.”

Insufficient power. Causes Type II errors, but also inflates reported effect sizes, since at low power only overestimated effects reach significance. Usually no power analysis is done, because no effect size is predicted either. Researchers often don't understand how adding complexity to an experimental design reduces power.

Failure to disclose methods. Generally not enough detail is published to be able to critique or accurately replicate an experiment.

Misunderstandings of the meaning of p-values. The relative value of two interventions cannot be compared by comparing their p-values: p = .01 for one and p = .04 for the other does not mean the first effect is larger or more important.

Failure to retract. Even horrifically flawed studies are generally not retracted by the authors.


Hiding data helps hide mistakes and misconduct. Very few researchers share their data, even privately with other researchers. Some journals require public data deposition, but still don’t enforce it.

Ethical concerns are valid, but not as common or difficult as claimed.


Massive career pressure to produce positive results.

No protection for whistle-blowers - often career-destroying.


Paywalls. Journalists, business psychologists, policy-makers, even researchers at poorer universities can’t afford access to publicly funded research.

Prices increase hugely over time. Profit margins for publishers are insane.

No prestige for Open Access journals.

Resistance to preprint archives because they obstruct HARKing?


Bean counting

Researchers judged by shallow metrics, eg impact factor or amount of grant money obtained, which produce damaging incentives.

Grants are particularly weird, because a researcher who obtains the same results using fewer resources would have worse career prospects. Grants are perceived as prestigious in themselves, rather than as a responsibility.

Impact factor is not remotely objective - there are many degrees of freedom in its calculation, and journals lobby to change how it is calculated.

Impact factor has not been found to correlate with statistical power, so presumably it does not correlate with reliability.

Assigning credit purely by author order is crude. It would make much more sense to adopt ideas from physics: list each author's specific contributions. As a bonus, this would allow researchers to specialize in e.g. experimental design or data analysis without being penalized.


Ideas for possible improvements:

Registered Reports. Peer-review and publish the experimental design before running the experiment. Commit to publishing the results, including full data, regardless of novelty or significance. Allow exploratory analysis, but clearly label it and demarcate it from a priori hypothesis testing.

Have journals commit to publishing any replications of papers they have previously published.

Sidestep journals entirely - create a public platform for pre- and post-publication review. Could include simple ratings, reviews of statistical validity, replications of data processing etc. Allow journals to bid for publication of popular articles - making journals into curators rather than gate-keepers.

Judge researchers by successful/unsuccessful replications of their work - incentivizes higher power and better experimental design.

Judge researchers by the opinions of expert peers, rather than by numerical metrics.

Disclosure statements eg “We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.”

Data sharing. Peer Reviewer Openness initiative - reviewers commit to insisting that authors at least give a public reason for not sharing their data.

Standardization of research practices - standard analysis technique for a given field reduces degrees of freedom.

Use Bayesian hypothesis testing. Bayes factors are more intuitive to interpret than p-values. Removing the strict significance cutoff reduces the incentive to p-hack. Allows optional stopping without bias, and allows comparing non-null hypotheses against each other.
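A Bayes factor compares how well two hypotheses predict the data, so (unlike a non-significant p-value) a small value is positive evidence for the null. A deliberately simplified sketch, assuming known unit variance and a normal prior on the effect (real analyses use richer models such as the default JZS Bayes factor):

```python
import numpy as np
from scipy import stats

def bf10(x, prior_sd=1.0):
    """Bayes factor BF10 for H1: mu ~ N(0, prior_sd^2) against H0: mu = 0,
    given data x_i ~ N(mu, 1) with known unit variance (a toy setup).
    Because the sample mean is sufficient here, BF10 reduces to a ratio
    of two marginal densities of the sample mean."""
    n, xbar = len(x), float(np.mean(x))
    m1 = stats.norm.pdf(xbar, 0.0, np.sqrt(prior_sd**2 + 1.0 / n))  # under H1
    m0 = stats.norm.pdf(xbar, 0.0, np.sqrt(1.0 / n))                # under H0
    return m1 / m0

rng = np.random.default_rng(0)
bf_effect = bf10(rng.normal(0.8, 1.0, 50))  # BF10 >> 1: evidence for an effect
bf_null = bf10(np.zeros(50))                # BF10 < 1: evidence FOR the null
```

Because the Bayes factor is a continuous measure of evidence rather than a threshold, there is no cliff at .05 to hack toward, and evidence can be monitored as data accumulate.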

Adversarial collaborations.

Funding bodies can require Open Access publication, public data deposition etc.

Random data audits.

Protect whistle-blowers.

Prosecute academic fraud as a criminal offense - corrupting the body of knowledge and wasting public funds is directly harmful to the public.

Advice for junior researchers:


Nothing new, but it’s really useful to have this all in one place so I can point people at it.