What institutional reforms will make science more efficient and reliable?

L. Tiokhin

January 2021

Link to this post on nature.com

“The winner takes it all

The loser’s standing small

Beside the victory

That’s her destiny.”

The winner takes it all, ABBA

I have a tendency to get overly excited about things. As an undergraduate, the target of my excitement was the application of evolutionary theory to human behavior and psychology.¹ I found an advisor who shared my interests, and before I knew it, I was running study after study to test novel hypotheses derived from evolutionary theory. It was an exciting time. My eyes had been opened to a new way of seeing the world. Oh, evolution - I’ve been dreaming through my lonely past. Now I just made it - I found you at last.

At the same time, I lived with a persistent anxiety. The anxiety that maybe my advisor and I weren’t the only ones working on these ideas. The anxiety that maybe we’d dedicate a year to a project, only to get “scooped” by a competing lab, leaving us to drown our sorrows with frozen pizza and a pint² of Ben & Jerry’s.

As it turns out, many scientists share this fear of being “scooped”. Scientists are rewarded for novel discoveries, and the longstanding priority rule dictates that scientists who are first to publish their findings get the bulk of the credit and rewards. There’s a certain logic to rewarding priority: it incentivizes scientists to quickly share their findings with the scientific community, and can lead to an efficient distribution of scientists across multiple research problems.

But are there any downsides? For instance, might scientists be so worried about getting scooped that they conduct “sloppy” research in their race to beat competitors? Some journals certainly think so: eLife and all PLOS journals recently began offering “scoop protection” (allowing researchers to publish findings identical to those already published) in an attempt to reduce the disproportionate payoffs to scientists who publish first.

But do such policies make sense?

Modeling the priority rule

To gain insight into this question, we took a formal theoretical approach by developing an evolutionary agent-based model of competition for priority in science. In our model, we simulated a population of scientists that varied in their research strategies as they went about investigating questions. Successful scientists’ strategies were preferentially imitated by scientists in subsequent generations. In this way, a selection process identified the most successful research strategies, and we evaluated what strategies evolved in the long run under different conditions.
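The imitation step can be sketched in a few lines of Python. This is only an illustration of payoff-biased copying, not the model's actual code (which is on OSF): the function name `next_generation`, the Gaussian mutation, and the floor of 2 on sample size are all assumptions made for the example.

```python
import random

def next_generation(sample_sizes, payoffs, mutation_sd=0.0, rng=random):
    """Build the next generation of research strategies.

    Each new scientist copies the sample size of a "parent" drawn with
    probability proportional to that parent's payoff, then (optionally)
    perturbs it with Gaussian noise. Both details are assumptions.
    Payoffs must be non-negative and not all zero.
    """
    parents = rng.choices(sample_sizes, weights=payoffs, k=len(sample_sizes))
    # Sample sizes stay whole numbers and never drop below 2.
    return [max(2, round(n + rng.gauss(0, mutation_sd))) for n in parents]

# Three scientists; only the one using n = 200 earned any payoff,
# so (with no mutation) everyone in the next generation copies n = 200.
print(next_generation([10, 50, 200], payoffs=[0, 0, 1]))
```

Repeating this step over many generations is what lets a simulation "discover" which strategies pay off under a given reward scheme.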

The key evolving parameter was the sample size of scientists’ studies. Larger samples led to higher average statistical power, but took more time to collect, leaving scientists more vulnerable to being “scooped” by competitors. Consistent with many scientists’ intuitions, we found that rewarding priority incentivized lower-quality research (and increasing the intensity of competition exacerbated this effect). We also found that scoop protection had a beneficial effect by reducing the incentive to conduct small, low-quality studies, leading to better scientific outcomes at the population level.
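To make the trade-off concrete, here is a rough sketch of how statistical power grows with sample size. It uses the textbook normal approximation for a two-sided, two-sample test. The effect size of 0.3 and the 5% significance level are illustrative assumptions, not values taken from our model.

```python
from statistics import NormalDist

def power_two_sample(n_per_group, effect_size=0.3, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (assumed value).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

for n in (10, 50, 100, 200):
    print(f"n per group = {n:3d}  power = {power_two_sample(n):.2f}")
```

Power climbs with every extra participant, but so does the time spent collecting data, and that extra time is exactly the window in which a competitor can publish first.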

Unexpectedly, we also found that larger rewards for negative results resulted in lower-quality research. This might seem to go against the current trend to incentivize the publication of negative results. Isn’t science better off when negative findings are rewarded alongside positive ones? Not always. In our model, the rewards that scientists received from a publication were independent of the study’s statistical power. So, as negative findings became more valuable, scientists had little incentive to increase their statistical power by conducting large, costly studies. Instead, their best strategy was to run small, underpowered studies. These small studies were more likely to produce negative results (often false negatives), but resulted in decent payoffs when negative results were highly valued. This finding suggests that, without controls on research quality, reforms to increase rewards for negative results will backfire by incentivizing lower-quality research.
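A toy calculation shows the logic. It is a deliberately stripped-down, single-scientist sketch with no competition or scooping, so it is not our agent-based model. Every number in it (effect size 0.3, half of hypotheses true, a startup cost of 100 time units, one time unit per participant, a positive result worth 1) is an assumption chosen for illustration.

```python
from statistics import NormalDist

def power(n, effect=0.3, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = effect * (n / 2) ** 0.5
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)

def payoff_rate(n, negative_value, base_rate=0.5, alpha=0.05, startup=100):
    """Expected payoff per unit time for studies of size n.

    The payoff depends only on whether the result is positive or negative,
    never on the study's power (the key assumption from the text).
    """
    p_positive = base_rate * power(n, alpha=alpha) + (1 - base_rate) * alpha
    expected_payoff = p_positive * 1.0 + (1 - p_positive) * negative_value
    return expected_payoff / (startup + n)

def best_sample_size(negative_value, candidates=range(2, 501)):
    """Sample size that maximizes payoff per unit time."""
    return max(candidates, key=lambda n: payoff_rate(n, negative_value))

for value in (0.0, 0.25, 0.5, 1.0):
    print(f"negative result worth {value:.2f} -> best n = {best_sample_size(value)}")
```

When negative results are worth nothing, it pays to invest in a reasonably large sample. Once they are worth as much as positive results, the payoff no longer depends on the outcome at all, and the smallest possible study wins.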

Finally, we found that larger startup costs for single studies (e.g., a mandate to always pre-register studies) incentivized higher-quality research by disproportionately penalizing scientists who performed large numbers of low-quality studies. This finding suggests that some inefficiencies in the scientific process can have important functions, and that we need to consider the costs and benefits of inefficiency before implementing reforms.
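Some back-of-the-envelope arithmetic shows why a fixed per-study cost hits small studies hardest. The numbers below (a time budget of 1,000 units, samples of 10 versus 100, one time unit per participant) are hypothetical and chosen only for illustration.

```python
def studies_completed(time_budget, sample_size, startup_cost):
    """Number of studies that fit in a fixed time budget."""
    return time_budget / (startup_cost + sample_size)

def small_vs_large_ratio(startup_cost, small_n=10, large_n=100, time_budget=1000):
    """How many small studies get done for every large one."""
    small = studies_completed(time_budget, small_n, startup_cost)
    large = studies_completed(time_budget, large_n, startup_cost)
    return small / large

for cost in (0, 10, 50, 200):
    print(f"startup cost = {cost:3d}  small-to-large ratio = {small_vs_large_ratio(cost):.2f}")
```

With no startup cost, the small-study strategy produces ten times as many papers. As the fixed cost grows, that advantage shrinks toward parity, which removes much of the reason to run many quick, underpowered studies.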

Reflections

“When one has finished a substantial paper there is commonly a mood in which it seems that there is really nothing in it. Do not worry, later on you will be thinking ‘At least I could do something good then.’”

J. E. Littlewood

Littlewood isn’t far off. We began this project in 2017. That’s over 3 years between inception and publication.³

We initially submitted to PLOS Biology, after an editor reached out expressing interest. Unfortunately, the reviewers didn’t throw us the party that we had hoped for. Both were fundamentally positive and gave constructive feedback. Still, one reviewer noted, “None of the results surprise, given how the ABM was constructed,” and “Both of the major results struck me as a bit inevitable given the assumptions of the model.” In contrast, a reviewer at Nature Human Behaviour said, “To my eyes, this paper is a great contribution to this new literature and makes a variety of interesting and novel points that deserve to see the light of day.”

What’s the lesson here? Well, peer review is... noisy.⁴ Often, there is little agreement between reviewers. This can be OK if reviewers focus on different aspects of a paper and an editor can integrate reviewers’ disparate inputs. But reviewer disagreement makes it difficult to know a paper’s quality (in another model, we show that improving the reliability of the review process can help to prevent authors from misleading journals about the quality of their work).

In some respects though, traditional peer review is just not up to par. For example, no reviewer checked our code. One reviewer even apologetically mentioned, “I have not had the opportunity (due to pressures of time) to check that the code published on OSF works as advertised.” Wait, you might be thinking: A computational model published in a high-impact journal didn’t have every nook and cranny rigorously vetted by leading experts? Yup. That’s right. Instead, we voluntarily hired an external party to review our code (see the SI). This helped to ensure that our results were computationally reproducible and that the code was doing what we claimed (although it’s no guarantee). Now just think about all those other Nature, Science, and PNAS papers out there that haven’t been properly vetted.⁵

What now?

“At least I could do something good then” only partly captures my sentiments. I learned a lot from this project and am proud of the work. No single paper can provide a definitive answer to a topic as broad as how to reform scientific institutions. That said, our model is not a bad start. It adds to the growing body of evidence that competition for priority incentivizes lower-quality work (1, 2, 3). And it highlights some counterintuitive possibilities, including the fact that rewarding negative results can incentivize lower-quality research and that scientific inefficiencies can have useful functions.

My experiences with peer review have also inspired me to think more about the problem of quality control in science. Is traditional peer review really the best we can do? I don’t think so. Peer review is slow and unreliable, in part, because reviewers are volunteers who have little incentive to do a good job. A solution to this problem would be to create a marketplace for scientific criticism. All of a sudden, scientists would have financial and reputational incentives to provide high-quality criticism, scientists could get criticism at any point in the research pipeline (not just at the end of a project), and scientific criticism would be liberated from academic journals. Several meta-scientists and I recently founded a startup - Red Team Market - to build such a marketplace. We’re still in the early stages, but I’m optimistic about the potential of Red Team Market to improve the speed, reliability, and broad availability of scientific criticism.

What now?

Initially, I considered channeling my undergraduate advisor. He’d have said something like, “Kudos. Bask in the glow for a moment…OK, now get back to work.”

But... it is Friday night. The lights are low. And I’m kind of looking for a place to go. So. For now. Goodbye work. Hello Dancing Queen.

¹ How anyone could not find this exciting is beyond me :)
² Pints.
³ Yes, I realize that this may seem like tiddlywinks to scientists who collect longitudinal data over decades.
⁴ By “noisy”, I mean that it sucks.
⁵ Flawed papers. I see flawed papers. - In your dreams? - No. - Flawed papers, like a typo or a wrong reference? - Worse. Walking around among regular papers. They don’t see each other. They only see what they want to see. They don’t know they’re flawed. - How often do you see them? - All the time. They’re everywhere.