Standards in Behavioral Science | Science-Based Medicine

There is an excellent review in The Independent (by Helen Coffey) about the recent cultural shift within behavioral science. It reflects the exact same issues we address here with medical science – identifying and eliminating shoddy scientific practices. It’s worth going over her points, adding examples from the medical side.

Perverse Incentives

The goal of science should be to discover the truth, regardless of what it is. This is especially important for an applied science like medicine – we want our interventions to be safe, effective, cost effective, efficient, and minimally invasive and disruptive. To achieve this we need to know what actually works – we need the best, most reliable science possible.

But there are other incentives that get in the way. Researchers want positive results that confirm their biases. Positive results are also more likely to be published and advance one’s career. It helps if the results are surprising and interesting, what Coffey calls “sexy”. Journal editors also like sexy results, because that increases the visibility, prestige, and impact factor of their journal. The press adores sexy results because they are very media friendly.

I would extend this idea of perverse incentives to include pay-to-play journals that just want to publish lots of stuff, regardless of quality. And of course there are ideologues who want to promote their particular world view. This gets entangled with the financial and career goals above as well. The result is that if, say, you are an acupuncturist, you want to publish studies showing that acupuncture works. You cite other studies that show acupuncture works. You can create in the literature an acupuncture fiction that has nothing to do with reality and is built on all the shoddy research that Coffey discusses.

It goes even further than Coffey realizes, because the sexy but shoddy research is not just a one-off. It can be part of a campaign promoting an entire false idea, even a false cultural institution. Things like homeopathy, acupuncture, Reiki, megavitamins, antioxidants, and chiropractic take on a life of their own. You get journals, institutions, and even entire professions dedicated to nonsense.

But at the core of all of it is the bad study. So let’s review Coffey’s points and even add some.

Fraudulent Studies

I don’t need to say much about fraudulent studies except that, unfortunately, they do exist. We can collectively do a better job of policing against fraud, detecting it, and weeding it out quickly, preferably before it ever gets published. This is mostly on journals and their editors, who need better fraud detection practices.

But perhaps the most important thing to realize about fraudulent scientific research is that this is not the main problem. It is a destructive and horrible thing, but relatively small compared to good-faith but sloppy research.

P-Hacking

P-hacking refers to superficially fine but ultimately questionable research practices that essentially amount to statistical cheating. The p-value is a rough statistic used to gauge whether a study is even interesting – are the results likely to be a statistical fluke, or the signature of a real phenomenon? But the p-value is widely misunderstood and overused. Even worse, it has become too much of a focus of research, leading to (sometimes inadvertent) hacking to get significant results.

Essentially these are methods that distort the statistical results by giving more throws of the dice (often without disclosing this fact). So, you can collect data until the results become significant, or make multiple comparisons, or look at multiple outcomes.
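To make this concrete, here is a minimal simulation (my own illustrative sketch, not an example from Coffey’s article) of one common form of p-hacking, optional stopping: both groups are drawn from the same distribution, so the null hypothesis is true and every “significant” result is a false positive.

```python
# Illustrative sketch: "optional stopping" -- peeking at the data and stopping
# as soon as p < 0.05 -- inflates the false-positive rate even when there is
# no real effect at all.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_simulations = 2000
false_positives = 0

for _ in range(n_simulations):
    treatment, control = [], []
    # Both arms are drawn from the same distribution: the null is true.
    for _ in range(100):  # recruit subjects in pairs, up to 100 per arm
        treatment.append(rng.normal(0, 1))
        control.append(rng.normal(0, 1))
        if len(treatment) >= 10:  # start peeking after 10 subjects per arm
            p = stats.ttest_ind(treatment, control).pvalue
            if p < 0.05:          # stop as soon as it looks "significant"
                false_positives += 1
                break

print(f"False-positive rate with peeking: {false_positives / n_simulations:.2%}")
# Typically well above 20%, instead of the nominal 5% you would get by
# fixing the sample size in advance and testing once.
```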

There are several basic fixes for p-hacking. One is simply to educate researchers about the methods of p-hacking, to make sure they don’t do it accidentally. Editors can also actively screen for p-hacking, and demand the data from submissions that would let them do so. But there are two more definitive fixes. One is pre-registration of study methods: you cannot p-hack if you lock in all research methods before collecting data. The other is replication, which follows the original methods to see if the same results occur.

Fragile Studies

Even if a study is honest and does not engage in p-hacking, the results may still not be reliable or generalizable because they are “fragile” (which you can think of as the opposite of robust). A fragile study, for example, looks at a study population that is not representative for some reason. Coffey uses the example, common in behavioral psychology, of studying only college students. But any study with narrow inclusion and exclusion criteria can also be fragile. Perhaps the results are only positive in certain cultures or subcultures (acupuncture studies, for example, are far more likely to be positive when conducted in an Asian country).

Another source of fragility is a small sample size. Fifty subjects in each arm of a study is generally considered a good minimum for a statistically robust study, and fewer than that should be immediately suspect. This depends on the outcome, however. More objective outcomes, like death, can get away with smaller sample sizes, while subjective outcomes, like pain perception, require larger ones.
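As a rough illustration of why, here is a standard power calculation (a sketch using statsmodels; the specific effect sizes are my assumed inputs, not figures from the article) for a simple two-arm trial:

```python
# Statistical power of a two-sample t-test at various sample sizes.
# effect_size is Cohen's d; 0.5 is a conventional "moderate" effect.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n_per_arm in (15, 25, 50, 100):
    power = analysis.power(effect_size=0.5, nobs1=n_per_arm, alpha=0.05)
    print(f"n = {n_per_arm:3d} per arm -> power = {power:.2f}")

# Smaller, noisier effects -- typical of subjective outcomes like pain
# scores -- need far larger samples for the same 80% power:
n_needed = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8)
print(f"Cohen's d = 0.2 needs ~{n_needed:.0f} subjects per arm")
```

With 50 per arm a moderate effect is detected only about 70% of the time; a small effect needs several hundred subjects per arm. Underpowered studies don’t just miss real effects – when they do come up positive, the result is disproportionately likely to be a fluke.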

I would also add another sign of fragility: only one laboratory or researcher seems able to generate positive results. Until a result reliably replicates, it is suspect.

Also, all the little details of studies that we often discuss in our specific reviews can be filed under fragility. A study may have a large drop-out rate, or not be properly blinded, or use dubious outcome measures, or a host of other weaknesses in the protocol.

Salami Slicing

Coffey uses the term “salami slicing” to refer to what is also called the sharpshooter’s fallacy, or more generically the problem of hypothesizing after the results are known. The sharpshooter’s fallacy refers to deciding what counts as a positive outcome after seeing the result you already have – like shooting at the side of a barn, then drawing the target around your bullet hole and declaring a bullseye.

Ideally a research study starts with a clear hypothesis and a specific method for gathering data that will test that hypothesis in a way that makes a priori sense. Salami slicing is the practice of collecting a lot of data, then slicing and dicing it in different ways until some significant correlation turns up, then backfilling a justification for why that should be the case.

Results of this approach often lack what we call face validity – they don’t seem to make sense on their face. But the results can be statistically significant, or at least appear significant if you do not know about, and correct for, the multiple comparisons (the slicing of the data) that were made.

Importantly, the results can often be surprising and sexy – sure, because they also happen to be bullshit.
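A toy simulation makes the point (this is my own sketch, assuming twenty arbitrary subgroup slices of null data, not an example from the article): test enough slices and something “significant” will often pop out, unless you correct the threshold for the number of comparisons.

```python
# Salami slicing: run the same null comparison across many subgroups and
# spurious "significant" findings tend to appear -- unless the threshold is
# corrected for the number of slices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_subgroups = 20  # e.g., slicing by age band, sex, region, dose, weekday...
p_values = []

for _ in range(n_subgroups):
    # Treatment and control drawn from the same distribution: no real effect.
    treatment = rng.normal(0, 1, size=30)
    control = rng.normal(0, 1, size=30)
    p_values.append(stats.ttest_ind(treatment, control).pvalue)

hits = [p for p in p_values if p < 0.05]
print(f"{len(hits)} of {n_subgroups} slices are 'significant' at p < 0.05")

# Bonferroni correction: divide the threshold by the number of comparisons.
corrected = [p for p in p_values if p < 0.05 / n_subgroups]
print(f"{len(corrected)} survive a Bonferroni-corrected threshold")
```

Pre-registration attacks the same problem from the other direction: if the subgroup analyses are declared in advance, there is nothing left to slice.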

The Future

Coffey ends her piece on a positive note, saying that exposing all these dubious methods is having an impact, improving the overall rigor of the scientific literature. I agree that this is happening, although I would argue it needs to happen at least an order of magnitude more than we are currently seeing.

But there are also some counter-trends. At the same time that we are trying to shore up the rigor of biomedical science, there are forces trying to weaken those standards, or at least carve out exceptions for their preferred beliefs. They have political allies, and lots of money.

Also, the media appears to be working against us. To make this point, just look at the ads below Coffey’s article. They represent the exact thing she is discussing. Social media seems to be designed, even more than mainstream media, to favor sexy results. There is now a cottage industry of influencers, self-help gurus, self-appointed pseudoexperts, contrarians, snake oil peddlers, and ideologues leveraging social media to spread the absolute worst shoddy science.

The fact that, behind the scenes, we are incrementally improving the rigor of our science is great. But it is overwhelmed by the deluge of misinformation and shoddy science out there. We need to tackle that realm as well. This requires increased standards not just for published studies, but for press releases, public communication, scientific journals, and academia. And we need to dramatically increase the amount and quality of our public science communication.

We also need to dramatically improve the quality of our regulations, which are steadily being ratcheted in the direction of snake oil. Don’t expect any improvement in the next four years, but this is an endless struggle we must keep up.




Steven Novella, MD, founder and currently executive editor of Science-Based Medicine, is an academic clinical neurologist at the Yale University School of Medicine. He is also the host and producer of the popular weekly science podcast The Skeptics’ Guide to the Universe, and the author of NeuroLogica Blog, a daily blog that covers news and issues in neuroscience, as well as general science, scientific skepticism, philosophy of science, critical thinking, and the intersection of science with the media and society. Dr. Novella has also produced two courses with The Great Courses, and published a book on critical thinking, also called The Skeptics’ Guide to the Universe.


