The limits of modern science and the notion of “best explanation”

Here’s a New Yorker article about the shortcomings of the scientific method. Basically, it looks at the idea that heaps of scientific findings get exaggerated by early results that haven’t been adequately tested, and that as experiments continue the results become less impressive.

“The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard for the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws…

“Once I realized that selective reporting is everywhere in science, I got quite depressed,” Palmer told me. “As a researcher, you’re always aware that there might be some nonrandom patterns, but I had no idea how widespread it is.” In a recent review article, Palmer summarized the impact of selective reporting on his field: “We cannot escape the troubling conclusion that some—perhaps many—cherished generalities are at best exaggerated in their biological significance and at worst a collective illusion nurtured by strong a-priori beliefs often repeated.”

Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they continue to get cited in the textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can’t bear to let them go. And this is why the decline effect is so troubling. Not because it reveals the human fallibility of science, in which data are tweaked and beliefs shape perceptions.”
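To see how selective reporting on its own could produce the kind of exaggeration Palmer describes, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not something from the New Yorker article: the true effect size of 0.2, the 20 measurements per study, and the rule that only studies clearing a significance threshold get "published".

```python
import random
import statistics

def simulate_study(true_effect, n, rng):
    """Run one made-up study: return (observed mean effect, looks_significant)."""
    sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    mean = statistics.fmean(sample)
    # Standard error assuming a known standard deviation of 1.
    z = mean / (1.0 / n ** 0.5)
    # z > 1.96 roughly matches a two-sided P < .05 in the expected direction.
    return mean, z > 1.96

def run(true_effect=0.2, n=20, studies=5000, seed=1):
    """Compare the average effect across all studies with the 'published' ones."""
    rng = random.Random(seed)
    results = [simulate_study(true_effect, n, rng) for _ in range(studies)]
    all_means = [m for m, _ in results]
    published = [m for m, significant in results if significant]
    return statistics.fmean(all_means), statistics.fmean(published)

avg_all, avg_published = run()
print(f"true effect 0.20 | all studies {avg_all:.2f} | 'published' only {avg_published:.2f}")
```

Under these assumptions the "published" average has to come out larger than the true effect, because only the luckiest studies pass the filter. Later replications that get reported regardless of outcome would then look like a decline.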

So what does this all mean? Obviously science has great explanatory power, and it can largely be trusted, over time, to give us an improved understanding of the world we live in and how it works. But we’ve got to keep remembering that science is a human tool used by stupid people, subject to mistakes and to external pressures like the need to publish in order to secure research funding. There’s a really interesting quote at the end of the article that dovetails nicely with this thread on the Friendly Atheist, where the problem with holus bolus acceptance of scientific naturalism is demonstrated once again (particularly see my question in the comments). Arguments to best explanation are great. And necessary. But we’ve got to keep a bit of epistemic humility and perspective and remember that one generation’s best explanation is the next generation’s $2 archaic science textbook on sale in the nostalgia section of the op shop.

“We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.”

I said science again

I realise that when a Christian starts out a post about flaws in any part of science by saying “I love science” some people see that as analogous to someone preambling the telling of a racist joke with the line “I have a black friend so it’s ok for me to think this is funny.”

I like science – but I think buying into it as a holus-bolus solution to everything is unhelpful. The scientific method involves flawed human agents who sometimes reach dud conclusions. It involves agendas that sometimes make these conclusions commercially biased. I’m not one of those people who think that the word “theory” means that something is a concept or an idea. I’m happy to accept “theories” as “our best understanding of fact”… and I know that the word is used because science has an innate humility that admits its fallibility. These dud conclusions are often ironed out – but it can take longer than it should.

That’s my disclaimer – here are some bits and pieces from two stories I’ve read today…

Science and statistics

It seems one of our fundamental assumptions about science is based on a false premise. The idea that a particular result can be treated as a rule because it occurs a “statistically significant” number of times seems to have come from an arbitrary decision made in the field of agriculture back in the 1920s. Picking a null hypothesis and finding an exception is a really fast way to establish theories. It’s just a bit flawed.

ScienceNews reports:

“The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.”

Did you know that our scientific approach, which now works on the premise of rejecting a “null hypothesis” based on “statistical significance”, came from a guy testing fertiliser? And we now use it everywhere.

The basic idea (if you’re like me and have forgotten everything you learned in chemistry at high school) is that you start by assuming that something has no effect (your null hypothesis), and if a result like yours would turn up less than five percent of the time under that assumption, you conclude that the thing actually does have an effect… because you apply statistics to scientific observation… here’s the story.

While its [“statistical significance”] origins stretch back at least to the 19th century, the modern notion was pioneered by the mathematician Ronald A. Fisher in the 1920s. His original interest was agriculture. He sought a test of whether variation in crop yields was due to some specific intervention (say, fertilizer) or merely reflected random factors beyond experimental control.

Fisher first assumed that fertilizer caused no difference — the “no effect” or “null” hypothesis. He then calculated a number called the P value, the probability that an observed yield in a fertilized field would occur if fertilizer had no real effect. If P is less than .05 — meaning the chance of a fluke is less than 5 percent — the result should be declared “statistically significant,” Fisher arbitrarily declared, and the no effect hypothesis should be rejected, supposedly confirming that fertilizer works.

Fisher’s P value eventually became the ultimate arbiter of credibility for science results of all sorts — whether testing the health effects of pollutants, the curative powers of new drugs or the effect of genes on behavior. In various forms, testing for statistical significance pervades most of scientific and medical research to this day.
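The procedure described above can be sketched in a few lines. This is not Fisher's own calculation: it swaps in a simple permutation test, which estimates P by shuffling the "fertilized" labels many times and counting how often chance alone produces a difference at least as large as the observed one. The yield figures and the function name `permutation_p_value` are made up for illustration.

```python
import random
import statistics

def permutation_p_value(treated, control, trials=10000, seed=0):
    """Estimate P: how often label-shuffling gives a difference >= the observed one."""
    rng = random.Random(seed)
    observed = statistics.fmean(treated) - statistics.fmean(control)
    pooled = list(treated) + list(control)
    k = len(treated)
    at_least_as_big = 0
    for _ in range(trials):
        # Under the null hypothesis the labels are arbitrary, so shuffle them.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:k]) - statistics.fmean(pooled[k:])
        # Small tolerance so floating-point rounding does not hide exact ties.
        if diff >= observed - 1e-9:
            at_least_as_big += 1
    return at_least_as_big / trials

# Hypothetical crop yields (invented numbers, not real data).
fertilized = [29.1, 30.4, 31.2, 28.7, 32.0, 30.9]
unfertilized = [27.5, 28.9, 28.1, 29.3, 27.0, 28.4]

p = permutation_p_value(fertilized, unfertilized)
verdict = "statistically significant" if p < 0.05 else "not statistically significant"
print(f"P = {p:.4f} -> {verdict} at the .05 threshold")
```

Strictly speaking, P here is the probability of seeing a difference at least this large if fertilizer did nothing. It is not the probability that fertilizer does nothing, which is one of the common misreadings the ScienceNews piece is worried about.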

A better starting point

Thomas Bayes, a clergyman in the 18th century, came up with a better model of hypothesising. It basically involves starting with an educated guess, conducting experiments, and using your premise as a filter for the results. This introduces the murky realm of “subjectivity” into science – so some purists don’t like this.

Bayesians treat probabilities as “degrees of belief” based in part on a personal assessment or subjective decision about what to include in the calculation. That’s a tough placebo to swallow for scientists wedded to the “objective” ideal of standard statistics.

“Subjective prior beliefs are anathema to the frequentist, who relies instead on a series of ad hoc algorithms that maintain the facade of scientific objectivity.”
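As a rough sketch of what "degrees of belief" means in practice, here is Bayes' rule applied to a single yes/no hypothesis. All of the numbers are hypothetical: the starting beliefs (priors) and the two likelihoods are invented, and the function name `posterior` is just a label for this example.

```python
def posterior(prior, p_data_if_true, p_data_if_false):
    """Bayes' rule: updated belief in a hypothesis after seeing one result."""
    numerator = prior * p_data_if_true
    return numerator / (numerator + (1.0 - prior) * p_data_if_false)

# Two researchers see the same three positive results but start from
# different (subjective, made-up) prior beliefs that the effect is real.
for name, belief in [("optimist", 0.10), ("skeptic", 0.01)]:
    for experiment in range(1, 4):
        # Assumed likelihoods: a positive result is seen 80% of the time
        # if the effect is real, and 5% of the time if it is not.
        belief = posterior(belief, p_data_if_true=0.80, p_data_if_false=0.05)
        print(f"{name}: belief after experiment {experiment} = {belief:.3f}")
```

The two researchers end up with different numbers after the same evidence, which is the subjectivity the purists object to. They also drift closer together as results accumulate, which is the usual Bayesian reply.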

Luckily for those advocating this Bayesian method it seems, based on separate research, that objectivity is impossible.

Doing science on science

Objectivity is particularly difficult to attain because scientists are apparently prone to rejecting findings that don’t fit with their hypothetical expectations.

Kevin Dunbar is a scientist researcher (a researcher who studies scientists) – he has spent a significant amount of time studying the practices of scientists, having been given full access to teams from four laboratories. He read grant submissions, reports, and notebooks, he spoke to scientists, sat in on meetings, eavesdropped… his research was exhaustive.

These were some of his findings (as reported in a Wired story on the “neuroscience of screwing up”):

“Although the researchers were mostly using established techniques, more than 50 percent of their data was unexpected. (In some labs, the figure exceeded 75 percent.) “The scientists had these elaborate theories about what was supposed to happen,” Dunbar says. “But the results kept contradicting their theories. It wasn’t uncommon for someone to spend a month on a project and then just discard all their data because the data didn’t make sense.””

It seems the Bayesian model has been taken slightly too far…

The scientific process, after all, is supposed to be an orderly pursuit of the truth, full of elegant hypotheses and control variables. Twentieth-century science philosopher Thomas Kuhn, for instance, defined normal science as the kind of research in which “everything but the most esoteric detail of the result is known in advance.”

You’d think that the objective scientists would accept these anomalies and change their theories to match the facts… but the arrogance of humanity creeps in a little at this point… if an anomaly arose consistently the scientists would blame the equipment, they’d look for an excuse, or they’d dump the findings.

Wired explains:

Over the past few decades, psychologists have dismantled the myth of objectivity. The fact is, we carefully edit our reality, searching for evidence that confirms what we already believe. Although we pretend we’re empiricists — our views dictated by nothing but the facts — we’re actually blinkered, especially when it comes to information that contradicts our theories. The problem with science, then, isn’t that most experiments fail — it’s that most failures are ignored.

Dunbar’s research suggested that the solution to this problem comes through a committee approach, rather than through the individual (which I guess is why peer review is where it’s at)…

Dunbar found that most new scientific ideas emerged from lab meetings, those weekly sessions in which people publicly present their data. Interestingly, the most important element of the lab meeting wasn’t the presentation — it was the debate that followed. Dunbar observed that the skeptical (and sometimes heated) questions asked during a group session frequently triggered breakthroughs, as the scientists were forced to reconsider data they’d previously ignored.

What turned out to be so important, of course, was the unexpected result, the experimental error that felt like a failure. The answer had been there all along — it was just obscured by the imperfect theory, rendered invisible by our small-minded brain. It’s not until we talk to a colleague or translate our idea into an analogy that we glimpse the meaning in our mistake.

Fascinating stuff. Make sure you read both stories if you’re into that sort of thing.