Is Statistical Significance Really Significant?

While statistical significance has long determined the validity of findings, the concept has numerous limitations that have scientists worldwide calling for its abandonment.

Reading Time: 4 minutes

Chances are that you’ve stumbled across the term “statistical significance” at some point, whether while reading the results of an experiment or in your science class. The term has become a hot topic in the scientific community, as it is often treated as the be-all and end-all of scientific conclusions.

Statistical significance is assessed with the p-value, a number between zero and one that expresses how likely it is that results at least as extreme as those observed would arise if chance alone were at work. A low p-value, typically less than 0.05, leads researchers to reject the null hypothesis (the notion that observations are due to chance) in favor of the alternative hypothesis (the notion that observations are influenced by a non-random cause). Conversely, a higher p-value does not prove the null hypothesis; it merely means the data are consistent with it, providing no evidence of a relationship between the variables tested. While statistical significance has served as the threshold for the validity of studies for years, a “significant” result may simply be statistical noise with no actual basis in fact. Though the concept has since tangled into a dangerous mess, it all began with a cup of tea.
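To make the definition concrete, here is a minimal sketch (not drawn from any particular study) that computes an exact two-sided p-value for a coin-flipping experiment: assuming the null hypothesis of a fair coin, how likely is a result at least as lopsided as the one observed?

```python
from math import comb

def binomial_p_value(heads: int, flips: int, p_fair: float = 0.5) -> float:
    """Two-sided exact p-value: the probability, under the null hypothesis
    of a fair coin, of a result at least as far from the expected count
    as the one observed."""
    expected = flips * p_fair
    observed_dev = abs(heads - expected)
    total = 0.0
    for k in range(flips + 1):
        # Add the probability of every outcome at least as extreme.
        if abs(k - expected) >= observed_dev:
            total += comb(flips, k) * p_fair**k * (1 - p_fair) ** (flips - k)
    return total

# 60 heads in 100 flips of a supposedly fair coin
p = binomial_p_value(60, 100)
print(round(p, 4))  # two-sided p ≈ 0.057, just above the 0.05 threshold
```

Note that the p-value here quantifies how surprising the data would be if the coin were fair; it says nothing directly about how likely the coin is to be fair, a distinction that underlies much of the misuse discussed below.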

In the 1920s, biologist Dr. Blanche Muriel Bristol and statistician Dr. Ronald Aylmer Fisher gathered for tea. Bristol turned down a cup of tea because the milk had been added after the tea, and she preferred that the milk be poured first. Fisher insisted that she could not tell the difference, and his colleague William Roach proposed a test in which Fisher would prepare eight cups of tea. Four of the cups would have the tea poured first while the other four would have the milk poured first; Bristol would then guess which was which. In doing so, Fisher proposed the null hypothesis that she would do no better than chance. Against odds of one in seventy, she identified every cup correctly; the null hypothesis was rejected, and testing for statistical significance was born. Fisher described the experiment in “The Design of Experiments”, his groundbreaking book on scientific methodology, and from then on, science moved toward the statistical analysis of experiments.
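The arithmetic behind the tea test is simple enough to check directly. With eight cups, exactly four of which had milk poured first, there are only so many ways to pick which four, and only one matches an all-correct guess; that single chance is the p-value of Bristol's feat.

```python
from math import comb

# Of comb(8, 4) ways to choose which four of the eight cups had
# milk poured first, only one matches an all-correct guess.
total_arrangements = comb(8, 4)
p_all_correct = 1 / total_arrangements

print(total_arrangements)            # 70 possible arrangements
print(round(p_all_correct, 4))       # p ≈ 0.0143, well below 0.05
```

This is the same counting logic Fisher later formalized as his exact test: because guessing all eight cups by luck alone would happen only about 1.4 percent of the time, the chance explanation was rejected.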

Since then, statistical significance testing has expanded to many applications, from testing the effectiveness of drugs to reporting the success of companies. In fact, a single announcement about the statistical significance of a pharmaceutical product can cause the company’s stock price to soar or plummet. Recently, Adial Pharmaceuticals Inc. witnessed a 13.82 percent increase in stock price immediately after the company reported promising and statistically significant results for its new drug trials. However, as widely as statistical significance is used, it is just as widely misused, and its scientific value is often limited.

In 2019, three scientists beseeched the scientific community to drop the use of statistical significance. Hundreds of other statisticians and scientists joined in to challenge the use of statistical significance as the sole arbiter of whether a relationship exists. These scientists urged an end to using p-values alone to refute or support hypotheses, and instead proposed interpreting data more thoughtfully to reflect the complex, non-binary nature of the real world. Going further than this moderate position, the American Statistical Association (ASA) has firmly pushed to end significance testing, stating that it has become “useless.”

Statistical significance leads to problems outside of data analysis as well. Scientific journals are typically inclined to publish statistically significant findings over equally important results that miss the threshold. A recent study revealed that 88 percent of scientific journal articles reported statistically significant results, while a mere 12 percent reported non-significant findings. As a result of this bias, scientists may manipulate the data or selectively pick methods that yield significant results, a practice often called p-hacking. Misuse of statistical testing has also produced published findings that cannot be reproduced. This lack of replicability undermines transparency and consistency, diminishing the reliability of statistical testing and allowing inaccurate scientific conclusions to stand. One particular offender, Brian Wansink, allegedly manipulated the data behind 52 publications with nearly 4,000 citations across 25 different scientific journals. Wansink focused only on “statistically significant” findings and omitted a majority of his data, possibly leading to dozens of inaccurate claims. He was ultimately suspended from his university for his dishonest use of statistical tests and unscientific approach.
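A short simulation shows why chasing significance is so dangerous. The sketch below (an illustration, not a reconstruction of any particular study) runs one hundred "experiments" on pure noise, where no real effect exists, using a simple permutation test; roughly five percent of them still come out "significant" at the 0.05 level by chance alone. A researcher who reports only those hits would be publishing noise.

```python
import random

random.seed(0)

def fake_experiment(n: int = 30) -> float:
    """One experiment on pure noise: compare two groups drawn from the
    same distribution and return an approximate two-sided p-value."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    observed = abs(sum(a) / n - sum(b) / n)
    # Permutation test: how often does reshuffling the group labels
    # produce a mean difference at least as large as the observed one?
    pooled = a + b
    trials, hits = 200, 0
    for _ in range(trials):
        random.shuffle(pooled)
        diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / n)
        if diff >= observed:
            hits += 1
    return hits / trials

results = [fake_experiment() for _ in range(100)]
false_positives = sum(p < 0.05 for p in results)
print(false_positives)  # roughly 5 of 100 null experiments look "significant"
```

The 0.05 threshold guarantees this false-positive rate by construction, which is exactly why selectively reporting significant results, as Wansink is alleged to have done, can fill the literature with findings that fail to replicate.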

Despite vocal arguments against the usage of statistical significance, much of the scientific world is far from renouncing it. Many argue that banning it would remove accountability and allow scientists to trivialize negative results. Others contend that the concept is heavily entrenched in research where an alternative method would be inadequate.

In light of this controversy, the ASA released a statement to clarify the uses and misuses of significance testing. Hoping to pin down the currently unclear role of statistical significance, the ASA recommended complete reporting and comprehensive analysis of the data to support sound scientific reasoning. It also warned of random variation, which can account for substantial discrepancies in significance testing results.

Statistical significance has its merits, but only when used correctly. It is time to rethink the question “Is this statistically significant?” After all, the concept was never meant to replace fact; it merely highlights patterns, helping to distinguish chance from a factor of interest. The binary thinking that statistical significance encourages is in no way reflective of the real world. Instead, it is important to consider the complexities and imperfections that make science meaningful when we use statistical significance. While it grants a measure of certainty, it is just as imperative to embrace the uncertainty that is inseparable from the scientific process. Though a significant portion of the scientific community has fought to abandon the p-value, the illusion of certainty that statistical significance offers continues to appeal to many, and for good reason: nothing has yet replaced it.