"

Unit 6. Brief Introduction to Statistical Significance

Leyre Castro and J Toby Mordkoff

Summary. In this unit we will give you a brief introduction to null-hypothesis testing and the concept of statistical significance.

Prerequisite Units

Unit 1. Introduction to Statistics for Psychological Science

Unit 2. Managing Data

Unit 3. Descriptive Statistics for Psychological Research

Null-Hypothesis Testing and Probability Value

Null-hypothesis testing belongs to the area of inferential statistics (not part of this version of Data Analysis in the Psychological Sciences), but you need to know some basic notions to be able to read scientific articles and to advance in your study of statistical analysis of psychological research in the following units.

When you do research, you study samples that are selected from a population.  The data collected from samples are used to make inferences about that population.  Thus, you need to have some way of deciding how meaningful the sample data are.

One common tool to help make that decision is testing for statistical significance, technically named null-hypothesis testing.  You are testing your hypothesis against the null hypothesis, which states that there are no differences between groups, or no association between your variables of interest, in the population.  The output of this statistical analysis includes what is called the p (probability) value.  So, in research papers, you may find statements like these:

  • Participants who exercised remembered significantly more words than those who did not, t(48) = 5.63, p < .001.
  • Multi-tasking activity was significantly correlated with sensation seeking as measured by the Sensation Seeking Inventory, r(275) = .45, p = .01.
  • Levels of social support were negatively associated with levels of depression at 12-month follow-up, r(128) = −.32, p = .003.

The p value tells you the probability of finding a result at least as large as the one in your sample (a difference between groups or a correlation) when that difference or that correlation DOES NOT exist in the population; that is, when the null hypothesis is true.

To be justified in inferring that a result in your sample can be assumed in the population, that probability has to be low. How low?  In the social sciences, the typical cut-off value is .05; that is, 5 out of 100.  Or, as it is typically written, p < .05.  So, less than .05 is the probability of finding a difference or a correlation like the one in your sample when that difference or correlation does not exist in the population (so, you are finding it by chance, because of peculiar circumstances related to your study or to your sample).  In other words, if the null hypothesis were true and you were to replicate your study many times (with different samples), you would expect to find a result like yours just by chance fewer than 5 out of 100 times.  When p < .05, you will say that your result is statistically significant.
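The meaning of the .05 cut-off can be checked with a small simulation.  The sketch below is only an illustration under stated assumptions: both groups are drawn from the same population (so the null hypothesis is true by construction), the population standard deviation is assumed known and equal to 1, and a z test is used in place of the t test that statistical software would normally run.  The function names and the numbers of studies and participants are hypothetical choices, not part of this unit.

```python
import random
from statistics import NormalDist, mean


def two_sided_p_from_z(z):
    """Two-sided p value for a z statistic under the standard normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_false_positive_rate(n_studies=2000, n_per_group=30,
                                 alpha=0.05, seed=1):
    """Proportion of simulated studies with p < alpha when no effect exists."""
    rng = random.Random(seed)
    # Standard error of a difference between two means when each group
    # has n_per_group scores and the population SD is 1 (an assumption).
    standard_error = (1 / n_per_group + 1 / n_per_group) ** 0.5
    significant = 0
    for _ in range(n_studies):
        # Both groups come from the SAME population: the null is true.
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (mean(group_a) - mean(group_b)) / standard_error
        if two_sided_p_from_z(z) < alpha:
            significant += 1
    return significant / n_studies


rate = simulate_false_positive_rate()
print(f"Proportion of 'significant' results with no real effect: {rate:.3f}")
```

Because no difference exists in the simulated population, every "significant" result here is a chance finding, and the printed proportion should land close to the cut-off of .05.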

Other Tools to Evaluate your Research Results

Statistics in psychological research, the same as in any other scientific area, are subject to criticism and reevaluation.  Some practices are well established, but they may have some flaws, and it may be desirable to move forward and find a better way.  Still, those better ways need to be explored and widely adopted before they become part of the well-established set of tools to do psychological research.

In recent years, a debate has developed as to what is the best way to analyze and present the results of psychological research.  Some people have criticized null-hypothesis testing because it encourages dichotomous thinking.  That is, an effect is either statistically significant or not.  But this may not be the best way to approach our results.  Some results may be very close to the cut-off value of .05.  The p value may be .04, and then we say that the result was statistically significant, or it may be .06, and then we say that the result was not statistically significant.  In the first case, we conclude that an effect exists, whereas in the second case we conclude that it does not.  But is this fair?  A minimal difference can lead to two opposite conclusions.  This, among many other issues, is one of the reasons why some researchers favor including other techniques (e.g., confidence intervals and effect sizes) to evaluate research results.
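To see how small the gap between p = .04 and p = .06 can be, the sketch below converts each p value back into the size of the test statistic that would produce it.  It assumes a two-sided z test (standard normal distribution) purely for simplicity; the examples in this unit use t and r statistics, and the function name is hypothetical.

```python
from statistics import NormalDist


def z_for_two_sided_p(p):
    """Absolute z statistic that yields the given two-sided p value."""
    return NormalDist().inv_cdf(1 - p / 2)


z_04 = z_for_two_sided_p(0.04)  # "statistically significant"
z_06 = z_for_two_sided_p(0.06)  # "not statistically significant"

print(f"z needed for p = .04: {z_04:.2f}")
print(f"z needed for p = .06: {z_06:.2f}")
print(f"difference between them: {z_04 - z_06:.2f}")
```

Under these assumptions the two test statistics differ by less than 0.2, yet they fall on opposite sides of the cut-off.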

One of those techniques, confidence intervals, will be introduced in Unit 7.  The confidence interval provides the range of likely values for a population parameter, such as the population mean or the correlation between two variables in the population, calculated from our sample data.  A confidence interval with a 95 percent confidence level (95% CI) comes from a procedure that, across repeated samples, captures the population parameter 95 percent of the time.  We may find that the mean grade in our sample of students is 86.46, and that the 95% CI ranges from 83.40 to 89.53.  We will report it like this:

  • The grade in our sample of students was approximately a B letter grade, M = 86.46, 95% CI [83.40, 89.53].

In this case, we expect that the true mean in the population is no lower than 83.40 and no higher than 89.53.  So, we can conclude that the mean grade of our sample is a fairly accurate estimate of the mean grade of our population.
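As a rough sketch of where such an interval comes from, the code below computes a 95% CI for a mean as the sample mean plus or minus a critical value times the standard error.  Everything specific here is an assumption made for illustration: the grades are invented (they are not the data behind M = 86.46), the function name is hypothetical, and the critical value comes from the normal distribution, whereas statistical software typically uses the t distribution, which gives slightly wider intervals for small samples.

```python
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) using a normal-approximation CI."""
    n = len(values)
    sample_mean = mean(values)
    # Standard error of the mean: sample SD divided by the square root of n.
    standard_error = stdev(values) / n ** 0.5
    # Critical value (about 1.96 for a 95 percent confidence level).
    z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_critical * standard_error
    return sample_mean, sample_mean - margin, sample_mean + margin


# Hypothetical grades, invented for this example only.
grades = [78, 92, 85, 88, 95, 81, 90, 84, 87, 79,
          93, 86, 89, 82, 91, 77, 94, 83, 88, 80]

m, lower, upper = mean_confidence_interval(grades)
print(f"M = {m:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

The printed line follows the same reporting format as the example above, with the sample mean sitting at the center of the interval.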

How narrow or wide the confidence interval is will allow us to assess how accurate we are in our estimation.  If the 95% CI for our mean of 86.46 were from 68.72 to 96.45, then the true mean grade in the population could be as low as a D, or as high as an A.  Thus, we cannot estimate our population’s mean grade very well.  In general, a narrower confidence interval allows for a more accurate estimation of the population parameter, whether that is a mean or a correlation.
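One factor that affects how wide an interval is, is the size of the sample.  The sketch below compares the width of a 95% CI for a small and a large sample drawn from the same simulated population.  The population values (mean 85, standard deviation 10), the sample sizes, the seed, and the function name are all hypothetical, and the interval again uses a normal approximation rather than the t distribution.

```python
import random
from statistics import NormalDist, stdev


def ci_width(values, confidence=0.95):
    """Width (upper minus lower) of a normal-approximation CI for the mean."""
    z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = stdev(values) / len(values) ** 0.5
    return 2 * z_critical * standard_error


rng = random.Random(7)
# Two samples from the same simulated population of grades.
small_sample = [rng.gauss(85, 10) for _ in range(10)]
large_sample = [rng.gauss(85, 10) for _ in range(200)]

width_small = ci_width(small_sample)
width_large = ci_width(large_sample)
print(f"95% CI width with n = 10:  {width_small:.2f}")
print(f"95% CI width with n = 200: {width_large:.2f}")
```

The larger sample yields the narrower interval, because the standard error shrinks as the number of scores grows.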

Conclusions

Statistical analyses include ways to evaluate how reliable or meaningful their results are.  Statistical software (like R, SPSS, jamovi, etc.) will give you a p value when you compute correlations (Units 7 and 8) and linear regression (Units 9 and 10) analyses, so you need to be able to interpret those p values.  Typically, you will also obtain confidence intervals for the various estimates that are calculated.  Keep in mind that, whereas the p value leads you to conclude that, for example, a correlation is statistically significant or not, a confidence interval for the same correlation value will give you a possible range of values that are more or less likely to be true in the population.

 
