Inferential Statistics: Basics of Significance Testing

10/21/2012

I mentioned Inferential Statistics in earlier posts, but let's recap briefly: it is a set of rules that allows us to infer that results obtained from a sample also hold for the population.
Now, imagine that we got a correlation of 0.4 after testing a sample of 30 people. How do we know that such a correlation occurs in the population as well? And how do we know whether our sample was big enough to be representative?
In this post, I will explain what is meant by significance testing. In practice, it does not involve any calculation; in fact, there is a ready-made table of significance which simply tells you whether your result is significant or not, depending on your sample size and correlation coefficient. However, I think it is important to understand the logic behind this table - and this is exactly what I will attempt to explain in this post. I will come to the table at the end as well, so if you are not really interested in theoretical considerations at the moment you can safely go straight there.

Logic behind significance testing

Let's imagine a population of 60 pairs of scores with an overall correlation of 0.00. What happens if we randomly draw numerous samples from this population, each consisting of, say, 10 pairs of scores? Obviously, the correlation coefficients of these individual samples will vary a lot. Most of them will be close to zero (which is the population correlation) - there will be a lot of numbers like 0.07, -0.2 etc. However, some of these coefficients will differ considerably from 0.00; there could well be a couple of coefficients like 0.7 or -0.58. If we plot these coefficients on a histogram, they will form a roughly normal distribution with a mean correlation of zero, with the majority of coefficients near zero and a few extreme cases in the tails of the histogram. This shows us that a random sample can have a correlation coefficient which departs substantially from the population correlation coefficient.
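If you would like to see this for yourself, here is a minimal Python sketch of the thought experiment. It is only an illustration under my own assumptions: the 60 pairs are made-up random numbers (mirrored so that their correlation is zero by construction), and the seed, the 2000 samples and the helper name pearson_r are arbitrary choices of mine.

```python
import math
import random


def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # arbitrary seed, only so the run is repeatable

# Build 60 pairs whose correlation is zero by construction:
# every (x, y) is accompanied by its mirror image (x, -y).
half = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
population = half + [(x, -y) for x, y in half]

# Draw many samples of 10 pairs each (no pair repeated within a sample).
sample_rs = [pearson_r(rng.sample(population, 10)) for _ in range(2000)]

mean_r = sum(sample_rs) / len(sample_rs)
share_extreme = sum(abs(r) > 0.5 for r in sample_rs) / len(sample_rs)

print("population r:", round(pearson_r(population), 4))
print("mean of sample r values:", round(mean_r, 3))
print("smallest / largest sample r:", round(min(sample_rs), 2), round(max(sample_rs), 2))
print("share of samples with |r| > 0.5:", round(share_extreme, 3))
```

The mean of the sample coefficients should sit close to zero, while the smallest and largest values show how far a single sample of 10 can stray from the population correlation.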

Almost anything is possible in a sample - although only certain things are likely. Thus, we try to stipulate which correlations are LIKELY correlations in samples of a given size and which are UNLIKELY (if the population correlation equals zero). So, if the population correlation does equal zero, then the correlations in the middle 95% of the sampling distribution are the likely ones. Consequently, correlations which fall in the extreme 5% of the sampling distribution (2.5% in each tail) are unlikely.
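The boundaries of that middle 95% can be estimated by simulation. The sketch below is a rough illustration rather than the way the published tables are produced: I assume samples of 10 pairs drawn from two unrelated, normally distributed variables (a much larger population than the 60 pairs above), and the number of simulated samples and the seed are arbitrary.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def null_sample_r(n, rng):
    """Correlation in one sample of n pairs when the variables are unrelated."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    return pearson_r(xs, ys)


rng = random.Random(0)  # arbitrary seed
n = 10                  # sample size, as in the example above
rs = sorted(null_sample_r(n, rng) for _ in range(20000))

# Cut off the lowest 2.5% and the highest 2.5% of the simulated coefficients.
lower = rs[int(0.025 * len(rs))]
upper = rs[int(0.975 * len(rs))]

print("middle 95% of sample correlations lies between",
      round(lower, 2), "and", round(upper, 2))
```

Any sample correlation outside these two cutoffs would count as unlikely for a sample of this size; changing n shows how the cutoffs move closer to zero as the sample grows.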

So, when our sample correlation coefficient falls into this extreme 5% of the sampling distribution for a population where the correlation is zero, we deem it statistically significant. In other words, we say that we would be unlikely to obtain such a result if the population correlation were zero. Therefore, the population correlation is probably NOT zero, which gives us the right to reject the null hypothesis and accept the alternative hypothesis (see next section).

The Null Hypothesis

In almost any experiment, the starting point is to assume that the population correlation coefficient is zero; in other words, that in the population there is no relationship between the variables that we observe. This is what is called the Null Hypothesis (H0).
The alternative hypothesis suggests that there IS a relationship between the variables; to be able to accept it we need to reject the Null Hypothesis first.
This is where significance testing comes into play. We assume that the null hypothesis is true; then we take our correlation coefficient and check whether its value was likely to occur in a sample from a population where the null hypothesis is true (i.e., where the correlation is zero). If it falls in the extreme 5% of the sampling distribution then it is unlikely - and so we can reject the null hypothesis.
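To make the procedure concrete, here is a small sketch that applies it to the example from the start of this post (a correlation of 0.4 in a sample of 30 people). It is a simulation-based stand-in for the table lookup, not the table itself: the function name monte_carlo_p_value, the assumption of unrelated normally distributed variables, the number of simulations and the seed are all my own choices.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def monte_carlo_p_value(observed_r, n, n_sims=10000, seed=1):
    """Estimate how often a sample of n pairs from a zero-correlation
    population gives a correlation at least as extreme as observed_r
    (in either direction)."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_sims):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson_r(xs, ys)) >= abs(observed_r):
            extreme += 1
    return extreme / n_sims


p = monte_carlo_p_value(observed_r=0.4, n=30)
reject_null = p < 0.05  # falls in the extreme 5%?

print("estimated probability under the null hypothesis:", round(p, 3))
print("reject the null hypothesis:", reject_null)
```

If the estimated probability is below 0.05, the observed correlation falls in the extreme 5% and we reject the null hypothesis; otherwise we keep it.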

So how do we check for it?
We simply consult the significance tables according to the size of our sample. Note that there are separate tables for the Pearson correlation coefficient and Spearman's rho. These tables can be found in most statistics and psychology textbooks - have fun!


Author

I am a 21-year-old Psychology undergraduate at Edinburgh University. The idea behind this site is to provide some help to fellow students, to make studying psychology (including stats...) as much fun as possible and to motivate me not to skip those 9am lectures!